Proxy Indicators for the Quality of Open-domain Dialogues

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work in the field, human judges still have to evaluate dialogue quality, which makes evaluation at scale expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark as an indicator of the quality of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives for judging the quality of a conversation, thus reducing the need for additional training data or for reference responses. Because of this design, the method can infer various quality metrics and derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.
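The idea of a component-based overall score that is checked against human judgments can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator names (`cola`, `mnli`, `stsb`), the equal-weight averaging, and the choice of Spearman rank correlation are assumptions made for illustration only, and the abstract does not say which GLUE tasks are used or how they are combined.

```python
from statistics import mean


def overall_score(components, weights=None):
    """Weighted mean of per-indicator scores (equal weights by default).

    `components` maps a hypothetical indicator name to a score in [0, 1].
    """
    if weights is None:
        weights = {name: 1.0 for name in components}
    total = sum(weights[name] for name in components)
    return sum(components[name] * weights[name] for name in components) / total


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Toy data, invented for this sketch: indicator scores per dialogue
    # and one human quality rating per dialogue.
    dialogues = [
        {"cola": 0.9, "mnli": 0.8, "stsb": 0.7},
        {"cola": 0.4, "mnli": 0.5, "stsb": 0.3},
        {"cola": 0.7, "mnli": 0.6, "stsb": 0.8},
        {"cola": 0.2, "mnli": 0.3, "stsb": 0.4},
    ]
    human_ratings = [5, 2, 4, 1]
    proxy = [overall_score(d) for d in dialogues]
    print("Spearman correlation:", spearman(proxy, human_ratings))
```

In practice the per-indicator scores would come from a model fine-tuned on the corresponding GLUE tasks, and a significance test would accompany the coefficient; both are left out here to keep the sketch self-contained.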

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Number of pages: 22
Publisher: Association for Computational Linguistics (ACL)
Publication date: 01.01.2021
Pages: 7834-7855
ISBN (electronic): 9781955917094
DOIs
Publication status: Published - 01.01.2021
Externally published: Yes
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - ONLINE, Punta Cana, Dominican Republic
Duration: 07.11.2021 - 11.11.2021
https://2021.emnlp.org

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics
