Proxy Indicators for the Quality of Open-domain Dialogues

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work in the field, human judges still have to evaluate the quality of dialogues, which makes such evaluations expensive to perform at scale. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark as a quality indicator for open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives for judging the quality of a conversation, thus reducing the need for additional training data or for responses that serve as quality references. As a result, the method can infer various quality metrics and derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.
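The idea of combining per-task indicators into a component-based overall score and then checking it against human judgments can be illustrated with a minimal sketch. This is not the authors' implementation: the task names (`cola`, `mnli`, `stsb`), the equal-weight averaging in `overall_score`, the choice of Spearman rank correlation, and all numbers are assumptions made purely for illustration.

```python
from statistics import mean


def overall_score(components, weights=None):
    """Weighted mean of per-task proxy scores (hypothetical aggregation rule)."""
    if weights is None:
        weights = {name: 1.0 for name in components}
    total = sum(weights[name] for name in components)
    return sum(components[name] * weights[name] for name in components) / total


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Made-up per-dialogue indicator values, one dict per dialogue.
    dialogues = [
        {"cola": 0.9, "mnli": 0.7, "stsb": 0.8},
        {"cola": 0.4, "mnli": 0.5, "stsb": 0.3},
        {"cola": 0.6, "mnli": 0.8, "stsb": 0.7},
        {"cola": 0.2, "mnli": 0.3, "stsb": 0.4},
    ]
    human_ratings = [5, 2, 4, 1]  # made-up human quality judgments
    proxy = [overall_score(d) for d in dialogues]
    print("Spearman correlation:", spearman(proxy, human_ratings))
```

Rank correlation is used here only because it is insensitive to the scale of the proxy score; whether a coefficient is statistically significant would additionally require a significance test, which this sketch omits.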

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Number of pages: 22
Publisher: Association for Computational Linguistics (ACL)
Publication date: 01.01.2021
Pages: 7834-7855
ISBN (electronic): 9781955917094
DOIs
Publication status: Published - 01.01.2021
Externally published: Yes
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - ONLINE, Punta Cana, Dominican Republic
Duration: 07.11.2021 - 11.11.2021
https://2021.emnlp.org

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics
