Proxy Indicators for the Quality of Open-domain Dialogues

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Thus, despite the abundance of work done in the field, human judges have to evaluate dialogues' quality. As a consequence, performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark to serve as a quality indication of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives on judging the quality of conversation, thus reducing the need for additional training data or responses that serve as quality references. Due to this nature, the method can infer various quality metrics and derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Number of pages: 22
Publisher: Association for Computational Linguistics (ACL)
Publication date: 01.01.2021
Pages: 7834-7855
ISBN (electronic): 9781955917094
DOIs
Publication status: Published - 01.01.2021
Externally published: Yes
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - ONLINE, Punta Cana, Dominican Republic
Duration: 07.11.2021 to 11.11.2021
https://2021.emnlp.org

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics
