Language Model Transformers as Evaluators for Open-domain Dialogues
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
COLING 2020 - 28th International Conference on Computational Linguistics: Proceedings of the Conference. ed. / Donia Scott; Nuria Bel; Chengqing Zong. Association for Computational Linguistics (ACL), 2020. p. 6797-6808 (COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference).
RIS
TY - CHAP
T1 - Language Model Transformers as Evaluators for Open-domain Dialogues
AU - Nedelchev, Rostislav
AU - Lehmann, Jens
AU - Usbeck, Ricardo
N1 - We acknowledge the support of the EU projects Cleopatra (GA 812997) and TAILOR (GA 952215), the Federal Ministry for Economic Affairs and Energy (BMWi) project SPEAKER (FKZ 01MK20011A), the German Federal Ministry of Education and Research (BMBF) projects and excellence clusters ML2R (FKZ 01 15 18038 A/B/C), MLwin (01S18050 D/F), ScaDS.AI (01/S18026A) as well as the Fraunhofer Zukunftsstiftung project JOSEPH. Publisher Copyright: © 2020 COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference. All rights reserved.
PY - 2020/1/1
Y1 - 2020/1/1
AB - Computer-based systems for communication with humans have been a cornerstone of AI research since the 1950s. So far, the most effective way to assess the quality of the dialogues produced by these systems has been resource-intensive manual labor rather than automated means. In this work, we investigate whether language models (LMs) based on transformer neural networks can indicate the quality of a conversation. In a general sense, language models are methods that learn to predict one or more words based on an already given context. Due to their unsupervised nature, they are candidates for efficient, automatic indication of dialogue quality. We demonstrate a positive correlation between the output of the language models and the scores assigned by human evaluators. We also provide some insights into their behavior and inner workings in a conversational context.
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85108285068&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/0f9694bb-370d-3c37-bb25-8347d9aac64a/
U2 - 10.18653/v1/2020.coling-main.599
DO - 10.18653/v1/2020.coling-main.599
M3 - Article in conference proceedings
AN - SCOPUS:85108285068
T3 - COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
SP - 6797
EP - 6808
BT - COLING 2020 - 28th International Conference on Computational Linguistics
A2 - Scott, Donia
A2 - Bel, Nuria
A2 - Zong, Chengqing
PB - Association for Computational Linguistics (ACL)
T2 - 28th International Conference on Computational Linguistics, COLING 2020
Y2 - 8 December 2020 through 13 December 2020
ER -