Proxy Indicators for the Quality of Open-domain Dialogues
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings. ed. / Marie-Francine Moens; Xuanjing Huang; Lucia Specia; Scott Wen-tau Yih. Association for Computational Linguistics (ACL), 2021. p. 7834-7855.
RIS
TY - CHAP
T1 - Proxy Indicators for the Quality of Open-domain Dialogues
AU - Nedelchev, Rostislav
AU - Lehmann, Jens
AU - Usbeck, Ricardo
N1 - Funding Information: Turning to Maintains Context, we see the inverse perspective. The pair-wise sentence proxy indicators applied to the dialogue context and target response demonstrate the best ability, while the single-sentence indicators perform worst. Furthermore, the observation is partially supported by the pair-wise tasks applied to the dialogue facts. Publisher Copyright: © 2021 Association for Computational Linguistics
PY - 2021/1/1
Y1 - 2021/1/1
N2 - The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Thus, despite the abundance of work done in the field, human judges have to evaluate dialogues' quality. As a consequence, performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark to serve as a quality indication of open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives on judging the quality of conversation, thus reducing the need for additional training data or responses that serve as quality references. Due to this nature, the method can infer various quality metrics and derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85127432288&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/87ca9f87-497d-31ee-9b3d-c0876c35cb07/
U2 - 10.18653/v1/2021.emnlp-main.618
DO - 10.18653/v1/2021.emnlp-main.618
M3 - Article in conference proceedings
AN - SCOPUS:85127432288
T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 7834
EP - 7855
BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Moens, Marie-Francine
A2 - Huang, Xuanjing
A2 - Specia, Lucia
A2 - Yih, Scott Wen-tau
PB - Association for Computational Linguistics (ACL)
T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Y2 - 7 November 2021 through 11 November 2021
ER -