Treating dialogue quality evaluation as an anomaly detection problem

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Dialogue systems for interaction with humans have been enjoying increased popularity in both research and industry. To this day, the best way to estimate their success is through human evaluation rather than automated approaches, despite the abundance of work done in the field. In this paper, we investigate the effectiveness of treating dialogue evaluation as an anomaly detection task. The paper looks into four dialogue modeling approaches and how their objective functions correlate with human annotation scores. A high-level perspective exhibits negative results. However, a more in-depth look shows limited potential for using anomaly detection for evaluating dialogues.

Original language: English
Title of host publication: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Number of pages: 5
Publisher: European Language Resources Association (ELRA)
Publication date: 2020
Pages: 508-512
ISBN (electronic): 9791095546344
Publication status: Published - 2020
Externally published: Yes
Event: 12th International Conference on Language Resources and Evaluation, LREC 2020 - Le Palais du Pharao, Marseille, France
Duration: 11.05.2020 - 16.05.2020
https://lrec2020.lrec-conf.org/en/about/organizers/index.html

Bibliographical note

Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC
