QALD-9-ES: A Spanish Dataset for Question Answering Systems

Publication type: Contribution to collected edition; article in conference proceedings; research; peer-reviewed

Authors

Knowledge Graph Question Answering (KGQA) systems enable access to semantic information for any user who can compose a question in natural language. KGQA systems are now a core component of many industrial applications, including chatbots and conversational search applications. Although the world's cultures speak many different languages, the languages covered by KGQA systems and their resources are mainly limited to English. To deploy KGQA systems worldwide, we need to expand the current KGQA resources to languages other than English. Given the recent popularity of Large-Scale Language Models, we believe that providing quality resources is key to the development of future pipelines. One of these resources is the datasets used to train and test KGQA systems. Among the few multilingual KGQA datasets available, only one covers Spanish, namely QALD-9. We reviewed the Spanish translations in the QALD-9 dataset and confirmed several issues that may affect the quality of KGQA systems. Taking this into account, we created new Spanish translations for this dataset and reviewed them manually with the help of native speakers. This dataset provides newly created, high-quality translations for QALD-9; we call this extension QALD-9-ES. We merged these translations into the QALD-9-plus dataset, which provides trustworthy native translations for QALD-9 in nine languages, with the aim of creating one complete source of high-quality translations. We compared the new translations with the original QALD-9 ones using language-agnostic quantitative text analysis measures and found that the new translations yield improved results. Finally, we compared both sets of translations with the GERBIL QA benchmark framework, using a KGQA system that supports Spanish. Although the question-answering scores improved only slightly, we believe that improving the quality of the existing translations will result in better KGQA systems and therefore increase the applicability of KGQA to the Spanish language domain.
Original language: English
Title: Knowledge Graphs: Semantics, Machine Learning, and Languages: Proceedings of the 19th International Conference on Semantic Systems, 20-22 September 2023, Leipzig, Germany
Editors: Maribel Acosta, Silvio Peroni, Sahar Vahdati, Anna Lisa Gentile, Tassilo Pellegrini, Jan-Christoph Kalo
Number of pages: 15
Place of publication: Amsterdam
Publisher: IOS Press BV
Publication date: 11 September 2023
Pages: 38-52
ISBN (print): 978-1-64368-424-6
ISBN (electronic): 978-1-64368-425-3
DOIs
Publication status: Published - 11 September 2023
Externally published: Yes
Event: 19th International Conference on Semantic Systems - HYPERION Hotel Leipzig, Leipzig, Germany
Duration: 20 September 2023 to 22 September 2023
Conference number: 19
https://2023-eu.semantics.cc
