QALD-9-ES: A Spanish Dataset for Question Answering Systems

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Knowledge Graph Question Answering (KGQA) systems enable access to semantic information for any user who can compose a question in natural language. KGQA systems are now a core component of many industrial applications, including chatbots and conversational search applications. Although cultures around the world speak different languages, the languages covered by KGQA systems and their resources are mainly limited to English. To deploy KGQA systems worldwide, the current KGQA resources need to be expanded to languages other than English. Given the recent popularity of Large-Scale Language Models, we believe that providing quality resources is key to the development of future pipelines. One such resource is the datasets used to train and test KGQA systems. Among the few multilingual KGQA datasets available, only one covers Spanish, i.e., QALD-9. We reviewed the Spanish translations in the QALD-9 dataset and confirmed several issues that may affect the quality of KGQA systems. Taking this into account, we created new Spanish translations for this dataset and reviewed them manually with the help of native speakers. This dataset provides newly created, high-quality translations for QALD-9; we call this extension QALD-9-ES. We merged these translations into the QALD-9-plus dataset, which provides trustworthy native translations for QALD-9 in nine languages, intending to create one complete source of high-quality translations. We compared the new translations with the original QALD-9 ones using language-agnostic quantitative text analysis measures and found improvements in the results of the new translations. Finally, we compared both translations in the GERBIL QA benchmark framework using a KGQA system that supports Spanish. Although the question-answering scores improved only slightly, we believe that improving the quality of the existing translations will result in better KGQA systems and therefore increase the applicability of KGQA w.r.t. the Spanish language domain.
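As an illustration of what a language-agnostic text analysis measure can look like, the following is a minimal sketch. The abstract does not name the measures that were used, so the two shown here (type-token ratio and mean question length in tokens) are assumptions chosen for illustration, and the function names and example strings are hypothetical rather than taken from QALD-9 or QALD-9-ES.

```python
from statistics import mean


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit.

    No language-specific resources are used, so the same function
    works for Spanish, English, or any other whitespace-delimited language.
    """
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def type_token_ratio(questions):
    """Distinct tokens divided by total tokens over a set of questions."""
    tokens = [tok for q in questions for tok in tokenize(q)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_length(questions):
    """Average number of tokens per question."""
    return mean(len(tokenize(q)) for q in questions) if questions else 0.0


# Invented example strings (assumption): a keyword-style rendering
# versus a full natural-language question.
keyword_style = ["alcalde Berlín", "juegos mesa GMT"]
full_questions = ["¿Quién es el alcalde de Berlín?",
                  "¿Cuáles son los juegos de mesa de GMT?"]

for name, questions in [("keyword-style", keyword_style),
                        ("full questions", full_questions)]:
    print(name, type_token_ratio(questions), mean_length(questions))
```

Measures of this kind can be computed for two translation sets of the same questions and compared side by side; they describe surface properties of the text and do not by themselves establish translation quality.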
Original language: English
Title of host publication: Knowledge Graphs: Semantics, Machine Learning, and Languages: Proceedings of the 19th International Conference on Semantic Systems, 20-22 September 2023, Leipzig, Germany
Editors: Maribel Acosta, Silvio Peroni, Sahar Vahdati, Anna Lisa Gentile, Tassilo Pellegrini, Jan-Christoph Kalo
Number of pages: 15
Place of Publication: Amsterdam
Publisher: IOS Press BV
Publication date: 11.09.2023
Pages: 38-52
ISBN (print): 978-1-64368-424-6
ISBN (electronic): 978-1-64368-425-3
Publication status: Published - 11.09.2023
Externally published: Yes
Event: 19th International Conference on Semantic Systems - HYPERION Hotel Leipzig, Leipzig, Germany
Duration: 20.09.2023 - 22.09.2023
Conference number: 19
https://2023-eu.semantics.cc

Bibliographical note

This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0)
