QALD-9-ES: A Spanish Dataset for Question Answering Systems

Publication: Contributions to collected editions › Articles in conference proceedings › Research › Peer-reviewed

Standard

QALD-9-ES: A Spanish Dataset for Question Answering Systems. / Soruco, Javier; Collarana, Diego; Both, Andreas et al.
Knowledge Graphs: Semantics, Machine Learning, and Languages: Proceedings of the 19th International Conference on Semantic Systems, 20-22 September 2023, Leipzig, Germany. Ed. / Maribel Acosta; Silvio Peroni; Sahar Vahdati; Anna Lisa Gentile; Tassilo Pellegrini; Jan-Christoph Kalo. Amsterdam: IOS Press BV, 2023. pp. 38-52 (Studies on the Semantic Web; Vol. 56).

Harvard

Soruco, J, Collarana, D, Both, A & Usbeck, R 2023, QALD-9-ES: A Spanish Dataset for Question Answering Systems. in M Acosta, S Peroni, S Vahdati, AL Gentile, T Pellegrini & J-C Kalo (eds), Knowledge Graphs: Semantics, Machine Learning, and Languages: Proceedings of the 19th International Conference on Semantic Systems, 20-22 September 2023, Leipzig, Germany. Studies on the Semantic Web, vol. 56, IOS Press BV, Amsterdam, pp. 38-52, 19th International Conference on Semantic Systems, Leipzig, Saxony, Germany, 20/09/23. https://doi.org/10.3233/SSW230004

APA

Soruco, J., Collarana, D., Both, A., & Usbeck, R. (2023). QALD-9-ES: A Spanish Dataset for Question Answering Systems. In M. Acosta, S. Peroni, S. Vahdati, A. L. Gentile, T. Pellegrini, & J.-C. Kalo (Eds.), Knowledge Graphs: Semantics, Machine Learning, and Languages: Proceedings of the 19th International Conference on Semantic Systems, 20-22 September 2023, Leipzig, Germany (pp. 38-52). (Studies on the Semantic Web; Vol. 56). IOS Press BV. https://doi.org/10.3233/SSW230004

Vancouver

Soruco J, Collarana D, Both A, Usbeck R. QALD-9-ES: A Spanish Dataset for Question Answering Systems. In Acosta M, Peroni S, Vahdati S, Gentile AL, Pellegrini T, Kalo JC, editors, Knowledge Graphs: Semantics, Machine Learning, and Languages: Proceedings of the 19th International Conference on Semantic Systems, 20-22 September 2023, Leipzig, Germany. Amsterdam: IOS Press BV. 2023. p. 38-52. (Studies on the Semantic Web). doi: 10.3233/SSW230004

BibTeX

@inproceedings{6a5e966327d946eebcb7956e68139a16,
title = "QALD-9-ES: A Spanish Dataset for Question Answering Systems",
abstract = "Knowledge Graph Question Answering (KGQA) systems enable access to semantic information for any user who can compose a question in natural language. KGQA systems are now a core component of many industrial applications, including chatbots and conversational search applications. Although distinct worldwide cultures speak different languages, the number of languages covered by KGQA systems and its resources is mainly limited to English. To implement KGQA systems worldwide, we need to expand the current KGQA resources to languages other than English. Taking into account the recent popularity that Large-Scale Language Models are receiving, we believe that providing quality resources is key to the development of future pipelines. One of these resources is the datasets used to train and test KGQA systems. Among the few multilingual KGQA datasets available, only one covers Spanish, i.e., QALD-9. We reviewed the Spanish translations in the QALD-9 dataset and confirmed several issues that may affect the KGQA system{\textquoteright}s quality. Taking this into account, we created new Spanish translations for this dataset and reviewed them manually with the help of native speakers. This dataset provides newly created, high-quality translations for QALD-9; we call this extension QALD-9-ES. We merged these translations into the QALD-9-plus dataset, which provides trustworthy native translations for QALD-9 in nine languages, intending to create one complete source of high-quality translations. We compared the new translations with the QALD-9 original ones using Language-agnostic quantitative text analysis measures and found improvements in the results of the new translations. Finally, we compared both translations using the GERBIL QA benchmark framework using a KGQA system that supports Spanish. Although the question-answering scores only improved slightly, we believe that improving the quality of the existing translations will result in better KGQA systems and therefore increase the applicability of KGQA w.r.t. the Spanish language domain.",
keywords = "Business informatics, Knowledge Graphs, Informatics, Question Answering, Dataset",
author = "Javier Soruco and Diego Collarana and Andreas Both and Ricardo Usbeck",
note = "This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0); 19th International Conference on Semantic Systems ; Conference date: 20-09-2023 Through 22-09-2023",
year = "2023",
month = sep,
day = "11",
doi = "10.3233/SSW230004",
language = "English",
isbn = "978-1-64368-424-6",
series = "Studies on the Semantic Web",
publisher = "IOS Press BV",
pages = "38--52",
editor = "Maribel Acosta and Silvio Peroni and Sahar Vahdati and Gentile, {Anna Lisa} and Tassilo Pellegrini and Jan-Christoph Kalo",
booktitle = "Knowledge Graphs: Semantics, Machine Learning, and Languages",
address = "Amsterdam",
url = "https://2023-eu.semantics.cc",

}

RIS

TY - CHAP

T1 - QALD-9-ES: A Spanish Dataset for Question Answering Systems

AU - Soruco, Javier

AU - Collarana, Diego

AU - Both, Andreas

AU - Usbeck, Ricardo

N1 - Conference code: 19

PY - 2023/9/11

Y1 - 2023/9/11

AB - Knowledge Graph Question Answering (KGQA) systems enable access to semantic information for any user who can compose a question in natural language. KGQA systems are now a core component of many industrial applications, including chatbots and conversational search applications. Although distinct worldwide cultures speak different languages, the number of languages covered by KGQA systems and its resources is mainly limited to English. To implement KGQA systems worldwide, we need to expand the current KGQA resources to languages other than English. Taking into account the recent popularity that Large-Scale Language Models are receiving, we believe that providing quality resources is key to the development of future pipelines. One of these resources is the datasets used to train and test KGQA systems. Among the few multilingual KGQA datasets available, only one covers Spanish, i.e., QALD-9. We reviewed the Spanish translations in the QALD-9 dataset and confirmed several issues that may affect the KGQA system’s quality. Taking this into account, we created new Spanish translations for this dataset and reviewed them manually with the help of native speakers. This dataset provides newly created, high-quality translations for QALD-9; we call this extension QALD-9-ES. We merged these translations into the QALD-9-plus dataset, which provides trustworthy native translations for QALD-9 in nine languages, intending to create one complete source of high-quality translations. We compared the new translations with the QALD-9 original ones using Language-agnostic quantitative text analysis measures and found improvements in the results of the new translations. Finally, we compared both translations using the GERBIL QA benchmark framework using a KGQA system that supports Spanish. Although the question-answering scores only improved slightly, we believe that improving the quality of the existing translations will result in better KGQA systems and therefore increase the applicability of KGQA w.r.t. the Spanish language domain.

KW - Business informatics

KW - Knowledge Graphs

KW - Informatics

KW - Question Answering

KW - Dataset

UR - https://www.iospress.com/catalog/books/knowledge-graphs-semantics-machine-learning-and-languages

UR - https://www.mendeley.com/catalogue/863f1283-9035-351a-bc12-b1ae046d0649/

U2 - 10.3233/SSW230004

DO - 10.3233/SSW230004

M3 - Article in conference proceedings

SN - 978-1-64368-424-6

T3 - Studies on the Semantic Web

SP - 38

EP - 52

BT - Knowledge Graphs: Semantics, Machine Learning, and Languages

A2 - Acosta, Maribel

A2 - Peroni, Silvio

A2 - Vahdati, Sahar

A2 - Gentile, Anna Lisa

A2 - Pellegrini, Tassilo

A2 - Kalo, Jan-Christoph

PB - IOS Press BV

CY - Amsterdam

T2 - 19th International Conference on Semantic Systems

Y2 - 20 September 2023 through 22 September 2023

ER -
