QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers

Publication: Contributions to collected works, Articles in conference proceedings, Research, peer-reviewed

Standard

QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers. / Perevalov, Aleksandr; Diefenbach, Dennis; Usbeck, Ricardo et al.
Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 229-234 (Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022).


Harvard

Perevalov, A, Diefenbach, D, Usbeck, R & Both, A 2022, QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers. in Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022. Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022, Institute of Electrical and Electronics Engineers Inc., pp. 229-234, 16th IEEE International Conference on Semantic Computing, ICSC 2022, Virtual, Online, United States, 26 January 2022. https://doi.org/10.48550/arxiv.2202.00120, https://doi.org/10.1109/ICSC52841.2022.00045

APA

Perevalov, A., Diefenbach, D., Usbeck, R., & Both, A. (2022). QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers. In Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022 (pp. 229-234). (Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.48550/arxiv.2202.00120, https://doi.org/10.1109/ICSC52841.2022.00045

Vancouver

Perevalov A, Diefenbach D, Usbeck R, Both A. QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers. In Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 229-234. (Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022). doi: 10.48550/arxiv.2202.00120, 10.1109/ICSC52841.2022.00045

Bibtex

@inbook{e7c01674f3c641618070a759f46d8b46,
title = "QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers",
abstract = "The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems that provide the access to Semantic Web data via natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks - QALD-9 by introducing high-quality questions' translations to 8 languages provided by native speakers, and transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, s.t., the usability and relevance of the dataset is strongly increased. Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - to our best knowledge were never considered in KGQA research community before. The latter two of the languages are considered as 'endangered' by UNESCO. We call the extended dataset QALD-9-plus and made it available online (Figshare: https://doi.org/10.6084/m9.figshare.16864273, GitHub: https://github.com/Perevalov/qald-9-plus).",
keywords = "multilingual question answering, question answering dataset, question answering over knowledge graphs, Informatics, Business informatics",
author = "Aleksandr Perevalov and Dennis Diefenbach and Ricardo Usbeck and Andreas Both",
note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 16th IEEE International Conference on Semantic Computing, ICSC 2022 ; Conference date: 26-01-2022 Through 28-01-2022",
year = "2022",
doi = "10.48550/arxiv.2202.00120",
language = "English",
isbn = "978-1-6654-3419-5",
series = "Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "229--234",
booktitle = "Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022",
address = "United States",
url = "http://pa.icar.cnr.it/scsn22/",

}

RIS

TY - CHAP

T1 - QALD-9-plus

T2 - 16th IEEE International Conference on Semantic Computing, ICSC 2022

AU - Perevalov, Aleksandr

AU - Diefenbach, Dennis

AU - Usbeck, Ricardo

AU - Both, Andreas

N1 - Publisher Copyright: © 2022 IEEE.

PY - 2022

Y1 - 2022

N2 - The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems that provide the access to Semantic Web data via natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks - QALD-9 by introducing high-quality questions' translations to 8 languages provided by native speakers, and transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, s.t., the usability and relevance of the dataset is strongly increased. Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - to our best knowledge were never considered in KGQA research community before. The latter two of the languages are considered as 'endangered' by UNESCO. We call the extended dataset QALD-9-plus and made it available online (Figshare: https://doi.org/10.6084/m9.figshare.16864273, GitHub: https://github.com/Perevalov/qald-9-plus).

AB - The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems that provide the access to Semantic Web data via natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks - QALD-9 by introducing high-quality questions' translations to 8 languages provided by native speakers, and transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, s.t., the usability and relevance of the dataset is strongly increased. Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - to our best knowledge were never considered in KGQA research community before. The latter two of the languages are considered as 'endangered' by UNESCO. We call the extended dataset QALD-9-plus and made it available online (Figshare: https://doi.org/10.6084/m9.figshare.16864273, GitHub: https://github.com/Perevalov/qald-9-plus).

KW - multilingual question answering

KW - question answering dataset

KW - question answering over knowledge graphs

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85127617150&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/8126760b-6314-36b1-9f9c-c20003d56bba/

U2 - 10.48550/arxiv.2202.00120

DO - 10.48550/arxiv.2202.00120

M3 - Article in conference proceedings

AN - SCOPUS:85127617150

SN - 978-1-6654-3419-5

T3 - Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022

SP - 229

EP - 234

BT - Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 26 January 2022 through 28 January 2022

ER -
