QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers
Publication: Contributions to collected volumes › Articles in conference proceedings › Research › peer-reviewed
Standard
Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 229-234 (Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022).
RIS
TY - CHAP
T1 - QALD-9-plus
T2 - 16th IEEE International Conference on Semantic Computing, ICSC 2022
AU - Perevalov, Aleksandr
AU - Diefenbach, Dennis
AU - Usbeck, Ricardo
AU - Both, Andreas
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems, which provide access to Semantic Web data via a natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages, provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, such that the usability and relevance of the dataset are strongly increased. Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - had, to the best of our knowledge, never been considered in the KGQA research community before. The latter two languages are considered 'endangered' by UNESCO. We call the extended dataset QALD-9-plus and have made it available online (Figshare: https://doi.org/10.6084/m9.figshare.16864273; GitHub: https://github.com/Perevalov/qald-9-plus).
KW - multilingual question answering
KW - question answering dataset
KW - question answering over knowledge graphs
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85127617150&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/8126760b-6314-36b1-9f9c-c20003d56bba/
U2 - 10.48550/arxiv.2202.00120
DO - 10.48550/arxiv.2202.00120
M3 - Article in conference proceedings
AN - SCOPUS:85127617150
SN - 978-1-6654-3419-5
T3 - Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022
SP - 229
EP - 234
BT - Proceedings - 16th IEEE International Conference on Semantic Computing, ICSC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 January 2022 through 28 January 2022
ER -