AmQA: Amharic Question Answering Dataset

Tilahun Abedissa; Ricardo Usbeck; Yaregal Assabie

doi:10.48550/arXiv.2303.03290

Publication: Contributions to collected works › Articles in conference proceedings › Research

Question Answering (QA) returns concise answers or answer lists from natural language text given a context document. Considerable resources go into curating QA datasets to advance the development of robust models. QA datasets have surged for languages such as English, but not for Amharic. Amharic, the official language of Ethiopia, is the second most spoken Semitic language in the world. There is no published or publicly available Amharic QA dataset. Hence, to foster research in Amharic QA, we present the first Amharic QA (AmQA) dataset. We crowdsourced 2628 question-answer pairs over 378 Wikipedia articles. Additionally, we run an XLMR Large-based baseline model to spark interest in open-domain QA research. The best-performing baseline achieves F-scores of 69.58 and 71.74 in the reader-retriever QA and reading comprehension settings, respectively.

Original language	English
Title	Conference XXX
Number of pages	7
DOIs	https://doi.org/10.48550/arXiv.2303.03290
Publication status	In preparation - 6 March 2023
Published externally	Yes

Research areas

Computer science

Further publications by the author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, pp. 95-110, 16 pp. (Lecture Notes in Computer Science; Vol. 15836 LNCS).

Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, Vol. 1, pp. 460-480, 21 pp. (Lecture Notes in Computer Science; Vol. 15718).

Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 2 November 2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), pp. 435-440, 6 pp., D13. (CEUR Workshop Proceedings; Vol. 4085).

Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14 July 2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Cover 74617 Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, pp. 176-193, 18 pp. (Studies on the Semantic Web; Vol. 62).

Publication: Contributions to collected works › Articles in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16.-19.September 2025 Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik, Bonn, pp. 1211-1219, 9 pp. (Lecture Notes in Informatics; Vol. P366).

Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed

DOI

https://doi.org/10.48550/arXiv.2303.03290
Submitted manuscript