Low Resource Question Answering: An Amharic Benchmarking Dataset

Tilahun Abedissa Taffa; Yaregal Assabie; Ricardo Usbeck

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Professorship for Information Systems, in particular Artificial Intelligence and Explainability

Question Answering (QA) systems return concise answers or answer lists from natural language text, given a context document. Curating QA datasets to advance the development of robust QA models requires substantial resources. QA datasets have surged for languages such as English, but not for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.

Original language	English
Title of host publication	5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings
Editors	Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages	9
Publisher	European Language Resources Association (ELRA)
Publication date	2024
Pages	124-132
ISBN (electronic)	9782493814401
Publication status	Published - 2024
Event	5th Workshop on Resources for African Indigenous Languages, RAIL 2024 - Torino, Italy Duration: 25.05.2024 → …

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

Research areas

Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering
Informatics

Other publications by the same author(s)

DBLP-QuAD: A Question Answering Dataset over the DBLP Scholarly Knowledge Graph

Banerjee, D., Awale, S., Usbeck, R. & Biemann, C., 17.01.2024, BIR 2023 - Bibliometric-enhanced Information Retrieval: Proceedings of the 13th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 45th European Conference on Information Retrieval (ECIR 2023). Frommholz, I., Mayr, P., Cabanac, G., Verberne, S. & Brennan, J. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), 15 p. 5. (CEUR Workshop Proceedings; vol. 3617).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Event Extraction Alone Is Not Enough

Huang, J., Jiang, L., Möller, C. & Usbeck, R., 05.2024, Narrative Extraction From Texts 2024: Proceedings of Text2Story — Seventh Workshop on Narrative Extraction From Texts held in conjunction with the 46th European Conference on Information Retrieval (ECIR 2024). Campos, R., Jorge, A. M., Jatowt, A., Bhatia, S. & Litvak, M. (eds.). Aachen: Rheinisch-Westfaelische Technische Hochschule Aachen, Vol. 3671. p. 105-114 10 p. (CEUR Workshop Proceedings; vol. 3671).

Research output: Contributions to collected editions/works › Conference contribution › peer-review

Master of Disaster: A Disaster-Related Event Monitoring System From News Streams

Huang, J. & Usbeck, R., 2024

Research output: other publications › Other › Research

Proceedings of the Third International Workshop on Linked Data-driven Resilience Research (D2R2'24) co-located with European Semantic Web Conference 2024 (ESWC 2024), May 27, 2024

Holze, J. (ed.), Tramp, S. (ed.), Martin, M. (ed.), Auer, S. (ed.), Usbeck, R. (ed.) & Krdzavac, N. (ed.), 2024, CEUR-WS.org. (CEUR Workshop Proceedings)

Research output: Books and anthologies › Conference proceedings › Research

Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

Giglou, H. B., Taffa, T. A., Abdullah, R., Usmanova, A., Usbeck, R., D'Souza, J. & Auer, S., 11.06.2024

Research output: other publications › Other › Research