Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Low Resource Question Answering: An Amharic Benchmarking Dataset. / Taffa, Tilahun Abedissa; Assabie, Yaregal; Usbeck, Ricardo.
The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. Paris: European Language Resources Association (ELRA), 2024. p. 124-132 (LREC proceedings), (International conference on computational linguistics).


Harvard

Taffa, TA, Assabie, Y & Usbeck, R 2024, Low Resource Question Answering: An Amharic Benchmarking Dataset. in R Mabuya, M Matfunjwa, M Setaka & M van Zaanen (eds), The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. LREC proceedings, International conference on computational linguistics, European Language Resources Association (ELRA), Paris, pp. 124-132, 5th Workshop on Resources for African Indigenous Languages, RAIL 2024, Torino, Italy, 25.05.24. <https://aclanthology.org/2024.rail-1.14.pdf>

APA

Taffa, T. A., Assabie, Y., & Usbeck, R. (2024). Low Resource Question Answering: An Amharic Benchmarking Dataset. In R. Mabuya, M. Matfunjwa, M. Setaka, & M. van Zaanen (Eds.), The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings (pp. 124-132). (LREC proceedings), (International conference on computational linguistics). European Language Resources Association (ELRA). https://aclanthology.org/2024.rail-1.14.pdf

Vancouver

Taffa TA, Assabie Y, Usbeck R. Low Resource Question Answering: An Amharic Benchmarking Dataset. In Mabuya R, Matfunjwa M, Setaka M, van Zaanen M, editors, The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. Paris: European Language Resources Association (ELRA). 2024. p. 124-132. (LREC proceedings). (International conference on computational linguistics).

Bibtex

@inbook{ad146cf0d42c4ce1bb5ad3852a1b82a1,
title = "Low Resource Question Answering: An Amharic Benchmarking Dataset",
abstract = "Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA system achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading-comprehension settings, respectively.",
keywords = "Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering, Informatics",
author = "Taffa, {Tilahun Abedissa} and Yaregal Assabie and Ricardo Usbeck",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association.; 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 ; Conference date: 25-05-2024",
year = "2024",
language = "English",
series = "LREC proceedings",
publisher = "European Language Resources Association (ELRA)",
pages = "124--132",
editor = "Rooweither Mabuya and Muzi Matfunjwa and Mmasibidi Setaka and {van Zaanen}, Menno",
booktitle = "The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL)",
address = "Paris",

}

RIS

TY - CHAP

T1 - Low Resource Question Answering: An Amharic Benchmarking Dataset

T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024

AU - Taffa, Tilahun Abedissa

AU - Assabie, Yaregal

AU - Usbeck, Ricardo

N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.

PY - 2024

Y1 - 2024

N2 - Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA system achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading-comprehension settings, respectively.

AB - Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA system achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading-comprehension settings, respectively.

KW - Amh-QuAD

KW - Amharic Question Answering Dataset

KW - Amharic Reading Comprehension

KW - Low Resource Question Answering

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85195211713&partnerID=8YFLogxK

UR - https://aclanthology.org/2024.rail-1.0.pdf

UR - https://aclanthology.org/events/coling-2024/#2024rail-1

M3 - Article in conference proceedings

AN - SCOPUS:85195211713

T3 - LREC proceedings

SP - 124

EP - 132

BT - The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL)

A2 - Mabuya, Rooweither

A2 - Matfunjwa, Muzi

A2 - Setaka, Mmasibidi

A2 - van Zaanen, Menno

PB - European Language Resources Association (ELRA)

CY - Paris

Y2 - 25 May 2024

ER -
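
Note on reproduction: the abstract describes fine-tuning an XLM-R-based reader on the Amh-QuAD training set and evaluating it in retriever-reader and reading-comprehension settings. The sketch below illustrates what such a reader fine-tuning step could look like; it is not the authors' code. It assumes Hugging Face Transformers/Datasets, a hypothetical JSON Lines file named amh_quad_train.jsonl with SQuAD-style question/context/answers fields, and illustrative hyperparameters, none of which are specified in the record above.

# Minimal sketch: fine-tune an XLM-R reader for extractive QA.
# Assumptions (not from the paper): Hugging Face Transformers/Datasets,
# a JSON Lines file "amh_quad_train.jsonl" with SQuAD-style fields
# (question, context, answers = {"text": [...], "answer_start": [...]}),
# and the illustrative hyperparameters below.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          DefaultDataCollator, Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"  # XLM-R backbone named in the abstract
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

raw = load_dataset("json", data_files={"train": "amh_quad_train.jsonl"})

def preprocess(examples):
    # Tokenize (question, context) pairs and convert character-level
    # answer spans into token-level start/end positions.
    inputs = tokenizer(
        examples["question"], examples["context"],
        max_length=384, truncation="only_second",
        padding="max_length", return_offsets_mapping=True,
    )
    offsets_batch = inputs.pop("offset_mapping")
    starts, ends = [], []
    for i, offsets in enumerate(offsets_batch):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = inputs.sequence_ids(i)
        ctx_start = seq_ids.index(1)                         # first context token
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)  # last context token
        if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
            # Answer span was truncated away; label it as (0, 0).
            starts.append(0)
            ends.append(0)
            continue
        idx = ctx_start
        while idx <= ctx_end and offsets[idx][0] <= start_char:
            idx += 1
        starts.append(idx - 1)
        idx = ctx_end
        while idx >= ctx_start and offsets[idx][1] >= end_char:
            idx -= 1
        ends.append(idx + 1)
    inputs["start_positions"] = starts
    inputs["end_positions"] = ends
    return inputs

train_set = raw["train"].map(preprocess, batched=True,
                             remove_columns=raw["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-amh-reader", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=3e-5),
    train_dataset=train_set,
    data_collator=DefaultDataCollator(),
)
trainer.train()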