Low Resource Question Answering: An Amharic Benchmarking Dataset
Publication: Contributions to edited volumes › Conference proceedings articles › Research › peer-reviewed
5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings. Ed. Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. European Language Resources Association (ELRA), 2024. pp. 124-132 (5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings).
RIS
TY - CHAP
T1 - Low Resource Question Answering: An Amharic Benchmarking Dataset
T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024
AU - Taffa, Tilahun Abedissa
AU - Assabie, Yaregal
AU - Usbeck, Ricardo
N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.
PY - 2024
Y1 - 2024
N2 - Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in retriever-reader and reading comprehension settings.
AB - Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in retriever-reader and reading comprehension settings.
KW - Amh-QuAD
KW - Amharic Question Answering Dataset
KW - Amharic Reading Comprehension
KW - Low Resource Question Answering
KW - Informatics
UR - http://www.scopus.com/inward/record.url?scp=85195211713&partnerID=8YFLogxK
M3 - Article in conference proceedings
AN - SCOPUS:85195211713
T3 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings
SP - 124
EP - 132
BT - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings
A2 - Mabuya, Rooweither
A2 - Matfunjwa, Muzi
A2 - Setaka, Mmasibidi
A2 - van Zaanen, Menno
PB - European Language Resources Association (ELRA)
Y2 - 25 May 2024
ER -