Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Low Resource Question Answering: An Amharic Benchmarking Dataset. / Taffa, Tilahun Abedissa; Assabie, Yaregal; Usbeck, Ricardo.
The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. Paris: European Language Resources Association (ELRA), 2024. p. 124-132 (LREC proceedings), (International conference on computational linguistics).


Harvard

Taffa, TA, Assabie, Y & Usbeck, R 2024, Low Resource Question Answering: An Amharic Benchmarking Dataset. in R Mabuya, M Matfunjwa, M Setaka & M van Zaanen (eds), The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. LREC proceedings, International conference on computational linguistics, European Language Resources Association (ELRA), Paris, pp. 124-132, 5th Workshop on Resources for African Indigenous Languages, RAIL 2024, Torino, Italy, 25.05.24. <https://aclanthology.org/2024.rail-1.14.pdf>

APA

Taffa, T. A., Assabie, Y., & Usbeck, R. (2024). Low Resource Question Answering: An Amharic Benchmarking Dataset. In R. Mabuya, M. Matfunjwa, M. Setaka, & M. van Zaanen (Eds.), The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings (pp. 124-132). (LREC proceedings), (International conference on computational linguistics). European Language Resources Association (ELRA). https://aclanthology.org/2024.rail-1.14.pdf

Vancouver

Taffa TA, Assabie Y, Usbeck R. Low Resource Question Answering: An Amharic Benchmarking Dataset. In Mabuya R, Matfunjwa M, Setaka M, van Zaanen M, editors, The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. Paris: European Language Resources Association (ELRA). 2024. p. 124-132. (LREC proceedings). (International conference on computational linguistics).

Bibtex

@inbook{ad146cf0d42c4ce1bb5ad3852a1b82a1,
title = "Low Resource Question Answering: An Amharic Benchmarking Dataset",
abstract = "Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA system achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading-comprehension settings, respectively.",
keywords = "Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering, Informatics",
author = "Taffa, {Tilahun Abedissa} and Yaregal Assabie and Ricardo Usbeck",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association.; 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 ; Conference date: 25-05-2024",
year = "2024",
language = "English",
series = "LREC proceedings",
publisher = "European Language Resources Association (ELRA)",
pages = "124--132",
editor = "Rooweither Mabuya and Muzi Matfunjwa and Mmasibidi Setaka and {van Zaanen}, Menno",
booktitle = "The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL)",
address = "Paris",

}

RIS

TY - CHAP

T1 - Low Resource Question Answering: An Amharic Benchmarking Dataset

T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024

AU - Taffa, Tilahun Abedissa

AU - Assabie, Yaregal

AU - Usbeck, Ricardo

N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.

PY - 2024

Y1 - 2024

N2 - Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA system achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading-comprehension settings, respectively.

AB - Question Answering (QA) systems return concise answers or answer lists from natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; the situation is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA system achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading-comprehension settings, respectively.

KW - Amh-QuAD

KW - Amharic Question Answering Dataset

KW - Amharic Reading Comprehension

KW - Low Resource Question Answering

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85195211713&partnerID=8YFLogxK

UR - https://aclanthology.org/2024.rail-1.0.pdf

UR - https://aclanthology.org/events/coling-2024/#2024rail-1

M3 - Article in conference proceedings

AN - SCOPUS:85195211713

T3 - LREC proceedings

SP - 124

EP - 132

BT - The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL)

A2 - Mabuya, Rooweither

A2 - Matfunjwa, Muzi

A2 - Setaka, Mmasibidi

A2 - van Zaanen, Menno

PB - European Language Resources Association (ELRA)

CY - Paris

Y2 - 25 May 2024

ER -
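
Note on reproduction: the abstract describes fine-tuning an XLM-R-based reader on the Amh-QuAD training set and evaluating it in retriever-reader and reading-comprehension settings. The sketch below illustrates what such a reader fine-tuning step could look like; it is not the authors' code. It assumes Hugging Face Transformers/Datasets, a hypothetical JSON Lines file named amh_quad_train.jsonl with SQuAD-style question/context/answers fields, and illustrative hyperparameters, none of which are specified in the record above.

# Minimal sketch: fine-tune an XLM-R reader for extractive QA.
# Assumptions (not from the paper): Hugging Face Transformers/Datasets,
# a JSON Lines file "amh_quad_train.jsonl" with SQuAD-style fields
# (question, context, answers = {"text": [...], "answer_start": [...]}),
# and the illustrative hyperparameters below.
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          DefaultDataCollator, Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"  # XLM-R backbone named in the abstract
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

raw = load_dataset("json", data_files={"train": "amh_quad_train.jsonl"})

def preprocess(examples):
    # Tokenize (question, context) pairs and convert character-level
    # answer spans into token-level start/end positions.
    inputs = tokenizer(
        examples["question"], examples["context"],
        max_length=384, truncation="only_second",
        padding="max_length", return_offsets_mapping=True,
    )
    offsets_batch = inputs.pop("offset_mapping")
    starts, ends = [], []
    for i, offsets in enumerate(offsets_batch):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = inputs.sequence_ids(i)
        ctx_start = seq_ids.index(1)                         # first context token
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)  # last context token
        if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
            # Answer span was truncated away; label it as (0, 0).
            starts.append(0)
            ends.append(0)
            continue
        idx = ctx_start
        while idx <= ctx_end and offsets[idx][0] <= start_char:
            idx += 1
        starts.append(idx - 1)
        idx = ctx_end
        while idx >= ctx_start and offsets[idx][1] >= end_char:
            idx -= 1
        ends.append(idx + 1)
    inputs["start_positions"] = starts
    inputs["end_positions"] = ends
    return inputs

train_set = raw["train"].map(preprocess, batched=True,
                             remove_columns=raw["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-amh-reader", num_train_epochs=3,
                           per_device_train_batch_size=8, learning_rate=3e-5),
    train_dataset=train_set,
    data_collator=DefaultDataCollator(),
)
trainer.train()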