Low Resource Question Answering: An Amharic Benchmarking Dataset

Publication: Contributions to edited volumes › Articles in conference proceedings › Research › peer-reviewed

Standard

Low Resource Question Answering: An Amharic Benchmarking Dataset. / Taffa, Tilahun Abedissa; Assabie, Yaregal; Usbeck, Ricardo.
5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings. Ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. European Language Resources Association (ELRA), 2024. pp. 124-132 (5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings).


Harvard

Taffa, TA, Assabie, Y & Usbeck, R 2024, Low Resource Question Answering: An Amharic Benchmarking Dataset. in R Mabuya, M Matfunjwa, M Setaka & M van Zaanen (eds), 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings. 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings, European Language Resources Association (ELRA), pp. 124-132, 5th Workshop on Resources for African Indigenous Languages, RAIL 2024, Torino, Italy, 25/05/24.

APA

Taffa, T. A., Assabie, Y., & Usbeck, R. (2024). Low Resource Question Answering: An Amharic Benchmarking Dataset. In R. Mabuya, M. Matfunjwa, M. Setaka, & M. van Zaanen (Eds.), 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings (pp. 124-132). (5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings). European Language Resources Association (ELRA).

Vancouver

Taffa TA, Assabie Y, Usbeck R. Low Resource Question Answering: An Amharic Benchmarking Dataset. In Mabuya R, Matfunjwa M, Setaka M, van Zaanen M, editors, 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings. European Language Resources Association (ELRA). 2024. p. 124-132. (5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings).

Bibtex

@inbook{ad146cf0d42c4ce1bb5ad3852a1b82a1,
title = "Low Resource Question Answering: An Amharic Benchmarking Dataset",
abstract = "Question Answering (QA) systems return concise answers or answer lists based on natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; this is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.",
keywords = "Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering, Informatics",
author = "Taffa, {Tilahun Abedissa} and Yaregal Assabie and Ricardo Usbeck",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association.; 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 ; Conference date: 25-05-2024",
year = "2024",
language = "English",
series = "5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings",
publisher = "European Language Resources Association (ELRA)",
pages = "124--132",
editor = "Rooweither Mabuya and Muzi Matfunjwa and Mmasibidi Setaka and {van Zaanen}, Menno",
booktitle = "5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings",
address = "Luxembourg",

}

RIS

TY - CHAP

T1 - Low Resource Question Answering: An Amharic Benchmarking Dataset

T2 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024

AU - Taffa, Tilahun Abedissa

AU - Assabie, Yaregal

AU - Usbeck, Ricardo

N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.

PY - 2024

Y1 - 2024

N2 - Question Answering (QA) systems return concise answers or answer lists based on natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; this is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.

AB - Question Answering (QA) systems return concise answers or answer lists based on natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; this is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.

KW - Amh-QuAD

KW - Amharic Question Answering Dataset

KW - Amharic Reading Comprehension

KW - Low Resource Question Answering

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85195211713&partnerID=8YFLogxK

M3 - Article in conference proceedings

AN - SCOPUS:85195211713

T3 - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings

SP - 124

EP - 132

BT - 5th Workshop on Resources for African Indigenous Languages, RAIL 2024 at LREC-COLING 2024 - Workshop Proceedings

A2 - Mabuya, Rooweither

A2 - Matfunjwa, Muzi

A2 - Setaka, Mmasibidi

A2 - van Zaanen, Menno

PB - European Language Resources Association (ELRA)

Y2 - 25 May 2024

ER -
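
Note: the abstract describes fine-tuning an XLM-R-based reader on the Amh-QuAD training set for extractive (reading-comprehension) QA. The sketch below is only an illustration of that general kind of setup using the Hugging Face Transformers and Datasets libraries; the checkpoint name, file names, data fields, and hyperparameters are assumptions and not the authors' actual configuration, which is described in the paper itself.

    # Minimal illustrative sketch: fine-tuning an XLM-R checkpoint for extractive QA.
    # Assumptions: SQuAD-style JSON with fields question, context, answers{text, answer_start};
    # checkpoint "xlm-roberta-base"; generic hyperparameters.
    from datasets import load_dataset
    from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL_NAME = "xlm-roberta-base"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

    # Assumed local file name for the training split.
    dataset = load_dataset("json", data_files={"train": "amh_quad_train.json"})

    def preprocess(examples):
        # Tokenize question/context pairs and map each character-level answer
        # span onto start/end token indices.
        tokenized = tokenizer(
            examples["question"], examples["context"],
            truncation="only_second", max_length=384,
            padding="max_length", return_offsets_mapping=True,
        )
        start_positions, end_positions = [], []
        for i, offsets in enumerate(tokenized["offset_mapping"]):
            answer = examples["answers"][i]
            start_char = answer["answer_start"][0]
            end_char = start_char + len(answer["text"][0])
            sequence_ids = tokenized.sequence_ids(i)
            # First and last token positions of the context segment.
            ctx_start = sequence_ids.index(1)
            ctx_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)
            if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
                # Answer was truncated away; label the example as unanswerable.
                start_positions.append(0)
                end_positions.append(0)
            else:
                idx = ctx_start
                while idx <= ctx_end and offsets[idx][0] <= start_char:
                    idx += 1
                start_positions.append(idx - 1)
                idx = ctx_end
                while idx >= ctx_start and offsets[idx][1] >= end_char:
                    idx -= 1
                end_positions.append(idx + 1)
        tokenized["start_positions"] = start_positions
        tokenized["end_positions"] = end_positions
        tokenized.pop("offset_mapping")
        return tokenized

    train_set = dataset["train"].map(preprocess, batched=True,
                                     remove_columns=dataset["train"].column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="xlmr-amh-qa", num_train_epochs=3,
                               per_device_train_batch_size=8, learning_rate=3e-5),
        train_dataset=train_set,
    )
    trainer.train()

In an open-domain (retriever-reader) setting such as the one evaluated in the paper, a reader fine-tuned this way would be applied to passages returned by a separate retrieval step rather than to a single gold context.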