AmQA: Amharic Question Answering Dataset
Publication: Contributions to collected editions › Articles in conference proceedings › Research
Standard
Conference XXX. 2023.
RIS
TY - CHAP
T1 - AmQA
T2 - Amharic Question Answering Dataset
AU - Abedissa, Tilahun
AU - Usbeck, Ricardo
AU - Assabie, Yaregal
PY - 2023/3/6
Y1 - 2023/3/6
AB - Question Answering (QA) returns concise answers or answer lists from natural language text given a context document. Considerable resources go into curating QA datasets to advance the development of robust models. There is a surge of QA datasets for languages like English; however, this is not true for Amharic. Amharic, the official language of Ethiopia, is the second most spoken Semitic language in the world. There is no published or publicly available Amharic QA dataset. Hence, to foster research in Amharic QA, we present the first Amharic QA (AmQA) dataset. We crowdsourced 2628 question-answer pairs over 378 Wikipedia articles. Additionally, we run an XLMR Large-based baseline model to spark open-domain QA research interest. The best-performing baseline achieves F-scores of 69.58 and 71.74 in the reader-retriever QA and reading comprehension settings, respectively.
KW - cs.CL
KW - cs.AI
KW - cs.IR
KW - Informatics
U2 - 10.48550/arXiv.2303.03290
DO - 10.48550/arXiv.2303.03290
M3 - Article in conference proceedings
BT - Conference XXX
ER -