AmQA: Amharic Question Answering Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Standard

AmQA: Amharic Question Answering Dataset. / Abedissa, Tilahun; Usbeck, Ricardo; Assabie, Yaregal.

Conference XXX. 2023.

APA

Abedissa, T., Usbeck, R., & Assabie, Y. (2023). AmQA: Amharic Question Answering Dataset. Manuscript in preparation. In Conference XXX. https://doi.org/10.48550/arXiv.2303.03290

Vancouver

Abedissa T, Usbeck R, Assabie Y. AmQA: Amharic Question Answering Dataset. In Conference XXX. 2023. doi: 10.48550/arXiv.2303.03290

Bibtex

@inbook{5abb1ef212ab4e4ba203ca5560e52c5e,
title = "AmQA: Amharic Question Answering Dataset",
abstract = "Question Answering (QA) returns concise answers or answer lists from natural language text given a context document. Many resources go into curating QA datasets to advance robust models' development. There is a surge of QA datasets for languages like English, however, this is not true for Amharic. Amharic, the official language of Ethiopia, is the second most spoken Semitic language in the world. There is no published or publicly available Amharic QA dataset. Hence, to foster the research in Amharic QA, we present the first Amharic QA (AmQA) dataset. We crowdsourced 2628 question-answer pairs over 378 Wikipedia articles. Additionally, we run an XLMR Large-based baseline model to spark open-domain QA research interest. The best-performing baseline achieves an F-score of 69.58 and 71.74 in reader-retriever QA and reading comprehension settings respectively.",
keywords = "cs.CL, cs.AI, cs.IR, Informatics",
author = "Tilahun Abedissa and Ricardo Usbeck and Yaregal Assabie",
year = "2023",
month = mar,
day = "6",
doi = "10.48550/arXiv.2303.03290",
language = "English",
booktitle = "Conference XXX",

}

RIS

TY - CHAP

T1 - AmQA

T2 - Amharic Question Answering Dataset

AU - Abedissa, Tilahun

AU - Usbeck, Ricardo

AU - Assabie, Yaregal

PY - 2023/3/6

Y1 - 2023/3/6

N2 - Question Answering (QA) returns concise answers or answer lists from natural language text given a context document. Many resources go into curating QA datasets to advance robust models' development. There is a surge of QA datasets for languages like English, however, this is not true for Amharic. Amharic, the official language of Ethiopia, is the second most spoken Semitic language in the world. There is no published or publicly available Amharic QA dataset. Hence, to foster the research in Amharic QA, we present the first Amharic QA (AmQA) dataset. We crowdsourced 2628 question-answer pairs over 378 Wikipedia articles. Additionally, we run an XLMR Large-based baseline model to spark open-domain QA research interest. The best-performing baseline achieves an F-score of 69.58 and 71.74 in reader-retriever QA and reading comprehension settings respectively.

AB - Question Answering (QA) returns concise answers or answer lists from natural language text given a context document. Many resources go into curating QA datasets to advance robust models' development. There is a surge of QA datasets for languages like English, however, this is not true for Amharic. Amharic, the official language of Ethiopia, is the second most spoken Semitic language in the world. There is no published or publicly available Amharic QA dataset. Hence, to foster the research in Amharic QA, we present the first Amharic QA (AmQA) dataset. We crowdsourced 2628 question-answer pairs over 378 Wikipedia articles. Additionally, we run an XLMR Large-based baseline model to spark open-domain QA research interest. The best-performing baseline achieves an F-score of 69.58 and 71.74 in reader-retriever QA and reading comprehension settings respectively.

KW - cs.CL

KW - cs.AI

KW - cs.IR

KW - Informatics

U2 - 10.48550/arXiv.2303.03290

DO - 10.48550/arXiv.2303.03290

M3 - Article in conference proceedings

BT - Conference XXX

ER -