Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Standard

Low Resource Question Answering: An Amharic Benchmarking Dataset. / Taffa, Tilahun Abedissa; Assabie, Yaregal; Usbeck, Ricardo.
The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. ed. / Rooweither Mabuya; Muzi Matfunjwa; Mmasibidi Setaka; Menno van Zaanen. Paris: European Language Resources Association (ELRA), 2024. p. 124-132 (LREC proceedings), (International conference on computational linguistics).

Harvard

Taffa, TA, Assabie, Y & Usbeck, R 2024, Low Resource Question Answering: An Amharic Benchmarking Dataset. in R Mabuya, M Matfunjwa, M Setaka & M van Zaanen (eds), The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. LREC proceedings, International conference on computational linguistics, European Language Resources Association (ELRA), Paris, pp. 124-132, 5th Workshop on Resources for African Indigenous Languages - RAIL 2024, Torino, Italy, 25.05.24. <https://aclanthology.org/2024.rail-1.14.pdf>

APA

Taffa, T. A., Assabie, Y., & Usbeck, R. (2024). Low Resource Question Answering: An Amharic Benchmarking Dataset. In R. Mabuya, M. Matfunjwa, M. Setaka, & M. van Zaanen (Eds.), The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings (pp. 124-132). (LREC proceedings), (International conference on computational linguistics). European Language Resources Association (ELRA). https://aclanthology.org/2024.rail-1.14.pdf

Vancouver

Taffa TA, Assabie Y, Usbeck R. Low Resource Question Answering: An Amharic Benchmarking Dataset. In Mabuya R, Matfunjwa M, Setaka M, van Zaanen M, editors, The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings. Paris: European Language Resources Association (ELRA). 2024. p. 124-132. (LREC proceedings). (International conference on computational linguistics).

Bibtex

@inbook{ad146cf0d42c4ce1bb5ad3852a1b82a1,
title = "Low Resource Question Answering: An Amharic Benchmarking Dataset",
abstract = "Question Answering (QA) systems return concise answers or answer lists based on natural language text, which uses a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; this is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.",
keywords = "Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering, Informatics",
author = "Taffa, {Tilahun Abedissa} and Yaregal Assabie and Ricardo Usbeck",
note = "Publisher Copyright: {\textcopyright} 2024 ELRA Language Resource Association.; 5th Workshop on Resources for African Indigenous Languages - RAIL 2024, RAIL 2024 ; Conference date: 25-05-2024 Through 25-05-2024",
year = "2024",
language = "English",
isbn = "9782493814401",
series = "LREC proceedings",
publisher = "European Language Resources Association (ELRA)",
pages = "124--132",
editor = "Rooweither Mabuya and Muzi Matfunjwa and Mmasibidi Setaka and {van Zaanen}, Menno",
booktitle = "The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL)",
address = "Paris",
url = "https://bit.ly/rail2024",

}

RIS

TY - CHAP

T1 - Low Resource Question Answering: An Amharic Benchmarking Dataset

T2 - 5th Workshop on Resources for African Indigenous Languages - RAIL 2024

AU - Taffa, Tilahun Abedissa

AU - Assabie, Yaregal

AU - Usbeck, Ricardo

N1 - Conference code: 5

PY - 2024

Y1 - 2024

AB - Question Answering (QA) systems return concise answers or answer lists based on natural language text, which uses a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. There is a surge in QA datasets for languages such as English; this is different for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.

KW - Amh-QuAD

KW - Amharic Question Answering Dataset

KW - Amharic Reading Comprehension

KW - Low Resource Question Answering

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=85195211713&partnerID=8YFLogxK

UR - https://aclanthology.org/2024.rail-1.0.pdf

M3 - Article in conference proceedings

AN - SCOPUS:85195211713

SN - 9782493814401

T3 - LREC proceedings

SP - 124

EP - 132

BT - The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL)

A2 - Mabuya, Rooweither

A2 - Matfunjwa, Muzi

A2 - Setaka, Mmasibidi

A2 - van Zaanen, Menno

PB - European Language Resources Association (ELRA)

CY - Paris

Y2 - 25 May 2024 through 25 May 2024

ER -
