Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

Question Answering (QA) systems return concise answers or answer lists to natural language questions, drawing on a given context document. Curating QA datasets takes substantial resources, and such datasets drive the development of robust QA models. While QA datasets have surged for languages such as English, this is not the case for low-resource languages like Amharic; indeed, no Amharic QA dataset has been published or made publicly available. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA model achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading comprehension settings, respectively.

Original language: English
Title of host publication: The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages: 9
Place of publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2024
Pages: 124-132
ISBN (print): 9782493814401
ISBN (electronic): 978-2-493814-40-1
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages (RAIL 2024), Lingotto Conference Centre, Torino, Italy
Duration: 25.05.2024 - 25.05.2024
Conference number: 5
https://bit.ly/rail2024

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

Research areas

  • Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering
  • Informatics
