Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Question Answering (QA) systems return concise answers or answer lists in response to natural language questions, using a given context document. Curating QA datasets to advance the development of robust QA models takes considerable resources. QA datasets have surged for languages such as English, but not for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading comprehension settings, respectively.
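
The abstract does not say how the F-score is computed. A common convention for extractive QA is a token-overlap F1 between the predicted and the reference answer; the sketch below illustrates that convention only, under the assumption of plain whitespace tokenization with no normalization. The function name `token_f1` is hypothetical and is not taken from the paper.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: tokens are whitespace-separated; no punctuation or
    script-specific normalization is applied.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer is a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this convention would be the mean of `token_f1` over all questions, typically taking the maximum over the reference answers when a question has more than one.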

Original language: English
Title of host publication: The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages: 9
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2024
Pages: 124-132
ISBN (print): 9782493814401
ISBN (electronic): 978-2-493814-40-1
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages - RAIL 2024 - Lingotto Conference Centre, Torino, Italy
Duration: 25.05.2024 - 25.05.2024
Conference number: 5
https://bit.ly/rail2024

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

    Research areas

  • Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering
  • Informatics
