Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Question Answering (QA) systems return concise answers or answer lists from natural language text, given a context document. Curating QA datasets to advance the development of robust QA models takes considerable resources. QA datasets have surged for languages such as English, but not for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark interest in open-domain Amharic QA research. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading comprehension settings, respectively.
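
The abstract reports F-scores without stating how they are computed. As a minimal sketch, assuming the token-overlap F1 commonly used for SQuAD-style extractive QA (this record does not confirm that definition), the score for one predicted answer against one reference answer could be computed as follows. The function name `token_f1` and the whitespace tokenization are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: answers are tokenized on whitespace, with no further
    normalization. The actual evaluation script may normalize differently.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this assumption would be the mean of `token_f1` over all questions, scaled to a 0-100 range.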

Original language: English
Title of host publication: The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages: 9
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2024
Pages: 124-132
ISBN (print): 9782493814401
ISBN (electronic): 978-2-493814-40-1
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages - RAIL 2024 - Lingotto Conference Centre, Torino, Italy
Duration: 25.05.2024 to 25.05.2024
Conference number: 5
https://bit.ly/rail2024

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

    Research areas

  • Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering
  • Informatics
