Low Resource Question Answering: An Amharic Benchmarking Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Question Answering (QA) systems return concise answers or answer lists based on natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. QA datasets have surged for languages such as English, but not for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves an F-score of 80.3 and 81.34 in retriever-reader and reading comprehension settings, respectively.

Original language: English
Title of host publication: The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL): Workshop Proceedings
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages: 9
Place of Publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2024
Pages: 124-132
ISBN (print): 9782493814401
ISBN (electronic): 978-2-493814-40-1
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages - RAIL 2024 - Lingotto Conference Centre, Torino, Italy
Duration: 25.05.2024 → 25.05.2024
Conference number: 5
https://bit.ly/rail2024

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.

Research areas

  • Amh-QuAD, Amharic Question Answering Dataset, Amharic Reading Comprehension, Low Resource Question Answering
  • Informatics
