Low Resource Question Answering: An Amharic Benchmarking Dataset

Publication: Contributions to collected works › Articles in conference proceedings › Research › Peer-reviewed

Authors

Question Answering (QA) systems return concise answers or answer lists based on natural language text, using a given context document. Many resources go into curating QA datasets to advance the development of robust QA models. QA datasets have surged for languages such as English, but not for low-resource languages like Amharic. Indeed, there is no published or publicly available Amharic QA dataset. Hence, to foster further research in low-resource QA, we present the first publicly available benchmarking Amharic Question Answering Dataset (Amh-QuAD). We crowdsource 2,628 question-answer pairs from over 378 Amharic Wikipedia articles. Using the training set, we fine-tune an XLM-R-based language model and introduce a new reader model. Leveraging our newly fine-tuned reader, we run a baseline model to spark open-domain Amharic QA research interest. The best-performing baseline QA achieves F-scores of 80.3 and 81.34 in the retriever-reader and reading comprehension settings, respectively.
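
The abstract reports F-scores but does not say how they are computed. As an illustration only, the sketch below shows the token-overlap F1 commonly used for extractive QA evaluation. It assumes whitespace tokenization with no further normalization, and the function name `token_f1` is hypothetical; it is not taken from the paper.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span.

    Assumption: tokens are whitespace-separated; no punctuation or
    script-specific normalization is applied.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score under this scheme would typically average `token_f1` over all questions, taking the maximum over reference answers when a question has several.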

Original language: English
Title: The Fifth Workshop on Resources for African Indigenous Languages @LREC-COLING-2024 (RAIL) : Workshop Proceedings
Editors: Rooweither Mabuya, Muzi Matfunjwa, Mmasibidi Setaka, Menno van Zaanen
Number of pages: 9
Place of publication: Paris
Publisher: European Language Resources Association (ELRA)
Publication date: 2024
Pages: 124-132
ISBN (print): 9782493814401
ISBN (electronic): 978-2-493814-40-1
Publication status: Published - 2024
Event: 5th Workshop on Resources for African Indigenous Languages - RAIL 2024 - Lingotto Conference Centre, Torino, Italy
Duration: 25.05.2024 - 25.05.2024
Conference number: 5
https://bit.ly/rail2024

Bibliographic note

Publisher Copyright:
© 2024 ELRA Language Resource Association.
