FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching further for answers in such sources. To lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college from a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format. On this dataset, we trained a state-of-the-art model based on the Bi-Directional Attention Flow model [Seo et al. 2016]. We report on several ablation tests to assess different aspects of both the model and the dataset. For instance, we report learning curves to assess the amount of training data, the use of different levels of pre-trained models, and the use of more than one correct answer for each question.
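
Since the abstract states that FaQuAD follows the SQuAD format, the following is a minimal sketch of how a file in the SQuAD v1.1 layout could be inspected. The toy sample and the helper names (SAMPLE, squad_stats, misaligned_answers) are assumptions made for illustration. They are not taken from FaQuAD or from the authors' tooling.

```python
from typing import Any

# Hypothetical toy sample in the SQuAD v1.1 layout. The text is invented
# for illustration and is NOT taken from FaQuAD.
SAMPLE: dict[str, Any] = {
    "version": "toy",
    "data": [
        {
            "title": "Example document",
            "paragraphs": [
                {
                    "context": "Enrollment opens in March and closes in April.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "When does enrollment open?",
                            # More than one accepted answer per question.
                            "answers": [
                                {"text": "March", "answer_start": 20},
                                {"text": "in March", "answer_start": 17},
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}


def squad_stats(dataset: dict[str, Any]) -> dict[str, int]:
    """Count documents, paragraphs, and questions in a SQuAD-style dict."""
    documents = dataset["data"]
    paragraphs = [p for d in documents for p in d["paragraphs"]]
    questions = [q for p in paragraphs for q in p["qas"]]
    return {
        "documents": len(documents),
        "paragraphs": len(paragraphs),
        "questions": len(questions),
    }


def misaligned_answers(dataset: dict[str, Any]) -> list[str]:
    """Return ids of questions with an answer whose character offset
    does not point at the answer text inside the paragraph."""
    bad: list[str] = []
    for document in dataset["data"]:
        for paragraph in document["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad.append(qa["id"])
                        break  # report each question at most once
    return bad


if __name__ == "__main__":
    print(squad_stats(SAMPLE))
    print(misaligned_answers(SAMPLE))
```

Run on a real file in this layout (loaded with json.load), squad_stats would be expected to reproduce the counts given in the abstract, assuming the file covers the full dataset.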

Original language: English
Title of host publication: 2019 Brazilian Conference on Intelligent Systems : BRACIS 2019 : 15-18 October 2019, Salvador, Bahia, Brazil : proceedings
Number of pages: 6
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 443-448
Article number: 8923668
ISBN (print): 978-1-7281-4254-8
ISBN (electronic): 978-1-7281-4253-1
DOIs
Publication status: Published - 10.2019
Externally published: Yes
Event: Brazilian Conference on Intelligent Systems - BRACIS 2019 - Salvador, Bahia, Brazil
Duration: 15.10.2019 to 18.10.2019
Conference number: 8
http://www.bracis2019.ufba.br/#:~:text=The%208th%20Brazilian%20Conference%20on,October%2015%20to%2018%2C%202019.

    Research areas

  • Dataset, Machine Reading Comprehension, Natural Language Processing
  • Business informatics
