FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers can be found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching such sources for answers. To mitigate this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college at a Brazilian federal university and from 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format. We trained a state-of-the-art model, based on the Bi-Directional Attention Flow model [Seo et al. 2016], on this dataset. We report on several ablation tests to assess different aspects of both the model and the dataset. For instance, we report learning curves to assess the effect of the amount of training data, of the use of different levels of pre-trained models, and of the use of more than one correct answer for each question.
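For readers unfamiliar with the SQuAD layout that the abstract refers to, the following is a minimal sketch of a SQuAD v1.1-style record and of how questions and passages can be counted in it. The passage, question, and answers are invented for illustration and are not taken from FaQuAD; the helper names `make_answer`, `count_items`, and `find_bad_spans` are hypothetical.

```python
import json


def make_answer(context, text):
    """Build a SQuAD-style answer: the answer text plus its character offset."""
    return {"text": text, "answer_start": context.index(text)}


# Invented example passage (not from FaQuAD).
context = "O curso tem duração de quatro anos."

# SQuAD v1.1 layout: data -> paragraphs -> qas -> answers.
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Exemplo",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Qual é a duração do curso?",
                            # A question may list more than one correct answer.
                            "answers": [
                                make_answer(context, "quatro anos"),
                                make_answer(context, "duração de quatro anos"),
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}


def count_items(dataset):
    """Return (documents, paragraphs, questions) for a SQuAD-style dict."""
    docs = dataset["data"]
    paragraphs = [p for d in docs for p in d["paragraphs"]]
    questions = [q for p in paragraphs for q in p["qas"]]
    return len(docs), len(paragraphs), len(questions)


def find_bad_spans(dataset):
    """Return ids of questions whose answer offsets do not match the context."""
    bad = []
    for doc in dataset["data"]:
        for para in doc["paragraphs"]:
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    end = start + len(ans["text"])
                    if para["context"][start:end] != ans["text"]:
                        bad.append(qa["id"])
    return bad


# The structure is plain JSON, so it survives a serialization round trip.
restored = json.loads(json.dumps(sample, ensure_ascii=False))
print(count_items(restored), find_bad_spans(restored))
```

Applied to a dataset file in this layout, `count_items` would be one way to check figures such as the 900 questions and 249 passages reported above, assuming the file follows the same nesting.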

Original language: English
Title of host publication: 2019 Brazilian Conference on Intelligent Systems : BRACIS 2019 : 15-18 October 2019, Salvador, Bahia, Brazil : proceedings
Number of pages: 6
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 10.2019
Pages: 443-448
Article number: 8923668
ISBN (print): 978-1-7281-4254-8
ISBN (electronic): 978-1-7281-4253-1
DOIs
Publication status: Published - 10.2019
Externally published: Yes
Event: Brazilian Conference on Intelligent Systems - BRACIS 2019 - Salvador, Bahia, Brazil
Duration: 15.10.2019 - 18.10.2019
Conference number: 8
http://www.bracis2019.ufba.br/#:~:text=The%208th%20Brazilian%20Conference%20on,October%2015%20to%2018%2C%202019.

    Research areas

  • Dataset, Machine Reading Comprehension, Natural Language Processing
  • Business informatics
