FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education. / Sayama, Helio Fonseca; Araujo, Anderson Vicoso; Fernandes, Eraldo Rezende.

2019 Brazilian Conference on Intelligent Systems: BRACIS 2019 : 15-18 October 2019, Salvador, Bahia, Brazil : proceedings. Piscataway : Institute of Electrical and Electronics Engineers Inc., 2019. p. 443-448 8923668 (Proceedings - Brazilian Conference on Intelligent Systems; No. 8).

Harvard

Sayama, HF, Araujo, AV & Fernandes, ER 2019, FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education. in 2019 Brazilian Conference on Intelligent Systems: BRACIS 2019 : 15-18 October 2019, Salvador, Bahia, Brazil : proceedings, 8923668, Proceedings - Brazilian Conference on Intelligent Systems, no. 8, Institute of Electrical and Electronics Engineers Inc., Piscataway, pp. 443-448, Brazilian Conference on Intelligent Systems - BRACIS 2019, Salvador, Bahia, Brazil, 15.10.19. https://doi.org/10.1109/BRACIS.2019.00084

APA

Sayama, H. F., Araujo, A. V., & Fernandes, E. R. (2019). FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education. In 2019 Brazilian Conference on Intelligent Systems: BRACIS 2019 : 15-18 October 2019, Salvador, Bahia, Brazil : proceedings (pp. 443-448). [8923668] (Proceedings - Brazilian Conference on Intelligent Systems; No. 8). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BRACIS.2019.00084

Vancouver

Sayama HF, Araujo AV, Fernandes ER. FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education. In 2019 Brazilian Conference on Intelligent Systems: BRACIS 2019 : 15-18 October 2019, Salvador, Bahia, Brazil : proceedings. Piscataway: Institute of Electrical and Electronics Engineers Inc. 2019. p. 443-448. 8923668. (Proceedings - Brazilian Conference on Intelligent Systems; 8). doi: 10.1109/BRACIS.2019.00084

Bibtex

@inbook{7d87c40bb9d74e4cab10b27db9f7f079,
title = "FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education",
abstract = "Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching further for answers in such sources. To lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college from a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format. On this dataset, we trained a state-of-the-art model based on the Bi-Directional Attention Flow model [Seo et al. 2016]. We report on several ablation tests to assess different aspects of both the model and the dataset. For instance, we report learning curves to assess the amount of training data, the use of different levels of pre-trained models, and the use of more than one correct answer for each question.",
keywords = "Dataset, Machine Reading Comprehension, Natural Language Processing, Business informatics",
author = "Sayama, {Helio Fonseca} and Araujo, {Anderson Vicoso} and Fernandes, {Eraldo Rezende}",
year = "2019",
month = oct,
doi = "10.1109/BRACIS.2019.00084",
language = "English",
isbn = "978-1-7281-4254-8",
series = "Proceedings - Brazilian Conference on Intelligent Systems",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "8",
pages = "443--448",
booktitle = "2019 Brazilian Conference on Intelligent Systems",
address = "United States",
note = "Brazilian Conference on Intelligent Systems - BRACIS 2019 ; Conference date: 15-10-2019 Through 18-10-2019",
url = "http://www.bracis2019.ufba.br/#:~:text=The%208th%20Brazilian%20Conference%20on,October%2015%20to%2018%2C%202019.",

}

RIS

TY - CHAP

T1 - FaQuAD: Reading comprehension dataset in the domain of Brazilian higher education

T2 - Brazilian Conference on Intelligent Systems - BRACIS 2019

AU - Sayama, Helio Fonseca

AU - Araujo, Anderson Vicoso

AU - Fernandes, Eraldo Rezende

N1 - Conference code: 8

PY - 2019/10

Y1 - 2019/10

N2 - Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching further for answers in such sources. To lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college from a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format. On this dataset, we trained a state-of-the-art model based on the Bi-Directional Attention Flow model [Seo et al. 2016]. We report on several ablation tests to assess different aspects of both the model and the dataset. For instance, we report learning curves to assess the amount of training data, the use of different levels of pre-trained models, and the use of more than one correct answer for each question.

AB - Academic secretaries and faculty members of higher education institutions face a common problem: the abundance of questions sent by academics whose answers are found in available institutional documents. The official documents produced by Brazilian public universities are vast and dispersed, which discourages students from searching further for answers in such sources. To lessen this problem, we present FaQuAD: a novel machine reading comprehension dataset in the domain of Brazilian higher education institutions. FaQuAD follows the format of SQuAD (Stanford Question Answering Dataset) [Rajpurkar et al. 2016]. It comprises 900 questions about 249 reading passages (paragraphs), which were taken from 18 official documents of a computer science college from a Brazilian federal university and 21 Wikipedia articles related to the Brazilian higher education system. As far as we know, this is the first Portuguese reading comprehension dataset in this format. On this dataset, we trained a state-of-the-art model based on the Bi-Directional Attention Flow model [Seo et al. 2016]. We report on several ablation tests to assess different aspects of both the model and the dataset. For instance, we report learning curves to assess the amount of training data, the use of different levels of pre-trained models, and the use of more than one correct answer for each question.

KW - Dataset

KW - Machine Reading Comprehension

KW - Natural Language Processing

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85077055916&partnerID=8YFLogxK

U2 - 10.1109/BRACIS.2019.00084

DO - 10.1109/BRACIS.2019.00084

M3 - Article in conference proceedings

AN - SCOPUS:85077055916

SN - 978-1-7281-4254-8

T3 - Proceedings - Brazilian Conference on Intelligent Systems

SP - 443

EP - 448

BT - 2019 Brazilian Conference on Intelligent Systems

PB - Institute of Electrical and Electronics Engineers Inc.

CY - Piscataway

Y2 - 15 October 2019 through 18 October 2019

ER -