Benchmarking question answering systems
Research output: Journal contributions › Journal articles › Research › peer-review
Standard
In: Semantic Web, Vol. 10, No. 2, 2019, p. 293-304.
RIS
TY - JOUR
T1 - Benchmarking question answering systems
AU - Usbeck, Ricardo
AU - Röder, Michael
AU - Hoffmann, Michael
AU - Conrads, Felix
AU - Huthmann, Jonathan
AU - Ngonga Ngomo, Axel-Cyrille
AU - Demmler, Christian
AU - Unger, Christina
N1 - The authors gratefully acknowledge financial support from the German Federal Ministry of Education and Research within Eurostars, a joint programme of EUREKA and the European Community, under the projects E!9367 DIESEL and E!9725 QAMEL, as well as the European Union's H2020 research and innovation action HOBBIT (GA 688227). We thank the QANARY team for inspiring discussions. Furthermore, we want to thank Jin-Dong Kim for his thoughts on the novel QA format. We also want to acknowledge that this project has been supported by the BMVI projects LIMBO (project no. 19F2029C) and OPAL (project no. 19F20284) as well as by the German Federal Ministry of Education and Research (BMBF) within 'KMU-innovativ: Forschung für die zivile Sicherheit' and the project SOLIDE (no. 13N14456). Publisher Copyright: © 2019 - IOS Press and the authors. All rights reserved.
PY - 2019
Y1 - 2019
N2 - The necessity of making the Semantic Web more accessible to lay users, alongside the uptake of interactive systems and smart assistants for the Web, has spawned a new generation of RDF-based question answering systems. However, fair evaluation of these systems remains a challenge due to the different types of answers that they provide. Hence, repeating currently published experiments or even benchmarking on the same datasets remains a complex and time-consuming task. We present a novel online benchmarking platform for question answering (QA) that relies on the FAIR principles to support the fine-grained evaluation of question answering systems. We detail how the platform addresses the fair benchmarking of question answering systems through the rewriting of URIs and URLs. In addition, we provide different evaluation metrics, measures, datasets and pre-implemented systems, as well as methods to work with novel formats for interactive and non-interactive benchmarking of question answering systems. Our analysis of current frameworks shows that most of them are tailored towards particular datasets and challenges but do not provide generic models. In addition, while most frameworks perform well in the annotation of entities and properties, the generation of SPARQL queries from annotated text remains a challenge.
AB - The necessity of making the Semantic Web more accessible to lay users, alongside the uptake of interactive systems and smart assistants for the Web, has spawned a new generation of RDF-based question answering systems. However, fair evaluation of these systems remains a challenge due to the different types of answers that they provide. Hence, repeating currently published experiments or even benchmarking on the same datasets remains a complex and time-consuming task. We present a novel online benchmarking platform for question answering (QA) that relies on the FAIR principles to support the fine-grained evaluation of question answering systems. We detail how the platform addresses the fair benchmarking of question answering systems through the rewriting of URIs and URLs. In addition, we provide different evaluation metrics, measures, datasets and pre-implemented systems, as well as methods to work with novel formats for interactive and non-interactive benchmarking of question answering systems. Our analysis of current frameworks shows that most of them are tailored towards particular datasets and challenges but do not provide generic models. In addition, while most frameworks perform well in the annotation of entities and properties, the generation of SPARQL queries from annotated text remains a challenge.
KW - Benchmarking
KW - Factoid question answering
KW - Repeatable open research
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85060906490&partnerID=8YFLogxK
U2 - 10.3233/SW-180312
DO - 10.3233/SW-180312
M3 - Journal articles
AN - SCOPUS:85060906490
VL - 10
SP - 293
EP - 304
JO - Semantic Web
JF - Semantic Web
SN - 1570-0844
IS - 2
ER -