Knowledge Graph Question Answering Datasets and Their Generalizability: Are They Enough for Future Research?

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Authors

Existing approaches to Question Answering over Knowledge Graphs (KGQA) have weak generalizability. This is often due to the standard i.i.d. assumption on the underlying dataset. Recently, three levels of generalization for KGQA were defined: i.i.d., compositional, and zero-shot. We analyze 25 well-known KGQA datasets for 5 different Knowledge Graphs (KGs). We show that, according to this definition, many existing and online available KGQA datasets are either not suited to training a generalizable KGQA system or are based on discontinued and outdated KGs. Generating new datasets is a costly process and is thus not an option for smaller research groups and companies. In this work, we propose a mitigation method for re-splitting available KGQA datasets so that they can be used to evaluate generalization, without any cost or manual effort. We test our hypothesis on three KGQA datasets: LC-QuAD, LC-QuAD 2.0, and QALD-9. Experiments on the re-split KGQA datasets demonstrate the method's effectiveness for generalizability. The code and a unified way to access 18 available datasets are online at https://github.com/semantic-systems/KGQA-datasets and https://github.com/semantic-systems/KGQA-datasets-generalization.
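
The three generalization levels named in the abstract can be illustrated with a small sketch. It is not the authors' re-splitting code (that lives in the repositories linked above). It assumes the commonly used reading of the levels: a test question is zero-shot if it uses at least one schema item (relation or class) never seen in training, compositional if all of its schema items were seen but never in this combination, and i.i.d. otherwise. The function name `generalization_level` and the example schema items are hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def generalization_level(
    test_items: FrozenSet[str],
    train_compositions: Iterable[FrozenSet[str]],
) -> str:
    """Label one test question by how it relates to the training split.

    Assumption (not taken from the paper's code): each question is
    represented by the set of schema items used in its query.
    """
    compositions: Set[FrozenSet[str]] = set(train_compositions)
    # Every schema item that occurs anywhere in the training split.
    seen_items: Set[str] = set().union(*compositions)

    if not test_items <= seen_items:
        return "zero-shot"      # at least one unseen schema item
    if test_items in compositions:
        return "i.i.d."         # this exact combination was seen in training
    return "compositional"      # known items, new combination


# Hypothetical training split described by its schema-item sets.
train = [
    frozenset({"dbo:author", "dbo:Book"}),
    frozenset({"dbo:birthPlace"}),
]

level_iid = generalization_level(frozenset({"dbo:author", "dbo:Book"}), train)
level_comp = generalization_level(frozenset({"dbo:author", "dbo:birthPlace"}), train)
level_zero = generalization_level(frozenset({"dbo:spouse"}), train)
```

Under these assumptions, re-splitting a dataset amounts to choosing train and test partitions so that the test set contains questions at each of the three levels, rather than only i.i.d. ones.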

Original language: English
Title of host publication: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Enrique Amigo, Pablo Castells, Julio Gonzalo
Number of pages: 10
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 07.07.2022
Pages: 3209-3218
ISBN (electronic): 9781450387323
DOIs
Publication status: Published - 07.07.2022
Externally published: Yes
Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2022 - Online + Círculo de Bellas Artes (Circle of Beaux Arts), Madrid, Spain
Duration: 11.07.2022 to 15.07.2022
Conference number: 45
https://sigir.org/sigir2022/

Bibliographical note

Publisher Copyright:
© 2022 ACM.
