Knowledge Graph Question Answering Datasets and Their Generalizability: Are They Enough for Future Research?

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Existing approaches to Question Answering over Knowledge Graphs (KGQA) generalize poorly. This is often due to the standard i.i.d. assumption on the underlying dataset. Recently, three levels of generalization for KGQA were defined: i.i.d., compositional, and zero-shot. We analyze 25 well-known KGQA datasets for 5 different Knowledge Graphs (KGs). We show that, according to this definition, many existing and publicly available KGQA datasets are either not suited to training a generalizable KGQA system or are based on discontinued and outdated KGs. Generating new datasets is a costly process and thus not an option for smaller research groups and companies. In this work, we propose a mitigation method for re-splitting available KGQA datasets so that they can be used to evaluate generalization, without any cost or manual effort. We test our hypothesis on three KGQA datasets: LC-QuAD, LC-QuAD 2.0, and QALD-9. Experiments on the re-split KGQA datasets demonstrate the method's effectiveness with respect to generalizability. The code and a unified way to access 18 available datasets are online at https://github.com/semantic-systems/KGQA-datasets and https://github.com/semantic-systems/KGQA-datasets-generalization.
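The three generalization levels named in the abstract can be illustrated with a small sketch. It is not taken from the paper or its repositories. It assumes a common reading of the levels: a test question is zero-shot if it uses a schema item (class or relation) never seen in training, compositional if all its schema items are seen but its query template is new, and i.i.d. otherwise. The `Question` representation, the function names, and the template and schema identifiers are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation of a question: (template_id, schema_items).
# template_id stands for the query structure; schema_items are the KG
# classes and relations the query uses. Both are assumptions for this sketch.
Question = Tuple[str, FrozenSet[str]]


def generalization_level(
    question: Question,
    train_templates: Set[str],
    train_schema: Set[str],
) -> str:
    """Assign one test question to a generalization level (assumed definitions)."""
    template, schema = question
    if not schema <= train_schema:
        # At least one class or relation never appears in the training split.
        return "zero-shot"
    if template not in train_templates:
        # Known schema items combined in a query structure unseen in training.
        return "compositional"
    return "i.i.d."


def label_test_set(train: Iterable[Question], test: Iterable[Question]) -> List[str]:
    """Label every test question relative to the given training split."""
    train = list(train)
    train_templates = {template for template, _ in train}
    train_schema: Set[str] = set().union(*(schema for _, schema in train))
    return [generalization_level(q, train_templates, train_schema) for q in test]


# Toy data with made-up template ids and DBpedia-style relation names.
train_split = [
    ("T1", frozenset({"dbo:author", "dbo:Book"})),
    ("T2", frozenset({"dbo:birthPlace"})),
]
test_split = [
    ("T1", frozenset({"dbo:author"})),
    ("T3", frozenset({"dbo:author", "dbo:birthPlace"})),
    ("T1", frozenset({"dbo:spouse"})),
]
labels = label_test_set(train_split, test_split)
```

Under these assumptions, a re-split would move questions between the training and test sets until the test set contains questions at all three levels. The sketch only labels a given split; it does not perform the re-splitting itself.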

Original language: English
Title of host publication: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Enrique Amigo, Pablo Castells, Julio Gonzalo
Number of pages: 10
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 07.07.2022
Pages: 3209-3218
ISBN (electronic): 9781450387323
DOIs
Publication status: Published - 07.07.2022
Externally published: Yes
Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2022 - Online + Círculo de Bellas Artes (Circle of Beaux Arts), Madrid, Spain
Duration: 11.07.2022 - 15.07.2022
Conference number: 45
https://sigir.org/sigir2022/

Bibliographical note

Publisher Copyright:
© 2022 ACM.
