Knowledge Graph Question Answering Datasets and Their Generalizability: Are They Enough for Future Research?

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Existing approaches to Question Answering over Knowledge Graphs (KGQA) have weak generalizability. This is often due to the standard i.i.d. assumption on the underlying dataset. Recently, three levels of generalization for KGQA were defined, namely i.i.d., compositional, and zero-shot. We analyze 25 well-known KGQA datasets for 5 different Knowledge Graphs (KGs). We show that, according to this definition, many existing and online available KGQA datasets are either not suited to train a generalizable KGQA system or are based on discontinued and outdated KGs. Generating new datasets is a costly process and, thus, is not an alternative for smaller research groups and companies. In this work, we propose a mitigation method for re-splitting available KGQA datasets so that they can be used to evaluate generalization, without any cost or manual effort. We test our hypothesis on three KGQA datasets, i.e., LC-QuAD, LC-QuAD 2.0, and QALD-9. Experiments on the re-split KGQA datasets demonstrate the method's effectiveness for evaluating generalizability. The code and a unified way to access 18 available datasets are online at https://github.com/semantic-systems/KGQA-datasets as well as https://github.com/semantic-systems/KGQA-datasets-generalization.
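The three levels of generalization can be made concrete by labeling each test question relative to the training split. The following is a minimal sketch, not the authors' re-splitting code. It assumes every question is annotated with a hypothetical `template` (the query skeleton with entities and schema items abstracted away) and a set of `schema_items` (the KG relations and classes the query uses). It follows the usual reading of the definitions: a question is zero-shot if any of its schema items never occurs in training, compositional if all items were seen but never in this combination, and i.i.d. otherwise. The DBpedia-style property names and query skeletons in the toy data are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    # Hypothetical annotation format (an assumption, not the datasets' schema):
    # 'template' is the logical-form skeleton with entities and schema items
    # abstracted; 'schema_items' are the KG relations/classes the query uses.
    text: str
    template: str
    schema_items: frozenset


def generalization_level(q, seen_items, seen_compositions):
    """Label one test question as 'zero-shot', 'compositional' or 'i.i.d.'."""
    if not q.schema_items <= seen_items:
        # At least one relation/class never occurs in the training split.
        return "zero-shot"
    if (q.template, q.schema_items) not in seen_compositions:
        # Every schema item is known, but not in this combination.
        return "compositional"
    return "i.i.d."


def label_test_split(train, test):
    """Count how many test questions fall into each generalization level."""
    seen_items = set().union(*(q.schema_items for q in train))
    seen_compositions = {(q.template, q.schema_items) for q in train}
    return Counter(generalization_level(q, seen_items, seen_compositions) for q in test)


# Toy usage with hypothetical questions.
ONE_HOP = "SELECT ?x WHERE { E R ?x }"
TWO_HOP = "SELECT ?x WHERE { E R1 ?y . ?y R2 ?x }"
train = [
    Question("Who directed Alien?", ONE_HOP, frozenset({"dbo:director"})),
    Question("Who wrote Dune?", ONE_HOP, frozenset({"dbo:author"})),
    Question("Where was Ridley Scott born?", ONE_HOP, frozenset({"dbo:birthPlace"})),
]
test = [
    # Same skeleton and relation as a training question: i.i.d.
    Question("Who directed Heat?", ONE_HOP, frozenset({"dbo:director"})),
    # Both relations seen, but never combined in training: compositional.
    Question("Where was the director of Alien born?", TWO_HOP,
             frozenset({"dbo:director", "dbo:birthPlace"})),
    # Relation never seen in training: zero-shot.
    Question("Who composed the music of Alien?", ONE_HOP, frozenset({"dbo:musicComposer"})),
]
print(label_test_split(train, test))
```

A split in which nearly all test questions come out as i.i.d. says little about compositional or zero-shot generalization, which is the kind of imbalance a re-split is meant to address.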

Original language: English
Title of host publication: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Editors: Enrique Amigo, Pablo Castells, Julio Gonzalo
Number of pages: 10
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 07.07.2022
Pages: 3209-3218
ISBN (electronic): 9781450387323
DOIs
Publication status: Published - 07.07.2022
Externally published: Yes
Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2022, Online + Círculo de Bellas Artes (Circle of Beaux Arts), Madrid, Spain
Duration: 11.07.2022 to 15.07.2022
Conference number: 45
https://sigir.org/sigir2022/

Bibliographical note

Publisher Copyright:
© 2022 ACM.
