ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Standard

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. / Salnikov, Mikhail; Sakhovskiy, Andrey; Nikishina, Irina et al.
Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. ed. / Ryutaro Ichise. Springer Science and Business Media Deutschland, 2026. p. 95-110 (Lecture Notes in Computer Science; Vol. 15836 LNCS).

Harvard

Salnikov, M, Sakhovskiy, A, Nikishina, I, Usmanova, A, Kraft, A, Möller, C, Banerjee, D, Huang, J, Jiang, L, Abdullah, R, Yan, X, Tutubalina, E, Usbeck, R & Panchenko, A 2026, ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. in R Ichise (ed.), Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Lecture Notes in Computer Science, vol. 15836 LNCS, Springer Science and Business Media Deutschland, pp. 95-110, 30th International Conference on Natural Language and Information Systems - NLDB 2025, Kanazawa, Japan, 04.07.25. https://doi.org/10.1007/978-3-031-97141-9_7

APA

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R., & Panchenko, A. (2026). ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. In R. Ichise (Ed.), Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings (pp. 95-110). (Lecture Notes in Computer Science; Vol. 15836 LNCS). Springer Science and Business Media Deutschland. Advance online publication. https://doi.org/10.1007/978-3-031-97141-9_7

Vancouver

Salnikov M, Sakhovskiy A, Nikishina I, Usmanova A, Kraft A, Möller C et al. ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. In Ichise R, editor, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Springer Science and Business Media Deutschland. 2026. p. 95-110. (Lecture Notes in Computer Science). Epub 2025 Jul 1. doi: 10.1007/978-3-031-97141-9_7

Bibtex

@inbook{23dd3ebcbd1a457cafa90bae648274d3,
title = "ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs",
abstract = "In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.",
keywords = "KGQA, Knowledge graphs, NLP, Question answering, Informatics",
author = "Mikhail Salnikov and Andrey Sakhovskiy and Irina Nikishina and Aida Usmanova and Angelie Kraft and Cedric M{\"o}ller and Debayan Banerjee and Junbo Huang and Longquan Jiang and Rana Abdullah and Xi Yan and Elena Tutubalina and Ricardo Usbeck and Alexander Panchenko",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.; 30th International Conference on Natural Language and Information Systems - NLDB 2025, NLDB 2025 ; Conference date: 04-07-2025 Through 06-07-2025",
year = "2025",
month = jul,
day = "1",
doi = "10.1007/978-3-031-97141-9_7",
language = "English",
isbn = "978-3-031-97140-2",
series = "Lecture Notes in Computer Science",
publisher = "Springer Science and Business Media Deutschland",
pages = "95--110",
editor = "Ryutaro Ichise",
booktitle = "Natural Language Processing and Information Systems",
address = "Germany",
}

RIS

TY - CHAP

T1 - ShortPathQA

T2 - 30th International Conference on Natural Language and Information Systems - NLDB 2025

AU - Salnikov, Mikhail

AU - Sakhovskiy, Andrey

AU - Nikishina, Irina

AU - Usmanova, Aida

AU - Kraft, Angelie

AU - Möller, Cedric

AU - Banerjee, Debayan

AU - Huang, Junbo

AU - Jiang, Longquan

AU - Abdullah, Rana

AU - Yan, Xi

AU - Tutubalina, Elena

AU - Usbeck, Ricardo

AU - Panchenko, Alexander

N1 - Conference code: 30

PY - 2025/7/1

Y1 - 2025/7/1

N2 - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.

AB - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.

KW - KGQA

KW - Knowledge graphs

KW - NLP

KW - Question answering

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=105010833913&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-97141-9_7

DO - 10.1007/978-3-031-97141-9_7

M3 - Article in conference proceedings

AN - SCOPUS:105010833913

SN - 978-3-031-97140-2

T3 - Lecture Notes in Computer Science

SP - 95

EP - 110

BT - Natural Language Processing and Information Systems

A2 - Ichise, Ryutaro

PB - Springer Science and Business Media Deutschland

Y2 - 4 July 2025 through 6 July 2025

ER -
