ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. ed. / Ryutaro Ichise. Springer Science and Business Media Deutschland, 2026. p. 95-110 (Lecture Notes in Computer Science; Vol. 15836 LNCS).
RIS
TY - CHAP
T1 - ShortPathQA
T2 - 30th International Conference on Natural Language and Information Systems - NLDB 2025
AU - Salnikov, Mikhail
AU - Sakhovskiy, Andrey
AU - Nikishina, Irina
AU - Usmanova, Aida
AU - Kraft, Angelie
AU - Möller, Cedric
AU - Banerjee, Debayan
AU - Huang, Junbo
AU - Jiang, Longquan
AU - Abdullah, Rana
AU - Yan, Xi
AU - Tutubalina, Elena
AU - Usbeck, Ricardo
AU - Panchenko, Alexander
N1 - Conference code: 30
PY - 2025/7/1
Y1 - 2025/7/1
N2 - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.
KW - KGQA
KW - Knowledge graphs
KW - NLP
KW - Question answering
KW - Informatics
UR - http://www.scopus.com/inward/record.url?scp=105010833913&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-97141-9_7
DO - 10.1007/978-3-031-97141-9_7
M3 - Article in conference proceedings
AN - SCOPUS:105010833913
SN - 978-3-031-97140-2
T3 - Lecture Notes in Computer Science
SP - 95
EP - 110
BT - Natural Language Processing and Information Systems
A2 - Ichise, Ryutaro
PB - Springer Science and Business Media Deutschland
Y2 - 4 July 2025 through 6 July 2025
ER -