ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Mikhail Salnikov; Andrey Sakhovskiy; Irina Nikishina; Aida Usmanova; Angelie Kraft; Cedric Möller; Debayan Banerjee; Junbo Huang; Longquan Jiang; Rana Abdullah; Xi Yan; Elena Tutubalina; Ricardo Usbeck; Alexander Panchenko

doi:10.1007/978-3-031-97141-9_7

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. / Salnikov, Mikhail; Sakhovskiy, Andrey; Nikishina, Irina et al.
Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. ed. / Ryutaro Ichise. Springer Science and Business Media Deutschland, 2026. p. 95-110 (Lecture Notes in Computer Science; Vol. 15836 LNCS).

Harvard

Salnikov, M, Sakhovskiy, A, Nikishina, I, Usmanova, A, Kraft, A, Möller, C, Banerjee, D, Huang, J, Jiang, L, Abdullah, R, Yan, X, Tutubalina, E, Usbeck, R & Panchenko, A 2026, ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. in R Ichise (ed.), Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Lecture Notes in Computer Science, vol. 15836 LNCS, Springer Science and Business Media Deutschland, pp. 95-110, 30th International Conference on Natural Language and Information Systems - NLDB 2025, Kanazawa, Japan, 04.07.25. https://doi.org/10.1007/978-3-031-97141-9_7

APA

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R., & Panchenko, A. (2026). ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. In R. Ichise (Ed.), Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings (pp. 95-110). (Lecture Notes in Computer Science; Vol. 15836 LNCS). Springer Science and Business Media Deutschland. Advance online publication. https://doi.org/10.1007/978-3-031-97141-9_7

Vancouver

Salnikov M, Sakhovskiy A, Nikishina I, Usmanova A, Kraft A, Möller C et al. ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. In Ichise R, editor, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Springer Science and Business Media Deutschland. 2026. p. 95-110. (Lecture Notes in Computer Science). Epub 2025 Jul 1. doi: 10.1007/978-3-031-97141-9_7

Bibtex

@inbook{23dd3ebcbd1a457cafa90bae648274d3,

title = "ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs",

abstract = "In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.",

keywords = "KGQA, Knowledge graphs, NLP, Question answering, Informatics",

author = "Mikhail Salnikov and Andrey Sakhovskiy and Irina Nikishina and Aida Usmanova and Angelie Kraft and Cedric M{\"o}ller and Debayan Banerjee and Junbo Huang and Longquan Jiang and Rana Abdullah and Xi Yan and Elena Tutubalina and Ricardo Usbeck and Alexander Panchenko",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.; 30th International Conference on Natural Language and Information Systems - NLDB 2025, NLDB 2025 ; Conference date: 04-07-2025 Through 06-07-2025",

year = "2025",

month = jul,

day = "1",

doi = "10.1007/978-3-031-97141-9_7",

language = "English",

isbn = "978-3-031-97140-2",

series = "Lecture Notes in Computer Science",

publisher = "Springer Science and Business Media Deutschland",

pages = "95--110",

editor = "Ryutaro Ichise",

booktitle = "Natural Language Processing and Information Systems",

address = "Germany",

}

RIS

TY - CHAP

T1 - ShortPathQA

T2 - 30th International Conference on Natural Language and Information Systems - NLDB 2025

AU - Salnikov, Mikhail

AU - Sakhovskiy, Andrey

AU - Nikishina, Irina

AU - Usmanova, Aida

AU - Kraft, Angelie

AU - Möller, Cedric

AU - Banerjee, Debayan

AU - Huang, Junbo

AU - Jiang, Longquan

AU - Abdullah, Rana

AU - Yan, Xi

AU - Tutubalina, Elena

AU - Usbeck, Ricardo

AU - Panchenko, Alexander

N1 - Conference code: 30

PY - 2025/7/1

Y1 - 2025/7/1

N2 - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.

AB - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.

KW - KGQA

KW - Knowledge graphs

KW - NLP

KW - Question answering

KW - Informatics

UR - http://www.scopus.com/inward/record.url?scp=105010833913&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-97141-9_7

DO - 10.1007/978-3-031-97141-9_7

M3 - Article in conference proceedings

AN - SCOPUS:105010833913

SN - 978-3-031-97140-2

T3 - Lecture Notes in Computer Science

SP - 95

EP - 110

BT - Natural Language Processing and Information Systems

A2 - Ichise, Ryutaro

PB - Springer Science and Business Media Deutschland

Y2 - 4 July 2025 through 6 July 2025

ER -

Other publications by the same author(s)

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, Vol. 1. p. 460-480 21 p. (Lecture Notes in Computer Science ; vol. 15718).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 02.11.2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), p. 435-440 6 p. D13. (CEUR Workshop Proceedings; vol. 4085).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Cover 74617 Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, p. 176-193 18 p. (Studies on the Semantic Web; vol. 62).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025: The Wide Open - Offenheit von Source bis Science, 16.-19. September 2025 Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik, Bonn, p. 1211-1219 9 p. (Lecture Notes in Informatics; vol. P366).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Bridge-Generate: Scholarly Hybrid Question Answering

Taffa, T. A. & Usbeck, R., 23.05.2025, WWW Companion 2025 - Companion Proceedings of the ACM Web Conference 2025: Companion Proceedings of the ACM Web Conference 2025, April 28-May 2, 2025 Sydney, NSW, Australia. Long, G., Blumestein, M., Chang, Y., Lewin-Eytan, L., Huang, H. & Yom-Tov, E. (eds.). New York: Association for Computing Machinery, Inc, p. 1321-1325 5 p.

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DOI

https://doi.org/10.1007/978-3-031-97141-9_7
Final published version
