ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Publication: Contributions to collected editions/anthologies › Articles in conference proceedings › Research › Peer-reviewed

Standard

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. / Salnikov, Mikhail; Sakhovskiy, Andrey; Nikishina, Irina et al.
Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ed. / Ryutaro Ichise. Springer Science and Business Media Deutschland, 2026. pp. 95-110 (Lecture Notes in Computer Science; Vol. 15836 LNCS).


Harvard

Salnikov, M, Sakhovskiy, A, Nikishina, I, Usmanova, A, Kraft, A, Möller, C, Banerjee, D, Huang, J, Jiang, L, Abdullah, R, Yan, X, Tutubalina, E, Usbeck, R & Panchenko, A 2026, ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. in R Ichise (ed.), Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Lecture Notes in Computer Science, vol. 15836 LNCS, Springer Science and Business Media Deutschland, pp. 95-110, 30th International Conference on Natural Language and Information Systems - NLDB 2025, Kanazawa, Japan, 4/07/25. https://doi.org/10.1007/978-3-031-97141-9_7

APA

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R., & Panchenko, A. (2026). ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. In R. Ichise (Ed.), Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings (pp. 95-110). (Lecture Notes in Computer Science; Vol. 15836 LNCS). Springer Science and Business Media Deutschland. Advance online publication. https://doi.org/10.1007/978-3-031-97141-9_7

Vancouver

Salnikov M, Sakhovskiy A, Nikishina I, Usmanova A, Kraft A, Möller C et al. ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs. In: Ichise R, editor, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Springer Science and Business Media Deutschland. 2026. p. 95-110. (Lecture Notes in Computer Science). Epub 2025 Jul 1. doi: 10.1007/978-3-031-97141-9_7

BibTeX

@inproceedings{23dd3ebcbd1a457cafa90bae648274d3,
title = "ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs",
abstract = "In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.",
keywords = "KGQA, Knowledge graphs, NLP, Question answering, Informatics",
author = "Mikhail Salnikov and Andrey Sakhovskiy and Irina Nikishina and Aida Usmanova and Angelie Kraft and Cedric M{\"o}ller and Debayan Banerjee and Junbo Huang and Longquan Jiang and Rana Abdullah and Xi Yan and Elena Tutubalina and Ricardo Usbeck and Alexander Panchenko",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.; 30th International Conference on Natural Language and Information Systems - NLDB 2025, NLDB 2025 ; Conference date: 04-07-2025 Through 06-07-2025",
year = "2025",
month = jul,
day = "1",
doi = "10.1007/978-3-031-97141-9_7",
language = "English",
isbn = "978-3-031-97140-2",
series = "Lecture Notes in Computer Science",
volume = "15836",
publisher = "Springer Science and Business Media Deutschland",
pages = "95--110",
editor = "Ryutaro Ichise",
booktitle = "Natural Language Processing and Information Systems",
address = "Germany",

}

RIS

TY  - CHAP
T1  - ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs
T2  - 30th International Conference on Natural Language and Information Systems - NLDB 2025
AU  - Salnikov, Mikhail
AU  - Sakhovskiy, Andrey
AU  - Nikishina, Irina
AU  - Usmanova, Aida
AU  - Kraft, Angelie
AU  - Möller, Cedric
AU  - Banerjee, Debayan
AU  - Huang, Junbo
AU  - Jiang, Longquan
AU  - Abdullah, Rana
AU  - Yan, Xi
AU  - Tutubalina, Elena
AU  - Usbeck, Ricardo
AU  - Panchenko, Alexander
N1  - Conference code: 30
PY  - 2025/7/1
Y1  - 2025/7/1
N2  - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.
AB  - In this work, we release the Shortest Path subgraph Question Answering (ShortPathQA) dataset, the first dataset that provides textual questions with pre-computed relevant subgraphs retrieved from the Wikidata Knowledge Graph (KG), standardizing the evaluation framework for Knowledge Graph Question Answering (KGQA). For this purpose, we utilize the Mintaka dataset for both training and testing and additionally create a manual question-answering subset for testing. Our baseline experiments with both supervised approaches and unsupervised Large Language Model (LLM) inference indicate that even a simplified KGQA formulation with given KG subgraphs and candidate answers remains challenging. Our analysis has shown that LLMs are unable to correctly process and utilize graph data structures without detailed prompt engineering or model tuning. This limitation highlights the need for the creation of this dataset as a training ground for the development of methods that enable LLMs to work more effectively with graph data.
KW  - KGQA
KW  - Knowledge graphs
KW  - NLP
KW  - Question answering
KW  - Informatics
UR  - http://www.scopus.com/inward/record.url?scp=105010833913&partnerID=8YFLogxK
U2  - 10.1007/978-3-031-97141-9_7
DO  - 10.1007/978-3-031-97141-9_7
M3  - Article in conference proceedings
AN  - SCOPUS:105010833913
SN  - 978-3-031-97140-2
T3  - Lecture Notes in Computer Science
VL  - 15836
SP  - 95
EP  - 110
BT  - Natural Language Processing and Information Systems
A2  - Ichise, Ryutaro
PB  - Springer Science and Business Media Deutschland
Y2  - 4 July 2025 through 6 July 2025
ER  -
