Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset

Xi Yan; Patrick Westphal; Jan Seliger; Ricardo Usbeck

doi:10.3233/FAIA240615


Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. / Yan, Xi; Westphal, Patrick; Seliger, Jan et al.
ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. ed. / Ulle Endriss; Francisco S. Melo; Kerstin Bach; Alberto José Bugarín Diz; Jose Maria Alonso-Moral; Senén Barro; Fredrik Heintz. Amsterdam: IOS Press BV, 2024. p. 1198-1205 (Frontiers in Artificial Intelligence and Applications; Vol. 392).


Harvard

Yan, X, Westphal, P, Seliger, J & Usbeck, R 2024, Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. in U Endriss, FS Melo, K Bach, AJB Diz, JM Alonso-Moral, S Barro & F Heintz (eds), ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. Frontiers in Artificial Intelligence and Applications, vol. 392, IOS Press BV, Amsterdam, pp. 1198-1205, 27th European Conference on Artificial Intelligence - ECAI 2024, Santiago de Compostela, Spain, 19.10.24. https://doi.org/10.3233/FAIA240615

APA

Yan, X., Westphal, P., Seliger, J., & Usbeck, R. (2024). Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. In U. Endriss, F. S. Melo, K. Bach, A. J. B. Diz, J. M. Alonso-Moral, S. Barro, & F. Heintz (Eds.), ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings (pp. 1198-1205). (Frontiers in Artificial Intelligence and Applications; Vol. 392). IOS Press BV. https://doi.org/10.3233/FAIA240615

Vancouver

Yan X, Westphal P, Seliger J, Usbeck R. Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. In Endriss U, Melo FS, Bach K, Diz AJB, Alonso-Moral JM, Barro S, Heintz F, editors, ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. Amsterdam: IOS Press BV. 2024. p. 1198-1205. (Frontiers in Artificial Intelligence and Applications). doi: 10.3233/FAIA240615

Bibtex

@inproceedings{41d62101511041df813ac0c8f77d9b15,
  title = "Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset",
  abstract = "Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.",
  keywords = "Business informatics",
  author = "Xi Yan and Patrick Westphal and Jan Seliger and Ricardo Usbeck",
  note = "Publisher Copyright: {\textcopyright} 2024 The Authors.; 27th European Conference on Artificial Intelligence - ECAI 2024 : {"}Celebrating the past. Inspiring the future{"}, ECAI 2024 ; Conference date: 19-10-2024 Through 24-10-2024",
  year = "2024",
  month = oct,
  day = "16",
  doi = "10.3233/FAIA240615",
  language = "English",
  series = "Frontiers in Artificial Intelligence and Applications",
  volume = "392",
  publisher = "IOS Press BV",
  pages = "1198--1205",
  editor = "Ulle Endriss and Melo, {Francisco S.} and Kerstin Bach and Diz, {Alberto Jos{\'e} Bugar{\'i}n} and Alonso-Moral, {Jose Maria} and Sen{\'e}n Barro and Fredrik Heintz",
  booktitle = "ECAI 2024",
  address = "Amsterdam",
  url = "https://www.ecai2024.eu/",
}

RIS

TY  - CHAP
T1  - Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset
AU  - Yan, Xi
AU  - Westphal, Patrick
AU  - Seliger, Jan
AU  - Usbeck, Ricardo
N1  - Conference code: 27
PY  - 2024/10/16
Y1  - 2024/10/16
N2  - Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.
AB  - Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.
KW  - Business informatics
UR  - http://www.scopus.com/inward/record.url?scp=85213378742&partnerID=8YFLogxK
U2  - 10.3233/FAIA240615
DO  - 10.3233/FAIA240615
M3  - Article in conference proceedings
T3  - Frontiers in Artificial Intelligence and Applications
SP  - 1198
EP  - 1205
BT  - ECAI 2024
A2  - Endriss, Ulle
A2  - Melo, Francisco S.
A2  - Bach, Kerstin
A2  - Diz, Alberto José Bugarín
A2  - Alonso-Moral, Jose Maria
A2  - Barro, Senén
A2  - Heintz, Fredrik
PB  - IOS Press BV
CY  - Amsterdam
T2  - 27th European Conference on Artificial Intelligence - ECAI 2024
Y2  - 19 October 2024 through 24 October 2024
ER  - 
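The abstract in the records above describes pairing each generated question with its answers and a SPARQL query. The abstract does not say how questions are generated, so the following is only a minimal template-based sketch of what such a (question, SPARQL, answer) record can look like, not the authors' PrimeKGQA pipeline. Everything in it is a hypothetical assumption: the placeholder triples, the `indication` relation label, the `http://example.org/kg/` namespace, the `TEMPLATES` table, and the helper names `iri` and `generate_pairs`.

```python
from collections import defaultdict

# Hypothetical placeholder triples (head, relation, tail); NOT PrimeKG data.
TRIPLES = [
    ("Drug A", "indication", "Disease X"),
    ("Drug A", "indication", "Disease Y"),
    ("Drug B", "indication", "Disease Z"),
]

# Hypothetical namespace used to turn names into IRIs for the SPARQL query.
PREFIX = "http://example.org/kg/"

# One hand-written question template per relation (an assumption of this sketch).
TEMPLATES = {
    "indication": "Which diseases is {head} indicated for?",
}


def iri(name: str) -> str:
    """Map a human-readable name to a hypothetical IRI, e.g. 'Drug A' -> <.../Drug_A>."""
    return "<" + PREFIX + name.replace(" ", "_") + ">"


def generate_pairs(triples):
    """Group tails by (head, relation) and emit one question/SPARQL/answer record per group."""
    answers = defaultdict(list)
    for head, rel, tail in triples:
        answers[(head, rel)].append(tail)

    records = []
    for (head, rel), tails in answers.items():
        if rel not in TEMPLATES:
            continue  # relations without a template produce no question
        question = TEMPLATES[rel].format(head=head)
        # Double braces in the f-string emit literal { } for the SPARQL group pattern.
        query = f"SELECT ?answer WHERE {{ {iri(head)} {iri(rel)} ?answer . }}"
        records.append({"question": question, "sparql": query, "answers": sorted(tails)})
    return records


for record in generate_pairs(TRIPLES):
    print(record["question"])
    print(record["sparql"])
    print(record["answers"])
```

Grouping by (head, relation) before templating means a question with several correct answers yields a single record with a list of gold answers rather than one near-duplicate question per triple; a real generator would also need templates for multi-hop and other more complex question types.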

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, p. 95-110 16 p. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, Vol. 1. p. 460-480 21 p. (Lecture Notes in Computer Science ; vol. 15718).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

ASK-DBLP: Answering Questions over DBLP

Taffa, T., Neises, P., Ollinger, S., Westphal, P., Ackermann, M. R., Banerjee, D. & Usbeck, R., 02.11.2025, ISWC-C 2025, Industry, Doctoral Consortium, Posters and Demos at ISWC 2025: Joint Proceedings of Industry, Doctoral Consortium, Posters and Demos of the 24th International Semantic Web Conference (ISWC-C 2025), ISWC 2025 Companion Volume. Celino, I., Hassanzadeh, O., Bernstein, A., Noy, N., Cheng, G., Wang, S., Ferrada, S., Soulard, T., Kozaki, K., Takeda, H. & Gentile, A. L. (eds.). Aachen: Sun Site Central Europe (RWTH Aachen University), p. 435-440 6 p. D13. (CEUR Workshop Proceedings; vol. 4085).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, Linking Meaning: Semantic Technologies Shaping the Future of AI: Proceedings of the 21st International Conference on Semantic Systems, 3-5 September 2025, Vienna, Austria. Spahiu, B., Vahdati, S., Salatino, A., Pellegrini, T. & Havur, G. (eds.). IOS Press BV, p. 176-193 18 p. (Studies on the Semantic Web; vol. 62).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Best Practices in AI and Data Science Models Evaluation

Banerjee, D., Taffa, T. A. & Usbeck, R., 2025, INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16.-19.September 2025 Potsdam. Lucke, U., Stieglitz, S., Uebernickel, F., Lamprecht, A.-L. & Klein, M. (eds.). Bonn: Gesellschaft für Informatik, Bonn, p. 1211-1219 9 p. (Lecture Notes in Informatics; vol. P366).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DOI

https://doi.org/10.3233/FAIA240615
Final published version
