Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. ed. / Ulle Endriss; Francisco S. Melo; Kerstin Bach; Alberto José Bugarín Diz; Jose Maria Alonso-Moral; Senén Barro; Fredrik Heintz. Amsterdam: IOS Press BV, 2024. p. 1198-1205 (Frontiers in Artificial Intelligence and Applications; Vol. 392).
RIS
TY - CHAP
T1 - Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset
AU - Yan, Xi
AU - Westphal, Patrick
AU - Seliger, Jan
AU - Usbeck, Ricardo
N1 - Conference code: 27
PY - 2024/10/16
Y1 - 2024/10/16
AB - Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85213378742&partnerID=8YFLogxK
U2 - 10.3233/FAIA240615
DO - 10.3233/FAIA240615
M3 - Article in conference proceedings
T3 - Frontiers in Artificial Intelligence and Applications
SP - 1198
EP - 1205
BT - ECAI 2024
A2 - Endriss, Ulle
A2 - Melo, Francisco S.
A2 - Bach, Kerstin
A2 - Diz, Alberto José Bugarín
A2 - Alonso-Moral, Jose Maria
A2 - Barro, Senén
A2 - Heintz, Fredrik
PB - IOS Press BV
CY - Amsterdam
T2 - 27th European Conference on Artificial Intelligence - ECAI 2024
Y2 - 19 October 2024 through 24 October 2024
ER -