Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. / Yan, Xi; Westphal, Patrick; Seliger, Jan et al.
ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. ed. / Ulle Endriss; Francisco S. Melo; Kerstin Bach; Alberto José Bugarín Diz; Jose Maria Alonso-Moral; Senén Barro; Fredrik Heintz. Amsterdam: IOS Press BV, 2024. p. 1198-1205 (Frontiers in Artificial Intelligence and Applications; Vol. 392).

Harvard

Yan, X, Westphal, P, Seliger, J & Usbeck, R 2024, Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. in U Endriss, FS Melo, K Bach, AJB Diz, JM Alonso-Moral, S Barro & F Heintz (eds), ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. Frontiers in Artificial Intelligence and Applications, vol. 392, IOS Press BV, Amsterdam, pp. 1198-1205, 27th European Conference on Artificial Intelligence - ECAI 2024, Santiago de Compostela, Spain, 19.10.24. https://doi.org/10.3233/FAIA240615

APA

Yan, X., Westphal, P., Seliger, J., & Usbeck, R. (2024). Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. In U. Endriss, F. S. Melo, K. Bach, A. J. B. Diz, J. M. Alonso-Moral, S. Barro, & F. Heintz (Eds.), ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings (pp. 1198-1205). (Frontiers in Artificial Intelligence and Applications; Vol. 392). IOS Press BV. https://doi.org/10.3233/FAIA240615

Vancouver

Yan X, Westphal P, Seliger J, Usbeck R. Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset. In Endriss U, Melo FS, Bach K, Diz AJB, Alonso-Moral JM, Barro S, Heintz F, editors, ECAI 2024 : 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings. Amsterdam: IOS Press BV. 2024. p. 1198-1205. (Frontiers in Artificial Intelligence and Applications). doi: 10.3233/FAIA240615

Bibtex

@inbook{41d62101511041df813ac0c8f77d9b15,
title = "Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset",
abstract = "Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.",
keywords = "Business informatics",
author = "Xi Yan and Patrick Westphal and Jan Seliger and Ricardo Usbeck",
note = "Publisher Copyright: {\textcopyright} 2024 The Authors.; 27th European Conference on Artificial Intelligence - ECAI 2024 : {"}Celebrating the past. Inspiring the future{"}, ECAI 2024 ; Conference date: 19-10-2024 Through 24-10-2024",
year = "2024",
month = oct,
day = "16",
doi = "10.3233/FAIA240615",
language = "English",
series = "Frontiers in Artificial Intelligence and Applications",
publisher = "IOS Press BV",
pages = "1198--1205",
editor = "Ulle Endriss and Melo, {Francisco S.} and Kerstin Bach and Diz, {Alberto Jos{\'e} Bugar{\'i}n} and Alonso-Moral, {Jose Maria} and Sen{\'e}n Barro and Fredrik Heintz",
booktitle = "ECAI 2024",
address = "Netherlands",
url = "https://www.ecai2024.eu/",

}

RIS

TY - CHAP
T1 - Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset
AU - Yan, Xi
AU - Westphal, Patrick
AU - Seliger, Jan
AU - Usbeck, Ricardo
N1 - Conference code: 27
PY - 2024/10/16
Y1 - 2024/10/16
N2 - Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.
AB - Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method based on automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantical faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85213378742&partnerID=8YFLogxK
U2 - 10.3233/FAIA240615
DO - 10.3233/FAIA240615
M3 - Article in conference proceedings
T3 - Frontiers in Artificial Intelligence and Applications
SP - 1198
EP - 1205
BT - ECAI 2024
A2 - Endriss, Ulle
A2 - Melo, Francisco S.
A2 - Bach, Kerstin
A2 - Diz, Alberto José Bugarín
A2 - Alonso-Moral, Jose Maria
A2 - Barro, Senén
A2 - Heintz, Fredrik
PB - IOS Press BV
CY - Amsterdam
T2 - 27th European Conference on Artificial Intelligence - ECAI 2024
Y2 - 19 October 2024 through 24 October 2024
ER -
