Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), the ability to perform reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 85,368 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method using automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to highlight the linguistic correctness and semantic faithfulness of the generated questions based on extracted KG facts. The compiled dataset – PrimeKGQA – serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.
Original language: English
Title of host publication: ECAI 2024: 27th European Conference on Artificial Intelligence, 19-24 October 2024, Santiago de Compostela, Spain; including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings
Editors: Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto José Bugarín Diz, Jose Maria Alonso-Moral, Senén Barro, Fredrik Heintz
Number of pages: 8
Place of Publication: Amsterdam
Publisher: IOS Press BV
Publication date: 16.10.2024
Pages: 1198-1205
ISBN (electronic): 978-1-64368-548-9
DOIs
Publication status: Published - 16.10.2024
Event: 27th European Conference on Artificial Intelligence - ECAI 2024: "Celebrating the past. Inspiring the future" - University of Santiago de Compostela, Santiago de Compostela, Spain
Duration: 19.10.2024 - 24.10.2024
Conference number: 27
https://www.ecai2024.eu/

Bibliographical note

Publisher Copyright:
© 2024 The Authors.
