Biomedical Entity Linking with Triple-aware Pre-Training

Research output: Contributions to collected editions/works; Article in conference proceedings; Research

Authors

The large-scale analysis of scientific and technical documents is crucial for extracting structured knowledge from unstructured text. A key challenge in this process is linking biomedical entities, as these entities are sparsely distributed and often underrepresented in the training data of large language models (LLMs). At the same time, LLMs are not aware of high-level semantic connections between different biomedical entities, which are useful for identifying similar concepts in different textual contexts. To address these problems, some recent works have focused on injecting knowledge graph (KG) information into LLMs. However, previous methods either ignore the relational knowledge of the entities or lead to catastrophic forgetting. We therefore propose a novel framework that pre-trains a powerful generative LLM on a corpus synthesized from a KG. In our evaluations, we were unable to confirm the benefit of including synonym, description, or relational information. This work in progress highlights key challenges and invites further discussion on leveraging semantic information to improve LLM performance and on scientific document processing.
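The abstract describes pre-training on a corpus synthesized from a KG but does not give the synthesis templates. The following is only a minimal Python sketch of one way such a corpus could be built, assuming the KG is available as entity records plus (head, relation, tail) triples. It verbalizes synonyms, descriptions and relations into plain sentences, with switches mirroring the three kinds of information the evaluation examined. `Entity`, `verbalize`, the flag names, the sentence templates and the toy aspirin/headache data are all hypothetical, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical KG node: canonical name plus optional synonyms and description."""
    entity_id: str
    name: str
    synonyms: list[str] = field(default_factory=list)
    description: str = ""


def verbalize(entities: dict[str, Entity],
              triples: list[tuple[str, str, str]],
              use_synonyms: bool = True,
              use_descriptions: bool = True,
              use_relations: bool = True) -> list[str]:
    """Turn KG entities and (head, relation, tail) triples into plain-text sentences.

    The templates are illustrative assumptions; the flags let each kind of
    information be switched off, e.g. for an ablation.
    """
    corpus: list[str] = []
    for ent in entities.values():
        if use_synonyms:
            # Synonym sentences: surface forms that should link to the same concept.
            for syn in ent.synonyms:
                corpus.append(f"{syn} is a synonym of {ent.name}.")
        if use_descriptions and ent.description:
            # Description sentence, only if the KG provides one.
            corpus.append(f"{ent.name}: {ent.description}")
    if use_relations:
        for head, rel, tail in triples:
            # Relational sentence: underscores become spaces so the relation reads as words.
            relation = rel.replace("_", " ")
            corpus.append(f"{entities[head].name} {relation} {entities[tail].name}.")
    return corpus


if __name__ == "__main__":
    # Toy, hand-written example data (not taken from the paper or any real KG release).
    toy_entities = {
        "E1": Entity("E1", "aspirin", ["acetylsalicylic acid"],
                     "A nonsteroidal anti-inflammatory drug."),
        "E2": Entity("E2", "headache"),
    }
    toy_triples = [("E1", "may_treat", "E2")]
    for sentence in verbalize(toy_entities, toy_triples):
        print(sentence)
```

Run on the toy data, this yields one synonym sentence, one description sentence and one relational sentence; turning off individual flags gives corpus variants with and without each information type.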
Original language: English
Title of host publication: Semantic Technologies and Deep Learning Models for Scientific, Technical and Legal Data 2025
Editors: Rima Dessi, Joy Jeenu, Danilo Dessi, Francesco Osborne, Hidir Aras
Number of pages: 8
Place of publication: Aachen
Publisher: CEUR-WS
Publication date: 16.06.2025
DOIs
Publication status: Published - 16.06.2025
Event: Third International Workshop on Semantic Technologies and Deep Learning Models for Scientific, Technical and Legal Data (SemTech4STLD 2025), Portoroz, Slovenia
Duration: 01.06.2025 to 01.06.2025
Conference number: 3

    Research areas

  • Entity Linking, scientific data, Deep Learning, Semantic information
  • Informatics