Biomedical Entity Linking with Triple-aware Pre-Training

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Standard

Biomedical Entity Linking with Triple-aware Pre-Training. / Yan, Xi; Möller, Cedric; Usbeck, Ricardo.

Conference XXX. 2023.

Harvard

APA

Yan, X., Möller, C., & Usbeck, R. (2023). Biomedical Entity Linking with Triple-aware Pre-Training. Manuscript in preparation. In Conference XXX. https://doi.org/10.48550/arXiv.2308.14429

Vancouver

Yan X, Möller C, Usbeck R. Biomedical Entity Linking with Triple-aware Pre-Training. In Conference XXX. 2023. doi: 10.48550/arXiv.2308.14429

Bibtex

@inbook{70dae52eff184dcab552ec79040de9c7,
title = "Biomedical Entity Linking with Triple-aware Pre-Training",
abstract = "Linking biomedical entities is an essential aspect in biomedical natural language processing tasks, such as text mining and question answering. However, a difficulty of linking the biomedical entities using current large language models (LLM) trained on a general corpus is that biomedical entities are scarcely distributed in texts and therefore have been rarely seen during training by the LLM. At the same time, those LLMs are not aware of high level semantic connection between different biomedical entities, which are useful in identifying similar concepts in different textual contexts. To cope with aforementioned problems, some recent works focused on injecting knowledge graph information into LLMs. However, former methods either ignore the relational knowledge of the entities or lead to catastrophic forgetting. Therefore, we propose a novel framework to pre-train the powerful generative LLM by a corpus synthesized from a KG. In the evaluations we are unable to confirm the benefit of including synonym, description or relational information.",
keywords = "cs.CL, cs.AI, Informatics",
author = "Xi Yan and Cedric M{\"o}ller and Ricardo Usbeck",
year = "2023",
month = aug,
day = "28",
doi = "10.48550/arXiv.2308.14429",
language = "English",
booktitle = "Conference XXX",

}

RIS

TY - CHAP

T1 - Biomedical Entity Linking with Triple-aware Pre-Training

AU - Yan, Xi

AU - Möller, Cedric

AU - Usbeck, Ricardo

PY - 2023/8/28

Y1 - 2023/8/28

N2 - Linking biomedical entities is an essential aspect in biomedical natural language processing tasks, such as text mining and question answering. However, a difficulty of linking the biomedical entities using current large language models (LLM) trained on a general corpus is that biomedical entities are scarcely distributed in texts and therefore have been rarely seen during training by the LLM. At the same time, those LLMs are not aware of high level semantic connection between different biomedical entities, which are useful in identifying similar concepts in different textual contexts. To cope with aforementioned problems, some recent works focused on injecting knowledge graph information into LLMs. However, former methods either ignore the relational knowledge of the entities or lead to catastrophic forgetting. Therefore, we propose a novel framework to pre-train the powerful generative LLM by a corpus synthesized from a KG. In the evaluations we are unable to confirm the benefit of including synonym, description or relational information.

AB - Linking biomedical entities is an essential aspect in biomedical natural language processing tasks, such as text mining and question answering. However, a difficulty of linking the biomedical entities using current large language models (LLM) trained on a general corpus is that biomedical entities are scarcely distributed in texts and therefore have been rarely seen during training by the LLM. At the same time, those LLMs are not aware of high level semantic connection between different biomedical entities, which are useful in identifying similar concepts in different textual contexts. To cope with aforementioned problems, some recent works focused on injecting knowledge graph information into LLMs. However, former methods either ignore the relational knowledge of the entities or lead to catastrophic forgetting. Therefore, we propose a novel framework to pre-train the powerful generative LLM by a corpus synthesized from a KG. In the evaluations we are unable to confirm the benefit of including synonym, description or relational information.

KW - cs.CL

KW - cs.AI

KW - Informatics

U2 - 10.48550/arXiv.2308.14429

DO - 10.48550/arXiv.2308.14429

M3 - Article in conference proceedings

BT - Conference XXX

ER -