Survey on English Entity Linking on Wikidata: Datasets and approaches

Publikation: Beiträge in ZeitschriftenZeitschriftenaufsätzeForschungbegutachtet

Standard

Survey on English Entity Linking on Wikidata: Datasets and approaches. / Möller, Cedric; Lehmann, Jens; Usbeck, Ricardo.
in: Semantic Web, Jahrgang 13, Nr. 6, 26.09.2022, S. 925-966.

Publikation: Beiträge in ZeitschriftenZeitschriftenaufsätzeForschungbegutachtet

Harvard

APA

Vancouver

Möller C, Lehmann J, Usbeck R. Survey on English Entity Linking on Wikidata: Datasets and approaches. Semantic Web. 2022 Sep 26;13(6):925-966. doi: 10.48550/arxiv.2112.01989, 10.3233/SW-212865

Bibtex

@article{99ec47b319aa4572a6e92c87371d22d5,
title = "Survey on English Entity Linking on Wikidata: Datasets and approaches",
abstract = "Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.",
keywords = "Entity Disambiguation, Entity Linking, Wikidata, Informatics, Business informatics",
author = "Cedric M{\"o}ller and Jens Lehmann and Ricardo Usbeck",
note = "We acknowledge the support of the EU project TAILOR (GA 952215), the Federal Ministry for Economic Affairs and Energy (BMWi) project SPEAKER (FKZ 01MK20011A), the German Federal Ministry of Education and Research (BMBF) projects and excellence clusters ML2R (FKZ 01 15 18038 A/B/C), MLwin (01S18050 D/F), ScaDS.AI (01/S18026A) as well as the Fraunhofer Zukunftsstiftung project JOSEPH. The authors also acknowledge the financial support by the Federal Ministry for Economic Affairs and Energy of Germany in the project CoyPu (project number 01MK21007G). Publisher Copyright: {\textcopyright} 2022 - The authors. Published by IOS Press.",
year = "2022",
month = sep,
day = "26",
doi = "10.48550/arxiv.2112.01989",
language = "English",
volume = "13",
pages = "925--966",
journal = "Semantic Web",
issn = "1570-0844",
publisher = "IOS Press BV",
number = "6",

}

RIS

TY - JOUR

T1 - Survey on English Entity Linking on Wikidata

T2 - Datasets and approaches

AU - Möller, Cedric

AU - Lehmann, Jens

AU - Usbeck, Ricardo

N1 - We acknowledge the support of the EU project TAILOR (GA 952215), the Federal Ministry for Economic Affairs and Energy (BMWi) project SPEAKER (FKZ 01MK20011A), the German Federal Ministry of Education and Research (BMBF) projects and excellence clusters ML2R (FKZ 01 15 18038 A/B/C), MLwin (01S18050 D/F), ScaDS.AI (01/S18026A) as well as the Fraunhofer Zukunftsstiftung project JOSEPH. The authors also acknowledge the financial support by the Federal Ministry for Economic Affairs and Energy of Germany in the project CoyPu (project number 01MK21007G). Publisher Copyright: © 2022 - The authors. Published by IOS Press.

PY - 2022/9/26

Y1 - 2022/9/26

N2 - Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.

AB - Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.

KW - Entity Disambiguation

KW - Entity Linking

KW - Wikidata

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85140850796&partnerID=8YFLogxK

U2 - 10.48550/arxiv.2112.01989

DO - 10.48550/arxiv.2112.01989

M3 - Journal articles

AN - SCOPUS:85140850796

VL - 13

SP - 925

EP - 966

JO - Semantic Web

JF - Semantic Web

SN - 1570-0844

IS - 6

ER -

DOI