BENGAL: An automatic benchmark generator for entity recognition and linking

Axel Cyrille Ngoma Ngomo; Michael Röder; Diego Moussallem; Ricardo Usbeck; René Speck

doi:10.18653/v1/W18-6541

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Axel Cyrille Ngoma Ngomo
Michael Röder
Diego Moussallem
Ricardo Usbeck
René Speck

The manual creation of gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent works show that such gold standards contain a large proportion of mistakes in addition to being difficult to maintain. We hence present BENGAL, a novel approach for automatically generating such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be readily generated at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on English benchmarks generated by BENGAL and on 16 manually created benchmarks. We show that our approach can easily be ported across languages by presenting results achieved by 4 tools on both Brazilian Portuguese and Spanish. Overall, our results suggest that our automatic benchmark generation approach can create varied benchmarks with characteristics similar to those of existing benchmarks. Our approach is open-source. Our experimental results are available at http://faturl.com/bengalexpinlg and the code at https://github.com/dice-group/BENGAL.

Original language	English
Title of host publication	INLG 2018 - 11th International Natural Language Generation Conference, Proceedings of the Conference
Editors	Emiel Krahmer, Albert Gatt, Martijn Goudbeek
Number of pages	11
Publisher	Association for Computational Linguistics (ACL)
Publication date	01.11.2018
Pages	339-349
ISBN (electronic)	9781948087865
DOIs	https://doi.org/10.18653/v1/W18-6541
Publication status	Published - 01.11.2018
Externally published	Yes
Event	11th International Natural Language Generation Conference, INLG 2018 - Tilburg University, Tilburg, Netherlands. Duration: 05.11.2018 → 08.11.2018. https://inlg2018.uvt.nl/

Bibliographical note

Horizon 2020 Framework Programme, grant number: 688227

Publisher Copyright:
©2018 Association for Computational Linguistics

Research areas

Informatics
Business informatics

Other publications by the same author(s)

ShortPathQA: A Dataset for Controllable Fusion of Large Language Models with Knowledge Graphs

Salnikov, M., Sakhovskiy, A., Nikishina, I., Usmanova, A., Kraft, A., Möller, C., Banerjee, D., Huang, J., Jiang, L., Abdullah, R., Yan, X., Tutubalina, E., Usbeck, R. & Panchenko, A., 2026, Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Proceedings. Ichise, R. (ed.). Springer Science and Business Media Deutschland, p. 95-110 16 p. (Lecture Notes in Computer Science; vol. 15836 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Analyzing the Influence of Knowledge Graph Information on Relation Extraction

Möller, C. & Usbeck, R., 2025, The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025 Portoroz, Slovenia, June 1–5, 2025 Proceedings, Part I. Curry, E., Acosta, M., Poveda-Villalón, M., van Erp, M., Ojo, A., Hose, K., Shimizu, C. & Lisena, P. (eds.). Cham: Springer Nature Switzerland AG, Vol. 1. p. 460-480 21 p. (Lecture Notes in Computer Science; vol. 15718).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Automating SPARQL Query Translations between DBpedia and Wikidata

Bartels, M. C., Banerjee, D. & Usbeck, R., 14.07.2025, SEMANTiCS Conference 2025.

Research output: Contributions to collected editions/works › Article in conference proceedings › Research

Bridge-Generate: Scholarly Hybrid Question Answering

Taffa, T. A. & Usbeck, R., 23.05.2025, WWW Companion 2025 - Companion Proceedings of the ACM Web Conference 2025: Companion Proceedings of the ACM Web Conference 2025, April 28-May 2, 2025 Sydney, NSW, Australia. Long, G., Blumestein, M., Chang, Y., Lewin-Eytan, L., Huang, H. & Yom-Tov, E. (eds.). New York: Association for Computing Machinery, Inc, p. 1321-1325 5 p.

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Junior fellows and distinguished dissertation of the GI and AI for crisis

Usbeck, R., Kraft, A. & Westphal, P., 01.02.2025, In: IT - Information Technology. 67, 1, p. 1-2 2 p.

Research output: Journal contributions › Other (editorial matter etc.) › Research

