N3 - A collection of datasets for named entity recognition and disambiguation in the NLP interchange format
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Authors
Extracting Linked Data following the Semantic Web principle from unstructured sources has become a key challenge for scientific research. Named Entity Recognition and Disambiguation are two basic operations in this extraction process. One step towards the realization of the Semantic Web vision and the development of highly accurate tools is the availability of data for validating the quality of processes for Named Entity Recognition and Disambiguation as well as for algorithm tuning. This article presents three novel, manually curated and annotated corpora (N3). All of them are based on a free license and stored in the NLP Interchange Format to leverage the Linked Data character of our datasets.
Original language | English |
---|---|
Title of host publication | Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014 |
Editors | Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson |
Number of pages | 5 |
Place of Publication | Reykjavik, Iceland |
Publisher | European Language Resources Association (ELRA) |
Publication date | 05.2014 |
Pages | 3529-3533 |
ISBN (electronic) | 9782951740884 |
Publication status | Published - 05.2014 |
Externally published | Yes |
Event | 9th International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland. Duration: 26.05.2014 → 31.05.2014. Conference number: 9. http://www.lrec-conf.org/proceedings/lrec2014/index.html |
Bibliographical note
We thank Luise Erfurth and Didier Cherix for helping us create the annotations of the datasets and Jens Lehmann for his feedback. Special thanks go to news.de for allowing us to use their articles. Parts of this work were supported by the ESF and the Free State of Saxony.
- Datasets, Named entity detection, Named entity disambiguation, NLP interchange format
- Informatics
- Business informatics