N3 - A collection of datasets for named entity recognition and disambiguation in the NLP interchange format

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

  • Michael Röder
  • Ricardo Usbeck
  • Sebastian Hellmann
  • Daniel Gerber
  • Andreas Both

Extracting Linked Data following the Semantic Web principle from unstructured sources has become a key challenge for scientific research. Named Entity Recognition and Disambiguation are two basic operations in this extraction process. One step towards the realization of the Semantic Web vision and the development of highly accurate tools is the availability of data for validating the quality of processes for Named Entity Recognition and Disambiguation as well as for algorithm tuning. This article presents three novel, manually curated and annotated corpora (N3). All of them are based on a free license and stored in the NLP Interchange Format to leverage the Linked Data character of our datasets.

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Number of pages: 5
Place of Publication: Reykjavik, Iceland
Publisher: European Language Resources Association (ELRA)
Publication date: 05.2014
Pages: 3529-3533
ISBN (electronic): 9782951740884
Publication status: Published - 05.2014
Externally published: Yes
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 26.05.2014 - 31.05.2014
Conference number: 9
http://www.lrec-conf.org/proceedings/lrec2014/index.html

Bibliographical note

We thank Luise Erfurth and Didier Cherix for helping us create annotations of the datasets and Jens Lehmann for his feedback. Special thanks go to news.de for allowing us to use their articles. Parts of this work were supported by the ESF and the Free State of Saxony.

ACL materials are Copyright © 1963–2023
