Using Wikipedia for Cross-Language Named Entity Recognition

Publication: Contributions to edited volumes › Articles in conference proceedings › Research › peer-reviewed

Authors

Named entity recognition and classification (NERC) is fundamental to natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and therefore need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. First, Wikipedia entries in the source language are labeled with the NERC tag set. Second, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
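The three-step annotation procedure described in the abstract can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionaries `SOURCE_LABELS` and `LANGUAGE_LINKS` are hypothetical stand-ins for labeled source-language Wikipedia entries and Wikipedia's language links, and the exact, longest-first string matching with BIO tags is an assumed mention-detection strategy that the abstract does not specify. Tokens that match no labeled entry receive `None` (unknown) rather than `O`, reflecting that the generated corpus is only partially annotated and may still contain unlabeled entities.

```python
from typing import Dict, List, Optional

# Hypothetical toy data (assumption): step 1 output, i.e. source-language
# article titles already labeled with NERC tags.
SOURCE_LABELS: Dict[str, str] = {
    "Berlin": "LOC",
    "Angela Merkel": "PER",
    "Siemens": "ORG",
}

# Hypothetical toy data (assumption): language links from source-language
# titles to target-language (here Spanish) titles.
LANGUAGE_LINKS: Dict[str, str] = {
    "Berlin": "Berlín",
    "Angela Merkel": "Angela Merkel",
    "Siemens": "Siemens",
}


def propagate_labels(source_labels: Dict[str, str],
                     language_links: Dict[str, str]) -> Dict[str, str]:
    """Step 2: carry each tag across the language link to the target title."""
    return {language_links[title]: tag
            for title, tag in source_labels.items()
            if title in language_links}


def annotate_sentence(tokens: List[str],
                      target_labels: Dict[str, str]) -> List[Optional[str]]:
    """Step 3: BIO-tag mentions of labeled target titles.

    Unmatched tokens stay None (unknown), not "O", because they may be
    entities that simply have no labeled Wikipedia entry.
    """
    labels: List[Optional[str]] = [None] * len(tokens)
    # Assumed matching strategy: try longer titles first so multi-token
    # names are preferred over their sub-parts.
    entries = sorted(((title.split(), tag) for title, tag in target_labels.items()),
                     key=lambda entry: -len(entry[0]))
    i = 0
    while i < len(tokens):
        for words, tag in entries:
            n = len(words)
            if tokens[i:i + n] == words:
                labels[i] = "B-" + tag
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + tag
                i += n
                break
        else:
            i += 1  # no title starts here; move on
    return labels


if __name__ == "__main__":
    target = propagate_labels(SOURCE_LABELS, LANGUAGE_LINKS)
    tokens = "Angela Merkel visitó Siemens en Berlín con Macron".split()
    for token, label in zip(tokens, annotate_sentence(tokens, target)):
        print(token, label)
```

In this toy sentence "Macron" stays unlabeled (`None`) even though it is a person, which is exactly the kind of gap that makes the corpus partial. A learner trained on such data, such as the HMM and perceptron extensions mentioned above, presumably must avoid treating these positions as confirmed non-entities.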

Original language: English
Title: Big Data Analytics in the Social and Ubiquitous Context : 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers
Editors: Martin Atzmüller, Alvin Chin, Frederik Janssen, Immanuel Schweizer, Christoph Trattner
Number of pages: 25
Publisher: Springer International Publishing
Publication date: 2016
Pages: 1-25
ISBN (print): 978-3-319-29008-9
ISBN (electronic): 978-3-319-29009-6
Publication status: Published - 2016
Event: 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014 - Nancy, France
Duration: 15.09.2014 - 15.09.2014
Conference number: 5
https://www.semanticscholar.org/paper/The-Fifth-International-Workshop-on-Mining-and-Qin-Greene/03ed707786c842ce7a36b091457e1452d2723aec
https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/muse2014/
