Using Wikipedia for Cross-Language Named Entity Recognition

Publication: Contribution to edited volume, article in conference proceedings (research, peer-reviewed)

Authors

Named entity recognition and classification (NERC) is fundamental to natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. First, Wikipedia entries in the source language are labeled with the NERC tag set. Second, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
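The three annotation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy dictionaries `SOURCE_LABELS` and `LANGUAGE_LINKS` are hypothetical stand-ins for a labeled source-language Wikipedia and its language links, the PER/LOC/ORG tags and BIO encoding are assumptions, and mention detection is plain exact token matching. Tokens that no mention covers are left as `None` rather than `O`, mirroring the partially annotated corpus the abstract describes.

```python
from typing import Dict, List, Optional

# Step 1 (assumed input, hypothetical data): source-language Wikipedia
# titles already labeled with a NERC tag set (PER/LOC/ORG is an assumption).
SOURCE_LABELS: Dict[str, str] = {
    "Berlin": "LOC",
    "Angela Merkel": "PER",
    "United Nations": "ORG",
}

# Hypothetical language links: source title -> target-language (Spanish) title.
LANGUAGE_LINKS: Dict[str, str] = {
    "Berlin": "Berlín",
    "Angela Merkel": "Angela Merkel",
    "United Nations": "Naciones Unidas",
}


def propagate_labels(source_labels: Dict[str, str],
                     language_links: Dict[str, str]) -> Dict[str, str]:
    """Step 2: carry each tag across the language link to the target title."""
    return {
        language_links[title]: tag
        for title, tag in source_labels.items()
        if title in language_links
    }


def annotate_mentions(tokens: List[str],
                      target_labels: Dict[str, str]) -> List[Optional[str]]:
    """Step 3: tag exact token-sequence mentions of labeled target titles.

    Tokens not covered by any mention stay None (unknown), because they
    may still be entities that the projected dictionary does not contain.
    """
    labels: List[Optional[str]] = [None] * len(tokens)
    # Longest titles first, so a multi-word title wins over a shorter prefix;
    # empty titles are skipped to avoid zero-length matches.
    entries = sorted(
        ((title.split(), tag) for title, tag in target_labels.items()
         if title.split()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )
    i = 0
    while i < len(tokens):
        for words, tag in entries:
            n = len(words)
            if tokens[i:i + n] == words:
                # BIO encoding is an assumption made for this sketch.
                labels[i] = "B-" + tag
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + tag
                i += n
                break
        else:
            # No title starts at position i: leave it unannotated.
            i += 1
    return labels


if __name__ == "__main__":
    target = propagate_labels(SOURCE_LABELS, LANGUAGE_LINKS)
    sentence = "Angela Merkel visitó Berlín y las Naciones Unidas".split()
    print(list(zip(sentence, annotate_mentions(sentence, target))))
```

Keeping unmatched tokens as `None` instead of forcing them to `O` is the key design choice here: a learner for partially annotated data can then treat those positions as unknown rather than as confirmed non-entities.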

Original language: English
Title: Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers
Editors: Martin Atzmüller, Alvin Chin, Frederik Janssen, Immanuel Schweizer, Christoph Trattner
Number of pages: 25
Publisher: Springer International Publishing
Publication date: 2016
Pages: 1-25
ISBN (print): 978-3-319-29008-9
ISBN (electronic): 978-3-319-29009-6
Publication status: Published - 2016
Event: 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014 - Nancy, France
Duration: 15.09.2014 to 15.09.2014
Conference number: 5
https://www.semanticscholar.org/paper/The-Fifth-International-Workshop-on-Mining-and-Qin-Greene/03ed707786c842ce7a36b091457e1452d2723aec
https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/muse2014/
