Using Wikipedia for Cross-Language Named Entity Recognition

Publication: Contributions to collected editions; Article in conference proceedings; Research; peer-reviewed

Authors

Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. First, Wikipedia entries in the source language are labeled with the NERC tag set. Second, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
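The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionaries `source_tags` and `language_links`, the function names, and the toy sentence are all hypothetical. Mention detection is simplified here to exact matching of article titles, which is an assumption; the record does not say how mentions are located. Tokens that match no known entity are left as `None` (unknown) rather than labeled as non-entities, which is what makes the resulting corpus only partially annotated.

```python
from typing import Dict, List, Optional, Tuple

# Step 1 (assumed toy input): source-language Wikipedia entries with NERC tags.
source_tags: Dict[str, str] = {
    "Germany": "LOC",
    "Angela Merkel": "PER",
    "Volkswagen": "ORG",
}

# Step 2 (assumed toy input): language links from source to target titles.
language_links: Dict[str, str] = {
    "Germany": "Alemania",
    "Angela Merkel": "Angela Merkel",
    "Volkswagen": "Volkswagen",
}


def propagate_tags(tags: Dict[str, str], links: Dict[str, str]) -> Dict[str, str]:
    """Carry each tag over to the linked target-language title, if one exists."""
    return {links[title]: tag for title, tag in tags.items() if title in links}


def annotate(tokens: List[str], target_tags: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Step 3: greedy longest-match tagging of known entity titles.

    Unmatched tokens keep the label None, meaning "unknown", not "no entity".
    """
    entries = {tuple(title.split()): tag for title, tag in target_tags.items()}
    max_len = max((len(key) for key in entries), default=0)
    labels: List[Optional[str]] = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            tag = entries.get(tuple(tokens[i:i + n]))
            if tag is not None:
                labels[i:i + n] = [tag] * n
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, labels))


target_tags = propagate_tags(source_tags, language_links)
partially_annotated = annotate("Angela Merkel visita Alemania".split(), target_tags)
```

In this toy run, `visita` stays unlabeled. A learner for such data has to treat that position as having an unobserved label, which is the situation the extended HMMs and perceptrons mentioned in the abstract are meant to handle.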

Original language: English
Title: Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers
Editors: Martin Atzmüller, Alvin Chin, Frederik Janssen, Immanuel Schweizer, Christoph Trattner
Number of pages: 25
Publisher: Springer International Publishing
Publication date: 2016
Pages: 1-25
ISBN (Print): 978-3-319-29008-9
ISBN (electronic): 978-3-319-29009-6
Publication status: Published - 2016
Event: 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014 - Nancy, France
Duration: 15.09.2014 - 15.09.2014
Conference number: 5
https://www.semanticscholar.org/paper/The-Fifth-International-Workshop-on-Mining-and-Qin-Greene/03ed707786c842ce7a36b091457e1452d2723aec
https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/muse2014/
