Using Wikipedia for Cross-Language Named Entity Recognition

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-review

Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
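
The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy dictionaries, the function names `propagate_labels` and `annotate`, the `"?"` placeholder for unlabeled tokens, and the exact string matching used to find mentions are all assumptions made for illustration. The point it tries to show is that tokens without a known entity stay unlabeled rather than being marked as non-entities, which is what makes the resulting corpus partially annotated.

```python
# Illustrative toy data (assumed, not taken from the paper).
# Step 1: source-language Wikipedia entries labeled with NERC tags.
SOURCE_LABELS = {"Angela Merkel": "PER", "Berlin": "LOC"}
# Wikipedia language links: source-language title -> target-language title.
LANGUAGE_LINKS = {"Angela Merkel": "Angela Merkel", "Berlin": "Berlín"}


def propagate_labels(source_labels, language_links):
    """Step 2: carry each tag across its language link to the target title."""
    return {
        language_links[title]: tag
        for title, tag in source_labels.items()
        if title in language_links
    }


def annotate(tokens, target_labels, unknown="?"):
    """Step 3: tag mentions of labeled entities in a target-language sentence.

    Tokens that match no labeled entity keep the `unknown` marker, because
    they may still be entities that the projection simply did not cover.
    """
    tags = [unknown] * len(tokens)
    # Try longer entity names first so multi-word mentions win.
    entries = sorted(
        ((title.split(), tag) for title, tag in target_labels.items()),
        key=lambda entry: -len(entry[0]),
    )
    i = 0
    while i < len(tokens):
        for words, tag in entries:
            n = len(words)
            if tokens[i:i + n] == words:
                tags[i:i + n] = [tag] * n
                i += n
                break
        else:
            i += 1
    return tags


target_labels = propagate_labels(SOURCE_LABELS, LANGUAGE_LINKS)
sentence = "Angela Merkel visitó Berlín ayer".split()
print(list(zip(sentence, annotate(sentence, target_labels))))
```

A learner for such data would need to treat the `"?"` positions as unobserved rather than as negative examples, which is the role the abstract assigns to the extended hidden Markov models and structural perceptrons.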

Original language: English
Title of host publication: Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers
Editors: Martin Atzmüller, Alvin Chin, Frederik Janssen, Immanuel Schweizer, Christoph Trattner
Number of pages: 25
Publisher: Springer International Publishing
Publication date: 2016
Pages: 1-25
ISBN (print): 978-3-319-29008-9
ISBN (electronic): 978-3-319-29009-6
Publication status: Published - 2016
Event: 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014 - Nancy, France
Duration: 15.09.2014 - 15.09.2014
Conference number: 5
https://www.semanticscholar.org/paper/The-Fifth-International-Workshop-on-Mining-and-Qin-Greene/03ed707786c842ce7a36b091457e1452d2723aec
https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/muse2014/

    Research areas

  • Business informatics - Hidden Markov Model, Target Language, Conditional Random Field, Source Language, Entity Recognition
