Using Wikipedia for Cross-Language Named Entity Recognition

Publication: Contributions to collected editions, Articles in conference proceedings, Research, Peer-reviewed

Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. First, Wikipedia entries in the source language are labeled with the NERC tag set. Second, Wikipedia language links are exploited to propagate the annotations to the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models (HMMs) and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
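
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionaries are toy stand-ins for labeled Wikipedia entries and language links, the function names are hypothetical, and the BIO-style tags and exact string matching are simplifications chosen here. Tokens tagged None represent the unannotated positions of a partially annotated corpus.

```python
from typing import Dict, List, Optional

# Step 1 (assumed already done): source-language entries carry NERC tags.
# Toy data, not taken from the paper.
SOURCE_LABELS: Dict[str, str] = {
    "Germany": "LOC",
    "Munich": "LOC",
    "Angela Merkel": "PER",
}

# Stand-in for Wikipedia language links: source title -> target title.
LANGUAGE_LINKS: Dict[str, str] = {
    "Germany": "Deutschland",
    "Munich": "München",
    "Angela Merkel": "Angela Merkel",
    "Physics": "Physik",  # has no NERC label, so nothing is propagated
}


def propagate_labels(source_labels: Dict[str, str],
                     language_links: Dict[str, str]) -> Dict[str, str]:
    """Step 2: carry each source label over to the linked target title."""
    return {
        target: source_labels[source]
        for source, target in language_links.items()
        if source in source_labels
    }


def annotate_mentions(tokens: List[str],
                      target_labels: Dict[str, str],
                      max_len: int = 5) -> List[Optional[str]]:
    """Step 3: tag mentions of labeled entities, longest match first.

    Tokens without a match stay None (label unknown), which is what
    makes the resulting corpus only partially annotated.
    """
    tags: List[Optional[str]] = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            tag = target_labels.get(" ".join(tokens[i:i + n]))
            if tag is not None:
                tags[i] = "B-" + tag
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + tag
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


target_labels = propagate_labels(SOURCE_LABELS, LANGUAGE_LINKS)
sentence = "Angela Merkel besuchte München und Hamburg".split()
print(list(zip(sentence, annotate_mentions(sentence, target_labels))))
```

In this toy sentence, "Hamburg" is an entity but receives no tag because no labeled entry links to it. That is the kind of gap the abstract refers to when it says the corpus is likely to contain unannotated entities, and it is why a learner should treat None as "unknown" rather than as "not an entity".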

Original language: English
Title: Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers
Editors: Martin Atzmüller, Alvin Chin, Frederik Janssen, Immanuel Schweizer, Christoph Trattner
Number of pages: 25
Publisher: Springer International Publishing
Publication date: 2016
Pages: 1-25
ISBN (Print): 978-3-319-29008-9
ISBN (electronic): 978-3-319-29009-6
Publication status: Published - 2016
Event: 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014 - Nancy, France
Duration: 15.09.2014 - 15.09.2014
Conference number: 5
https://www.semanticscholar.org/paper/The-Fifth-International-Workshop-on-Mining-and-Qin-Greene/03ed707786c842ce7a36b091457e1452d2723aec
https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/muse2014/
