Learning from partially annotated sequences

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
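
The abstract names a transductive loss-augmented perceptron but gives no algorithmic detail, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the pseudo-reference for a partially annotated sequence is the best labeling that agrees with the annotated tokens, and that a Hamming loss counted on annotated tokens only is used for loss augmentation. The names `features`, `decode`, and `train`, the indicator feature map, and the toy sentence are all hypothetical.

```python
def features(x, t, prev, cur):
    # Hypothetical feature map: one emission and one transition indicator.
    return [("emit", x[t], cur), ("trans", prev, cur)]


def decode(x, labels, w, clamp=None, augment=None):
    """Viterbi decoding over a non-empty token list x.

    clamp:   partial annotation (None = unlabeled); labeled positions are fixed.
    augment: partial annotation; at each labeled position every wrong label
             gets +1 (Hamming loss on annotated tokens only).
    """
    score, back = [], []
    for t in range(len(x)):
        fixed = clamp[t] if clamp is not None else None
        cands = [fixed] if fixed is not None else labels
        score.append({})
        back.append({})
        for y in cands:
            if t == 0:
                best_prev = None
                best = sum(w.get(f, 0.0) for f in features(x, t, "<s>", y))
            else:
                best_prev, best = None, float("-inf")
                for p, s in score[t - 1].items():
                    v = s + sum(w.get(f, 0.0) for f in features(x, t, p, y))
                    if v > best:
                        best_prev, best = p, v
            if augment is not None and augment[t] is not None and y != augment[t]:
                best += 1.0
            score[t][y] = best
            back[t][y] = best_prev
    # Follow back-pointers from the best final label.
    y = max(score[-1], key=score[-1].get)
    path = [y]
    for t in range(len(x) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


def train(data, labels, epochs=5):
    """data: list of (tokens, partial_labels) pairs, None marking unlabeled tokens."""
    w = {}
    for _ in range(epochs):
        for x, z in data:
            ref = decode(x, labels, w, clamp=z)      # assumed pseudo-reference
            pred = decode(x, labels, w, augment=z)   # loss-augmented prediction
            if pred == ref:
                continue
            # Perceptron update: reward reference features, penalize predicted ones.
            for t in range(len(x)):
                for seq, step in ((ref, 1.0), (pred, -1.0)):
                    prev = seq[t - 1] if t > 0 else "<s>"
                    for f in features(x, t, prev, seq[t]):
                        w[f] = w.get(f, 0.0) + step
    return w


# Toy usage: the middle token is left unannotated.
labels = ["O", "PER", "LOC"]
tokens = ["Merkel", "visited", "Paris"]
partial = ["PER", None, "LOC"]
weights = train([(tokens, partial)], labels)
prediction = decode(tokens, labels, weights)
```

In this sketch an unannotated token contributes no loss and is filled in by the current model during the clamped decoding step, which is what makes the update transductive under the stated assumption.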

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of Publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: PART 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 - 09.09.2011
http://www.ecmlpkdd2011.org/

    Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotate Sequence
  • Business informatics
