Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with ground-truth labels. This task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial effort. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron that learns from inexpensive partially annotated sequences, which could, for instance, be provided by laypeople, by the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated, partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled to unlabeled tokens.
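The abstract only names the learning scheme, so the following is a minimal Python sketch of how a transductive loss-augmented perceptron for partially annotated sequences could look. It is not the authors' implementation. Everything in it is an assumption made for illustration: the hypothetical helpers `viterbi`, `update`, `train` and `predict`; the feature set (token/label indicators plus label-to-label transitions, i.e. an HMM-like first-order chain); the transductive step, which completes the unannotated positions with the best labeling that agrees with the annotated ones under the current weights; and a Hamming loss counted only on annotated tokens. An unannotated token is marked with `None`.

```python
import numpy as np


def viterbi(scores, trans):
    """Highest-scoring label path for a first-order chain.

    scores: (T, K) per-position label scores.
    trans:  (K, K) array, trans[i, j] = score of label i followed by label j.
    """
    T, K = scores.shape
    delta = scores[0].copy()              # best score of a path ending in each label
    back = np.zeros((T, K), dtype=int)    # backpointers
    for t in range(1, T):
        cand = delta[:, None] + trans     # cand[i, j]: best path ending in i, then j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + scores[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def update(W, A, x, gold, pred, lr=1.0):
    """Perceptron step: add the features of gold, subtract those of pred."""
    for t in range(len(x)):
        W[x[t], gold[t]] += lr
        W[x[t], pred[t]] -= lr
    for t in range(1, len(x)):
        A[gold[t - 1], gold[t]] += lr
        A[pred[t - 1], pred[t]] -= lr


def train(data, n_tokens, n_labels, epochs=10):
    """data: list of (token_ids, labels); labels[t] is None if unannotated."""
    W = np.zeros((n_tokens, n_labels))    # hypothetical token/label emission weights
    A = np.zeros((n_labels, n_labels))    # label transition weights
    for _ in range(epochs):
        for x, y in data:
            s = W[x]                      # (T, K) emission scores (a copy)
            # Transductive step (assumed): fill the unannotated positions with
            # the best path that agrees with every annotated label.
            masked = s.copy()
            for t, lab in enumerate(y):
                if lab is not None:
                    masked[t] = -np.inf
                    masked[t, lab] = s[t, lab]
            target = viterbi(masked, A)
            # Loss-augmented decoding (assumed Hamming loss on annotated
            # tokens): every wrong label at an annotated position gets +1.
            aug = s.copy()
            for t, lab in enumerate(y):
                if lab is not None:
                    aug[t] += 1.0
                    aug[t, lab] -= 1.0
            pred = viterbi(aug, A)
            if pred != target:
                update(W, A, x, target, pred)
    return W, A


def predict(W, A, x):
    """Decode a new token sequence with the learned weights."""
    return viterbi(W[x], A)
```

In a toy setup where token 0 should receive label 0 and token 1 label 1, training on the partially annotated pairs `([0, 1, 0], [0, None, 0])`, `([1, 0, 1], [1, None, None])` and `([0, 1], [None, 1])` for a few epochs leads `predict` to return `[0, 1, 1, 0]` for the tokens `[0, 1, 1, 0]`. The design choice to impute missing labels from the model's own constrained prediction is what makes the update transductive in this sketch; a real system would use richer features and a tuned loss.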

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of Publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: Part 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
Publication status: Published, 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2011, Athens, Greece
Duration: 05.09.2011 to 09.09.2011
http://www.ecmlpkdd2011.org/

Research areas

  • Informatics: Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov model, Unlabeled data, Neural information processing system, Entity recognition, Annotated sequence
  • Business informatics
