Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
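The abstract names the method but does not spell it out. The following is a minimal sketch of one way a transductive, loss-augmented perceptron over partially annotated sequences could look. It is an illustration under stated assumptions, not the authors' implementation: the feature set (token-label and label-label indicators), the Hamming cost on annotated tokens, and the names `viterbi`, `features`, and `train` are all hypothetical. Unannotated positions are marked with `None`; a constrained Viterbi pass fills them in to form the update target, and a second, cost-augmented pass produces the prediction to update against.

```python
from collections import defaultdict

def viterbi(tokens, labels, w, fixed=None, cost=None):
    """Highest-scoring label path under weights w (non-empty input assumed).
    fixed[t]: label that position t must take, or None if unconstrained.
    cost[t]:  annotated label at t; any other label earns +1 (loss augmentation)."""
    n = len(tokens)
    fixed = fixed or [None] * n
    cost = cost or [None] * n

    def node(t, y):
        if fixed[t] is not None and fixed[t] != y:
            return float("-inf")          # forbid labels that contradict an annotation
        s = w[("emit", tokens[t], y)]
        if cost[t] is not None and cost[t] != y:
            s += 1.0                      # Hamming cost on annotated tokens only
        return s

    score = [{y: node(0, y) for y in labels}]
    back = []
    for t in range(1, n):
        row, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: score[-1][p] + w[("trans", p, y)])
            row[y] = score[-1][prev] + w[("trans", prev, y)] + node(t, y)
            ptr[y] = prev
        score.append(row)
        back.append(ptr)
    y = max(labels, key=lambda l: score[-1][l])
    path = [y]
    for ptr in reversed(back):            # follow back-pointers to recover the path
        y = ptr[y]
        path.append(y)
    return path[::-1]

def features(tokens, path):
    """Counts of token-label and label-label indicator features (an assumption)."""
    phi = defaultdict(float)
    for t, y in enumerate(path):
        phi[("emit", tokens[t], y)] += 1.0
        if t > 0:
            phi[("trans", path[t - 1], y)] += 1.0
    return phi

def train(data, labels, epochs=5):
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, partial in data:
            # Transductive step: complete the unlabeled positions with the current model.
            target = viterbi(tokens, labels, w, fixed=partial)
            # Loss-augmented prediction: errors on annotated tokens are made attractive.
            guess = viterbi(tokens, labels, w, cost=partial)
            if guess != target:
                for k, v in features(tokens, target).items():
                    w[k] += v
                for k, v in features(tokens, guess).items():
                    w[k] -= v
    return w

# Toy usage: the third token carries no annotation (None).
labels = ["O", "PER", "LOC"]
data = [(["Paris", "is", "large"], ["LOC", "O", None])]
w = train(data, labels)
print(viterbi(data[0][0], labels, w))
```

Keeping the constraint (`fixed`) and the cost (`cost`) as separate arguments of one decoder makes the two roles of a partial annotation explicit: it restricts the target path, and it penalizes the competing path only where a label is actually known.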

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: Part 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 to 09.09.2011
http://www.ecmlpkdd2011.org/

    Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotate Sequence
  • Business informatics
