Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
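The idea of a loss-augmented perceptron on partially annotated sequences can be illustrated with a minimal sketch. It is not the authors' implementation: the feature set (token-label emissions and label-label transitions), the Hamming-style loss on annotated tokens only, and all function names (`viterbi`, `features`, `train`, `predict`) are assumptions made for illustration. Unannotated positions are marked with `None`, and sequences are assumed to be non-empty.

```python
from collections import defaultdict


def viterbi(tokens, w, allowed, bonus):
    """Return the highest-scoring label path under weights w.

    allowed[t] lists the labels permitted at position t;
    bonus[t] maps a label to an extra additive score (used for the loss term).
    """
    score = [{} for _ in tokens]
    back = [{} for _ in tokens]
    for y in allowed[0]:
        score[0][y] = w[("emit", tokens[0], y)] + bonus[0].get(y, 0.0)
    for t in range(1, len(tokens)):
        for y in allowed[t]:
            # Best predecessor label for y at position t.
            prev, best = max(
                ((p, score[t - 1][p] + w[("trans", p, y)]) for p in allowed[t - 1]),
                key=lambda pair: pair[1],
            )
            score[t][y] = best + w[("emit", tokens[t], y)] + bonus[t].get(y, 0.0)
            back[t][y] = prev
    y = max(score[-1], key=score[-1].get)
    path = [y]
    for t in range(len(tokens) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


def features(tokens, path):
    """Count emission and transition features of a fully labeled path."""
    phi = defaultdict(float)
    for t, (tok, y) in enumerate(zip(tokens, path)):
        phi[("emit", tok, y)] += 1.0
        if t > 0:
            phi[("trans", path[t - 1], y)] += 1.0
    return phi


def train(data, labels, epochs=10):
    """Perceptron training on (tokens, partial_labels) pairs."""
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, partial in data:
            # Transductive step: best completion that agrees with the annotations.
            allowed = [[z] if z is not None else labels for z in partial]
            target = viterbi(tokens, w, allowed, [{} for _ in tokens])
            # Loss-augmented step: reward labels that contradict an annotation.
            loss = [
                {y: 1.0 for y in labels if y != z} if z is not None else {}
                for z in partial
            ]
            guess = viterbi(tokens, w, [labels] * len(tokens), loss)
            if guess != target:
                for k, v in features(tokens, target).items():
                    w[k] += v
                for k, v in features(tokens, guess).items():
                    w[k] -= v
    return w


def predict(tokens, w, labels):
    """Unconstrained decoding with the learned weights."""
    return viterbi(tokens, w, [labels] * len(tokens), [{}] * len(tokens))
```

A call such as `train([(["alice", "runs"], ["PER", None])], ["O", "PER"])` returns a weight dictionary that can be passed to `predict`. The design choice worth noting is that the update target is itself decoded under the current weights, constrained only at the annotated positions, so unannotated tokens contribute no loss.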

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of Publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: PART 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 - 09.09.2011
http://www.ecmlpkdd2011.org/

    Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotate Sequence
  • Business informatics
