Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of Publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: PART 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 to 09.09.2011
http://www.ecmlpkdd2011.org/

    Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotated Sequence
  • Business informatics
