Learning from partially annotated sequences

Research output: Contributions to collected editions/works; Article in conference proceedings; Research; peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. This task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated, partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
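The abstract names the method but does not spell it out. As a rough illustration only, here is a minimal Python sketch of one way a perceptron can learn from partially annotated sequences. It assumes a first-order tagger with word-identity emission features and tag-transition features, and a Hamming loss counted only on annotated tokens; the feature set, the loss, and the names `PartialPerceptron`, `viterbi` and `update` are hypothetical choices for this sketch, not the authors' implementation. Unannotated tokens are completed by the model's own Viterbi decoding constrained to the known labels (the transductive step), and the competing prediction comes from loss-augmented decoding.

```python
from typing import Dict, List, Optional, Sequence, Tuple

START = -1  # pseudo-tag for the transition into the first token


class PartialPerceptron:
    """Hypothetical first-order tagger trained with a loss-augmented
    perceptron on partially annotated sequences (illustrative sketch).
    Features: (tag, word) emissions and (previous tag, tag) transitions."""

    def __init__(self, num_tags: int) -> None:
        self.num_tags = num_tags
        self.emit: Dict[Tuple[int, str], float] = {}
        self.trans: Dict[Tuple[int, int], float] = {}

    def _local(self, prev: int, cur: int, word: str) -> float:
        return self.trans.get((prev, cur), 0.0) + self.emit.get((cur, word), 0.0)

    def viterbi(
        self,
        words: Sequence[str],
        fixed: Optional[Sequence[Optional[int]]] = None,
        loss_ref: Optional[Sequence[Optional[int]]] = None,
    ) -> List[int]:
        """Highest-scoring tag sequence. `fixed` clamps annotated positions
        to their label; `loss_ref` adds +1 for every annotated position whose
        tag disagrees with it (Hamming loss restricted to annotated tokens)."""
        if not words:
            return []

        def allowed(i: int) -> Sequence[int]:
            if fixed is not None and fixed[i] is not None:
                return [fixed[i]]
            return range(self.num_tags)

        def loss(i: int, tag: int) -> float:
            ref = None if loss_ref is None else loss_ref[i]
            return 1.0 if ref is not None and ref != tag else 0.0

        # best[i][tag] = (score of best prefix ending in tag, previous tag)
        best: List[Dict[int, Tuple[float, int]]] = [
            {t: (self._local(START, t, words[0]) + loss(0, t), START) for t in allowed(0)}
        ]
        for i in range(1, len(words)):
            column: Dict[int, Tuple[float, int]] = {}
            for t in allowed(i):
                score, prev = max(
                    (s + self._local(p, t, words[i]), p)
                    for p, (s, _) in best[i - 1].items()
                )
                column[t] = (score + loss(i, t), prev)
            best.append(column)

        # Backtrack from the highest-scoring final tag.
        tag = max(best[-1], key=lambda t: best[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        path.reverse()
        return path

    def _add(self, words: Sequence[str], tags: Sequence[int], delta: float) -> None:
        prev = START
        for word, tag in zip(words, tags):
            self.emit[(tag, word)] = self.emit.get((tag, word), 0.0) + delta
            self.trans[(prev, tag)] = self.trans.get((prev, tag), 0.0) + delta
            prev = tag

    def update(self, words: Sequence[str], partial: Sequence[Optional[int]]) -> bool:
        """One training step on a partially annotated sentence.
        `partial[i]` is a tag index or None. Returns True if weights changed."""
        # Transductive step: fill in unannotated tokens with the model's
        # best guess that agrees with the annotated ones.
        target = self.viterbi(words, fixed=partial)
        # Loss-augmented prediction over all tag sequences.
        predicted = self.viterbi(words, loss_ref=partial)
        if predicted == target:
            return False
        self._add(words, target, +1.0)
        self._add(words, predicted, -1.0)
        return True


if __name__ == "__main__":
    model = PartialPerceptron(num_tags=2)  # hypothetical tag set: 0 = O, 1 = entity
    sentence = ["x", "y"]
    partial = [1, None]  # only the first token is annotated
    print(model.update(sentence, partial))  # True: weights changed
    print(model.viterbi(sentence))          # [1, 0]
    print(model.update(sentence, partial))  # False: loss-augmented margin satisfied
```

Restricting the loss to annotated tokens means the update never penalizes the model for its own guesses on unlabeled tokens; whether the published method does exactly this is an assumption of the sketch.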

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of Publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: Part 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2011), Athens, Greece
Duration: 05.09.2011 to 09.09.2011
http://www.ecmlpkdd2011.org/

    Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotated Sequence
  • Business informatics
