Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron that learns from inexpensive partially annotated sequences, which could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data never performs worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
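To make the idea concrete, the sketch below illustrates one way a loss-augmented structured perceptron can be trained on partially annotated sequences: missing labels in each training sentence are completed by constrained Viterbi decoding that clamps the annotated positions, and the competing path comes from Hamming-loss-augmented inference over those same positions. This is a minimal illustration under stated assumptions (one-hot token features, unit Hamming loss, hypothetical names PartialPerceptron and viterbi), not the authors' implementation.

    # A minimal sketch, assuming one-hot token features and unit Hamming loss;
    # illustrative only -- NOT the implementation from the paper.
    import numpy as np

    def viterbi(scores, trans):
        """Best label path given per-position label scores and transition scores."""
        T, K = scores.shape
        dp = scores[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            cand = dp[:, None] + trans          # cand[prev, cur]
            back[t] = cand.argmax(axis=0)
            dp = cand.max(axis=0) + scores[t]
        path = [int(dp.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    class PartialPerceptron:
        """Structured perceptron trained on partially annotated sequences."""
        def __init__(self, n_features, n_labels):
            self.W = np.zeros((n_labels, n_features))  # emission weights
            self.T = np.zeros((n_labels, n_labels))    # transition weights

        def update(self, X, partial_y):
            """One update; partial_y[t] is a gold label id or None (unannotated)."""
            scores = X @ self.W.T                      # (T, K) emission scores
            # Positive path: complete the annotation via constrained Viterbi,
            # clamping every annotated position to its observed label.
            clamped = scores.copy()
            for t, y in enumerate(partial_y):
                if y is not None:
                    clamped[t, :] = -np.inf
                    clamped[t, y] = scores[t, y]
            y_pos = viterbi(clamped, self.T)
            # Negative path: loss-augmented inference, adding a unit reward for
            # mistakes on annotated positions (Hamming loss augmentation).
            augmented = scores.copy()
            for t, y in enumerate(partial_y):
                if y is not None:
                    augmented[t, :] += 1.0
                    augmented[t, y] -= 1.0
            y_neg = viterbi(augmented, self.T)
            # Standard perceptron update: towards y_pos, away from y_neg.
            for t in range(len(partial_y)):
                if y_pos[t] != y_neg[t]:
                    self.W[y_pos[t]] += X[t]
                    self.W[y_neg[t]] -= X[t]
                if t > 0:
                    self.T[y_pos[t - 1], y_pos[t]] += 1.0
                    self.T[y_neg[t - 1], y_neg[t]] -= 1.0

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = np.eye(5)[rng.integers(0, 5, size=8)]          # 8 tokens, one-hot features
        partial_y = [0, None, 1, None, None, 0, None, 1]   # fragmentary annotation
        model = PartialPerceptron(n_features=5, n_labels=2)
        for _ in range(10):
            model.update(X, partial_y)

The constrained decoding stands in for the completion of partial annotations; the method in the paper is transductive, operating over the entire partially labeled corpus rather than one sentence at a time as in this toy loop.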

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: PART 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 - 09.09.2011
http://www.ecmlpkdd2011.org/

Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotated Sequences
  • Business informatics
