Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. This task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, by the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
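To make the idea in the abstract concrete, the following is a minimal sketch of one way a transductive loss-augmented perceptron could be set up. It is not the authors' implementation. The feature set (emission and transition indicators), the unit Hamming penalty applied only at annotated positions, and all function names and toy data are assumptions chosen for illustration. Annotated tokens are held fixed while the current model completes the missing labels, and that completed sequence serves as the update target.

```python
from collections import defaultdict


def viterbi(words, labels, w, fixed=None, gold=None):
    """Return the highest-scoring label path under weights w.

    fixed: {position: label}; these positions may only take the given label.
    gold:  {position: label}; a label that disagrees with gold at such a
           position gets +1 added to its score (Hamming loss augmentation).
    """
    fixed, gold = fixed or {}, gold or {}
    score = [{} for _ in words]
    back = [{} for _ in words]
    for t, word in enumerate(words):
        allowed = [fixed[t]] if t in fixed else labels
        for y in allowed:
            s = w[("emit", word, y)]
            if t in gold and gold[t] != y:
                s += 1.0
            if t == 0:
                score[t][y], back[t][y] = s, None
            else:
                prev, best = max(
                    ((p, score[t - 1][p] + w[("trans", p, y)]) for p in score[t - 1]),
                    key=lambda pair: pair[1],
                )
                score[t][y], back[t][y] = best + s, prev
    y = max(score[-1], key=score[-1].get)
    path = [y]
    for t in range(len(words) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


def features(words, path):
    """Count emission and transition indicator features of a labeled sequence."""
    f = defaultdict(float)
    for t, (word, y) in enumerate(zip(words, path)):
        f[("emit", word, y)] += 1.0
        if t > 0:
            f[("trans", path[t - 1], y)] += 1.0
    return f


def train(data, labels, epochs=5):
    """Perceptron training on (words, partial_annotation) pairs."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, partial in data:
            # Transductive step: complete the missing labels with the current
            # model while keeping the annotated tokens fixed.
            target = viterbi(words, labels, w, fixed=partial)
            # Loss-augmented prediction: errors on annotated tokens are rewarded
            # during decoding, which enforces a margin on those tokens.
            guess = viterbi(words, labels, w, gold=partial)
            if guess != target:
                for k, v in features(words, target).items():
                    w[k] += v
                for k, v in features(words, guess).items():
                    w[k] -= v
    return w


# Toy data (hypothetical): each sentence carries labels for only some tokens.
labels = ["O", "PER"]
data = [
    (["john", "runs"], {0: "PER"}),
    (["mary", "runs"], {0: "PER", 1: "O"}),
    (["dogs", "run"], {0: "O"}),
]
w = train(data, labels)
print(viterbi(["mary", "runs"], labels, w))
```

In this sketch a fully annotated sentence reduces to an ordinary loss-augmented structured perceptron update, and a sentence with no annotations produces no update, since the constrained and unconstrained decodings coincide.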

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
Editors: Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis
Number of pages: 16
Place of Publication: Heidelberg, Berlin
Publisher: Springer Verlag
Publication date: 2011
Edition: Part 1
Pages: 407-422
ISBN (print): 978-3-642-23779-9
ISBN (electronic): 978-3-642-23780-5
DOIs
Publication status: Published - 2011
Externally published: Yes
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 - Athens, Greece
Duration: 05.09.2011 - 09.09.2011
http://www.ecmlpkdd2011.org/

    Research areas

  • Informatics - Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotate Sequence
  • Business informatics
