Learning from partially annotated sequences

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Learning from partially annotated sequences. / Fernandes, Eraldo R.; Brefeld, Ulf.

Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. ed. / Dimitrios Gunopulos; Thomas Hofmann; Donato Malerba; Michalis Vazirgiannis. PART 1. ed. Heidelberg, Berlin : Springer, 2011. p. 407-422 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6911 LNAI, No. PART 1).


Harvard

Fernandes, ER & Brefeld, U 2011, Learning from partially annotated sequences. in D Gunopulos, T Hofmann, D Malerba & M Vazirgiannis (eds), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. PART 1 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 6911 LNAI, Springer, Heidelberg, Berlin, pp. 407-422, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011, Athens, Greece, 05.09.11. https://doi.org/10.1007/978-3-642-23780-5_36

APA

Fernandes, E. R., & Brefeld, U. (2011). Learning from partially annotated sequences. In D. Gunopulos, T. Hofmann, D. Malerba, & M. Vazirgiannis (Eds.), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings (PART 1 ed., pp. 407-422). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6911 LNAI, No. PART 1). Springer. https://doi.org/10.1007/978-3-642-23780-5_36

Vancouver

Fernandes ER, Brefeld U. Learning from partially annotated sequences. In Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. PART 1 ed. Heidelberg, Berlin: Springer. 2011. p. 407-422. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1). doi: 10.1007/978-3-642-23780-5_36

Bibtex

@inbook{0b7856608ef7418f8057a3ab0347cc36,
title = "Learning from partially annotated sequences",
abstract = "We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground-truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could for instance be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.",
keywords = "Informatics, Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotate Sequence, Business informatics",
author = "Fernandes, {Eraldo R.} and Ulf Brefeld",
year = "2011",
doi = "10.1007/978-3-642-23780-5_36",
language = "English",
isbn = "978-3-642-23779-9",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
number = "PART 1",
pages = "407--422",
editor = "Dimitrios Gunopulos and Thomas Hofmann and Donato Malerba and Michalis Vazirgiannis",
booktitle = "Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings",
address = "Germany",
edition = "PART 1",
note = "European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011 ; Conference date: 05-09-2011 Through 09-09-2011",
url = "http://www.ecmlpkdd2011.org/, https://www.ecmlpkdd2011.org/",

}

RIS

TY - CHAP

T1 - Learning from partially annotated sequences

AU - Fernandes, Eraldo R.

AU - Brefeld, Ulf

PY - 2011

Y1 - 2011

N2 - We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground-truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could for instance be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.

AB - We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground-truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could for instance be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.

KW - Informatics

KW - Automatically generated

KW - Cross-lingual

KW - Labeled data

KW - Named entity recognition

KW - Natural language processing

KW - Perceptron

KW - Semi-supervised

KW - Sequential prediction

KW - Hidden Markov Model

KW - Unlabeled Data

KW - Neural Information Processing System

KW - Entity Recognition

KW - Annotate Sequence

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=80052421057&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-23780-5_36

DO - 10.1007/978-3-642-23780-5_36

M3 - Article in conference proceedings

AN - SCOPUS:80052421057

SN - 978-3-642-23779-9

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 407

EP - 422

BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings

A2 - Gunopulos, Dimitrios

A2 - Hofmann, Thomas

A2 - Malerba, Donato

A2 - Vazirgiannis, Michalis

PB - Springer

CY - Heidelberg, Berlin

T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011

Y2 - 5 September 2011 through 9 September 2011

ER -