Learning from partially annotated sequences

Eraldo R. Fernandes; Ulf Brefeld

doi:10.1007/978-3-642-23780-5_36

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Learning from partially annotated sequences. / Fernandes, Eraldo R.; Brefeld, Ulf.
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. ed. / Dimitrios Gunopulos; Thomas Hofmann; Donato Malerba; Michalis Vazirgiannis. PART 1. ed. Heidelberg, Berlin: Springer Verlag, 2011. p. 407-422 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6911 LNAI, No. PART 1).

Harvard

Fernandes, ER & Brefeld, U 2011, Learning from partially annotated sequences. in D Gunopulos, T Hofmann, D Malerba & M Vazirgiannis (eds), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. PART 1 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 6911 LNAI, Springer Verlag, Heidelberg, Berlin, pp. 407-422, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011, Athens, Greece, 05.09.11. https://doi.org/10.1007/978-3-642-23780-5_36

APA

Fernandes, E. R., & Brefeld, U. (2011). Learning from partially annotated sequences. In D. Gunopulos, T. Hofmann, D. Malerba, & M. Vazirgiannis (Eds.), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings (PART 1 ed., pp. 407-422). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6911 LNAI, No. PART 1). Springer Verlag. https://doi.org/10.1007/978-3-642-23780-5_36

Vancouver

Fernandes ER, Brefeld U. Learning from partially annotated sequences. In Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. PART 1 ed. Heidelberg, Berlin: Springer Verlag. 2011. p. 407-422. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1). doi: 10.1007/978-3-642-23780-5_36

Bibtex

@inbook{0b7856608ef7418f8057a3ab0347cc36,

title = "Learning from partially annotated sequences",

abstract = "We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground-truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could for instance be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.",

keywords = "Informatics, Automatically generated, Cross-lingual, Labeled data, Named entity recognition, Natural language processing, Perceptron, Semi-supervised, Sequential prediction, Hidden Markov Model, Unlabeled Data, Neural Information Processing System, Entity Recognition, Annotate Sequence, Business informatics",

author = "Fernandes, {Eraldo R.} and Ulf Brefeld",

year = "2011",

doi = "10.1007/978-3-642-23780-5_36",

language = "English",

isbn = "978-3-642-23779-9",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

number = "PART 1",

pages = "407--422",

editor = "Dimitrios Gunopulos and Thomas Hofmann and Donato Malerba and Michalis Vazirgiannis",

booktitle = "Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings",

address = "Germany",

edition = "PART 1",

note = "European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011, ECML PKDD 2011 ; Conference date: 05-09-2011 Through 09-09-2011",

url = "http://www.ecmlpkdd2011.org/",

}

RIS

TY - CHAP

T1 - Learning from partially annotated sequences

AU - Fernandes, Eraldo R.

AU - Brefeld, Ulf

PY - 2011

Y1 - 2011

N2 - We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground-truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could for instance be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.

KW - Informatics

KW - Automatically generated

KW - Cross-lingual

KW - Labeled data

KW - Named entity recognition

KW - Natural language processing

KW - Perceptron

KW - Semi-supervised

KW - Sequential prediction

KW - Hidden Markov Model

KW - Unlabeled Data

KW - Neural Information Processing System

KW - Entity Recognition

KW - Annotate Sequence

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=80052421057&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/047857db-fd1a-3b48-8b27-a1acc478a333/

U2 - 10.1007/978-3-642-23780-5_36

DO - 10.1007/978-3-642-23780-5_36

M3 - Article in conference proceedings

AN - SCOPUS:80052421057

SN - 978-3-642-23779-9

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 407

EP - 422

BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings

A2 - Gunopulos, Dimitrios

A2 - Hofmann, Thomas

A2 - Malerba, Donato

A2 - Vazirgiannis, Michalis

PB - Springer Verlag

CY - Heidelberg, Berlin

T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011

Y2 - 5 September 2011 through 9 September 2011

ER -

Other publications by the same author(s)

Interactive sequential generative models for team sports

Fassmeyer, D., Cordes, M. & Brefeld, U., 02.2025, In: Machine Learning. 114, 2, 15 p., 38.

Research output: Journal contributions › Journal articles › Research › peer-review

Joint Item Response Models for Manual and Automatic Scores on Open-Ended Test Items

Bengs, D., Brefeld, U., Kroehne, U. & Zehner, F., 2025, (Accepted/In press) In: Psychometrika.

Research output: Journal contributions › Journal articles › Research › peer-review

Machine Learning and Data Mining for Sports Analytics: 11th International Workshop, MLSA 2024, Vilnius, Lithuania, September 9, 2024, Revised Selected Papers

Brefeld, U. (Editor), Davis, J. (Editor), Van Haaren, J. (Editor) & Zimmermann, A. (Editor), 2025, Cham: Springer Verlag. 119 p. (Communications in Computer and Information Science; vol. 2460)

Research output: Books and anthologies › Conference proceedings › Research

Masked autoencoder for multiagent trajectories

Rudolph, Y. & Brefeld, U., 02.2025, In: Machine Learning. 114, 2, 18 p., 44.

Research output: Journal contributions › Journal articles › Research › peer-review

Self-improvement for Computerized Adaptive Testing

Rudolph, Y., Neubauer, K. & Brefeld, U., 2026, Machine Learning and Knowledge Discovery in Databases - Research Track: European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings. Ribeiro, R. P., Jorge, A. M., Soares, C., Gama, J., Pfahringer, B., Japkowicz, N., Larrañaga, P. & Abreu, P. H. (eds.). Cham: Springer International Publishing, Vol. 2. p. 70-86 17 p. (Lecture Notes in Computer Science; vol. 16014 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

DOI

https://doi.org/10.1007/978-3-642-23780-5_36
Final published version
