Learning from partially annotated sequences
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings. ed. / Dimitrios Gunopulos; Thomas Hofmann; Donato Malerba; Michalis Vazirgiannis. PART 1. ed. Heidelberg, Berlin: Springer, 2011. p. 407-422 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6911 LNAI, No. PART 1).
RIS
TY - CHAP
T1 - Learning from partially annotated sequences
AU - Fernandes, Eraldo R.
AU - Brefeld, Ulf
PY - 2011
Y1 - 2011
N2 - We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
KW - Informatics
KW - Automatically generated
KW - Cross-lingual
KW - Labeled data
KW - Named entity recognition
KW - Natural language processing
KW - Perceptron
KW - Semi-supervised
KW - Sequential prediction
KW - Hidden Markov Model
KW - Unlabeled Data
KW - Neural Information Processing System
KW - Entity Recognition
KW - Annotated Sequence
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=80052421057&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/047857db-fd1a-3b48-8b27-a1acc478a333/
U2 - 10.1007/978-3-642-23780-5_36
DO - 10.1007/978-3-642-23780-5_36
M3 - Article in conference proceedings
AN - SCOPUS:80052421057
SN - 978-3-642-23779-9
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 407
EP - 422
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2011, Proceedings
A2 - Gunopulos, Dimitrios
A2 - Hofmann, Thomas
A2 - Malerba, Donato
A2 - Vazirgiannis, Michalis
PB - Springer
CY - Heidelberg, Berlin
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - ECML PKDD 2011
Y2 - 5 September 2011 through 9 September 2011
ER -