Entropy-guided feature generation for structured learning of Portuguese dependency parsing
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings. ed. / Helena Caseli; Aline Villavicencio; Antonio Teixeira; Fernando Perdigao. Berlin, Heidelberg: Springer, 2012. p. 146-156 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7243 LNAI).
RIS
TY - CHAP
T1 - Entropy-guided feature generation for structured learning of Portuguese dependency parsing
AU - Fernandes, Eraldo R.
AU - Milidiú, Ruy L.
N1 - Conference code: 10
PY - 2012
Y1 - 2012
AB - Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex, discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call this method entropy-guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was, until now, the best-performing system on the Bosque corpus. Furthermore, our approach allows the effortless inclusion of two new basic features, from which additional templates are generated automatically. As a result, our system achieves a per-token accuracy of 92.66%, which reduces the previous lowest error rate for Portuguese dependency parsing by more than 15%.
KW - dependency parsing
KW - entropy-guided feature generation
KW - machine learning
KW - structured learning
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84858599304&partnerID=8YFLogxK
UR - https://d-nb.info/1019948167
U2 - 10.1007/978-3-642-28885-2_17
DO - 10.1007/978-3-642-28885-2_17
M3 - Article in conference proceedings
AN - SCOPUS:84858599304
SN - 978-3-642-28884-5
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 146
EP - 156
BT - Computational Processing of the Portuguese Language
A2 - Caseli, Helena
A2 - Villavicencio, Aline
A2 - Teixeira, Antonio
A2 - Perdigao, Fernando
PB - Springer
CY - Berlin, Heidelberg
T2 - International Conference on Computational Processing of Portuguese
Y2 - 17 April 2012 through 20 April 2012
ER -
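The abstract describes the method only at a high level: templates are built by conjoining basic features, guided by the conditional entropy H(Y | X) of a local decision variable Y given feature values X. The Python sketch below is one plausible reading of that idea, not the authors' implementation. It assumes a toy setup in which each example is a candidate head-dependent arc with a binary label (1 if the candidate head is correct). The feature names `head_pos`, `dep_pos`, and `dir`, the functions `conditional_entropy` and `generate_templates`, and the depth limit are all hypothetical. The sketch greedily grows a small tree, splitting at each node on the feature with the lowest conditional entropy, and emits the feature conjunction along every root-to-node path as a template.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(examples, labels, feature):
    """Return H(Y | X), where Y is the local decision variable (label)
    and X is the value of one basic feature, estimated from counts."""
    by_value = defaultdict(list)
    for example, y in zip(examples, labels):
        by_value[example[feature]].append(y)
    n = len(labels)
    h = 0.0
    for ys in by_value.values():
        p_x = len(ys) / n  # empirical P(X = x)
        # H(Y | X = x) from the empirical label distribution in this bucket
        h_given_x = -sum((c / len(ys)) * log2(c / len(ys))
                         for c in Counter(ys).values())
        h += p_x * h_given_x
    return h


def generate_templates(examples, labels, features, max_depth=2, path=()):
    """Hypothetical simplification: grow a tree greedily, splitting on the
    feature with the lowest conditional entropy, and return the feature
    conjunction along every root-to-node path as a template (a tuple)."""
    if max_depth == 0 or not features or len(set(labels)) <= 1:
        return set()
    best = min(features, key=lambda f: conditional_entropy(examples, labels, f))
    template = path + (best,)
    templates = {template}
    remaining = [f for f in features if f != best]
    # Partition the examples by the value of the chosen feature.
    groups = defaultdict(lambda: ([], []))
    for example, y in zip(examples, labels):
        sub_examples, sub_labels = groups[example[best]]
        sub_examples.append(example)
        sub_labels.append(y)
    # Recurse into each partition, extending the current conjunction.
    for sub_examples, sub_labels in groups.values():
        templates |= generate_templates(
            sub_examples, sub_labels, remaining, max_depth - 1, template)
    return templates


# Invented toy data: each example is a candidate head-dependent arc;
# label 1 means the candidate head is the correct head.
examples = [
    {"head_pos": "V", "dep_pos": "N", "dir": "L"},
    {"head_pos": "V", "dep_pos": "ART", "dir": "R"},
    {"head_pos": "N", "dep_pos": "N", "dir": "L"},
    {"head_pos": "N", "dep_pos": "ART", "dir": "R"},
    {"head_pos": "V", "dep_pos": "N", "dir": "R"},
    {"head_pos": "ART", "dep_pos": "N", "dir": "L"},
]
labels = [1, 0, 0, 0, 1, 0]

templates = generate_templates(examples, labels, ["dir", "dep_pos", "head_pos"])
print(sorted(templates))  # [('head_pos',), ('head_pos', 'dep_pos')]
```

On this toy data, `dir` carries no information on its own (H(Y | dir) equals H(Y)), while `head_pos` and `dep_pos` together determine the label, so the sketch proposes their conjunction as a template. Instantiating templates into sparse features for a structured learner is omitted here.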