Entropy-guided feature generation for structured learning of Portuguese dependency parsing
Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed
Authors
Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call this method entropy-guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was the best performing system on the Bosque corpus up to now. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction of more than 15% in the previous smallest error rate for Portuguese dependency parsing.
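The criterion named in the abstract, the conditional entropy of a local decision variable given feature values, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `conditional_entropy`, the feature names `head_pos` and `mod_pos`, the `label` key, and the toy rows are all hypothetical and chosen only to show how two basic features can be uninformative on their own yet discriminative when conjoined.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, feature_names, label_key="label"):
    """Estimate H(label | features) in bits from raw counts in `rows`."""
    # Group label counts by the joint value of the selected features.
    groups = defaultdict(Counter)
    for row in rows:
        value = tuple(row[name] for name in feature_names)
        groups[value][row[label_key]] += 1

    total = len(rows)
    entropy = 0.0
    for label_counts in groups.values():
        group_size = sum(label_counts.values())
        for count in label_counts.values():
            # Accumulate -p(x, y) * log2 p(y | x).
            entropy -= (count / total) * log2(count / group_size)
    return entropy


# Toy data (invented for illustration, not taken from the Bosque corpus):
# each row is a candidate head-modifier pair and a local decision.
rows = [
    {"head_pos": "V", "mod_pos": "N", "label": "arc"},
    {"head_pos": "V", "mod_pos": "V", "label": "no_arc"},
    {"head_pos": "N", "mod_pos": "N", "label": "no_arc"},
    {"head_pos": "N", "mod_pos": "V", "label": "arc"},
]

h_head = conditional_entropy(rows, ["head_pos"])
h_mod = conditional_entropy(rows, ["mod_pos"])
h_joint = conditional_entropy(rows, ["head_pos", "mod_pos"])

print(h_head, h_mod, h_joint)
```

In this toy setting each basic feature alone leaves one full bit of uncertainty about the decision, while the conjoined pair leaves none, so a template conjoining the two would be preferred under a lower-is-better entropy ranking. How the published method searches the space of conjunctions is not described in the abstract and is not modeled here.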
Original language | English |
---|---|
Title | Computational Processing of the Portuguese Language : 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings |
Editors | Helena Caseli, Aline Villavicencio, Antonio Teixeira, Fernando Perdigao |
Number of pages | 11 |
Place of publication | Berlin, Heidelberg |
Publisher | Springer |
Publication date | 2012 |
Pages | 146-156 |
ISBN (print) | 978-3-642-28884-5 |
ISBN (electronic) | 978-3-642-28885-2 |
DOIs | |
Publication status | Published - 2012 |
Externally published | Yes |
Event | International Conference on Computational Processing of Portuguese - Coimbra, Portugal; Duration: 17.04.2012 → 20.04.2012; Conference number: 10; https://aclweb.org/portal/content/10th-international-conference-computational-processing-portuguese-propor-2012 |
- Computer science
- Business informatics