Entropy-guided feature generation for structured learning of Portuguese dependency parsing

Research output: Contributions to collected editions/works; Article in conference proceedings; Research; peer-review

Authors

Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call this method entropy-guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was the best performing system on the Bosque corpus until now. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction of more than 15% in the previous smallest error rate for Portuguese dependency parsing.
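The central quantity named in the abstract, the conditional entropy of a local decision variable given feature values, can be illustrated with a small sketch. This is not the authors' algorithm: the function names, the toy rows, and the feature names `head_pos`, `dep_pos`, and `arc` are all hypothetical, and exhaustively scoring small conjunctions is only one simple way to use the measure.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def conditional_entropy(rows, features, label):
    """Estimate H(label | features) in bits from raw counts in `rows`."""
    joint = defaultdict(Counter)
    for row in rows:
        key = tuple(row[f] for f in features)
        joint[key][row[label]] += 1
    total = len(rows)
    h = 0.0
    for counts in joint.values():
        n = sum(counts.values())
        for c in counts.values():
            # p(x, y) * log2 p(y | x), accumulated with a minus sign
            h -= (c / total) * math.log2(c / n)
    return h


def rank_templates(rows, basic_features, label, max_size=2):
    """Score every conjunction of up to `max_size` basic features.

    Returns (entropy, conjunction) pairs, lowest entropy first.
    """
    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(basic_features, size):
            scored.append((conditional_entropy(rows, combo, label), combo))
    return sorted(scored)


# Hypothetical toy data: neither feature alone predicts `arc`,
# but the two together determine it completely.
rows = [
    {"head_pos": "V", "dep_pos": "N", "arc": 1},
    {"head_pos": "V", "dep_pos": "V", "arc": 0},
    {"head_pos": "N", "dep_pos": "N", "arc": 0},
    {"head_pos": "N", "dep_pos": "V", "arc": 1},
]

for entropy, template in rank_templates(rows, ["head_pos", "dep_pos"], "arc"):
    print(template, round(entropy, 3))
```

On this toy data the two-feature conjunction ranks first, which is the sense in which features can be "highly discriminative together". Note that the empirical conditional entropy never increases when more features are added to a conjunction, so a practical method needs some additional control on template size; the sketch simply caps it with `max_size`.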

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings
Editors: Helena Caseli, Aline Villavicencio, Antonio Teixeira, Fernando Perdigao
Number of pages: 11
Place of Publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 2012
Pages: 146-156
ISBN (print): 978-3-642-28884-5
ISBN (electronic): 978-3-642-28885-2
DOIs
Publication status: Published - 2012
Externally published: Yes
Event: International Conference on Computational Processing of Portuguese - Coimbra, Portugal
Duration: 17.04.2012 to 20.04.2012
Conference number: 10
https://aclweb.org/portal/content/10th-international-conference-computational-processing-portuguese-propor-2012
