Entropy-guided feature generation for structured learning of Portuguese dependency parsing

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call this method entropy-guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was until now the best-performing system on the Bosque corpus. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction of more than 15% in the previous smallest error rate for Portuguese dependency parsing.
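The quantity the abstract refers to, the conditional entropy of a local decision variable given feature values, can be illustrated with a small Python sketch. This is not the authors' procedure: it simply scores every conjunction of up to two basic features by conditional entropy and ranks them, assuming a toy dataset. The feature names `head_pos` and `mod_pos`, the `label` key, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def conditional_entropy(rows, feature_names, label_key="label"):
    """Return H(Y | X), where X is the conjunction of the given features."""
    # Group label counts by the joint value of the conjoined features.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[name] for name in feature_names)
        groups[key][row[label_key]] += 1

    total = len(rows)
    entropy = 0.0
    for counts in groups.values():
        group_size = sum(counts.values())
        for count in counts.values():
            p = count / group_size
            # Weight each group's entropy by the group's relative frequency.
            entropy -= (group_size / total) * p * log2(p)
    return entropy


def rank_templates(rows, basic_features, max_size=2, label_key="label"):
    """Rank candidate conjunctions (templates); lowest entropy comes first."""
    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(basic_features, size):
            scored.append((conditional_entropy(rows, combo, label_key), combo))
    return sorted(scored)


# Toy data (hypothetical): neither feature alone predicts the label,
# but the two features taken together determine it completely.
rows = [
    {"head_pos": "V", "mod_pos": "N", "label": 1},
    {"head_pos": "V", "mod_pos": "V", "label": 0},
    {"head_pos": "N", "mod_pos": "N", "label": 0},
    {"head_pos": "N", "mod_pos": "V", "label": 1},
]

for entropy, template in rank_templates(rows, ["head_pos", "mod_pos"]):
    print(f"{entropy:.2f}  {' & '.join(template)}")
```

On this toy data the conjunction of both features ranks first, which is the sense in which basic features can be "highly discriminative together" even when each one is uninformative alone. Exhaustive enumeration grows quickly with the number of basic features, so it serves here only to make the scoring criterion concrete.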

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings
Editors: Helena Caseli, Aline Villavicencio, Antonio Teixeira, Fernando Perdigao
Number of pages: 11
Place of publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 2012
Pages: 146-156
ISBN (print): 978-3-642-28884-5
ISBN (electronic): 978-3-642-28885-2
Publication status: Published - 2012
Externally published: Yes
Event: International Conference on Computational Processing of Portuguese - Coimbra, Portugal
Duration: 17.04.2012 → 20.04.2012
Conference number: 10
https://aclweb.org/portal/content/10th-international-conference-computational-processing-portuguese-propor-2012
