Entropy-guided feature generation for structured learning of Portuguese dependency parsing

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call this method entropy-guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, until now the best-performing system on the Bosque corpus. Furthermore, our approach makes it effortless to include two new basic features, from which additional templates are generated automatically. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction of more than 15% in the previous smallest error rate for Portuguese dependency parsing.
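The criterion named in the abstract, the conditional entropy of a local decision variable given feature values, can be illustrated with a minimal sketch. The abstract does not describe the actual generation procedure, so the exhaustive scoring below is only an assumed stand-in: it estimates H(Y | X) from counts for every conjunction of up to two basic features and ranks the conjunctions from most to least discriminative. The feature names (`head_pos`, `mod_pos`, `dist`), the toy rows, and both function names are hypothetical and are not taken from the paper or from the Bosque corpus.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(rows, labels, features):
    """Empirical H(Y | X), where X is the conjunction of the given features."""
    joint = Counter()     # counts of (feature values, label)
    marginal = Counter()  # counts of feature values alone
    for row, y in zip(rows, labels):
        key = tuple(row[f] for f in features)
        joint[(key, y)] += 1
        marginal[key] += 1
    n = len(labels)
    # H(Y|X) = sum over (x, y) of p(x, y) * log2(p(x) / p(x, y))
    return sum(c / n * log2(marginal[key] / c) for (key, _), c in joint.items())


def rank_templates(rows, labels, basic_features, max_size=2):
    """Score every conjunction of up to max_size basic features (lower is better)."""
    candidates = [
        combo
        for k in range(1, max_size + 1)
        for combo in combinations(basic_features, k)
    ]
    scored = [(conditional_entropy(rows, labels, c), c) for c in candidates]
    return sorted(scored, key=lambda item: item[0])


# Hypothetical toy data: the label (is there an arc?) is 1 exactly when the
# head and modifier tags differ, so no single feature is informative alone.
rows = [
    {"head_pos": "V", "mod_pos": "N", "dist": "near"},
    {"head_pos": "V", "mod_pos": "V", "dist": "near"},
    {"head_pos": "N", "mod_pos": "N", "dist": "far"},
    {"head_pos": "N", "mod_pos": "V", "dist": "far"},
    {"head_pos": "V", "mod_pos": "N", "dist": "far"},
    {"head_pos": "V", "mod_pos": "V", "dist": "far"},
    {"head_pos": "N", "mod_pos": "N", "dist": "near"},
    {"head_pos": "N", "mod_pos": "V", "dist": "near"},
]
labels = [1, 0, 0, 1, 1, 0, 0, 1]

ranked = rank_templates(rows, labels, ["head_pos", "mod_pos", "dist"])
best_entropy, best_template = ranked[0]
print(best_template, best_entropy)
```

In this toy setting each basic feature alone leaves one full bit of uncertainty about the label, while the conjunction of `head_pos` and `mod_pos` leaves none, which is the sense in which features can be "highly discriminative together". Exhaustive enumeration grows quickly with the number of basic features, so a practical system would need a more selective search.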

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings
Editors: Helena Caseli, Aline Villavicencio, Antonio Teixeira, Fernando Perdigao
Number of pages: 11
Place of Publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 2012
Pages: 146-156
ISBN (print): 978-3-642-28884-5
ISBN (electronic): 978-3-642-28885-2
DOIs
Publication status: Published - 2012
Externally published: Yes
Event: International Conference on Computational Processing of Portuguese - Coimbra, Portugal
Duration: 17.04.2012 - 20.04.2012
Conference number: 10
https://aclweb.org/portal/content/10th-international-conference-computational-processing-portuguese-propor-2012
