Entropy-guided feature generation for structured learning of Portuguese dependency parsing

Research output: Contributions to collected editions/works; Article in conference proceedings; Research; peer-review

Feature generation is a difficult, yet highly necessary, subtask of machine learning modeling. Usually, it is partially solved by a domain expert who generates complex and discriminative feature templates by conjoining the available basic features. This is a limited and expensive way to obtain feature templates and is recognized as a modeling bottleneck. In this work, we propose an automatic method to generate feature templates for structured learning algorithms. The method receives as input the training dataset with basic features and produces a set of feature templates by conjoining basic features that are highly discriminative together. We call this method entropy-guided since it is based on the conditional entropy of local decision variables given the feature values. We illustrate our approach on the Portuguese dependency parsing task and report on experiments with the Bosque corpus. We show that the entropy-guided templates outperform the manually built templates used by MSTParser, which was the best performing system on the Bosque corpus up to now. Furthermore, our approach allows an effortless inclusion of two new basic features that automatically generate additional templates. As a result, our system achieves a per-token accuracy of 92.66%, which represents a reduction of more than 15% in the previous smallest error rate for Portuguese dependency parsing.
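To make the entropy criterion concrete, the following is a minimal sketch of how candidate templates (conjunctions of basic features) could be scored by the conditional entropy of a local decision variable given the template's value. It is an illustration under stated assumptions and not the procedure from the paper: the function names `conditional_entropy` and `rank_templates`, the feature names `head_pos` and `dep_pos`, the binary `arc` decision, and the exhaustive enumeration of conjunctions are all hypothetical, and the toy rows are made up rather than taken from the Bosque corpus.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def conditional_entropy(rows, feature_names, label_key):
    """Estimate H(label | features) in bits from the counts in `rows`."""
    # Group the label counts by the joint value of the conjoined features.
    groups = defaultdict(Counter)
    for row in rows:
        value = tuple(row[name] for name in feature_names)
        groups[value][row[label_key]] += 1

    total = len(rows)
    entropy = 0.0
    for label_counts in groups.values():
        group_size = sum(label_counts.values())
        for count in label_counts.values():
            p = count / group_size
            # Weight each group's entropy by the group's relative frequency.
            entropy -= (group_size / total) * p * log2(p)
    return entropy


def rank_templates(rows, basic_features, label_key, max_size=2):
    """Score every conjunction of up to `max_size` basic features.

    Lower conditional entropy means the conjunction is more discriminative.
    Exhaustive enumeration is an assumption made for this sketch only.
    """
    scored = []
    for size in range(1, max_size + 1):
        for template in combinations(basic_features, size):
            score = conditional_entropy(rows, template, label_key)
            scored.append((score, template))
    return sorted(scored)


# Hypothetical toy data: each row is one candidate head/dependent pair.
rows = [
    {"head_pos": "V", "dep_pos": "N", "arc": True},
    {"head_pos": "V", "dep_pos": "V", "arc": False},
    {"head_pos": "N", "dep_pos": "N", "arc": False},
    {"head_pos": "N", "dep_pos": "V", "arc": True},
]

ranked = rank_templates(rows, ["head_pos", "dep_pos"], "arc")
for score, template in ranked:
    print(f"{score:.2f} bits  {' & '.join(template)}")
```

In this toy dataset neither basic feature is informative alone (1 bit of remaining uncertainty each), while their conjunction determines the decision completely (0 bits). This is the sense in which basic features can be "highly discriminative together".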

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 10th International Conference, PROPOR 2012, Coimbra, Portugal, April 17-20, 2012. Proceedings
Editors: Helena Caseli, Aline Villavicencio, Antonio Teixeira, Fernando Perdigao
Number of pages: 11
Place of Publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 2012
Pages: 146-156
ISBN (print): 978-3-642-28884-5
ISBN (electronic): 978-3-642-28885-2
Publication status: Published - 2012
Externally published: Yes
Event: International Conference on Computational Processing of Portuguese - Coimbra, Portugal
Duration: 17.04.2012 - 20.04.2012
Conference number: 10
https://aclweb.org/portal/content/10th-international-conference-computational-processing-portuguese-propor-2012
