A machine learning approach to Portuguese clause identification

Publication: Contribution to collected works, article in conference proceedings, research, peer-reviewed

Authors

In this work, we apply and evaluate a machine-learning-based system for Portuguese clause identification. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. To train and evaluate it, we derive a clause-annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project, a European and Brazilian Portuguese treebank. We add part-of-speech (POS) tags to the derived corpus using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor for this subtask and use it to add the PCL feature to the derived clause corpus automatically. The proposed clause identifier takes POS and PCL tags as input features. The system achieves an F-score (Fβ=1) of 73.90 when using the gold values of the PCL feature. With the automatically extracted values, it obtains an Fβ=1 of 69.31. These are promising results for a first machine-learning approach to Portuguese clause identification. Moreover, they are achieved with a very simple PCL feature, generated by a PCL extractor developed with very little modeling effort.
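For readers unfamiliar with the reported metric, the following is a minimal sketch of how an Fβ=1 score over clause spans can be computed. It assumes exact-match scoring of clause spans; the span representation, the function name `f_beta`, and the toy data are hypothetical and are not taken from the paper, whose exact evaluation procedure is not described in this record.

```python
from typing import Set, Tuple

# Hypothetical representation: a clause is an inclusive (start, end) token span.
Span = Tuple[int, int]


def f_beta(gold: Set[Span], predicted: Set[Span], beta: float = 1.0) -> float:
    """Return F_beta (as a percentage) over exactly matching clause spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return 100.0 * (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy spans for illustration only; these are not results from the paper.
gold = {(0, 9), (2, 5), (6, 9)}
predicted = {(0, 9), (2, 5), (7, 9)}
score = f_beta(gold, predicted)
print(f"F(beta=1) = {score:.2f}")
```

With β = 1, precision and recall are weighted equally, so the score is their harmonic mean.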

Original language: English
Title: Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings
Editors: Thiago Alexandre Salgueiro Pardo, Antonio Branco, Aldebaro Klautau, Renata Vieira, Vera Lucia Strube de Lima
Number of pages: 10
Place of publication: Berlin, Heidelberg
Publisher: Springer
Publication date: 2010
Pages: 55-64
ISBN (print): 3-642-12319-8, 978-3-642-12319-1
ISBN (electronic): 978-3-642-12320-7
Publication status: Published - 2010
Externally published: Yes
Event: International Conference on Computational Processing of the Portuguese Language - Porto Alegre, Brazil
Duration: 27.04.2010 to 30.04.2010
Conference number: 9
https://www.inf.pucrs.br/~propor2010/
