A machine learning approach to Portuguese clause identification

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

In this work, we apply and evaluate a machine-learning-based system for Portuguese clause identification. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. To train and evaluate it, we derive a clause-annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project, a European and Brazilian Portuguese treebank. We add part-of-speech (POS) tags to the derived corpus using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor for this sub-task and use it to add the PCL feature to the derived clause corpus automatically. We use POS and PCL tags as input features in the proposed clause identifier. The system achieves an Fβ=1 of 73.90 when using the gold-standard values of the PCL feature. When the automatically extracted values are used instead, it obtains an Fβ=1 of 69.31. These are promising results for a first machine-learning approach to Portuguese clause identification. Moreover, they are achieved using a very simple PCL feature, generated by a PCL extractor developed with very little modeling effort.
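
The Fβ=1 scores quoted above can be made concrete with a small sketch. It assumes the standard definition of Fβ as the weighted harmonic mean of precision and recall, which the abstract does not spell out. The function name `f_beta` and the precision and recall inputs are hypothetical and are not figures from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (assumed standard F-beta)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical inputs in percent; NOT results reported in the paper.
example_score = f_beta(80.0, 60.0)
```

With β = 1, precision and recall are weighted equally, so the score falls between the two values and closer to the lower one.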

Original language: English
Title of host publication: Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings
Editors: Thiago Alexandre Salgueiro Pardo, Antonio Branco, Aldebaro Klautau, Renata Vieira, Vera Lucia Strube de Lima
Number of pages: 10
Place of publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 2010
Pages: 55-64
ISBN (print): 3-642-12319-8, 978-3-642-12319-1
ISBN (electronic): 978-3-642-12320-7
Publication status: Published - 2010
Externally published: Yes
Event: International Conference on Computational Processing of the Portuguese Language - Porto Alegre, Brazil
Duration: 27.04.2010 - 30.04.2010
Conference number: 9
https://www.inf.pucrs.br/~propor2010/
