A machine learning approach to Portuguese clause identification

Publication: Contributions to collected editions › Articles in conference proceedings › Research › Peer-reviewed

Authors

In this work, we apply and evaluate a machine-learning-based system for Portuguese clause identification. To the best of our knowledge, this is the first machine-learning-based approach to this task. The proposed system is based on Entropy Guided Transformation Learning. To train and evaluate the system, we derive a clause-annotated corpus from the Bosque corpus of the Floresta Sintá(c)tica Project, a European and Brazilian Portuguese treebank. We add part-of-speech (POS) tags to the derived corpus using an automatic state-of-the-art tagger. Additionally, we use a simple heuristic to derive a phrase-chunk-like (PCL) feature from phrases in the Bosque corpus. We train an extractor for this sub-task and use it to automatically add the PCL feature to the derived clause corpus. We use POS and PCL tags as input features in the proposed clause identifier. The system achieves an Fβ=1 of 73.90 when using the gold values of the PCL feature. With the automatically extracted values, it obtains an Fβ=1 of 69.31. These are promising results for a first machine learning approach to Portuguese clause identification. Moreover, they are achieved using a very simple PCL feature, generated by a PCL extractor developed with very little modeling effort.

Original language: English
Title: Computational Processing of the Portuguese Language: 9th International Conference, PROPOR 2010, Porto Alegre, RS, Brazil, April 27-30, 2010. Proceedings
Editors: Thiago Alexandre Salgueiro Pardo, Antonio Branco, Aldebaro Klautau, Renata Vieira, Vera Lucia Strube de Lima
Number of pages: 10
Place of publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 2010
Pages: 55-64
ISBN (print): 3-642-12319-8, 978-3-642-12319-1
ISBN (electronic): 978-3-642-12320-7
Publication status: Published - 2010
Externally published: Yes
Event: International Conference on Computational Processing of the Portuguese Language - Porto Alegre, Brazil
Duration: 27.04.2010 to 30.04.2010
Conference number: 9
https://www.inf.pucrs.br/~propor2010/
