Portuguese part-of-speech tagging with large margin structure learning

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Part-of-Speech Tagging is a fundamental task in many Natural Language Processing systems. It consists of identifying the syntactic category, i.e. the part of speech, of each word in a sentence. Although the current state-of-the-art accuracy for this task is around 97%, any improvement has an immediate impact on more complex tasks such as Parsing, Semantic Role Labeling and Information Extraction, so the task remains worth exploring. In this paper, we introduce a part-of-speech tagger based on the Structure Learning framework that reduces the smallest known error on the Portuguese Mac-Morpho corpus by 7.8%. We also apply our tagger to a recently revised version of Mac-Morpho. On this revised version, our system's accuracy is competitive with that of a semi-supervised Neural Network trained on Mac-Morpho plus a very large non-annotated corpus. Additionally, our system is simpler than previous systems and uses a very limited feature set. It employs a Large Margin training criterion to derive a structure predictor that is more robust on unseen data.
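
The 7.8% figure in the abstract is a relative reduction of the error rate, not a gain of 7.8 accuracy points. The short Python sketch below illustrates the arithmetic. The 97% baseline is a hypothetical value chosen only because the abstract cites "around 97%" as typical for the task; it is not the reported Mac-Morpho baseline, and the function name is likewise an illustration rather than anything taken from the paper.

```python
def accuracy_after_error_reduction(baseline_accuracy: float,
                                   relative_reduction: float) -> float:
    """Return the accuracy obtained when the baseline error rate
    is cut by the given relative fraction (e.g. 0.078 for 7.8%)."""
    baseline_error = 1.0 - baseline_accuracy
    new_error = baseline_error * (1.0 - relative_reduction)
    return 1.0 - new_error


# Hypothetical baseline: 97% accuracy, i.e. a 3% error rate (assumption).
baseline = 0.97
improved = accuracy_after_error_reduction(baseline, 0.078)

print(f"baseline error rate: {1.0 - baseline:.4f}")
print(f"new error rate:      {1.0 - improved:.4f}")
print(f"new accuracy:        {improved:.4f}")
```

Under this assumed baseline, a 7.8% relative error reduction moves accuracy by roughly a quarter of a percentage point, which is why gains at this level are usually reported as error reductions.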

Original language: English
Title of host publication: BRACIS 2014 : 2014 Brazilian Conference on Intelligent Systems ; 19-23 October 2014, São Carlos, São Paulo, Brazil ; proceedings
Number of pages: 6
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication date: 12.12.2014
Pages: 25-30
Article number: 6984802
ISBN (print): 978-1-4799-7859-5
ISBN (electronic): 978-1-4799-5618-0
Publication status: Published - 12.12.2014
Externally published: Yes
Event: Brazilian Conference on Intelligent Systems - BRACIS 2014 - Sao Carlos, Sao Paulo, Brazil
Duration: 18.10.2014 - 23.10.2014
Conference number: 3
https://ieeexplore.ieee.org/xpl/conhome/6979382/proceeding
