Portuguese part-of-speech tagging with large margin structure learning
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
BRACIS 2014: 2014 Brazilian Conference on Intelligent Systems ; 19-23 October 2014, São Carlos, São Paulo, Brazil ; proceedings. Piscataway: Institute of Electrical and Electronics Engineers Inc., 2014. p. 25-30 6984802.
RIS
TY - CHAP
T1 - Portuguese part-of-speech tagging with large margin structure learning
AU - Fernandes, Eraldo Rezende
AU - Rodrigues, Irving Muller
AU - Milidiú, Ruy Luiz
N1 - Conference code: 3
PY - 2014/12/12
Y1 - 2014/12/12
AB - Part-of-Speech Tagging is a fundamental task in many Natural Language Processing systems. This task consists of identifying the syntactic category, i.e. the part of speech, of each word in a sentence. Although the current state-of-the-art accuracy for this task is around 97%, any improvement has an immediate impact on more complex tasks, such as Parsing, Semantic Role Labeling and Information Extraction. Thus, it is still relevant to explore this task. In this paper, we introduce a part-of-speech tagger based on the Structure Learning framework that reduces the smallest known error on the Portuguese Mac-Morpho corpus by 7.8%. We also apply our tagger to a recently revised version of Mac-Morpho. Our system's accuracy on this latter version is competitive with a semi-supervised Neural Network trained on Mac-Morpho plus a very large non-annotated corpus. Additionally, our system is simpler than previous systems and uses a very limited feature set. Our system employs a Large Margin training criterion to derive a structure predictor that is more robust on unseen data.
KW - Machine Learning
KW - Natural Language Processing
KW - POS Tagging
KW - Structure Learning
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84922535000&partnerID=8YFLogxK
U2 - 10.1109/BRACIS.2014.16
DO - 10.1109/BRACIS.2014.16
M3 - Article in conference proceedings
AN - SCOPUS:84922535000
SN - 978-1-4799-7859-5
SP - 25
EP - 30
BT - BRACIS 2014
PB - Institute of Electrical and Electronics Engineers Inc.
CY - Piscataway
T2 - Brazilian Conference on Intelligent Systems - BRACIS 2014
Y2 - 18 October 2014 through 23 October 2014
ER -