Domain adaptation of POS taggers without handcrafted features
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
IJCNN 2017: the International Joint Conference on Neural Networks. Piscataway: Institute of Electrical and Electronics Engineers Inc., 2017. p. 3331-3338, article no. 7966274 (Proceedings of the International Joint Conference on Neural Networks; Vol. 2017).
RIS
TY - CHAP
T1 - Domain adaptation of POS taggers without handcrafted features
AU - Rodrigues, Irving M.
AU - Fernandes, Eraldo R.
AU - dos Santos, Cicero N.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - Unsupervised domain adaptation is an attractive option when labeled data is lacking for a domain of interest but is available for another domain. Part-of-speech (POS) tagging is often considered a solved task when enough labeled data is available in the domain of interest. However, in a domain adaptation scenario, this is far from true. Several approaches have been proposed for domain adaptation of POS taggers; however, as far as we know, all of them are based on handcrafted features. In this work, we employ a machine learning method whose input consists exclusively of raw text. This method learns word- and character-level representations (embeddings) and has been successfully applied to intra-domain tasks. We show that this method achieves strong performance on the domain adaptation of English and Portuguese POS taggers.
AB - Unsupervised domain adaptation is an attractive option when labeled data is lacking for a domain of interest but is available for another domain. Part-of-speech (POS) tagging is often considered a solved task when enough labeled data is available in the domain of interest. However, in a domain adaptation scenario, this is far from true. Several approaches have been proposed for domain adaptation of POS taggers; however, as far as we know, all of them are based on handcrafted features. In this work, we employ a machine learning method whose input consists exclusively of raw text. This method learns word- and character-level representations (embeddings) and has been successfully applied to intra-domain tasks. We show that this method achieves strong performance on the domain adaptation of English and Portuguese POS taggers.
KW - Informatics
KW - tagging
KW - syntactics
KW - training
KW - Vocabulary
KW - training data
KW - Feature extraction
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=85030974151&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2017.7966274
DO - 10.1109/IJCNN.2017.7966274
M3 - Article in conference proceedings
AN - SCOPUS:85030974151
SN - 978-1-5090-6183-9
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 3331
EP - 3338
BT - IJCNN 2017
PB - Institute of Electrical and Electronics Engineers Inc.
CY - Piscataway
T2 - International Joint Conference on Neural Networks
Y2 - 14 May 2017 through 19 May 2017
ER -