Standard
Using Wikipedia for Cross-Language Named Entity Recognition. /
Fernandes, Eraldo R.; Brefeld, Ulf; Blanco, Roi et al.
Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers. Ed. / Martin Atzmüller; Alvin Chin; Frederik Janssen; Immanuel Schweizer; Christoph Trattner. Springer International Publishing AG, 2016. pp. 1-25 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9546).
Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed
Harvard
Fernandes, ER, Brefeld, U, Blanco, R & Atserias, J 2016,
Using Wikipedia for Cross-Language Named Entity Recognition. in M Atzmüller, A Chin, F Janssen, I Schweizer & C Trattner (eds),
Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9546, Springer International Publishing AG, pp. 1-25, 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014, Nancy, France,
15 September 2014.
https://doi.org/10.1007/978-3-319-29009-6_1
APA
Fernandes, E. R., Brefeld, U., Blanco, R., & Atserias, J. (2016).
Using Wikipedia for Cross-Language Named Entity Recognition. In M. Atzmüller, A. Chin, F. Janssen, I. Schweizer, & C. Trattner (Eds.),
Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers (pp. 1-25). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9546). Springer International Publishing AG.
https://doi.org/10.1007/978-3-319-29009-6_1
Vancouver
Fernandes ER, Brefeld U, Blanco R, Atserias J.
Using Wikipedia for Cross-Language Named Entity Recognition. In Atzmüller M, Chin A, Janssen F, Schweizer I, Trattner C, editors, Big Data Analytics in the Social and Ubiquitous Context: 5th International Workshop on Modeling Social Media, MSM 2014, 5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014, and First International Workshop on Machine Learning for Urban Sensor Data, SenseML 2014, Revised Selected Papers. Springer International Publishing AG. 2016. p. 1-25. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-29009-6_1
Bibtex
@inbook{6e4b3c84791249f2ad0e98fd7e464d1c,
title = "Using Wikipedia for Cross-Language Named Entity Recognition",
abstract = "Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations in the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.",
keywords = "Business informatics, Hidden Markov Model, Target Language, Conditional Random Field, Source Language, Entity Recognition",
author = "Fernandes, {Eraldo R.} and Ulf Brefeld and Roi Blanco and Jordi Atserias",
year = "2016",
doi = "10.1007/978-3-319-29009-6_1",
language = "English",
isbn = "978-3-319-29008-9",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer International Publishing AG",
pages = "1--25",
editor = "Martin Atzm{\"u}ller and Alvin Chin and Frederik Janssen and Immanuel Schweizer and Christoph Trattner",
booktitle = "Big Data Analytics in the Social and Ubiquitous Context",
address = "Switzerland",
note = "5th International Workshop on Mining Ubiquitous and Social Environments, MUSE 2014; Conference date: 15-09-2014",
url = "https://www.semanticscholar.org/paper/The-Fifth-International-Workshop-on-Mining-and-Qin-Greene/03ed707786c842ce7a36b091457e1452d2723aec, https://www.kde.cs.uni-kassel.de/wp-content/uploads/ws/muse2014/",
}
RIS
TY - CHAP
T1 - Using Wikipedia for Cross-Language Named Entity Recognition
AU - Fernandes, Eraldo R.
AU - Brefeld, Ulf
AU - Blanco, Roi
AU - Atserias, Jordi
N1 - Conference code: 5
PY - 2016
Y1 - 2016
N2 - Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations in the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
AB - Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages and labeling additional data manually is tedious and costly. In this article, we present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations in the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, we devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, we observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. We demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
KW - Business informatics
KW - Hidden Markov Model
KW - Target Language
KW - Conditional Random Field
KW - Source Language
KW - Entity Recognition
UR - http://www.scopus.com/inward/record.url?scp=84955265040&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-29009-6_1
DO - 10.1007/978-3-319-29009-6_1
M3 - Article in conference proceedings
SN - 978-3-319-29008-9
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 1
EP - 25
BT - Big Data Analytics in the Social and Ubiquitous Context
A2 - Atzmüller, Martin
A2 - Chin, Alvin
A2 - Janssen, Frederik
A2 - Schweizer, Immanuel
A2 - Trattner, Christoph
PB - Springer International Publishing AG
T2 - 5th International Workshop on Mining Ubiquitous and Social Environments - MUSE 2014
Y2 - 15 September 2014 through 15 September 2014
ER -