An Off-the-shelf Approach to Authorship Attribution
Publication: Contributions to collected works › Articles in conference proceedings › Research › peer-reviewed
COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers. Dublin: Association for Computational Linguistics (ACL), 2014. pp. 895-904 (COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers).
RIS
TY - CHAP
T1 - An Off-the-shelf Approach to Authorship Attribution
AU - Nasir, Jamal Abdul
AU - Görnitz, Nico
AU - Brefeld, Ulf
N1 - Conference code: 25
PY - 2014
Y1 - 2014
AB - Authorship detection is a challenging task because of the many design choices the user has to make. Performance depends heavily on the right set of features, the amount of data, in-sample vs. out-of-sample settings, and profile- vs. instance-based approaches. So far, the variety of combinations renders off-the-shelf methods for authorship detection inappropriate. We propose a novel and generally deployable method that does not share these limitations. We treat authorship attribution as an anomaly detection problem where author regions are learned in feature space. The right feature space for a given task is identified automatically by representing the optimal solution as a linear mixture of multiple kernel functions (MKL). Our approach allows labelled as well as unlabelled examples to be included to remedy the in-sample and out-of-sample problems. Empirically, we observe that our proposed technique is either better than or on par with baseline competitors. However, our method relieves the user from critical design choices (e.g., feature set) and can therefore be used as an off-the-shelf method for authorship attribution.
KW - Informatics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84959886512&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/71fe9cf0-6be3-3329-b3d8-e329a11a5d88/
M3 - Article in conference proceedings
SN - 978-194164326-6
T3 - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers
SP - 895
EP - 904
BT - COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014
PB - Association for Computational Linguistics (ACL)
CY - Dublin
T2 - 25th International Conference on Computational Linguistics - COLING 2014
Y2 - 23 August 2014 through 29 August 2014
ER -