Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Human Practice. Digital Ecologies. Our Future: 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband. ed. / Thomas Ludwig; Volkmar Pipek. Siegen: Universitätsverlag Siegen, 2019. p. 453-467.
RIS
TY - CHAP
T1 - Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics
AU - Lommel, Lasse
AU - Riebeling, Meike
AU - Funk, Burkhardt
AU - Junginger, Christian
N1 - Conference code: 14
PY - 2019
Y1 - 2019
AB - Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. On the other hand, supervised methods require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which is comprised of markedly short documents. Our results indicate the two methods may complement one another well, leading to remarkable sensitivity and precision scores of ensemble learners trained thereupon.
KW - Business informatics
KW - topic modeling
KW - word embeddings
KW - LDA
KW - seeded LDA
UR - https://wi2019.de/tagungsband/
UR - https://wi2019.de/wp-content/uploads/Tagungsband_WI2019_reduziert.pdf
UR - https://www.universi.uni-siegen.de/katalog/einzelpublikationen/897618.html
U2 - 10.25819/ubsi/1016
DO - 10.25819/ubsi/1016
M3 - Article in conference proceedings
SP - 453
EP - 467
BT - Human Practice. Digital Ecologies. Our Future
A2 - Ludwig, Thomas
A2 - Pipek, Volkmar
PB - Universitätsverlag Siegen
CY - Siegen
T2 - 14. Internationale Tagung Wirtschaftsinformatik - WI 2019
Y2 - 24 February 2019 through 27 February 2019
ER -