Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Publikation: Beiträge in SammelwerkenAufsätze in KonferenzbändenForschungbegutachtet

Authors

Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. On the other hand, supervised methods require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which is comprised of markedly short documents. Our results indicate the two methods may complement one another well, leading to remarkable sensitivity and precision scores of ensemble learners trained thereupon.
OriginalspracheEnglisch
TitelHuman Practice. Digital Ecologies. Our Future : 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband
HerausgeberThomas Ludwig, Volkmar Pipek
Anzahl der Seiten15
ErscheinungsortSiegen
VerlagUniversitätsverlag Siegen
Erscheinungsdatum2019
Seiten453-467
ISBN (elektronisch)978-3-96182-063-4
DOIs
PublikationsstatusErschienen - 2019
Veranstaltung14. Internationale Tagung Wirtschaftsinformatik - WI 2019: Human Practice. Digital Ecologies. Our Future. - Universität Siegen, Institut für Wirtschaftsinformatik, Siegen, Deutschland
Dauer: 24.02.201927.02.2019
Konferenznummer: 14
https://wi2019.de/
https://wi2019.de/call-for-papers/
https://wi2019.de/

Links

DOI