Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Lasse Lommel; Meike  Riebeling; Burkhardt Funk; Christian  Junginger

doi:10.25819/ubsi/1016

Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Publikation: Beiträge in Sammelwerken › Aufsätze in Konferenzbänden › Forschung › begutachtet

Standard

Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics. / Lommel, Lasse; Riebeling, Meike ; Funk, Burkhardt et al.
Human Practice. Digital Ecologies. Our Future: 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband . Hrsg. / Thomas Ludwig; Volkmar Pipek. Siegen: Universitätsverlag Siegen, 2019. S. 453-467.

Publikation: Beiträge in Sammelwerken › Aufsätze in Konferenzbänden › Forschung › begutachtet

Harvard

Lommel, L, Riebeling, M, Funk, B & Junginger, C 2019, Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics. in T Ludwig & V Pipek (Hrsg.), Human Practice. Digital Ecologies. Our Future: 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband . Universitätsverlag Siegen, Siegen, S. 453-467, 14. Internationale Tagung Wirtschaftsinformatik - WI 2019, Siegen, Deutschland, 24.02.19. https://doi.org/10.25819/ubsi/1016

APA

Lommel, L., Riebeling, M., Funk, B., & Junginger, C. (2019). Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics. In T. Ludwig, & V. Pipek (Hrsg.), Human Practice. Digital Ecologies. Our Future: 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband (S. 453-467). Universitätsverlag Siegen. https://doi.org/10.25819/ubsi/1016

Vancouver

Lommel L, Riebeling M, Funk B, Junginger C. Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics. in Ludwig T, Pipek V, Hrsg., Human Practice. Digital Ecologies. Our Future: 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband . Siegen: Universitätsverlag Siegen. 2019. S. 453-467 doi: 10.25819/ubsi/1016

Bibtex

@inbook{5413e328ea194031b724e97ab07c0d4d,

title = "Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics",

abstract = "Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. On the other hand, supervised methods require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which is comprised of markedly short documents. Our results indicate the two methods may complement one another well, leading to remarkable sensitivity and precision scores of ensemble learners trained thereupon.",

keywords = "Business informatics, topic modeling, word embeddings, LDA, seeded LDA, topic modeling, word embeddings, LDA, seeded LDA",

author = "Lasse Lommel and Meike Riebeling and Burkhardt Funk and Christian Junginger",

year = "2019",

doi = "10.25819/ubsi/1016",

language = "English",

pages = "453--467",

editor = "Thomas Ludwig and Volkmar Pipek",

booktitle = "Human Practice. Digital Ecologies. Our Future",

publisher = "Universit{\"a}tsverlag Siegen",

address = "Germany",

note = "14. Internationale Tagung Wirtschaftsinformatik - WI 2019 ; Conference date: 24-02-2019 Through 27-02-2019",

url = "https://wi2019.de/, https://wi2019.de/call-for-papers/, https://wi2019.de/",

}

RIS

TY - CHAP

T1 - Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

AU - Lommel, Lasse

AU - Riebeling, Meike

AU - Funk, Burkhardt

AU - Junginger, Christian

N1 - Conference code: 14

PY - 2019

Y1 - 2019

N2 - Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. On the other hand, supervised methods require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which is comprised of markedly short documents. Our results indicate the two methods may complement one another well, leading to remarkable sensitivity and precision scores of ensemble learners trained thereupon.

AB - Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. On the other hand, supervised methods require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which is comprised of markedly short documents. Our results indicate the two methods may complement one another well, leading to remarkable sensitivity and precision scores of ensemble learners trained thereupon.

KW - Business informatics

KW - topic modeling, word embeddings, LDA, seeded LDA

KW - topic modeling

KW - word embeddings

KW - LDA

KW - seeded LDA

UR - https://wi2019.de/tagungsband/

UR - https://wi2019.de/wp-content/uploads/Tagungsband_WI2019_reduziert.pdf

UR - https://www.universi.uni-siegen.de/katalog/einzelpublikationen/897618.html

U2 - 10.25819/ubsi/1016

DO - 10.25819/ubsi/1016

M3 - Article in conference proceedings

SP - 453

EP - 467

BT - Human Practice. Digital Ecologies. Our Future

A2 - Ludwig, Thomas

A2 - Pipek, Volkmar

PB - Universitätsverlag Siegen

CY - Siegen

T2 - 14. Internationale Tagung Wirtschaftsinformatik - WI 2019

Y2 - 24 February 2019 through 27 February 2019

ER -

Weitere Publikationen dieser Person(en)

Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT)

Zainal, N. H., Eckhardt, R., Rackoff, G. N., Fitzsimmons-Craft, E. E., Rojas-Ashe, E., Barr Taylor, C., Funk, B., Eisenberg, D., Wilfley, D. E. & Newman, M. G., 02.04.2025, in: Psychological Medicine. 55, e106.

Publikation: Beiträge in Zeitschriften › Zeitschriftenaufsätze › Forschung › begutachtet

Construct relation extraction from scientific papers: Is it automatable yet?

Funk, B. & Scharfenberger, J., 07.01.2025, Proceedings of the 58th Hawaii International Conference on System Sciences, HICSS 2025. Bui, T. X. (Hrsg.). Honolulu: University of Hawaii at Manoa, S. 4675-4684 10 S. (Hawaii International Conference on System Sciences (HICSS); Band 2025).

Publikation: Beiträge in Sammelwerken › Abstracts in Konferenzbänden › Forschung › begutachtet

Enhancing Invoice Recognition with LLM Embeddings in GAT Networks

Thiée, L. W. & Funk, B., 08.2025, Americas Conference on Information Systems, AMCIS 2025. The Association for Information Systems (AIS), S. 4483-4492 10 S. (Americas Conference on Information Systems, AMCIS 2025; Band 7).

Publikation: Beiträge in Sammelwerken › Aufsätze in Konferenzbänden › Forschung › begutachtet

From Feedback to Formative Guidance: Leveraging LLMs for Personalized Support in Programming Projects

Ghoochani, F., Scharfenberger, J., Funk, B., Doublan, R., Jakharabhai Odedra, M. & Etsiwah, B., 12.06.2025, UMAP 2025 - Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization. Conati, C., Narducci, F., Rossiello, G., Musto, C. & Vassileva, J. (Hrsg.). Association for Computing Machinery, Inc, S. 398-403 6 S. (UMAP 2025 - Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization).

Publikation: Beiträge in Sammelwerken › Aufsätze in Konferenzbänden › Forschung › begutachtet

The promise and challenges of computer mouse trajectories in DMHIs – A feasibility study on pre-treatment dropout predictions

Zantvoort, K., Matthiesen, J., Bjurner, P., Bendix, M., Brefeld, U., Funk, B. & Kaldo, V., 06.2025, in: Internet Interventions. 40, 7 S., 100828.

Publikation: Beiträge in Zeitschriften › Zeitschriftenaufsätze › Forschung › begutachtet

DOI

https://doi.org/10.25819/ubsi/1016
Endgültige, publizierte Fassung

Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Standard

Harvard

APA

Vancouver

Bibtex

RIS

Weitere Publikationen dieser Person(en)

Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT)

Construct relation extraction from scientific papers: Is it automatable yet?

Enhancing Invoice Recognition with LLM Embeddings in GAT Networks

From Feedback to Formative Guidance: Leveraging LLMs for Personalized Support in Programming Projects

The promise and challenges of computer mouse trajectories in DMHIs – A feasibility study on pre-treatment dropout predictions

Links

DOI