Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) cannot classify documents into a predefined set of topics. Supervised methods, on the other hand, require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which consists of markedly short documents. Our results indicate that the two methods may complement one another well, leading to remarkable sensitivity and precision scores for ensemble learners trained on them.
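The abstract does not describe how the embedding-based classifier works, so the following is only a minimal sketch of one common way to assign short documents to predefined topics with word embeddings, not the authors' method. It assumes each topic is described by a few seed words, averages the word vectors of a document, and picks the topic whose averaged seed vector has the highest cosine similarity. The vectors, the topic names, the seed words, and the function names (`mean_vector`, `cosine`, `classify`) are all hypothetical and chosen for illustration.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load pretrained embeddings instead.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "tire": [0.85, 0.05, 0.1],
    "loan": [0.1, 0.9, 0.0],
    "bank": [0.05, 0.95, 0.1],
    "credit": [0.1, 0.85, 0.05],
    "hotel": [0.0, 0.1, 0.9],
    "flight": [0.1, 0.0, 0.95],
    "beach": [0.05, 0.1, 0.9],
}

# Predefined topics, each described by a few seed words (hypothetical).
TOPIC_SEEDS = {
    "automotive": ["car", "engine"],
    "finance": ["loan", "bank"],
    "travel": ["hotel", "flight"],
}


def mean_vector(words):
    """Average the vectors of all known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(document):
    """Return the topic whose seed vector is closest to the document vector."""
    doc_vec = mean_vector(document.lower().split())
    if doc_vec is None:
        return None  # no known words, so no topic can be assigned
    scores = {
        topic: cosine(doc_vec, mean_vector(seeds))
        for topic, seeds in TOPIC_SEEDS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for text in ["cheap tire", "credit", "beach flight", "unknown words"]:
        print(text, "->", classify(text))
```

No labeled training data is needed in this sketch, which is what makes the approach unsupervised; the only supervision is the choice of seed words per topic. Very short documents often contain few or no known words, which is why `classify` returns `None` rather than guessing.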
Original language: English
Title of host publication: Human Practice. Digital Ecologies. Our Future : 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband
Editors: Thomas Ludwig, Volkmar Pipek
Number of pages: 15
Place of publication: Siegen
Publisher: Universitätsverlag Siegen
Publication date: 2019
Pages: 453-467
ISBN (electronic): 978-3-96182-063-4
Publication status: Published - 2019
Event: 14. Internationale Tagung Wirtschaftsinformatik - WI 2019: Human Practice. Digital Ecologies. Our Future. - Universität Siegen, Institut für Wirtschaftsinformatik, Siegen, Germany
Duration: 24.02.2019 to 27.02.2019
Conference number: 14
https://wi2019.de/
https://wi2019.de/call-for-papers/
