Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Traditional unsupervised topic modeling approaches such as Latent Dirichlet Allocation (LDA) cannot classify documents into a predefined set of topics. Supervised methods, on the other hand, require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA, using a real-world dataset from online advertising that consists of markedly short documents. Our results indicate that the two methods may complement one another well, leading to remarkable sensitivity and precision scores for ensemble learners trained on them.
Original language: English
Title of host publication: Human Practice. Digital Ecologies. Our Future : 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband
Editors: Thomas Ludwig, Volkmar Pipek
Number of pages: 15
Place of Publication: Siegen
Publisher: Universitätsverlag Siegen
Publication date: 2019
Pages: 453-467
ISBN (electronic): 978-3-96182-063-4
Publication status: Published - 2019
Event: 14. Internationale Tagung Wirtschaftsinformatik - WI 2019: Human Practice. Digital Ecologies. Our Future. - Universität Siegen, Institut für Wirtschaftsinformatik, Siegen, Germany
Duration: 24.02.2019 - 27.02.2019
Conference number: 14
https://wi2019.de/
https://wi2019.de/call-for-papers/
