Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) lack the ability to classify documents into a predefined set of topics. Supervised methods, on the other hand, require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA, using a real-world dataset from online advertising that consists of markedly short documents. Our results indicate that the two methods may complement one another well: ensemble learners trained on their outputs achieve remarkable sensitivity and precision scores.
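The abstract does not spell out how the embedding-based classifier works, so the following is only a minimal sketch of one common way to assign short documents to predefined topics with word embeddings, not the authors' method. It assumes each topic is described by a few seed words, represents topics and documents as averaged word vectors, and picks the topic with the highest cosine similarity. The vectors in `WORD_VECTORS`, the seed words in `TOPIC_SEEDS`, and all function names are hypothetical and made up for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for this illustration.
# A real setup would load pretrained embeddings instead.
WORD_VECTORS = {
    "car":    [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "tyre":   [0.7, 0.0, 0.2],
    "flight": [0.1, 0.9, 0.1],
    "hotel":  [0.0, 0.8, 0.3],
    "beach":  [0.1, 0.7, 0.4],
}

# Predefined topics, each described by a few seed words (an assumption).
TOPIC_SEEDS = {
    "automotive": ["car", "engine"],
    "travel": ["flight", "hotel"],
}


def mean_vector(words):
    """Average the vectors of all known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(document):
    """Return the most similar predefined topic, or None if no word is known."""
    doc_vec = mean_vector(document.lower().split())
    if doc_vec is None:
        return None
    topic_vecs = {topic: mean_vector(seeds) for topic, seeds in TOPIC_SEEDS.items()}
    return max(topic_vecs, key=lambda topic: cosine(doc_vec, topic_vecs[topic]))


if __name__ == "__main__":
    for text in ["new tyre", "beach hotel"]:
        print(text, "->", classify(text))
```

No labeled training data is needed in this sketch, only the seed words per topic, which is what makes such an approach unsupervised. Very short documents may contain no known word at all, which is why `classify` can return `None`.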
Original language: English
Title of host publication: Human Practice. Digital Ecologies. Our Future: 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband
Editors: Thomas Ludwig, Volkmar Pipek
Number of pages: 15
Place of publication: Siegen
Publisher: Universitätsverlag Siegen
Publication date: 2019
Pages: 453-467
ISBN (electronic): 978-3-96182-063-4
Publication status: Published - 2019
Event: 14. Internationale Tagung Wirtschaftsinformatik - WI 2019: Human Practice. Digital Ecologies. Our Future. - Universität Siegen, Institut für Wirtschaftsinformatik, Siegen, Germany
Duration: 24.02.2019 to 27.02.2019
Conference number: 14
https://wi2019.de/
https://wi2019.de/call-for-papers/
