Topic Embeddings – A New Approach to Classify Very Short Documents Based on Predefined Topics

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Traditional unsupervised topic modeling approaches like Latent Dirichlet Allocation (LDA) cannot classify documents into a predefined set of topics. Supervised methods, on the other hand, require significant amounts of labeled data to perform well on such tasks. We develop a new unsupervised method based on word embeddings to classify documents into predefined topics. We evaluate the predictive performance of this novel approach and compare it to seeded LDA. We use a real-world dataset from online advertising, which consists of very short documents. Our results indicate that the two methods may complement one another well: ensemble learners trained on their outputs achieve high sensitivity and precision scores.
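The general idea of classifying short documents into predefined topics with word embeddings can be illustrated with a minimal sketch. This is not the method from the paper, whose details are not given in this abstract. The sketch assumes that each topic is described by a few seed words, that a document and a topic are each represented by the average of their word vectors, and that a document is assigned to the topic with the highest cosine similarity. The three-dimensional toy vectors, the topic names, and the function names are all hypothetical.

```python
import math

# Hypothetical toy word vectors; in practice these would come from a
# pretrained word embedding model.
EMBEDDINGS = {
    "car":    [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "tyre":   [0.7, 0.0, 0.2],
    "flight": [0.1, 0.9, 0.1],
    "hotel":  [0.0, 0.8, 0.3],
    "beach":  [0.1, 0.7, 0.4],
}

# Hypothetical predefined topics, each described by a few seed words.
TOPIC_SEEDS = {
    "automotive": ["car", "engine"],
    "travel": ["flight", "hotel"],
}


def mean_vector(words):
    """Average the vectors of all known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text):
    """Return the best-matching topic, or None for out-of-vocabulary text."""
    doc_vec = mean_vector(text.lower().split())
    if doc_vec is None:
        return None
    scores = {
        topic: cosine(doc_vec, mean_vector(seeds))
        for topic, seeds in TOPIC_SEEDS.items()
    }
    return max(scores, key=scores.get)
```

Calling `classify("cheap tyre")` scores the document against both topic vectors and returns the closer one. A document with no known words returns `None`, which matters for very short texts where a single out-of-vocabulary word can be the whole document.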
Original language: English
Title of host publication: Human Practice. Digital Ecologies. Our Future : 14. Internationale Tagung Wirtschaftsinformatik (WI 2019), Tagungsband
Editors: Thomas Ludwig, Volkmar Pipek
Number of pages: 15
Place of Publication: Siegen
Publisher: Universitätsverlag Siegen
Publication date: 2019
Pages: 453-467
ISBN (electronic): 978-3-96182-063-4
DOIs
Publication status: Published - 2019
Event: 14. Internationale Tagung Wirtschaftsinformatik - WI 2019: Human Practice. Digital Ecologies. Our Future. - Universität Siegen, Institut für Wirtschaftsinformatik, Siegen, Germany
Duration: 24.02.2019–27.02.2019
Conference number: 14
https://wi2019.de/
https://wi2019.de/call-for-papers/
