TextCSN: A semi-supervised approach for text clustering using pairwise constraints and convolutional siamese network

Publication: Contributions to edited volumes › Articles in conference proceedings › Research › Peer-reviewed

Authors

Clustering is a key problem in many applications. Although the task is inherently unsupervised, many proposals leverage different supervision signals to improve clustering performance. Some semi-supervised clustering methods employ pairwise constraints to tell the learning algorithm which pairs of instances should be in the same cluster (must-link constraints, or similar instances) and which pairs should be in different clusters (cannot-link constraints, or dissimilar instances). In many applications, providing pairwise constraints is cheaper than asking users for explicit labels on the data. More recently, deep clustering methods have been proposed in the literature. Such methods learn a deep neural representation of the input data in order to improve clustering. In this paper, we present TextCSN, a deep clustering approach that combines (i) a Convolutional Siamese Network (CSN) based on pairwise constraints to perform representation learning and (ii) the traditional K-Means algorithm for unsupervised clustering on the learned representation. To the best of our knowledge, this is the first semi-supervised deep learning method based on pairwise constraints applied to text clustering. On eight text clustering tasks, we compare our approach with two baselines: MPC-KMeans, a semi-supervised clustering algorithm, and the ordinary K-Means algorithm. Results indicate that the proposed approach outperforms the baselines on six of these datasets, and that its performance increases with the number of constraints provided.
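The two-stage idea in the abstract (learn a representation from must-link/cannot-link pairs with a siamese network, then run K-Means on it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes a shared *linear* encoder for the paper's convolutional network, uses the common margin-based contrastive loss because the abstract does not name the loss, and works on toy numeric vectors rather than text. All function names and parameters (`embed`, `contrastive_loss`, `train_siamese`, `kmeans`, `margin`, `lr`, `epochs`) are hypothetical.

```python
import math
import random


def embed(W, x):
    """Shared linear encoder z = W x; both siamese branches reuse the same W (assumed encoder)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def contrastive_loss(za, zb, must_link, margin=1.0):
    """Margin-based contrastive loss for one constrained pair (assumed loss)."""
    d = math.dist(za, zb)
    if must_link:
        return 0.5 * d ** 2  # pull must-link pairs together
    return 0.5 * max(0.0, margin - d) ** 2  # push cannot-link pairs past the margin


def train_siamese(data, constraints, dim_out, margin=1.0, lr=0.01, epochs=200):
    """Gradient descent on the summed contrastive loss over (i, j, must_link) triples."""
    dim_in = len(data[0])
    # Deterministic identity-like initialization (an assumption, for reproducibility).
    W = [[1.0 if r == c else 0.0 for c in range(dim_in)] for r in range(dim_out)]
    for _ in range(epochs):
        for i, j, must_link in constraints:
            delta = [a - b for a, b in zip(data[i], data[j])]
            u = embed(W, delta)  # equals z_i - z_j because the encoder is linear
            d = math.hypot(*u)
            if must_link:
                coeff = 1.0  # dL/du = u
            elif 0.0 < d < margin:
                coeff = -(margin - d) / d  # dL/du = -(margin - d) * u / d
            else:
                continue  # cannot-link pair already beyond the margin: zero gradient
            # Chain rule through z = W x: dL/dW[r][c] = (dL/du)[r] * delta[c]
            for r in range(dim_out):
                for c in range(dim_in):
                    W[r][c] -= lr * coeff * u[r] * delta[c]
    return W


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-Means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lowest index).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: the first feature separates the intended groups, the second acts as noise.
data = [(0.0, 0.0), (0.0, 3.0), (1.0, 0.0), (1.0, 3.0)]
constraints = [(0, 1, True), (2, 3, True), (0, 2, False), (1, 3, False)]
W = train_siamese(data, constraints, dim_out=2, margin=2.0)
print(kmeans([embed(W, x) for x in data], k=2))
```

On these raw toy features, the K-Means objective is minimized by splitting on the second coordinate (within-cluster sum of squares 1 versus 9), which violates every constraint. After training, the encoder shrinks that coordinate toward zero and stretches the first, so K-Means on the embeddings puts points 0 and 1 in one cluster and points 2 and 3 in the other.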

Original language: English
Title of host publication: The 35th Annual ACM Symposium on Applied Computing: Brno, Czech Republic, March 30 - April 3, 2020
Number of pages: 8
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 30.03.2020
Pages: 1135-1142
ISBN (electronic): 978-1-4503-6866-7
DOIs
Publication status: Published - 30.03.2020
Externally published: Yes
Event: Annual ACM Symposium on Applied Computing - SAC 2020 - Brno, Czech Republic
Duration: 30.03.2020 - 03.04.2020
Conference number: 35
https://www.sigapp.org/sac/sac2020/

