TextCSN: A semi-supervised approach for text clustering using pairwise constraints and convolutional siamese network

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Clustering is a key problem in several applications. Although the task is originally unsupervised, many proposals leverage different supervision signals to improve clustering performance. Some semi-supervised clustering methods employ pairwise constraints to inform the learning algorithm about pairs of instances that should be in the same cluster (must-link constraints, or similar instances) and pairs that should be in different clusters (cannot-link constraints, or dissimilar instances). In many applications, providing pairwise constraints is cheaper than asking users for explicit labels on the data. More recently, deep clustering methods have been proposed in the literature. Such methods learn a deep neural representation of the input data in order to improve clustering. In this paper, we present TextCSN, a deep clustering approach that combines (i) a Convolutional Siamese Network (CSN) based on pairwise constraints to perform representation learning and (ii) the traditional K-Means algorithm for unsupervised clustering using the learned representation. As far as we know, this is the first semi-supervised deep learning method based on pairwise constraints applied to text clustering. We assess our approach on eight text clustering tasks, comparing it with two baselines: MPC-KMeans, a semi-supervised clustering algorithm, and the ordinary K-Means algorithm. Results indicate that the proposed approach outperforms the baselines on six of these datasets, and that its performance increases with the number of constraints provided.
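To make the role of pairwise constraints concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state which loss trains the CSN; the contrastive loss used here is a common choice for siamese networks and is an assumption, as are the function names, the `margin` parameter, and the toy data. The sketch shows how a must-link pair is penalized for being far apart, how a cannot-link pair is penalized for being closer than a margin, and how constraint violations in a final cluster assignment could be counted.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, must_link, margin=1.0):
    """Contrastive loss for one constrained pair (assumed loss, see lead-in).

    Must-link pairs are pulled together: the loss is the squared distance.
    Cannot-link pairs are pushed apart: the loss is zero once the pair is
    at least `margin` apart.
    """
    d = euclidean(u, v)
    if must_link:
        return d ** 2
    return max(0.0, margin - d) ** 2


def count_violations(labels, must_link, cannot_link):
    """Count constraints that a cluster assignment fails to satisfy.

    `labels[i]` is the cluster of instance i; constraints are index pairs.
    """
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl


# Toy usage with hypothetical embeddings and constraints.
far_pair_loss = contrastive_loss([0.0, 0.0], [3.0, 4.0], must_link=True)
violations = count_violations(
    labels=[0, 0, 1, 1],
    must_link=[(0, 1), (1, 2)],
    cannot_link=[(2, 3), (0, 3)],
)
```

In a two-stage pipeline like the one the abstract describes, a loss of this kind would shape the learned representation, and K-Means would then run on the resulting embeddings without seeing the constraints directly.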

Original language: English
Title of host publication: The 35th Annual ACM Symposium on Applied Computing: Brno, Czech Republic, March 30 - April 3, 2020
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 30.03.2020
Pages: 1135-1142
ISBN (electronic): 978-1-4503-6866-7
Publication status: Published - 30.03.2020
Externally published: Yes
Event: Annual ACM Symposium on Applied Computing - SAC 2020 - Brno, Czech Republic
Duration: 30.03.2020 - 03.04.2020
Conference number: 35
https://www.sigapp.org/sac/sac2020/
