TextCSN: A semi-supervised approach for text clustering using pairwise constraints and convolutional siamese network

Research output: Contributions to collected editions/works; Article in conference proceedings; Research; peer-review

Clustering is a key problem in several applications. Although the task is inherently unsupervised, many proposals leverage different supervision signals to improve clustering performance. Some semi-supervised clustering methods employ pairwise constraints to inform the learning algorithm about pairs of instances that should be in the same cluster (must-link constraints, or similar instances) and pairs that should be in different clusters (cannot-link constraints, or dissimilar instances). In many applications, providing pairwise constraints is cheaper than asking users for explicit labels on the data. More recently, deep clustering methods have been proposed in the literature. Such methods learn a deep neural representation of the input data in order to improve clustering. In this paper, we present TextCSN, a deep clustering approach that combines (i) a Convolutional Siamese Network (CSN) based on pairwise constraints to perform representation learning and (ii) the traditional K-Means algorithm for unsupervised clustering using the learned representation. As far as we know, this is the first semi-supervised deep learning method based on pairwise constraints applied to text clustering. We assess our approach on eight text clustering tasks, comparing it with two baselines: MPC-KMeans, a semi-supervised clustering algorithm, and the ordinary K-Means algorithm. Results indicate that the proposed approach outperforms the baselines on six of these datasets, and that its performance increases with the number of constraints provided.
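To make the notion of pairwise constraints concrete, the following minimal Python sketch shows how must-link and cannot-link pairs might be represented, how a siamese network could be trained on them, and how a clustering can be checked against them. The abstract does not state which loss TextCSN uses, so the contrastive loss here is an assumption (a common choice for siamese networks), and the toy data, function names, and `margin` parameter are all hypothetical.

```python
from math import sqrt

# Hypothetical toy constraints: indices refer to documents 0..4.
must_link = [(0, 1), (2, 3)]      # pairs that should share a cluster
cannot_link = [(0, 2), (1, 4)]    # pairs that should be separated


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    This loss is an assumption, not confirmed by the abstract.
    Similar pairs are penalised by their squared distance; dissimilar
    pairs are penalised only when they are closer than `margin`.
    """
    d = euclidean(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def count_violations(labels, must_link, cannot_link):
    """Count the constraints that a cluster assignment fails to satisfy."""
    broken_ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    broken_cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return broken_ml + broken_cl
```

In a pipeline of the kind the abstract describes, a loss like this would be minimised over the learned embeddings of constrained pairs, and K-Means would then be run on those embeddings; `count_violations` is only a simple diagnostic for how well a resulting assignment respects the constraints.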

Original language: English
Title of host publication: The 35th Annual ACM Symposium on Applied Computing: Brno, Czech Republic, March 30 - April 3, 2020
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 30.03.2020
Pages: 1135-1142
ISBN (electronic): 978-1-4503-6866-7
Publication status: Published - 30.03.2020
Externally published: Yes
Event: Annual ACM Symposium on Applied Computing - SAC 2020 - Brno, Czech Republic
Duration: 30.03.2020 - 03.04.2020
Conference number: 35
https://www.sigapp.org/sac/sac2020/
