TextCSN: A semi-supervised approach for text clustering using pairwise constraints and convolutional siamese network

Publication: Contributions to collected editions › Article in conference proceedings › Research › peer-reviewed

Clustering is a key problem in several applications. Although the task is originally unsupervised, many proposals leverage different supervision signals to improve clustering performance. Some semi-supervised clustering methods employ pairwise constraints to inform the learning algorithm about pairs of instances that should be in the same cluster (must-link constraints, or similar instances) and pairs that should be in different clusters (cannot-link constraints, or dissimilar instances). In many applications, providing pairwise constraints is cheaper than asking users for explicit labels on the data. More recently, deep clustering methods have been proposed in the literature. Such methods learn a deep neural representation of the input data in order to improve clustering. In this paper, we present TextCSN, a deep clustering approach that combines (i) a Convolutional Siamese Network (CSN) trained on pairwise constraints to perform representation learning and (ii) the traditional K-Means algorithm for unsupervised clustering on the learned representation. As far as we know, this is the first semi-supervised deep learning method based on pairwise constraints applied to text clustering. We assess our approach on eight text clustering tasks, comparing it with two baselines: MPC-KMeans, a semi-supervised clustering algorithm, and the ordinary K-Means algorithm. Results indicate that the proposed approach outperforms the baselines on six of these datasets, and that its performance increases with the number of constraints provided.
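
To make the role of pairwise constraints concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes a contrastive loss with a margin, which is a common choice for siamese networks but is not named in the abstract. The document names, the 2-D embeddings (standing in for the output of a trained encoder), and the helper functions are all hypothetical.

```python
import math

# Hypothetical 2-D embeddings standing in for the output of a trained
# siamese encoder (the abstract describes a convolutional encoder).
embeddings = {
    "doc_a": (0.0, 0.1),
    "doc_b": (0.1, 0.0),
    "doc_c": (2.0, 2.1),
    "doc_d": (2.1, 2.0),
}

# Pairwise constraints: 1 = must-link (similar), 0 = cannot-link (dissimilar).
constraints = [
    ("doc_a", "doc_b", 1),
    ("doc_c", "doc_d", 1),
    ("doc_a", "doc_c", 0),
]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, must_link, margin=1.0):
    """Contrastive loss for one pair (an assumed loss, see lead-in).

    Must-link pairs are penalised by their squared distance; cannot-link
    pairs are penalised only while they are closer than `margin`.
    """
    d = euclidean(u, v)
    if must_link:
        return d ** 2
    return max(0.0, margin - d) ** 2


def mean_constraint_loss(embeddings, constraints, margin=1.0):
    """Average contrastive loss over all constrained pairs."""
    losses = [
        contrastive_loss(embeddings[i], embeddings[j], y, margin)
        for i, j, y in constraints
    ]
    return sum(losses) / len(losses)


def violated(assignments, constraints):
    """Return the constraints that a given cluster assignment breaks."""
    bad = []
    for i, j, y in constraints:
        same_cluster = assignments[i] == assignments[j]
        if same_cluster != bool(y):
            bad.append((i, j, y))
    return bad
```

In a full pipeline of the kind the abstract describes, an encoder would be trained to reduce such a loss over the constrained pairs, and the resulting embeddings would then be passed to K-Means. The `violated` helper is only an illustrative way to check a clustering against the constraints afterwards.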

Original language: English
Title: The 35th Annual ACM Symposium on Applied Computing: Brno, Czech Republic, March 30 - April 3, 2020
Number of pages: 8
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 30.03.2020
Pages: 1135-1142
ISBN (electronic): 978-1-4503-6866-7
Publication status: Published - 30.03.2020
Externally published: Yes
Event: Annual ACM Symposium on Applied Computing - SAC 2020 - Brno, Czech Republic
Duration: 30.03.2020 to 03.04.2020
Conference number: 35
https://www.sigapp.org/sac/sac2020/
