Joint optimization of an autoencoder for clustering and embedding

Research output: Journal contributions › Journal articles › Research › Peer-review

Deep embedded clustering has become a dominant approach to unsupervised categorization of objects with deep neural networks. The optimization of the most popular methods alternates between the training of a deep autoencoder and a k-means clustering of the autoencoder's embedding. This diachronic setting, however, prevents the former from benefiting from valuable information acquired by the latter. In this paper, we present an alternative in which the autoencoder and the clustering are learned simultaneously. This is achieved by providing novel theoretical insight: we show that the objective function of a certain class of Gaussian mixture models (GMMs) can naturally be rephrased as the loss function of a one-hidden-layer autoencoder, which thus inherits the built-in clustering capabilities of the GMM. That simple neural network, referred to as the clustering module, can be integrated into a deep autoencoder, resulting in a deep clustering model able to jointly learn a clustering and an embedding. Experiments confirm the equivalence between the clustering module and Gaussian mixture models. Further evaluations affirm the empirical relevance of our deep architecture, as it outperforms related baselines on several data sets.
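The abstract describes the joint architecture only at a high level. The sketch below is a minimal, hypothetical PyTorch illustration of the general idea: a clustering module attached to the autoencoder's bottleneck is trained jointly with the reconstruction loss. The class names, layer sizes, and the specific form of the clustering loss (soft assignment to learnable centroids plus an embedding-reconstruction term) are assumptions for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of joint clustering-and-embedding training (assumed details,
# not the authors' exact method): a deep autoencoder whose bottleneck feeds a
# one-hidden-layer "clustering module" that soft-assigns embeddings to centroids.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClusteringModule(nn.Module):
    """Soft-assigns embeddings to k centroids and reconstructs each embedding
    as a convex combination of the centroids (an autoencoder-like bottleneck)."""
    def __init__(self, embed_dim, n_clusters):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(n_clusters, embed_dim))

    def forward(self, z):
        dists = torch.cdist(z, self.centroids) ** 2      # (batch, k) squared distances
        assign = F.softmax(-dists, dim=1)                 # soft cluster responsibilities
        z_rec = assign @ self.centroids                   # (batch, embed_dim)
        return assign, z_rec

class DeepClusteringAE(nn.Module):
    """Deep autoencoder with the clustering module attached to its embedding."""
    def __init__(self, in_dim=784, embed_dim=10, n_clusters=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, embed_dim))
        self.decoder = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.cluster = ClusteringModule(embed_dim, n_clusters)

    def forward(self, x):
        z = self.encoder(x)
        assign, z_rec = self.cluster(z)
        x_rec = self.decoder(z)
        return x_rec, z, z_rec, assign

def joint_loss(x, x_rec, z, z_rec, alpha=1.0):
    # Input reconstruction plus reconstruction of the embedding by the clustering
    # module; the second term plays the role of the GMM/k-means objective, so both
    # the embedding and the clustering are optimized in a single backward pass.
    return F.mse_loss(x_rec, x) + alpha * F.mse_loss(z_rec, z)

# Usage sketch: one optimizer updates encoder, decoder, and centroids together.
# model = DeepClusteringAE()
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# x_rec, z, z_rec, assign = model(x)
# joint_loss(x, x_rec, z, z_rec).backward(); opt.step()
```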

Original language: English
Journal: Machine Learning
Volume: 110
Issue number: 7
Pages (from-to): 1901-1937
Number of pages: 37
ISSN: 0885-6125
DOIs
Publication status: Published - 01.07.2021
