Active and semi-supervised data domain description

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of the provided unlabeled data, such that points lying outside the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in this unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task; that is, we propose a semi-supervised generalization of data domain description (SSSVDD) that processes both unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
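The abstract describes the method only at a high level. The following Python sketch illustrates its three ingredients under stated assumptions and is not the authors' SSSVDD: it uses a linear feature map instead of a kernel, softplus as a smooth stand-in for the hinge loss (the paper's own smoothing may differ), plain gradient descent on the resulting unconstrained objective, and a simple margin-based query rule as the active learning step. All names (`fit_ss_svdd`, `score`, `query_index`) and parameters (`eta_u`, `eta_l`, `lr`, `steps`) are hypothetical.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); a smooth surrogate for max(0, z)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Derivative of softplus, computed without overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sq_dist(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def fit_ss_svdd(unlabeled, labeled, eta_u=1.0, eta_l=1.0, lr=0.01, steps=3000):
    """Hypothetical semi-supervised SVDD sketch in input space.

    unlabeled: list of points; labeled: list of (point, y), y=+1 normal, y=-1 anomaly.
    Minimises R^2 + eta_u * sum_u softplus(||x - c||^2 - R^2)
                  + eta_l * sum_l softplus(y * (||x - c||^2 - R^2))
    over centre c and radius R. The y=-1 terms are concave in c (non-convex problem).
    """
    dim = len(unlabeled[0])
    # Heuristic start: centre at the unlabeled mean, radius = RMS distance
    c = [sum(x[d] for x in unlabeled) / len(unlabeled) for d in range(dim)]
    R = math.sqrt(sum(sq_dist(x, c) for x in unlabeled) / len(unlabeled))
    # Unlabeled points are penalised like labeled normals (y=+1), with weight eta_u
    data = [(x, 1.0, eta_u) for x in unlabeled] + [(x, float(y), eta_l) for x, y in labeled]
    for _ in range(steps):
        r2 = R * R
        g_r2 = 1.0  # derivative of the R^2 term
        g_c = [0.0] * dim
        for x, y, eta in data:
            w = eta * sigmoid(y * (sq_dist(x, c) - r2))
            g_r2 -= w * y
            for d in range(dim):
                g_c[d] -= 2.0 * w * y * (x[d] - c[d])
        R -= lr * 2.0 * R * g_r2  # chain rule through r2 = R^2 keeps R^2 >= 0
        c = [cd - lr * gd for cd, gd in zip(c, g_c)]
    return c, R * R


def score(model, x):
    # Positive outside the sphere (anomalous), negative inside (normal)
    c, r2 = model
    return sq_dist(x, c) - r2


def query_index(model, pool):
    # Margin-based query (an assumption): pick the point closest to the boundary
    return min(range(len(pool)), key=lambda i: abs(score(model, pool[i])))


if __name__ == "__main__":
    unlabeled = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.0, 0.0)]
    labeled = [((0.0, 0.2), +1), ((3.0, 0.0), -1)]
    model = fit_ss_svdd(unlabeled, labeled)
    print(score(model, (0.0, 0.0)), score(model, (3.0, 0.0)))
    print(query_index(model, [(0.0, 0.0), (1.5, 0.0), (5.0, 5.0)]))
```

Because the anomaly terms are concave in the centre, gradient descent here only reaches a local optimum; starting from the unlabeled mean is one simple heuristic for a sensible starting point.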

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I
Editors: Wray Buntine, Marko Grobelnik, Dunja Mladenic, John Shawe-Taylor
Number of pages: 16
Place of Publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 01.07.2009
Pages: 407-422
ISBN (print): 978-3-642-04179-2
ISBN (electronic): 978-3-642-04180-8
Publication status: Published - 01.07.2009
Externally published: Yes
Event: European Conference on Machine Learning and Knowledge Discovery in Databases - 2009 - Bled, Slovenia
Duration: 07.09.2009 to 11.09.2009
https://www.k4all.org/event/european-conference-on-machine-learning-and-principles-and-practice-of-knowledge-discovery-in-databases/

    Research areas

  • Informatics - Active Learning, Background knowledge, Baseline methods, Continuous problems, Data domain description, Empirical evaluations, Gradient based, Learning settings, Network intrusion detection, Optimization problems, Semi-supervised learning, Support vector domain description, Unlabeled data
  • Business informatics
