Active and semi-supervised data domain description

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of provided unlabeled data such that points lying outside of the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task, that is, we propose a semi-supervised generalization of data domain description (SSSVDD) to process unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
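The hypersphere idea in the abstract can be illustrated with a minimal sketch. This is not the SSSVDD of the paper: it is a plain unsupervised, kernel-free SVDD written as an unconstrained hinge-loss objective and minimized by subgradient descent, followed by a simple margin-based query rule standing in for the active learning strategy. The function names (`fit_svdd`, `anomaly_scores`, `query_least_confident`) and the parameters `nu`, `lr`, and `epochs` are assumptions made for illustration only.

```python
import numpy as np


def fit_svdd(X, nu=0.1, lr=0.01, epochs=500):
    """Fit a kernel-free SVDD-style ball by subgradient descent.

    Minimizes  R^2 + 1/(nu*n) * sum_i max(0, ||x_i - c||^2 - R^2)
    over the center c and the squared radius R^2. This is an
    illustrative unsupervised objective, not the paper's SSSVDD.
    """
    n = X.shape[0]
    c = X.mean(axis=0)                           # start at the data mean
    r2 = np.mean(np.sum((X - c) ** 2, axis=1))   # start at mean squared distance
    weight = 1.0 / (nu * n)
    for _ in range(epochs):
        d2 = np.sum((X - c) ** 2, axis=1)
        outside = d2 > r2                        # points violating the ball
        grad_r2 = 1.0 - weight * outside.sum()
        grad_c = -2.0 * weight * (X[outside] - c).sum(axis=0)
        r2 = max(r2 - lr * grad_r2, 0.0)         # keep the squared radius non-negative
        c = c - lr * grad_c
    return c, r2


def anomaly_scores(X, c, r2):
    """Positive score: outside the ball (anomalous); negative: inside."""
    return np.sum((X - c) ** 2, axis=1) - r2


def query_least_confident(X_unlabeled, c, r2):
    """Return the index of the point closest to the ball's boundary.

    A simple margin-based stand-in for an active learning query rule.
    """
    return int(np.argmin(np.abs(anomaly_scores(X_unlabeled, c, r2))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                # synthetic "normal" data
    c, r2 = fit_svdd(X)
    far_point = np.array([[10.0, 10.0]])
    print("far point flagged:", anomaly_scores(far_point, c, r2)[0] > 0)
    print("next query index:", query_least_confident(X, c, r2))
```

In this sketch, `nu` roughly controls the fraction of training points left outside the ball, and the queried point is the one whose label the current model is least sure about.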

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I
Editors: Wray Buntine, Marko Grobelnik, Dunja Mladenic, John Shawe-Taylor
Number of pages: 16
Place of Publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 01.07.2009
Pages: 407-422
ISBN (print): 978-3-642-04179-2
ISBN (electronic): 978-3-642-04180-8
Publication status: Published - 01.07.2009
Externally published: Yes
Event: European Conference on Machine Learning and Knowledge Discovery in Databases - 2009 - Bled, Slovenia
Duration: 07.09.2009 to 11.09.2009
https://www.k4all.org/event/european-conference-on-machine-learning-and-principles-and-practice-of-knowledge-discovery-in-databases/

    Research areas

  • Informatics - Active Learning, Background knowledge, Baseline methods, Continuous problems, Data domain description, Empirical evaluations, Gradient based, Learning settings, Network intrusion detection, Optimization problems, Semi-supervised learning, Support vector domain description, Unlabeled data
  • Business informatics
