Active and semi-supervised data domain description

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Data domain description techniques aim to derive concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of the provided unlabeled data, such that points lying outside the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task; that is, we propose a semi-supervised generalization of data domain description (SSSVDD) that processes both unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
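The hypersphere idea and the boundary-based querying mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's SSSVDD formulation: it covers only the unsupervised, linear (kernel-free) case, omits the labeled-example term, uses a plain hinge penalty with subgradient descent, and treats "low confidence" as "closest to the ball's boundary". The function names, the parameters `C`, `lr`, and `steps`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def fit_hypersphere(X, C=0.05, lr=0.01, steps=3000):
    """Minimize s + C * sum_i max(0, ||x_i - c||^2 - s) over center c and s = R^2.

    Illustrative unconstrained objective (assumption, not the paper's exact loss).
    """
    c = X.mean(axis=0)                           # initial center: data mean
    s = np.median(((X - c) ** 2).sum(axis=1))    # initial squared radius
    for _ in range(steps):
        diff = X - c
        outside = (diff ** 2).sum(axis=1) > s    # points currently paying a penalty
        grad_s = 1.0 - C * outside.sum()         # subgradient w.r.t. s
        grad_c = -2.0 * C * diff[outside].sum(axis=0)  # subgradient w.r.t. c
        s = max(s - lr * grad_s, 0.0)            # keep the squared radius non-negative
        c = c - lr * grad_c
    return c, s


def anomaly_score(X, c, s):
    """Positive for points outside the ball, negative for points inside."""
    return ((X - c) ** 2).sum(axis=1) - s


def query_low_confidence(X, c, s, k=5):
    """Indices of the k points closest to the boundary (smallest |score|)."""
    return np.argsort(np.abs(anomaly_score(X, c, s)))[:k]


# Toy data (hypothetical): a Gaussian blob plus four far-away points.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2))
outliers = np.array([[6.0, 6.0], [-6.0, 6.0], [6.0, -6.0], [-6.0, -6.0]])
X = np.vstack([inliers, outliers])

c, s = fit_hypersphere(X)
flagged = anomaly_score(X, c, s) > 0             # True = considered anomalous
to_label = query_low_confidence(X, c, s, k=5)    # candidates to show an expert
```

In this sketch `C` controls how many points may fall outside the ball: the subgradient with respect to `s` vanishes when roughly `1 / C` points lie outside, so a smaller `C` yields a tighter ball. In an active learning loop, the points in `to_label` would be passed to an expert for labeling; how the resulting labels enter the objective is exactly what the semi-supervised formulation adds and is not shown here.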

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I
Editors: Wray Buntine, Marko Grobelnik, Dunja Mladenic, John Shawe-Taylor
Number of pages: 16
Place of publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 01.07.2009
Pages: 407-422
ISBN (print): 978-3-642-04179-2
ISBN (electronic): 978-3-642-04180-8
DOIs
Publication status: Published - 01.07.2009
Externally published: Yes
Event: European Conference on Machine Learning and Knowledge Discovery in Databases - 2009 - Bled, Slovenia
Duration: 07.09.2009 - 11.09.2009
https://www.k4all.org/event/european-conference-on-machine-learning-and-principles-and-practice-of-knowledge-discovery-in-databases/

    Research areas

  • Informatics - Active learning, Background knowledge, Baseline methods, Continuous problems, Data domain description, Empirical evaluations, Gradient based, Learning settings, Network intrusion detection, Optimization problems, Semi-supervised learning, Support vector domain description, Unlabeled data
  • Business informatics
