Active and semi-supervised data domain description

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of provided unlabeled data such that points lying outside of the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task, that is, we propose a semi-supervised generalization of data domain description (SSSVDD) to process unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
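The hypersphere idea in the abstract can be illustrated with a minimal sketch. It is not the paper's SSSVDD formulation: it assumes a linear (non-kernel), purely unsupervised soft-margin objective, R^2 + C * sum_i max(0, ||x_i - c||^2 - R^2), minimized by plain subgradient descent. The function names `fit_ball`, `anomaly_score`, and `query_index`, as well as the parameters `C`, `lr`, and `steps`, are hypothetical. The query rule (pick the point closest to the boundary) is only one plausible reading of "low-confidence observations".

```python
import numpy as np

def fit_ball(X, C=0.1, lr=0.01, steps=2000):
    """Subgradient descent on R^2 + C * sum_i max(0, ||x_i - c||^2 - R^2).

    Assumed illustrative objective, not the authors' exact problem.
    Returns the center c and the squared radius r2.
    """
    c = X.mean(axis=0)
    r2 = np.mean(np.sum((X - c) ** 2, axis=1))  # start at mean squared distance
    for _ in range(steps):
        d2 = np.sum((X - c) ** 2, axis=1)
        out = d2 > r2                                  # points outside the ball
        grad_r2 = 1.0 - C * out.sum()                  # d/d(R^2) of the objective
        grad_c = -2.0 * C * (X[out] - c).sum(axis=0)   # d/dc of the hinge terms
        r2 = max(r2 - lr * grad_r2, 0.0)               # keep the radius non-negative
        c = c - lr * grad_c
    return c, r2

def anomaly_score(X, c, r2):
    """Positive for points outside the ball, negative for points inside."""
    return np.sum((X - c) ** 2, axis=1) - r2

def query_index(X, c, r2):
    """Index of the point closest to the boundary (assumed query criterion)."""
    return int(np.argmin(np.abs(anomaly_score(X, c, r2))))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # synthetic "normal" data
c, r2 = fit_ball(X)
far_point = np.array([[10.0, 10.0]])
print(anomaly_score(far_point, c, r2))    # positive score: flagged as anomalous
print(query_index(X, c, r2))              # candidate to hand to an expert for a label
```

In this sketch `C` trades the ball's volume against the number of points left outside: at a stationary point roughly `1 / C` training points lie outside the ball. A semi-supervised variant would add terms for labeled examples, which this sketch deliberately omits.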

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I
Editors: Wray Buntine, Marko Grobelnik, Dunja Mladenic, John Shawe-Taylor
Number of pages: 16
Place of Publication: Berlin, Heidelberg
Publisher: Springer Verlag
Publication date: 01.07.2009
Pages: 407-422
ISBN (print): 978-3-642-04179-2
ISBN (electronic): 978-3-642-04180-8
DOIs
Publication status: Published - 01.07.2009
Externally published: Yes
Event: European Conference on Machine Learning and Knowledge Discovery in Databases - 2009 - Bled, Slovenia
Duration: 07.09.2009 - 11.09.2009
https://www.k4all.org/event/european-conference-on-machine-learning-and-principles-and-practice-of-knowledge-discovery-in-databases/

    Research areas

  • Informatics - Active Learning, Background knowledge, Baseline methods, Continuous problems, Data domain description, Empirical evaluations, Gradient based, Learning settings, Network intrusion detection, Optimization problems, Semi-supervised learning, Support vector domain description, Unlabeled data
  • Business informatics
