Active and semi-supervised data domain description
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I. ed. / Wray Buntine; Marko Grobelnik; Dunja Mladenic; John Shawe-Taylor. Berlin, Heidelberg: Springer, 2009. p. 407-422 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5781 LNAI, No. PART 1).
RIS
TY - CHAP
T1 - Active and semi-supervised data domain description
AU - Görnitz, Nico
AU - Kloft, Marius
AU - Brefeld, Ulf
PY - 2009/7/1
Y1 - 2009/7/1
AB - Data domain description techniques aim at deriving concise descriptions of objects belonging to a category of interest. For instance, the support vector domain description (SVDD) learns a hypersphere enclosing the bulk of provided unlabeled data such that points lying outside of the ball are considered anomalous. However, relevant information such as expert and background knowledge remains unused in the unsupervised setting. In this paper, we rephrase data domain description as a semi-supervised learning task, that is, we propose a semi-supervised generalization of data domain description (SSSVDD) to process unlabeled and labeled examples. The corresponding optimization problem is non-convex. We translate it into an unconstrained, continuous problem that can be optimized accurately by gradient-based techniques. Furthermore, we devise an effective active learning strategy to query low-confidence observations. Our empirical evaluation on network intrusion detection and object recognition tasks shows that our SSSVDDs consistently outperform baseline methods in relevant learning settings.
KW - Informatics
KW - Active Learning
KW - Background knowledge
KW - Baseline methods
KW - Continuous problems
KW - Data domain description
KW - Empirical evaluations
KW - Gradient based
KW - Learning settings
KW - Network intrusion detection
KW - Optimization problems
KW - Semi-supervised learning
KW - Support vector domain description
KW - Unlabeled data
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=70350627210&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04180-8_44
DO - 10.1007/978-3-642-04180-8_44
M3 - Article in conference proceedings
AN - SCOPUS:70350627210
SN - 978-3-642-04179-2
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 407
EP - 422
BT - Machine Learning and Knowledge Discovery in Databases
A2 - Buntine, Wray
A2 - Grobelnik, Marko
A2 - Mladenic, Dunja
A2 - Shawe-Taylor, John
PB - Springer
CY - Berlin, Heidelberg
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases - 2009
Y2 - 7 September 2009 through 11 September 2009
ER -