Discriminative Identification of Duplicates
Activity: Talk or presentation › Conference Presentations › Research
Peter Haider - Speaker
Ulf Brefeld - Speaker
Tobias Scheffer - Speaker
The problem of finding duplicates in data is ubiquitous in
data mining. We cast the problem of finding duplicates in sequential data
into a poly-cut problem on a fully connected graph. The edge weights can
be identified with parameterized pairwise similarities between objects
that are optimized by structural support vector machines on labeled
training sets. Our approach adapts the similarity measure to the data and
is independent of the number of clusters. We present three large-margin
approximations for learning the pairwise similarities: an integrated QP
formulation, a sequential multi-class approach, and a pairwise classifier.
We report on experimental results
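
The idea in the abstract, scoring every pair of objects with a parameterized similarity and then partitioning the fully connected graph without fixing the number of clusters, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the talk: the feature map `pair_features`, the hand-set weight vector `w` (in the presented approach the weights would be learned by structural support vector machines), and the greedy merging heuristic, which merely stands in for the poly-cut decoding.

```python
from itertools import combinations


def pair_features(a, b):
    """Hypothetical pairwise feature map: token Jaccard overlap plus a bias term."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 1.0
    return [jaccard, 1.0]


def similarity(w, a, b):
    """Parameterized pairwise similarity: inner product of weights and pair features."""
    return sum(wi * fi for wi, fi in zip(w, pair_features(a, b)))


def greedy_partition(items, w):
    """Greedy stand-in for the graph partitioning step (not the authors' algorithm).

    Starts with singleton clusters and repeatedly merges the two clusters whose
    cross-cluster similarity sum is largest, as long as that sum is positive.
    The number of clusters is never specified in advance.
    """
    clusters = [[i] for i in range(len(items))]
    while True:
        best_gain, best_pair = 0.0, None
        for p, q in combinations(range(len(clusters)), 2):
            gain = sum(
                similarity(w, items[i], items[j])
                for i in clusters[p]
                for j in clusters[q]
            )
            if gain > best_gain:
                best_gain, best_pair = gain, (p, q)
        if best_pair is None:
            return clusters
        p, q = best_pair  # p < q, so deleting q leaves index p valid
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]


# Illustrative inputs and hand-picked weights (assumptions, not learned values).
messages = [
    "cheap pills buy now",
    "buy cheap pills now",
    "meeting at noon tomorrow",
]
w = [1.0, -0.5]  # positive weight on overlap, negative bias
print(greedy_partition(messages, w))  # [[0, 1], [2]]
```

With the negative bias, pairs with little overlap receive a negative edge weight and stay in separate clusters, which is how the number of duplicate groups emerges from the similarities rather than being set beforehand.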
18.09.2006 → 22.09.2006
Event
European Conference on Machine Learning
18.09.06 → 22.09.06
Berlin, Berlin, Germany
Event: Conference
- Business informatics