Discriminative Identification of Duplicates

Activity: Talk or presentation, Conference Presentations, Research

Peter Haider - Speaker

Ulf Brefeld - Speaker

Tobias Scheffer - Speaker

The problem of finding duplicates in data is ubiquitous in
data mining. We cast the problem of finding duplicates in sequential data
into a poly-cut problem on a fully connected graph. The edge weights can
be identified with parameterized pairwise similarities between objects
that are optimized by structural support vector machines on labeled
training sets. Our approach adapts the similarity measure to the data and
is independent of the number of clusters. We present three large margin
approximations of learning the pairwise similarities: an integrated QP-
formulation, a sequential multi-class approach and a pairwise classifier.
We report on experimental results.
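
To make the graph formulation concrete, here is a minimal Python sketch of partitioning a fully connected graph whose edge weights are parameterized pairwise similarities. It is an illustration under stated assumptions, not the authors' method: the feature map `pair_features`, the hand-set weight vector `w`, and the greedy merging heuristic are all hypothetical stand-ins. In the talk, the weights are learned by structural support vector machines, which this sketch does not attempt.

```python
from itertools import combinations


def pair_features(a, b):
    """Hypothetical pairwise features: token Jaccard overlap plus a constant bias."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 1.0
    return [jaccard, 1.0]


def edge_weight(w, a, b):
    """Edge weight = inner product of the parameter vector and the pair features."""
    return sum(wi * fi for wi, fi in zip(w, pair_features(a, b)))


def greedy_partition(items, w):
    """Greedily merge clusters while the summed weight between them is positive.

    This is a simple heuristic for grouping positively linked objects;
    the number of clusters is not fixed in advance.
    """
    n = len(items)
    weight = {(i, j): edge_weight(w, items[i], items[j])
              for i, j in combinations(range(n), 2)}

    def link(c1, c2):
        # Total weight of all edges running between two clusters.
        return sum(weight[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        gain, (p, q) = max(
            ((link(c1, c2), (p, q))
             for (p, c1), (q, c2) in combinations(enumerate(clusters), 2)),
            key=lambda t: t[0],
        )
        if gain <= 0:
            break  # no merge improves the intra-cluster weight
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid
    return [sorted(c) for c in clusters]


# Assumed toy data and hand-set parameters (not learned, not from the talk).
items = [
    "cheap watches buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "agenda for monday meeting",
    "hello world",
]
w = [1.0, -0.5]  # weight = Jaccard overlap minus 0.5
groups = greedy_partition(items, w)
print(groups)
```

With these assumed parameters, pairs sharing more than half of their tokens get a positive edge and end up in the same group, while unrelated items stay apart.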
18.09.2006 - 22.09.2006

Event

European Conference on Machine Learning

18.09.06 - 22.09.06

Berlin, Berlin, Germany

Event: Conference