Supervised clustering of streaming data for email batch detection

Publication: Contributions to collected editions / Articles in conference proceedings / Research / Peer-reviewed

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
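The idea of sequential decoding, where each arriving message is assigned once and never revisited, can be illustrated with a minimal sketch. This is not the procedure from the paper: the weight vector, the elementwise-product score against a cluster centroid, and the new-cluster threshold are hypothetical stand-ins for parameters that would be learned from labelled example batches. The sketch only shows the single-pass structure: every message either joins the best-scoring existing cluster or opens a new one.

```python
from typing import List, Sequence


def sequential_decode(
    stream: Sequence[Sequence[float]],
    weights: Sequence[float],
    new_cluster_threshold: float,
) -> List[int]:
    """Assign each incoming feature vector to a cluster in a single pass.

    Hypothetical scoring (an assumption, not the paper's model):
        score(x, k) = sum_j weights[j] * x[j] * centroid_k[j]
    A message opens a new cluster when no existing cluster scores
    above new_cluster_threshold. Earlier assignments are never revisited.
    """
    sums: List[List[float]] = []  # running sum of member vectors per cluster
    counts: List[int] = []        # number of members per cluster
    labels: List[int] = []        # cluster index assigned to each message
    for x in stream:
        best_idx, best_score = -1, new_cluster_threshold
        for k, (s, n) in enumerate(zip(sums, counts)):
            # Score x against the centroid s / n of cluster k.
            score = sum(w * xi * si / n for w, xi, si in zip(weights, x, s))
            if score > best_score:
                best_idx, best_score = k, score
        if best_idx == -1:
            # No cluster is good enough: start a new batch.
            sums.append(list(x))
            counts.append(1)
            labels.append(len(sums) - 1)
        else:
            # Commit to the best cluster and update its running sum.
            sums[best_idx] = [si + xi for si, xi in zip(sums[best_idx], x)]
            counts[best_idx] += 1
            labels.append(best_idx)
    return labels


if __name__ == "__main__":
    emails = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(sequential_decode(emails, weights=[1.0, 1.0], new_cluster_threshold=0.5))
```

With these toy inputs the first, second, and fourth vectors end up in one cluster and the third opens a second one, giving `[0, 0, 1, 0]`. Keeping running sums per cluster means each message costs one pass over the currently open clusters, so the whole stream is processed in one sweep instead of re-solving a global clustering problem.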
Original language: English
Title: Proceedings of the 24th International Conference on Machine Learning
Editor: Zoubin Ghahramani
Number of pages: 8
Place of publication: New York
Publisher: Association for Computing Machinery, Inc.
Publication date: 2007
Pages: 345-352
ISBN (Print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: 24th International Conference on Machine Learning (ICML 2007), Corvallis, OR, United States
Duration: 20 June 2007 to 24 June 2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
