Supervised clustering of streaming data for email batch detection

Publication: Contributions to collected editions > Articles in conference proceedings > Research > peer-reviewed

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
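As a rough illustration of the streaming setting described in the abstract, the following Python sketch assigns each arriving message to an existing batch or opens a new one, and never revisits an earlier decision, so each decision costs a single pass over the messages seen so far. It is not the procedure from the paper: the names `jaccard` and `sequential_clustering`, the token-overlap similarity, and the fixed `threshold` are all assumptions made for illustration, standing in for the similarity function that the paper learns from example clusterings.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; a hypothetical stand-in for a learned similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequential_clustering(
    stream: List[str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
) -> List[int]:
    """Assign each message to a batch on arrival; earlier labels are never revised."""
    tokens: List[Set[str]] = []  # token sets of the messages seen so far
    labels: List[int] = []       # batch id chosen for each message
    n_batches = 0
    for text in stream:
        current = set(text.lower().split())
        # One pass over past messages: accumulate similarity per existing batch.
        totals = [0.0] * n_batches
        counts = [0] * n_batches
        for past, label in zip(tokens, labels):
            totals[label] += sim(current, past)
            counts[label] += 1
        # Pick the batch with the highest mean similarity, if it reaches the threshold.
        best, best_score = -1, threshold
        for b in range(n_batches):
            mean = totals[b] / counts[b]
            if mean >= best_score:
                best, best_score = b, mean
        if best == -1:  # no batch is similar enough: open a new one
            best = n_batches
            n_batches += 1
        tokens.append(current)
        labels.append(best)
    return labels


if __name__ == "__main__":
    emails = [
        "buy cheap pills now",
        "buy cheap pills today",
        "meeting at noon tomorrow",
        "buy cheap pills now",
    ]
    print(sequential_clustering(emails))
```

The greedy rule trades clustering quality for speed: a message that would have been grouped differently in hindsight stays where it was first placed, which is the restriction the abstract attributes to the streaming nature of the data.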
Original language: English
Title: Proceedings of the 24th international conference on Machine learning
Editor: Zoubin Ghahramani
Number of pages: 8
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (Print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, USA
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
