Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works; Article in conference proceedings; Research; peer-review

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
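
The idea of sequential decoding, where each email is assigned to a batch on arrival and the assignment is never revisited, can be illustrated with a minimal sketch. This is not the procedure from the paper: the token-overlap similarity `jaccard`, the fixed `threshold`, and the choice of the first email of a batch as its representative are all assumptions made for illustration. In the supervised setting, the similarity would instead be learned from example clusterings.

```python
from typing import Callable, List, Sequence


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Token-set overlap; a hypothetical stand-in for a learned similarity."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def sequential_cluster(
    stream: Sequence[Sequence[str]],
    sim: Callable[[Sequence[str], Sequence[str]], float] = jaccard,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[int]]:
    """Greedily assign each incoming email to a batch, never revising a decision."""
    clusters: List[List[int]] = []  # each batch is a list of stream indices
    for idx, tokens in enumerate(stream):
        best_id, best_score = None, -1.0
        for c_id, members in enumerate(clusters):
            # Compare only against the batch representative (its first email).
            score = sim(tokens, stream[members[0]])
            if score >= threshold and score > best_score:
                best_id, best_score = c_id, score
        if best_id is None:
            clusters.append([idx])  # open a new batch
        else:
            clusters[best_id].append(idx)  # join the best-matching batch
    return clusters


emails = [
    "buy cheap pills now at shop A".split(),
    "meeting moved to friday afternoon".split(),
    "buy cheap pills now at shop B".split(),
    "buy cheap watches now at shop C".split(),
]
batches = sequential_cluster(emails)
```

In this sketch each email costs one similarity computation per open batch, so the total work grows linearly with the stream only if the number of open batches stays bounded, for example by discarding batches that have not received an email recently.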

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
