Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
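The idea of sequential decoding can be illustrated with a minimal sketch. It is not the procedure from the paper: it assumes a pairwise similarity function whose sign indicates whether two messages belong together, and it assigns each arriving message once, without revisiting earlier decisions. The names `sequential_decode` and `toy_similarity`, and the offset 0.3, are hypothetical and chosen for illustration only; in the supervised setting the similarity would be learned from example clusterings.

```python
from typing import Callable, List, Sequence


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a learned similarity: Jaccard overlap of
    tokens, shifted by 0.3 so that dissimilar pairs score negative."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) - 0.3


def sequential_decode(
    stream: Sequence[str],
    similarity: Callable[[str, str], float],
) -> List[List[int]]:
    """Assign each arriving item to one cluster; decisions are never revised.

    An item joins the existing cluster with the largest positive summed
    similarity to its members, or opens a new cluster if no sum is positive.
    Each new item is compared once against every earlier item, so the work
    per arrival is linear in the number of items seen so far.
    """
    seen: List[str] = []
    clusters: List[List[int]] = []  # each cluster holds indices into `seen`
    for item in stream:
        best_cluster, best_score = None, 0.0
        for idx, members in enumerate(clusters):
            score = sum(similarity(item, seen[m]) for m in members)
            if score > best_score:
                best_cluster, best_score = idx, score
        seen.append(item)
        position = len(seen) - 1
        if best_cluster is None:
            clusters.append([position])  # no positive match: start a new batch
        else:
            clusters[best_cluster].append(position)
    return clusters
```

Applied to a short stream of messages, the function returns one list of message indices per detected batch; messages that share most of their tokens with an existing batch are appended to it, and all others start a new batch.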

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
