Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
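The idea of sequential decoding can be illustrated with a small sketch. It is not the procedure from the paper: it assumes a greedy rule in which each incoming message joins the existing batch with the highest positive summed similarity, or opens a new batch otherwise, and earlier assignments are never revised. The names `sequential_clustering` and `token_overlap` are hypothetical, and the hand-written similarity stands in for the learned similarity that supervised clustering would provide.

```python
from typing import Callable, List, Sequence


def sequential_clustering(
    stream: Sequence[str],
    similarity: Callable[[str, str], float],
) -> List[List[str]]:
    """Greedy one-pass clustering; earlier assignments are never revised."""
    clusters: List[List[str]] = []
    for item in stream:
        best_index, best_score = -1, 0.0
        for index, members in enumerate(clusters):
            # Score of adding `item` to this cluster: summed pairwise similarity.
            score = sum(similarity(item, member) for member in members)
            if score > best_score:
                best_index, best_score = index, score
        if best_index >= 0:
            clusters[best_index].append(item)
        else:
            clusters.append([item])  # no positive score: open a new batch
    return clusters


def token_overlap(a: str, b: str) -> float:
    """Hypothetical similarity (an assumption, not a learned model):
    Jaccard overlap of tokens, shifted so that pairs sharing less than
    half of their combined tokens score negative."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union) - 0.5


emails = [
    "buy cheap pills now at shop-a",
    "meeting moved to friday at noon",
    "buy cheap pills now at shop-b",
]
batches = sequential_clustering(emails, token_overlap)
```

In this sketch each new message is compared once with every message already assigned, so the cost of a single decision grows linearly with the number of messages seen so far.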

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
