Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
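
To illustrate what a decoding step without reconsideration can look like, the following is a minimal sketch and not the procedure from the paper. It assumes a fixed Jaccard similarity over token sets in place of a learned similarity, and a hypothetical `threshold` parameter that decides when a message opens a new batch. The names `jaccard` and `sequential_cluster` are illustrative only.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; an assumed stand-in for a learned similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequential_cluster(
    stream: List[str],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
) -> List[int]:
    """Assign each message a batch id in arrival order and never revise it."""
    clusters: List[List[Set[str]]] = []  # token sets of the members per batch
    labels: List[int] = []
    for text in stream:
        tokens = set(text.lower().split())
        best_id, best_score = -1, threshold
        for cid, members in enumerate(clusters):
            # Average similarity to the messages already in this batch.
            score = sum(sim(tokens, m) for m in members) / len(members)
            if score > best_score:
                best_id, best_score = cid, score
        if best_id == -1:
            # No batch is similar enough: open a new one.
            clusters.append([tokens])
            best_id = len(clusters) - 1
        else:
            clusters[best_id].append(tokens)
        labels.append(best_id)  # the decision is final once recorded
    return labels


messages = [
    "buy cheap pills now at site1",
    "buy cheap pills now at site2",
    "meeting moved to friday afternoon",
    "buy cheap pills now at site3",
]
print(sequential_cluster(messages))
```

In this sketch each arriving message is compared once against every stored message, so the cost of a single decision grows linearly with the number of messages seen so far, and earlier assignments are never revisited.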

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
DOIs
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
