Supervised clustering of streaming data for email batch detection

Publication: Contributions to collected editions › Articles in conference proceedings › Research › Peer-reviewed

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
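The idea of sequential decoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the token-overlap similarity `jaccard`, the fixed `threshold`, and the function name `sequential_clustering` are all assumptions made for illustration, whereas the paper learns its similarity from example clusterings. Each incoming email is either added to the existing cluster with the largest positive gain or placed in a new cluster, and earlier decisions are never revisited, so each decision costs time linear in the number of emails seen so far.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Hypothetical pairwise similarity: token overlap between two emails."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequential_clustering(
    stream: List[str],
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
    threshold: float = 0.4,  # assumed cut-off; a learned model would replace it
) -> List[int]:
    """Assign each email to a cluster as it arrives; never revisit a decision."""
    clusters: List[List[Set[str]]] = []  # token sets of the members per cluster
    labels: List[int] = []
    for text in stream:
        tokens = set(text.lower().split())
        best_idx, best_gain = -1, 0.0
        for idx, members in enumerate(clusters):
            # Gain of joining this cluster: summed similarity above the threshold.
            gain = sum(sim(tokens, m) - threshold for m in members)
            if gain > best_gain:
                best_idx, best_gain = idx, gain
        if best_idx == -1:
            # No cluster yields a positive gain, so open a new one.
            clusters.append([tokens])
            labels.append(len(clusters) - 1)
        else:
            clusters[best_idx].append(tokens)
            labels.append(best_idx)
    return labels
```

Because the inner loop touches every previously seen email once, a single assignment is linear in the stream length so far, in contrast to joint decoding over all instances.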
Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editor: Zoubin Ghahramani
Number of pages: 8
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (Print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496

