Supervised clustering of streaming data for email batch detection

Publication: Contributions to collected works, Articles in conference proceedings, Research, peer-reviewed

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
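The idea of decoding a stream without revisiting earlier decisions can be illustrated with a minimal sketch. This is not the procedure from the paper: the function names `jaccard_score` and `sequential_decode`, the token-overlap similarity, and the 0.5 threshold are all assumptions chosen for illustration, whereas the abstract describes a similarity that is learned from examples of correct clusterings. Each incoming message is compared once against the messages seen so far and is then assigned permanently.

```python
from typing import Callable, Iterable, List


def jaccard_score(a: str, b: str, threshold: float = 0.5) -> float:
    """Hypothetical pairwise score: token Jaccard similarity minus a threshold.

    A positive value favors placing the two messages in the same batch.
    A learned similarity would take the place of this hand-written one.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0 - threshold
    return len(ta & tb) / len(ta | tb) - threshold


def sequential_decode(
    stream: Iterable[str],
    score: Callable[[str, str], float] = jaccard_score,
) -> List[int]:
    """Assign each message to a batch as it arrives; never revise a decision."""
    clusters: List[List[str]] = []   # messages grouped by batch index
    assignments: List[int] = []      # batch index chosen for each message
    for msg in stream:
        best_idx, best_total = -1, 0.0
        # One pass over all earlier messages: sum the pairwise scores per batch.
        for idx, members in enumerate(clusters):
            total = sum(score(msg, m) for m in members)
            if total > best_total:
                best_idx, best_total = idx, total
        if best_idx < 0:
            # No existing batch has a positive total score: open a new batch.
            clusters.append([msg])
            best_idx = len(clusters) - 1
        else:
            clusters[best_idx].append(msg)
        assignments.append(best_idx)
    return assignments


emails = [
    "buy cheap pills now at shop A",
    "buy cheap pills now at shop B",
    "meeting moved to friday afternoon",
    "buy cheap pills now at shop C",
]
print(sequential_decode(emails))  # [0, 0, 1, 0]
```

In this sketch the work per incoming message grows linearly with the number of messages already seen, because each earlier message is scored exactly once and no earlier assignment is reopened.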
Original language: English
Title: Proceedings of the 24th international conference on Machine learning
Editor: Zoubin Ghahramani
Number of pages: 8
Place of publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (Print): 978-1-59593-793-3
DOIs
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 to 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
