Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works, Article in conference proceedings, Research, peer-reviewed

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
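The idea of sequential decoding, where each message is assigned to a batch on arrival and the decision is never revisited, can be illustrated with a minimal sketch. This is not the procedure from the paper: the function names, the Jaccard token similarity (a stand-in for a learned similarity), and the threshold of 0.5 are all assumptions made for illustration. In this sketch the cost per message grows with the number of batches opened so far.

```python
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; an assumed stand-in for a learned pairwise similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequential_cluster(
    stream: List[str],
    similarity: Callable[[Set[str], Set[str]], float] = jaccard,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[int]:
    """Assign each message to a batch on arrival; earlier decisions stay fixed."""
    representatives: List[Set[str]] = []  # token set of each batch's first message
    labels: List[int] = []
    for message in stream:
        tokens = set(message.lower().split())
        best_id, best_score = -1, threshold
        # Compare the new message with one representative per existing batch.
        for batch_id, rep in enumerate(representatives):
            score = similarity(tokens, rep)
            if score >= best_score:
                best_id, best_score = batch_id, score
        if best_id == -1:
            # No batch is similar enough: open a new one.
            representatives.append(tokens)
            best_id = len(representatives) - 1
        labels.append(best_id)
    return labels


emails = [
    "Dear Alice claim your free prize now",
    "Dear Bob claim your free prize now",
    "Meeting moved to Thursday at noon",
    "Dear Carol claim your free prize now",
]
labels = sequential_cluster(emails)
print(labels)  # [0, 0, 1, 0]
```

The three templated messages share six of eight distinct tokens with the first one (similarity 0.75), so they fall into the same batch, while the unrelated message opens a batch of its own.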

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
