Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
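The idea of sequential decoding, where each message is assigned once on arrival and never revisited, can be illustrated with a minimal sketch. This is not the procedure from the paper: the Jaccard token overlap stands in for a learned similarity, and the names `sequential_clustering`, `threshold`, and `window` are hypothetical. It assumes a bounded number of active clusters, so the work per message is constant and the total cost grows linearly with the length of the stream.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; a stand-in for a learned pairwise similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequential_clustering(
    stream: Iterable[str], threshold: float = 0.5, window: int = 100
) -> List[int]:
    """Assign each message to a batch as it arrives; decisions are final.

    Only the `window` most recently opened clusters stay active
    (an assumption of this sketch), which bounds the work per message.
    """
    # Each active cluster is (cluster_id, tokens of its first message).
    active: Deque[Tuple[int, Set[str]]] = deque(maxlen=window)
    assignments: List[int] = []
    next_id = 0
    for message in stream:
        tokens = set(message.lower().split())
        best_id: Optional[int] = None
        best_score = threshold
        for cluster_id, representative in active:
            score = jaccard(tokens, representative)
            if score >= best_score:
                best_id, best_score = cluster_id, score
        if best_id is None:
            # No active cluster is similar enough: open a new batch.
            best_id = next_id
            next_id += 1
            active.append((best_id, tokens))
        assignments.append(best_id)
    return assignments


emails = [
    "Buy cheap watches now at example dot com",
    "Meeting moved to 3pm tomorrow",
    "Buy cheap pills now at example dot com",
    "Buy cheap watches today at example dot com",
]
print(sequential_clustering(emails))  # → [0, 1, 0, 0]
```

The three messages that share a template land in batch 0, while the unrelated message opens its own batch. In a supervised setting the fixed similarity and threshold would be replaced by a model trained on examples of correct clusterings.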

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
