Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works; Article in conference proceedings; Research; peer-review

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
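
The sequential decoding idea can be illustrated with a minimal sketch. It is not the procedure from the paper: the function names (`sequential_cluster`, `toy_similarity`), the zero threshold, and the token-overlap similarity are all assumptions made for illustration, and the paper learns its similarity from example clusterings rather than hand-coding it. The sketch only shows why irrevocable decisions keep the work per incoming email linear in the number of emails seen so far: each new email is compared once with every earlier email and then either joins one existing batch or opens a new one.

```python
from typing import Callable, List, Sequence


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a learned similarity: token Jaccard overlap
    shifted by 0.5, so dissimilar pairs score negative."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return -0.5
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b) - 0.5


def sequential_cluster(
    stream: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.0,
) -> List[List[int]]:
    """Greedy one-pass clustering; earlier assignments are never revised."""
    clusters: List[List[int]] = []
    for t, item in enumerate(stream):
        best_score, best_cluster = threshold, None
        for cluster in clusters:
            # Score a cluster by the summed similarity to its members.
            # Across all clusters this touches each earlier item once.
            score = sum(similarity(item, stream[j]) for j in cluster)
            if score > best_score:
                best_score, best_cluster = score, cluster
        if best_cluster is None:
            clusters.append([t])  # no cluster scores above the threshold
        else:
            best_cluster.append(t)
    return clusters


emails = [
    "cheap pills buy now alice",
    "meeting agenda for monday",
    "cheap pills buy now bob",
    "cheap pills buy now carol",
]
batches = sequential_cluster(emails, toy_similarity)
print(batches)  # [[0, 2, 3], [1]]
```

In this toy run the three templated messages end up in one batch and the unrelated message stays alone. A decoder that may revise earlier assignments would have to search over many more clusterings, which is where the higher cost of non-sequential decoding comes from.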

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
