Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Authors

We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
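
The idea of sequential decoding, where each incoming message is assigned to a batch once and the decision is never revisited, can be illustrated with a minimal sketch. This is not the procedure from the paper: the Jaccard token overlap stands in for a learned pairwise similarity, and the names `jaccard`, `sequential_cluster`, and the `threshold` parameter are hypothetical choices made only for illustration.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; an assumed stand-in for a learned similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequential_cluster(stream: List[str], threshold: float = 0.5) -> List[int]:
    """Assign each message a batch id in arrival order, never revising it."""
    # One representative token set per batch: the first message seen in it.
    representatives: List[Set[str]] = []
    labels: List[int] = []
    for message in stream:
        tokens = set(message.lower().split())
        best_id, best_score = -1, threshold
        # Compare the new message against every existing batch representative.
        for batch_id, rep in enumerate(representatives):
            score = jaccard(tokens, rep)
            if score >= best_score:
                best_id, best_score = batch_id, score
        # No batch is similar enough: open a new one (hypothetical rule).
        if best_id == -1:
            representatives.append(tokens)
            best_id = len(representatives) - 1
        labels.append(best_id)  # final decision for this message
    return labels
```

Calling `sequential_cluster` on a list of message strings returns one batch id per message. The stream is processed in a single pass, and the work per message depends only on the number of batches opened so far, not on re-examining earlier assignments.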

Original language: English
Title of host publication: Proceedings of the 24th international conference on Machine learning
Editors: Zoubin Ghahramani
Number of pages: 8
Place of Publication: New York
Publisher: Association for Computing Machinery, Inc
Publication date: 2007
Pages: 345-352
ISBN (print): 978-1-59593-793-3
Publication status: Published - 2007
Externally published: Yes
Event: Proceedings of the 24th international conference on Machine learning - ICML 2007 - Corvallis, OR, United States
Duration: 20.06.2007 - 24.06.2007
Conference number: 24
https://dl.acm.org/doi/proceedings/10.1145/1273496
