Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works > Article in conference proceedings > Research > peer-review

Standard

Supervised clustering of streaming data for email batch detection. / Haider, Peter; Brefeld, Ulf; Scheffer, Tobias.
Proceedings of the 24th international conference on Machine learning. ed. / Zoubin Ghahramani. New York: Association for Computing Machinery, Inc, 2007. p. 345-352.

Harvard

Haider, P, Brefeld, U & Scheffer, T 2007, Supervised clustering of streaming data for email batch detection. in Z Ghahramani (ed.), Proceedings of the 24th international conference on Machine learning. Association for Computing Machinery, Inc, New York, pp. 345-352, Proceedings of the 24th international conference on Machine learning - ICML 2007, Corvallis, Oregon, United States, 20.06.07. https://doi.org/10.1145/1273496.1273540

APA

Haider, P., Brefeld, U., & Scheffer, T. (2007). Supervised clustering of streaming data for email batch detection. In Z. Ghahramani (Ed.), Proceedings of the 24th international conference on Machine learning (pp. 345-352). Association for Computing Machinery, Inc. https://doi.org/10.1145/1273496.1273540

Vancouver

Haider P, Brefeld U, Scheffer T. Supervised clustering of streaming data for email batch detection. In Ghahramani Z, editor, Proceedings of the 24th international conference on Machine learning. New York: Association for Computing Machinery, Inc. 2007. p. 345-352. doi: 10.1145/1273496.1273540

Bibtex

@inproceedings{c18a5632af0d4bb78aadc3a531618a5c,
title = "Supervised clustering of streaming data for email batch detection",
abstract = "We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.",
keywords = "Informatics, Business informatics",
author = "Peter Haider and Ulf Brefeld and Tobias Scheffer",
year = "2007",
doi = "10.1145/1273496.1273540",
language = "English",
isbn = "978-1-59593-793-3",
pages = "345--352",
editor = "Zoubin Ghahramani",
booktitle = "Proceedings of the 24th international conference on Machine learning",
publisher = "Association for Computing Machinery, Inc",
address = "United States",
note = "Proceedings of the 24th international conference on Machine learning - ICML 2007; Conference date: 20-06-2007 through 24-06-2007",
url = "https://dl.acm.org/doi/proceedings/10.1145/1273496",

}

RIS

TY - CHAP

T1 - Supervised clustering of streaming data for email batch detection

AU - Haider, Peter

AU - Brefeld, Ulf

AU - Scheffer, Tobias

N1 - Conference code: 24

PY - 2007

Y1 - 2007

N2 - We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.

AB - We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=34547983265&partnerID=8YFLogxK

U2 - 10.1145/1273496.1273540

DO - 10.1145/1273496.1273540

M3 - Article in conference proceedings

AN - SCOPUS:34547983265

SN - 978-1-59593-793-3

SP - 345

EP - 352

BT - Proceedings of the 24th international conference on Machine learning

A2 - Ghahramani, Zoubin

PB - Association for Computing Machinery, Inc

CY - New York

T2 - Proceedings of the 24th international conference on Machine learning - ICML 2007

Y2 - 20 June 2007 through 24 June 2007

ER -
