Supervised clustering of streaming data for email batch detection

Peter Haider; Ulf Brefeld; Tobias Scheffer

doi:10.1145/1273496.1273540


Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review


We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
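
The idea of irrevocable, one-pass decoding can be illustrated with a minimal sketch. It is not the authors' procedure; it assumes that a pairwise similarity between two emails is already available (positive meaning "same batch") and that each arriving email either joins the existing cluster with the largest summed similarity to its members or opens a new cluster, which scores 0. The names `sequential_cluster` and `cosine_minus_threshold`, the cosine similarity, and the 0.8 threshold are all hypothetical stand-ins for a learned similarity.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_minus_threshold(a: Vector, b: Vector, threshold: float = 0.8) -> float:
    """Hypothetical stand-in for a learned pairwise similarity (assumes non-zero vectors)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) - threshold


def sequential_cluster(items: Sequence[Vector], sim: Callable[[Vector, Vector], float]) -> List[int]:
    """Assign each streamed item to a cluster once; earlier assignments are never revised."""
    clusters: List[List[int]] = []  # cluster id -> indices of its member items
    labels: List[int] = []          # item index -> cluster id
    for i, x in enumerate(items):
        # Opening a new cluster scores 0; joining cluster c scores the summed
        # similarity between x and every item already assigned to c.
        best_cluster, best_score = -1, 0.0
        for c, members in enumerate(clusters):
            score = sum(sim(x, items[j]) for j in members)
            if score > best_score:
                best_cluster, best_score = c, score
        if best_cluster == -1:
            clusters.append([i])
            labels.append(len(clusters) - 1)
        else:
            clusters[best_cluster].append(i)
            labels.append(best_cluster)
    return labels


if __name__ == "__main__":
    # Toy feature vectors standing in for five emails from two templates.
    emails = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.95, 0.05], [1.0, 0.05, 0.0]]
    print(sequential_cluster(emails, cosine_minus_threshold))  # [0, 0, 1, 1, 0]
```

Because every earlier email is scored against a new arrival exactly once, the work per arrival in this sketch grows linearly with the number of emails seen so far, and no earlier decision is ever revisited.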

Original language	English
Title of host publication	Proceedings of the 24th international conference on Machine learning
Editors	Zoubin Ghahramani
Number of pages	8
Place of Publication	New York
Publisher	Association for Computing Machinery, Inc
Publication date	2007
Pages	345-352
ISBN (print)	978-1-59593-793-3
DOIs	https://doi.org/10.1145/1273496.1273540
Publication status	Published - 2007
Externally published	Yes
Event	ICML 2007: 24th International Conference on Machine Learning, Corvallis, OR, United States
Duration	20.06.2007 → 24.06.2007
Conference number	24
URL	https://dl.acm.org/doi/proceedings/10.1145/1273496

Research areas

Informatics
Business informatics

Other publications by the same author(s)

Interactive sequential generative models for team sports

Fassmeyer, D., Cordes, M. & Brefeld, U., 02.2025, In: Machine Learning. 114, 2, 15 p., 38.

Research output: Journal contributions › Journal articles › Research › peer-review

Joint Item Response Models for Manual and Automatic Scores on Open-Ended Test Items

Bengs, D., Brefeld, U., Kroehne, U. & Zehner, F., 01.09.2025, In: Psychometrika. 90, 4, p. 1346-1367 22 p.

Research output: Journal contributions › Journal articles › Research › peer-review

Machine Learning and Data Mining for Sports Analytics: 11th International Workshop, MLSA 2024, Vilnius, Lithuania, September 9, 2024, Revised Selected Papers

Brefeld, U. (Editor), Davis, J. (Editor), Van Haaren, J. (Editor) & Zimmermann, A. (Editor), 2025, Cham: Springer Verlag. 119 p. (Communications in Computer and Information Science; vol. 2460)

Research output: Books and anthologies › Conference proceedings › Research

Masked autoencoder for multiagent trajectories

Rudolph, Y. & Brefeld, U., 02.2025, In: Machine Learning. 114, 2, 18 p., 44.

Research output: Journal contributions › Journal articles › Research › peer-review

Self-improvement for Computerized Adaptive Testing

Rudolph, Y., Neubauer, K. & Brefeld, U., 2026, Machine Learning and Knowledge Discovery in Databases - Research Track: European Conference, ECML PKDD 2025, Porto, Portugal, September 15–19, 2025, Proceedings. Ribeiro, R. P., Jorge, A. M., Soares, C., Gama, J., Pfahringer, B., Japkowicz, N., Larrañaga, P. & Abreu, P. H. (eds.). Cham: Springer International Publishing, Vol. 2. p. 70-86 17 p. (Lecture Notes in Computer Science; vol. 16014 LNCS).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
