Supervised clustering of streaming data for email batch detection

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

Standard

Supervised clustering of streaming data for email batch detection. / Haider, Peter; Brefeld, Ulf; Scheffer, Tobias.

Proceedings of the 24th international conference on Machine learning. ed. / Zoubin Ghahramani. New York : Association for Computing Machinery, Inc, 2007. p. 345-352.

Harvard

Haider, P, Brefeld, U & Scheffer, T 2007, Supervised clustering of streaming data for email batch detection. in Z Ghahramani (ed.), Proceedings of the 24th international conference on Machine learning. Association for Computing Machinery, Inc, New York, pp. 345-352, Proceedings of the 24th international conference on Machine learning - ICML 2007, Corvallis, Oregon, United States, 20.06.07. https://doi.org/10.1145/1273496.1273540

APA

Haider, P., Brefeld, U., & Scheffer, T. (2007). Supervised clustering of streaming data for email batch detection. In Z. Ghahramani (Ed.), Proceedings of the 24th international conference on Machine learning (pp. 345-352). Association for Computing Machinery, Inc. https://doi.org/10.1145/1273496.1273540

Vancouver

Haider P, Brefeld U, Scheffer T. Supervised clustering of streaming data for email batch detection. In Ghahramani Z, editor, Proceedings of the 24th international conference on Machine learning. New York: Association for Computing Machinery, Inc. 2007. p. 345-352. doi: 10.1145/1273496.1273540

Bibtex

@inproceedings{c18a5632af0d4bb78aadc3a531618a5c,
title = "Supervised clustering of streaming data for email batch detection",
abstract = "We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made -- owing to the streaming nature of the data -- then the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.",
keywords = "Informatics, Business informatics",
author = "Peter Haider and Ulf Brefeld and Tobias Scheffer",
year = "2007",
doi = "10.1145/1273496.1273540",
language = "English",
isbn = "978-1-59593-793-3",
pages = "345--352",
editor = "Zoubin Ghahramani",
booktitle = "Proceedings of the 24th international conference on Machine learning",
publisher = "Association for Computing Machinery, Inc",
address = "United States",
note = "Proceedings of the 24th international conference on Machine learning - ICML 2007 ; Conference date: 20-06-2007 Through 24-06-2007",
url = "https://dl.acm.org/doi/proceedings/10.1145/1273496"
}

RIS

TY  - CHAP
T1  - Supervised clustering of streaming data for email batch detection
AU  - Haider, Peter
AU  - Brefeld, Ulf
AU  - Scheffer, Tobias
N1  - Conference code: 24
PY  - 2007
Y1  - 2007
N2  - We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made -- owing to the streaming nature of the data -- then the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
AB  - We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made -- owing to the streaming nature of the data -- then the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.
KW  - Informatics
KW  - Business informatics
UR  - http://www.scopus.com/inward/record.url?scp=34547983265&partnerID=8YFLogxK
U2  - 10.1145/1273496.1273540
DO  - 10.1145/1273496.1273540
M3  - Article in conference proceedings
AN  - SCOPUS:34547983265
SN  - 978-1-59593-793-3
SP  - 345
EP  - 352
BT  - Proceedings of the 24th international conference on Machine learning
A2  - Ghahramani, Zoubin
PB  - Association for Computing Machinery, Inc
CY  - New York
T2  - Proceedings of the 24th international conference on Machine learning - ICML 2007
Y2  - 20 June 2007 through 24 June 2007
ER  -
