Constrained Independence for Detecting Interesting Patterns
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). ed. / Gabriella Pasi; James Kwok; Osmar Zaiane; Patrick Gallinari; Eric Gaussier; Longbing Cao. IEEE - Institute of Electrical and Electronics Engineers Inc., 2015. 7344897 (Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015).
RIS
TY - CHAP
T1 - Constrained Independence for Detecting Interesting Patterns
AU - Delacroix, Thomas
AU - Boubekki, Ahcène
AU - Lenca, Philippe
AU - Lallich, Stéphane
PY - 2015/12/2
Y1 - 2015/12/2
N2 - Among other criteria, a pattern may be interesting if it is not redundant with other discovered patterns. A general approach to determining redundancy is to consider a probabilistic model for the frequencies of patterns, based on those of patterns already mined, and to compare observed frequencies to the model. Such probabilistic models include the independence model, partition models, and more complex models that are approached via randomization for lack of an adequate tool in probability theory allowing a direct approach. We define constrained independence, a generalization of the notion of independence. This tool allows us to describe probabilistic models for evaluating redundancy in frequent itemset mining. We provide algorithms, integrated within the mining process, for determining non-redundant itemsets. Through experiments, we show that the models used reveal high rates of redundancy among frequent itemsets, and we extract the most interesting ones.
KW - Informatics
KW - Mathematics
KW - Business informatics
UR - http://www.scopus.com/inward/record.url?scp=84962853098&partnerID=8YFLogxK
U2 - 10.1109/DSAA.2015.7344897
DO - 10.1109/DSAA.2015.7344897
M3 - Article in conference proceedings
T3 - Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
BT - 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
A2 - Pasi, Gabriella
A2 - Kwok, James
A2 - Zaiane, Osmar
A2 - Gallinari, Patrick
A2 - Gaussier, Eric
A2 - Cao, Longbing
PB - IEEE - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Conference on Data Science and Advanced Analytics - DSAA 2015
Y2 - 19 October 2015 through 21 October 2015
ER -