How Much Tracking Is Necessary? - The Learning Curve in Bayesian User Journey Analysis
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
Stange, M., & Funk, B. (2015). How Much Tracking Is Necessary? - The Learning Curve in Bayesian User Journey Analysis. In Proceedings of the Twenty-Third European Conference on Information Systems. AIS eLibrary. https://doi.org/10.18151/7217484
RIS
TY - CHAP
T1 - How Much Tracking Is Necessary? - The Learning Curve in Bayesian User Journey Analysis
AU - Stange, Martin
AU - Funk, Burkhardt
N1 - Conference code: 23
PY - 2015/5/29
Y1 - 2015/5/29
N2 - Extracting value from big data is one of today’s business challenges. In online marketing, for instance, advertisers use high-volume clickstream data to increase the efficiency of their campaigns. To avoid collecting, storing, and processing irrelevant data, it is crucial to determine how much data to analyze to achieve acceptable model performance. We propose a general procedure that employs the learning curve sampling method to determine the optimal sample size with respect to cost/benefit considerations. Applied in two case studies, we model the users' click behavior based on clickstream data and offline channel data. We observe saturation effects of the predictive accuracy as the sample size is increased and thus demonstrate that advertisers only have to analyze a very small subset of the full dataset to obtain acceptable predictive accuracy and to optimize profits from advertising activities. In both case studies we observe that a random intercept logistic model outperforms a non-hierarchical model in terms of predictive accuracy. Given the high infrastructure costs and the users' growing awareness of tracking activities, our results have managerial implications for companies in the online marketing field.
AB - Extracting value from big data is one of today’s business challenges. In online marketing, for instance, advertisers use high-volume clickstream data to increase the efficiency of their campaigns. To avoid collecting, storing, and processing irrelevant data, it is crucial to determine how much data to analyze to achieve acceptable model performance. We propose a general procedure that employs the learning curve sampling method to determine the optimal sample size with respect to cost/benefit considerations. Applied in two case studies, we model the users' click behavior based on clickstream data and offline channel data. We observe saturation effects of the predictive accuracy as the sample size is increased and thus demonstrate that advertisers only have to analyze a very small subset of the full dataset to obtain acceptable predictive accuracy and to optimize profits from advertising activities. In both case studies we observe that a random intercept logistic model outperforms a non-hierarchical model in terms of predictive accuracy. Given the high infrastructure costs and the users' growing awareness of tracking activities, our results have managerial implications for companies in the online marketing field.
KW - Business informatics
KW - Big Data
KW - Online Marketing
KW - User Journey Analysis
KW - Learning Curve
KW - Bayesian Models
U2 - 10.18151/7217484
DO - 10.18151/7217484
M3 - Article in conference proceedings
SN - 978-3-00-050284-2
BT - Proceedings of the Twenty-Third European Conference on Information Systems
PB - AIS eLibrary
T2 - 23rd European Conference on Information Systems - ECIS 2015
Y2 - 26 May 2015 through 29 May 2015
ER -