How Big Does Big Data Need to Be?

Martin Stange; Burkhardt Funk

doi:10.4018/978-1-5225-0293-7.ch001

Research output: Contributions to collected editions/works › Contributions to collected editions/anthologies › Research › peer-review

Authors

Professorship for Information Systems, in particular Data Science

Collecting and storing as much data as possible is common practice in many companies these days. To reduce the cost of collecting and storing irrelevant data, it is important to define which analytical questions are to be answered and how much data is needed to answer them. In this chapter, a process for determining an optimal sample size is proposed. Based on benefit/cost considerations, the authors show how to find the sample size that maximizes the utility of predictive analytics. Applying the proposed process to a case study shows that only a very small fraction of the available data set is needed to make accurate predictions.
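
The benefit/cost idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure. It assumes a hypothetical inverse power-law learning curve, a benefit proportional to accuracy, and a linear cost per sample. All parameter values (`a`, `b`, `c`, `value`, `cost_per_sample`) and the pool of 1,000,000 available records are made up for illustration.

```python
def accuracy(n, a=0.90, b=0.50, c=0.50):
    """Hypothetical learning curve: accuracy rises toward the plateau `a` as n grows."""
    return a - b * n ** (-c)


def utility(n, value=10_000.0, cost_per_sample=0.01):
    """Assumed utility: monetary benefit of accuracy minus a linear data cost."""
    return value * accuracy(n) - cost_per_sample * n


def optimal_sample_size(candidates):
    """Return the candidate sample size with the highest utility (grid search)."""
    return max(candidates, key=utility)


# Assumed setting: up to 1,000,000 records are available, checked in steps of 100.
available = 1_000_000
n_star = optimal_sample_size(range(100, available + 1, 100))

print(f"utility-maximizing sample size: {n_star}")
print(f"fraction of available data:     {n_star / available:.4%}")
```

Under these made-up parameters the utility peaks near 4,000 samples, well below one percent of the assumed pool, because the accuracy gain from each extra sample shrinks while its cost stays constant. In practice the learning-curve parameters would have to be estimated from models trained on increasing subsamples.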

Original language	English
Title of host publication	Enterprise Big Data Engineering, Analytics, and Management
Editors	Martin Atzmueller, Samia Oussena, Thomas Roth-Berghofer
Number of pages	12
Place of Publication	Hershey
Publisher	Business Science Reference
Publication date	06.2016
Pages	1-12
ISBN (print)	9781522502937
ISBN (electronic)	9781522502944
DOIs	https://doi.org/10.4018/978-1-5225-0293-7.ch001
Publication status	Published - 06.2016

Research areas

Business informatics - Big Data, Predictive Analytics, Learning Curve

Other publications by the same author(s)

Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT)

Zainal, N. H., Eckhardt, R., Rackoff, G. N., Fitzsimmons-Craft, E. E., Rojas-Ashe, E., Barr Taylor, C., Funk, B., Eisenberg, D., Wilfley, D. E. & Newman, M. G., 02.04.2025, In: Psychological Medicine. 55, e106.

Research output: Journal contributions › Journal articles › Research › peer-review

Construct relation extraction from scientific papers: Is it automatable yet?

Funk, B. & Scharfenberger, J., 07.01.2025, Proceedings of the 58th Hawaii International Conference on System Sciences, HICSS 2025. Bui, T. X. (ed.). Honolulu: University of Hawaii at Manoa, p. 4675-4684 10 p. (Hawaii International Conference on System Sciences (HICSS); vol. 2025).

Research output: Contributions to collected editions/works › Published abstract in conference proceedings › Research › peer-review

From Feedback to Formative Guidance: Leveraging LLMs for Personalized Support in Programming Projects

Ghoochani, F., Scharfenberger, J., Funk, B., Doublan, R., Jakharabhai Odedra, M. & Etsiwah, B., 12.06.2025, UMAP 2025 - Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization. Conati, C., Narducci, F., Rossiello, G., Musto, C. & Vassileva, J. (eds.). Association for Computing Machinery, Inc, p. 398-403 6 p. (UMAP 2025 - Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization).

Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review

The promise and challenges of computer mouse trajectories in DMHIs – A feasibility study on pre-treatment dropout predictions

Zantvoort, K., Matthiesen, J., Bjurner, P., Bendix, M., Brefeld, U., Funk, B. & Kaldo, V., 06.2025, In: Internet Interventions. 40, 7 p., 100828.

Research output: Journal contributions › Journal articles › Research › peer-review

A Universal Digital Stress Management Intervention for Employees: Randomized Controlled Trial with Health-Economic Evaluation

Freund, J., Smit, F., Lehr, D., Zarski, A. C., Berking, M., Riper, H., Funk, B., Ebert, D. D. & Buntrock, C., 22.10.2024, In: Journal of Medical Internet Research. 26, 13 p., e48481.

Research output: Journal contributions › Journal articles › Research › peer-review

