How Big Does Big Data Need to Be?

Research output: Contributions to collected editions/works; Contributions to collected editions/anthologies; Research; peer-review

Collecting and storing as much data as possible is common practice in many companies these days. To reduce the cost of collecting and storing data that is not relevant, it is important to define which analytical questions are to be answered and how much data is needed to answer them. In this chapter, a process for defining an optimal sample size is proposed. Based on benefit/cost considerations, the authors show how to find the sample size that maximizes the utility of predictive analytics. Applying the proposed process to a case study shows that only a very small fraction of the available data set is needed to make accurate predictions.
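
The benefit/cost idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the chapter's actual model, so everything below is an assumption made for illustration: the power-law learning curve in `accuracy`, the linear cost per record, and all numeric parameters are hypothetical and not taken from the chapter.

```python
# Illustrative sketch only: the learning curve, cost model, and all numbers
# are hypothetical assumptions, not the method or data from the chapter.

def accuracy(n, a_max=0.95, b=0.6, c=0.5):
    """Assumed power-law learning curve: accuracy grows with the sample
    size n and saturates at a_max."""
    return a_max - b * n ** (-c)

def utility(n, value_per_accuracy=100_000.0, cost_per_record=0.01):
    """Assumed utility: monetary benefit of the achieved accuracy minus the
    cost of collecting and storing n records."""
    return value_per_accuracy * accuracy(n) - cost_per_record * n

def best_sample_size(candidates):
    """Return the candidate sample size with the highest utility."""
    return max(candidates, key=utility)

# Candidate sample sizes from 10 to 10 million records.
candidates = [10 ** k for k in range(1, 8)]
best = best_sample_size(candidates)
print(f"best sample size among candidates: {best}")
```

Under these assumed parameters, utility rises while additional records still improve accuracy noticeably and falls once the cost of more data outweighs the shrinking accuracy gain, which is the kind of trade-off the abstract describes.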
Original language: English
Title of host publication: Enterprise Big Data Engineering, Analytics, and Management
Editors: Martin Atzmueller, Samia Oussena, Thomas Roth-Berghofer
Number of pages: 12
Place of Publication: Hershey
Publisher: Business Science Reference
Publication date: 06.2016
Pages: 1-12
ISBN (print): 9781522502937
ISBN (electronic): 9781522502944
Publication status: Published - 06.2016