Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions

Research output: Journal contributions > Journal articles > Research > peer-review

Authors

  • Kirsten Zantvoort
  • Barbara Nacke
  • Dennis Görlich
  • Silvan Hornstein
  • Corinna Jacobi
  • Burkhardt Funk

Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and lack of robust methods raise concerns about result generalizability. To provide insights on minimal necessary data set sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, we propose minimum dataset sizes of N = 500–1000. As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.
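The abstract's observation that, for uninformative features, in-sample performance falls as the dataset grows can be illustrated with a small learning-curve simulation. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the data are purely synthetic Gaussian noise with coin-flip labels, a nearest-centroid classifier stands in for the algorithms compared in the study, and all function names and parameters (`make_noise_data`, `fit_centroids`, `learning_curve`, `n_features=20`, and so on) are hypothetical.

```python
import random


def make_noise_data(n, n_features, rng):
    """Synthetic 'users': Gaussian noise features and coin-flip labels.

    Assumption for illustration only: the features carry no information
    about the label, mimicking an uninformative feature group.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if rows:  # skip a class that happens to be absent
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def learning_curve(sizes, n_features=20, n_test=500, repeats=10, seed=0):
    """Average in-sample and holdout accuracy for each training size.

    Returns a list of (n, mean_train_accuracy, mean_holdout_accuracy).
    """
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        train_acc = 0.0
        test_acc = 0.0
        for _ in range(repeats):
            X, y = make_noise_data(n, n_features, rng)
            X_test, y_test = make_noise_data(n_test, n_features, rng)
            model = fit_centroids(X, y)
            train_acc += accuracy(model, X, y)
            test_acc += accuracy(model, X_test, y_test)
        curve.append((n, train_acc / repeats, test_acc / repeats))
    return curve


if __name__ == "__main__":
    for n, train_acc, test_acc in learning_curve([100, 300, 1000]):
        print(f"N={n:5d}  in-sample={train_acc:.3f}  holdout={test_acc:.3f}")
```

Because the labels are independent of the features, holdout accuracy should stay near chance at every N, while in-sample accuracy is inflated at small N and shrinks toward chance as N grows. The gap between the two columns is the overfitting that the abstract warns about for small datasets.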

Original language: English
Article number: 361
Journal: npj Digital Medicine
Volume: 7
Issue number: 1
Number of pages: 10
Publication status: Published - 12.2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
