Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions

Publication: Contributions to journals > Journal articles > Research > Peer-reviewed

Authors

  • Kirsten Zantvoort
  • Barbara Nacke
  • Dennis Görlich
  • Silvan Hornstein
  • Corinna Jacobi
  • Burkhardt Funk

Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and lack of robust methods raise concerns about result generalizability. To provide insights on minimal necessary data set sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, we propose minimum dataset sizes of N = 500–1000. As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.
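The finding that in-sample performance on uninformative features falls as the dataset grows can be illustrated with a small synthetic learning curve. The sketch below is not the authors' pipeline and uses none of the study data. It assumes random binary features, labels independent of those features, and a toy "memorizer" model that predicts the majority label per feature pattern. All function names (`make_data`, `fit_memorizer`, `learning_curve`) and the default of 8 features and 1000 holdout samples are hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict


def make_data(n, n_features, rng):
    """Synthetic data: random binary features, labels independent of them."""
    X = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def fit_memorizer(X, y):
    """Toy overfit-prone model: majority label per observed feature pattern."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[row][label] += 1
    table = {row: counts.most_common(1)[0][0] for row, counts in votes.items()}
    default = Counter(y).most_common(1)[0][0]  # fallback for unseen patterns
    return table, default


def accuracy(model, X, y):
    """Share of rows whose predicted label matches the true label."""
    table, default = model
    hits = sum(table.get(row, default) == label for row, label in zip(X, y))
    return hits / len(y)


def learning_curve(sizes, n_features=8, n_test=1000, seed=0):
    """Return (N, in-sample accuracy, holdout accuracy) for each training size."""
    rng = random.Random(seed)
    X_test, y_test = make_data(n_test, n_features, rng)
    curve = []
    for n in sizes:
        X_train, y_train = make_data(n, n_features, rng)
        model = fit_memorizer(X_train, y_train)
        curve.append(
            (n, accuracy(model, X_train, y_train), accuracy(model, X_test, y_test))
        )
    return curve


if __name__ == "__main__":
    for n, in_sample, holdout in learning_curve([100, 300, 500, 1000, 3654]):
        print(f"N={n:5d}  in-sample={in_sample:.2f}  holdout={holdout:.2f}")
```

Because the labels carry no signal, holdout accuracy should stay near chance at every size, while in-sample accuracy starts high for small N and declines as more samples share each feature pattern. The training sizes in the usage example only mirror the range reported in the abstract (N = 100–3654); the numbers printed say nothing about the study's actual results.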

Original language: English
Article number: 361
Journal: npj Digital Medicine
Volume: 7
Issue number: 1
Number of pages: 10
Publication status: Published - 12.2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
