Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions

Research output: Journal contributions, Journal articles, Research, peer-review

Authors

  • Kirsten Zantvoort
  • Barbara Nacke
  • Dennis Görlich
  • Silvan Hornstein
  • Corinna Jacobi
  • Burkhardt Funk

Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and lack of robust methods raise concerns about result generalizability. To provide insights on minimal necessary data set sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, we propose minimum dataset sizes of N = 500–1000. As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.
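The finding on uninformative feature groups can be illustrated with a small simulation. The sketch below is not the authors' pipeline and does not use the study data: it assumes synthetic Gaussian-noise features, alternating labels, and a nearest-centroid classifier chosen only because it fits in a few lines of standard-library Python. The function names, the feature count of 20, the holdout size of 1000 and the number of repeats are all illustrative assumptions; only the largest training size (3654) echoes the abstract.

```python
import random


def make_noise_data(n, n_features, rng):
    """Pure-noise features with alternating labels: nothing is learnable."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def fit_centroids(X, y):
    """Return the mean feature vector of each class (0 and 1)."""
    n_features = len(X[0])
    sums = {0: [0.0] * n_features, 1: [0.0] * n_features}
    counts = {0: 0, 1: 0}
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return 0 if sqdist(0) <= sqdist(1) else 1


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def learning_curve(sizes, n_features=20, n_test=1000, repeats=5, seed=0):
    """Average in-sample and holdout accuracy for each training size."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        train_scores, test_scores = [], []
        for _ in range(repeats):
            X_train, y_train = make_noise_data(n, n_features, rng)
            X_test, y_test = make_noise_data(n_test, n_features, rng)
            centroids = fit_centroids(X_train, y_train)
            train_scores.append(accuracy(centroids, X_train, y_train))
            test_scores.append(accuracy(centroids, X_test, y_test))
        curve[n] = {
            "train": sum(train_scores) / repeats,
            "test": sum(test_scores) / repeats,
        }
    return curve


# Hypothetical grid of training sizes; only 3654 is taken from the abstract.
curve = learning_curve([100, 300, 500, 1000, 3654])
for n, scores in curve.items():
    print(f"N={n:5d}  in-sample={scores['train']:.3f}  holdout={scores['test']:.3f}")
```

Under these assumptions, in-sample accuracy should sit well above chance for the smallest training sets and shrink towards 0.5 as N grows, while holdout accuracy stays near chance throughout. That gap is the kind of overestimation the abstract attributes to small datasets; the exact numbers depend on the seed and the assumed feature count.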

Original language: English
Article number: 361
Journal: npj Digital Medicine
Volume: 7
Issue number: 1
Number of pages: 10
ISSN: 2398-6352
DOIs
Publication status: Published - 12.2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
