Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions

Research output: Journal contributions > Journal articles > Research > peer-review

Authors

  • Kirsten Zantvoort
  • Barbara Nacke
  • Dennis Görlich
  • Silvan Hornstein
  • Corinna Jacobi
  • Burkhardt Funk

Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and lack of robust methods raise concerns about result generalizability. To provide insights on minimal necessary data set sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, we propose minimum dataset sizes of N = 500–1000. As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.
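The learning-curve idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes purely synthetic noise features with balanced labels that carry no signal, and it swaps in a simple nearest-centroid classifier in place of the algorithms compared in the study. All function names and parameters (`make_noise_data`, `learning_curve`, `n_features`, `repeats`) are hypothetical. Under these assumptions, in-sample accuracy should sit clearly above chance for small N and shrink toward 0.5 as N grows, while holdout accuracy stays near chance.

```python
import random
import statistics


def make_noise_data(n, n_features, rng):
    """Synthetic data (assumption): Gaussian noise features, balanced labels.

    Features are drawn independently of the labels, so they are uninformative.
    """
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid classifier: store one mean feature vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    """Fraction of rows in X whose predicted label matches y."""
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def learning_curve(sizes, n_features=20, n_test=1000, repeats=20, seed=0):
    """Mean in-sample and holdout accuracy for each training set size."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        train_scores, test_scores = [], []
        for _ in range(repeats):
            X_tr, y_tr = make_noise_data(n, n_features, rng)
            X_te, y_te = make_noise_data(n_test, n_features, rng)
            model = fit_centroids(X_tr, y_tr)
            train_scores.append(accuracy(model, X_tr, y_tr))
            test_scores.append(accuracy(model, X_te, y_te))
        curve.append(
            (n, statistics.fmean(train_scores), statistics.fmean(test_scores))
        )
    return curve


if __name__ == "__main__":
    for n, train_acc, test_acc in learning_curve([100, 300, 500, 1000]):
        print(f"N={n:5d}  in-sample={train_acc:.3f}  holdout={test_acc:.3f}")
```

The gap between the in-sample and holdout columns is the overfitting that the abstract warns about. On real intervention data the holdout curve would additionally show where performance converges, which this noise-only sketch cannot reproduce.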

Original language: English
Article number: 361
Journal: npj Digital Medicine
Volume: 7
Issue number: 1
Number of pages: 10
Publication status: Published - 12.2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
