Estimation of minimal data sets sizes for machine learning predictions in digital mental health interventions

Publication: Contributions to journals › Journal articles › Research › Peer-reviewed

Authors

  • Kirsten Zantvoort
  • Barbara Nacke
  • Dennis Görlich
  • Silvan Hornstein
  • Corinna Jacobi
  • Burkhardt Funk

Artificial intelligence promises to revolutionize mental health care, but small dataset sizes and a lack of robust methods raise concerns about the generalizability of results. To provide insights on minimal necessary dataset sizes, we explore domain-specific learning curves for digital intervention dropout predictions based on 3654 users from a single study (ISRCTN13716228, 26/02/2016). Prediction performance is analyzed based on dataset size (N = 100–3654), feature groups (F = 2–129), and algorithm choice (from Naive Bayes to Neural Networks). The results substantiate the concern that small datasets (N ≤ 300) overestimate predictive power. For uninformative feature groups, in-sample prediction performance was negatively correlated with dataset size. Sophisticated models overfitted in small datasets but maximized holdout test results in larger datasets. While N = 500 mitigated overfitting, performance did not converge until N = 750–1500. Consequently, we propose minimum dataset sizes of N = 500–1000. As such, this study offers an empirical reference for researchers designing or interpreting AI studies on Digital Mental Health Intervention data.
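The abstract's point that in-sample performance on uninformative features shrinks as N grows can be illustrated with a toy learning curve. The sketch below is not the authors' pipeline and uses none of their data. It assumes synthetic, deliberately uninformative binary features with random labels, and it uses a hand-written Bernoulli Naive Bayes as a simple stand-in for the models compared in the study. The feature count (20), the holdout size (1000), the repeat count, and all function names are hypothetical choices made for illustration. Under these assumptions, in-sample accuracy should sit noticeably above the roughly 0.5 holdout accuracy at small N and move toward it as N grows.

```python
import math
import random
from statistics import mean


def make_uninformative_data(n, n_features, rng):
    """Hypothetical synthetic data: random binary features and random binary
    labels, so there is no true signal to learn."""
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing.

    Returns, per class, the log prior and the logs of P(x_j = 1 | class)
    and P(x_j = 0 | class) for every feature j.
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        # Smoothing keeps every probability strictly between 0 and 1.
        p1 = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
              for j in range(n_features)]
        model[c] = (math.log(prior),
                    [math.log(p) for p in p1],
                    [math.log(1.0 - p) for p in p1])
    return model


def predict(model, x):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c, (log_prior, log_p1, log_p0) in model.items():
        scores[c] = log_prior + sum(
            lp1 if v else lp0 for v, lp1, lp0 in zip(x, log_p1, log_p0))
    return max(scores, key=scores.get)


def accuracy(model, X, y):
    """Fraction of rows whose predicted class matches the label."""
    return sum(predict(model, x) == label for x, label in zip(X, y)) / len(y)


def learning_curve(sizes, n_features=20, n_test=1000, repeats=10, seed=0):
    """For each training-set size N, average in-sample and holdout accuracy
    over several independently drawn training sets."""
    rng = random.Random(seed)
    X_test, y_test = make_uninformative_data(n_test, n_features, rng)
    curve = {}
    for n in sizes:
        train_acc, test_acc = [], []
        for _ in range(repeats):
            X, y = make_uninformative_data(n, n_features, rng)
            model = fit_bernoulli_nb(X, y)
            train_acc.append(accuracy(model, X, y))            # in-sample
            test_acc.append(accuracy(model, X_test, y_test))   # holdout
        curve[n] = (mean(train_acc), mean(test_acc))
    return curve


if __name__ == "__main__":
    for n, (train, test) in learning_curve((100, 300, 1000, 3000)).items():
        print(f"N={n:5d}  in-sample={train:.3f}  holdout={test:.3f}")
```

Because the labels are pure noise, any in-sample accuracy above 0.5 is overfitting. Running the script prints one line per N, and comparing the in-sample and holdout columns shows how the gap narrows with more data. A real analysis would swap in actual features, the study's algorithms, and proper cross-validation.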

Original language: English
Article number: 361
Journal: npj Digital Medicine
Volume: 7
Issue: 1
Number of pages: 10
Publication status: Published, December 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.
