Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions

Kirsten Zantvoort; Nils Hentati Isacsson; Burkhardt Funk; Viktor Kaldo

doi:10.1177/20552076241248920

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

Standard

Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions. / Zantvoort, Kirsten; Hentati Isacsson, Nils; Funk, Burkhardt et al.
In: Digital Health, Vol. 10, 15.05.2024.

Bibtex

@article{97f4128813864120883073b59b35bce7,

title = "Dataset size versus homogeneity: A machine learning study on pooling intervention data in e-mental health dropout predictions",

abstract = "Objective: This study proposes a way of increasing dataset sizes for machine learning tasks in Internet-based Cognitive Behavioral Therapy through pooling interventions. To this end, it (1) examines similarities in user behavior and symptom data among online interventions for patients with depression, social anxiety, and panic disorder and (2) explores whether these similarities suffice to allow for pooling the data together, resulting in more training data when predicting intervention dropout. Methods: A total of 6418 routine care patients from the Internet Psychiatry in Stockholm are analyzed using (1) clustering and (2) dropout prediction models. For the latter, prediction models trained on each individual intervention's data are compared to those trained on all three interventions pooled into one dataset. To investigate if results vary with dataset size, the prediction is repeated using small and medium dataset sizes. Results: The clustering analysis identified three distinct groups that are almost equally spread across interventions and are instead characterized by different activity levels. In eight out of nine settings investigated, pooling the data improves prediction results compared to models trained on a single intervention dataset. It is further confirmed that models trained on small datasets are more likely to overestimate prediction results. Conclusion: The study reveals similar patterns of patients with depression, social anxiety, and panic disorder regarding online activity and intervention dropout. As such, this work offers pooling different interventions{\textquoteright} data as a possible approach to counter the problem of small dataset sizes in psychological research.",

keywords = "dropout, e-mental health, ICBT, machine learning, prediction, Informatics, Business informatics",

author = "Kirsten Zantvoort and {Hentati Isacsson}, Nils and Burkhardt Funk and Viktor Kaldo",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

month = may,

day = "15",

doi = "10.1177/20552076241248920",

language = "English",

volume = "10",

journal = "Digital Health",

issn = "2055-2076",

publisher = "SAGE Publications Inc.",

}

RIS

TY - JOUR

T1 - Dataset size versus homogeneity

T2 - A machine learning study on pooling intervention data in e-mental health dropout predictions

AU - Zantvoort, Kirsten

AU - Hentati Isacsson, Nils

AU - Funk, Burkhardt

AU - Kaldo, Viktor

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/5/15

Y1 - 2024/5/15

N2 - Objective: This study proposes a way of increasing dataset sizes for machine learning tasks in Internet-based Cognitive Behavioral Therapy through pooling interventions. To this end, it (1) examines similarities in user behavior and symptom data among online interventions for patients with depression, social anxiety, and panic disorder and (2) explores whether these similarities suffice to allow for pooling the data together, resulting in more training data when predicting intervention dropout. Methods: A total of 6418 routine care patients from the Internet Psychiatry in Stockholm are analyzed using (1) clustering and (2) dropout prediction models. For the latter, prediction models trained on each individual intervention's data are compared to those trained on all three interventions pooled into one dataset. To investigate if results vary with dataset size, the prediction is repeated using small and medium dataset sizes. Results: The clustering analysis identified three distinct groups that are almost equally spread across interventions and are instead characterized by different activity levels. In eight out of nine settings investigated, pooling the data improves prediction results compared to models trained on a single intervention dataset. It is further confirmed that models trained on small datasets are more likely to overestimate prediction results. Conclusion: The study reveals similar patterns of patients with depression, social anxiety, and panic disorder regarding online activity and intervention dropout. As such, this work offers pooling different interventions’ data as a possible approach to counter the problem of small dataset sizes in psychological research.

AB - Objective: This study proposes a way of increasing dataset sizes for machine learning tasks in Internet-based Cognitive Behavioral Therapy through pooling interventions. To this end, it (1) examines similarities in user behavior and symptom data among online interventions for patients with depression, social anxiety, and panic disorder and (2) explores whether these similarities suffice to allow for pooling the data together, resulting in more training data when predicting intervention dropout. Methods: A total of 6418 routine care patients from the Internet Psychiatry in Stockholm are analyzed using (1) clustering and (2) dropout prediction models. For the latter, prediction models trained on each individual intervention's data are compared to those trained on all three interventions pooled into one dataset. To investigate if results vary with dataset size, the prediction is repeated using small and medium dataset sizes. Results: The clustering analysis identified three distinct groups that are almost equally spread across interventions and are instead characterized by different activity levels. In eight out of nine settings investigated, pooling the data improves prediction results compared to models trained on a single intervention dataset. It is further confirmed that models trained on small datasets are more likely to overestimate prediction results. Conclusion: The study reveals similar patterns of patients with depression, social anxiety, and panic disorder regarding online activity and intervention dropout. As such, this work offers pooling different interventions’ data as a possible approach to counter the problem of small dataset sizes in psychological research.

KW - dropout

KW - e-mental health

KW - ICBT

KW - machine learning

KW - prediction

KW - Informatics

KW - Business informatics

UR - http://www.scopus.com/inward/record.url?scp=85193326208&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/49a70e86-edf7-383b-9bfe-25b73aec3f8f/

U2 - 10.1177/20552076241248920

DO - 10.1177/20552076241248920

M3 - Journal articles

C2 - 38757087

AN - SCOPUS:85193326208

VL - 10

JO - Digital Health

JF - Digital Health

SN - 2055-2076

ER -
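
The comparison described in the abstract (models trained on one intervention's data versus models trained on all three interventions pooled) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are synthetic, the two features (`activity`, `symptoms`) and the per-intervention symptom shifts are hypothetical, and plain gradient-descent logistic regression stands in for whatever models the study used.

```python
import math
import random

# Hypothetical per-intervention offsets in the symptom feature (illustration only).
INTERVENTIONS = {"depression": 0.0, "social_anxiety": 0.3, "panic_disorder": -0.3}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_patients(rng, n, symptom_shift):
    """Generate n synthetic patients as ((activity, symptoms), dropped_out)."""
    rows = []
    for _ in range(n):
        activity = rng.gauss(0.0, 1.0)            # standardized activity level
        symptoms = rng.gauss(symptom_shift, 1.0)  # standardized symptom score
        # Assumed ground truth: low activity and high symptoms raise dropout risk.
        p_dropout = sigmoid(-1.5 * activity + 0.5 * symptoms)
        rows.append(((activity, symptoms), 1 if rng.random() < p_dropout else 0))
    return rows


def fit_logistic(rows, epochs=200, lr=0.1):
    """Fit a two-feature logistic regression with batch gradient descent."""
    w0, w1, b = 0.0, 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in rows:
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return w0, w1, b


def accuracy(model, rows):
    """Share of patients whose dropout label matches the thresholded prediction."""
    w0, w1, b = model
    hits = sum((1 if w0 * x0 + w1 * x1 + b > 0 else 0) == y for (x0, x1), y in rows)
    return hits / len(rows)


def compare_pooling(seed=0, n_train=40, n_test=500):
    """Return test accuracy per intervention for single-dataset and pooled training."""
    rng = random.Random(seed)
    train = {k: make_patients(rng, n_train, s) for k, s in INTERVENTIONS.items()}
    test = {k: make_patients(rng, n_test, s) for k, s in INTERVENTIONS.items()}
    pooled_model = fit_logistic([row for rows in train.values() for row in rows])
    return {
        name: {
            "single": accuracy(fit_logistic(train[name]), test[name]),
            "pooled": accuracy(pooled_model, test[name]),
        }
        for name in INTERVENTIONS
    }


if __name__ == "__main__":
    for name, scores in compare_pooling().items():
        print(name, scores)
```

Varying `n_train` mimics the abstract's small and medium dataset settings. Because the data are synthetic, the printed accuracies say nothing about the study's reported results.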

In the same journal

Digital health literacy and well-being among university students: Mediating roles of fear of COVID-19, information satisfaction, and internet information search

Chen, S.-C., Hong Nguyen, N. T., Lin, C.-Y., Huy, L. D., Lai, C.-F., Dang, L. T., Truong, N. L. T., Hoang, N. Y., Nguyen, T. T. P., Phaṇ, T. N., Dadaczynski, K., Okan, O. & Duong, T. V., 06.2023, In: Digital Health. 9, p. 1-10 10 p.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

Further publications by these authors

Capitalizing on natural language processing (NLP) to automate the evaluation of coach implementation fidelity in guided digital cognitive-behavioral therapy (GdCBT)

Zainal, N. H., Eckhardt, R., Rackoff, G. N., Fitzsimmons-Craft, E. E., Rojas-Ashe, E., Barr Taylor, C., Funk, B., Eisenberg, D., Wilfley, D. E. & Newman, M. G., 02.04.2025, in: Psychological Medicine. 55, e106.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

Construct relation extraction from scientific papers: Is it automatable yet?

Funk, B. & Scharfenberger, J., 07.01.2025, Proceedings of the 58th Hawaii International Conference on System Sciences, HICSS 2025. Bui, T. X. (ed.). Honolulu: University of Hawaii at Manoa, p. 4675-4684 10 p. (Hawaii International Conference on System Sciences (HICSS); Vol. 2025).

Publication: Contributions to collected editions › Abstracts in conference proceedings › Research › peer-reviewed

From Feedback to Formative Guidance: Leveraging LLMs for Personalized Support in Programming Projects

Ghoochani, F., Scharfenberger, J., Funk, B., Doublan, R., Jakharabhai Odedra, M. & Etsiwah, B., 12.06.2025, UMAP 2025 - Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization. Conati, C., Narducci, F., Rossiello, G., Musto, C. & Vassileva, J. (eds.). Association for Computing Machinery, Inc, p. 398-403 6 p. (UMAP 2025 - Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization).

Publication: Contributions to collected editions › Articles in conference proceedings › Research › peer-reviewed

The promise and challenges of computer mouse trajectories in DMHIs – A feasibility study on pre-treatment dropout predictions

Zantvoort, K., Matthiesen, J., Bjurner, P., Bendix, M., Brefeld, U., Funk, B. & Kaldo, V., 06.2025, In: Internet Interventions. 40, 7 p., 100828.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

A Universal Digital Stress Management Intervention for Employees: Randomized Controlled Trial with Health-Economic Evaluation

Freund, J., Smit, F., Lehr, D., Zarski, A. C., Berking, M., Riper, H., Funk, B., Ebert, D. D. & Buntrock, C., 22.10.2024, In: Journal of Medical Internet Research. 26, 13 p., e48481.

Publication: Contributions to journals › Journal articles › Research › peer-reviewed

DOI

https://doi.org/10.1177/20552076241248920
Final published version
