Discussion on the validity of commonly used reliability indices in sports medicine and exercise science: a critical review with data simulations

Publication: Journal articles › Review articles › Research


BibTeX

@article{f6fb752497c64375beceb7ee2decf727,
title = "Discussion on the validity of commonly used reliability indices in sports medicine and exercise science: a critical review with data simulations",
abstract = "Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (rp) and intraclass correlation coefficients (ICC). On those, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated in addition to the variability coefficient (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics in sports science on simulated data which extended the sample size of two original counter-movement-jump sessions from (youth) elite basketball players. These show that excellent rp and ICC (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can cause misleading conclusions of data. While a simple re-organization of data caused an improvement in relative reliability and reduced limits of agreement meaningfully, systematic errors occurred. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) to quantify the primary and secondary variance sources. After revealing several caveats in the literature (e.g., neglecting of the systematic and random error or not distinguishing between protocol and device reliability), we suggest a methodological approach to provide reliable data collections as a precondition for valid conclusions by, e.g., recommending pre-set acceptable measurement errors.",
keywords = "Accuracy, Practical relevance, Precision, Random errors, Systematic errors, Physical education and sports",
author = "Konstantin Warneke and Thomas Gronwald and Sebastian Wallot and Alessia Magno and Martin Hillebrecht and Klaus Wirth",
year = "2025",
month = feb,
day = "13",
doi = "10.1007/s00421-025-05720-6",
language = "English",
journal = "European Journal of Applied Physiology",
issn = "1439-6319",
publisher = "Springer Science and Business Media Deutschland GmbH",
}
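The abstract above reports that an excellent ICC (≥ 0.9) can coexist with a mean absolute percentage error above 20%, and that SEM and MDC are derived from the ICC. The following is a minimal Python sketch of those standard calculations on synthetic test-retest data; all numbers (sample size, jump heights, error magnitudes) are illustrative assumptions, not the paper's simulated data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical test-retest data: a latent "true" jump height (cm)
# plus independent random measurement error in each session.
true_score = rng.normal(35.0, 5.0, size=n)
session1 = true_score + rng.normal(0.0, 2.0, size=n)
session2 = true_score + rng.normal(0.0, 2.0, size=n)

def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement."""
    data = np.stack([x, y], axis=1)
    n_subj, k = data.shape
    grand = data.mean()
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * ((data.mean(axis=1) - grand) ** 2).sum() / (n_subj - 1)
    ms_cols = n_subj * ((data.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    ss_err = ((data - grand) ** 2).sum() - (n_subj - 1) * ms_rows - (k - 1) * ms_cols
    ms_err = ss_err / ((n_subj - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n_subj
    )

icc = icc_2_1(session1, session2)

# SEM and MDC95 as conventionally derived from the ICC.
sd_pooled = np.concatenate([session1, session2]).std(ddof=1)
sem = sd_pooled * np.sqrt(1.0 - icc)
mdc95 = sem * 1.96 * np.sqrt(2.0)

# Mean absolute percentage error between the two sessions.
mape = np.mean(np.abs(session2 - session1) / np.abs(session1)) * 100.0

print(f"ICC(2,1)={icc:.3f}  SEM={sem:.2f} cm  MDC95={mdc95:.2f} cm  MAPE={mape:.1f}%")
```

Because SEM and MDC inherit the ICC's variance decomposition, any error source the ICC misses (the abstract's point about systematic and random errors) propagates directly into both derived indices.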

RIS

TY - JOUR
T1 - Discussion on the validity of commonly used reliability indices in sports medicine and exercise science
T2 - a critical review with data simulations
AU - Warneke, Konstantin
AU - Gronwald, Thomas
AU - Wallot, Sebastian
AU - Magno, Alessia
AU - Hillebrecht, Martin
AU - Wirth, Klaus
PY - 2025/2/13
Y1 - 2025/2/13
N2 - Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (rp) and intraclass correlation coefficients (ICC). On those, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated in addition to the variability coefficient (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics in sports science on simulated data which extended the sample size of two original counter-movement-jump sessions from (youth) elite basketball players. These show that excellent rp and ICC (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can cause misleading conclusions of data. While a simple re-organization of data caused an improvement in relative reliability and reduced limits of agreement meaningfully, systematic errors occurred. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) to quantify the primary and secondary variance sources. After revealing several caveats in the literature (e.g., neglecting of the systematic and random error or not distinguishing between protocol and device reliability), we suggest a methodological approach to provide reliable data collections as a precondition for valid conclusions by, e.g., recommending pre-set acceptable measurement errors.
AB - Apart from objectivity and validity, reliability is considered a precondition for testing within scientific works, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (rp) and intraclass correlation coefficients (ICC). On those, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated in addition to the variability coefficient (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics in sports science on simulated data which extended the sample size of two original counter-movement-jump sessions from (youth) elite basketball players. These show that excellent rp and ICC (≥ 0.9) without a systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can cause misleading conclusions of data. While a simple re-organization of data caused an improvement in relative reliability and reduced limits of agreement meaningfully, systematic errors occurred. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) to quantify the primary and secondary variance sources. After revealing several caveats in the literature (e.g., neglecting of the systematic and random error or not distinguishing between protocol and device reliability), we suggest a methodological approach to provide reliable data collections as a precondition for valid conclusions by, e.g., recommending pre-set acceptable measurement errors.
KW - Accuracy
KW - Practical relevance
KW - Precision
KW - Random errors
KW - Systematic errors
KW - Physical education and sports
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=leuphana_woslite&SrcAuth=WosAPI&KeyUT=WOS:001418897100001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1007/s00421-025-05720-6
DO - 10.1007/s00421-025-05720-6
M3 - Scientific review articles
C2 - 39939564
JO - European Journal of Applied Physiology
JF - European Journal of Applied Physiology
SN - 1439-6319
ER -
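The abstract also notes that correlation-based indices miss systematic bias, which Bland-Altman-style limits of agreement expose directly. This sketch injects a constant offset into synthetic paired sessions to show that Pearson r stays high while the mean difference reveals the bias; the offset and error magnitudes are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Hypothetical paired sessions with a built-in systematic bias of +1.5 cm.
session1 = rng.normal(35.0, 5.0, size=n)
session2 = session1 + 1.5 + rng.normal(0.0, 2.0, size=n)

diff = session2 - session1
bias = diff.mean()                 # systematic error (mean difference)
sd_diff = diff.std(ddof=1)
loa_lower = bias - 1.96 * sd_diff  # 95% limits of agreement
loa_upper = bias + 1.96 * sd_diff

# Pearson r is blind to the constant offset: shifting session2 by any
# constant leaves r unchanged, while the bias term captures it directly.
r = np.corrcoef(session1, session2)[0, 1]

print(f"bias={bias:+.2f} cm  LoA=[{loa_lower:.2f}, {loa_upper:.2f}]  r={r:.3f}")
```

Here r remains well above the "excellent" thresholds cited in the abstract despite the deliberate offset, which is exactly the kind of misleading conclusion the paper warns about.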
