Discussion on the validity of commonly used reliability indices in sports medicine and exercise science: a critical review with data simulations
Publication: Journal articles › Review articles › Research
in: European Journal of Applied Physiology, 13.02.2025.
RIS
TY - JOUR
T1 - Discussion on the validity of commonly used reliability indices in sports medicine and exercise science
T2 - a critical review with data simulations
AU - Warneke, Konstantin
AU - Gronwald, Thomas
AU - Wallot, Sebastian
AU - Magno, Alessia
AU - Hillebrecht, Martin
AU - Wirth, Klaus
PY - 2025/2/13
Y1 - 2025/2/13
N2 - Apart from objectivity and validity, reliability is considered a precondition for testing in scientific work, as unreliable testing protocols limit conclusions, especially for practical application. Classification guidelines commonly refer to relative reliability, focusing on Pearson correlation coefficients (rp) and intraclass correlation coefficients (ICC). Based on these, the standard error of measurement (SEM) and the minimal detectable change (MDC) are often calculated, in addition to the coefficient of variation (CV). These, however, do not account for systematic or random errors (e.g., standardization problems). To illustrate, we applied common reliability statistics from sports science to simulated data that extended the sample size of two original counter-movement-jump sessions from (youth) elite basketball players. These show that excellent rp and ICC (≥ 0.9) without systematic bias were accompanied by a mean absolute percentage error of over 20%. Furthermore, we showed that the ICC does not account for systematic errors and has only limited value for accuracy, which can lead to misleading conclusions from data. While a simple re-organization of the data improved relative reliability and meaningfully reduced the limits of agreement, systematic errors occurred. This example underlines the lack of validity and objectivity of commonly used ICC-based reliability statistics (SEM, MDC) for quantifying primary and secondary variance sources. After revealing several caveats in the literature (e.g., neglecting systematic and random errors, or not distinguishing between protocol and device reliability), we suggest a methodological approach to provide reliable data collection as a precondition for valid conclusions, e.g., by recommending pre-set acceptable measurement errors.
KW - Accuracy
KW - Practical relevance
KW - Precision
KW - Random errors
KW - Systematic errors
KW - Physical education and sports
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=leuphana_woslite&SrcAuth=WosAPI&KeyUT=WOS:001418897100001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1007/s00421-025-05720-6
DO - 10.1007/s00421-025-05720-6
M3 - Scientific review articles
C2 - 39939564
JO - European Journal of Applied Physiology
JF - European Journal of Applied Physiology
SN - 1439-6319
ER -
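The abstract's central claim — that an excellent ICC can coexist with a large mean absolute percentage error (MAPE) when between-subject variance dwarfs the measurement error — can be illustrated with a toy test–retest simulation. This is a minimal sketch, not the authors' data or simulation: the sample size, jump-height range, and error magnitude below are assumed for illustration, and the ICC(2,1) is computed from the standard two-way ANOVA decomposition.

```python
import numpy as np

# Toy test-retest simulation (illustrative assumptions, not the paper's data).
rng = np.random.default_rng(42)
n = 1000                              # simulated athletes
true = rng.uniform(15, 75, n)         # assumed "true" jump heights (cm)
err_sd = 5.5                          # assumed random measurement error (cm)
s1 = true + rng.normal(0, err_sd, n)  # session 1 measurements
s2 = true + rng.normal(0, err_sd, n)  # session 2 measurements

data = np.column_stack([s1, s2])
k = data.shape[1]                     # number of sessions

# ICC(2,1): two-way random effects, absolute agreement, single measurement
subj_means = data.mean(axis=1)
sess_means = data.mean(axis=0)
grand = data.mean()
msr = k * np.sum((subj_means - grand) ** 2) / (n - 1)   # between subjects
msc = n * np.sum((sess_means - grand) ** 2) / (k - 1)   # between sessions
mse = np.sum(
    (data - subj_means[:, None] - sess_means[None, :] + grand) ** 2
) / ((n - 1) * (k - 1))                                 # residual error
icc = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# SEM and MDC95 derived from the ICC, as commonly done in the field
sem = data.std(ddof=1) * np.sqrt(1 - icc)
mdc95 = 1.96 * np.sqrt(2) * sem

# Mean absolute percentage error between the two sessions
mape = np.mean(np.abs(s1 - s2) / s1) * 100

print(f"ICC(2,1) = {icc:.3f}, SEM = {sem:.2f} cm, "
      f"MDC95 = {mdc95:.2f} cm, MAPE = {mape:.1f}%")
```

With a wide between-subject spread relative to the error, the ICC lands around 0.9 while the session-to-session percentage error remains in the double digits — the dissociation the review warns about, since the ICC rewards heterogeneous samples regardless of absolute measurement accuracy.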