The Impact of Sample Size on Reliability Metrics Stability in Isokinetic Strength Assessments: Does Size Matter?

Konstantin Warneke; Michael Keiner; Sebastian Wallot; Stanislav D. Siegel; Christian Günther; Klaus Wirth; Sebastian Puschkasch-Möck

doi:10.1080/1091367X.2025.2494998

Research output: Journal contributions › Journal articles › Research › peer-review

Standard

The Impact of Sample Size on Reliability Metrics Stability in Isokinetic Strength Assessments: Does Size Matter? / Warneke, Konstantin; Keiner, Michael; Wallot, Sebastian et al.
In: Measurement in Physical Education and Exercise Science, 23.04.2025.

Bibtex

@article{e853d8dcef704d7b85311a44e69b95cc,

title = "The Impact of Sample Size on Reliability Metrics Stability in Isokinetic Strength Assessments: Does Size Matter?",

abstract = "The ability to capture performance parameters reliably is crucial for producing valid study results. Reporting the ICC together with the standard error of measurement and the minimal detectable change has become the most common way to justify that subsequent testing procedures are reliable. However, early studies around the new millennium identified weaknesses of the ICC and proposed more elaborate procedures, including quantification of the systematic bias and of the random error via the mean absolute error or the mean absolute percentage error. Based on the law of large numbers and on earlier research indicating that relative indices such as correlation coefficients require a minimum sample size to stabilize, it was hypothesized that reliability indices follow an optimal sample size trend. In accordance with previous studies on correlation coefficients, this study highlights the importance of including large numbers of participants to obtain stable reliability measures. The random error was not significantly affected by increasing sample size, yet it provides important information about how successfully the testing was standardized. The study also underlines the relevance of reporting not only ICC-based reliability statistics but also the quantification of random errors.",

keywords = "intraclass correlation coefficient, law of large numbers, measurement errors, reliability, repeatability, Psychology",

author = "Konstantin Warneke and Michael Keiner and Sebastian Wallot and Siegel, {Stanislav D.} and Christian G{\"u}nther and Klaus Wirth and Sebastian Puschkasch-M{\"o}ck",

note = "Publisher Copyright: {\textcopyright} 2025 The Author(s). Published with license by Taylor \& Francis Group, LLC.",

year = "2025",

month = apr,

day = "23",

doi = "10.1080/1091367X.2025.2494998",

language = "English",

journal = "Measurement in Physical Education and Exercise Science",

issn = "1091-367X",

publisher = "Routledge Taylor \& Francis Group",

}

RIS

TY - JOUR

T1 - The Impact of Sample Size on Reliability Metrics Stability in Isokinetic Strength Assessments

T2 - Does Size Matter?

AU - Warneke, Konstantin

AU - Keiner, Michael

AU - Wallot, Sebastian

AU - Siegel, Stanislav D.

AU - Günther, Christian

AU - Wirth, Klaus

AU - Puschkasch-Möck, Sebastian

PY - 2025/4/23

Y1 - 2025/4/23

N2 - The ability to capture performance parameters reliably is crucial for producing valid study results. Reporting the ICC together with the standard error of measurement and the minimal detectable change has become the most common way to justify that subsequent testing procedures are reliable. However, early studies around the new millennium identified weaknesses of the ICC and proposed more elaborate procedures, including quantification of the systematic bias and of the random error via the mean absolute error or the mean absolute percentage error. Based on the law of large numbers and on earlier research indicating that relative indices such as correlation coefficients require a minimum sample size to stabilize, it was hypothesized that reliability indices follow an optimal sample size trend. In accordance with previous studies on correlation coefficients, this study highlights the importance of including large numbers of participants to obtain stable reliability measures. The random error was not significantly affected by increasing sample size, yet it provides important information about how successfully the testing was standardized. The study also underlines the relevance of reporting not only ICC-based reliability statistics but also the quantification of random errors.

AB - The ability to capture performance parameters reliably is crucial for producing valid study results. Reporting the ICC together with the standard error of measurement and the minimal detectable change has become the most common way to justify that subsequent testing procedures are reliable. However, early studies around the new millennium identified weaknesses of the ICC and proposed more elaborate procedures, including quantification of the systematic bias and of the random error via the mean absolute error or the mean absolute percentage error. Based on the law of large numbers and on earlier research indicating that relative indices such as correlation coefficients require a minimum sample size to stabilize, it was hypothesized that reliability indices follow an optimal sample size trend. In accordance with previous studies on correlation coefficients, this study highlights the importance of including large numbers of participants to obtain stable reliability measures. The random error was not significantly affected by increasing sample size, yet it provides important information about how successfully the testing was standardized. The study also underlines the relevance of reporting not only ICC-based reliability statistics but also the quantification of random errors.

KW - intraclass correlation coefficient

KW - law of large numbers

KW - measurement errors

KW - reliability

KW - repeatability

KW - Psychology

UR - http://www.scopus.com/inward/record.url?scp=105003144436&partnerID=8YFLogxK

U2 - 10.1080/1091367X.2025.2494998

DO - 10.1080/1091367X.2025.2494998

M3 - Journal articles

AN - SCOPUS:105003144436

JO - Measurement in Physical Education and Exercise Science

JF - Measurement in Physical Education and Exercise Science

SN - 1091-367X

ER -

Related by journal

Using rating scales for the assessment of physical self-concept: Why the number of response categories matters

Freund, P. A., Tietjens, M. & Strauss, B., 01.10.2013, In: Measurement in Physical Education and Exercise Science. 17, 4, p. 249-263 15 p.

Research output: Journal contributions › Journal articles › Research › peer-review

Other publications by the same author(s)

Can measurement errors explain variance in the relationship between muscle- and tendon stiffness and range of motion?—a blinded reliability and objectivity study

Warneke, K., Meder, J., Plöschberger, G., Oraže, M., Zechner, M., Jochum, D., Siegel, S. D. & Konrad, A., 09.2025, In: European Journal of Applied Physiology. 125, 9, p. 2415-2430 16 p.

Research output: Journal contributions › Journal articles › Research › peer-review

Can the velocity profile in the bench press and the bench pull sufficiently estimate the one repetition maximum in youth elite cross-country ski and biathlon athletes?

Wagner, C. M., Keiner, M., Puschkasch-Möck, S., Wirth, K., Schiemann, S. & Warneke, K., 12.2025, In: BMC Sports Science, Medicine and Rehabilitation. 17, 1, 11 p., 102.

Research output: Journal contributions › Journal articles › Research › peer-review

Comment on “Stretching intervention can prevent muscle injuries: a systematic review and meta-analysis”

Afonso, J., Costa, P. B. & Warneke, K., 06.2025, In: Sport Sciences for Health. 21, 2, p. 1311-1312 2 p.

Research output: Journal contributions › Comments / Debate / Reports › Research

Discussion on the validity of commonly used reliability indices in sports medicine and exercise science: a critical review with data simulations

Warneke, K., Gronwald, T., Wallot, S., Magno, A., Hillebrecht, M. & Wirth, K., 06.2025, In: European Journal of Applied Physiology. 125, 6, p. 1511-1526 16 p., e0216065.

Research output: Journal contributions › Scientific review articles › Research

Examiner experience moderates reliability of human lower extremity muscle ultrasound measurement – a double blinded measurement error study

Warneke, K., Siegel, S. D., Drabow, J., Lohmann, L. H., Jochum, D., Freitas, S. R., Afonso, J. & Konrad, A., 12.2025, In: Ultrasound Journal. 17, 1, 13 p., 20.

Research output: Journal contributions › Journal articles › Research › peer-review

DOI

https://doi.org/10.1080/1091367X.2025.2494998
Final published version
