Best Practices in AI and Data Science Models Evaluation
Research output: Contributions to collected editions/works › Article in conference proceedings › Research › peer-review
Standard
INFORMATIK 2025 : The Wide Open - Offenheit von Source bis Science, 16.-19. September 2025, Potsdam. ed. / Ulrike Lucke; Stefan Stieglitz; Falk Uebernickel; Anna-Lena Lamprecht; Maike Klein. Bonn: Gesellschaft für Informatik, 2025. p. 1211-1219 (Lecture Notes in Informatics; Vol. P366).
RIS
TY - CHAP
T1 - Best Practices in AI and Data Science Models Evaluation
AU - Banerjee, Debayan
AU - Taffa, Tilahun Abedissa
AU - Usbeck, Ricardo
PY - 2025
Y1 - 2025
N2 - Evaluating Artificial Intelligence (AI) and data science models is crucial to ensure their reliability, fairness, and applicability in real-world scenarios. This paper highlights best practices for model evaluation, emphasizing the importance of selecting appropriate metrics aligned with business or research goals. Key considerations include using robust validation strategies (e.g., cross-validation), monitoring for overfitting, and ensuring data splits preserve class distributions. Fairness, interpretability, and reproducibility are essential, particularly in high-stakes domains like healthcare or finance. Additionally, evaluating models across multiple datasets or demographic subgroups helps uncover biases and improve generalizability. Adopting standardized reporting practices and open-source benchmarks further strengthens the evaluation process. By adhering to these practices, practitioners can build more trustworthy and effective AI systems.
AB - Evaluating Artificial Intelligence (AI) and data science models is crucial to ensure their reliability, fairness, and applicability in real-world scenarios. This paper highlights best practices for model evaluation, emphasizing the importance of selecting appropriate metrics aligned with business or research goals. Key considerations include using robust validation strategies (e.g., cross-validation), monitoring for overfitting, and ensuring data splits preserve class distributions. Fairness, interpretability, and reproducibility are essential, particularly in high-stakes domains like healthcare or finance. Additionally, evaluating models across multiple datasets or demographic subgroups helps uncover biases and improve generalizability. Adopting standardized reporting practices and open-source benchmarks further strengthens the evaluation process. By adhering to these practices, practitioners can build more trustworthy and effective AI systems.
KW - Business informatics
KW - AI
KW - Data science
KW - Best Practices
KW - Machine learning
KW - Evaluation
UR - https://dl.gi.de/items/1739d595-ebff-416d-b476-8a5344e0e9d6
UR - https://dl.gi.de/collections/910b20e6-455a-4929-a0cd-6f12210ce5f4
U2 - 10.18420/inf2025_105
DO - 10.18420/inf2025_105
M3 - Article in conference proceedings
T3 - Lecture Notes in Informatics
SP - 1211
EP - 1219
BT - INFORMATIK 2025
A2 - Lucke, Ulrike
A2 - Stieglitz, Stefan
A2 - Uebernickel, Falk
A2 - Lamprecht, Anna-Lena
A2 - Klein, Maike
PB - Gesellschaft für Informatik
CY - Bonn
ER -
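
For readers who want to try out the validation practices named in the abstract, a minimal sketch of stratified k-fold cross-validation, which preserves class distributions in every train/test split. The scikit-learn toolkit, dataset, and F1 metric are illustrative assumptions; the paper itself does not prescribe them.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset; substitute your own task data.
X, y = load_breast_cancer(return_X_y=True)

# StratifiedKFold keeps the class ratio of y in every fold,
# i.e. the data splits preserve class distributions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Fitting the scaler inside the pipeline avoids leaking
# test-fold statistics into training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Report mean and spread across folds rather than a single split,
# which also helps surface overfitting to one particular split.
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print(f"F1 per fold: {scores.round(3)}")
print(f"Mean F1: {scores.mean():.3f} +/- {scores.std():.3f}")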
