y-Randomization and its variants in QSPR/QSAR

Publication: Contributions to journals › Journal articles › Research › Peer-reviewed

Standard

y-Randomization and its variants in QSPR/QSAR. / Rücker, Christoph; Rücker, G.; Meringer, M.
In: Journal of Chemical Information and Modeling, Vol. 47, No. 6, 11.2007, pp. 2345-2357.

Vancouver

Rücker C, Rücker G, Meringer M. y-Randomization and its variants in QSPR/QSAR. Journal of Chemical Information and Modeling. 2007 Nov;47(6):2345-2357. doi: 10.1021/ci700157b

BibTeX

@article{8f6fa0ee897a42dc8750a2ad415f222f,
title = "y-Randomization and its variants in QSPR/QSAR",
abstract = "y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description ($r^2$) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random $r^2$ values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's $r^2$ to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest $r^2$ of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.",
keywords = "Chemistry",
author = "Christoph R{\"u}cker and G. R{\"u}cker and M. Meringer",
year = "2007",
month = nov,
doi = "10.1021/ci700157b",
language = "English",
volume = "47",
pages = "2345--2357",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "6",

}

RIS

TY  - JOUR
T1  - y-Randomization and its variants in QSPR/QSAR
AU  - Rücker, Christoph
AU  - Rücker, G.
AU  - Meringer, M.
PY  - 2007/11
Y1  - 2007/11
N2  - y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r²) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r² values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r² to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
AB  - y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r²) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r² values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r² to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
KW  - Chemistry
UR  - http://www.scopus.com/inward/record.url?scp=37349097759&partnerID=8YFLogxK
UR  - https://www.mendeley.com/catalogue/7083c187-0c81-3b02-95f2-e1f12d23d990/
U2  - 10.1021/ci700157b
DO  - 10.1021/ci700157b
M3  - Journal articles
VL  - 47
SP  - 2345
EP  - 2357
JO  - Journal of Chemical Information and Modeling
JF  - Journal of Chemical Information and Modeling
SN  - 1549-9596
IS  - 6
ER  -
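
Illustrative sketch

The procedure summarized in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`r2_of_fit`, `forward_select_r2`, `mean_highest_random_r2`) are hypothetical, greedy forward selection stands in for whatever descriptor selection method a given study used, standard normal draws stand in for "random number pseudodescriptors", and the data are synthetic. It computes the mean highest random r² in two ways: by permuting the response against the original descriptor pool (y-randomization) and by replacing the pool with random pseudodescriptors.

```python
import numpy as np


def r2_of_fit(X, y):
    """r^2 of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_tot


def forward_select_r2(pool, y, k):
    """Greedily pick k descriptors from the pool (assumes k <= pool width).

    Forward selection is an assumed stand-in for the model building
    procedure; returns the r^2 of the final k-descriptor MLR model.
    """
    chosen, best = [], -np.inf
    for _ in range(k):
        best_j, best = None, -np.inf
        for j in range(pool.shape[1]):
            if j in chosen:
                continue
            r2 = r2_of_fit(pool[:, chosen + [j]], y)
            if r2 > best:
                best_j, best = j, r2
        chosen.append(best_j)
    return best


def mean_highest_random_r2(pool, y, k, n_trials=25, mode="permute_y", seed=0):
    """Mean over n_trials of the best r^2 obtained on randomized data.

    mode="permute_y": shuffled response, original descriptor pool.
    mode="random_descriptors": original response, pseudodescriptor pool
    of the same shape drawn from a standard normal (an assumption).
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        if mode == "permute_y":
            scores.append(forward_select_r2(pool, rng.permutation(y), k))
        elif mode == "random_descriptors":
            fake_pool = rng.standard_normal(pool.shape)
            scores.append(forward_select_r2(fake_pool, y, k))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return float(np.mean(scores))


# Synthetic example: 30 compounds, pool of 20 descriptors, 3 in the final model.
rng = np.random.default_rng(1)
n, m, k = 30, 20, 3
pool = rng.standard_normal((n, m))
y = 2.0 * pool[:, 0] - pool[:, 1] + 0.1 * rng.standard_normal(n)

r2_original = forward_select_r2(pool, y, k)
r2_yrand = mean_highest_random_r2(pool, y, k, mode="permute_y")
r2_pseudo = mean_highest_random_r2(pool, y, k, mode="random_descriptors")
print(f"original r2:                    {r2_original:.3f}")
print(f"mean highest r2, permuted y:    {r2_yrand:.3f}")
print(f"mean highest r2, pseudodescr.:  {r2_pseudo:.3f}")
```

In the spirit of the double test proposed in the abstract, the original model's r² would be compared against both random baselines. The pseudodescriptor variant needs only the number of compounds, the pool size, and the model size, so it remains possible when the descriptor values themselves are not published.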
