y-Randomization and its variants in QSPR/QSAR

Publication: Contributions to journals, Journal articles, Research, peer-reviewed

Standard

y-Randomization and its variants in QSPR/QSAR. / Rücker, Christoph; Rücker, G.; Meringer, M.
In: Journal of Chemical Information and Modeling, Vol. 47, No. 6, 11.2007, pp. 2345-2357.

Vancouver

Rücker C, Rücker G, Meringer M. y-Randomization and its variants in QSPR/QSAR. Journal of Chemical Information and Modeling. 2007 Nov;47(6):2345-2357. doi: 10.1021/ci700157b

Bibtex

@article{8f6fa0ee897a42dc8750a2ad415f222f,
title = "y-Randomization and its variants in QSPR/QSAR",
abstract = "y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description ($r^2$) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random $r^2$ values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's $r^2$ to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest $r^2$ of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.",
keywords = "Chemistry",
author = "Christoph R{\"u}cker and G. R{\"u}cker and M. Meringer",
year = "2007",
month = nov,
doi = "10.1021/ci700157b",
language = "English",
volume = "47",
pages = "2345--2357",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "6",

}

RIS

TY - JOUR

T1 - y-Randomization and its variants in QSPR/QSAR

AU - Rücker, Christoph

AU - Rücker, G.

AU - Meringer, M.

PY - 2007/11

Y1 - 2007/11

N2 - y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r²) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r² values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r² to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.

KW - Chemistry

UR - http://www.scopus.com/inward/record.url?scp=37349097759&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/7083c187-0c81-3b02-95f2-e1f12d23d990/

U2 - 10.1021/ci700157b

DO - 10.1021/ci700157b

M3 - Journal articles

VL - 47

SP - 2345

EP - 2357

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 6

ER -
