y-Randomization and its variants in QSPR/QSAR

Research output: Journal contributions, Journal articles, Research, peer-review

y-Randomization is a tool used in the validation of QSPR/QSAR models, whereby the performance of the original model in data description (r²) is compared to that of models built for a permuted (randomly shuffled) response, based on the original descriptor pool and the original model-building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random-number pseudoresponse, and original descriptors or random-number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method yield two different mean highest random r² values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random-number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r² to both of these values whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, because the values of all pool descriptors for all compounds have not been published. In such cases, the random-number experiments proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress is also reported toward obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious multiple simulations on random-number variables.
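The double comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' procedure: descriptor selection is assumed to be exhaustive best-subset search (the abstract does not name the selection method), pseudodescriptors are assumed to be standard normal random numbers, the descriptor pool is hypothetical data built from a shared latent factor to mimic intercorrelated real descriptors, and the function names `r_squared`, `best_subset_r2`, `y_randomization`, and `random_pseudodescriptors` are illustrative. The script reports the original model's r², the mean highest r² under y-randomization (permuted response, original descriptors), and the mean highest r² with random-number pseudodescriptors.

```python
import itertools
import random
import statistics


def r_squared(X_cols, y):
    """r^2 = 1 - SS_res/SS_tot of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    A = [[1.0] + [col[i] for col in X_cols] for i in range(n)]  # design matrix rows
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y]
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        if abs(M[c][c]) < 1e-12:
            return 0.0  # singular subset: give it r^2 = 0 so selection skips it
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for j in range(c, p + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * p  # back substitution for the regression coefficients
    for c in range(p - 1, -1, -1):
        b[c] = (M[c][p] - sum(M[c][j] * b[j] for j in range(c + 1, p))) / M[c][c]
    y_mean = sum(y) / n
    ss_res = sum((y[k] - sum(A[k][j] * b[j] for j in range(p))) ** 2 for k in range(n))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def best_subset_r2(pool, y, k):
    """Highest r^2 over all k-descriptor subsets of the pool (assumed selection method)."""
    return max(r_squared([pool[j] for j in combo], y)
               for combo in itertools.combinations(range(len(pool)), k))


def y_randomization(pool, y, k, trials, rng):
    """Mean highest r^2 with a permuted response and the original descriptor pool."""
    scores = []
    for _ in range(trials):
        y_perm = y[:]
        rng.shuffle(y_perm)
        scores.append(best_subset_r2(pool, y_perm, k))
    return statistics.mean(scores)


def random_pseudodescriptors(n, m, y, k, trials, rng):
    """Mean highest r^2 with the pool replaced by m random-number pseudodescriptors."""
    scores = []
    for _ in range(trials):
        pseudo_pool = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
        scores.append(best_subset_r2(pseudo_pool, y, k))
    return statistics.mean(scores)


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, k = 15, 8, 2  # compounds, pool size, descriptors in the final model
    # Hypothetical intercorrelated pool: shared latent factor plus independent noise
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pool = [[latent[i] + 0.5 * rng.gauss(0.0, 1.0) for i in range(n)] for _ in range(m)]
    y = [2.0 * pool[0][i] - pool[3][i] + 0.3 * rng.gauss(0.0, 1.0) for i in range(n)]

    r2_model = best_subset_r2(pool, y, k)
    r2_yrand = y_randomization(pool, y, k, trials=50, rng=rng)
    r2_pseudo = random_pseudodescriptors(n, m, y, k, trials=50, rng=rng)
    print(f"original model r^2:                       {r2_model:.3f}")
    print(f"mean highest r^2, y-randomization:        {r2_yrand:.3f}")
    print(f"mean highest r^2, random pseudodescriptors: {r2_pseudo:.3f}")
```

Exhaustive selection is used here only because it is short and deterministic; it becomes infeasible for large pools, where a heuristic such as stepwise selection would be substituted. Whatever method is chosen, the sketch assumes it is applied identically to the original fit and to every randomized run, since the randomized r² values are only comparable under the same model-building procedure.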

Translated title of the contribution (German): y-Randomisierung in QSPR/QSAR
Original language: English
Journal: Journal of Chemical Information and Modeling
Volume: 47
Issue number: 6
Pages (from-to): 2345-2357
Number of pages: 13
ISSN: 1549-9596
Publication status: Published - 11.2007
Externally published: Yes