y-Randomization and its variants in QSPR/QSAR

Research output: Journal contributions, Journal articles, Research, peer-review

y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r²) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r² values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r² to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
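The double test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exhaustive best-subset search stands in for whatever descriptor selection method a given study used, the data are synthetic, and all function names (`r2_mlr`, `best_subset_r2`, `mean_highest_random_r2`) and parameter values are hypothetical choices made for illustration only.

```python
from itertools import combinations

import numpy as np


def r2_mlr(X, y):
    """r² of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_tot


def best_subset_r2(pool, y, k):
    """Highest r² over all k-descriptor subsets of the pool.

    Assumption: exhaustive search is used here as a simple stand-in
    for the descriptor selection step of the original model building.
    """
    n_desc = pool.shape[1]
    return max(r2_mlr(pool[:, list(c)], y) for c in combinations(range(n_desc), k))


def mean_highest_random_r2(pool, y, k, mode, n_trials=50, seed=0):
    """Mean of the highest r² found in n_trials random experiments.

    mode="permuted_y": y-randomization (original descriptors, shuffled response).
    mode="random_x":   original response, random number pseudodescriptors.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        if mode == "permuted_y":
            scores.append(best_subset_r2(pool, rng.permutation(y), k))
        elif mode == "random_x":
            scores.append(best_subset_r2(rng.standard_normal(pool.shape), y, k))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return float(np.mean(scores))


# Synthetic example: 20 compounds, pool of 8 intercorrelated descriptors,
# 2 descriptors in the final model (all values are illustrative).
rng = np.random.default_rng(42)
n, m, k = 20, 8, 2
latent = rng.standard_normal((n, 1))
pool = 0.8 * latent + 0.6 * rng.standard_normal((n, m))  # shared latent factor
y = 2.0 * pool[:, 0] - pool[:, 3] + 0.1 * rng.standard_normal(n)

r2_original = best_subset_r2(pool, y, k)
r2_yrand = mean_highest_random_r2(pool, y, k, "permuted_y")
r2_pseudo = mean_highest_random_r2(pool, y, k, "random_x")

print(f"original model r2:               {r2_original:.3f}")
print(f"mean highest r2, permuted y:     {r2_yrand:.3f}")
print(f"mean highest r2, pseudodescr.:   {r2_pseudo:.3f}")
```

An original r² that clearly exceeds both random reference values would be the favorable outcome of the double test. The second variant needs only the response values and the three counts (compounds, model descriptors, pool size), which is why it remains possible when the descriptor pool has not been published. Whether the pseudodescriptor reference comes out higher than the y-randomization reference in any single small run, as the abstract reports for the mean behavior, depends on the data and the number of trials.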

Translated title of the contribution: y-Randomisierung in QSPR/QSAR
Original language: English
Journal: Journal of Chemical Information and Modeling
Volume: 47
Issue number: 6
Pages (from-to): 2345-2357
Number of pages: 13
ISSN: 1549-9596
Publication status: Published - 11.2007
Externally published: Yes
