On the agreement of external validation parameters for linear regression QSAR models
CHIRICO, NICOLA;PAPA, ESTER;GRAMATICA, PAOLA
2011-01-01
Abstract
Evaluating the performance of linear regression QSAR models, both in fitting and in external prediction, is of pivotal importance. Over the last decade, several external validation parameters have been proposed: Q2F1 (Shi), Q2F2 (Schüürmann), Q2F3 (Consonni), r2m (Roy) and the Golbraikh-Tropsha method. These parameters usually agree, which gives confidence in a model's predictivity, but doubts arise when they give contradictory results. We therefore looked for a simpler way to evaluate the external predictivity of models, independently of the composition of the sets. In our opinion, the simplest approach is to quantify the similarity between the experimental data of the external test set and the corresponding values calculated by the model; we thus propose the concordance correlation coefficient. In this study we use our proposal as a reference and evaluate its agreement with the other validation parameters on 210,000 simulated datasets. For the more realistic datasets, 95% agreement was found, and the concordance correlation coefficient proved to be the most precautionary parameter. We studied two possible disagreement scenarios: (a) the external data points are well predicted, but at least one of the validation parameters rejects the model (rare); (b) the match is poor, but one or more validation parameters accept the model (less rare). The second scenario is the dangerous one for QSAR models. Our method, also verified on real models, is proposed in this presentation as a tool to be used in addition to the aforementioned external validation parameters to identify critical models that may be unpredictive.
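As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of the concordance correlation coefficient alongside Q2F1, Q2F2 and Q2F3. The abstract itself gives no formulas, so the definitions used here are assumptions taken from the commonly cited forms in the literature (Lin's coefficient for the concordance correlation, and the Shi, Schüürmann and Consonni forms for the three Q2 parameters). The function names and the toy data are hypothetical and are not the authors' implementation or datasets.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _press(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared prediction errors on the external set."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def ccc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Concordance correlation coefficient (assumed: Lin's definition).

    Unlike Pearson's r, it penalises any shift or scale difference
    between observed and predicted values.
    """
    _press(observed, predicted)  # reuse the length checks
    n = len(observed)
    mo, mp = _mean(observed), _mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    ss_o = sum((o - mo) ** 2 for o in observed)
    ss_p = sum((p - mp) ** 2 for p in predicted)
    denom = ss_o + ss_p + n * (mo - mp) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when all values are identical")
    return 2 * cov / denom


def q2_f1(observed, predicted, train_observed):
    """Q2F1 (assumed form): errors scaled by deviations from the TRAINING mean."""
    m_tr = _mean(train_observed)
    return 1 - _press(observed, predicted) / sum((o - m_tr) ** 2 for o in observed)


def q2_f2(observed, predicted):
    """Q2F2 (assumed form): errors scaled by deviations from the EXTERNAL mean."""
    m_ext = _mean(observed)
    return 1 - _press(observed, predicted) / sum((o - m_ext) ** 2 for o in observed)


def q2_f3(observed, predicted, train_observed):
    """Q2F3 (assumed form): mean squared error over the training-set variance."""
    m_tr = _mean(train_observed)
    tss_tr = sum((t - m_tr) ** 2 for t in train_observed)
    mse_ext = _press(observed, predicted) / len(observed)
    return 1 - mse_ext / (tss_tr / len(train_observed))


if __name__ == "__main__":
    # Hypothetical toy data: predictions are perfectly correlated with the
    # experimental values but shifted by a constant offset.
    y_train = [0.0, 2.0, 4.0]
    y_ext = [1.0, 2.0, 3.0]
    y_pred = [2.0, 3.0, 4.0]
    print("CCC :", ccc(y_ext, y_pred))
    print("Q2F1:", q2_f1(y_ext, y_pred, y_train))
    print("Q2F2:", q2_f2(y_ext, y_pred))
    print("Q2F3:", q2_f3(y_ext, y_pred, y_train))
```

On this toy case the parameters do not all point the same way, which is the kind of disagreement the study examines; whether a given value accepts or rejects a model depends on thresholds that the abstract does not state.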