The soil sorption coefficient (Koc) is a key physicochemical parameter to assess the environmental risk of organic compounds. To predict soil sorption coefficient in a more effective and economical way, here, quantitative structure-property relationship (QSPR) models were developed based on a large diverse dataset including 964 non-ionic organic compounds. Multiple linear regression (MLR), local lazy regression (LLR) and least squares support vector machine (LS-SVM) were utilized to develop QSPR models based on the four most relevant theoretical molecular descriptors selected by genetic algorithms-variable subset selection (GA-VSS) procedure. The QSPR development strictly followed the OECD principles for QSPR model validation, thus great attentions were paid to internal and external validations, applicability domain and mechanistic interpretation. The obtained results indicate that the LS-SVM model performed better than the MLR and the LLR models. For best LS-SVM model, the correlation coefficients (R2) for the training set was 0.913 and concordance correlation coefficient (CCC) for the prediction set was 0.917. The root-mean square errors (RMSE) were 0.330 and 0.426, respectively. The results of internal and external validations together with applicability domain analysis indicate that the QSPR models proposed in our work are predictive and could provide a useful tool for prediction soil sorption coefficient of new compounds.

Integrated QSPR models to predict the soil sorption coefficient for a large diverse set of compounds by using different modeling methods

GRAMATICA, PAOLA
2014-01-01

Abstract

The soil sorption coefficient (Koc) is a key physicochemical parameter to assess the environmental risk of organic compounds. To predict soil sorption coefficient in a more effective and economical way, here, quantitative structure-property relationship (QSPR) models were developed based on a large diverse dataset including 964 non-ionic organic compounds. Multiple linear regression (MLR), local lazy regression (LLR) and least squares support vector machine (LS-SVM) were utilized to develop QSPR models based on the four most relevant theoretical molecular descriptors selected by genetic algorithms-variable subset selection (GA-VSS) procedure. The QSPR development strictly followed the OECD principles for QSPR model validation, thus great attentions were paid to internal and external validations, applicability domain and mechanistic interpretation. The obtained results indicate that the LS-SVM model performed better than the MLR and the LLR models. For best LS-SVM model, the correlation coefficients (R2) for the training set was 0.913 and concordance correlation coefficient (CCC) for the prediction set was 0.917. The root-mean square errors (RMSE) were 0.330 and 0.426, respectively. The results of internal and external validations together with applicability domain analysis indicate that the QSPR models proposed in our work are predictive and could provide a useful tool for prediction soil sorption coefficient of new compounds.
2014
Genetic algorithms; Local lazy regression; LS-SVM; OECD principles; QSPR; Soil sorption coefficient
Shao, Y.; Liu, J.; Wang, M.; Shi, L.; Yao, X.; Gramatica, Paola
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11383/1872122
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 34
  • ???jsp.display-item.citation.isi??? 32
social impact