Statistical external validation and consensus modeling: A QSPR case study for Koc prediction


Gramatica, Paola; Giani, E; Papa, Ester

2007-01-01

Abstract

The soil sorption partition coefficient (log Koc) of a heterogeneous set of 643 organic non-ionic compounds, with a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithm-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self-organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied to 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log Kow and log Sw. The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable data. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
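The validation steps named in the abstract (OLS fitting, external prediction, the leverage-based applicability domain, and consensus averaging) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the split is random rather than SOM-based, the descriptor subsets are hand-picked rather than GA-selected, only three models are averaged instead of ten, the leverage cutoff h* = 3(p+1)/n is a commonly used convention that the abstract does not state, and all function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted OLS model to a descriptor matrix."""
    return coef[0] + X @ coef[1:]

def leverages(X_train, X_query):
    """Hat values h_i = x_i^T (A^T A)^-1 x_i of query rows,
    computed against the training design matrix A."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    G = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, G, Q)

def q2_ext(y_true, y_pred, y_train_mean):
    """External Q^2 = 1 - PRESS / SS about the training mean
    (one common definition; an assumption here)."""
    press = np.sum((y_true - y_pred) ** 2)
    ss = np.sum((y_true - y_train_mean) ** 2)
    return 1.0 - press / ss

# Synthetic stand-in data: 643 "compounds", 6 hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(643, 6))
y = X @ np.array([1.0, -0.5, 0.8, 0.3, 0.0, 0.0]) + rng.normal(scale=0.3, size=643)

# Random 93/550 split (a stand-in for the SOM-based split in the abstract).
idx = rng.permutation(643)
train, pred = idx[:93], idx[93:]

# Hand-picked four-descriptor subsets (a stand-in for the GA model population).
subsets = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 2, 3, 5)]

all_preds = []
in_domain = np.ones(len(pred), dtype=bool)
for cols in subsets:
    cols = list(cols)
    coef = fit_ols(X[train][:, cols], y[train])
    all_preds.append(predict(coef, X[pred][:, cols]))
    # Leverage cutoff h* = 3(p+1)/n, with p descriptors and n training compounds.
    h_star = 3 * (len(cols) + 1) / len(train)
    in_domain &= leverages(X[train][:, cols], X[pred][:, cols]) <= h_star

# Consensus prediction: arithmetic mean over the individual models.
consensus = np.mean(all_preds, axis=0)

q2_all = q2_ext(y[pred], consensus, y[train].mean())
q2_domain = q2_ext(y[pred][in_domain], consensus[in_domain], y[train].mean())
print(f"Q2_ext (all): {q2_all:.3f}, Q2_ext (in domain): {q2_domain:.3f}")
print(f"{in_domain.sum()} of {len(pred)} prediction compounds inside every model's domain")
```

Here a compound counts as inside the applicability domain only if its leverage is below the cutoff for every model in the consensus; a looser rule, such as averaging only the models whose domain covers the compound, would be an equally plausible design choice.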


Year: 2007
Journal: Journal of Molecular Graphics & Modelling
DOI: https://dx.doi.org/10.1016/j.jmgm.2006.06.005
PubMed ID: 16890002
Web of Science ID: WOS:000245802900001
Scopus ID: 2-s2.0-33846809281
Keywords: Theoretical molecular descriptors, Genetic algorithms, Splitting, Soil sorption coefficient, Koc, QSAR, OECD principles
All authors: Gramatica, Paola; Giani, E; Papa, Ester
Publication type: Journal article

Files in this record:

File	Type	License	Size	Format
2007_Koc.pdf (not available)	Publisher's version (PDF)	Publisher's copyright	880.46 kB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11383/1668884
