Statistical analysis of proteomics data: A review on feature selection

IRIS - Institutional Research Information System
IRIS è il sistema di gestione integrata dei dati della ricerca (persone, progetti, pubblicazioni, attività) adottato dall'Università degli Studi dell’Insubria.

IRInSubria - Institutional Repository Insubria
IRInSubria raccoglie, conserva, documenta e dissemina le informazioni sulla produzione scientifica dell'Università degli Studi dell’Insubria anche ai fini della valutazione della ricerca.

The spread of “-omics” strategies has strongly changed the way of thinking about the scientific method. Indeed, managing huge amounts of data imposes the replacement of the classical deductive approach with a data-driven inductive approach, so to generate mechanistical hypotheses from data. Data reduction is a crucial step in the process of proteomics data analysis, because of the sparsity of significant features in big datasets. Thus, feature selection methods are applied to obtain a set of features based on which a proteomics signature can be drawn, with a functional significance (e.g., classification, diagnosis, prognosis). In this frame, the aim of the present review article is to give an overview of the methods available for proteomics data analysis, with a focus on biomedical translational research. Suggestions for the choice of the most appropriate standard statistical procedures are presented to perform data reduction by feature selection, cross-validation and functional analysis of proteomics profiles. Significance: The proteome, including all so-called “proteoforms” represents the highest level of complexity of biomolecules when compared to the other “-omes” (i.e., genome, transcriptome). For this reason, the use of proper data reduction strategies is mandatory for proteomics data analysis. However, the strategies to be employed for feature selection must be carefully chosen, since many different approaches exist based on both input data and desired output. So far, a well-established decision-making workflow for proteomics data analysis is lacking, opening up to misleading and incorrect data analysis and interpretation. In this review article many statistical approaches are described and compared for their application in the field of biomedical research, in order to suggest the reader the most suitable analysis pathway and to avoid mistakes.

Statistical analysis of proteomics data: A review on feature selection

Lualdi M.;Fasano M.

2019-01-01

Abstract

The spread of “-omics” strategies has strongly changed the way of thinking about the scientific method. Indeed, managing huge amounts of data imposes the replacement of the classical deductive approach with a data-driven inductive approach, so to generate mechanistical hypotheses from data. Data reduction is a crucial step in the process of proteomics data analysis, because of the sparsity of significant features in big datasets. Thus, feature selection methods are applied to obtain a set of features based on which a proteomics signature can be drawn, with a functional significance (e.g., classification, diagnosis, prognosis). In this frame, the aim of the present review article is to give an overview of the methods available for proteomics data analysis, with a focus on biomedical translational research. Suggestions for the choice of the most appropriate standard statistical procedures are presented to perform data reduction by feature selection, cross-validation and functional analysis of proteomics profiles. Significance: The proteome, including all so-called “proteoforms” represents the highest level of complexity of biomolecules when compared to the other “-omes” (i.e., genome, transcriptome). For this reason, the use of proper data reduction strategies is mandatory for proteomics data analysis. However, the strategies to be employed for feature selection must be carefully chosen, since many different approaches exist based on both input data and desired output. So far, a well-established decision-making workflow for proteomics data analysis is lacking, opening up to misleading and incorrect data analysis and interpretation. In this review article many statistical approaches are described and compared for their application in the field of biomedical research, in order to suggest the reader the most suitable analysis pathway and to avoid mistakes.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Rivista
	
				JOURNAL OF PROTEOMICS
			
	Url
	
				http://www.elsevier.com
			
	DOI
	
				https://dx.doi.org/10.1016/j.jprot.2018.12.004
			
	Codice PUBMED
	
				30529743
			
	Codice Web of Science
	
				WOS:000463122800004
			
	Codice Scopus
	
				2-s2.0-85058392134
			
	Parole chiave
	
				Dimensionality and Sparsity; Feature selection; Inductive reasoning; Proteomics signature
			
	Tutti gli autori
	
						Lualdi, M.; Fasano, M.
					
	Appare nelle tipologie:
	
				Articolo su Rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11383/2086784

Attenzione

L'Ateneo sottopone a validazione solo i file PDF allegati

Citazioni

19

75

63

social impact