Features Selection and Extraction in Statistical Analysis of Proteomics Datasets

Lualdi M.; Fasano M.

2021-01-01

Abstract

“Omics” techniques (e.g., proteomics, genomics, metabolomics), which can nowadays produce huge datasets, require a different way of thinking about data analysis. This can be summarized by the idea that, when there are enough data, they can speak for themselves. Indeed, managing huge amounts of data requires replacing the classical deductive (hypothesis-driven) approach with a data-driven, hypothesis-generating inductive approach, so as to generate mechanistic hypotheses from the data. Data reduction is a crucial step in proteomics data analysis because significant features are sparse in big datasets. Feature selection/extraction methods are therefore applied to obtain a set of features from which a proteomics signature with functional significance (e.g., for classification, diagnosis, or prognosis) can be drawn. Although proteomics studies generate big data almost daily, a well-established statistical workflow for proteomics data analysis is still lacking, which leaves room for misleading and incorrect data analysis and interpretation. This chapter gives an overview of the methods available for feature selection/extraction in proteomics datasets and explains how to choose the most appropriate one based on the type of dataset.
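To make the distinction between feature selection and feature extraction concrete, here is a minimal sketch on hypothetical synthetic data (40 samples x 500 proteins, two groups, only 5 truly different proteins). It assumes a Welch t-test as an illustrative univariate selection filter and principal component analysis (PCA) as an illustrative extraction method. Neither is presented as the authors' workflow. The function names `welch_t`, `select_top_k` and `pca_extract` are hypothetical, and only NumPy is used.

```python
import numpy as np


def welch_t(X, y):
    """Welch t-statistic for every column (protein) of X, group 0 vs group 1."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / se


def select_top_k(X, y, k):
    """Feature SELECTION: keep the k original proteins with the largest |t|.

    The retained columns are still individual, interpretable proteins.
    """
    t = welch_t(X, y)
    idx = np.argsort(-np.abs(t))[:k]
    return idx, X[:, idx]


def pca_extract(X, n_components):
    """Feature EXTRACTION: project mean-centred data onto the leading principal components.

    Each new feature is a linear combination of all proteins.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T      # sample coordinates on the PCs
    explained = S**2 / np.sum(S**2)        # fraction of total variance per PC
    return scores, explained[:n_components]


# Hypothetical synthetic dataset (assumption, not real proteomics data):
# 40 samples x 500 proteins, two groups of 20; only proteins 0-4 differ (sparse signal).
rng = np.random.default_rng(0)
n_per_group, n_proteins = 20, 500
X = rng.normal(size=(2 * n_per_group, n_proteins))
y = np.repeat([0, 1], n_per_group)
X[n_per_group:, :5] += 4.0  # shift the informative proteins in group 1

idx, X_sel = select_top_k(X, y, k=5)
scores, explained = pca_extract(X, n_components=2)
print("selected proteins:", sorted(idx.tolist()))
print("explained variance (PC1, PC2):", np.round(explained, 3))
```

Selection returns a subset of the measured proteins, which can be read directly as a candidate signature. Extraction returns new composite variables that summarize variance but mix all proteins. In a real supervised analysis, the selection step should be refitted inside each cross-validation training fold. Selecting features on the full dataset before cross-validation gives optimistically biased performance estimates.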

Year: 2021
Journal: Journal of Proteomics
DOI: https://dx.doi.org/10.1007/978-1-0716-1641-3_9
PubMed ID: 34236660
Web of Science ID: WOS:000793618100010
Scopus ID: 2-s2.0-85109981733
Keywords: Cross-validation; Discriminant analysis; Features extraction; Features selection; Principal component analysis; Proteomics; Signature; Sparsity; Supervised/unsupervised methods; Univariate/multivariate methods
All authors: Lualdi, M.; Fasano, M.
Publication type: Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2119166
