Features Selection and Extraction in Statistical Analysis of Proteomics Datasets

Lualdi, M.; Fasano, M.
2021-01-01

Abstract

“Omics” techniques (e.g., proteomics, genomics, metabolomics), which now routinely produce huge datasets, require a different way of thinking about data analysis, one that can be summarized by the idea that, when there are enough data, they can speak for themselves. Indeed, managing huge amounts of data requires replacing the classical hypothesis-driven (deductive) approach with a data-driven, hypothesis-generating (inductive) approach, so as to generate mechanistic hypotheses from the data. Data reduction is a crucial step in proteomics data analysis because significant features are sparse in big datasets. Feature selection/extraction methods are therefore applied to obtain a set of features from which a proteomics signature with functional significance (e.g., classification, diagnosis, prognosis) can be drawn. Although proteomics studies generate big data almost daily, a well-established statistical workflow for proteomics data analysis is still lacking, which leaves room for misleading and incorrect data analysis and interpretation. This chapter gives an overview of the methods available for feature selection/extraction in proteomics datasets and explains how to choose the most appropriate one based on the type of dataset.
Keywords: Cross-validation; Discriminant analysis; Features extraction; Features selection; Principal component analysis; Proteomics; Signature; Sparsity; Supervised/unsupervised methods; Univariate/multivariate methods
Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2119166