The spread of “-omics” strategies has strongly changed the way of thinking about the scientific method. Indeed, managing huge amounts of data imposes the replacement of the classical deductive approach with a data-driven inductive approach, so to generate mechanistical hypotheses from data. Data reduction is a crucial step in the process of proteomics data analysis, because of the sparsity of significant features in big datasets. Thus, feature selection methods are applied to obtain a set of features based on which a proteomics signature can be drawn, with a functional significance (e.g., classification, diagnosis, prognosis). In this frame, the aim of the present review article is to give an overview of the methods available for proteomics data analysis, with a focus on biomedical translational research. Suggestions for the choice of the most appropriate standard statistical procedures are presented to perform data reduction by feature selection, cross-validation and functional analysis of proteomics profiles. Significance: The proteome, including all so-called “proteoforms” represents the highest level of complexity of biomolecules when compared to the other “-omes” (i.e., genome, transcriptome). For this reason, the use of proper data reduction strategies is mandatory for proteomics data analysis. However, the strategies to be employed for feature selection must be carefully chosen, since many different approaches exist based on both input data and desired output. So far, a well-established decision-making workflow for proteomics data analysis is lacking, opening up to misleading and incorrect data analysis and interpretation. In this review article many statistical approaches are described and compared for their application in the field of biomedical research, in order to suggest the reader the most suitable analysis pathway and to avoid mistakes.

Statistical analysis of proteomics data: A review on feature selection

Lualdi M.;Fasano M.
2019-01-01

Abstract

The spread of “-omics” strategies has strongly changed the way of thinking about the scientific method. Indeed, managing huge amounts of data imposes the replacement of the classical deductive approach with a data-driven inductive approach, so to generate mechanistical hypotheses from data. Data reduction is a crucial step in the process of proteomics data analysis, because of the sparsity of significant features in big datasets. Thus, feature selection methods are applied to obtain a set of features based on which a proteomics signature can be drawn, with a functional significance (e.g., classification, diagnosis, prognosis). In this frame, the aim of the present review article is to give an overview of the methods available for proteomics data analysis, with a focus on biomedical translational research. Suggestions for the choice of the most appropriate standard statistical procedures are presented to perform data reduction by feature selection, cross-validation and functional analysis of proteomics profiles. Significance: The proteome, including all so-called “proteoforms” represents the highest level of complexity of biomolecules when compared to the other “-omes” (i.e., genome, transcriptome). For this reason, the use of proper data reduction strategies is mandatory for proteomics data analysis. However, the strategies to be employed for feature selection must be carefully chosen, since many different approaches exist based on both input data and desired output. So far, a well-established decision-making workflow for proteomics data analysis is lacking, opening up to misleading and incorrect data analysis and interpretation. In this review article many statistical approaches are described and compared for their application in the field of biomedical research, in order to suggest the reader the most suitable analysis pathway and to avoid mistakes.
2019
http://www.elsevier.com
Dimensionality and Sparsity; Feature selection; Inductive reasoning; Proteomics signature
Lualdi, M.; Fasano, M.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11383/2086784
 Attenzione

L'Ateneo sottopone a validazione solo i file PDF allegati

Citazioni
  • ???jsp.display-item.citation.pmc??? 14
  • Scopus 56
  • ???jsp.display-item.citation.isi??? 49
social impact