Overlapping genes and the proteins they encode differ significantly in their sequence composition from non-overlapping genes

Vianelli, Alberto (co-first author; Writing – Original Draft Preparation);
Chirico, Nicola (co-first author; Software)
2018-01-01

Abstract

Overlapping genes represent a fascinating evolutionary puzzle, since they encode two functionally unrelated proteins from the same DNA sequence. They originate by a mechanism of overprinting, in which point mutations in an existing frame allow the expression (the "birth") of a completely new protein from a second frame. In viruses, in which overlapping genes are abundant, these new proteins often play a critical role in infection, yet they are frequently overlooked during genome annotation. This results in erroneous interpretation of mutational studies and in a significant waste of resources. Therefore, overlapping genes need to be correctly detected, especially since they are now thought to be abundant in eukaryotes as well. Developing better detection methods and conducting systematic evolutionary studies require a large, reliable benchmark dataset of known cases. We thus assembled a high-quality dataset of 80 viral overlapping genes whose expression is experimentally proven. Many of them were not present in databases. We found that, overall, overlapping genes differ significantly from non-overlapping genes in their nucleotide and amino acid composition. In particular, the proteins they encode are enriched in high-degeneracy amino acids and depleted in low-degeneracy ones, which may alleviate the evolutionary constraints acting on overlapping genes. Principal component analysis revealed that the vast majority of overlapping genes follow a similar composition bias, despite their heterogeneity in length and function. Six proven mammalian overlapping genes also followed this bias. We propose that this apparently near-universal composition bias may favour the birth of overlapping genes, result from selection pressure acting on them, or both.
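
The notion of amino acid degeneracy used in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The codon counts per amino acid are those of the standard genetic code; the function name `degeneracy_profile` and the cut-offs (6 codons counted as "high", 2 or fewer as "low") are assumptions made for illustration only, and the paper may define the classes differently.

```python
from collections import Counter

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def degeneracy_profile(protein, high=6, low=2):
    """Return the fractions of high- and low-degeneracy residues.

    The thresholds `high` and `low` are illustrative assumptions,
    not values taken from the paper.
    """
    # Ignore stop symbols, gaps, and ambiguous residue codes.
    counts = Counter(aa for aa in protein.upper() if aa in CODON_DEGENERACY)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    high_n = sum(n for aa, n in counts.items() if CODON_DEGENERACY[aa] >= high)
    low_n = sum(n for aa, n in counts.items() if CODON_DEGENERACY[aa] <= low)
    return {"high": high_n / total, "low": low_n / total}
```

Comparing such profiles between the proteins encoded by overlapping genes and those encoded by non-overlapping genes of the same genome would give a rough view of the enrichment described above. Testing it statistically, as the abstract reports, would require a proper test across the whole dataset.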
2018
https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0202513&type=printable
Dataset; Genome composition; Multivariate statistics; Virus evolution
Pavesi, Angelo; Vianelli, Alberto; Chirico, Nicola; Bao, Yiming; Blinkova, Olga; Belshaw, Robert; Firth, Andrew; Karlin, David
Files in this item:
File; Description; Type; Access; Licence; Size; Format
journal.pone.0202513.pdf; Main article; Publisher's version (PDF); open access; Creative Commons; 1.43 MB; Adobe PDF
Overlapping genes and the proteins they encode differ in sequence composition_SUPPL MAT.pdf; Supplementary material; Other attached material; open access; Creative Commons; 461.62 kB; Adobe PDF
Overlapping genes and the proteins they encode differ in sequence_S1 DATASET OF OVERLAPS.xls; Dataset; Other attached material; open access; Creative Commons; 1.01 MB; Microsoft Excel


Use this identifier to cite or link to this document: https://hdl.handle.net/11383/2074688
Citations
  • PMC 22
  • Scopus 36
  • Web of Science (ISI) 37