Frontiers in Genetics
Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data
Eve Syrkin Wurtele [1], Urminder Singh [1], Zebulun Arendsee [1], Jing Li [2]
[1] Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, United States; Center for Metabolic Biology, Iowa State University, Ames, IA, United States; Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States; Genetics and Genomics Graduate Program, Iowa State University, Ames, IA, United States
Keywords: orphan gene; de novo; RNA-Seq; Ribo-seq; gene function; cluster analysis
DOI: 10.3389/fgene.2021.722981
Source: DOAJ
[Abstract]
The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.
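The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "Markov Chain Clustering" refers to the standard Markov Cluster (MCL) procedure of alternating expansion and inflation on a column-stochastic matrix. The abstract does not state the inflation parameter, the correlation threshold, or the software used, so the toy adjacency matrix, the function names, and the default `inflation=2.0` below are all illustrative assumptions rather than the authors' settings.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product; adequate for a small toy example."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Sketch of Markov Cluster (MCL) on a symmetric adjacency matrix.

    `inflation`, `max_iter`, and `tol` are assumed values, not taken
    from the paper.
    """
    n = len(adjacency)
    # Add self-loops, then convert to a column-stochastic matrix.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        # Inflation: element-wise power, then re-normalize columns.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains mass defines a cluster of its non-zero columns.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy co-expression graph: an edge (1) means two transcripts are
# co-expressed above some threshold. Features 0-2 and 3-5 form two groups.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
print(clusters)
```

In a real analysis the adjacency matrix would be derived from pairwise expression correlations across the RNA-Seq samples, and a sparse-matrix implementation would be needed at the scale of tens of thousands of transcripts.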
[License]
Unknown