Journal article details
BMC Genomics
GASS: genome structural annotation for Eukaryotes based on species similarity
Xiaoye Lei [1], Nianfeng Song [1], Lina Chen [1], Ying Wang [1]
[1] Department of Automation, School of Information Science and Technology, Xiamen University, Xiamen 361005, Fujian, China
Keywords: Rhesus genome; Dynamic programming; Species similarity; Computational method; Structural genome annotation
Others  :  1135431
DOI  :  10.1186/s12864-015-1353-3
Received 2014-07-16, accepted 2015-02-18, published 2015
【 Abstract 】

Background

With the development of high-throughput sequencing techniques, more and more genomes have been sequenced and assembled. However, annotating a genome’s structure rapidly remains challenging. Current eukaryotic genome annotation requires abundant and varied supporting data, such as species-specific and cross-species protein sequences, ESTs, cDNA and RNA-Seq data. Collecting these data and merging their analytical results into a consistent, complete annotation is a complex, time-consuming and costly task.

Results

In this study, we propose GASS (Genome Annotation based on Species Similarity), a fast and easy-to-use computational tool. It annotates a eukaryotic genome using only the annotations of a similar species. By aligning the exon sequences of an annotated similar species to the un-annotated genome, GASS identifies the optimal transcript annotations with a shortest-path model. We used GASS to produce rhesus annotations from the human annotations. The resulting annotations were evaluated by comparing them directly with the two existing rhesus annotation databases (RefSeq and Ensembl) and by aligning them with three rhesus RNA-Seq datasets. The experimental results showed that GASS recovered more than 65% of RefSeq exons and splicing junctions exactly. GASS’s sensitivity was higher than RefSeq’s and close to Ensembl’s. GASS had higher specificity than Ensembl at the gene, transcript, exon and splicing junction levels. We also found mis-assemblies in the rheMac3 genome that caused 2 bp shifts in annotated exon boundaries and, consequently, incomplete canonical splice sites in RefSeq annotations. These findings were further supported by various data sources.
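
The abstract does not spell out the shortest-path model, so the following is only a minimal sketch of the general idea, not GASS's actual implementation. It assumes each reference exon has already been aligned to the target genome, yielding candidate hits with a cost, and it picks the cheapest colinear chain of hits by dynamic programming over a directed acyclic graph. The `Hit` type, the `best_chain` function, the `skip_penalty` parameter and the plus-strand-only coordinates are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    exon: int     # index of the reference exon within its transcript
    start: int    # start on the target genome (0-based, inclusive)
    end: int      # end on the target genome (exclusive)
    cost: float   # alignment cost, lower is better (hypothetical scoring)


def best_chain(hits, n_exons, skip_penalty=10.0):
    """Return (chain, total_cost) for the cheapest colinear chain of hits.

    Nodes are hits; an edge p -> h exists when p lies entirely before h on
    the genome and p.exon < h.exon. Every reference exon left without a hit
    adds skip_penalty (an assumed, illustrative cost).
    """
    if not hits:
        return [], n_exons * skip_penalty
    order = sorted(hits, key=lambda h: (h.start, h.end))
    best = []  # best[i] = (cost of cheapest chain ending at order[i], predecessor)
    for i, h in enumerate(order):
        # Option 1: start the chain here, paying for exons skipped before it.
        cost, prev = h.exon * skip_penalty + h.cost, None
        # Option 2: extend a cheaper chain that ends at a compatible earlier hit.
        for j in range(i):
            p = order[j]
            if p.end <= h.start and p.exon < h.exon:
                skipped = h.exon - p.exon - 1
                cand = best[j][0] + skipped * skip_penalty + h.cost
                if cand < cost:
                    cost, prev = cand, j
        best.append((cost, prev))

    def closed(i):
        # Cost of ending the chain at order[i], paying for trailing exons.
        return best[i][0] + (n_exons - 1 - order[i].exon) * skip_penalty

    end_i = min(range(len(order)), key=closed)
    total = closed(end_i)
    chain, i = [], end_i
    while i is not None:  # walk predecessors back to the chain start
        chain.append(order[i])
        i = best[i][1]
    return chain[::-1], total


# Three exons; exon 0 also has a spurious, better-scoring hit far downstream.
hits = [
    Hit(exon=0, start=100, end=200, cost=1.0),
    Hit(exon=0, start=5000, end=5100, cost=0.0),
    Hit(exon=1, start=300, end=400, cost=2.0),
    Hit(exon=2, start=600, end=700, cost=0.5),
]
chain, total = best_chain(hits, n_exons=3)
print([(h.exon, h.start, h.end) for h in chain], total)
```

In this toy input the downstream hit for exon 0 has the best individual score, but the chain through the three colinear hits is cheaper overall because it skips no exons. That is the kind of global consistency a path-based formulation provides over picking the best hit per exon independently.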

Conclusions

GASS quickly produces structural genome annotations in sufficient quantity and with sufficient accuracy. Because GASS is simple and fast to run, small labs can obtain a quick view of genome annotations for an un-annotated species without having to create, collect, analyze and synthesize various additional data sources, or wait several months for annotations from professional organizations. GASS can be applied in many settings, such as the analysis of RNA-Seq datasets from species whose draft genomes are available but whose annotations are not.

【 License 】

   
2015 Wang et al.; licensee BioMed Central.
