Biology Direct
A new rhesus macaque assembly and annotation for next-generation sequencing analyses
Aleksey V Zimin [4], Adam S Cornish [7], Mnirnal D Maudhoo [7], Robert M Gibbs [7], Xiongfei Zhang [7], Sanjit Pandey [7], Daniel T Meehan [7], Kristin Wipfler [7], Steven E Bosinger [2], Zachary P Johnson [2], Gregory K Tharp [2], Guillaume Marçais [4], Michael Roberts [4], Betsy Ferguson [1], Howard S Fox [5], Todd Treangen [3], Steven L Salzberg [6], James A Yorke [4], Robert B Norgren [7]
[1] Division of Neurosciences, Primate Genetics Program, Oregon National Primate Research Center, Oregon Health & Sciences University, Beaverton, Oregon 97006, USA
[2] Non-Human Primate Genomics Core, Yerkes National Primate Research Center, Robert W. Woodruff Health Sciences Center, Emory University, Atlanta, Georgia 30322, USA
[3] Current affiliation: National Biodefense Analysis and Countermeasures Center, Frederick, MD 21702, USA
[4] Institute for Physical Science and Technology, University of Maryland, College Park, Maryland 20742, USA
[5] Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA
[6] Center for Computational Biology and Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
[7] Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, Nebraska 68198, USA
Keywords: Next-generation sequencing; Transcriptome; Annotation; Assembly; Genome; Rhesus macaque; Macaca mulatta
Others: 1084097; DOI: 10.1186/1745-6150-9-20
Received 2014-07-18, accepted 2014-10-03, published 2014
【 Abstract 】
Background
The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2, with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High-quality assembly and annotation files are required for a wide range of studies, including expression, genetic and evolutionary analyses.
Results
We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice that of the rheMac2 assembly and almost five times that of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts, which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy compared with the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.
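For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the summed contig length. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the MacaM, rheMac2 or CR_1.0 assemblies, and this is not the authors' own code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Common definition (assumed here): the length L such that contigs of
    length >= L together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up contig lengths (in kilobases).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because the statistic depends on the total assembled length, N50 values are most directly comparable between assemblies of the same genome, as is the case for the three assemblies discussed here.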
Conclusions
The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.
Reviewers
This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
【 License 】
2014 Zimin et al.; licensee BioMed Central Ltd.