Journal Article Details
BMC Bioinformatics
A composite genome approach to identify phylogenetically informative data from next-generation sequencing
Rachel S. Schwartz [2], Kelly M. Harkins [4], Anne C. Stone [3], Reed A. Cartwright [1]
[1] School of Life Sciences, Arizona State University, Tempe, AZ, USA
[2] The Biodesign Institute, Arizona State University, Tempe, AZ, USA
[3] School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
[4] Department of Anthropology, University of California – Santa Cruz, Santa Cruz, CA, USA
Keywords: Mammals; Apes; Next-generation sequencing; Phylogenetics
Others: 1232174
DOI: 10.1186/s12859-015-0632-y
Received 2014-12-09; accepted 2015-05-29; published 2015
【 Abstract 】

Background

Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole-genome assembly, multiple genome alignment, and annotation.

Results

In simulations, SISRS identified large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers, which we then used to estimate the phylogeny of placental mammals. Using datasets with different levels of missing data, we recovered eight phylogenies that resolved the basal relationships among mammals. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
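The idea of selecting variable sites under a missing-data threshold can be illustrated with a minimal sketch. This is not the SISRS implementation: the function name `variable_sites`, the `max_missing` parameter, the set of missing-data symbols, and the toy alignment are all assumptions made for illustration. It assumes the sequences are already aligned to equal length.

```python
from collections import Counter

# Symbols treated as "no data" for a taxon at a site (an assumption of this sketch).
MISSING = {"N", "-", "?"}


def variable_sites(alignment, max_missing=0):
    """Return indices of columns that are variable among taxa with data
    and have at most `max_missing` taxa lacking data.

    `alignment` maps a taxon name to an aligned sequence (hypothetical input format).
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    kept = []
    for i in range(length):
        column = [s[i].upper() for s in seqs]
        # Count each base observed among taxa that have data at this site.
        counts = Counter(b for b in column if b not in MISSING)
        n_missing = len(column) - sum(counts.values())
        # Keep the site if enough taxa have data and at least two bases occur.
        if n_missing <= max_missing and len(counts) >= 2:
            kept.append(i)
    return kept


# Toy alignment, invented for illustration only.
toy = {
    "human":   "ACGTAC",
    "chimp":   "ACGTGT",
    "gorilla": "ACCTNT",
    "orang":   "ATCT-C",
}

strict = variable_sites(toy)                  # no missing data allowed
relaxed = variable_sites(toy, max_missing=2)  # up to two taxa may lack data
```

Raising `max_missing` admits more sites at the cost of sparser columns, which is one way to build datasets with different levels of missing data.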

Conclusions

SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.

【 License 】

2015 Schwartz et al.
