Journal Article Details
BMC Bioinformatics
TreeToReads - a pipeline for simulating raw reads from phylogenies
Software
Steven Davis [1], Ruth E. Timme [1], Errol Strain [1], James Pettengill [1], Hugh Rand [1], Marc Allard [1], Emily Jane McTavish [2]
[1] Center for Food Safety and Nutrition, Food and Drug Administration, College Park, MD, USA; University of California, Merced, Merced, CA, USA; University of Kansas, Lawrence, KS, USA
Keywords: Genomics; Phylogenetics; Simulation
DOI: 10.1186/s12859-017-1592-1
Received 2016-06-11; accepted 2017-03-10; published 2017
Source: Springer
【 Abstract 】

Background: Using phylogenomic analysis tools for tracking pathogens has become standard practice in academia, public health agencies, and large industries. Starting from the same raw read genomic data, several different approaches are used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole-genome alignment, and others. Each has advantages and disadvantages: some have been extensively validated, some are faster, and some have higher resolution. A few of these analysis approaches are well integrated into the regulatory process of US federal agencies (e.g., the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring how the multiple parameter values in each pipeline affect whether the correct phylogenetic tree is recovered.

Results: To resolve this problem, we offer a program, TreeToReads, which can generate raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of reference genome, etc.) to assess the effects of various parameter values on correctly calling SNPs and reconstructing an accurate tree.

Conclusions: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.
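The general idea described in the Results (mutate a genome along a known tree, then draw reads from each tip) can be illustrated with a toy sketch. This is not the TreeToReads implementation and does not use its interface; the tree encoding, the function names (`mutate`, `evolve`, `sample_reads`, `simulate`), and all parameter values are assumptions made for illustration. It also simplifies heavily: substitutions only (no indels), uniform substitution probabilities, and error-free reads.

```python
import random

BASES = "ACGT"

# Hypothetical toy tree: an internal node is a list of (child, branch_length)
# pairs and a leaf is a tip name. Branch lengths are per-site substitution
# probabilities (an assumption of this sketch).
TREE = [([("A", 0.01), ("B", 0.01)], 0.02), ("C", 0.03)]


def mutate(seq, branch_length, rng):
    """Substitute each site independently with probability branch_length."""
    out = []
    for base in seq:
        if rng.random() < branch_length:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def evolve(node, seq, rng, tips):
    """Walk the tree from the root, mutating the sequence along each branch."""
    if isinstance(node, str):
        tips[node] = seq  # reached a tip: record its simulated genome
        return
    for child, branch_length in node:
        evolve(child, mutate(seq, branch_length, rng), rng, tips)


def sample_reads(genome, read_length, coverage, rng):
    """Draw error-free reads at uniform random positions for a target coverage."""
    n_reads = coverage * len(genome) // read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        reads.append(genome[start:start + read_length])
    return reads


def simulate(tree, genome_length=2000, read_length=100, coverage=10, seed=1):
    """Return the root genome, the tip genomes, and the reads for each tip."""
    rng = random.Random(seed)
    root = "".join(rng.choice(BASES) for _ in range(genome_length))
    tips = {}
    evolve(tree, root, rng, tips)
    reads = {
        name: sample_reads(genome, read_length, coverage, rng)
        for name, genome in tips.items()
    }
    return root, tips, reads
```

Because the tree and the tip genomes are known, reads produced this way could be fed to an analysis pipeline and the inferred SNPs and tree compared against the truth while varying one parameter (for example `coverage` or the branch lengths) at a time.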

【 License 】

CC BY   
© The Author(s) 2017
