Journal Article Details
BMC Bioinformatics
De novo assembly of highly polymorphic metagenomic data using in situ generated reference sequences and a novel BLAST-based assembly pipeline
Methodology Article
Chia-Hung Hsieh1  Jiun-Hong Chen2  You-Yu Lin3  Pei-Jer Chen4  Jia-Horng Kao4  Ding-Shinn Chen5  Hurng-Yi Wang6  Xuemei Lu7 
[1] Department of Forestry and Nature Conservation, Chinese Culture University, 111, Taipei, Taiwan;Department of Life Science, National Taiwan University, 106, Taipei, Taiwan;Department of Life Science, National Taiwan University, 106, Taipei, Taiwan;Graduate Institute of Clinical Medicine, National Taiwan University, 100, Taipei, Taiwan;Graduate Institute of Clinical Medicine, National Taiwan University, 100, Taipei, Taiwan;Graduate Institute of Clinical Medicine, National Taiwan University, 100, Taipei, Taiwan;Genomics Research Center, Academia Sinica, 115, Taipei, Taiwan;Graduate Institute of Clinical Medicine, National Taiwan University, 100, Taipei, Taiwan;Institute of Ecology and Evolutionary Biology, National Taiwan University, 106, Taipei, Taiwan;Research Center for Developmental Biology and Regenerative Medicine, National Taiwan University, 100, Taipei, Taiwan;Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, the Chinese Academy of Sciences, 100101, Beijing, China;
Keywords: Next generation sequencing; Metagenomics; Hepatitis B virus; Sequence assembly; Assembly pipeline
DOI  :  10.1186/s12859-017-1630-z
Received 2016-08-31, accepted 2017-04-12, published 2017
Source: Springer
【 Abstract 】

Background: The accuracy of metagenomic assembly is usually compromised by high levels of polymorphism, because divergent reads from the same genomic region are recognized as different loci when sequenced and assembled together. A viral quasispecies is a group of abundant, diversified, genetically related viruses found in a single carrier. Current mainstream assembly methods, such as Velvet and SOAPdenovo, were not originally intended for the assembly of such metagenomic data, so new methods are needed to provide accurate and informative assembly results for metagenomic data.

Results: In this study, we present a hybrid method for assembling highly polymorphic data that combines the partial de novo-reference assembly (PDR) strategy and the BLAST-based assembly pipeline (BBAP). The PDR strategy generates in situ reference sequences through de novo assembly of a randomly extracted partial data set; these sequences are subsequently used as the reference for reference assembly of the full data set. BBAP employs a greedy algorithm to assemble polymorphic reads. We used 12 hepatitis B virus (HBV) quasispecies NGS data sets from a previous study to assess and compare the performance of both PDR and BBAP. Analyses suggest that the high polymorphism of a full metagenomic data set leads to fragmented de novo assembly results, whereas the biased or limited representation of external reference sequences incorporated fewer reads into the assembly, with lower assembly accuracy and variation sensitivity. In comparison, the in situ reference sequence generated by PDR incorporated more reads into the final PDR assembly of the full metagenomic data set, with greater accuracy and higher variation sensitivity. BBAP assembly results also suggest higher assembly efficiency and accuracy compared to other assembly methods. Additionally, BBAP assembly recovered HBV structural variants that were not observed among the assembly results of other methods. Together, PDR/BBAP assembly results were significantly better than those of the other methods compared.

Conclusions: Both PDR and BBAP independently increased the assembly efficiency and accuracy of highly polymorphic data, and assembly performance improved further when the two were used together. BBAP also provides nucleotide frequency information. Together, PDR and BBAP provide powerful tools for metagenomic data studies.
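
The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: randomly extracting a partial data set (the first step of PDR) and reporting per-position nucleotide frequencies (the kind of information BBAP is said to provide). The function names `subsample_reads` and `nucleotide_frequencies`, the `fraction` and `seed` parameters, and the `(start, sequence)` read representation are all assumptions made for illustration; the de novo and reference assembly steps themselves are performed by external tools and are not shown.

```python
import random
from collections import Counter


def subsample_reads(reads, fraction, seed=0):
    """Randomly draw a fraction of the reads without replacement.

    Hypothetical helper: the abstract only says a partial data set is
    "randomly extracted"; the fraction and seed are illustrative.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(reads) * fraction))
    return random.Random(seed).sample(reads, k)


def nucleotide_frequencies(aligned_reads):
    """Per-position base frequencies for reads already placed on a reference.

    Each item is (start, sequence): a 0-based reference offset and the read
    bases, assumed gap-free. This is an illustration, not BBAP's own method.
    """
    columns = {}
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns.setdefault(start + offset, Counter())[base] += 1
    return {
        pos: {base: n / sum(counts.values()) for base, n in counts.items()}
        for pos, counts in sorted(columns.items())
    }
```

In a PDR-style workflow, the subsample would be passed to a de novo assembler to produce the in situ reference, and the full read set would then be mapped against that reference before any frequency table is computed.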

【 License 】

CC BY   
© The Author(s). 2017
