BMC Bioinformatics
BAMQL: a query language for extracting reads from BAM files
Software
Michael Fraser¹, Robert G. Bristow², Christopher M. Lalansingh³, Andre P. Masella³, Pragash Sivasundaram³, Paul C. Boutros⁴
Affiliations: Ontario Cancer Institute, Princess Margaret Cancer Centre/University Health Network, Toronto, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Canada; Department of Radiation Oncology, University of Toronto, Toronto, Canada; Ontario Institute for Cancer Research, Suite 510, 661 University Avenue, M5G 0A3, Toronto, Canada; Department of Pharmacology & Toxicology, University of Toronto, Toronto, Canada
Keywords: BAMQL; Query language; BAM-format
DOI: 10.1186/s12859-016-1162-y
Received 2016-04-08; accepted 2016-07-21; published 2016
Source: Springer
【 Abstract 】

Background: It is extremely common to need to select a subset of reads from a BAM file based on their specific properties. Typically, a user unpacks the BAM file to a text stream using SAMtools, parses and filters the lines using AWK, and then repacks them using SAMtools. This process is tedious and error-prone. In particular, when working with many columns of data, mix-ups are common, and the bit field containing the flags is unintuitive. There are several libraries for reading BAM files, such as Bio-SamTools for Perl and pysam for Python. Both allow access to the BAM file's read information and can filter reads, but they require substantial boilerplate code, which is a high overhead for what is mostly ad hoc filtering.

Results: We have created a query language that gathers reads using a collection of predicates and common logical connectives. Queries run faster than equivalent approaches and can be compiled to native code for embedding in larger programs.

Conclusions: BAMQL provides a user-friendly, powerful and performant way to extract subsets of BAM files for ad hoc analyses or for integration into applications. The query language provides a collection of predicates beyond those in SAMtools, along with more flexible connectives.
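Filtering reads by hand means juggling SAM columns and decoding the numeric FLAG bit field. The Python below is a minimal sketch of the general idea of selecting reads with predicates joined by logical connectives. It is not BAMQL's syntax or implementation, and it does not use the pysam API. It assumes uncompressed SAM text as input (for example, the output of `samtools view -h`). The FLAG bit values follow the SAM specification. The names `Read`, `has_flag`, `on_chrom`, `min_mapq`, `all_of`, `any_of`, `negate` and `select` are hypothetical, and the sample reads are toy data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Named FLAG bits from the SAM specification, so queries need not use raw numbers.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


@dataclass
class Read:
    """Hypothetical record holding the SAM fields this sketch filters on."""
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position; 0 when unmapped
    mapq: int


def parse_sam_line(line: str) -> Read:
    """Split one tab-separated SAM alignment line into a Read."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return Read(fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]))


Predicate = Callable[[Read], bool]


# Hypothetical predicate constructors (not BAMQL or pysam functions).
def has_flag(bit: int) -> Predicate:
    return lambda r: (r.flag & bit) != 0


def on_chrom(name: str) -> Predicate:
    return lambda r: r.rname == name


def min_mapq(q: int) -> Predicate:
    return lambda r: r.mapq >= q


# Logical connectives that combine predicates into a larger query.
def all_of(*preds: Predicate) -> Predicate:
    return lambda r: all(p(r) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda r: any(p(r) for p in preds)


def negate(pred: Predicate) -> Predicate:
    return lambda r: not pred(r)


def select(lines: Iterable[str], query: Predicate) -> Iterator[str]:
    """Yield header lines unchanged and alignment lines that satisfy the query."""
    for line in lines:
        if line.startswith("@") or query(parse_sam_line(line)):
            yield line


# Toy input: one header line and three alignments (flag 1107 includes DUPLICATE).
SAMPLE = [
    "@SQ\tSN:chr7\tLN:2000",
    "r1\t99\tchr7\t1000\t60\t10M\t=\t1200\t210\tACGTACGTAC\t*",
    "r2\t1107\tchr7\t1500\t60\t10M\t=\t1700\t210\tACGTACGTAC\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*",
]

if __name__ == "__main__":
    # Keep reads on chr7 with MAPQ >= 30 that are not flagged as duplicates.
    query = all_of(on_chrom("chr7"), min_mapq(30), negate(has_flag(DUPLICATE)))
    kept = [line.split("\t")[0] for line in select(SAMPLE, query) if not line.startswith("@")]
    print(kept)  # → ['r1']
```

Each predicate is a function from a read to a boolean, so the connectives stay generic and queries compose freely. According to the abstract, BAMQL goes further and can compile queries to native code. This interpreted sketch does not attempt that.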

【 License 】

CC BY   
© The Author(s) 2016
