Journal article details
G3: Genes, Genomes, Genetics
Fast Ordered Sampling of DNA Sequence Variants
Anthony J. Greenberg^1
[1] Bayesic Research, Ithaca, NY 14850
Keywords: nucleotide polymorphism; random sampling; statistical genetics; genomics; C++
DOI: 10.1534/g3.117.300465
Subject classification: Biological sciences (general)
Source: Genetics Society of America
【 Abstract 】

Explosive growth in the amount of genomic data is matched by increasing power of consumer-grade computers. Even applications that require powerful servers can be quickly tested on desktop or laptop machines if we can generate representative samples from large data sets. I describe a fast and memory-efficient implementation of an on-line sampling method developed for tape drives 30 years ago. Focusing on genotype files, I test the performance of this technique on modern solid-state and spinning hard drives, and show that it performs well compared to a simple sampling scheme. I illustrate its utility by developing a method to quickly estimate genome-wide patterns of linkage disequilibrium (LD) decay with distance. I provide open-source software that samples loci from several variant format files, a separate program that performs LD decay estimates, and a C++ library that lets developers incorporate these methods into their own projects.
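
The abstract does not spell out the sampling algorithm, so the following is only a minimal sketch of the general idea of ordered (sequential) sampling. It uses Knuth's selection sampling (Algorithm S) as a simple stand-in and is not the method implemented in the article's software. The function name `orderedSample` and its parameters are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of the article's library.
// Returns n distinct record indices from [0, N) in increasing order,
// with every size-n subset equally likely (selection sampling).
std::vector<std::size_t> orderedSample(std::size_t N, std::size_t n,
                                       std::mt19937_64 &rng) {
    assert(n <= N); // cannot sample more records than exist
    std::vector<std::size_t> picked;
    picked.reserve(n);
    std::size_t needed = n; // records still to be selected
    for (std::size_t i = 0; i < N && needed > 0; ++i) {
        // Select record i with probability needed / (N - i),
        // where N - i is the number of records not yet examined.
        std::uniform_int_distribution<std::size_t> draw(0, N - i - 1);
        if (draw(rng) < needed) {
            picked.push_back(i);
            --needed;
        }
    }
    return picked;
}
```

Because the indices come out already sorted, a genotype file could be read in a single forward pass, skipping the records that were not selected. This sketch spends one random draw per record examined; schemes that draw the skip length directly avoid that cost.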

【 License 】

CC BY   
