Journal Article Details
G3: Genes, Genomes, Genetics
BGData - A Suite of R Packages for Genomic Analysis with Big Data
article
Alexander Grueneberg [1], Gustavo de los Campos [1]
[1] Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824; Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824; Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824
Keywords: big data; parallel computing; distributed computing; genetic analyses; biobank
DOI: 10.1534/g3.119.400018
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Genetics Society of America
【 Abstract 】

We created a suite of packages to enable analysis of extremely large genomic data sets (potentially millions of individuals and millions of molecular markers) within the R environment. The suite offers: a matrix-like interface for .bed files (PLINK’s binary format for genotype data); a novel class of linked arrays that links data stored in multiple files into a single array accessible from the R computing environment; parallel computing methods that carry out computations on very large data sets without loading the entire data set into memory; and a basic set of methods for statistical genetic analyses. The packages are accessible through CRAN and GitHub. In this note, we describe the classes and methods implemented in each of the packages that make up the suite and illustrate their use with data from the UK Biobank.
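The linked-array and chunked-computation ideas named in the abstract can be illustrated with a small conceptual sketch. It is written in Python rather than R and is not the API of the packages described here: the class `ColumnLinkedArray`, the function `chunked_column_apply`, and the toy genotype blocks are hypothetical names and data introduced only for illustration. The sketch assumes genotypes coded as 0, 1, or 2 copies of an allele, with rows as individuals and columns as markers.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence

Matrix = List[List[int]]  # row-major block: rows = individuals, columns = markers


class ColumnLinkedArray:
    """Hypothetical sketch: present several column blocks as one virtual matrix."""

    def __init__(self, blocks: Sequence[Matrix]) -> None:
        if not blocks:
            raise ValueError("at least one block is required")
        n_rows = len(blocks[0])
        if any(len(b) != n_rows for b in blocks):
            raise ValueError("all blocks must have the same number of rows")
        self.blocks = list(blocks)
        self.n_rows = n_rows
        # offsets[i] is the global index of the first column of block i
        self.offsets = [0]
        for b in self.blocks:
            width = len(b[0]) if b else 0
            self.offsets.append(self.offsets[-1] + width)
        self.n_cols = self.offsets[-1]

    def column(self, j: int) -> List[int]:
        """Return global column j by locating its block and local index."""
        if not 0 <= j < self.n_cols:
            raise IndexError(j)
        k = bisect_right(self.offsets, j) - 1
        local = j - self.offsets[k]
        return [row[local] for row in self.blocks[k]]


def chunked_column_apply(
    x: ColumnLinkedArray,
    fun: Callable[[List[int]], float],
    chunk_size: int = 2,
) -> List[float]:
    """Apply fun to every column, materializing only chunk_size columns at a time."""
    results: List[float] = []
    for start in range(0, x.n_cols, chunk_size):
        stop = min(start + chunk_size, x.n_cols)
        chunk = [x.column(j) for j in range(start, stop)]
        results.extend(fun(col) for col in chunk)
    return results


def allele_frequency(genotypes: List[int]) -> float:
    """Frequency of the counted allele for diploid genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


# Toy data: two blocks (for example, one per chromosome file), 3 individuals each.
block_a: Matrix = [[0, 2], [1, 2], [2, 2]]
block_b: Matrix = [[2, 0, 1], [0, 0, 1], [1, 0, 1]]

linked = ColumnLinkedArray([block_a, block_b])
freqs = chunked_column_apply(linked, allele_frequency, chunk_size=2)
print(linked.n_rows, linked.n_cols, freqs)
```

In this sketch the blocks are in-memory lists; in a file-backed setting each block would instead read its columns from disk on demand, which is what would keep memory use bounded by the chunk size rather than by the full data set.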

【 License 】

CC BY | CC BY-NC
