Journal Article Details
G3: Genes, Genomes, Genetics
OCMA: Fast, Memory-Efficient Factorization of Prohibitively Large Relationship Matrices
article
Zhi Xiong¹, Qingrun Zhang², Alexander Platt⁴, Wenyuan Liao⁵, Xinghua Shi⁶, Gustavo de los Campos⁷, Quan Long²
Affiliations: Department of Computer Science, Shantou University, China; Department of Biochemistry and Molecular Biology, University of Calgary, Canada; Annie Charbonneau Cancer Institute, University of Calgary, Canada; Department of Mathematics and Statistics, University of Calgary, Canada; Department of Medical Genetics, University of Calgary, Canada; Alberta Children’s Hospital Research Institute, University of Calgary, Canada; Center for Computational Genetics and Genomics, Temple University, USA; Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, USA; Department of Epidemiology & Biostatistics, Statistics & Probability and Institute for Quantitative Health Science and Engineering, Michigan State University, USA
Keywords: Eigen decomposition; Singular value decomposition; Genetic matrices; Memory virtualization; Gene mapping; Genotype-based phenotype prediction; Genomic selection
DOI: 10.1534/g3.118.200908
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Genetics Society of America
【 Abstract 】

Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative, a decomposition of the genotype matrix that computes a smaller number of top singular values) is a necessary step in many analyses, including estimation of SNP-heritability, Principal Component Analysis (PCA), and genomic prediction. However, the GRMs and genotype matrices produced by modern biobanks are too large to be held in active memory. To accommodate current and future “bigger data”, we develop a disk-based tool, the Out-of-Core Matrices Analyzer (OCMA), which uses state-of-the-art computational techniques to perform eigen-decomposition and Singular Value Decomposition (SVD) efficiently. By integrating memory mapping (mmap) with the latest matrix factorization libraries, OCMA is both fast and memory-efficient. To demonstrate its performance, we test OCMA on a personal computer. For full eigen-decomposition, it solves an ordinary GRM (N = 10,000) in 55 sec. For SVD, a commonly used faster alternative to full eigen-decomposition in genomic analyses, OCMA computes the top 200 singular values (SVs) of a very large genotype matrix (N = 1,000,000, M = 5,000) in half an hour, the top 2,000 SVs in 0.95 hr, and all 5,000 SVs in 1.77 hr on the same personal computer. OCMA also supports multi-threading when running on a desktop or an HPC cluster. OCMA can thus alleviate the computing bottleneck of classical analyses on large genomic matrices and make it possible to scale current and emerging analytical methods to big genomic data using lightweight computing resources.
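The abstract treats an SVD of the genotype matrix as a cheaper substitute for eigen-decomposing the GRM. A minimal sketch of why this works, assuming the common convention K = ZZᵀ/M, where Z is the N × M genotype matrix with each marker (column) centered and scaled: the nonzero eigenvalues of K equal s²/M for the singular values s of Z. The sketch stores Z on disk with NumPy's `np.memmap` and builds K from row blocks, which only loosely imitates the out-of-core idea. It is not OCMA's code or interface; the helper names (`standardize_genotypes`, `write_memmap`, `grm_from_disk`, `demo`), the block size, and the simulated genotypes are hypothetical.

```python
import os
import tempfile

import numpy as np


def standardize_genotypes(geno):
    """Center and scale each marker (column) of a 0/1/2 genotype matrix."""
    geno = np.asarray(geno, dtype=np.float64)
    mean = geno.mean(axis=0)
    std = geno.std(axis=0)
    std[std == 0] = 1.0  # monomorphic markers become all-zero columns
    return (geno - mean) / std


def write_memmap(array, path):
    """Write a float64 matrix to a raw binary file through np.memmap."""
    mm = np.memmap(path, dtype=np.float64, mode="w+", shape=array.shape)
    mm[:] = array
    mm.flush()
    del mm  # release the mapping


def grm_from_disk(path, n, m, block=64):
    """Build K = Z Z^T / M, reading Z from disk one row block at a time.

    Only two row blocks of Z are copied into RAM at once; in this
    simplified sketch the N x N result K is still held in memory.
    """
    z = np.memmap(path, dtype=np.float64, mode="r", shape=(n, m))
    k = np.empty((n, n))
    for i in range(0, n, block):
        zi = np.array(z[i:i + block])  # copy row block i into RAM
        for j in range(0, n, block):
            zj = np.array(z[j:j + block])  # copy row block j into RAM
            k[i:i + block, j:j + block] = zi @ zj.T / m
    del z
    return k


def demo(n=200, m=50, seed=0):
    """Return GRM eigenvalues (descending) and singular values of Z."""
    rng = np.random.default_rng(seed)
    geno = rng.binomial(2, 0.3, size=(n, m))  # simulated, hypothetical data
    z = standardize_genotypes(geno)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "genotypes.f64")
        write_memmap(z, path)
        k = grm_from_disk(path, n, m)
    evals = np.linalg.eigvalsh(k)[::-1]     # eigvalsh sorts ascending; flip
    s = np.linalg.svd(z, compute_uv=False)  # singular values, descending
    return evals, s


if __name__ == "__main__":
    evals, s = demo()
    m = s.size
    # Check that the top M eigenvalues of K match s^2 / M.
    print(np.allclose(evals[:m], s ** 2 / m))
```

Because K has rank at most M in this setting, the N × M SVD yields every nonzero eigenvalue without forming or factorizing the N × N matrix. That is why the SVD route scales better when N is much larger than M, as in the N = 1,000,000, M = 5,000 example.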

【 License 】

CC BY | CC BY-NC
