Journal Article Details
BMC Bioinformatics
FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data
Software
Douglas Easton [1], Joe Dennis [1], James E. Dinulos [2], Ivan Gorlov [2], Jinyoung Byun [2], Xiangjun Xiao [2], Olivier Cornelis [2], Yafang Li [2], Christopher I. Amos [2], Younghun Han [2], Guoshuai Cai [3], Michael F. Seldin [4]
[1] Centre for Cancer Genetic Epidemiology, Cambridge University, Cambridge, UK
[2] Department of Biomedical Data Science, Dartmouth College, Hanover, NH, USA
[3] Department of Genetics, Dartmouth College, Hanover, NH, USA
[4] Rowe Program in Genetics, U.C. Davis, Davis, CA, USA
Keywords: Population structure; Principal component; Ancestry; Genome-wide association study
DOI  :  10.1186/s12859-016-0965-1
Received 2015-07-14, accepted 2016-02-22, published 2016
Source: Springer
【 Abstract 】

Background: Identifying subpopulations within a study and inferring the intercontinental ancestry of the samples are important steps in genome-wide association studies. Two software packages are widely used to analyze substructure: Structure and Eigenstrat. Structure assigns each individual to a population using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and cannot create scores that could be used as covariates. Eigenstrat uses principal component analysis to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the eigenvectors generated by Eigenstrat are sample specific and thus cannot be generalized to other individuals.

Results: We developed FastPop, an efficient R package that fills the gap between Structure and Eigenstrat. It can (1) generate PCA scores that identify ancestral origins and can be used across multiple studies, and (2) infer ancestry information for data arising from two or more intercontinental origins. We demonstrate FastPop using 2318 SNP markers selected from the genome for their high variability among European, Asian and West African (African) populations. We analyzed 505 HapMap samples with European, African or Asian ancestry along with 19661 additional samples of unknown ancestry. The results from FastPop are highly consistent with those obtained by Structure across the 19661 samples: the correlations between the FastPop and Structure results are 0.99, 0.97 and 0.99 for European, African and Asian ancestry scores, respectively. FastPop is also more efficient: it finished ancestry inference for the 19661 samples in 16 min, compared with the 21–24 h required by Structure. FastPop also provides scores based on SNP weights, so the scores of the reference population can be applied to other studies provided the same set of markers is used. We also present an application of the method to four continental populations (European, Asian, African, and Native American).

Conclusions: We developed an algorithm that can infer ancestries for data involving two or more intercontinental origins. It is efficient for analyzing large datasets. Additionally, the PCA-derived scores can be applied to multiple data sets to ensure that the same ancestry analysis is applied to all studies.
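
The idea of reusable SNP weights can be illustrated with a small, dependency-free sketch. It is not the FastPop implementation (FastPop is an R package and its internals are not described in this abstract); the function names `simulate`, `fit_snp_weights` and `project`, the allele frequencies, and the use of a single principal component with two simulated reference populations are all assumptions made for illustration. The sketch estimates per-SNP means and first-component loadings from reference genotypes, then applies those fixed weights to new samples genotyped on the same markers.

```python
import random

def simulate(freqs, n, rng):
    """Draw n genotype vectors (0/1/2 allele counts) from per-SNP allele frequencies."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]

def fit_snp_weights(ref, iters=50):
    """Estimate per-SNP means and PC1 loadings from reference genotypes.

    Uses power iteration on X^T X, where X is the column-centered matrix.
    """
    n, m = len(ref), len(ref[0])
    means = [sum(row[j] for row in ref) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in ref]
    w = centered[0][:]  # start from one centered sample (assumed not orthogonal to PC1)
    for _ in range(iters):
        scores = [sum(x * wj for x, wj in zip(row, w)) for row in centered]      # X w
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(m)]  # X^T (X w)
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return means, w

def project(samples, means, weights):
    """Score samples with fixed reference means and SNP weights (same markers required)."""
    return [sum((g - mu) * w for g, mu, w in zip(row, means, weights)) for row in samples]

# Hypothetical reference panel: two populations with strongly differentiated markers.
rng = random.Random(0)
freq_a = [0.9] * 10 + [0.1] * 10
freq_b = [0.1] * 10 + [0.9] * 10
ref_a = simulate(freq_a, 30, rng)
ref_b = simulate(freq_b, 30, rng)

means, weights = fit_snp_weights(ref_a + ref_b)
centroid_a = sum(project(ref_a, means, weights)) / len(ref_a)
centroid_b = sum(project(ref_b, means, weights)) / len(ref_b)

# New samples of "unknown" ancestry (here drawn from population A) are scored
# with the stored weights; the PCA is not recomputed.
new_scores = project(simulate(freq_a, 5, rng), means, weights)
closest = ["A" if abs(s - centroid_a) < abs(s - centroid_b) else "B" for s in new_scores]
print(closest)
```

Because the means and weights are fixed once from the reference panel, scores computed for different studies land on the same axis, which is the property the abstract attributes to SNP-weight-based scores. Distinguishing three or more populations would need additional components, which this sketch omits.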

【 License 】

CC BY   
© Li et al. 2016
