Journal Article Details
BMC Bioinformatics
sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs
Affiliations: Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 60 Murray Street, M5T 3L9, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, M5T 3L9, Toronto, Canada (0000 0001 2157 2938, grid.17063.33); Department of Statistical Sciences, University of Toronto, M5S 3G3, Toronto, Canada (0000 0001 2157 2938, grid.17063.33)
Keywords: Simulation; Sequencing; NGS; 1000 genomes; Linkage disequilibrium; Pedigree data
DOI: 10.1186/s12859-019-2611-1
Source: publisher
【 Abstract 】

Background: Simulation of genetic variant data is frequently required to evaluate statistical methods in human and animal genetics. Although a number of high-quality genetic simulators have been developed, many of them require advanced knowledge of population genetics or computation to be used effectively. In addition, generating simulated data for family-based studies demands sophisticated methods and advanced computer programming.

Results: To address these issues, we propose a new user-friendly and integrated R package, sim1000G, which simulates variants in genomic regions among unrelated individuals or among families. The only input needed is a raw phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated genomic regions and to generate new genotype data among unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary size are generated by modeling recombination events with sim1000G. To illustrate the application of sim1000G, various scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively three-generation pedigree data. Sim1000G can capture allele frequency diversity, short- and long-range LD patterns, and subtle population differences in LD structure without the need for any tuning parameters.

Conclusion: Sim1000G fills a gap in the vast area of genetic variant simulators through its simplicity and independence from external tools. Currently, it is one of the few simulation packages completely integrated into R and able to simulate multiple genetic variants among unrelated individuals and within families. Its implementation will facilitate the application and development of computational methods for association studies with both rare and common variants.
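As an illustration of the haplotype-based LD computation mentioned in the Results, the sketch below computes the pairwise squared correlation r² between variants from a small set of phased haplotypes. It is a minimal sketch under stated assumptions, not the sim1000G implementation: the abstract does not say which LD measure the package uses, r² is simply a common choice, and the toy data and the function names (`allele_frequency`, `ld_r2`, `pairwise_ld`) are hypothetical.

```python
from itertools import combinations

# Toy phased haplotypes (assumption, not real data):
# one row per haplotype, one column per variant, 1 = alternate allele.
HAPLOTYPES = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]


def allele_frequency(haplotypes, j):
    """Frequency of the alternate allele (coded 1) at variant j."""
    return sum(h[j] for h in haplotypes) / len(haplotypes)


def ld_r2(haplotypes, i, j):
    """Squared correlation r^2 between variants i and j across haplotypes."""
    n = len(haplotypes)
    p_i = allele_frequency(haplotypes, i)
    p_j = allele_frequency(haplotypes, j)
    # Frequency of haplotypes carrying the alternate allele at both variants.
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    d = p_ij - p_i * p_j  # haplotype covariance D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return float("nan")  # r^2 is undefined for a monomorphic variant
    return d * d / denom


def pairwise_ld(haplotypes):
    """Map each variant pair (i, j), i < j, to its r^2 value."""
    n_variants = len(haplotypes[0])
    return {
        (i, j): ld_r2(haplotypes, i, j)
        for i, j in combinations(range(n_variants), 2)
    }


if __name__ == "__main__":
    for pair, r2 in pairwise_ld(HAPLOTYPES).items():
        print(pair, r2)
```

The numerator uses the covariance D between the two allele indicators, the same kind of between-variant covariance the abstract says is used to preserve the LD structure of the original population.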

【 License 】

CC BY   
