Journal Article Details
G3: Genes, Genomes, Genetics
Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty
Naoya Hosono [3], Jonathan Sebat [1], Seungtai Yoon [4], Tatsuhiko Tsunoda [3], Michael Q. Zhang [2], Anthony Leotta [4], Mamoru Kato [4]
[1] Department of Psychiatry, University of California, San Diego, La Jolla, CA 92093
[2] Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724; Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
[3] Center for Genomic Medicine, RIKEN, Yokohama, Kanagawa 230-0045, Japan
[4] Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
Keywords: copy number variation; EM algorithm; haplotype inference; phasing
DOI: 10.1534/g3.111.000174
Subject classification: Biological Sciences (General)
Source: Genetics Society of America
【 Abstract 】

Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses. However, microarrays do not provide data on the haplotype or diplotype at a copy number variation (CNV) locus. They provide only the total number of copies over a diplotype, or an unphased sequence genotype (e.g., AAB, as opposed to the AB genotype of a single nucleotide polymorphism). Moreover, these copy numbers or genotypes are often called incorrectly when noise prevents the microarray signal intensities derived from different copy numbers or genotypes from separating clearly. Here we report an algorithm that infers CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, using the probability that a given signal intensity was derived from each of the possible underlying copy numbers or genotypes. In simulation studies based on known diplotypes and an error model obtained from real microarray data, we show that this probabilistic approach achieves accurate inference from noisy data (error rate: 1–2%), whereas previous deterministic approaches fail (error rate: 12–18%). Applying the algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
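
The abstract describes the method only at a high level. As an illustration, the sketch below shows the core idea of an EM algorithm that takes per-individual likelihoods P(signal | total copy number) instead of hard copy-number calls. It works at a single CNV locus and makes several simplifying assumptions: Hardy-Weinberg equilibrium, haplotypes that are characterized only by their copy number, and likelihoods supplied as Python dictionaries. The function names (`diplotype_posteriors`, `em_haplotype_freqs`, `most_likely_diplotype`) are hypothetical. This is not the authors' implementation, which also handles multiple loci and unphased sequence genotypes such as AAB.

```python
from itertools import combinations_with_replacement

# Hypothetical single-locus sketch: each haplotype is described only by its
# copy number, e.g. [0, 1, 2] for deletion, normal and duplication alleles.
# likelihoods[i] maps a total (diploid) copy number c to P(signal_i | c),
# so an ambiguous intensity can spread weight over several copy numbers.


def diplotype_posteriors(lik, haplotypes, freqs):
    """Return P(diplotype | signal) over unordered haplotype pairs (a, b), a <= b."""
    post = {}
    for a, b in combinations_with_replacement(range(len(haplotypes)), 2):
        # Assumed Hardy-Weinberg prior: f_a^2 for homozygotes, 2 f_a f_b otherwise.
        prior = freqs[a] * freqs[b] * (1 if a == b else 2)
        total_copies = haplotypes[a] + haplotypes[b]
        post[(a, b)] = prior * lik.get(total_copies, 0.0)
    z = sum(post.values())
    if z == 0.0:
        raise ValueError("signal has zero likelihood under every diplotype")
    return {d: w / z for d, w in post.items()}


def em_haplotype_freqs(likelihoods, haplotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies by EM from per-individual likelihoods."""
    k = len(haplotypes)
    freqs = [1.0 / k] * k  # uniform starting point
    for _ in range(max_iter):
        # E-step: expected number of copies of each haplotype in the sample.
        counts = [0.0] * k
        for lik in likelihoods:
            for (a, b), p in diplotype_posteriors(lik, haplotypes, freqs).items():
                counts[a] += p
                counts[b] += p
        # M-step: each individual carries two haplotypes, so divide by 2N.
        new = [c / (2 * len(likelihoods)) for c in counts]
        if max(abs(x - y) for x, y in zip(new, freqs)) < tol:
            return new
        freqs = new
    return freqs


def most_likely_diplotype(lik, haplotypes, freqs):
    """Diplotype with the highest posterior probability for one individual."""
    post = diplotype_posteriors(lik, haplotypes, freqs)
    return max(post, key=post.get)


if __name__ == "__main__":
    haps = [1, 2]  # haplotype 0 has copy number 1, haplotype 1 has copy number 2
    # Noise-free calls: totals 2, 2, 3, 4 give 5 copies of haplotype 0
    # and 3 copies of haplotype 1 among the 8 haplotypes.
    clean = [{2: 1.0}, {2: 1.0}, {3: 1.0}, {4: 1.0}]
    print(em_haplotype_freqs(clean, haps))  # [0.625, 0.375]
    # Noisy calls: the first individual's intensity is ambiguous between 2 and 3.
    noisy = [{2: 0.6, 3: 0.4}, {2: 1.0}, {3: 0.9, 4: 0.1}, {4: 1.0}]
    freqs = em_haplotype_freqs(noisy, haps)
    print([round(f, 3) for f in freqs])
    print(most_likely_diplotype(noisy[0], haps, freqs))
```

Passing likelihoods rather than a single called copy number lets an ambiguous intensity contribute fractionally to several diplotypes. With hard calls, where all of the likelihood sits on one copy number, the E-step in this sketch reduces to simple counting.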

【 License 】

Unknown