Source Code for Biology and Medicine
Mega2: validated data-reformatting for linkage and association analyses
Daniel E Weeks3  Nandita Mukhopadhyay1  Charles Kollar2  Robert V Baron2 
[1] Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh 15261, PA, USA;Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh 15261, PA, USA;Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh 15261, PA, USA
Keywords: Data management; Human Genetics; Association; Linkage; Software
DOI  :  10.1186/s13029-014-0026-y
Received 2014-07-18, accepted 2014-11-14, published 2014
【 Abstract 】

Background

In a typical study of the genetics of a complex human disease, many different analysis programs are used to test for linkage and association. This requires extensive and careful data reformatting, as many of these analysis programs use differing input formats. Writing scripts to facilitate this can be tedious, time-consuming, and error-prone. To address these issues, the open source Mega2 data reformatting program provides validated and tested data conversions from several commonly-used input formats to many output formats.

Results

Mega2, the Manipulation Environment for Genetic Analysis, facilitates the creation of analysis-ready datasets from data gathered as part of a genetic study. It transparently allows users to process genetic data for family-based or case/control studies accurately and efficiently. In addition to data validation checks, Mega2 provides analysis setup capabilities for a broad choice of commonly-used genetic analysis programs. First released in 2000, Mega2 has recently been significantly improved in a number of ways. We have rewritten it in C++ and have reduced its memory requirements. Mega2 now can read input files in LINKAGE, PLINK, and VCF/BCF formats, as well as its own specialized annotated format. It supports conversion to many commonly-used formats including SOLAR, PLINK, Merlin, Mendel, SimWalk2, Cranefoot, IQLS, FBAT, MORGAN, BEAGLE, Eigenstrat, Structure, and PLINK/SEQ. When controlled by a batch file, Mega2 can be used non-interactively in data reformatting pipelines. Support for genetic data from several other species besides humans has been added.
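
To make the idea of a data validation check concrete, the following is a minimal sketch in Python, not Mega2's own code. It assumes the standard PLINK `.ped` row layout (family ID, individual ID, father ID, mother ID, sex, phenotype, then two alleles per marker, with `0` meaning an unknown parent). The function name `validate_ped_rows` and the error messages are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def validate_ped_rows(rows, n_markers):
    """Return a list of error strings for PLINK-style .ped rows.

    Two checks are made (illustrative only, not Mega2's actual checks):
    1. each row has 6 leading columns plus 2 alleles per marker;
    2. every non-zero parent ID names an individual in the same family.
    """
    errors = []
    members = defaultdict(set)  # family ID -> set of individual IDs
    parsed = []
    expected = 6 + 2 * n_markers

    # First pass: check the field count and collect family membership.
    for lineno, row in enumerate(rows, start=1):
        fields = row.split()
        if len(fields) != expected:
            errors.append(
                f"line {lineno}: expected {expected} fields, got {len(fields)}"
            )
            continue
        fid, iid, pat, mat = fields[:4]
        members[fid].add(iid)
        parsed.append((lineno, fid, iid, pat, mat))

    # Second pass: check that named parents exist within the family.
    for lineno, fid, iid, pat, mat in parsed:
        for role, parent in (("father", pat), ("mother", mat)):
            if parent != "0" and parent not in members[fid]:
                errors.append(
                    f"line {lineno}: {role} {parent} of {iid} "
                    f"not in family {fid}"
                )
    return errors
```

Even a check this small needs two passes over the pedigree, because a child may be listed before its parents. Details of this kind are easy to get wrong in ad hoc scripts, which is the maintenance burden a tested reformatting tool is meant to remove.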

Conclusions

By providing tested and validated data reformatting, Mega2 facilitates more accurate and extensive analyses of genetic data, avoiding the need to write, debug, and maintain one’s own custom data reformatting scripts.

Mega2 is freely available at https://watson.hgen.pitt.edu/register/.

【 License 】

   
2014 Baron et al.; licensee BioMed Central Ltd.
