BMC Genomics
VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants
Kevin Y Yip [1], Sau Dan Lee [2], Qin Cao [2], Eric Dun Ho [2]
[1] CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
Keywords: Data integration; Genomic studies; Genetic variants; Annotation
DOI  :  10.1186/1471-2164-15-886
Received 2014-07-15, accepted 2014-10-03, published 2014
【 Abstract 】


High-throughput experimental methods have fostered the systematic detection of millions of genetic variants from any human genome. To help explore the potential biological implications of these genetic variants, software tools have been previously developed for integrating various types of information about these genomic regions from multiple data sources. Most of these tools were designed either for studying a small number of variants at a time, or for local execution on powerful machines.


To make exploration of whole lists of genetic variants simple and accessible, we have developed a new Web-based system called VAS (Variant Annotation System, available at https://yiplab.cse.cuhk.edu.hk/vas/). It provides a large variety of information useful for studying both coding and non-coding variants, including whole-genome transcription factor binding, open chromatin and transcription data from the ENCODE consortium. By means of data compression, millions of variants can be uploaded from a client machine to the server in less than 50 megabytes of data. On the server side, our customized data integration algorithms can efficiently link millions of variants with tens of whole-genome datasets. These two enabling technologies make VAS a practical tool for annotating genetic variants from large genomic studies. We demonstrate the use of VAS in annotating genetic variants obtained from a migraine meta-analysis study and multiple data sets from the Personal Genomes Project. We also compare the running time of annotating 6.4 million SNPs of the CEU trio by VAS and another tool, showing that VAS is efficient in handling new variant lists without requiring any pre-computations.
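
The abstract does not describe how VAS links variants to whole-genome datasets, so the following is only an illustrative sketch of the general task, not the authors' algorithm. It assumes features are given as BED-style intervals (0-based, half-open) and uses a sorted list with binary search per chromosome; the names `build_index` and `annotate`, and the example feature labels, are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(features):
    """Group (chrom, start, end, label) features by chromosome, sorted by start.

    Assumption: coordinates are 0-based and half-open [start, end), as in BED.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, label in features:
        by_chrom[chrom].append((start, end, label))

    index = {}
    for chrom, items in by_chrom.items():
        items.sort()
        starts = [start for start, _, _ in items]
        # max_end[i] is the largest end among items[0..i]; it lets the
        # backward scan in annotate() stop as soon as no overlap is possible.
        max_end = []
        running = 0
        for _, end, _ in items:
            running = max(running, end)
            max_end.append(running)
        index[chrom] = (starts, items, max_end)
    return index


def annotate(variants, index):
    """Return {(chrom, pos): sorted labels of features covering pos}."""
    result = {}
    for chrom, pos in variants:
        labels = []
        if chrom in index:
            starts, items, max_end = index[chrom]
            # Last feature whose start is at or before the variant position.
            i = bisect_right(starts, pos) - 1
            while i >= 0 and max_end[i] > pos:
                _, end, label = items[i]
                if end > pos:
                    labels.append(label)
                i -= 1
        result[(chrom, pos)] = sorted(labels)
    return result
```

Building the index costs one sort per chromosome, after which each variant is resolved with a binary search plus a scan over only those features that could still overlap it. A production system handling millions of variants against tens of datasets would likely need more than this (for example, streaming over pre-sorted inputs), which this sketch does not attempt.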


VAS is specially designed to handle annotation tasks with long lists of genetic variants and large numbers of annotating features efficiently. It is complementary to other existing tools with more specific aims such as evaluating the potential impacts of genetic variants in terms of disease risk. We recommend using VAS for a quick first-pass identification of potentially interesting genetic variants, to minimize the time required for other more in-depth downstream analyses.

【 License 】

2014 Ho et al.; licensee BioMed Central Ltd.
