Journal article details
BMC Bioinformatics
Critical assessment of on-premise approaches to scalable genome analysis
Research
Habiba Alsafar [1], Gihan Daw Elbait [2], Amira Al-Aamri [3], Syafiq Kamarul Azman [3], Andreas Henschel [4]
[1] Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;Department of Biomedical Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;Department of Biology, College of Arts and Sciences, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates;
Keywords: Genomic data science; Big data; Genomic databases; SQL; VCF; NoSQL; Horizontal scaling
DOI: 10.1186/s12859-023-05470-2
Received 2023-05-27, accepted 2023-09-08, published 2023
Source: Springer
【 Abstract 】

Background: Plummeting DNA sequencing cost in recent years has enabled genome sequencing projects to scale up by several orders of magnitude, which is transforming genomics into a highly data-intensive field of research. This development provides the much-needed statistical power required for genotype–phenotype predictions in complex diseases.

Methods: In order to efficiently leverage the wealth of information, we here assessed several genomic data science tools. The rationale to focus on on-premise installations is to cope with situations where data confidentiality and compliance regulations etc. rule out cloud-based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability.

Results: Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects in varying degrees of scalability in comparison to flat-file manipulation (e.g., BCFtools, and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database.

Conclusion: The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.

【 License 】

CC BY   
© BioMed Central Ltd., part of Springer Nature 2023
