BMC Systems Biology | |
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment | |
Wei-Che Hwang1  Yu-Ting Hsiao1  Wei-Po Lee1  | |
[1] Department of Information Management, National Sun Yat-sen University, Kaohsiung, Taiwan | |
关键词: MapReduce; Cloud computing; Parallel model; Swarm intelligence; Evolutionary algorithm; Systems biology; Gene network inference; | |
Others : 1141552 DOI : 10.1186/1752-0509-8-5 |
|
received in 2013-06-04, accepted in 2014-01-06, 发布年份 2014 |
【 摘 要 】
Background
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks.
Results
This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced.
Conclusions
Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
【 授权许可】
2014 Lee et al.; licensee BioMed Central Ltd.
Files | Size | Format | View |
---|---|---|---|
Figure 6. | 51KB | Image | download |
Figure 10. | 58KB | Image | download |
Figure 9. | 60KB | Image | download |
Figure 8. | 58KB | Image | download |
Figure 7. | 109KB | Image | download |
Figure 6. | 119KB | Image | download |
Figure 5. | 73KB | Image | download |
Figure 4. | 75KB | Image | download |
Figure 3. | 31KB | Image | download |
Figure 2. | 35KB | Image | download |
Figure 1. | 52KB | Image | download |
【 图 表 】
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.
Figure 6.
Figure 7.
Figure 8.
Figure 9.
Figure 10.
Figure 6.
【 参考文献 】
- [1]Ingolia NT, Weissman JS: Systems biology: reverse engineering the cell. Nature 2008, 454:1059-1062.
- [2]Lee W-P, Tzou W-S: Computational methods for discovering gene networks from expression data. Brief Bioinform 2009, 10(4):408-423.
- [3]Ay A, Arnosti DN: Mathematical modeling of gene expression: a guide for the perplexed biologist. Crit Rev Biochem Mol Biol 2011, 46(2):137-151.
- [4]Maki Y, Ueda T, Okamoto M, Uematsu N, Inamura K, Uchida K, Takahashi Y, Eguchi Y: Inference of genetic network using the expression profile time course data of mouse P19 cells. Genome Inform 2002, 13:382-383.
- [5]Kikuchi S, Tominaga D, Arita M, Tomita M: Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics 2003, 19:643-650.
- [6]Noman N, Iba H: Inferring gene regulatory networks using differential evolution with local search. IEEE/ACM Trans Comput Biol Bioinform 2007, 4:634-647.
- [7]Ho S-Y, Ho S-Y, Hsieh C-H, Huang HL: "An intelligent two-stage evolutionary algorithm for dynamic pathway identification from gene expression profiles". IEEE/ACM Trans Comput Biol Bioinform 2007, 4:648-704.
- [8]Kabir M, Noman N, Iba H: Reversely engineering gene regulatory network from microarray data using linear time-variant model. BMC Bioinform 2010, 11:S56. BioMed Central Full Text
- [9]Lee W-P, Hsiao Y-T: Inferring gene regulatory networks using a hybrid GA-PSO approach with numerical constraints and network decomposition. Inform Sci 2012, 188:80-99.
- [10]Germany: it-weise.de (self-published). 2009. Available online: http://www.it-weise.de/ webcite
- [11]Bazil JN, Qi F, Beard DA: A parallel algorithm for reverse engineering of biological networks. Integr Biol 2011, 3(12):1145-1145.
- [12]Sirbu A, Ruskin HJ, Crane M: Comparison of evolutionary algorithms in gene regulatory network model inference. BMC Bioinform 2010, 11:59. BioMed Central Full Text
- [13]Jostins L, Jaeger J: Reverse engineering a gene network using an asynchronous parallel evolution strategy. BMC Syst Biol 2010, 4:17. BioMed Central Full Text
- [14]Tominaga D, Koga N, Okamoto M: Efficient numerical optimization algorithm based on genetic algorithm for inverse problem. Proc Genet Evol Comput Conf 2000, 251-258.
- [15]Moles CG, Mendes P, Banga J: Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res 2003, 13(11):2467-2474.
- [16]Lee W-P, Hsiao Y-T: An adaptive GA-PSO approach with gene clustering to infer S-system models of gene regulatory networks. Comput J 2011, 54(9):1449-1464.
- [17]Kimura S, Ide K, Kashihara A, Kano M, Hatakeyama M, Masui R, Nakagawa N, Yokoyama S, Kuramitsu S, Konagaya A: Inference of S-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics 2005, 21(7):1154-1163.
- [18]Spieth C, Streichert F, Speer N, Zell A: A memetic inference method for gene regulatory networks based on S-Systems. Proc Congress Evol Comput 2004, 152-157.
- [19]Alba E, Tomassini M: Parallelism and evolutionary algorithms. IEEE Trans Evol Comput 2002, 6(5):443-462.
- [20]Cantú-Paz E: Efficient and accurate parallel genetic algorithms. New York: Springer; 2000.
- [21]Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Lee AG, Patterson D, Zaharia M: A view of cloud computing. Commun ACM 2010, 53(4):50-58.
- [22]Dean J, Ghemawat S: MapReduce: simplified data processing on large clusters. Commun ACM 2008, 51(1):107-113.
- [23]K–H L, Lee Y-J, Choi H, Chung YD, Moon B: Parallel data processing with MapReduce: A survey. ACM SIGMOD Record 2011, 40(4):11-19.
- [24]Qiu J, Ekanayake J, Gunarathne T, Choi JY, S–H B, Li H, Zhang B, Wu T-L, Ruan Y, Ekanayake S, Hughes A, Fox G: Hybrid cloud and cluster computing paradigms for life science applications. BMC Bioinform 2010, 11(Suppl 12):S3. BioMed Central Full Text
- [25]Krampis K, Booth T, Chapman B, Tiwari B, Bicak M, Field D, Nelson K: Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community. BMC Bioinform 2012, 13:42. BioMed Central Full Text
- [26]Schatz MC, Labgmead B, Salzberg SL: Cloud computing and the DNA data race. Nat Biotechnol 2010, 28(7):691-693.
- [27]Pratt B, Howbert JJ, Tasman NI, Nilsson EJ: MR-Tandem: parallel X! Tandem using Hadoop MapReduce on Amazon Web Services. Bioinformatics 2012, 28:136-137.
- [28]Taylor RC: An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinform 2010, 11(Suppl 12):S1. BioMed Central Full Text
- [29]Schatz M: Cloudburst: highly sensitive read mapping with MapReduce. Bioinformatics 2009, 25(11):1363-1369.
- [30]Langmead B, Hansen KD, Leek JT: Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol 2010, 11(8):R83. BioMed Central Full Text
- [31]Sadasivam GS, Baktavatchalam G: A novel approach to multiple sequence alignment using hadoop data grids. Proc Int Workshop Massive Data Anal Cloud 2010, 1-7.
- [32]Lewis S, Csordas A, Killcoyne S, Hermjakob H, Hoopmann MR, Moritz RL, Deutsch EW, Boyle J: Hydra: a scale proteomic search engine which utilizes the Hadoop distributed computing framework. BMC Bioinform 2012, 13:324. BioMed Central Full Text
- [33]Srirama SN, Jakovits P, Vainikko E: Adapting scientific computing problems to clouds using MapReduce. Futur Gener Comput Syst 2012, 28(1):184-192.
- [34]Ekanayake J, Li H, Zhang B, Gunarathne T, S–H B, Qiu J, Fox G: Twister: A runtime for iterative MapReduce. Proc Nineteenth ACM Int Symp High Perform Distributed Comput 2010, 810-818.
- [35]Kennedy J, Eberhart R: Swarm intelligence. San Francisco, CA: Morgan Kaufman Publishers; 2001.
- [36]Michalewicz Z: Genetic algorithms + data structures = evolution programs. Berlin, Germeny: Springer; 1999.
- [37]Schaffter T, Marbach D, Floreano D: GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2011, 27(16):2263-2270.
- [38]Kim SY, Imoto S, Miyano S: Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief Bioinform 2003, 4(3):228-235.
- [39]Balaji S, Babu MM, Iyer LM, Luscombe NM, Aravind L: Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast. J Mol Biol 2006, 360(1):213-227.
- [40]Greenfield A, Madar A, Ostrer H, Bonneau R: DREAM4: Combining genetic and dynamic information to identify biological networks and dynamical models. PLoS One 2010, 5(10):e13397.
- [41]Gradshtenyn IS, Ryzhik IM: Table of integrals, series, and products. New York: Academic Press; 1980.
- [42]Fusaro VA, Patil P, Gafni E, Wall D, Tonellato P: Biomedical cloud computing with Amazon web servers. PLoS Comput Biol 2011, 7(8):e1002147.
- [43]Hill MD, Marty MR: Amdahl's law in the multicore era. IEEE Comput 2008, 41(7):33-38.
- [44]Amdahl GM: Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of Amer Federation of Information Processing Societies Conference. AFIPS Press; 1967.
- [45]Chou IC, Voit EO: Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Math Biosci 2009, 219(2):57-83.
- [46]Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P: Inferring regulatory networks from expression data using tree-based methods. PLoS One 2010, 5(9):e12776.
- [47]Haury AC, Mordelet F, Vera-Licona P, Vert J-P: TIGRESS: trustful inference of gene regulation using stability selection. BMC Syst Biol 2012, 6(1):145. BioMed Central Full Text
- [48]Bornholdt S: Systems biology: less is more in modeling large genetic networks. Sci Signal 2005, 310(5747):449.