期刊论文详细信息
Algorithms for Molecular Biology
A polynomial time biclustering algorithm for finding approximate expression patterns in gene expression time series
Sara C Madeira2  Arlindo L Oliveira1 
[1] Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
[2] University of Beira Interior, Covilhã, Portugal
Others  :  797091
DOI  :  10.1186/1748-7188-4-8
 received in 2008-07-14, accepted in 2009-06-04,  发布年份 2009
PDF
【 摘 要 】

Background

The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.

Methods

In this work, we propose e-CCC-Biclustering, a biclustering algorithm that finds and reports all maximal contiguous column coherent biclusters with approximate expression patterns in time polynomial in the size of the time series gene expression matrix. This polynomial time complexity is achieved by manipulating a discretized version of the original matrix using efficient string processing techniques. We also propose extensions to deal with missing values, discover anticorrelated and scaled expression patterns, and different ways to compute the errors allowed in the expression patterns. We propose a scoring criterion combining the statistical significance of expression patterns with a similarity measure between overlapping biclusters.

Results

We present results in real data showing the effectiveness of e-CCC-Biclustering and its relevance in the discovery of regulatory modules describing the transcriptomic expression patterns occurring in Saccharomyces cerevisiae in response to heat stress. In particular, the results show the advantage of considering approximate patterns when compared to state of the art methods that require exact matching of gene expression time series.

Discussion

The identification of co-regulated genes, involved in specific biological processes, remains one of the main avenues open to researchers studying gene regulatory networks. The ability of the proposed methodology to efficiently identify sets of genes with similar expression patterns is shown to be instrumental in the discovery of relevant biological phenomena, leading to more convincing evidence of specific regulatory mechanisms.

Availability

A prototype implementation of the algorithm coded in Java together with the dataset and examples used in the paper is available in http://kdbio.inesc-id.pt/software/e-ccc-biclustering webcite.

【 授权许可】

   
2009 Madeira and Oliveira; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20140706032716508.pdf 4916KB PDF download
Figure 19. 62KB Image download
Figure 18. 67KB Image download
Figure 17. 51KB Image download
Figure 16. 137KB Image download
Figure 15. 116KB Image download
Figure 14. 48KB Image download
Figure 13. 105KB Image download
Figure 12. 85KB Image download
Figure 11. 95KB Image download
Figure 10. 68KB Image download
Figure 9. 14KB Image download
Figure 8. 122KB Image download
Figure 7. 110KB Image download
Figure 6. 50KB Image download
Figure 5. 126KB Image download
Figure 4. 109KB Image download
Figure 3. 21KB Image download
Figure 2. 56KB Image download
Figure 1. 20KB Image download
Figure 1. 20KB Image download
Figure 1. 20KB Image download
Figure 1. 20KB Image download
【 图 表 】

Figure 1.

Figure 1.

Figure 1.

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Figure 10.

Figure 11.

Figure 12.

Figure 13.

Figure 14.

Figure 15.

Figure 16.

Figure 17.

Figure 18.

Figure 19.

【 参考文献 】
  • [1]Bar-Joseph Z: Analyzing time series gene expression data. Bioinformatics 2004, 20(16):2493-2503.
  • [2]Androulakis IP, Yang E, Almon RR: Analysis of Time-Series Gene Expression Data: Methods, Challenges and Opportunities. Annual Review of Biomedical Engineering 2007, 9:205-228.
  • [3]McLachlan GJ, Do K, Ambroise C: Analysing microarray gene expression data. Wiley Series in Probability and Statistics; 2004.
  • [4]Cheng Y, Church GM: Biclustering of Expression Data. In Proc of the 8th International Conference on Intelligent Systems for Molecular Biology 2000, 93-103.
  • [5]Mechelen IV, Bock HH, Boeck PD: Two-mode clustering methods: a structured overview. Stat Methods Med Res. 2004, 13(5):363-394.
  • [6]Madeira SC, Oliveira AL: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 2004, 1(1):24-45.
  • [7]Tanay A, Sharan R, Shamir R: Discovering statistically significant biclusters in gene expression data. Bioinformatics. 2002, 18(Suppl 1):S136-S144.
  • [8]Ben-Dor A, Chor B, Karp R, Yakhini Z: Discovering Local Structure in Gene Expression Data: The Order-Preserving Submatrix Problem. J Comput Biol 2002, 10(3-4):373-384.
  • [9]Madeira SC, Teixeira MC, Sá-Correia I, Oliveira AL: Identification of Regulatory Modules in Time Series Gene Expression Data using a Linear Time Biclustering Algorithm. In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21 Mar. 2008. IEEE Computer Society Digital Library. IEEE Computer Society;
  • [10]Peeters R: The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics 2003, 131(3):651-654.
  • [11]Yang E, Foteinou PT, King K, Yarmush ML, Androulakis I: A novel non-overlapping bi-clustering algorithm for network generation using living cell array data. Bioinformatics 2007, 23(17):2306-2313.
  • [12]Murali TM, Kasif S: Extracting conserved gene expression motifs from gene expression data. Pac Symp Biocomput 2003, 77-88.
  • [13]Koyuturk M, Szpankowski W, Grama A: Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns. In Proc of the 8th International Conference on Research in Computational Molecular Biology 2004, 480-484.
  • [14]Liu J, Wang W, Yang J: Biclustering in gene expression data by tendency. In Proc of the 3rd International IEEE Computer Society Computational Systems Bioinformatics Conference 2004, 182-193.
  • [15]Liu J, Wang W, Yang J: A framework for ontology-driven subspace clustering. In Proc of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2004, 623-628.
  • [16]Liu J, Wang W, Yang J: Gene ontology friendly biclustering of expression profiles. In Proc of the 3rd IEEE Computational Systems Bioinformatics Conference 2004, 436-447.
  • [17]Liu J, Wang W, Yang J: Mining Sequential Patterns from Large Data Sets. Kluwer 2005., 18
  • [18]Lonardi S, Szpankowski W, Yang Q: Finding Biclusters by Random Projections. In Proc of the 15th Annual Symposium on Combinatorial Pattern Matching 2004, 102-116.
  • [19]Sheng Q, Moreau Y, Moor BD: Biclustering microarray data by Gibbs sampling. Bioinformatics 2003, 19 Suppl 2:ii196-ii205.
  • [20]Ji L, Tan K: Identifying time-lagged gene clusters using gene expression data. Bioinformatics 2005, 21(4):509-516.
  • [21]Wu C, Fu Y, Murali TM, Kasif S: Gene expression module discovery using Gibbs sampling. Genome Informatics 2004, 15:239-248.
  • [22]Madeira SC, Oliveira AL: A Linear Time Biclustering Algorithm for Time Series Gene Expression Data. In Proc of the 5th Workshop on Algorithms in Bioinformatics. Springer Verlag, LNCS/LNBI 3692; 2005:39-52.
  • [23]Prelic A, Bleuler S, Zimmermann P, Wille A, Buhlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E: A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 2006, 22(10):1282-1283.
  • [24]Zhang Y, Zha H, Chu CH: A Time-Series Biclustering Algorithm for Revealing Co-Regulated Genes. In Proc of the 5th IEEE International Conference on Information Technology: Coding and Computing 2005, 32-37.
  • [25]Gusfield D: Algorithms on strings, trees, and sequences. Computer Science and Computational Biology Series, Cambridge University Press; 1997.
  • [26]Sagot MF: Spelling approximate repeated or common motifs using a suffix tree. In Proc of Latin'98. Springer Verlag, LNCS 1380; 1998:111-127.
  • [27]Madeira SC: Efficient Biclustering Algorithms for Time Series Gene Expression Data Analysis. PhD thesis. Instituto Superior Técnico, Technical University of Lisbon; 2008.
  • [28]Martin D, Brun C, Remy E, Mouren P, Thieffry D, Jacq B: GOToolBox: functional investigation of gene datasets based on Gene Ontology. [http://burgundy.cmmt.ubc.ca/GOToolBox/] webciteGenome Biology 2004, 5(12):R101.
  • [29]Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, Mira NP, Alenquer M, Freitas AT, Oliveira AL, Sa-Correia I: The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. [http://www.yeastract.com/] webciteNucleic Acids Research 2006, 34:D446-D451.
  • [30]Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Molecular Biology of the Cell 2000, 11:4241-4257.
  • [31]Ji L, Tan K: Mining gene expression data for positive and negative co-regulated gene clusters. Bioinformatics 2004, 20(16):2711-2718.
  • [32]Madeira SC, Oliveira AL: An Efficient Biclustering Algorithm for finding Genes with Similar Patterns in Time-Series Expression Data. In Proc of the 5th Asia Pacific Bioinformatics Conference, Series in Advances in Bioinformatics and Computational Biology. Volume 5. Imperial College Press; 2007::67-80.
  文献评价指标  
  下载次数:218次 浏览次数:25次