Journal Article Details
BMC Bioinformatics
A systematic comparison of genome-scale clustering algorithms
Proceedings
Mikael Benson [1], Andy D Perkins [2], John D Eblen [3], Yun Zhang [4], Elissa J Chesler [5], Jeremy J Jay [5], Brynn H Voy [6], Michael A Langston [6], Arnold M Saxton [6]
[1] Linköping University, SE-581 85, Linköping, Sweden
[2] Mississippi State University, MS 39762, Mississippi State, USA
[3] Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA
[4] Pioneer Hi-Bred International Incorporated, 50131, Johnston, IA, USA
[5] The Jackson Laboratory, 04609, Bar Harbor, ME, USA
[6] University of Tennessee, 37996, Knoxville, TN, USA
Keywords: Cluster Algorithm; Maximal Clique; Rand Index; Jaccard Similarity; Average Cluster Size
DOI: 10.1186/1471-2105-13-S10-S7
Source: Springer
【 Abstract 】

Background: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparing these methods to evaluate their relative effectiveness guides algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices, and they have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae.

Methods: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method.

Results: Clusters produced by each method were evaluated based on their positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.

Conclusions: Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
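
The scoring procedure described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the size-bin boundaries in `SIZE_BINS`, and the toy gene identifiers are all assumptions made for illustration; the abstract does not state the bin thresholds or how scores are combined across bins and parameter settings, so the sketch stops at the per-bin average of the top five scores.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_annotation_score(cluster, annotation_sets):
    """Highest Jaccard score of a cluster against any annotation set."""
    return max(
        (jaccard(cluster, genes) for genes in annotation_sets.values()),
        default=0.0,
    )


# Hypothetical inclusive size ranges; the abstract gives no thresholds.
SIZE_BINS = {"small": (3, 10), "medium": (11, 100), "large": (101, 1000)}


def average_top5_by_bin(clusters, annotation_sets, bins=SIZE_BINS, top_n=5):
    """Average the top_n best-match scores within each cluster-size bin."""
    result = {}
    for name, (lo, hi) in bins.items():
        scores = sorted(
            (
                best_annotation_score(c, annotation_sets)
                for c in clusters
                if lo <= len(c) <= hi
            ),
            reverse=True,
        )[:top_n]
        # None marks a bin that received no clusters.
        result[name] = mean(scores) if scores else None
    return result


# Toy data: annotation IDs and gene names are made up.
annotations = {
    "GO:A": {"g1", "g2", "g3"},
    "KEGG:B": {"g4", "g5", "g6", "g7"},
}
clusters = [{"g1", "g2", "g3"}, {"g4", "g5", "g8"}]
per_bin = average_top5_by_bin(clusters, annotations)
```

Running `average_top5_by_bin` once per parameter setting and keeping the best result would give one plausible reading of the "best" in BAT5, but that aggregation step is a guess rather than something the abstract specifies.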

【 License 】

© Jay et al.; licensee BioMed Central Ltd. 2012. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
