BMC Systems Biology
Network methods for describing sample relationships in genomic datasets: application to Huntington’s disease
Steve Horvath [3], Peter Langfelder [2], Michael C Oldham [1]
[1] Department of Neurology, The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, USA
[2] Department of Human Genetics, University of California, Los Angeles, USA
[3] Department of Biostatistics, University of California, Los Angeles, CA, USA
Keywords: Gene expression; Microarrays; Data pre-processing; Standardized C(k) curve; cor(K,C); Clustering coefficient; Huntington’s disease; Sample network analysis; Sample networks
DOI: 10.1186/1752-0509-6-63
Received 2012-03-01, accepted 2012-05-03, published 2012
【 Abstract 】

Background

Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis.

Results

Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington’s disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes.
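To make the cor(K,C) measure concrete, the following is a minimal sketch in plain Python of how the correlation between connectivity (K) and clustering coefficient (C) could be computed for a sample network. Several details are assumptions not stated in the abstract: the adjacency is taken to be a signed weighted transform of the Pearson correlation between samples, a_ij = ((1 + cor_ij) / 2)^beta with beta = 2, and the clustering coefficient uses a standard weighted-network generalization. The function names (`sample_adjacency`, `connectivity`, `clustering_coefficient`, `cor_k_c`) are hypothetical and are not taken from the authors' R software.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sample_adjacency(samples, beta=2):
    """Assumed signed adjacency: a_ij = ((1 + cor_ij) / 2) ** beta, zero diagonal.

    `samples` is a list of expression profiles, one list of values per sample.
    """
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = ((1 + pearson(samples[i], samples[j])) / 2) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """K_i: sum of adjacencies between sample i and all other samples."""
    return [sum(row) for row in adj]  # diagonal is zero, so it adds nothing


def clustering_coefficient(adj):
    """C_i for a weighted network (assumed generalization):

    C_i = sum_{j != k} a_ij * a_jk * a_ki / ((sum_j a_ij)^2 - sum_j a_ij^2),
    with j and k ranging over nodes other than i.
    """
    n = len(adj)
    coeffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        num = sum(
            adj[i][j] * adj[j][k] * adj[k][i]
            for j in others
            for k in others
            if j != k
        )
        s1 = sum(adj[i][j] for j in others)
        s2 = sum(adj[i][j] ** 2 for j in others)
        denom = s1 ** 2 - s2
        coeffs.append(num / denom if denom > 0 else 0.0)
    return coeffs


def cor_k_c(samples, beta=2):
    """Correlation between connectivity and clustering coefficient across samples."""
    adj = sample_adjacency(samples, beta)
    return pearson(connectivity(adj), clustering_coefficient(adj))


# Toy data: five samples measured on four features (made-up values).
toy_samples = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 3, 2, 5],
    [4, 3, 2, 1],
    [2, 2, 3, 5],
]
toy_cor_kc = cor_k_c(toy_samples)
```

In this sketch, a network whose samples are all equally similar to one another would give every sample the same K and the same C, so the correlation would be undefined; the sketch does not handle that degenerate case.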

Conclusions

These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.

【 License 】

   
2012 Oldham et al.; licensee BioMed Central Ltd.
