Journal Article Details
BMC Genomics
Dynamic association rules for gene expression data analysis
Methodology Article
Wen-Hsiung Li1  Cheng-Han Chung2  Shu-Chuan Chen3  Tsung-Hsien Tsai4 
[1] Academia Sinica, 115, Taipei, Taiwan;Department of Ecology and Evolution, University of Chicago, 60637, Chicago, IL, USA;Department of Biological Sciences, Idaho State University, 83209, Pocatello, ID, USA;Department of Mathematics and Statistics, Idaho State University, 83209, Pocatello, ID, USA;Department of Statistics, National Cheng-Kung University, 701, Tainan, Taiwan;
Keywords: Association rules; Gene expression data; Bioinformatics; Data mining; Transcriptome analysis
DOI: 10.1186/s12864-015-1970-x
Received 2015-03-02, accepted 2015-10-02, published 2015
Source: Springer
【 Abstract 】

Background: The purpose of gene expression analysis is to look for associations between the regulation of gene expression levels and phenotypic variation. Such associations, derived from gene expression profiles, have been used to determine whether the induction or repression of genes corresponds to phenotypic variation in areas including cell regulation, clinical diagnosis and drug development. Statistical analyses of microarray data have been developed to address the gene selection problem. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical approach, based on constructing a one-sided confidence interval and on hypothesis testing, to determine whether an association rule is meaningful. Based on this statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) dataset and the RNA-seq dataset of a mouse genomic imprinting study. The proposed method was also compared with the t-test on the expression profiling of the bone marrow of leukemia patients.

Results: We developed a statistical approach, based on the concept of the confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in a single step. The DAR algorithm was then developed for gene expression data analysis. Analyses of four gene expression datasets showed that the proposed DAR algorithm not only identified a set of differentially expressed genes that largely agreed with those found by other methods, but also provided an efficient and accurate way to find influential genes of a disease.

Conclusions: In this paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concepts of the confidence interval and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
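
The abstract names support, confidence and a one-sided confidence interval but does not give formulas, so the following is only a minimal sketch of the general idea and not the authors' DAR algorithm. It assumes the standard definitions of support and confidence for a rule A -> B, and it swaps in a Wald (normal approximation) lower bound for a proportion as the one-sided interval. The item names (`GENE_X_up`, `disease`), the function names and the `baseline` threshold are all hypothetical.

```python
import math
from statistics import NormalDist


def rule_stats(samples, antecedent, consequent):
    """Return (support, confidence, n_antecedent) for the rule antecedent -> consequent.

    samples is a list of sets; each set holds the items observed in one sample,
    e.g. {"GENE_X_up", "disease"}.
    """
    n = len(samples)
    n_a = sum(1 for s in samples if antecedent in s)
    n_ab = sum(1 for s in samples if antecedent in s and consequent in s)
    support = n_ab / n if n else 0.0
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence, n_a


def lower_bound(p_hat, n, alpha=0.05):
    """One-sided (1 - alpha) Wald lower confidence bound for a proportion.

    Assumption: the paper's exact interval is not stated in the abstract;
    the Wald bound is used here only for illustration.
    """
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - alpha)
    return max(0.0, p_hat - z * math.sqrt(p_hat * (1 - p_hat) / n))


def is_meaningful(samples, antecedent, consequent, baseline, alpha=0.05):
    """Call a rule meaningful if the lower bound of its confidence exceeds baseline."""
    _, confidence, n_a = rule_stats(samples, antecedent, consequent)
    return lower_bound(confidence, n_a, alpha) > baseline


# Toy data (hypothetical): 10 samples, gene up-regulated in 6, 5 of those diseased.
toy = (
    [{"GENE_X_up", "disease"} for _ in range(5)]
    + [{"GENE_X_up"}]
    + [set() for _ in range(4)]
)
support, confidence, n_a = rule_stats(toy, "GENE_X_up", "disease")
bound = lower_bound(confidence, n_a)
print(support, round(confidence, 3), round(bound, 3))
```

In this toy example the rule is compared against a baseline of 0.5, the overall disease frequency in the toy data. With only six supporting samples the normal approximation is rough, so an exact binomial bound would be a reasonable substitute in practice.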

【 License 】

CC BY   
© Chen et al. 2015
