Journal Article Details
BMC Bioinformatics
Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets
Proceedings
Ramin Homayouni [1], Kevin Heinrich [2], Sujoy Roy [3], Vinhthuy Phan [4], Michael W Berry [5]
[1] Bioinformatics Program, University of Memphis, 38152, Memphis, TN, USA;Department of Biology, University of Memphis, 38152, Memphis, TN, USA;Computable Genomix, 38163, Memphis, TN, USA;Department of Computer Science, University of Memphis, 38152, Memphis, TN, USA;Department of Computer Science, University of Memphis, 38152, Memphis, TN, USA;Bioinformatics Program, University of Memphis, 38152, Memphis, TN, USA;Department of Electrical Engineering and Computer Science, University of Tennessee, 37996, Knoxville, TN, USA;
Keywords: Singular Value Decomposition; Receiver Operating Characteristic; Tunicamycin; Implicit Association; Latent Semantic Indexing
DOI: 10.1186/1471-2105-12-S10-S19
Source: Springer
【 Abstract 】

Background: Identification of transcription factors (TFs) responsible for modulation of differentially expressed genes is a key step in deducing gene regulatory pathways. Most current methods identify TFs by searching for presence of DNA binding motifs in the promoter regions of co-regulated genes. However, this strategy may not always be useful as presence of a motif does not necessarily imply a regulatory role. Conversely, motif presence may not be required for a TF to regulate a set of genes. Therefore, it is imperative to include functional (biochemical and molecular) associations, such as those found in the biomedical literature, into algorithms for identification of putative regulatory TFs that might be explicitly or implicitly linked to the genes under investigation.

Results: In this study, we present a Latent Semantic Indexing (LSI) based text mining approach for identification and ranking of putative regulatory TFs from microarray derived differentially expressed genes (DEGs). Two LSI models were built using different term weighting schemes to devise pair-wise similarities between 21,027 mouse genes annotated in the Entrez Gene repository. Amongst these genes, 433 were designated TFs in the TRANSFAC database. The LSI derived TF-to-gene similarities were used to calculate TF literature enrichment p-values and rank the TFs for a given set of genes. We evaluated our approach using five different publicly available microarray datasets focusing on TFs Rel, Stat6, Ddit3, Stat5 and Nfic. In addition, for each of the datasets, we constructed gold standard TFs known to be functionally relevant to the study in question. Receiver Operating Characteristics (ROC) curves showed that the log-entropy LSI model outperformed the tf-normal LSI model and a benchmark co-occurrence based method for four out of five datasets, as well as motif searching approaches, in identifying putative TFs.

Conclusions: Our results suggest that our LSI based text mining approach can complement existing approaches used in systems biology research to decipher gene regulatory networks by providing putative lists of ranked TFs that might be explicitly or implicitly associated with sets of DEGs derived from microarray experiments. In addition, unlike motif searching approaches, LSI based approaches can reveal TFs that may indirectly regulate genes.
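
The abstract names log-entropy term weighting, a reduced-rank LSI model, and pair-wise gene similarities, but gives no formulas or parameters. The sketch below illustrates those three steps under stated assumptions: it uses a common textbook form of log-entropy weighting, a truncated singular value decomposition via NumPy, and cosine similarity between gene-document vectors. The function names, the toy count matrix, the term labels, and the rank k = 2 are all hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
import numpy as np


def log_entropy_weight(counts):
    """Weight a term-by-document count matrix with a common log-entropy scheme.

    Local weight: log(1 + tf). Global weight: 1 + sum_j(p_ij * log p_ij) / log(n),
    where p_ij = tf_ij / (total count of term i) and n is the number of documents.
    The natural log is used throughout; another base only rescales the local weight.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    local = np.log1p(counts)
    row_sums = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Treat 0 * log(0) as 0 so absent terms do not contribute to the entropy.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return local * global_w[:, None]


def lsi_doc_vectors(weighted, k):
    """Return one k-dimensional vector per document from a rank-k truncated SVD."""
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return vt[:k].T * s[:k]


def cosine_similarity_matrix(vectors):
    """Pair-wise cosine similarity between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms > 0, norms, 1.0)
    return unit @ unit.T


# Toy data (invented): rows are terms, columns are gene abstract collections.
toy_counts = np.array([
    [3, 2, 0, 0],  # hypothetical term "apoptosis"
    [2, 3, 0, 0],  # hypothetical term "kinase"
    [0, 0, 4, 1],  # hypothetical term "cytokine"
    [0, 0, 1, 3],  # hypothetical term "interleukin"
    [1, 1, 1, 1],  # evenly spread term, which receives global weight 0
])

weighted = log_entropy_weight(toy_counts)
gene_vectors = lsi_doc_vectors(weighted, k=2)
similarity = cosine_similarity_matrix(gene_vectors)
```

In a setting like the one the abstract describes, the rows of `similarity` corresponding to TFs would supply the TF-to-gene scores that feed the enrichment and ranking step; how those scores are converted to p-values is not specified in the abstract and is not sketched here.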

【 License 】

© Roy et al; licensee BioMed Central Ltd. 2011. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
