Journal Article Details
PeerJ
Potential Arabidopsis thaliana glucosinolate genes identified from the co-expression modules using graph clustering approach
Sarahani Harun [1], Zeti-Azura Mohamed-Hussein [1], Mohammad Bozlul Karim [2], Md Altaf Ul Amin [2], Shigehiko Kanaya [2], Nor Afiqah-Aleng [3]
[1] Centre for Bioinformatics Research, Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, UKM Bangi, Selangor, Malaysia
[2] Graduate School of Science and Technology & NAIST Data Science Center, Nara Institute of Science and Technology, Nara, Japan
[3] Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu, Malaysia
Keywords: Secondary metabolites; Nitrogen-containing compounds; Aliphatic glucosinolates; Indolic glucosinolates; Graph clustering; Gene network analysis
DOI: 10.7717/peerj.11876
Source: DOAJ
【 Abstract 】

Background Glucosinolates (GSLs) are sulfur- and nitrogen-containing plant secondary metabolites. They are important in the plant defense system and are known to provide protection against cancer in humans. The growing volume of data generated by various omics technologies has become a valuable resource for new gene discovery. However, sequence similarity searches are sometimes not effective enough to find these genes; we therefore adapted a network clustering approach to search for potential GSL genes in an Arabidopsis thaliana co-expression dataset.

Methods We used known GSL genes to construct a comprehensive GSL co-expression network. This network was analyzed with the DPClusOST algorithm at density values of 0.5, 0.6, 0.7, 0.8, and 0.9. The generated clusters were evaluated using Fisher's exact test to identify GSL gene co-expression clusters. A significance score (SScore) was calculated for each gene from the p-values of Fisher's exact test. The SScore was then used in a receiver operating characteristic (ROC) analysis, performed with the ROCR package, to classify possible GSL genes. ROCR was also used to compute the area under the ROC curve (AUC), which served to select the most suitable cluster density value for further analysis. Finally, pathway enrichment analysis was conducted using ClueGO to identify significant pathways associated with the GSL clusters.

Results The density value of 0.8 gave the highest AUC, leading to the selection of thirteen potential GSL genes from the top six significant clusters: IMDH3, MVP1, T19K24.17, MRSA2, SIR, ASP4, MTO1, At1g21440, HMT3, At3g47420, PS1, SAL1, and At3g14220. Pathway enrichment analysis of the significant clusters identified four of these potential genes (MTO1, SIR, SAL1, and IMDH3), which are directly related to GSL-associated pathways such as sulfur metabolism and valine, leucine, and isoleucine biosynthesis.

Conclusions This study demonstrates that a network clustering approach can identify potential GSL genes that cannot be found with a standard similarity search.
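The density values listed in the Methods (0.5 to 0.9) can be made concrete with a short Python sketch. It assumes that DPClusOST uses the cluster density definition of the original DPClus algorithm, that is, the number of edges present inside a cluster divided by the maximum possible number, |N|(|N|-1)/2. The function names, the acceptance rule, and the gene labels g1 to g5 are hypothetical, and this is not the DPClusOST implementation itself.

```python
def cluster_density(nodes, edges):
    """Density = edges inside the cluster / (|N|(|N|-1)/2), as assumed from DPClus."""
    members = set(nodes)
    n = len(members)
    if n < 2:
        return 0.0  # a single gene has no possible edges
    # Keep undirected edges with both endpoints in the cluster; frozenset
    # merges (a, b) and (b, a), and self-loops are ignored.
    inside = {frozenset(e) for e in edges if e[0] != e[1] and set(e) <= members}
    return len(inside) / (n * (n - 1) / 2)


def meets_threshold(nodes, edges, d_in):
    """Hypothetical acceptance rule: keep a cluster only if density >= d_in."""
    return cluster_density(nodes, edges) >= d_in


# Toy co-expression edges between hypothetical genes g1..g5.
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
         ("g2", "g3"), ("g2", "g4"), ("g4", "g5")]
cluster = ["g1", "g2", "g3", "g4"]
for d in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(d, round(cluster_density(cluster, edges), 3), meets_threshold(cluster, edges, d))
```

In this toy cluster 5 of the 6 possible edges are present (the g4-g5 edge leaves the cluster and is not counted), so the density is 5/6: the cluster passes the 0.8 threshold but not 0.9. Raising the threshold therefore tends to favor smaller, more tightly co-expressed clusters.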
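The Fisher's exact test and SScore steps can be sketched in Python as a one-sided enrichment test. For a cluster of n genes containing k known GSL genes, drawn from a network of N genes of which K are known GSL genes, the one-sided (greater) Fisher's exact p-value is the hypergeometric upper tail P(X >= k). The abstract does not give the SScore formula, so -log10(p) is used here as a hypothetical stand-in; giving each gene the best score among the clusters that contain it is likewise an assumption, and all gene names and counts are invented.

```python
import math


def fisher_enrichment_p(k, n, K, N):
    """One-sided (greater) Fisher's exact test p-value for GSL enrichment.

    k: known GSL genes in the cluster   n: cluster size
    K: known GSL genes in the network   N: genes in the network
    Equals the hypergeometric upper tail P(X >= k).
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N and n - k <= N - K):
        raise ValueError("inconsistent counts")
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / math.comb(N, n)


def sscore(p):
    """Hypothetical SScore: -log10(p); the abstract does not state the formula."""
    return -math.log10(p)


def gene_sscores(clusters, known, all_genes):
    """Assumption: each gene gets the highest SScore among clusters containing it."""
    known, all_genes = set(known), set(all_genes)
    N, K = len(all_genes), len(known & all_genes)
    scores = {g: 0.0 for g in all_genes}  # genes in no cluster keep 0.0
    for cluster in clusters:
        members = set(cluster)
        k = len(members & known)
        s = sscore(fisher_enrichment_p(k, len(members), K, N))
        for g in members:
            scores[g] = max(scores[g], s)
    return scores


genes = [f"g{i}" for i in range(1, 11)]   # hypothetical 10-gene network
known_gsl = {"g1", "g2", "g3"}             # hypothetical known GSL genes
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6", "g7"]]
print(fisher_enrichment_p(3, 3, 3, 10))   # 1/120
print(gene_sscores(clusters, known_gsl, genes))
```

A cluster made up entirely of known GSL genes gets a small p-value and a high score, whereas a cluster with no known GSL genes has p = 1 and contributes a score of 0. For real analyses, scipy.stats.fisher_exact with alternative="greater" on the corresponding 2x2 table should give the same p-value.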
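The ROC-based choice of density can be approximated without ROCR. The AUC equals the probability that a randomly chosen known GSL gene receives a higher SScore than a randomly chosen non-GSL gene, with ties counted as one half. The Python sketch below computes the AUC that way for each density value and keeps the density with the highest AUC, in the spirit of how the abstract describes selecting 0.8. The scores and labels are invented toy values, not data from the study, and treating known GSL genes as the positive class is an assumption about how the labels were defined.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_density(runs):
    """runs maps density -> (gene SScores, labels with 1 = known GSL gene)."""
    aucs = {d: roc_auc(s, y) for d, (s, y) in runs.items()}
    return max(aucs, key=aucs.get), aucs


# Invented toy scores for two density settings (not data from the study).
runs = {
    0.5: ([2.1, 0.4, 1.0, 0.0], [1, 1, 0, 0]),
    0.9: ([3.0, 2.5, 0.2, 0.0], [1, 1, 0, 0]),
}
print(best_density(runs))
```

The pairwise loop costs O(P x N) comparisons for P positive and N negative genes; a rank-based (Mann-Whitney) formula gives the same value faster for genome-scale lists. Because the trapezoidal area under the ROC curve equals this pairwise statistic, ROCR's performance(prediction(scores, labels), "auc") should report the same number for the same inputs.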

【 License 】

Unknown
