Journal Article Details
BMC Bioinformatics
Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
Jun Meng [2], Dong Liu [2], Chao Sun [2], Yushi Luan [1]
[1] School of Life Science and Biotechnology, Dalian University of Technology, Dalian 116023, Liaoning, China
[2] School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China
Keywords: Feature selection; SVM; Prediction; Pre-miRNA; MiRNA
Others  :  1114590
DOI  :  10.1186/s12859-014-0423-x
Received 2014-07-30; accepted 2014-12-11; published 2014
【 Abstract 】

Background

MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcripts or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identifying these molecules and their targets can aid the understanding of regulatory processes. Machine-learning methods have recently become widely used for miRNA prediction. However, most of them were designed for mammalian miRNAs, and few can predict the miRNAs within the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far fewer than the estimated number. It is therefore essential to develop a machine-learning prediction method for identifying new plant miRNAs.

Results

A novel classification model based on a support vector machine (SVM) was trained to distinguish real from pseudo plant pre-miRNAs and to identify their miRNAs. An initial set of 152 novel structure-sequence features was used to train the model. Feature selection with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method yielded a best subset of 47 features for classifying plant pre-miRNAs, and the same method yielded 63 features for classifying plant miRNAs. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved approximately 90% accuracy on datasets from nine plant species: Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence.
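
To make the feature-selection step more concrete, the following is a minimal sketch of standard SVM-RFE (recursive feature elimination driven by linear-SVM weights). It is not the authors' B-SVM-RFE variant, whose "Back" modification is not described in this abstract. The function names `train_linear_svm` and `svm_rfe`, the Pegasos-style trainer without a bias term, and the synthetic 12-feature data are all assumptions made for illustration; the real model starts from 152 features and uses real pre-miRNA data.

```python
import random

def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Fit a linear SVM (hinge loss, L2 penalty, no bias) by Pegasos-style steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (L2 penalty), then correct for a margin violation.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def svm_rfe(X, y, n_keep, step=1):
    """Repeatedly retrain and drop the `step` features with the smallest w^2."""
    active = list(range(len(X[0])))  # original indices of surviving features
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        n_drop = min(step, len(active) - n_keep)
        weakest = sorted(range(len(active)), key=lambda k: w[k] ** 2)[:n_drop]
        dropped = set(weakest)
        active = [f for k, f in enumerate(active) if k not in dropped]
    return active

# Synthetic stand-in data (not from the paper): feature 0 tracks the label,
# features 1..11 are pure noise. Labels are +1 (real) and -1 (pseudo).
rng = random.Random(42)
labels = [1 if i % 2 == 0 else -1 for i in range(60)]
samples = [
    [3.0 * lab + rng.gauss(0.0, 0.3)] + [rng.gauss(0.0, 1.0) for _ in range(11)]
    for lab in labels
]
selected = svm_rfe(samples, labels, n_keep=4)
print(selected)  # indices of the four surviving features
```

The squared weight is the usual SVM-RFE ranking criterion for a linear kernel; a larger `step` trades ranking precision for fewer retraining rounds.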

Conclusions

We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.
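
For reference, the three reported measures can be computed from a confusion matrix as sketched below. The function name `classification_metrics` and the example labels are hypothetical and are not taken from the paper; real pre-miRNAs are assumed to be the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # real pre-miRNA correctly recognised
            else:
                fn += 1  # real pre-miRNA missed
        else:
            if pred == positive:
                fp += 1  # pseudo pre-miRNA wrongly accepted
            else:
                tn += 1  # pseudo pre-miRNA correctly rejected
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity

# Made-up labels: 1 = real pre-miRNA, 0 = pseudo pre-miRNA.
acc, sens, spec = classification_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Sensitivity is the fraction of real pre-miRNAs that are recovered, and specificity is the fraction of pseudo pre-miRNAs that are rejected, so reporting both guards against a classifier that favours one class.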

【 License 】

   
2014 Meng et al.; licensee BioMed Central.

