Journal Article Details
Frontiers in Plant Science
Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods
Jiantao Yu1  Leyi Wei2  Kaiyang Qu2  Chunyu Wang4 
[1] College of Information Engineering, North-West A&F University, Yangling, China; College of Intelligence and Computing, Tianjin University, Tianjin, China; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Keywords: pentatricopeptide repeat; mixed feature extraction methods; maximum relevant maximum distance; random forest; J48; naïve Bayes
DOI: 10.3389/fpls.2018.01961
Source: DOAJ
【 Abstract 】

Motivation: Pentatricopeptide repeat (PPR) is a triangular pentapeptide repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods focusing on the sequence, physical, and chemical properties as well as the amino acid composition, and mix the features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification uses the random forest, J48, and naïve Bayes with 10-fold cross-validation.

Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve of 0.9848. Using MRMD to reduce the dimension improves this metric for J48 and naïve Bayes, but has little effect on the random forest results.

Availability and Implementation: The webserver is available at: http://server.malab.cn/MixedPPR/index.jsp.
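
As a rough illustration of two ingredients named in the abstract, the sketch below computes a plain 20-dimensional amino acid composition vector and produces index splits for 10-fold cross-validation. It is not the authors' implementation: the abstract does not specify the four feature extractors, so the composition feature here is only an assumed stand-in, and the helper names `aa_composition` and `k_fold_indices` are hypothetical. The split is deterministic, with no shuffling or stratification.

```python
from collections import Counter

# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (e.g. 'X') are ignored. Hypothetical helper,
    used here only to illustrate a composition-based feature.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Sample i is assigned to fold i % k; every sample is tested exactly once.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Each feature vector would then be passed to a classifier trained on `train_idx` and scored on `test_idx`. Mixing feature sets, in this simplified view, amounts to concatenating the vectors produced by different extractors before any dimension reduction.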

【 License 】

Unknown   
