Journal article details
Proteome Science
Protein sequence classification using feature hashing
Proceedings
Cornelia Caragea [1], Prasenjit Mitra [2], Adrian Silvescu [3]
[1] Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA; Naviance Inc., Oakland, CA, USA
Keywords: Hash Function; Support Vector Machine Classifier; Latent Dirichlet Allocation; Average Mutual Information; Latent Semantic Indexing
DOI: 10.1186/1477-5956-10-S1-S14
Source: Springer
【 Abstract 】

Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high-dimensional input spaces for large values of k. Applying data mining algorithms to these input spaces may be intractable because of the large number of dimensions. Hence, dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification. In feature hashing, the original high-dimensional space is "reduced" by using a hash function to map features into hash keys in a low-dimensional space; multiple features can be mapped (at random) to the same hash key, and their counts are "aggregated". We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
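
The hashing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kgrams`, `bucket`, `hashed_counts`, `bag_of_kgrams`), the use of MD5 as the hash function, and the example sequence are all assumptions made for illustration. It extracts overlapping k-grams from a protein sequence, maps each k-gram to one of `n_buckets` hash keys, and aggregates the counts of k-grams that share a key.

```python
import hashlib
from collections import Counter


def kgrams(sequence, k):
    """Return all overlapping k-grams of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def bucket(feature, n_buckets):
    """Map a k-gram to a hash key in [0, n_buckets).

    MD5 is an assumption chosen only because it is stable across runs;
    the abstract does not say which hash function the paper uses.
    """
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def hashed_counts(sequence, k, n_buckets):
    """Aggregate k-gram counts into a fixed-length vector of hash buckets."""
    vector = [0] * n_buckets
    for gram in kgrams(sequence, k):
        # k-grams that collide on the same key have their counts summed.
        vector[bucket(gram, n_buckets)] += 1
    return vector


def bag_of_kgrams(sequence, k):
    """Baseline "bag of k-grams": one count per distinct k-gram."""
    return Counter(kgrams(sequence, k))


if __name__ == "__main__":
    seq = "MKVLAAGIVGL"  # hypothetical example sequence
    print(bag_of_kgrams(seq, 3))
    print(hashed_counts(seq, 3, 8))
```

The hashed vector always has length `n_buckets`, whatever the value of k, whereas the bag of k-grams can contain up to 20^k distinct features for the 20 standard amino acids. The price is that colliding k-grams can no longer be told apart.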

【 License 】

CC BY   
© Caragea et al; licensee BioMed Central Ltd. 2012
