期刊论文详细信息
BMC Bioinformatics
Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life
Alexandru Cristian1  Gail Rosen2  Zhengqiao Zhao2 
[1] Department of Computer Science, Drexel University, Market Street, Philadelphia, US;Ecological and Evolutionary Signal-process and Informatics (EESI) Lab, Department of Electrical and Computer Engineering, Drexel University, Market Street, Philadelphia, US;
关键词: Incremental learning;    Naïve Bayes taxanomic classifier;    RefSeq;    Metagenomics;   
DOI  :  10.1186/s12859-020-03744-7
来源: Springer
PDF
【 摘 要 】

BackgroundIt is a computational challenge for current metagenomic classifiers to keep up with the pace of training data generated from genome sequencing projects, such as the exponentially-growing NCBI RefSeq bacterial genome database. When new reference sequences are added to training data, statically trained classifiers must be rerun on all data, resulting in a highly inefficient process. The rich literature of “incremental learning” addresses the need to update an existing classifier to accommodate new data without sacrificing much accuracy compared to retraining the classifier with all data.ResultsWe demonstrate how classification improves over time by incrementally training a classifier on progressive RefSeq snapshots and testing it on: (a) all known current genomes (as a ground truth set) and (b) a real experimental metagenomic gut sample. We demonstrate that as a classifier model’s knowledge of genomes grows, classification accuracy increases. The proof-of-concept naïve Bayes implementation, when updated yearly, now runs in 1/4th of the non-incremental time with no accuracy loss.ConclusionsIt is evident that classification improves by having the most current knowledge at its disposal. Therefore, it is of utmost importance to make classifiers computationally tractable to keep up with the data deluge. The incremental learning classifier can be efficiently updated without the cost of reprocessing nor the access to the existing database and therefore save storage as well as computation resources.

【 授权许可】

CC BY   

【 预 览 】
附件列表
Files Size Format View
RO202104245651975ZK.pdf 2512KB PDF download
  文献评价指标  
  下载次数:27次 浏览次数:8次