Technical Report

【Abstract】

We consider the supervised learning problem of assigning test influenza sequences to their correct group, where the group is the host species. We assume that training cases (influenza sequences and their group labels) are available, usually via estimates of phylogenetic (evolutionary) trees as a special case of unsupervised learning. We compare three supervised learning methods: (1) a published signature pattern analysis (VESPA) approach; (2) an unpublished Bayesian approach that assumes sites are independent; and (3) a nearest-neighbor approach with flexible evolutionary distance measures. Although the Bayesian approach has the attractive feature of reporting estimated probabilities for each group for each test sequence, those probabilities are somewhat suspect because they rest on a site-independence assumption that is difficult to remove. We investigate the impact of this independence assumption and show that it can be conservative or anti-conservative (meaning that it leads to either overstating or understating the separability of the groups). The VESPA approach also assumes site independence, but it has the advantage of allowing for dependence among sequence scores in a way that can easily be estimated, as we illustrate. Finally, the distance-based method is always a strong contender, especially in this case, because evolutionary models are easy to incorporate into the distance measure. All three methods are of potential use on similar problems, with no single method emerging as the clear winner. We compare conclusions under the three approaches for sequences from the Nucleoprotein (NP) gene of the human influenza RNA virus from three host species. These data are available from the influenza database maintained at Los Alamos National Laboratory (http:/71inker.lanL gov u/searchJrarne. html).

【Preview】

Attachments
Files	Size	Format	View
DE2001763365.pdf	570KB	PDF	download


Comparison of Signature Pattern Analysis Methods in Molecular Epidemiology.

Burr, T. ; Charlton, W. ; Stanbro, W.
Technical Information Center, Oak Ridge, Tennessee
Keywords: Epidemiology; Medicine; Molecules; Molecular biology; Signature pattern analysis
RP-ID : DE2001763365
Subject classification: Engineering and Technology (General)
United States | English
Source: National Technical Reports Library


	Document metrics
	Downloads: 14	Views: 24
