期刊论文

【摘要】

BackgroundText mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effect of disease-gene identification varies in different text mining models. Thus, the idea of incorporating more text mining models may be beneficial to obtain more refined and accurate knowledge. However, how to effectively combine these models still remains a challenging question in machine learning. In particular, it is a non-trivial issue to guarantee that the integrated model performs better than the best individual model.ResultsWe present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than other comparing methods.ConclusionsIn practical research, the relevance of specific vocabulary pertaining to the task is usually unknown. In such case, multi-view text mining is a superior and promising strategy for text-based disease gene identification.

【授权许可】

CC BY
© Yu et al; licensee BioMed Central Ltd. 2010

【预览】

附件列表
Files	Size	Format	View
RO202311105121700ZK.pdf	3966KB	PDF	download

【参考文献】

[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
[65]
[66]
[67]

BMC Bioinformatics
Gene prioritization and clustering by multi-view text mining
Methodology Article
Leon-Charles Tranchevent¹ Bart De Moor¹ Shi Yu¹ Yves Moreau¹
[1] Bioinformatics Group, Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B3001, Heverlee, Belgium;
关键词: Area Under Curve; Text Mining; Unify Medical Language System; Latent Semantic Indexing; Ensemble Cluster;
DOI : 10.1186/1471-2105-11-28
received in 2009-04-22, accepted in 2010-01-14, 发布年份 2010
来源: Springer
PDF


	文献评价指标
	下载次数：10次	浏览次数：0次

【 摘 要 】

【 授权许可】

【 预 览 】

【 参考文献 】

【摘要】

【授权许可】

【预览】

【参考文献】