BMC Bioinformatics, 2010
Eric Van Wijngaerden, Yves Moreau, Ricardo J Camacho, Soo-Yon Rhee, Robert W Shafer, Koen Deforche, Gertjan Beheydt, Philippe Lemey, Kristof Theys, Kristel van Laethem, Anne-Mieke Vandamme
License: CC BY
Background
Failure on Highly Active Anti-Retroviral Treatment is often accompanied by the development of antiviral resistance to one or more drugs included in the treatment. In general, the virus is more likely to develop resistance to drugs with a lower genetic barrier. Previously, we developed a method to reverse engineer, from clinical sequence data, a fitness landscape experienced by HIV-1 under nelfinavir (NFV) treatment. By simulating evolution over this landscape, the individualized genetic barrier to NFV resistance can be estimated for an isolate.
Results
We investigated the association of the estimated genetic barrier with the risk of developing NFV resistance at virological failure in 201 patients who were predicted to be fully susceptible to NFV at baseline. A higher estimated genetic barrier was indeed associated with lower odds of developing resistance at failure (OR 0.62 (0.45 - 0.94) per additional mutation needed, p = .02).
Conclusions
Thus, variation in the individualized genetic barrier to NFV resistance may affect the effective treatment options available after treatment failure. If similar results hold for other drugs, the estimated genetic barrier may become a new clinical tool for choosing a treatment regimen, one that allows the treatment options available after virological failure to be taken into account.
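The reported odds ratio can be read as a multiplicative effect on the odds. Under the usual logistic-regression interpretation (an assumption here, since the abstract does not name the model), each additional mutation needed multiplies the odds of resistance at failure by 0.62. The following Python sketch only illustrates that arithmetic; the function names and the 40% baseline risk are hypothetical and are not taken from the study.

```python
import math

OR_PER_MUTATION = 0.62  # point estimate quoted in the abstract


def odds_multiplier(extra_mutations, odds_ratio=OR_PER_MUTATION):
    """Factor by which the odds change when the barrier is higher by extra_mutations."""
    return odds_ratio ** extra_mutations


def shifted_risk(baseline_risk, extra_mutations, odds_ratio=OR_PER_MUTATION):
    """Convert a baseline probability to odds, scale the odds, and convert back."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = baseline_odds * odds_multiplier(extra_mutations, odds_ratio)
    return new_odds / (1.0 + new_odds)


# Log-odds coefficient implied by the odds ratio (negative means lower odds).
beta = math.log(OR_PER_MUTATION)

# Hypothetical baseline risk of 40%, used only to show the mechanics.
for k in range(4):
    print(k, round(odds_multiplier(k), 3), round(shifted_risk(0.40, k), 3))
```

Because the effect is multiplicative on the odds scale, the change in absolute risk depends on the baseline, which is why the baseline has to be supplied separately.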
BMC Bioinformatics, 2010
Daniela Nitsch, Bart de Moor, Yves Moreau, Fabian Ojeda, Joana P Gonçalves
License: CC BY
Background
Discovering novel disease genes remains challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Genetic studies frequently produce large lists of candidate genes, of which only a few can be followed up for further investigation. We recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals.
To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network.
Results
We propose three strategies for scoring disease candidate genes that rely on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison, a local measure based on the expression of the direct neighbors is also computed. We benchmarked these strategies on 40 publicly available knockout experiments in mice and assessed performance against a standard procedure in genetics that ranks candidate genes solely by their differential expression levels (Simple Expression Ranking). Our results showed that all four strategies could outperform this standard procedure. The best results were obtained with the Heat Kernel Diffusion Ranking, which achieved an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
Conclusion
In this study, we could identify promising candidate genes using network-based machine learning approaches even when no knowledge is available about the disease or phenotype.
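The idea of scoring a candidate by the differential expression of its network neighborhood can be sketched with a heat kernel, exp(-beta * L), applied to the expression vector, where L is the graph Laplacian. The Python sketch below is one common formulation and not necessarily the authors' exact scoring; the toy network, the expression values, the `beta` setting and the function names are all hypothetical. The second function checks the quoted error reduction under the assumption that "error" means 1 - AUC, a reading that reproduces the reported 52.8%.

```python
def heat_kernel_scores(adjacency, expression, beta=0.5, terms=30):
    """Approximate exp(-beta * L) @ expression with a truncated Taylor series.

    L = D - A is the Laplacian of the symmetric adjacency matrix A.
    """
    n = len(expression)
    degree = [sum(row) for row in adjacency]

    def laplacian_times(v):
        # (L v)_i = degree_i * v_i - sum_j A_ij * v_j
        return [degree[i] * v[i] - sum(adjacency[i][j] * v[j] for j in range(n))
                for i in range(n)]

    scores = list(expression)
    term = list(expression)
    for k in range(1, terms + 1):
        # term_k = (-beta / k) * L @ term_{k-1}, i.e. (-beta * L)^k x / k!
        term = [-beta * value / k for value in laplacian_times(term)]
        scores = [s + t for s, t in zip(scores, term)]
    return scores


def error_reduction(auc_new, auc_baseline):
    """Relative drop in (1 - AUC) when moving from the baseline to the new method."""
    return ((1.0 - auc_baseline) - (1.0 - auc_new)) / (1.0 - auc_baseline)


# Toy network (hypothetical): genes 0 and 3 are candidates with no signal of their own.
# Gene 0 neighbors two strongly differentially expressed genes; gene 3 neighbors a weak one.
adjacency = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
expression = [0.0, 2.0, 1.5, 0.0, 0.1]  # made-up absolute differential expression

scores = heat_kernel_scores(adjacency, expression)
ranking = sorted([0, 3], key=lambda gene: scores[gene], reverse=True)
print("candidate ranking:", ranking)
print("error reduction:", round(error_reduction(0.923, 0.837), 3))
```

A truncated series is used here only to keep the sketch dependency-free; for real networks a sparse or approximate matrix exponential would be the more practical choice.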
BMC Bioinformatics, 2010
Leon-Charles Tranchevent, Bart De Moor, Shi Yu, Yves Moreau
License: CC BY
Background
Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Incorporating several text mining models may therefore yield more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model.
Results
We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view, and the multiple views obtained are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the other methods compared.
Conclusions
In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.
BMC Bioinformatics, 2010
Yves Moreau, Bart De Moor, Johan AK Suykens, Leon-Charles Tranchevent, Anneleen Daemen, Tillmann Falck, Shi Yu
Language: English
Background
This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The choice of norm yields different extensions of multiple kernel learning (MKL), such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have advantages over sparse integration methods because it combines complementary information from heterogeneous data sources more thoroughly.
Results
We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem and the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large-scale data sets.
Conclusions
This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM-based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL performs comparably to the conventional SVM MKL algorithms. Moreover, large-scale numerical experiments indicate that, when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.
Availability
The MATLAB code for the algorithms implemented in this paper can be downloaded from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html
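The contrast between sparse and non-sparse kernel coefficients can be illustrated for a fixed dual vector alpha. Each kernel K_j contributes a quadratic term alpha^T K_j alpha, and the norm applied to the vector of these terms determines the implied weights: the maximum (L∞) selects a single kernel, the sum (L1) weights all kernels equally, and the L2 norm gives weights proportional to the terms. The Python sketch below is a toy illustration under that simplification, not the authors' solver (a real MKL method optimizes alpha and the weights jointly); the kernels, alpha and function names are hypothetical.

```python
import math


def quadratic_form(alpha, kernel):
    """Return alpha^T K alpha for a kernel matrix given as a list of rows."""
    n = len(alpha)
    return sum(alpha[i] * kernel[i][j] * alpha[j] for i in range(n) for j in range(n))


def kernel_weights(terms, dual_norm):
    """Kernel weights implied by one norm on the per-kernel terms, for a fixed alpha."""
    count = len(terms)
    if dual_norm == "linf":  # largest term dominates: sparse, winner-takes-all
        best = max(range(count), key=lambda j: terms[j])
        return [1.0 if j == best else 0.0 for j in range(count)]
    if dual_norm == "l1":    # plain sum of terms: every kernel weighted equally
        return [1.0 / count] * count
    if dual_norm == "l2":    # weights proportional to the terms, scaled to unit L2 norm
        length = math.sqrt(sum(t * t for t in terms))
        return [t / length for t in terms]
    raise ValueError("unknown norm: " + dual_norm)


# Three made-up 2x2 kernels standing in for heterogeneous data sources.
kernels = [
    [[1.0, 0.8], [0.8, 1.0]],
    [[1.0, 0.1], [0.1, 1.0]],
    [[1.0, 0.5], [0.5, 1.0]],
]
alpha = [0.5, 0.5]  # hypothetical fixed dual vector

terms = [quadratic_form(alpha, k) for k in kernels]
for norm in ("linf", "l1", "l2"):
    print(norm, [round(w, 3) for w in kernel_weights(terms, norm)])
```

In this toy setting the L∞ weights discard two of the three sources entirely, while the L2 weights keep every source with a weight that reflects its contribution.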