1 Problems with the nested granularity of feature domains in bioinformatics: the eXtasy case [Journal article]

BMC Bioinformatics, 2015

Jesse Davis, Bart De Moor, Yves Moreau, Dusan Popovic, Alejandro Sifrim

LicenseType: CC BY

Abstract

Background: Data from biomedical domains often have an inherent hierarchical structure. As this structure is usually implicit, its existence can be overlooked by practitioners interested in constructing and evaluating predictive models from such data. Ignoring these constructs leads to potentially problematic and routinely unrecognized bias in the models and results. In this work, we discuss this bias in detail and propose a simple, sampling-based solution for it. Next, we explore its sources and extent on synthetic data. Finally, we demonstrate how the state-of-the-art variant prioritization framework, eXtasy, benefits from using the described approach in its Random forest-based core classification model.

Results and conclusions: The conducted simulations clearly indicate that the heterogeneous granularity of feature domains poses significant problems for both the standard Random forest classifier and a modification that relies on stratified bootstrapping. Conversely, using the proposed sampling scheme when training the classifier mitigates the described bias. Furthermore, when applied to the eXtasy data under a realistic class distribution scenario, a Random forest learned using the proposed sampling scheme displays much better precision than its standard version, without degrading recall. Moreover, the largest performance gains are achieved in the most important part of the operating range: the top of the prioritized gene list.
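The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of one plausible hierarchy-aware bootstrap, not the authors' method. It assumes each row belongs to a coarse-grained group (the hypothetical `group_ids`, for example rows that share the same gene-level feature values) and draws groups first, then one row within each drawn group, so that large groups do not dominate the sample used to grow a tree.

```python
import random
from collections import defaultdict


def group_aware_bootstrap(group_ids, n_draws, rng):
    """Return row indices sampled so every group is equally likely per draw.

    group_ids: list where group_ids[i] is the (hypothetical) group of row i.
    n_draws:   number of rows to draw, with replacement.
    rng:       a random.Random instance, passed in for reproducibility.
    """
    # Index the rows by the group they belong to.
    rows_by_group = defaultdict(list)
    for row_index, group in enumerate(group_ids):
        rows_by_group[group].append(row_index)
    groups = list(rows_by_group)

    sample = []
    for _ in range(n_draws):
        group = rng.choice(groups)                        # step 1: pick a group uniformly
        sample.append(rng.choice(rows_by_group[group]))   # step 2: pick one row inside it
    return sample


# Eight rows share group "g1" and two rows share group "g2".
example_groups = ["g1"] * 8 + ["g2"] * 2
example_sample = group_aware_bootstrap(example_groups, 1000, random.Random(0))
share_g2 = sum(example_groups[i] == "g2" for i in example_sample) / len(example_sample)
print(f"share of draws from g2: {share_g2:.2f}")
```

A plain row-level bootstrap would draw from "g2" about 20% of the time in this toy data; the group-first draw brings that to roughly half, which is the kind of rebalancing a hierarchy-aware scheme aims for.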

2 Highlights from the 11th ISCB Student Council Symposium 2015 [Journal article]

BMC Bioinformatics, 2016

Pieter Meysman, Kris Laukens, Bart Cuypers, Przemysław Szałaj, Dariusz Plewczynski, Fatemeh Vafaee, Daniel J. Fazakerley, James R. Krycer, Rima Chaudhuri, David E. James, Westa Domanova, Zdenka Kuncic, M. Michael Gromiha, C. Ramakrishnan, Nagarajan Raju, Sonia Pankaj Chothani, Dan DeBlasio, Giulia Fiscon, Lieven Thorrez, Yves Moreau, Griet Laenen, Margherita Francescatto, Sean J. Humphrey, Jakob Jespersen, Alexander Junge, Masakazu Sekijima, Giovanni Felici, Paola Bertolazzi, Emanuel Weitschek, Massimo Ciccozzi, Bart Cuypers, Maya Berg, Hideo Imamura, Jean-Claude Dujardin, Manu Vanaerschot, Pengyi Yang, Katie Wilkins, R. Gonzalo Parra, Mehedi Hassan, Farzana Rahman, Sander Willems, Nigel J Burroughs, Paddy J Slator, Zhonghui Tang, Oskar Luo, Paul Michalski, Xingwang Li, Yijun Ruan, Anupama Jigisha

LicenseType: CC BY

Abstract

Contents

A1 Highlights from the eleventh ISCB Student Council Symposium 2015
Katie Wilkins, Mehedi Hassan, Margherita Francescatto, Jakob Jespersen, R. Gonzalo Parra, Bart Cuypers, Dan DeBlasio, Alexander Junge, Anupama Jigisha, Farzana Rahman

O1 Prioritizing a drug’s targets using both gene expression and structural similarity
Griet Laenen, Sander Willems, Lieven Thorrez, Yves Moreau

O2 Organism specific protein-RNA recognition: A computational analysis of protein-RNA complex structures from different organisms
Nagarajan Raju, Sonia Pankaj Chothani, C. Ramakrishnan, Masakazu Sekijima, M. Michael Gromiha

O3 Detection of Heterogeneity in Single Particle Tracking Trajectories
Paddy J Slator, Nigel J Burroughs

O4 3D-NOME: 3D NucleOme Multiscale Engine for data-driven modeling of three-dimensional genome architecture
Przemysław Szałaj, Zhonghui Tang, Paul Michalski, Oskar Luo, Xingwang Li, Yijun Ruan, Dariusz Plewczynski

O5 A novel feature selection method to extract multiple adjacent solutions for viral genomic sequences classification
Giulia Fiscon, Emanuel Weitschek, Massimo Ciccozzi, Paola Bertolazzi, Giovanni Felici

O6 A Systems Biology Compendium for Leishmania donovani
Bart Cuypers, Pieter Meysman, Manu Vanaerschot, Maya Berg, Hideo Imamura, Jean-Claude Dujardin, Kris Laukens

O7 Unravelling signal coordination from large scale phosphorylation kinetic data
Westa Domanova, James R. Krycer, Rima Chaudhuri, Pengyi Yang, Fatemeh Vafaee, Daniel J. Fazakerley, Sean J. Humphrey, David E. James, Zdenka Kuncic

3 Predicting receptor-ligand pairs through kernel learning [Journal article]

BMC Bioinformatics, 2011

Bart De Moor, Ernesto Iacucci, Fabian Ojeda, Yves Moreau

LicenseType: Unknown

Abstract

Background: Regulation of cellular events is often initiated via extracellular signaling. Extracellular signaling occurs when a circulating ligand interacts with one or more membrane-bound receptors. Identification of receptor-ligand pairs is thus an important and specific form of PPI prediction.

Results: Given a set of disparate data sources (expression data, domain content, and phylogenetic profile), we seek to predict new receptor-ligand pairs. We create a combined kernel classifier and assess its performance with respect to the Database of Ligand-Receptor Partners (DLRP) "golden standard" as well as the method proposed by Gertz et al. Among our findings, we discover that our predictions for the tgfβ family accurately reconstruct over 76% of the supported edges (0.76 recall and 0.67 precision) of the receptor-ligand bipartite graph defined by the DLRP "golden standard". In addition, for the tgfβ family, the combined kernel classifier improves upon the method of Gertz et al. by a factor of approximately 1.5: our method has an F-measure of 0.71, while that of Gertz et al. has a value of 0.48.

Conclusions: The prediction of receptor-ligand pairings is a difficult and complex task. We have demonstrated that using kernel learning on multiple data sources provides a stronger alternative to the existing method in solving this task.
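As a quick consistency check on the figures quoted in this abstract, the F-measure can be recomputed from the reported precision and recall. This sketch assumes the balanced F1 score (harmonic mean of precision and recall); the abstract does not say which F-measure variant was used, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the tgfβ family in the abstract.
ours = f_measure(precision=0.67, recall=0.76)
gertz = 0.48  # F-measure reported for the Gertz et al. method

print(round(ours, 2))          # 0.71
print(round(ours / gertz, 2))  # 1.48, i.e. "approximately 1.5"
```

Under the F1 assumption, the reported precision and recall reproduce the stated 0.71, and the ratio to 0.48 matches the quoted factor of roughly 1.5.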

4 Estimating the individualized HIV-1 genetic barrier to resistance using a nelfinavir fitness landscape [Journal article]

BMC Bioinformatics, 2010

Eric Van Wijngaerden, Yves Moreau, Ricardo J Camacho, Soo-Yon Rhee, Robert W Shafer, Koen Deforche, Gertjan Beheydt, Philippe Lemey, Kristof Theys, Kristel van Laethem, Anne-Mieke Vandamme

LicenseType: CC BY

Abstract

Background: Failure on Highly Active Anti-Retroviral Treatment is often accompanied by the development of antiviral resistance to one or more drugs included in the treatment. In general, the virus is more likely to develop resistance to drugs with a lower genetic barrier. Previously, we developed a method to reverse engineer, from clinical sequence data, a fitness landscape experienced by HIV-1 under nelfinavir (NFV) treatment. By simulating evolution over this landscape, the individualized genetic barrier to NFV resistance may be estimated for an isolate.

Results: We investigated the association of the estimated genetic barrier with the risk of developing NFV resistance at virological failure in 201 patients who were predicted to be fully susceptible to NFV at baseline. We found that a higher estimated genetic barrier was indeed associated with lower odds of developing resistance at failure (OR 0.62 (0.45 - 0.94) per additional mutation needed, p = .02).

Conclusions: Thus, variation in the individualized genetic barrier to NFV resistance may impact the effective treatment options available after treatment failure. If similar results apply for other drugs, then the estimated genetic barrier may be a new clinical tool for choosing a treatment regimen, one that allows consideration of the treatment options available after virological failure.

5 Candidate gene prioritization by network analysis of differential expression using machine learning approaches [Journal article]

BMC Bioinformatics, 2010

Daniela Nitsch, Bart de Moor, Yves Moreau, Fabian Ojeda, Joana P Gonçalves

LicenseType: CC BY

Abstract

Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge, such as known disease genes or disease-related pathways, is available. Genetic studies frequently result in large lists of candidate genes, of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network.

Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches: kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3%, and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.

Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
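The 52.8% error reduction quoted in this abstract can be reproduced from the two AUC values if "error" is taken to mean 1 − AUC. That definition is an assumption, since the abstract does not state it, and the function name below is illustrative.

```python
def relative_error_reduction(auc_baseline, auc_new):
    """Relative drop in error, with error assumed to be 1 - AUC."""
    error_baseline = 1.0 - auc_baseline
    error_new = 1.0 - auc_new
    return (error_baseline - error_new) / error_baseline


# AUC values from the abstract: Simple Expression Ranking vs. Heat Kernel Diffusion Ranking.
reduction = relative_error_reduction(auc_baseline=0.837, auc_new=0.923)
print(f"{100 * reduction:.1f}%")  # 52.8%
```

The error falls from 16.3% to 7.7%, and (16.3 − 7.7) / 16.3 ≈ 0.528, consistent with the reported figure.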

6 Gene prioritization and clustering by multi-view text mining [Journal article]

BMC Bioinformatics, 2010

Leon-Charles Tranchevent, Bart De Moor, Shi Yu, Yves Moreau

LicenseType: CC BY

Abstract

Background: Text mining has become a useful tool for biologists trying to understand the genetics of diseases. In particular, it can help identify the most interesting candidate genes for a disease for further experimental analysis. Many text mining approaches have been introduced, but the effectiveness of disease-gene identification varies across text mining models. Thus, incorporating more text mining models may help to obtain more refined and accurate knowledge. However, how to combine these models effectively remains a challenging question in machine learning. In particular, it is non-trivial to guarantee that the integrated model performs better than the best individual model.

Results: We present a multi-view approach to retrieve biomedical knowledge using different controlled vocabularies. These controlled vocabularies are selected on the basis of nine well-known bio-ontologies and are applied to index the vast amounts of gene-based free-text information available in the MEDLINE repository. The text mining result specified by a vocabulary is considered as a view, and the obtained multiple views are integrated by multi-source learning algorithms. We investigate the effect of integration in two fundamental computational disease gene identification tasks: gene prioritization and gene clustering. The performance of the proposed approach is systematically evaluated and compared on real benchmark data sets. In both tasks, the multi-view approach demonstrates significantly better performance than the methods it is compared against.

Conclusions: In practical research, the relevance of a specific vocabulary to the task is usually unknown. In such cases, multi-view text mining is a superior and promising strategy for text-based disease gene identification.