
BMC Bioinformatics, 2017

Hongyu Zhao, David F. Stern, Michael I. Klein

License: CC BY


Background: Personalizing treatment regimes based on gene expression profiles of individual tumors will facilitate management of cancer. Although many methods have been developed to identify pathways perturbed in tumors, the results are often not generalizable across independent datasets due to the presence of platform/batch effects. There is a need to develop methods that are robust to platform/batch effects and able to identify perturbed pathways in individual samples.

Results: We present Gene-Ranking Analysis of Pathway Expression (GRAPE) as a novel method to identify abnormal pathways in individual samples that is robust to platform/batch effects in gene expression profiles generated by multiple platforms. GRAPE first defines a template consisting of an ordered set of pathway genes to characterize the normative state of a pathway based on the relative rankings of gene expression levels across a set of reference samples. This template can be used to assess whether a sample conforms to or deviates from the typical behavior of the reference samples for this pathway. We demonstrate that GRAPE performs well versus existing methods in classifying tissue types within a single dataset, and that GRAPE achieves superior robustness and generalizability across different datasets. A powerful feature of GRAPE is the ability to represent individual gene expression profiles as a vector of pathway scores. We present applications to the analyses of breast cancer subtypes and different colonic diseases. We perform survival analysis of several TCGA subtypes and find that GRAPE pathway scores perform well in comparison to other methods.

Conclusions: GRAPE templates offer a novel approach for summarizing the behavior of gene-sets across a collection of gene expression profiles. These templates offer superior robustness across distinct experimental batches compared to existing methods. GRAPE pathway scores enable identification of abnormal gene-set behavior in individual samples using a non-competitive approach that is fundamentally distinct from popular enrichment-based methods. GRAPE may be an appropriate tool for researchers seeking to identify individual samples displaying abnormal gene-set behavior as well as to explore differences in the consensus gene-set behavior of groups of samples. GRAPE is available in R for download at https://CRAN.R-project.org/package=GRAPE.
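
The template idea described in this abstract can be illustrated with a minimal Python sketch. It is not the GRAPE package's algorithm: it assumes the template is a majority vote over pairwise gene orderings in the reference samples and that a sample's score is the unweighted fraction of gene pairs ordered differently from the template. The function names, gene names, and expression values are all hypothetical.

```python
from itertools import combinations


def build_template(reference_samples):
    """Majority-vote pairwise ordering of pathway genes (assumed template form).

    reference_samples: list of dicts mapping gene name -> expression value,
    all covering the same pathway genes. Returns {(a, b): True} when gene a is
    expressed below gene b in more than half of the reference samples.
    """
    genes = sorted(reference_samples[0])
    template = {}
    for a, b in combinations(genes, 2):
        votes = sum(1 for s in reference_samples if s[a] < s[b])
        template[(a, b)] = votes * 2 > len(reference_samples)
    return template


def deviation_score(sample, template):
    """Fraction of gene pairs whose ordering in the sample disagrees with the template."""
    disagreements = sum(
        1 for (a, b), a_below_b in template.items()
        if (sample[a] < sample[b]) != a_below_b
    )
    return disagreements / len(template)


# Hypothetical reference samples for a three-gene pathway.
reference = [
    {"G1": 1.0, "G2": 2.0, "G3": 3.0},
    {"G1": 1.5, "G2": 2.5, "G3": 2.0},
    {"G1": 0.5, "G2": 4.0, "G3": 6.0},
]
template = build_template(reference)

# Same gene ordering as the template, but measured on a different scale.
conforming = {"G1": 10.0, "G2": 20.0, "G3": 30.0}
# Reversed gene ordering.
deviating = {"G1": 30.0, "G2": 20.0, "G3": 10.0}

print(deviation_score(conforming, template))
print(deviation_score(deviating, template))
```

Because only within-sample orderings enter the score, rescaling a whole sample leaves the score unchanged, which is the intuition behind the robustness to platform/batch effects claimed above.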

BMC Bioinformatics, 2017

Geoffrey L. Chupp, Jose Gomez, Lauren Cohn, Xiting Yan, Hongyu Zhao, Anqi Liang

License: CC BY


Background: Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. An alternative method to examine disease phenotypes is to use pre-defined biological pathways. These pathways have been shown to be perturbed in different ways in different subjects who have similar clinical features. We hypothesize that differences in the expression of genes in a given pathway are more predictive of biological differences than standard approaches and, if integrated into clustering analysis, will enhance the robustness and accuracy of the clustering method. To examine this hypothesis, we developed a novel computational method to assess the biological differences between samples using gene expression data by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior.

Results: Pre-defined biological pathways were downloaded and genes in each pathway were used to cluster samples using the Gaussian mixture model. The clustering results across different pathways were then summarized to calculate the pathway-based distance score between samples. This method was applied to both simulated and real data sets and compared to the traditional Euclidean distance and another pathway-based clustering method, Pathifier. The results show that the pathway-based distance score performs significantly better than the Euclidean distance, especially when the heterogeneity is low and genes in the same pathways are correlated. Compared to Pathifier, we demonstrated that our approach achieves higher accuracy and robustness for small pathways. When the pathway size is large, by downsampling the pathways into smaller pathways, our approach was able to achieve comparable performance.

Conclusions: We have developed a novel distance score that represents the biological differences between samples using gene expression data and pre-defined biological pathway information. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both simulated data and real data when compared to traditional methods. It also has comparable or better performance compared to Pathifier.
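
The abstract says that per-pathway clustering results are "summarized" into a distance score but does not say how. The Python sketch below shows one plausible summary, offered only as an assumption: the distance between two samples is the fraction of pathways in which they land in different clusters. The Gaussian mixture step is not implemented here; the per-pathway cluster labels are hypothetical inputs that any clustering method could supply.

```python
def pathway_distance(labels_by_pathway, i, j):
    """Fraction of pathways in which samples i and j fall in different clusters.

    labels_by_pathway: dict mapping pathway name -> list of cluster labels,
    one label per sample (assumed to come from a per-pathway clustering step).
    """
    differing = sum(
        1 for labels in labels_by_pathway.values() if labels[i] != labels[j]
    )
    return differing / len(labels_by_pathway)


# Hypothetical cluster labels for four samples in three pathways.
labels_by_pathway = {
    "pathway_A": [0, 0, 1, 1],
    "pathway_B": [0, 0, 0, 1],
    "pathway_C": [1, 1, 0, 0],
}
n_samples = 4

# Symmetric sample-by-sample distance matrix with zeros on the diagonal.
distance_matrix = [
    [pathway_distance(labels_by_pathway, i, j) for j in range(n_samples)]
    for i in range(n_samples)
]

for row in distance_matrix:
    print([round(d, 3) for d in row])
```

A matrix of this kind could then be passed to any distance-based clustering routine in place of a Euclidean distance matrix.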

BMC Bioinformatics, 2016

Hongyu Zhao, Xiang Wan, Kai Dong, Tiejun Tong

License: CC BY


Background: RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (Annals Appl Stat 5:2493–2518, 2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than or equal to the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated.

Results: In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes’ rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze two real RNA-Seq data sets to demonstrate the advantages of our method in real-world applications.

Conclusions: We have developed a new classifier using the negative binomial model for RNA-Seq data classification. Our simulation results show that our proposed classifier performs better than existing methods. The proposed classifier can serve as an effective tool for classifying RNA-Seq data. Based on the comparison results, we have provided some guidelines for scientists to decide which method should be used in the discriminant analysis of RNA-Seq data. R code is available at http://www.comp.hkbu.edu.hk/~xwan/NBLDA.R or https://github.com/yangchadam/NBLDA
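
The Bayes-rule construction described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' NBLDA code: genes are treated as independent, the class means, dispersions, and priors are taken as known rather than estimated by plug-in rules, and per-sample size factors are omitted. The function names and all numeric values are hypothetical.

```python
import math


def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under a negative binomial distribution
    with mean mu and dispersion phi (variance mu + phi * mu**2)."""
    r = 1.0 / phi
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + x * math.log(phi * mu / (1.0 + phi * mu))
        - r * math.log(1.0 + phi * mu)
    )


def classify(counts, class_means, dispersions, priors):
    """Assign the class with the largest log-likelihood plus log-prior."""
    scores = {}
    for label, means in class_means.items():
        log_lik = sum(
            nb_log_pmf(x, mu, phi)
            for x, mu, phi in zip(counts, means, dispersions)
        )
        scores[label] = log_lik + math.log(priors[label])
    return max(scores, key=scores.get), scores


# Hypothetical per-gene class means, dispersions, and class priors.
class_means = {"tumor": [50.0, 5.0, 20.0], "normal": [10.0, 30.0, 20.0]}
dispersions = [0.2, 0.2, 0.2]
priors = {"tumor": 0.5, "normal": 0.5}

predicted, scores = classify([45, 3, 18], class_means, dispersions, priors)
print(predicted)
```

As the dispersion approaches zero, the negative binomial log-probability approaches the Poisson one, which is the relationship between the two classifiers that the abstract refers to.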

BMC Bioinformatics, 2011

Lisa M Chung, Wei Zheng, Hongyu Zhao

License: Unknown


Background: High throughput sequencing technology provides us unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and ability to identify novel transcripts. Moreover, for genes with multiple isoforms, expression of each isoform may be estimated from RNA-Seq data. Despite these advantages, recent work revealed that base level read counts from RNA-Seq data may not be randomly distributed and can be affected by local nucleotide composition. It was not clear though how the base level read count bias may affect gene level expression estimates.

Results: In this paper, by using five published RNA-Seq data sets from different biological sources and with different data preprocessing schemes, we showed that commonly used estimates of gene expression levels from RNA-Seq data, such as reads per kilobase of gene length per million reads (RPKM), are biased in terms of gene length, GC content and dinucleotide frequencies. We directly examined the biases at the gene-level, and proposed a simple generalized-additive-model based approach to correct different sources of biases simultaneously. Compared to previously proposed base level correction methods, our method reduces bias in gene-level expression estimates more effectively.

Conclusions: Our method identifies and corrects different sources of biases in gene-level expression measures from RNA-Seq data, and provides more accurate estimates of gene expression levels from RNA-Seq. This method should prove useful in meta-analysis of gene expression levels using different platforms or experimental protocols.
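
For readers unfamiliar with the RPKM measure named in this abstract, the short Python sketch below computes it from its standard definition. It does not implement the authors' generalized-additive-model correction; the function name and the read counts are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical genes with the same read count but different lengths.
short_gene = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
long_gene = rpkm(read_count=500, gene_length_bp=4000, total_mapped_reads=10_000_000)

print(short_gene)  # 25.0
print(long_gene)   # 12.5
```

RPKM divides out gene length and sequencing depth only; the abstract's point is that bias related to gene length, GC content, and dinucleotide frequencies can remain after this normalization.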
