• Selected filters:
  • Jian Wang
  • BMC Bioinformatics
17 records match the selected filters.

BMC Bioinformatics, 2020

Junbin Fang, Xiaoqin Yang, Weipeng Hu, Lin Fang, Yong Zhang, Huixin Xu, Huanming Yang, Jia Ye, Zijian Zhao, Jian Wang, Jiayin Wang, Yongsheng Chen, Weiqiang Sun, Jing Yan, Yun Cheng

LicenseType: Unknown

BMC Bioinformatics, 2018

Jian Wang, Yan Wang, Shaowu Zhang, Lishuang Li, Hongfei Lin, Xiwei Tang

LicenseType: Unknown

BMC Bioinformatics, 2017

Zhihao Yang, Ling Luo, Yijia Zhang, Jian Wang, Zhehuan Zhao, Hongfei Lin, Zhengguang Li, Wei Zheng

LicenseType: CC BY

Background: Drug-drug interactions (DDIs) often bring unexpected side effects. The clinical recognition of DDIs is a crucial issue for both patient safety and healthcare cost control. However, although text-mining-based systems explore various methods to classify DDIs, the classification performance with regard to DDIs in long and complex sentences is still unsatisfactory.

Methods: In this study, we propose an effective model that classifies DDIs from the literature by combining an attention mechanism and a recurrent neural network with long short-term memory (LSTM) units. In our approach, first, a candidate-drug-oriented input attention acting on word-embedding vectors automatically learns which words are more influential for a given drug pair. Next, the inputs merging the position- and POS-embedding vectors are passed to a bidirectional LSTM layer whose outputs at the last time step represent the high-level semantic information of the whole sentence. Finally, a softmax layer performs DDI classification.

Results: Experimental results from the DDIExtraction 2013 corpus show that our system performs the best with respect to detection and classification (84.0% and 77.3%, respectively) compared with other state-of-the-art methods. In particular, for the Medline-2013 dataset with long and complex sentences, our F-score far exceeds those of top-ranking systems by 12.6%.

Conclusions: Our approach effectively improves the performance of DDI classification tasks. Experimental analysis demonstrates that our model performs better with respect to recognizing not only close-range but also long-range patterns among words, especially for long, complex and compound sentences.
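The candidate-drug-oriented input attention described in this abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the choices here are assumptions: relevance is a dot product between each word embedding and each candidate drug's embedding, and the two softmax-normalised weight vectors are averaged. The function names and the toy embeddings are hypothetical, and the bidirectional LSTM and softmax classifier are not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def input_attention(word_vecs, drug1_idx, drug2_idx):
    """Reweight word embeddings by their relevance to the two candidate drugs.

    Assumption (not from the abstract): relevance is a dot product with each
    drug's embedding, and the two softmax weight vectors are averaged.
    """
    d1, d2 = word_vecs[drug1_idx], word_vecs[drug2_idx]
    a1 = softmax([dot(w, d1) for w in word_vecs])
    a2 = softmax([dot(w, d2) for w in word_vecs])
    alphas = [(x + y) / 2 for x, y in zip(a1, a2)]
    # Scale every embedding by its attention weight before the next layer.
    weighted = [[alpha * x for x in w] for alpha, w in zip(alphas, word_vecs)]
    return alphas, weighted


# Toy sentence of four tokens with 2-dimensional embeddings (made-up values).
sentence = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.0]]
alphas, weighted = input_attention(sentence, drug1_idx=0, drug2_idx=2)
```

In this toy input, tokens 0 and 2 stand in for the two candidate drugs, so words whose embeddings resemble them receive larger weights than the unrelated tokens.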

BMC Bioinformatics, 2015

Feng Ying Yu, Yuan Yuan Sun, Zhi Hao Yang, Hong Fei Lin, Jian Wang, Xiao Hua Hu

LicenseType: Unknown

Background: Revealing protein complexes is important for understanding the principles of cellular organization and function. High-throughput experimental techniques have produced a large amount of protein interaction data, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the small number of known physical interactions may limit protein complex detection.

Methods: New PPI networks are constructed by integrating PPI datasets with the large and readily available PPI data from the biomedical literature, and the less reliable PPIs are then filtered out based on the semantic similarity and topological similarity of the two interacting proteins. Finally, supervised learning protein complex detection (SLPC), which can make full use of the information in known complexes, is applied to detect protein complexes in the new PPI networks.

Results: The experimental results of SLPC on two different categories of yeast PPI networks demonstrate the effectiveness of the approach. Compared with the original PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in the F-score, accuracy and maximum matching ratio (MMR) are achieved, respectively. Compared with the denoising PPI networks, the best average improvements of 3.91, 4.61 and 12.10 percentage units in the F-score, accuracy and MMR are achieved, respectively. Compared with ClusterONE, the state-of-the-art complex detection method, on the denoising extended PPI networks, average improvements of 26.02 and 22.40 percentage units in the F-score and MMR are achieved, respectively.

Conclusions: The experimental results show that the performance of SLPC improves considerably when the readily available PPI data from the biomedical literature are integrated into the original PPI networks and the denoising PPI networks. In addition, our protein complex detection method achieves better performance than ClusterONE.
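The filtering step in this abstract (dropping less reliable interactions by similarity) can be sketched as follows. The abstract does not name the similarity measures, so this sketch assumes a stand-in: the Jaccard similarity of the two proteins' closed neighbourhoods as the topological score, with a hypothetical threshold. The semantic similarity component and SLPC itself are not modelled.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard_neighbours(adj, u, v):
    """Jaccard similarity of the closed neighbourhoods of u and v.

    Assumption: this is only a stand-in for the topological similarity
    mentioned in the abstract, which does not specify the measure.
    """
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def filter_edges(edges, threshold):
    """Keep only interactions whose topological score reaches the threshold."""
    adj = build_adjacency(edges)
    return [(u, v) for u, v in edges if jaccard_neighbours(adj, u, v) >= threshold]


# Toy network: a triangle A-B-C plus a pendant interaction C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
kept = filter_edges(edges, threshold=0.6)  # hypothetical threshold
```

With this toy network the triangle edges score 1.0 or 0.75 and are kept, while the pendant edge C-D scores 0.5 and is filtered out.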

BMC Bioinformatics, 2016

Zhihao Yang, Jian Wang, Yijia Zhang, Hongfei Lin

LicenseType: CC BY

Background: Recently, high-throughput experimental techniques have generated a large amount of protein-protein interaction (PPI) data, which can be used to construct large, complex PPI networks for numerous organisms. Systems biology attempts to understand cellular organization and function by analyzing these PPI networks. However, most studies still focus on static PPI networks, which neglect the dynamic information of PPIs.

Results: Gene expression data at different time points and under different conditions can reveal the dynamic information of proteins. In this study, we used an active probability-based method to distinguish the active level of proteins at different active time points. We constructed dynamic probabilistic protein networks (DPPNs) to integrate the dynamic information of proteins into static PPI networks. Based on DPPNs, we subsequently proposed a novel method to identify protein complexes, which could effectively exploit the topological structure as well as the dynamic information of DPPNs. We used three different yeast PPI datasets and gene expression data to construct three DPPNs. When applied to the three DPPNs, this method accurately identified many well-characterized protein complexes.

Conclusion: The shift from static PPI networks to dynamic PPI networks is essential for accurately identifying protein complexes. This method can not only be applied to identify protein complexes but also establishes a framework for integrating dynamic information into static networks for other applications, such as pathway analysis.
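One way to picture how per-time-point activity can be folded into a static PPI network is sketched below. This is an illustration under stated assumptions, not the authors' construction: the activity probabilities are taken as given inputs (the abstract does not say how they are derived from expression data), and an interaction's weight at a time point is assumed to be the product of the two proteins' activity probabilities. All names and values are hypothetical.

```python
def dynamic_networks(static_edges, active_prob):
    """Return one weighted edge dict per time point.

    active_prob maps protein -> list of activity probabilities, one per
    time point. Assumption: an interaction's weight at time t is the
    product of both proteins' activity probabilities at t, and
    zero-weight interactions are dropped.
    """
    n_times = len(next(iter(active_prob.values())))
    networks = []
    for t in range(n_times):
        weighted = {}
        for u, v in static_edges:
            w = active_prob[u][t] * active_prob[v][t]
            if w > 0:
                weighted[(u, v)] = w
        networks.append(weighted)
    return networks


# Toy static network and made-up activity probabilities for two time points.
static_edges = [("P1", "P2"), ("P2", "P3")]
active_prob = {"P1": [1.0, 0.0], "P2": [0.5, 1.0], "P3": [0.0, 1.0]}
nets = dynamic_networks(static_edges, active_prob)
```

Each element of `nets` is a snapshot of the static network in which only interactions between proteins that are both active at that time point survive.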

BMC Bioinformatics, 2016

Bo Peng, Jian Wang, Xuan Zhu, Sanjay Shete

LicenseType: CC BY

Background: Next-generation sequencing has been used by investigators to address a diverse range of biological problems through, for example, polymorphism and mutation discovery and microRNA profiling. However, compared to conventional sequencing, the error rates for next-generation sequencing are often higher, which impacts the downstream genomic analysis. Recently, Wang et al. (BMC Bioinformatics 13:185, 2012) proposed a shadow regression approach to estimate the error rates for next-generation sequencing data based on the assumption of a linear relationship between the number of reads sequenced and the number of reads containing errors (denoted as shadows). However, this linear read-shadow relationship may not be appropriate for all types of sequence data. Therefore, it is necessary to estimate the error rates in a more reliable way without assuming linearity. We proposed an empirical error rate estimation approach that employs cubic and robust smoothing splines to model the relationship between the number of reads sequenced and the number of shadows.

Results: We performed simulation studies using a frequency-based approach to generate the read and shadow counts directly, which can mimic the real sequence counts data structure. Using simulation, we investigated the performance of the proposed approach and compared it to that of shadow linear regression. The proposed approach provided more accurate error rate estimations than the shadow linear regression approach for all the scenarios tested. We also applied the proposed approach to assess the error rates for the sequence data from the MicroArray Quality Control project, a mutation screening study, the Encyclopedia of DNA Elements project, and bacteriophage PhiX DNA samples.

Conclusions: The proposed empirical error rate estimation approach does not assume a linear relationship between the error-free read and shadow counts and provides more accurate estimations of error rates for next-generation, short-read sequencing data.
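The concern about the linear read-shadow assumption can be illustrated with a small sketch. It fits an ordinary least-squares line to made-up read and shadow counts and inspects the residuals. It does not implement the cubic or robust smoothing splines proposed in the abstract; it only shows that a straight line leaves systematic residuals when the underlying relationship is curved. The counts and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values for the fitted line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up read counts and two made-up shadow-count scenarios.
reads = [100, 200, 300, 400, 500]
linear_shadows = [2 * r / 100 for r in reads]     # shadows grow linearly
curved_shadows = [(r / 100) ** 2 for r in reads]  # shadows grow nonlinearly

slope_lin, icpt_lin = fit_line(reads, linear_shadows)
res_lin = residuals(reads, linear_shadows, slope_lin, icpt_lin)

slope_cur, icpt_cur = fit_line(reads, curved_shadows)
res_cur = residuals(reads, curved_shadows, slope_cur, icpt_cur)
```

For the linear scenario the residuals are essentially zero. For the curved scenario they are positive at both ends and negative in the middle, the kind of pattern a flexible smoother is meant to absorb.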