1 Epigenome overlap measure (EPOM) for comparing tissue/cell types based on chromatin states [Journal article]

BMC Genomics, 2016

Zahra S. Razaee, Wei Vivian Li, Jingyi Jessica Li

License type: CC BY

Background: The dynamics of epigenomic marks in their relevant chromatin states regulate distinct gene expression patterns, biological functions and phenotypic variations in biological processes. The availability of high-throughput epigenomic data generated by next-generation sequencing technologies allows a data-driven approach to evaluate the similarities and differences of diverse tissue and cell types in terms of epigenomic features. While ChromImpute has allowed for the imputation of large-scale epigenomic information to yield more robust data to capture meaningful relationships between biological samples, widely used methods such as hierarchical clustering and correlation analysis cannot adequately utilize epigenomic data to accurately reveal the distinction and grouping of different tissue and cell types.

Methods: We utilize a three-step testing procedure (ANOVA, t test and overlap test) to identify tissue/cell-type-associated enhancers and promoters and to calculate a newly defined Epigenomic Overlap Measure (EPOM). EPOM results in a clear correspondence map of biological samples from different tissue and cell types through comparison of epigenomic marks evaluated in their relevant chromatin states.

Results: Correspondence maps by EPOM show strong capability in distinguishing and grouping different tissue and cell types and reveal biologically meaningful similarities between Heart and Muscle, Blood & T-cell and HSC & B-cell, Brain and Neurosphere, etc. The gene ontology enrichment analysis both supports and explains the discoveries made by EPOM and suggests that the associated enhancers and promoters demonstrate distinguishable functions across tissue and cell types. Moreover, the tissue/cell-type-associated enhancers and promoters show enrichment in the disease-related SNPs that are also associated with the corresponding tissue or cell types. This agreement suggests the potential of identifying causal genetic variants relevant to cell-type-specific diseases from our identified associated enhancers and promoters.

Conclusions: The proposed EPOM measure demonstrates superior capability in grouping and finding a clear correspondence map of biological samples from different tissue and cell types. The identified associated enhancers and promoters provide a comprehensive catalog to study distinct biological processes and disease variants in different tissue and cell types. Our results also find that the associated promoters exhibit more cell-type-specific functions than the associated enhancers do, suggesting that the non-associated promoters have more housekeeping functions than the non-associated enhancers.
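
The abstract does not spell out how the overlap test or the EPOM score is computed, so the following is only a rough sketch of the general idea of an overlap test between two sets of associated regulatory elements. It assumes a one-sided hypergeometric test and a score defined as the negative log10 of the p-value; the function names, the toy element IDs and the universe size are all hypothetical and are not taken from the paper.

```python
from math import comb, log10


def overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: total number of candidate elements (assumed background)
    n_a, n_b:   sizes of the two sets of associated elements
    n_overlap:  number of elements shared by both sets
    """
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def overlap_score(n_universe, n_a, n_b, n_overlap):
    """Hypothetical overlap score: -log10 of the overlap p-value."""
    p = overlap_pvalue(n_universe, n_a, n_b, n_overlap)
    return -log10(p) if p > 0 else float("inf")


def correspondence_map(samples, n_universe):
    """Pairwise overlap scores between every pair of samples."""
    return {
        (a, b): overlap_score(n_universe, len(sa), len(sb), len(sa & sb))
        for a, sa in samples.items()
        for b, sb in samples.items()
    }


# Toy data: made-up element IDs, for illustration only.
samples = {
    "Heart": {1, 2, 3, 4, 5, 6},
    "Muscle": {1, 2, 3, 4, 5, 7},
    "Brain": {10, 11, 12, 13, 14, 15},
}
cmap = correspondence_map(samples, n_universe=40)
print(cmap[("Heart", "Muscle")] > cmap[("Heart", "Brain")])
```

In this toy setting, samples that share many associated elements receive a higher score than samples that share none, which is the kind of pattern a correspondence map is meant to expose.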

2 SOHSite: incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites [Journal article]

BMC Genomics, 2016

Van-Minh Bui, Cheng-Tsung Lu, Julia Tzu-Ya Weng, Tzong-Yi Lee, Shun-Long Weng, Tzu-Hao Chang

License type: CC BY

Background: Protein S-sulfenylation is a type of post-translational modification (PTM) involving the covalent binding of a hydroxyl group to the thiol of a cysteine amino acid. Recent evidence has shown the importance of S-sulfenylation in various biological processes, including transcriptional regulation, apoptosis and cytokine signaling. Determining the specific sites of S-sulfenylation is fundamental to understanding the structures and functions of S-sulfenylated proteins. However, the current lack of reliable tools often limits researchers to expensive and time-consuming laboratory techniques for the identification of S-sulfenylation sites. Thus, we were motivated to develop a bioinformatics method for investigating S-sulfenylation sites based on amino acid compositions and physicochemical properties.

Results: In this work, physicochemical properties were utilized not only to identify S-sulfenylation sites from 1,096 experimentally verified S-sulfenylated proteins, but also to compare the effectiveness of prediction with other characteristics such as amino acid composition (AAC), amino acid pair composition (AAPC), solvent-accessible surface area (ASA), amino acid substitution matrix (BLOSUM62), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM). Various prediction models were built using support vector machine (SVM) and evaluated by five-fold cross-validation. The model constructed from hybrid features, including PSSM and physicochemical properties, yielded the best performance with sensitivity, specificity, accuracy and MCC measurements of 0.746, 0.737, 0.738 and 0.337, respectively. The selected model also provided a promising accuracy (0.693) on an independent testing dataset. Additionally, we employed TwoSampleLogo to help discover the differences in amino acid composition among S-sulfenylation, S-glutathionylation and S-nitrosylation sites.

Conclusion: This work proposed a computational method to explore informative features and functions for protein S-sulfenylation. Evaluation by five-fold cross-validation indicated that the selected features were effective in the identification of S-sulfenylation sites. Moreover, the independent testing results demonstrated that the proposed method could provide a feasible means for conducting preliminary analyses of protein S-sulfenylation. We also anticipate that the uncovered differences in amino acid composition may facilitate future studies of the extensive crosstalk among S-sulfenylation, S-glutathionylation and S-nitrosylation.
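
The four reported measures (sensitivity, specificity, accuracy and MCC) all follow from a binary confusion matrix. The sketch below uses their standard definitions; the function name and the example counts are hypothetical and are not the paper's data. The counts were chosen only to illustrate how a class-imbalanced dataset can yield a modest MCC even when sensitivity and specificity are both around 0.74.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts: 100 positive sites and 900 negative sites.
example = binary_metrics(tp=75, fp=237, tn=663, fn=25)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because MCC accounts for all four cells of the confusion matrix, it is more sensitive to class imbalance than accuracy alone, which is one reason it is commonly reported alongside the other three measures.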

3 RDDpred: a condition-specific RNA-editing prediction model from RNA-seq data [Journal article]

BMC Genomics, 2016

Min-su Kim, Benjamin Hur, Sun Kim

License type: CC BY

Background: RNA-editing is an important post-transcriptional RNA sequence modification performed by two catalytic enzymes, "ADAR" (A-to-I) and "APOBEC" (C-to-U). By utilizing high-throughput sequencing technologies, the biological function of RNA-editing has been actively investigated. Currently, RNA-editing is considered to be a key regulator that controls various cellular functions, such as protein activity, alternative splicing pattern of mRNA, and substitution of miRNA targeting site. DARNED, a public RDD (RNA-DNA difference) database, reported that more than 300,000 RNA-editing sites have been detected in the human genome (hg19). Moreover, multiple studies suggested that RNA-editing events occur in highly specific conditions. According to DARNED, 97.62% of registered editing sites were detected in a single tissue or in a specific condition, which also supports the view that RNA-editing events occur condition-specifically. Since RNA-seq can capture the whole landscape of the transcriptome, RNA-seq is widely used for RDD prediction. However, significant numbers of false positives or artefacts can be generated when detecting RNA-editing from RNA-seq. Since it is difficult to perform experimental validation at the whole-transcriptome scale, a powerful computational tool is needed to distinguish true RNA-editing events from artefacts.

Results: We developed RDDpred, a Random Forest RDD classifier. RDDpred reports potentially true RNA-editing events from RNA-seq data. RDDpred was tested with two publicly available RNA-editing datasets and successfully reproduced RDDs reported in the two studies (90%, 95%) while rejecting false discoveries (NPV: 75%, 84%).

Conclusion: RDDpred automatically compiles condition-specific training examples without experimental validations and then constructs an RDD classifier. As far as we know, RDDpred is the very first machine-learning based automated pipeline for RDD prediction. We believe that RDDpred will be very useful and can contribute significantly to the study of condition-specific RNA-editing. RDDpred is available at http://biohealth.snu.ac.kr/software/RDDpred.

4 Identifying micro-inversions using high-throughput sequencing reads [Journal article]

BMC Genomics, 2016

Yang Li, Jian Ma, Yu-Hang Tang, Huaiqiu Zhu, Feifei He

License type: CC BY

Background: The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are an important type of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads.

Results: The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp.

Conclusions: To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Therefore, our tool is expected to be useful for improving the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID.
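
To make the notion of a micro-inversion concrete, the sketch below shows what one looks like inside a single read: a short segment that matches the reference only after being reverse-complemented. This is a brute-force illustration and not MID's dynamic programming algorithm. It assumes an error-free read already placed against a reference window of the same length, with a single inverted segment and no indels; all names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_micro_inversion(read, ref, min_len=4):
    """Return (start, end) of a segment whose reverse complement restores ref.

    Returns None if the read equals the reference, the lengths differ,
    or no single inverted segment explains the mismatches.
    """
    if len(read) != len(ref) or read == ref:
        return None
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]
    lo, hi = mismatches[0], mismatches[-1] + 1
    # The inverted segment must cover every mismatch, but its ends may
    # happen to match the reference, so try extending outward.
    for start in range(lo, -1, -1):
        for end in range(hi, len(read) + 1):
            if end - start < min_len:
                continue
            if reverse_complement(read[start:end]) == ref[start:end]:
                return start, end
    return None


# Toy example: invert positions 6..13 of a made-up reference window.
ref = "ACGTTGCAAGGCTTAACCGG"
read = ref[:6] + reverse_complement(ref[6:14]) + ref[14:]
print(find_micro_inversion(read, ref))
```

A standard aligner would see such a read as a cluster of mismatches or fail to map it at all, which is consistent with the abstract's point that conventional alignment methods are insensitive to MIs.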

5 Transcriptome sequencing based annotation and homologous evidence based scaffolding of Anguilla japonica draft genome [Journal article]

BMC Genomics, 2016

Max A. Alekseyev, Sergey Aganezov, Chung-Der Hsiao, Chih-Hung Chou, Yu-Hung Chen, Guan-Jay Lyu, Wei-Yun Huang, Yu-Chen Liu, Sheng-Da Hsu, Shao-Zhen Huang, Chia-Yu Liu, Hsien-Da Huang

License type: Unknown

Background: Anguilla japonica (Japanese eel) is currently one of the most important research subjects in East Asian aquaculture. The enigmatic life cycle of the organism severely limits the study of artificial reproduction. Hence, genomic and transcriptomic resources for eels are urgently needed to help solve the problems surrounding this organism across multiple fields. We hereby provide a reconstructed transcriptome from deep sequencing of juvenile (glass eel) whole-body samples. The resulting expressed sequence tags were used to annotate the currently available draft genome sequence. Homologous information derived from the annotation result was applied to improve the grouping of scaffolds into the available linkage groups.

Results: With the transcriptome sequence data combined with publicly available expressed sequence tag evidence, 18,121 genes were structurally and functionally annotated on the draft genome. Among them, 3,921 genes were located in the 19 linkage groups. In addition to the original partial linkage groups, 137 scaffolds covering 13 million bases were grouped into the linkage groups, increasing the linkage group coverage from 13% to 14%.

Conclusions: This annotation provides information on the coding regions of the genes supported by transcriptome-based evidence. The derived homologous evidence paves the way for phylogenetic analysis of important genetic traits and the improvement of the genome assembly.

6 Comprehensive prediction of lncRNA–RNA interactions in human transcriptome [Journal article]

BMC Genomics, 2016

Tomoshi Kameda, Michiaki Hamada, Junichi Iwakiri, Kiyoshi Asai, Goro Terai

License type: CC BY

Motivation: Recent studies have revealed that large numbers of non-coding RNAs are transcribed in humans, but the functions of only a few of them have been identified. Identification of the interaction target RNAs of the non-coding RNAs is an important step in predicting their functions. The current experimental methods to identify RNA–RNA interactions, however, are not fast enough to apply to a whole human transcriptome. Therefore, computational predictions of RNA–RNA interactions are desirable, but this is a challenging task due to the huge computational costs involved.

Results: Here, we report comprehensive predictions of the interaction targets of lncRNAs in a whole human transcriptome for the first time. To achieve this, we developed an integrated pipeline for predicting RNA–RNA interactions on the K computer, which is one of the fastest supercomputers in the world. Comparisons with experimentally validated lncRNA–RNA interactions support the quality of the predictions. Additionally, we have developed a database that catalogs the predicted lncRNA–RNA interactions to provide fundamental information about the targets of lncRNAs.
