BMC Genomics, 2017
Jonathan Shao, Brady Gaynor, Harry D. Dawson, Celine Chen, Joseph F. Urban
LicenseType: CC BY
Background: The use of swine in biomedical research has increased dramatically in the last decade. Diverse genomic and proteomic databases have been developed to facilitate research using human and rodent models. Current porcine gene databases, however, lack the robust annotation needed to study pig models that are relevant to human studies and to support comparative evaluation with rodent models. Furthermore, they contain a significant number of errors due to their primary reliance on machine-based annotation. To address these deficiencies, a comprehensive literature-based survey was conducted to identify selected genes that have demonstrated function in humans, mice or pigs.
Results: The process identified 13,054 candidate human, bovine, mouse or rat genes/proteins used to select potential porcine homologs by searching multiple online sources of porcine gene information. The data in the Porcine Translational Research Database (http://www.ars.usda.gov/Services/docs.htm?docid=6065) is supported by >5800 references and contains 65 data fields for each entry, including >9700 full-length (5′ and 3′) unambiguous pig sequences, >2400 real-time PCR assays and reactivity information on >1700 antibodies. It also contains gene and/or protein expression data for >2200 genes and identifies and corrects 8187 errors (gene duplication artifacts, mis-assemblies, mis-annotations, and incorrect species assignments) for 5337 porcine genes.
Conclusions: This database is the largest manually curated database for any single veterinary species and is unique among porcine gene databases in linking gene expression to gene function, identifying related gene pathways, and connecting data with other porcine gene databases. It provides the first comprehensive description of three major superfamilies or functionally related groups of proteins (Cluster of Differentiation (CD) marker genes, Solute Carrier Superfamily, ATP-Binding Cassette Superfamily), and a comparative description of porcine microRNAs.
BMC Genomics, 2017
Teshome Dagne Mulugeta, Simen Rød Sandve, Sigbjørn Lien, Matthew Peter Kent, Dag Inge Våge, Jeevan Karloss Antony Samy, Torfinn Nome, Fabian Grammes
LicenseType: CC BY
Background: Salmonids are ray-finned fishes comprising 11 genera and at least 70 species, including Atlantic salmon, whitefishes, graylings, rainbow trout, and char. The common ancestor of all Salmonidae experienced a whole genome duplication (WGD) ~80 million years ago, resulting in an autotetraploid genome. Genomic rediploidization is still ongoing in salmonid species, providing a unique system for studying the evolutionary consequences of whole genome duplication. In recent years, high-quality genome sequences of Atlantic salmon and rainbow trout have been established because of their scientific and commercial value. In this paper we introduce SalmoBase (http://www.salmobase.org/), a tool for making molecular resources for salmonids publicly available in a framework of visualizations and analytic tools.
Results: SalmoBase has been developed as part of the ELIXIR.NO project. Currently, SalmoBase contains molecular resources for Atlantic salmon and rainbow trout. Data can be accessed through BLAST, Genome Browser (GBrowse), Genetic Variation Browser (GVBrowse) and Gene Expression Browser (GEBrowse).
Conclusions: To the best of our knowledge, SalmoBase is the first database that integrates salmonid data and allows users to study salmonids in an integrated framework. The database and its tools (e.g., comparative genomics tools, synteny browsers) will be expanded as additional public resources describing other Salmonidae genomes become available.
BMC Genomics, 2017
Rongxin Zhang, Zhipeng Su, Jing Zuo, Xing Huang, Xiaohuan Jia, Longsheng Xu, Linhong Zhao, Jian Li, Hanhan Chen, Wei Xie
LicenseType: CC BY
Background: Nanobodies are single-domain antibodies that retain the unique structural and functional properties of the naturally occurring heavy-chain antibodies of Camelidae. As a novel class of antibody, they offer many advantages over traditional antibodies, such as smaller size, higher stability, improved specificity, and easier expression in microorganisms. These unusual hallmarks make them promising tools in basic research and clinical practice. Although thousands of nanobodies have been published, no single database provides searchable, unified annotation and integrative analysis tools for these various nanobodies.
Results: Here, we present the database of Institute Collection and Analysis of Nanobodies (iCAN). It was built to address the above gap and to expand and accelerate nanobody research. iCAN, as the first nanobody database, contains the most comprehensive information to date on nanobodies and related antigens. So far, iCAN incorporates 2391 entries, of which 2131 come from patents and 260 from publications, and provides a simple user interface for researchers to retrieve and view detailed information on nanobodies. In addition to the data collection, iCAN also provides online bioinformatic tools for sequence analysis and characteristic feature extraction.
Conclusions: In summary, iCAN enables researchers to analyze nanobody features and explore the applications of nanobodies more efficiently. iCAN is freely available at http://ican.ils.seu.edu.cn.
Co-expressed Pathways DataBase for Tomato: a database to predict pathways relevant to a query gene [Journal article]
BMC Genomics, 2017
Hiroyuki Ohta, Takeshi Obayashi, Takafumi Narise, Nozomu Sakurai, Daisuke Shibata
LicenseType: CC BY
Background: Gene co-expression, the similarity of gene expression profiles under various experimental conditions, has been used as an indicator of functional relationships between genes, and many co-expression databases have been developed for predicting gene functions. These databases usually provide users with a co-expression network and a list of strongly co-expressed genes for a query gene. Several of these databases also provide functional information on a set of strongly co-expressed genes (i.e., the biological processes and pathways that are enriched in these strongly co-expressed genes), which is generally obtained via over-representation analysis (ORA). A limitation of this approach is that users can predict gene functions based only on the strongly co-expressed genes.
Results: In this study, we developed a new co-expression database that enables users to predict the function of tomato genes from the results of functional enrichment analyses of co-expressed genes while also considering genes that are not strongly co-expressed. To achieve this, we used the ORA approach with several thresholds to select co-expressed genes, and applied gene set enrichment analysis (GSEA) to a list of genes ranked by degree of co-expression. We found that internal correlation in pathways affected the significance levels of the enrichment analyses. Therefore, we introduced a new measure for evaluating the relationship between a gene and a pathway, termed the percentile (p)-score, which enables users to predict functionally relevant pathways without being affected by the internal correlation in pathways. In addition, we evaluated our approaches using receiver operating characteristic curves, which showed that the p-score could improve the performance of the ORA.
Conclusions: We developed a new database, named Co-expressed Pathways DataBase for Tomato, which is available at http://cox-path-db.kazusa.or.jp/tomato. The database allows users to predict pathways that are relevant to a query gene, which should help to infer gene functions.
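As a rough illustration of over-representation analysis applied at several thresholds, the sketch below computes an upper-tail hypergeometric p-value for a pathway among the top-n genes of a co-expression ranking. This is a generic textbook formulation, not the database's own implementation; the function names, the toy gene identifiers, and the choice of thresholds are all assumptions made for illustration. The abstract does not define the p-score, so it is not reproduced here.

```python
from math import comb


def ora_p_value(population, pathway, selected):
    """Upper-tail hypergeometric p-value P(X >= k) for pathway enrichment.

    population: set of all genes considered (hypothetical background set)
    pathway:    set of genes annotated to the pathway
    selected:   set of genes picked as co-expressed with the query
    """
    n_pop = len(population)
    n_path = len(pathway & population)
    n_sel = len(selected & population)
    k = len(pathway & selected & population)
    total = comb(n_pop, n_sel)
    # Sum the probability of seeing k or more pathway genes in the selection.
    hits = sum(
        comb(n_path, i) * comb(n_pop - n_path, n_sel - i)
        for i in range(k, min(n_path, n_sel) + 1)
    )
    return hits / total


def ora_over_thresholds(ranked_genes, pathway, top_ns):
    """Run ORA on the top-n genes of a ranking for each threshold n."""
    population = set(ranked_genes)
    return {
        n: ora_p_value(population, pathway, set(ranked_genes[:n]))
        for n in top_ns
    }


if __name__ == "__main__":
    # Toy ranking: genes ordered from most to least co-expressed with a query.
    ranking = [f"g{i}" for i in range(10)]
    toy_pathway = {"g0", "g1", "g2"}
    for n, p in ora_over_thresholds(ranking, toy_pathway, [3, 5, 10]).items():
        print(n, p)
```

Looser thresholds include more weakly co-expressed genes, so the same pathway can look significant at one cutoff and not at another, which is one reason a single threshold can be limiting.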
BMC Genomics, 2017
Yaohua Hu, Ching Hei Ho, Ling Ming Tsang, Xiaosen Jiang, Ricky Wai Tak Leung, Ka Yan Ma, Lefei Yi, Jing Qin, Ka Hou Chu
LicenseType: CC BY
Background: Crustacea, the second largest subphylum of Arthropoda, includes species of major ecological and economic importance, such as crabs, lobsters, crayfishes, shrimps, and barnacles. With the rapid development of crustacean aquaculture and ongoing biodiversity loss, understanding the gene regulatory mechanisms of growth, reproduction, and development of crustaceans is crucial to both aquaculture development and biodiversity conservation of this group of organisms. In these biological processes, transcription factors (TFs) play a vital role in regulating gene expression. However, crustacean transcription factors are still largely unknown, because the lack of complete genome sequences for most crustacean species hampers studies of their transcriptional regulation on a system-wide scale. Thus, the current TF databases derived from genome sequences contain TF information for only a few crustacean species and are insufficient to elucidate the transcriptional diversity of such a large animal group.
Results: Our database CrusTF (http://qinlab.sls.cuhk.edu.hk/CrusTF) provides comprehensive information for evolutionary and functional studies on the crustacean transcriptional regulatory system. CrusTF fills the knowledge gap of transcriptional regulation in crustaceans by exploring publicly available and newly sequenced transcriptomes of 170 crustacean species and identifying 131,941 TFs within 63 TF families. CrusTF features three categories of information: sequence, function, and evolution of crustacean TFs. The database enables searching, browsing and downloading of crustacean TF sequences. CrusTF infers DNA binding motifs of crustacean TFs, thus helping users to predict potential downstream TF targets. The database also presents evolutionary analyses of crustacean TFs, which improve our understanding of the evolution of transcriptional regulatory systems in crustaceans.
Conclusions: Given the importance of TF information in evolutionary and functional studies on transcriptional regulatory systems of crustaceans, this database will constitute a key resource for the research community of crustacean biology and evolutionary biology. Moreover, CrusTF serves as a model for the construction of TF databases derived from transcriptome data. A similar approach could be applied to other groups of organisms for which transcriptomes are more readily available than genomes.
BMC Genomics, 2017
Emmanuel Gaquerel, Zhihao Ling, Thomas Brockmöller, Dapeng Li, Ian T. Baldwin, Shuqing Xu
LicenseType: CC BY
Background: Nicotiana attenuata (coyote tobacco) is an ecological model for studying plant-environment interactions and plant gene function under real-world conditions. During the last decade, large amounts of genomic, transcriptomic and metabolomic data have been generated for this plant, providing new insights into how native plants interact with herbivores, pollinators and microbes. However, an integrative and open-access platform that allows for the efficient mining of these -omics data remained unavailable until now.
Description: We present the Nicotiana attenuata Data Hub (NaDH) as a centralized platform for integrating and visualizing genomic, phylogenomic, transcriptomic and metabolomic data in N. attenuata. The NaDH currently hosts collections of predicted protein coding sequences of 11 plant species, including two recently sequenced Nicotiana species, and their functional annotations, 222 microarray datasets from 10 different experiments, a transcriptomic atlas based on 20 RNA-seq expression profiles and a metabolomic atlas based on 895 metabolite spectra analyzed by mass spectrometry. We implemented several visualization tools, including a modified version of the Electronic Fluorescent Pictograph (eFP) browser, co-expression networks and the Interactive Tree Of Life (iTOL) for studying gene expression divergence among duplicated homologs. In addition, the NaDH allows researchers to query phylogenetic trees of 16,305 gene families and provides tools for analyzing their evolutionary history. Furthermore, we implemented tools to identify co-expressed genes and metabolites, which can be used for predicting the functions of genes. Using the transcription factor NaMYB8 as an example, we illustrate that the tools and data in NaDH can facilitate identification of candidate genes involved in the biosynthesis of specialized metabolites.
Conclusion: The NaDH provides interactive visualization and data analysis tools that integrate the expression and evolutionary history of genes in Nicotiana, which can facilitate rapid gene discovery and comparative genomic analysis. Because N. attenuata shares many genome-wide features with other Nicotiana species, including cultivated tobacco, NaDH can be a resource for exploring the function and evolution of genes in Nicotiana species in general. The NaDH can be accessed at: http://nadh.ice.mpg.de/.
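The idea of finding genes or metabolites co-expressed with a query can be sketched as ranking profiles by their correlation with the query profile. The example below uses Pearson correlation as an assumed similarity measure; the abstract does not state which metric NaDH uses, and the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile carries no co-variation; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(query, profiles):
    """Rank every other feature by its correlation with the query profile."""
    q = profiles[query]
    scores = [(name, pearson(q, p)) for name, p in profiles.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy abundance profiles across four samples (made-up values).
    toy = {
        "query_gene": [1, 2, 3, 4],
        "gene_a": [2, 4, 6, 8],
        "metabolite_b": [4, 3, 2, 1],
        "gene_c": [1, 1, 1, 1],
    }
    for name, r in rank_coexpressed("query_gene", toy):
        print(name, round(r, 3))
```

Features at the top of such a ranking are candidates for shared function with the query, which is the reasoning behind using co-expression to suggest gene function.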