2017

Botha, Louisa, Grobbelaar, Sara

Language: English

BMC Bioinformatics, 2017

M. Krzystanek, Z. Szallasi, O. Pipek, A. Bodor, I. Csabai, D. Ribli, D. Szüts, J. Molnár, G. E. Tusnády, Á. Póti

License: CC BY

Background: Detection of somatic mutations is one of the main goals of next-generation DNA sequencing. A wide range of experimental systems are available for the study of spontaneous or environmentally induced mutagenic processes. However, most of the routinely used mutation calling algorithms are not optimised for the simultaneous analysis of multiple samples, or for non-human experimental model systems with no reliable databases of common genetic variations. Most standard tools either require numerous poorly documented in-house post-filtering steps or take an impractically long time to run. To overcome these problems, we designed the streamlined IsoMut tool, which can be readily adapted to experimental scenarios where the goal is the identification of experimentally induced mutations in multiple isogenic samples.
Methods: Using 30 isogenic samples, reliable cohorts of validated mutations were created for testing purposes. Optimal values of the filtering parameters of IsoMut were determined in a thorough and strict optimization procedure based on these test sets.
Results: We show that IsoMut, when tuned correctly, decreases the false positive rate compared to conventional tools in a 30-sample experimental setup, and detects not only single nucleotide variations but also short insertions and deletions. IsoMut can also be run more than a hundred times faster than the most precise state-of-the-art tool, due to its straightforward and easily understandable filtering algorithm.
Conclusions: IsoMut has already been successfully applied in multiple recent studies to find unique, treatment-induced mutations in sets of isogenic samples with very low false positive rates. These types of studies provide an important contribution to determining the mutagenic effect of environmental agents or genetic defects, and IsoMut turned out to be an invaluable tool in the analysis of such data.
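
The idea of finding mutations that are present in exactly one of several isogenic samples can be illustrated with a minimal sketch. This is not IsoMut's actual implementation: the function name, the input layout (per-sample fractions of non-reference reads at one position), and both thresholds are assumptions chosen only for illustration.

```python
from typing import Optional, Sequence


def unique_mutation_sample(
    alt_fractions: Sequence[float],
    min_mutated_fraction: float = 0.2,   # hypothetical threshold
    max_clean_fraction: float = 0.05,    # hypothetical threshold
) -> Optional[int]:
    """Return the index of the only sample carrying a variant, else None.

    alt_fractions[i] is the fraction of reads in sample i that do not
    match the reference base at a single genomic position.
    """
    mutated = [i for i, f in enumerate(alt_fractions) if f >= min_mutated_fraction]
    if len(mutated) != 1:
        return None  # no candidate, or a variant shared between samples
    candidate = mutated[0]
    # Every other sample must look clean; otherwise the signal is more
    # likely a shared variant or a systematic sequencing/alignment error.
    for i, f in enumerate(alt_fractions):
        if i != candidate and f > max_clean_fraction:
            return None
    return candidate


print(unique_mutation_sample([0.00, 0.48, 0.01, 0.00]))  # 1 (only sample 1 is mutated)
print(unique_mutation_sample([0.45, 0.50, 0.47, 0.52]))  # None (shared variant)
print(unique_mutation_sample([0.00, 0.48, 0.10, 0.00]))  # None (another sample is noisy)
```

In a sketch like this, the two thresholds play the role of the filtering parameters that the abstract says were tuned on validated test sets.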

BMC Bioinformatics, 2017

Rodolfo J. C. Cantet, Johannes W. R. Martini, Henner Simianer, Diercles F. Cardoso, Malena Erbe, Ning Gao, Valentin Wimmer

License: CC BY

Background: Epistasis marker effect models incorporating products of marker values as predictor variables in a linear regression approach (extended GBLUP, EGBLUP) have been assessed as potentially beneficial for genomic prediction, but their performance depends on marker coding. Although this fact has been recognized in the literature, the nature of the problem has not been thoroughly investigated so far.
Results: We illustrate how the choice of marker coding implicitly specifies the model of how effects of certain allele combinations at different loci contribute to the phenotype, and investigate coding-dependent properties of EGBLUP. Moreover, we discuss an alternative categorical epistasis model (CE) eliminating undesired properties of EGBLUP and show that the CE model can improve predictive ability. Finally, we demonstrate that the coding-dependent performance of EGBLUP offers the possibility to incorporate prior experimental information into the prediction method by adapting the coding to already available phenotypic records on other traits.
Conclusion: Based on our results, for EGBLUP, a symmetric coding {−1,1} or {−1,0,1} should be preferred, whereas a standardization using allele frequencies should be avoided. Moreover, CE can be a valuable alternative since it does not possess the undesired theoretical properties of EGBLUP. However, which model performs best will depend on characteristics of the data and available prior information. Data from previous experiments can, for instance, be incorporated into the marker coding of EGBLUP.
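
The coding dependence can be seen in a small example. The sketch below is not taken from the paper: the genotypes are made up, and the helper `interaction` is a hypothetical name for the pairwise product predictor that epistasis marker effect models use. It compares the same genotypes under the codings {0,1,2} and {−1,0,1}.

```python
import numpy as np

# Hypothetical genotypes: 4 individuals x 2 loci, counted as copies of
# one allele (0, 1 or 2). Only homozygotes are used to keep it small.
counts = np.array([[0, 0],
                   [0, 2],
                   [2, 0],
                   [2, 2]])


def interaction(markers: np.ndarray) -> np.ndarray:
    """Product of the two marker columns (the pairwise epistasis predictor)."""
    return markers[:, 0] * markers[:, 1]


asymmetric = counts        # coding {0, 1, 2}
symmetric = counts - 1     # coding {-1, 0, 1}

print(interaction(asymmetric).tolist())  # [0, 0, 0, 4]
print(interaction(symmetric).tolist())   # [1, -1, -1, 1]
```

Under the {0,1,2} coding only the individual carrying two copies at both loci gets a non-zero interaction predictor, whereas the symmetric coding contrasts matching against non-matching genotype combinations. The genotypes are identical in both cases; only the implied interaction model differs.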

BMC Bioinformatics, 2017

Joohon Sung, Hyung-Lae Kim, Han-Na Kim, Yangrae Cho, Jongsun Jung, Sunho Lee, Jonghee Hong, Sojeong Ka

License: CC BY

Background: Several recent studies showed that next-generation sequencing (NGS)-based human leukocyte antigen (HLA) typing is a feasible and promising technique for variant calling of highly polymorphic regions. To date, however, no method with sufficient read depth has completely solved the allele phasing issue. In this study, we developed a new method (HLAscan) for HLA genotyping using NGS data.
Results: HLAscan performs alignment of reads to HLA sequences from the international ImMunoGeneTics project/human leukocyte antigen (IMGT/HLA) database. The distribution of aligned reads was used to calculate a score function to determine correctly phased alleles by progressively removing false-positive alleles. Comparative HLA typing tests using public datasets from the 1000 Genomes Project and the International HapMap Project demonstrated that HLAscan could perform HLA typing more accurately than previously reported NGS-based methods such as HLAreporter and PHLAT. In addition, the results of HLA-A, -B, and -DRB1 typing by HLAscan using data generated by NextGen were identical to those obtained using a Sanger sequencing-based method. We also applied HLAscan to a family dataset with various coverage depths generated on the Illumina HiSeq X-TEN platform. HLAscan identified allele types of HLA-A, -B, -C, -DQB1, and -DRB1 with 100% accuracy for sequences at ≥ 90× depth, and the overall accuracy was 96.9%.
Conclusions: HLAscan, an alignment-based program that takes read distribution into account to determine true allele types, outperformed previously developed HLA typing tools. Therefore, HLAscan can be reliably applied for determination of HLA type across whole-genome, exome, and target sequences.

BMC Bioinformatics, 2017

Mauno Vihinen, Jelena Čalyševa

License: CC BY

Background: Amino acid substitutions due to DNA nucleotide replacements are frequently disease-causing because they affect functionally important sites. If the substituting amino acid does not fit into the protein, it causes structural alterations that are often harmful. Clashes of amino acids cause local or global structural changes. Testing the structural compatibility of variations has been difficult due to the lack of a dedicated method that could handle the vast amounts of variation data produced by next-generation sequencing technologies.
Results: We developed a method, PON-SC, for detecting protein structural clashes due to amino acid substitutions. The method utilizes a side chain rotamer library and tests whether any of the common rotamers can be fitted into the protein structure. The tool was tested both with variants that cause clashes and with variants that do not, and was found to have an accuracy of 0.71 over five test datasets.
Conclusions: We developed a fast method for residue side chain clash detection. In addition to the prediction, the method provides a visualization of the variant in the three-dimensional structure.
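
The rotamer test described above can be sketched as a simple distance check. This is only an illustration under stated assumptions, not PON-SC's implementation: the function names, the single 3.0 Å cutoff, and the toy coordinates are all hypothetical, and a real tool would typically use atom-specific radii and a full rotamer library.

```python
import numpy as np


def has_clash(side_chain: np.ndarray, environment: np.ndarray,
              min_distance: float = 3.0) -> bool:
    """True if any side-chain atom lies closer than min_distance (in Å,
    a hypothetical cutoff) to any atom of the surrounding protein."""
    diffs = side_chain[:, None, :] - environment[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return bool((distances < min_distance).any())


def fits_into_structure(rotamers, environment, min_distance=3.0) -> bool:
    """True if at least one candidate rotamer avoids all clashes."""
    return any(not has_clash(r, environment, min_distance) for r in rotamers)


# Toy coordinates (Å): two neighbouring atoms and two candidate rotamers,
# each reduced to a single side-chain atom.
environment = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
rotamer_a = np.array([[1.0, 0.0, 0.0]])   # 1 Å from the first neighbour
rotamer_b = np.array([[2.5, 4.0, 0.0]])   # more than 3 Å from both neighbours

print(fits_into_structure([rotamer_a], environment))             # False
print(fits_into_structure([rotamer_a, rotamer_b], environment))  # True
```

A substitution would be flagged as clashing only when every candidate rotamer fails the check, which mirrors the "any of the common rotamers can be fitted" criterion in the abstract.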

BMC Bioinformatics, 2017

Xing Qiu, Yuhang Liu, Jinfeng Zhang

License: CC BY

Background: Normalization is an important data preparation step in gene expression analyses, designed to remove various systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of the global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived based on asymptotic theory for hypothesis testing that suitably pairs with the proposed robust normalization.
Results: We first compared super-delta with four commonly used normalization methods (global, median-IQR, quantile, and cyclic loess normalization) in simulation studies. Super-delta was shown to have better statistical power with tighter control of the type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets on breast cancer patients who received neoadjuvant chemotherapy. While there is a substantial overlap of the DEGs identified by all of them, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigations of the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than those identified by other methods.
Conclusions: As a new pipeline, super-delta provides new insights into the area of differential gene expression analysis. A solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Implementation on real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential to be extended to other data types and/or more general between-group comparison problems.
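
The bias that DEGs introduce into global normalization can be reproduced on a toy matrix. The sketch below is not the super-delta procedure and does not include its modified t-test; the data are made up, `group_difference` is a hypothetical helper, and the per-sample median is used only as a simple stand-in for a robust normalization.

```python
import numpy as np

# Toy log-expression matrix: 5 genes (rows) x 4 samples (columns).
# Samples 0-1 are group A, samples 2-3 are group B. Only gene 0 is truly
# differentially expressed (+5 in group B); all values are made up.
expr = np.array([
    [5.0, 5.0, 10.0, 10.0],   # gene 0: true DEG
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0, 3.0],
    [4.0, 4.0, 4.0, 4.0],
])


def group_difference(normalized: np.ndarray) -> np.ndarray:
    """Mean of group B minus mean of group A, per gene."""
    return normalized[:, 2:].mean(axis=1) - normalized[:, :2].mean(axis=1)


# Global normalization: subtract each sample's mean over all genes.
global_norm = expr - expr.mean(axis=0)
# Robust stand-in: subtract each sample's median over all genes.
median_norm = expr - np.median(expr, axis=0)

print(group_difference(global_norm).tolist())  # [4.0, -1.0, -1.0, -1.0, -1.0]
print(group_difference(median_norm).tolist())  # [5.0, 0.0, 0.0, 0.0, 0.0]
```

With mean-based global normalization the single DEG shifts the group B sample means, so the four unchanged genes appear down-regulated by 1 and the true effect is shrunk from 5 to 4. This is the kind of bias that can inflate type I error and reduce power; in this toy case the median-based variant is unaffected.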