Selected filters:
  • BMC Bioinformatics
  • 2023
179 records match these filters.

BMC Bioinformatics, 2023

Namita Khanna, K. Syama, J. Angel Arul Jothi

License: CC BY

BMC Bioinformatics, 2023

Pınar Karadayı Ataş

License: CC BY

Hashimoto’s thyroiditis is an autoimmune disorder characterized by the destruction of thyroid cells through immune-mediated mechanisms involving cells and antibodies. The condition can trigger disturbances in metabolism, leading to the development of other autoimmune diseases, known as concomitant diseases. Multiple concomitant diseases may coexist in a single individual, making it challenging to diagnose and manage them effectively. This study proposes a novel hybrid algorithm that classifies concomitant diseases associated with Hashimoto’s thyroiditis based on sequences. The approach builds a distinct prediction model for each class and uses the output of one model as input for the subsequent one, resulting in a dynamic decision-making process. Genes associated with concomitant diseases were collected alongside those related to Hashimoto’s thyroiditis, and their sequences were obtained from the NCBI site in FASTA format. The hybrid algorithm was evaluated against common machine learning algorithms and their various combinations. The experimental results show that the proposed hybrid model outperforms existing classification methods on the performance metrics considered. The significance of this study lies in two distinctive aspects. First, it presents a new benchmark dataset, built using diverse methods, of a kind not previously developed in this field. Second, it proposes a more effective and efficient solution that accounts for the dynamic nature of the dataset. The hybrid approach holds promise for investigating the genetic heterogeneity of complex diseases such as Hashimoto’s thyroiditis and for identifying new autoimmune disease genes. Additionally, the results of this study may aid in the development of genetic screening tools and laboratory experiments targeting genetic risk factors for Hashimoto’s thyroiditis. The study draws on new software, models, and computing techniques, including systems biology, machine learning, and artificial intelligence.
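
The idea of feeding one model's output into the next can be illustrated with a small classifier chain. This is only a sketch under stated assumptions, not the authors' hybrid algorithm: the nearest-centroid base learner, the numeric feature vectors, and the function names `fit_chain` and `predict_chain` are all hypothetical choices made for illustration.

```python
def fit_centroids(X, y):
    """Hypothetical base learner: store the mean feature vector of each binary class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if rows:
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def fit_chain(X, Y):
    """Train one model per label column of Y, in order.

    After each model is trained, its prediction is appended to every
    sample as an extra feature, so the next model sees it as input.
    """
    models = []
    X_aug = [list(x) for x in X]
    for j in range(len(Y[0])):
        y_j = [row[j] for row in Y]
        model = fit_centroids(X_aug, y_j)
        models.append(model)
        for x in X_aug:
            x.append(predict_centroid(model, x))
    return models


def predict_chain(models, x):
    """Predict every label for one sample, passing each output down the chain."""
    x_aug = list(x)
    outputs = []
    for model in models:
        p = predict_centroid(model, x_aug)
        outputs.append(p)
        x_aug.append(p)
    return outputs


# Toy data: two numeric features per gene, two disease labels per gene.
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
models = fit_chain(X, Y)
print(predict_chain(models, [5, 5]))
```

In practice the feature vectors would have to be derived from the FASTA sequences first (for example as k-mer counts), a step the abstract does not describe and that is left out here.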

BMC Bioinformatics, 2023

Yong Xu, Xiaochen Bo, Song He, Lianlian Wu, Kunhong Liu, Jing Chen

License: CC BY

Introduction: There are countless possible drug combinations, which makes it expensive and time-consuming to rely solely on clinical trials to determine the effects of each one. To screen out the most effective drug combinations more quickly, researchers have begun to apply machine learning to drug combination prediction. However, most of these models have low interpretability. Consequently, even when they achieve high prediction accuracy, experts in the medical and biological fields still cannot fully rely on their judgments, because the decision-making process is not known.

Related work: Decision trees and their ensemble algorithms are considered suitable methods for pharmaceutical applications due to their excellent performance and good interpretability. We review existing decision tree and decision tree ensemble algorithms in the medical field and point out their shortcomings.

Method: This study proposes a decision stump (DS)-based solution to extract interpretable knowledge from data sets. In this method, a set of DSs is first generated and then selectively assembled into a decision tree (DST). Unlike a traditional decision tree, our algorithm not only enables a partial exchange of information between base classifiers by introducing a stump exchange method, but also uses a modified Gini index to evaluate stump performance, so that the generation of each node is evaluated from a global view to maintain high generalization ability. Furthermore, these trees are combined to construct an ensemble of DSTs (EDST).

Experiment: Two-drug combination data sets, collected from two cell lines and labeled with three classes (additive, antagonistic, and synergistic effects), are used to test our method. Experimental results show that both our DST and EDST perform better than other methods. In addition, the rules generated by our methods are more compact and more accurate than those of other rule-based algorithms. Finally, we also analyze the knowledge extracted by the model in the context of bioinformatics.

Conclusion: The novel decision tree ensemble model can effectively predict the effects of drug combinations, and its decision-making process is easy to obtain.
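
For readers unfamiliar with decision stumps, the following minimal sketch fits a single stump by exhaustive search. It uses the standard Gini impurity, not the modified Gini index of the paper (whose definition is not given in the abstract), and it omits the stump exchange step entirely. The function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_stump(X, y):
    """Find the single-feature threshold split with the lowest weighted Gini impurity.

    Returns (feature index, threshold, weighted impurity), or None if no
    feature takes more than one distinct value.
    """
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, thr, score)
    return best


# Toy data: two hypothetical drug-pair features, two of the three effect classes.
X = [[1, 10], [2, 20], [3, 10], [4, 20]]
y = ["antagonistic", "antagonistic", "synergistic", "synergistic"]
print(best_stump(X, y))
```

A stump of this kind reads directly as a rule ("if feature 0 is at most 2.5, predict antagonistic"), which is the sense in which stump-based trees are interpretable.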

BMC Bioinformatics, 2023

Zhihao Wang, Xinran Ni, Weike Feng, Yue Wu

License: CC BY

Background: Accurate prediction of molecular properties is important in contemporary drug discovery and medical research. Recent advances in AI-driven molecular property prediction have shown promising results. Because annotation through in vitro and in vivo experiments is costly, the transfer learning paradigm has been gaining momentum as a way to extract general self-supervised information that facilitates neural network learning. However, prior pretraining strategies have overlooked the need to explicitly incorporate domain knowledge, especially molecular fragments, into model design, leaving the molecular semantic space under-explored.

Results: We propose an effective model with FRagment-based dual-channEL pretraining (FREL). Equipped with molecular fragments, FREL employs a masked autoencoder and contrastive learning to learn intra- and inter-molecule agreement, respectively. We further conduct extensive experiments on ten public datasets to demonstrate its superiority over state-of-the-art models. Further investigations and interpretations reveal the underlying relationship between molecular representations and molecular properties.

Conclusions: Our proposed model FREL achieves state-of-the-art performance on the benchmark datasets, emphasizing the importance of incorporating molecular fragments into model design. The expressiveness of the learned molecular representations is also investigated by visualization and correlation analysis. Case studies indicate that the learned molecular representations better capture drug property variation and fragment semantics.
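
The contrastive channel can be illustrated with a generic InfoNCE-style loss over paired embeddings. This is a sketch of the general technique only: the abstract does not specify FREL's loss, encoder, or fragment construction, so the function `info_nce`, the temperature value, and the toy embeddings are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view_a, view_b, temperature=0.5):
    """Generic InfoNCE loss: row i of view_a should match row i of view_b.

    For each anchor, the matching row is the positive and every other row
    of view_b is a negative. The loss is the mean negative log-probability
    assigned to the positive under a softmax over scaled similarities.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical embeddings of two molecules under two views.
molecules = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(molecules, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce(molecules, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each molecule's two views agree with each other more than with other molecules, which is one common way to formalize the inter-molecule agreement the abstract mentions.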

BMC Bioinformatics, 2023

Jörg Fliege, Michael J. Casey, Rubén J. Sánchez-García, Ben D. MacArthur

License: CC BY

Background: Single-cell sequencing (sc-Seq) experiments are producing increasingly large data sets. However, large data sets do not necessarily contain large amounts of information.

Results: Here, we formally quantify the information obtained from a sc-Seq experiment and show that it corresponds to an intuitive notion of gene expression heterogeneity. We demonstrate a natural relation between our notion of heterogeneity and that of cell type, decomposing heterogeneity into that component attributable to differential expression between cell types (inter-cluster heterogeneity) and that remaining (intra-cluster heterogeneity). We test our definition of heterogeneity as the objective function of a clustering algorithm, and show that it is a useful descriptor for gene expression patterns associated with different cell types.

Conclusions: Thus, our definition of gene heterogeneity leads to a biologically meaningful notion of cell type, as groups of cells that are statistically equivalent with respect to their patterns of gene expression. Our measure of heterogeneity, and its decomposition into inter- and intra-cluster, is non-parametric, intrinsic, unbiased, and requires no additional assumptions about expression patterns. Based on this theory, we develop an efficient method for the automatic unsupervised clustering of cells from sc-Seq data, and provide an R package implementation.
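
One way such an inter/intra decomposition can work is sketched below. The abstract does not state the formula, so this example assumes that a gene's heterogeneity is the Kullback-Leibler divergence of its expression distribution across cells from the uniform distribution. Under that assumption the total splits exactly into a between-cluster and a within-cluster term. The function names are hypothetical and this is not the authors' R package.

```python
import math


def heterogeneity(counts):
    """KL divergence (in nats) of a gene's share of expression per cell from uniform.

    Assumed measure: sum_i p_i * log(N * p_i), with p_i = counts[i] / total.
    It is 0 when every cell expresses the gene equally.
    """
    total = sum(counts)
    n = len(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(n * c / total) for c in counts if c > 0)


def decompose(counts, clusters):
    """Split heterogeneity into (inter-cluster, intra-cluster) parts.

    inter: divergence of each cluster's share of expression from its share of cells.
    intra: expression-weighted heterogeneity inside each cluster.
    """
    total = sum(counts)
    n = len(counts)
    groups = {}
    for c, k in zip(counts, clusters):
        groups.setdefault(k, []).append(c)
    inter = 0.0
    intra = 0.0
    for members in groups.values():
        mass = sum(members) / total
        if mass == 0:
            continue  # a cluster with no expression contributes nothing
        inter += mass * math.log(mass / (len(members) / n))
        intra += mass * heterogeneity(members)
    return inter, intra


# Hypothetical counts of one gene in six cells assigned to three clusters.
counts = [5, 1, 2, 8, 0, 3]
clusters = [0, 0, 1, 1, 2, 2]
inter, intra = decompose(counts, clusters)
print(abs(inter + intra - heterogeneity(counts)) < 1e-9)
```

A clustering that captures real cell types should push most of a marker gene's heterogeneity into the inter-cluster term, which is why the quantity can serve as a clustering objective.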

BMC Bioinformatics, 2023

Hai Hu, Amy R. Peck, Yunguang Sun, Hallgeir Rui, Misung Yi, Inna Chervoneva, Tingting Zhan, Albert J. Kovatich, Craig D. Shriver, Jeffrey A. Hooke

License: CC BY

Background: Protein biomarkers of cancer progression and response to therapy are increasingly important for improving personalized medicine. Advanced quantitative pathology platforms enable measurement of protein expression in tissues at the single-cell level. However, this rich quantitative cell-by-cell biomarker information is most often not exploited. Instead, it is reduced to a single mean across the cells of interest or converted into a simple proportion of binary biomarker-positive or -negative cells.

Results: We investigated the utility of retaining all quantitative information at the single-cell level by considering the values of the quantile function (inverse of the cumulative distribution function) estimated from a sample of cell signal intensity levels in a tumor tissue. An algorithm was developed for selecting optimal cutoffs for dichotomizing cell signal intensity distribution quantiles as predictors of continuous, categorical, or survival outcomes. The proposed algorithm was used to select optimal quantile biomarkers of breast cancer progression based on cancer cells’ signal intensity levels of nuclear protein Ki-67, Proliferating cell nuclear antigen, Programmed cell death 1 ligand 2, and Progesterone receptor. The performance of the resulting optimal quantile biomarkers was validated and compared to the standard cancer compartment mean signal intensity markers using an independent external validation cohort. For Ki-67, the optimal quantile biomarker was also compared to established biomarkers based on percentages of Ki-67-positive cells. For proteins significantly associated with progression-free survival (PFS) in the external validation cohort, the optimal quantile biomarkers yielded either a larger or a similar effect size (hazard ratio for PFS) compared to cancer compartment mean signal intensity biomarkers.

Conclusion: The optimal quantile protein biomarkers yield generally improved prognostic value compared to the standard protein expression markers. The proposed methodology has broad application to single-cell data from genomics, transcriptomics, proteomics, or metabolomics studies.
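
The two steps described in the Results (summarizing each tumor by a quantile of its cell intensities, then dichotomizing that quantile at a data-driven cutoff) can be sketched as follows. The selection criterion used here, the largest gap in mean outcome between the two groups, is a simple stand-in for a continuous outcome and is an assumption; the abstract does not specify the authors' criterion, and survival outcomes would need a different statistic. All names and values are hypothetical.

```python
import math


def quantile(values, q):
    """Empirical quantile function (inverse CDF).

    Returns the smallest observed value v such that at least a
    fraction q of the observations are <= v.
    """
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]


def best_cutoff(markers, outcomes):
    """Choose the cutoff on a per-tumor marker that best separates mean outcome.

    Candidate cutoffs are midpoints between consecutive distinct marker
    values, so both groups are always non-empty.
    Returns (cutoff, gap in mean outcome), or None if all markers are equal.
    """
    best = None
    candidates = sorted(set(markers))
    for lo, hi in zip(candidates, candidates[1:]):
        cut = (lo + hi) / 2
        high = [o for m, o in zip(markers, outcomes) if m > cut]
        low = [o for m, o in zip(markers, outcomes) if m <= cut]
        gap = abs(sum(high) / len(high) - sum(low) / len(low))
        if best is None or gap > best[1]:
            best = (cut, gap)
    return best


# Hypothetical cell-level signal intensities for four tumors.
tumors = [
    [0.2, 0.5, 1.0, 0.8],
    [0.4, 2.0, 1.1, 0.9],
    [3.0, 8.0, 5.5, 1.0],
    [9.0, 2.5, 4.0, 6.0],
]
# Per-tumor biomarker: the 0.75 quantile of its cell intensities.
markers = [quantile(cells, 0.75) for cells in tumors]
outcomes = [0.1, 0.2, 0.9, 1.0]  # hypothetical continuous outcome per tumor
print(markers, best_cutoff(markers, outcomes))
```

Replacing the 0.75 quantile with the mean recovers the conventional summary that the abstract compares against, so the same cutoff search can be reused for both kinds of marker.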