Journal Article Details
Frontiers in Cellular and Infection Microbiology
Learning From Biological and Computational Machines: Importance of SARS-CoV-2 Genomic Surveillance, Mutations and Risk Stratification
Shimpa Sharma [1], Rajesh J. Khyalappa [1], Meghnad G. Joshi [1], Janani Srinivasa Vasudevan [2], Akshay Kanakan [2], Rajesh Pandey [3], Partha Chattopadhyay [3], Ranjeet Maurya [3], Priti Devi [3], Shikha Bhat [4], Anuradha Pandey [4]
[1] D. Y. Patil Medical College Kolhapur, Kasaba Bawada, Kolhapur, India; INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, India; INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India; INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology (CSIR-IGIB), New Delhi, India; Birla Institute of Technology and Science, Pilani, India
Keywords: COVID-19; SARS-CoV-2; genomic surveillance; risk stratification; machine learning; healthcare
DOI: 10.3389/fcimb.2021.783961
Source: Frontiers
【 Abstract 】

The global coronavirus disease 2019 (COVID-19) pandemic has demonstrated the range of disease severity and pathogen genomic diversity arising from a single virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This diversity in disease manifestations and genomic mutations has challenged healthcare management and resource allocation during the pandemic, especially for countries with a large population base such as India. Here, we take a combinatorial approach to scrutinizing the diagnostic and genomic diversity in order to extract meaningful information from the chaos of COVID-19 in the Indian context. Using statistical correlation, machine learning (ML), and genomic sequencing on a clinically comprehensive patient dataset with corresponding samples from patients with and without respiratory support, we highlight specific significant diagnostic parameters and ML models for assessing the risk of developing severe COVID-19. This information is further contextualized against the SARS-CoV-2 genomic features in the cohort for monitoring pathogen genomic evolution. Analysis of patient demographic features and symptoms revealed that age, breathlessness, and cough were significantly associated with severe disease; at the same time, no patient with severe disease reported an absence of physical symptoms. Observing the trends in biochemical/biophysical diagnostic parameters, we noted that respiratory rate, total leukocyte count (TLC), blood urea levels, and C-reactive protein (CRP) levels were directly correlated with the probability of developing severe disease. Of the five ML algorithms tested to predict patient severity, the multi-layer perceptron-based model performed best, with a receiver operating characteristic (ROC) score of 0.96 and an F1 score of 0.791. The SARS-CoV-2 genomic analysis highlighted a set of mutations with global frequency flips and later incorporation into variants of concern (VOCs) and variants of interest (VOIs), which can be further monitored and annotated for functional significance. In summary, our findings highlight the importance of SARS-CoV-2 genomic surveillance and statistical analysis of clinical data to develop a risk assessment ML model.
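
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal illustrative sketch of how an F1 score and an area under the ROC curve can be computed for a binary severity classifier. It assumes the reported "ROC score" refers to the area under the ROC curve; the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not taken from the study.

```python
# Illustrative only: toy data, not the study's patient data or code.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (severe) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = severe, 0 = non-severe) and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"F1 = {f1_score(y_true, y_pred):.3f}")
print(f"ROC AUC = {roc_auc(y_true, scores):.3f}")
```

The F1 score depends on the chosen decision threshold, whereas the ROC area summarizes ranking quality across all thresholds, which is one reason the two values reported for the same model can differ noticeably.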

【 License 】

CC BY   
