Journal Article Details
BMC Bioinformatics
Unsupervised outlier detection applied to SARS-CoV-2 nucleotide sequences can identify sequences of common variants and other variants of interest
Research
Michael Cho [1], Tanya Novak [2], Adrienne G. Randolph [3], Georg Hahn [4], Sanghun Lee [5], Christoph Lange [6], Jonathan Abraham [7], Lindsey R. Baden [8], Surender Khurana [9], Dmitry Prokopenko [1,10], Scott T. Weiss [1,11], Julian Hecker [1,11]
[1] Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, 02115, Boston, MA, USA;Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children’s Hospital, 02115, Boston, MA, USA;Department of Anesthesiology, Critical Care and Pain Medicine, Boston Children’s Hospital, 02115, Boston, MA, USA;Harvard Medical School, Harvard University, 02115, Boston, MA, USA;Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 02115, Boston, MA, USA;Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 02115, Boston, MA, USA;Department of Medical Consilience, Graduate School, Dankook University, Yongin, South Korea;Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, 02115, Boston, MA, USA;Harvard Medical School, Harvard University, 02115, Boston, MA, USA;Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, 02115, Boston, MA, USA;Department of Microbiology, Harvard Medical School, Blavatnik Institute, 77 Avenue Louis Pasteur, 02115, Boston, MA, USA;Division of Infectious Diseases, Harvard Medical School, Brigham and Women’s Hospital, 02115, Boston, MA, USA;Food and Drug Administration, 20993, Silver Spring, MD, USA;Genetics and Aging Research Unit, Department of Neurology, McCance Center for Brain Health, Massachusetts General Hospital, 02114, Boston, MA, USA;Harvard Medical School, Harvard University, 02115, Boston, MA, USA;Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, 02115, Boston, MA, USA;
Keywords: SARS-CoV-2; Nucleotide sequences; Outlier detection; Variants of interest; Machine learning
DOI: 10.1186/s12859-022-05105-y
Received 2022-06-26, accepted 2022-12-07, published 2022
Source: Springer
【 Abstract 】

As of June 2022, the GISAID database contains more than 11 million SARS-CoV-2 genomes, including several thousand nucleotide sequences for the most common variants such as delta or omicron. These SARS-CoV-2 strains have been collected from patients around the world since the beginning of the pandemic. We start by assessing the similarity of all pairs of nucleotide sequences using the Jaccard index and principal component analysis. As shown previously in the literature, an unsupervised cluster analysis applied to the SARS-CoV-2 genomes results in clusters of sequences according to certain characteristics such as their strain or their clade. Importantly, we observe that nucleotide sequences of common variants are often outliers in clusters of sequences stemming from variants identified earlier on during the pandemic. Motivated by this finding, we are interested in applying outlier detection to nucleotide sequences. We demonstrate that nucleotide sequences of common variants (such as alpha, delta, or omicron) can be identified solely based on a statistical outlier criterion. We argue that outlier detection might be a useful surveillance tool to identify emerging variants in real time as the pandemic progresses.
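
The pipeline described in the abstract (pairwise Jaccard similarity, principal component analysis, then an outlier criterion) can be illustrated with a minimal sketch. The abstract does not say how sequences are encoded, how many components are kept, or which outlier criterion is applied, so everything below is an assumption made for illustration: each sequence is encoded as a 0/1 vector of mutation presence, two components are kept, and a sequence is flagged when a MAD-based robust z-score of its distance from the componentwise median exceeds 3.5. The function names and the toy data are hypothetical.

```python
import numpy as np


def jaccard_matrix(X):
    """Pairwise Jaccard similarity between the rows of a 0/1 matrix X."""
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                                   # shared mutations per pair
    sizes = X.sum(axis=1)                             # mutations per sequence
    union = sizes[:, None] + sizes[None, :] - inter   # mutations in either sequence
    safe_union = np.where(union > 0, union, 1.0)      # avoid 0/0 for empty rows
    # Two sequences without any mutation are treated as identical.
    return np.where(union > 0, inter / safe_union, 1.0)


def principal_components(S, k=2):
    """Scores of the rows of S on its first k principal components."""
    centered = S - S.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]


def flag_outliers(P, threshold=3.5):
    """Flag rows of P that lie unusually far from the componentwise median."""
    d = np.linalg.norm(P - np.median(P, axis=0), axis=1)
    med = np.median(d)
    mad = np.median(np.abs(d - med))
    scale = max(mad, 1e-12)                           # guard against a zero MAD
    return 0.6745 * (d - med) / scale > threshold     # robust (MAD-based) z-score


# Toy data (hypothetical): 20 similar sequences and one very different one.
X = np.zeros((21, 30), dtype=int)
for i in range(20):
    X[i, :10] = 1            # ten mutations shared by the common group
    X[i, 10 + i % 5] = 1     # one of five additional mutations
X[20, 20:30] = 1             # last sequence: ten mutations seen nowhere else

J = jaccard_matrix(X)                   # 21 x 21 similarity matrix
P = principal_components(J, k=2)        # 2-D embedding of the sequences
flags = flag_outliers(P)                # boolean mask of flagged sequences
print(bool(flags[-1]))
```

In this toy setting the last sequence shares no mutation with the others, so its row in the Jaccard matrix differs sharply from the rest and it is flagged. On real data the threshold, the number of components, and the encoding would all need to be chosen and validated against known variant labels.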

【 License 】

CC BY   
© The Author(s) 2022
