Title: Circulation of SARS-CoV-2 genetic clades in wastewater
Abstract:
Background and objectives: The emergence and reemergence of SARS-CoV-2 variants has led to a succession of epidemic waves worldwide. Monitoring SARS-CoV-2 variants in sewage is important for understanding the evolution and spread of COVID-19 in the environment and for evaluating the efficiency of control measures. The present study was therefore carried out to monitor the presence and evolution of SARS-CoV-2 variants in sewage water.
Methods: SARS-CoV-2 genome sequences obtained from wastewater samples and submitted to GISAID (https://www.gisaid.org/) were retrieved, and the nucleotide and translated protein sequences were aligned pairwise against the reference strain SARS-CoV-2 Wuhan-Hu-1/2019 (GenBank: MN908947) using a modified Smith-Waterman algorithm with an affine gap cost. Three genomic nomenclature systems, i.e., Nextstrain (https://clades.nextstrain.org/), GISAID (https://www.gisaid.org/), and Pango (https://cov-lineages.org/), were applied to the submitted sequences. The alignment process consisted of indexing the reference genome with the Burrows-Wheeler Aligner (BWA) and aligning the reads to the reference genome with the BWA-MEM algorithm. SAMtools was used for downstream analyses of the alignment file. Nucleotide substitution, deletion, and insertion variants were called using the freebayes tool, and related statistics were extracted using BCFtools. Phylogenetic analysis was carried out using a distance metric between the query sequence and the reference node, followed by clade assignment (https://clades.nextstrain.org/). Heatmaps were generated in RStudio using the pheatmap package (https://cran.r-project.org), and matplotlib in Python (https://www.python.org) was used for data visualization.
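To illustrate the pairwise alignment step, the following is a minimal sketch of local alignment scoring with an affine gap cost (the Gotoh formulation of Smith-Waterman). It is not the modified algorithm used in the study, whose details are not given here. The function name `smith_waterman_affine` and the scoring parameters (match, mismatch, gap opening, gap extension) are assumptions chosen for illustration only.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Return the best local alignment score between strings a and b.

    Affine gap model (illustrative values): a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    # H: best local score ending at (i, j)
    # E: best score ending at (i, j) with a gap that consumes b[j-1]
    # F: best score ending at (i, j) with a gap that consumes a[i-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # The zero floor is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


identical = smith_waterman_affine("ACGT", "ACGT")
unrelated = smith_waterman_affine("AAAA", "CCCC")
with_gap = smith_waterman_affine("ACGTACGT", "ACGTTTACGT")
```

The affine model charges more for opening a gap than for extending one, so a single multi-base insertion or deletion is preferred over several scattered single-base gaps. This sketch stores full quadratic-size matrices and is meant only to show the recurrence; genome-scale work relies on optimized tools such as those named above.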
Results: A total of 2731 sequences from wastewater samples were submitted to GISAID between April 14, 2020 and April 19, 2022, from countries including Austria (n=2618), USA (n=72), Liechtenstein (n=22), Brazil (n=15), Italy (n=2), and Mexico (n=1). The GISAID nomenclature revealed the prevalence of clades GK (n=1174), GRA (n=1135), GRY (n=141), G (n=135), GR (n=128), GH (n=9), O (n=5), and GV (n=4) in sewage water. According to the Nextstrain naming system, the clades circulating in wastewater were 21A (Delta) (n=890), 21L (Omicron) (n=607), 21J (Delta) (n=354), 20I (Alpha, V1) (n=265), 20A (n=67), 20B (n=22), 21M (Omicron) (n=13), 21I (Delta) (n=3), 20E (EU1) (n=2), and 21H (Mu) (n=1). As per the Pango definition, the major lineages comprised B.1.617.2 (n=886), BA.2 (n=591), BA.1 (n=378), B.1.1.7 (n=264), AY.43 (n=132), BA.1.1 (n=88), B.1 (n=50), and AY.4 (n=40). The Delta-Omicron recombinant viruses (n=24), derived from the GK/AY.4 and GRA/BA.1 lineages, consisted of variants XQ (n=12), XM (n=5), XE (n=2), XF (n=2), XD (n=1), XT (n=1), and XN (n=1). The Delta variant detected in wastewater carried T478K, D614G, P681R, T19R, D950N, R158G, and G142D as the major spike mutations, listed in decreasing order of their percentage among the submitted sequences (range: 76 to 61%, mean±SD: 73±6%). The Omicron variant displayed D614G, N679K, N969K, K417N, H655Y, P681H, Q954H, N764K, D796Y, T478K, Q493R, Q498R, Y505H, N440K, S477N, E484A, N501Y, G446S, G496S, G339D, S373P, S375F, S371F, T376A, A27S, G142D, R408S, T19I, V213G, D405N, T547K, A67V, T95I, Y145D, L212I, N856K, and L981F, in descending order of their percentage among the wastewater-based nucleotide sequences deposited in GISAID (range: 99 to 32%, mean±SD: 76±23%).
The results showed that the 21J (Delta) and 21A (Delta) clades in Austria were first detected in wastewater samples on December 28, 2020 and January 16, 2021, respectively. This was about five months before their first detection in clinical samples from the same area, on May 28, 2021 and June 18, 2021, respectively, which underlines the value of wastewater surveillance for identifying and tracking SARS-CoV-2 in the population.
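The lead time of wastewater detection over clinical detection follows directly from the four dates reported above. The short calculation below is only an arithmetic illustration; the dictionary layout and the names `first_detection` and `lead_time_days` are hypothetical and not part of the study's workflow.

```python
from datetime import date

# First-detection dates for Austria as reported in the Results.
first_detection = {
    "21J (Delta)": {"wastewater": date(2020, 12, 28), "clinical": date(2021, 5, 28)},
    "21A (Delta)": {"wastewater": date(2021, 1, 16), "clinical": date(2021, 6, 18)},
}


def lead_time_days(record):
    """Days by which wastewater detection preceded clinical detection."""
    return (record["clinical"] - record["wastewater"]).days


lead_times = {clade: lead_time_days(rec) for clade, rec in first_detection.items()}

for clade, days in lead_times.items():
    print(f"{clade}: wastewater detection preceded clinical detection by {days} days")
```

Both intervals come out at roughly five months (151 and 153 days).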
Conclusions: This study presents the spatio-temporal distribution of SARS-CoV-2 genomic variants, based on the genome sequences obtained from wastewater and submitted to GISAID up to April 19, 2022, using three genomic nomenclatures. The multiple Spike protein mutations that characterize the emerging epidemiological variants have been determined, which helps to enumerate the diversity of SARS-CoV-2 strains circulating in the environment.