-
Notifications
You must be signed in to change notification settings - Fork 1
/
Vaccine_Delta.Rmd
182 lines (151 loc) · 5.43 KB
/
Vaccine_Delta.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
---
title: "Vaccination and Delta Viral Load"
author: "Brian Yandell"
date: "`r format(Sys.time(), '%d %B %Y')`"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```
This build on work of the [AIDS Vaccine Research Laboratory](https://dholk.primate.wisc.edu/project/dho/public/manuscripts/published/viral-loads-in-vaccinees/begin.view?) for the manuscript [https://www.medrxiv.org/content/10.1101/2021.07.31.21261387v3](Vaccinated and unvaccinated individuals have similar viral loads in communities with a high prevalence of the SARS-CoV-2 delta variant). This document source can be found at <https://github.com/UW-Madison-DataScience/pandemic/blob/master/Vaccine_Delta.Rmd>.
## Comments on design
- Would be useful to know something about variant for samples with N1 Ct values above 30.
+ If they are mostly not Delta, then it is a stronger case that Delta has high viral load.
+ If they have the same proportion of Delta as other samples with known variant, then there would be no evidence that Delta has a viral load. That is, the higher infectivity of Delta might be due to something else.
+ Need not sequence all: take a random sample of, say, 100, with N1 > 30 to sequence.
+ But unfortunately, it is not possible to reliably sequence above ct 30, so unfortunately that is unknowable. (DO)
- Symptom data is not connected to earlier data.
+ I connected them based on exact N1 match and Vaccinated match.
+ Some samples had same N1 but different Vaccinated status.
+ Some samples from first batch were not matched in second batch. See discrepancies below.
## Setup for 31 July data (batch 1)
```{r}
library(tidyverse)
library(readxl)
```
```{r}
datadir <- "https://dholk.primate.wisc.edu/_webdav/dho/public/manuscripts/published/viral-loads-in-vaccinees/%40files"
```
```{r}
tmpfile <- tempfile()
download.file(
file.path(datadir, "N1_Ct_values.xlsx"), tmpfile)
n1ct <- read_excel(tmpfile, na = c("", "NA"))
unlink(tmpfile)
rm(tmpfile)
```
```{r}
n1ct <- n1ct %>%
mutate(Variant = (Pango == "B.1.617.2") & !is.na(Pango),
Variant = c("Other", "Delta")[1 + Variant],
Variant = ifelse(is.na(Pango), "Unknown", Variant),
Viral_Load = 2^(30 - N1),
Delta = Variant,
Delta = ifelse(Delta != "Delta", "Other/Variant", Delta),
Location = "Wisconsin") %>%
rename(Vaccinated = "Fully Vaccinated")
```
## Plots and analysis of batch 1
Would be helpful to identify to county, but HIPAA privacy restrictions prevents that.
Otherwise `Location` column could be modified to "Dane Co" and "Other Co", for instance.
```{r}
ggplot(n1ct) +
aes(Vaccinated, N1, col = Pango) +
geom_boxplot(col = "gray") +
geom_jitter(width = 0.2) +
facet_grid(Location ~ Variant) +
scale_y_continuous(sec.axis = sec_axis(~ 2^(30-.), name = "Viral Load",
breaks = c(1,10,100, 1000, 10000)))
```
Test for variant and vaccinated effects. Use location if more than one.
```{r}
if(n1ct %>% count(Location) %>% nrow() > 1) {
fit <- lm(N1 ~ Location * Variant * Vaccinated, n1ct)
} else {
fit <- lm(N1 ~ Variant * Vaccinated, n1ct)
}
```
```{r}
drop1(fit, fit, test = "F")
```
Test for variant and vaccinated effects in samples with sequenced variant. Use location if more than one.
```{r}
n1ctna <- n1ct %>% filter(!is.na(Pango))
if(n1ct %>% count(Location) %>% nrow() > 1) {
fitna <- lm(N1 ~ Location * Variant * Vaccinated, n1ctna)
} else {
fitna <- lm(N1 ~ Variant * Vaccinated, n1ctna)
}
```
```{r}
drop1(fitna, fitna, test = "F")
```
## Setup for August 10 Symptom Dataset (batch 2)
Symptom data. No information in these data about variant, days to symptom onset or infectiousness (cytopathic effects/CPE).
```{r}
tmpfile <- tempfile()
download.file(
file.path(datadir, "10Aug2021_N1_Ct_toshare.xlsx"), tmpfile)
n1ctb <- read_excel(tmpfile, na = c("", "NA"))
unlink(tmpfile)
rm(tmpfile)
```
```{r}
n1ctb <- n1ctb %>%
select(1:3) %>%
rename(N1 = "N1_Ct",
Vaccinated = "Fully Vaccinated") %>%
mutate(AnySymptoms = ifelse(is.na(AnySymptoms), NA, paste(AnySymptoms, "Symptoms")))
```
```{r}
ggplot(n1ctb ) +
aes(Vaccinated, N1) +
geom_boxplot(col = "gray") +
geom_jitter(width = 0.2) +
facet_grid(~ AnySymptoms) +
scale_y_continuous(sec.axis = sec_axis(~ 2^(30-.), name = "Viral Load",
breaks = c(1,10,100, 1000, 10000)))
```
## Comparison of batches 1 and 2
Assume that identical N1 and Vaccinated are same samples.
```{r}
(mismatch <-
full_join(
n1ct,
n1ctb,
by = c("N1")) %>%
filter(Vaccinated.x != Vaccinated.y)) %>%
select(N1,Pango,Variant,AnySymptoms,Vaccinated.x,Vaccinated.y) %>%
knitr::kable()
```
```{r}
n1ctc <-
full_join(
n1ct,
n1ctb,
by = c("N1","Vaccinated"))
```
```{r}
n1ctc %>%
mutate(naVariant = is.na(Variant),
naAnySymptoms = is.na(AnySymptoms)) %>%
count(naVariant, naAnySymptoms)
```
```{r}
n1ctc %>%
filter(!is.na(Variant) & is.na(AnySymptoms))%>%
select(N1,Pango,Variant,AnySymptoms,Vaccinated) %>%
arrange(N1) %>%
knitr::kable()
```
```{r}
ggplot(n1ctc %>%
filter(!is.na(AnySymptoms),
!is.na(Variant))) +
aes(Vaccinated, N1, col = Pango) +
geom_boxplot(col = "gray") +
geom_jitter(width = 0.2) +
facet_grid(AnySymptoms~ Variant) +
scale_y_continuous(sec.axis = sec_axis(~ 2^(30-.), name = "Viral Load",
breaks = c(1,10,100, 1000, 10000)))
```