Chapter 7 The Replication rate, \(R_0\)
7.1 Learning goals and objectives
- Gain an intuitive understanding of \(R_0\).
- Know that the value of \(R_0\) determines how quickly a disease spreads or is eliminated.
- Name the three main drivers of \(R_0\)
- Learn to estimate \(R_0\) from data on the number of infections over time occurring in a population.
7.2 Background
The replication rate, \(R_0\) is a central value in understanding the rate at which a disease is spreading in a susceptible population.
7.2.1 What is \(R_0\)?
\(R_0\) is pronounced “R naught.” The \(R_0\) value is an estimate of the average number of people who will be infected by one contagious person. It specifically applies to a population of people who are susceptible to the disease (have not been vaccinated and are not immune). If a disease has an \(R_0\) of 18, for example, a contagious person will transmit it to an average of 18 other people, assuming that all people in the community are susceptible.
7.2.2 What do \(R_0\) values mean?
The \(R_0\) value of a disease is important to understanding the dynamics of disease spread. Depending on the \(R_0\) value, a disease should follow one of three possible courses in the at-risk community.
- If \(R_0\) is less than 1, each existing infection is spread on average to less than one additional person, leading to decline in the number of cases and eventual end to the spread.
- If \(R_0\) equals 1, each existing infection causes one new infection, leading to stable infection numbers without increase or decrease with time, on average.
- If \(R_0\) is more than 1, each existing infection leads to more than one infection, resulting in growth and potential for epidemic/pandemic conditions.
Importantly, the disease-specific \(R_0\) value pplies when each member of the community is fully vulnerable to the disease with:
- no one vaccinated
- no one immune
- no way to control the spread of the disease
7.2.3 What variables contribute to \(R_0\)?
Three main factors impact the \(R_0\) value of a disease:
Infectious period: The time that an infected person can spread the disease varies from one disease to another. Additional factors such as age of the infected person may affect the period during which a person can infect others. A long period of infectiousness will contribute to a higher \(R_0\) value.
Contact rate: If a person who’s infected with a contagious disease comes into contact with many people who aren’t infected or vaccinated, the disease will spread more quickly. If that person remains at home, in a hospital, or otherwise quarantined while they’re contagious, the disease will spread more slowly. A high contact rate will contribute to a higher \(R_0\) value. The corollary, that lower contact rate, can reduce \(R_0\) is the basis for flattening the curve through social distancing.
Mode of transmission: Airborne illnesses tend to have a higher \(R_0\) value than those spread through contact or through bodily fluids.
7.3 Real data examples
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The software package EpiEstim provides multiple methods for estimating the time-varying reproduction number from epidemic curves. As a reminder, an epidemic curve typically consists of a set of either cumulative or new cases per unit time. In sars2pack, we are careful to import all datasets as cumulative counts.
7.3.1 State of Maryland in the United States
As a starter, we’ll work with my home state, Maryland and examine its COVID-19 disease spread over time and then estimate the \(R_0\) over time to see how successful Marylanders have been with social distancing.
## Warning in with_tz(Sys.time(), tzone): Unrecognized time zone ''
## # A tibble: 6 x 5
## date state fips count subset
## <date> <chr> <chr> <dbl> <chr>
## 1 2020-01-21 Washington 00053 1 confirmed
## 2 2020-01-22 Washington 00053 1 confirmed
## 3 2020-01-23 Washington 00053 1 confirmed
## 4 2020-01-24 Illinois 00017 1 confirmed
## 5 2020-01-24 Washington 00053 1 confirmed
## 6 2020-01-25 California 00006 1 confirmed
To contrast the way cumulative vs “incidence” data look, we can look at one state cumulatively and the same state in terms of new cases (or “incidence”.
Here, we isolate the state of Maryland and pull out the cumulative cases.
Data from most online sources and all the epidemic curve data
for which we provide accessors are cumulative. We can add a new
column to the dataset using add_incidence_column()
.
## # A tibble: 6 x 6
## date state fips count subset inc
## <date> <chr> <chr> <dbl> <chr> <dbl>
## 1 2020-03-17 Maryland 00024 57 confirmed NA
## 2 2020-03-18 Maryland 00024 85 confirmed 28
## 3 2020-03-19 Maryland 00024 108 confirmed 23
## 4 2020-03-20 Maryland 00024 150 confirmed 42
## 5 2020-03-21 Maryland 00024 195 confirmed 45
## 6 2020-03-22 Maryland 00024 245 confirmed 50
Now, plot both using the plot_epicurve()
helper function. Note that
plot_epicurve()
returns a ggplot object that can be further manipulated.
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
pcumulative = plot_epicurve(md_full,log=FALSE)
pincidence = plot_epicurve(md_full,case_column='inc',log=FALSE)
pcumulative_log = plot_epicurve(md_full,log=TRUE)
pincidence_log = plot_epicurve(md_full,case_column='inc',log=TRUE)
print(cowplot::plot_grid(pcumulative,pincidence,
pcumulative_log,pincidence_log,ncol=2,
labels=c("A","B","C","D")))
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 row(s) containing missing values (geom_path).
The estimate_Rt()
is a very lightweight wrapper around the
EpiEstim::estimate_R()
function. For more details on the help,
take a look at the EpiEstim documentation.
We are able to use exponential growth and time-dependent models with this data, using generation time model from a recent Annals of Internal Medicine paper.
Since we are looking for \(R_0\) to be less than 1 for the pandemic to be dying down, we can look at \(R_0\) estimates over time.
library(ggplot2)
ggplot(x,aes(x=date_end,y=`Mean(R)`)) +
geom_ribbon(aes(ymin=`Quantile.0.025(R)`,ymax=`Quantile.0.975(R)`),fill='grey75') +
geom_line() + ggtitle('Maryland')
7.3.2 State of Arizona in the United States
Arizona shows a somewhat different pattern of pandemic control than Maryland. Start by creating a full epicurve dataset for Arizona.
az_full = nyt %>%
dplyr::filter(state=='Arizona' & subset=='confirmed' & count>25) %>%
add_incidence_column() # defaults suffice here
Taking a look at the actual counts of cases and incidence can give a quick sense of what to expect with regard to \(R_0\) over time. Note that the absolute number of cases is quite different from Maryland.
library(cowplot)
pcumulative = plot_epicurve(az_full,log=FALSE)
pincidence = plot_epicurve(az_full,case_column='inc',log=FALSE)
pcumulative_log = plot_epicurve(az_full,log=TRUE)
pincidence_log = plot_epicurve(az_full,case_column='inc',log=TRUE)
cowplot::plot_grid(pcumulative,pincidence,
pcumulative_log,pincidence_log,ncol=2,
labels=c("A","B","C","D"))
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 row(s) containing missing values (geom_path).
Arizona shows a resurgence in infections as evidenced by \(R_0\) rising well above 1 at the beginning of June.
library(ggplot2)
x = estimate_Rt(az_full,method = 'parametric_si',
config = list(mean_si=3.96, std_si=4.75))
ggplot(x,aes(x=date_end,y=`Mean(R)`)) +
geom_ribbon(aes(ymin=`Quantile.0.025(R)`,ymax=`Quantile.0.975(R)`),fill='grey75') +
geom_line() + ggtitle('Arizona')
7.3.3 Brazil
## Warning in with_tz(Sys.time(), tzone): Unrecognized time zone ''
## Warning in with_tz(Sys.time(), tzone): Unrecognized time zone ''
## Warning in with_tz(Sys.time(), tzone): Unrecognized time zone ''
## # A tibble: 6 x 7
## ProvinceState CountryRegion Lat Long date count subset
## <chr> <chr> <dbl> <dbl> <date> <dbl> <chr>
## 1 <NA> Afghanistan 33.9 67.7 2020-01-22 0 confirmed
## 2 <NA> Afghanistan 33.9 67.7 2020-01-23 0 confirmed
## 3 <NA> Afghanistan 33.9 67.7 2020-01-24 0 confirmed
## 4 <NA> Afghanistan 33.9 67.7 2020-01-25 0 confirmed
## 5 <NA> Afghanistan 33.9 67.7 2020-01-26 0 confirmed
## 6 <NA> Afghanistan 33.9 67.7 2020-01-27 0 confirmed
jhu_brazil = jhu %>%
dplyr::filter(CountryRegion=='Brazil' & subset=='confirmed' & count>50) %>%
add_incidence_column()
## Warning: Removed 1 row(s) containing missing values (geom_path).
7.3.4 Hubei Province
The incidence data probably need smoothing, and the time-dependent model has unreasonable fluctuations.
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:cowplot':
##
## stamp
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
dates = lubridate::as_date(mdy(names(mar19df)[-c(1:4)]))
hubdat = as.numeric(get_series(province="Hubei", country="China",
dataset=sars2pack::mar19df))
names(hubdat) = dates
mGT <- generation.time("gamma", c(5.8, 0.95)) # from DOI 10.7326/M20-0504
mGT <- generation.time("gamma", c(3.96, 4.75)) # from DOI 10.7326/M20-0504
hubdat.filt = trim_leading_values(c(hubdat[1], diff(hubdat)))
est.EG <- estimate.R(epid=hubdat.filt, GT=mGT,
methods=c("EG", "TD"), begin=1L, end=as.integer(length(hubdat.filt)))
## Waiting for profiling to be done...
## Warning in est.R0.TD(epid = c(`2020-01-22` = 444, `2020-01-23` = 0, `2020-01-24`
## = 105, : Simulations may take several minutes.
## Warning in est.R0.TD(epid = c(`2020-01-22` = 444, `2020-01-23` = 0, `2020-01-24`
## = 105, : Using initial incidence as initial number of cases.
## Reproduction number estimate using Exponential Growth method.
## R : 0.8190473[ 0.8164334 , 0.821658 ]
##
## Reproduction number estimate using Time-Dependent method.
## 2.020789 0 3.0142 3.134995 3.32356 3.865543 1.596743 0 1.878637 2.079345 ...
7.3.5 Italy
For Italy, only the EG model seems to work, with the Annals of Internal Medicine generation time model. It fits the data reasonably well, but the data seems to include a reporting gap.
itdat = as.numeric(get_series(province="", country="Italy",
dataset=sars2pack::mar19df))
names(itdat) = dates
itdat.filt = trim_leading_values(c(itdat[1], diff(itdat)))
est.EG <- estimate.R(epid=itdat.filt, GT=mGT,
methods=c("EG"), begin=1L, end=as.integer(length(itdat.filt)))
## Waiting for profiling to be done...
## Reproduction number estimate using Exponential Growth method.
## R : 1.968466[ 1.957161 , 1.979874 ]
7.3.6 New York City
nyt = nytimes_county_data() %>%
dplyr::filter(county=='New York City' & subset=='confirmed') %>%
dplyr::arrange(date)
nytdat = nyt$count
# do we need to chop zeros off? Seems like not.
nytdat.filt = c(nytdat[1], diff(nytdat))
est <- estimate.R(epid=nytdat.filt, GT=mGT,
methods=c("EG","TD","ML"), begin=1L, end=as.integer(length(nytdat.filt)))
We can also use the package EpiEstim to perform time-dependent \(R_0\) calculations.
##
## Attaching package: 'EpiEstim'
## The following object is masked from 'package:sars2pack':
##
## estimate_R
epiestim = EpiEstim::estimate_R(nytdat.filt, method = "parametric_si",
config = EpiEstim::make_config(list(
mean_si = 3.96, std_si = 4.75)))
## Default config will estimate R on weekly sliding windows.
## To change this change the t_start and t_end arguments.
7.4 Italy from JHU dataset
This example uses data
jhu = jhu_data() %>%
dplyr::filter(CountryRegion=='Italy' & is.na(ProvinceState) & subset=='confirmed') %>%
dplyr::arrange(date)
jhucases = jhu$count
# do we need to chop zeros off? Seems like not.
jhucases.inc = c(jhucases[1], diff(jhucases))
jhucases.inc[jhucases.inc<0] = 0
epiestim = EpiEstim::estimate_R(jhucases.inc, method = "parametric_si",
config = EpiEstim::make_config(list(
mean_si = 3.96, std_si = 4.75)))
## Default config will estimate R on weekly sliding windows.
## To change this change the t_start and t_end arguments.
7.5 Simulated epidemic model
Following code conveyed by John Mallery, we have the following approach for estimating \(R_0\) using a single realization of an epidemic simulation.
library(sars2pack)
library(R0)
library(lubridate)
# Generating an epidemic with given parameters
mGT <- generation.time("gamma", c(3,1.5))
set.seed(5432) # always initialize when simulating!
mEpid <- sim.epid(epid.nb=1, GT=mGT, epid.length=30,
family="poisson", R0=1.67, peak.value=500)
mEpid <- mEpid[,1]
# Running estimations
est <- estimate.R(epid=mEpid, GT=mGT, methods=c("EG","ML","TD"), begin=1, end=30)
## Waiting for profiling to be done...
## Warning in est.R0.TD(epid = c(1, 0, 1, 0, 1, 0, 2, 1, 2, 1, 7, 2, 3, 4, :
## Simulations may take several minutes.
## Warning in est.R0.TD(epid = c(1, 0, 1, 0, 1, 0, 2, 1, 2, 1, 7, 2, 3, 4, : Using
## initial incidence as initial number of cases.
We modified the plotting function in R0 which
was calling dev.new
too often. Use plot2
.
The plotfit2
function is also useful. These fits
look identical but they are not.