sars2pack: Easy access to and use of iconic COVID-19 data resources

On January 30, 2020, the World Health Organization declared coronavirus disease 2019 (COVID-19) a Public Health Emergency of International Concern (PHEIC), and within six weeks it had characterized the outbreak as a pandemic. Compared with the 2002-2004 severe acute respiratory syndrome (SARS) outbreak, the COVID-19 pandemic is spreading more quickly and with a much higher death toll. The current pandemic, however, is unfolding in a far more digital and interconnected world. Traditional public health organizations, as well as data-mature organizations not traditionally involved in public health, have rapidly built digital disease surveillance infrastructure that provides near-real-time epidemic tracking data. These data resources have proven invaluable for understanding disease spread, for guiding non-pharmacologic interventions (NPIs), and, when combined with other data resources, for projecting impacts on communities and healthcare systems around the world. Even as the urgency of the initial “hammer” phase of the pandemic begins to abate, timely, robust, and granular datasets will continue to inform business, policy, and even personal decisions for months or even years to come.

Purpose

The sars2pack R package aims to:

Collect COVID-19-related public health and disease-tracking resources and provide a principled approach to data reuse and reproducible computational research.
Provide a data science environment for researchers, media, policy makers, and data scientists to collaborate while promoting reproducible computational research best practices.
Capitalize on the large, existing multidisciplinary data science workforce already familiar with the R programming environment.
Create opportunities for individuals not well-versed in data science to learn and experiment with COVID-19 datasets.
Incorporate exemplar workflows that leverage the extensive R data science ecosystem to visualize, analyze, and integrate COVID-19 data resources.

Examples


library(sars2pack)

# List all available datasets
available_datasets()
#> # A tibble: 50 × 8
#>    name       accessor data_type geographical geospatial region resolution url  
#>    <chr>      <chr>    <list>    <lgl>        <lgl>      <list> <list>     <chr>
#>  1 United St… cdc_soc… <chr [1]> TRUE         FALSE      <chr>  <chr [1]>  http…
#>  2 Extensive… us_hosp… <chr [1]> TRUE         TRUE       <chr>  <chr [1]>  http…
#>  3 The Econo… economi… <chr [3]> TRUE         FALSE      <chr>  <chr [2]>  http…
#>  4 The : Exc… financi… <chr [3]> TRUE         FALSE      <chr>  <chr [2]>  http…
#>  5 US county… us_coun… <chr [1]> TRUE         FALSE      <chr>  <chr [3]>  http…
#>  6 CoronaNet… coronan… <chr [1]> TRUE         FALSE      <chr>  <chr [2]>  http…
#>  7 Country m… country… <chr [1]> TRUE         FALSE      <chr>  <chr [1]>  http…
#>  8 Our World… owid_da… <chr [4]> TRUE         FALSE      <chr>  <chr [1]>  http…
#>  9 GISAID me… cov_glu… <chr [1]> TRUE         FALSE      <chr>  <chr [1]>  http…
#> 10 Newick tr… cov_glu… <chr [1]> FALSE        FALSE      <chr>  <chr [1]>  http…
#> # … with 40 more rows
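Because available_datasets() returns a tibble (a data frame), the catalog can be narrowed with ordinary base R indexing. The sketch below keeps only the rows whose logical geospatial flag is TRUE and shows each dataset's name, accessor, and source URL. It assumes, as the printed output suggests but does not state, that the accessor column holds the name of the sars2pack function that retrieves each dataset.

```r
# A minimal sketch: filter the catalog returned by available_datasets()
# down to geospatial entries. Assumes sars2pack is installed and that
# `geospatial` is a logical column, as the printed tibble indicates.
library(sars2pack)

ds <- available_datasets()

# which() returns only the positions that are TRUE, so any NA flags are
# dropped instead of producing rows filled with missing values.
geo_ds <- ds[which(ds$geospatial), c("name", "accessor", "url")]

print(geo_ds)
```

The same pattern works for the geographical column. List columns such as data_type and region need an element-wise check (for example with vapply()) rather than a direct logical index.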