This function accesses and munges the cumulative time series of confirmed cases and deaths from the US data in the repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The dashboard is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).

jhu_us_data()

Value

A tidy data.frame (actually, a tbl_df) with columns:

  • UID: Universal Identifier

  • iso2: ISO 3166-1 alpha-2 code

  • iso3: ISO 3166-1 alpha-3 code

  • code3

  • fips: Federal Information Processing Standard Publication code

  • county: County

  • state: Province or state

  • country: US

  • Lat: Latitude

  • Long: Longitude

  • Combined_Key: Comma-separated combination of columns Admin2, ProvinceState, and CountryRegion

  • date: Date

  • count: The cumulative count of confirmed cases or deaths (see subset) for a given geographic area

  • subset: either confirmed or deaths

Details

Data are updated daily by JHU. Each call to this function redownloads the data from GitHub. No data cleansing is performed. The downloaded data are munged into a long-form, tidy data.frame.
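The wide-to-long munging described above can be sketched in base R. This is only an illustration on an invented toy table, not the package's actual implementation; the column names, the values, and the assumption that each date column is labelled in month/day/two-digit-year form are all hypothetical.

```r
# Toy wide table: one row per county, one column per date.
# All names and values here are invented for illustration.
wide <- data.frame(
  county = c("A", "B"),
  state  = c("X", "Y"),
  `1/22/20` = c(0, 1),
  `1/23/20` = c(2, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column that is not an identifier is treated as a date column.
date_cols <- setdiff(names(wide), c("county", "state"))

# Stack the date columns so each output row is one (county, date) pair.
# unlist() walks the columns in order, so identifiers repeat once per
# date column and each date label repeats once per input row.
long <- data.frame(
  county = rep(wide$county, times = length(date_cols)),
  state  = rep(wide$state, times = length(date_cols)),
  date   = as.Date(rep(date_cols, each = nrow(wide)), format = "%m/%d/%y"),
  count  = unlist(wide[date_cols], use.names = FALSE),
  subset = "confirmed",
  stringsAsFactors = FALSE
)
long <- long[order(long$county, long$date), ]
```

The result has one row per county and date, with the same date, count, and subset columns listed under Value.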

Note

Uses https://raw.githubusercontent.com/CSSEGISandData/... as the data source, then renames columns and reshapes the data into a long-form table.

  • Although the numbers are meant to be cumulative, a day's count is sometimes lower than the prior day's because a case was reclassified. These instances are not currently corrected in the source data.
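Because of the reclassification caveat above, it can be useful to flag days on which a cumulative count decreases before computing daily increments. The following is a minimal sketch in base R on an invented toy table; the data frame toy and its values are hypothetical, and only the county, date, and count column names mirror the output described under Value.

```r
# Toy long-form table with two counties; county "A" has one drop
# (3 -> 2). All values are invented for illustration.
toy <- data.frame(
  county = rep(c("A", "B"), each = 4),
  date   = rep(as.Date("2020-03-01") + 0:3, times = 2),
  count  = c(1, 3, 2, 5, 0, 0, 4, 4),
  stringsAsFactors = FALSE
)

# Sort so that diff() compares consecutive days within each county.
toy <- toy[order(toy$county, toy$date), ]

# Day-over-day change per county; the first day of each county has
# no prior day, so its change is NA.
toy$daily_change <- ave(
  toy$count, toy$county,
  FUN = function(x) c(NA, diff(x))
)

# Rows where the cumulative count went down.
drops <- toy[!is.na(toy$daily_change) & toy$daily_change < 0, ]
```

With real output, the grouping would likely need to include state and subset as well as county, since county names repeat across states and each area appears once per subset.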

Examples

res = jhu_data()
colnames(res)
#> [1] "ProvinceState" "CountryRegion" "Lat"           "Long"         
#> [5] "date"          "count"         "subset"       
head(res)
#> # A tibble: 6 × 7
#>   ProvinceState CountryRegion   Lat  Long date       count subset   
#>   <chr>         <chr>         <dbl> <dbl> <date>     <dbl> <chr>    
#> 1 NA            Afghanistan    33.9  67.7 2020-01-22     0 confirmed
#> 2 NA            Afghanistan    33.9  67.7 2020-01-23     0 confirmed
#> 3 NA            Afghanistan    33.9  67.7 2020-01-24     0 confirmed
#> 4 NA            Afghanistan    33.9  67.7 2020-01-25     0 confirmed
#> 5 NA            Afghanistan    33.9  67.7 2020-01-26     0 confirmed
#> 6 NA            Afghanistan    33.9  67.7 2020-01-27     0 confirmed
dplyr::glimpse(res)
#> Rows: 701,406
#> Columns: 7
#> $ ProvinceState <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ CountryRegion <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanista…
#> $ Lat           <dbl> 33.93911, 33.93911, 33.93911, 33.93911, 33.93911, 33.939…
#> $ Long          <dbl> 67.70995, 67.70995, 67.70995, 67.70995, 67.70995, 67.709…
#> $ date          <date> 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020-01…
#> $ count         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ subset        <chr> "confirmed", "confirmed", "confirmed", "confirmed", "con…

table(res$state)
#> Warning: Unknown or uninitialised column: `state`.
#> < table of extent 0 >