Each row of data reports cumulative counts based on the best reporting available at the time of publication. The NYT states that it may revise earlier entries in the data when it receives new information.
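Each row is a running total, so daily new counts have to be derived by differencing. The following is a minimal base-R sketch and not part of the package API. The helper `cumulative_to_daily()` and the toy data frame are hypothetical, and the function assumes its input holds a single location and a single `subset`. If the cumulative series ever decreases (the toy data include such a drop), the derived daily value will be negative.

```r
# Hypothetical helper: convert a cumulative `count` column into daily new counts.
# Assumes `df` contains one location and one subset only.
cumulative_to_daily <- function(df) {
  # Order by date so differences are taken in time order
  df <- df[order(df$date), ]
  # The first day keeps its cumulative value; later days get the increment
  df$new <- c(df$count[1], diff(df$count))
  df
}

# Made-up data in the documented column layout
toy <- data.frame(
  date   = as.Date("2020-03-01") + 0:4,
  state  = "Washington",
  count  = c(1, 3, 3, 7, 6),  # the drop from 7 to 6 yields a negative daily value
  subset = "confirmed"
)

daily <- cumulative_to_daily(toy)
daily$new
#> [1]  1  2  0  4 -1
```

For the full data, the helper would be applied separately to each combination of location and `subset`, for example with `split()` and `lapply()`.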

nytimes_county_data()

nytimes_state_data()

Value

a five-column (for states) or six-column (for counties) tibble

  • date: observation date

  • county: county name (or sometimes a city); see https://github.com/nytimes/covid-19-data/ for details

  • state: state name, present in both the state and county data. Note that simply summing the county-level data by state will sometimes overcount. See https://github.com/nytimes/covid-19-data/ for details.

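  • fips: five-digit Federal Information Processing Standards (FIPS 6-4) code, which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states. In the county data the first two digits identify the state and the last three the county (e.g. "53061" for Snohomish, Washington); in the state data the two-digit state code is left-padded with zeros (e.g. "00053" for Washington).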

  • count: cumulative number of confirmed cases or deaths, depending on subset

  • subset: deaths or confirmed
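Cases and deaths share the count column and are distinguished only by subset, so any aggregation should group by subset as well as by date. The sketch below sums made-up county rows to the state level with base R's `aggregate()`; it is an illustration, not a package function. As noted under state, such sums may not match `nytimes_state_data()` exactly.

```r
# Made-up county-level rows in the documented long format
county <- data.frame(
  date   = as.Date(rep("2020-03-01", 4)),
  county = c("King", "Snohomish", "King", "Snohomish"),
  state  = "Washington",
  fips   = c("53033", "53061", "53033", "53061"),
  count  = c(10, 5, 2, 1),
  subset = c("confirmed", "confirmed", "deaths", "deaths")
)

# Group by date, state AND subset so confirmed cases and deaths are never mixed
state_totals <- aggregate(count ~ date + state + subset, data = county, FUN = sum)
state_totals
```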

Details

From the NYTimes GitHub README:

The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

Note

See https://github.com/nytimes/covid-19-data#geographic-exceptions for geographic exceptions. The county data also contain "holes" in reporting, that is, dates with no reported results for a given county. Records with an "Unknown" county are removed because they appear not to be cumulative data (they may instead be incidence data).
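One way to handle the reporting holes is to expand a county's series to a complete daily calendar and carry the last cumulative value forward. The helper `fill_gaps_locf()` below is hypothetical, not something the package provides, and it assumes that a missing date means no change in the cumulative count; leaving the gaps as `NA` is an equally defensible choice.

```r
# Hypothetical helper: fill missing dates in one county's cumulative series
# by carrying the last observed count forward.
fill_gaps_locf <- function(df) {
  # Complete sequence of dates spanned by the observations
  all_dates <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
  # Left-join observations onto the full calendar; missing dates get NA
  out <- merge(all_dates, df, by = "date", all.x = TRUE)
  # Replace each NA with the previous day's cumulative count
  for (i in seq_len(nrow(out))[-1]) {
    if (is.na(out$count[i])) out$count[i] <- out$count[i - 1]
  }
  out
}

# Made-up series with 2020-03-03 missing
obs <- data.frame(
  date  = as.Date(c("2020-03-01", "2020-03-02", "2020-03-04")),
  count = c(2, 4, 9)
)

filled <- fill_gaps_locf(obs)
filled$count
#> [1] 2 4 4 9
```

Like the sketch for daily counts, this expects a single county and a single subset at a time.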

Licensing

The NYTimes provides a license that states:

In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.

If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”

If you use it in an online presentation, we would appreciate it if you would link to our U.S. tracking page at https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.

Author

Sean Davis seandavi@gmail.com

Examples

# state data
res = nytimes_state_data()
colnames(res)
#> [1] "date"   "state"  "fips"   "count"  "subset"
dplyr::glimpse(res)
#> Rows: 88,156
#> Columns: 5
#> $ date   <date> 2020-01-21, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-24, 20…
#> $ state  <chr> "Washington", "Washington", "Washington", "Illinois", "Washingt…
#> $ fips   <chr> "00053", "00053", "00053", "00017", "00053", "00006", "00017", …
#> $ count  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, …
#> $ subset <chr> "confirmed", "confirmed", "confirmed", "confirmed", "confirmed"…

# county data
res = nytimes_county_data()
colnames(res)
#> [1] "date"   "county" "state"  "fips"   "count"  "subset"
dplyr::glimpse(res)
#> Rows: 4,930,632
#> Columns: 6
#> $ date   <date> 2020-01-21, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-24, 20…
#> $ county <chr> "Snohomish", "Snohomish", "Snohomish", "Cook", "Snohomish", "Or…
#> $ state  <chr> "Washington", "Washington", "Washington", "Illinois", "Washingt…
#> $ fips   <chr> "53061", "53061", "53061", "17031", "53061", "06059", "17031", …
#> $ count  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ subset <chr> "confirmed", "confirmed", "confirmed", "confirmed", "confirmed"…