R/nytimes_data.R
nytimes_county_data.Rd
Each row of data reports cumulative counts based on the best reporting available at the time of publication. The NYT states that it may revise earlier entries in the data when it receives new information.
nytimes_county_data()
nytimes_state_data()
a five-column (for states) or six-column (for counties) tibble
date: observation date
county: county (or sometimes city), see https://github.com/nytimes/covid-19-data/ for details
state: present in both the state and county data. Note that simply aggregating by state will sometimes overcount when working with the county-level data. See https://github.com/nytimes/covid-19-data/ for details.
fips: Federal Information Processing Standards (FIPS 6-4) code, stored as a five-character string. In the county data this is the five-digit code that uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states (e.g. "53061"). In the state data it is the two-digit state code left-padded with zeros (e.g. "00053").
count: cumulative number of confirmed cases or deaths, as indicated by subset
subset: either "deaths" or "confirmed"
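Because count is cumulative and the two measures are stacked in the subset column, daily new counts have to be derived per series. The following is a minimal sketch on a made-up data frame that mimics the state layout; the values and the new column name are illustrative assumptions, not package output.

```r
# Toy data in the same long layout as nytimes_state_data();
# the values are invented for illustration only.
toy <- data.frame(
  date   = as.Date("2020-03-01") + c(0:3, 0:3),
  state  = "Washington",
  fips   = "00053",
  count  = c(10, 14, 13, 20, 1, 1, 2, 4),
  subset = rep(c("confirmed", "deaths"), each = 4),
  stringsAsFactors = FALSE
)

# Sort by date within each series so differences are taken in time order.
toy <- toy[order(toy$state, toy$subset, toy$date), ]

# Difference the cumulative counts within each state/subset series.
# The first day keeps its cumulative value as that day's count.
toy$new <- ave(toy$count, toy$state, toy$subset,
               FUN = function(x) c(x[1], diff(x)))

toy[, c("date", "subset", "count", "new")]
```

In this toy series the cumulative confirmed count drops from 14 to 13, so the derived daily value is negative on that day. Downward corrections of this kind are one way revised reporting can surface, so negative values are worth checking before plotting or modeling.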
From the NYTimes GitHub README:
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
See https://github.com/nytimes/covid-19-data#geographic-exceptions. This dataset also contains county data with "holes" in reporting, that is, dates with no reported results. Records with an "Unknown" county are removed, since these records appear to be incidence data rather than cumulative data.
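One possible way to handle the reporting holes is to expand a series to a full daily calendar and carry the last cumulative value forward. The sketch below uses a made-up single-county series; the locf() helper and the count_filled column are hypothetical names introduced here, and carrying values forward is an assumption that only makes sense for cumulative counts.

```r
# Toy county series with a two-day hole (2020-04-03 and 2020-04-04 missing);
# the values are invented for illustration only.
toy <- data.frame(
  date   = as.Date(c("2020-04-01", "2020-04-02", "2020-04-05")),
  county = "Snohomish",
  state  = "Washington",
  fips   = "53061",
  count  = c(5, 7, 12),
  subset = "confirmed",
  stringsAsFactors = FALSE
)

# Every calendar day between the first and last observation.
all_days <- data.frame(date = seq(min(toy$date), max(toy$date), by = "day"))

# Left-join the observations onto the full calendar; missing days become NA.
filled <- merge(all_days, toy, by = "date", all.x = TRUE)
filled <- filled[order(filled$date), ]

# Hypothetical helper: last observation carried forward.
# Assumes the first element is not missing, which holds here because
# the calendar starts at the first observed date.
locf <- function(x) {
  idx <- cumsum(!is.na(x))  # position of the most recent non-missing value
  x[!is.na(x)][idx]
}

filled$count_filled <- locf(filled$count)
filled[, c("date", "count", "count_filled")]
```

For the full county data the same steps would need to be applied separately to each fips and subset combination.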
The NYTimes provides a license that states:
In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.
If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”
If you use it in an online presentation, we would appreciate it if you would link to our U.S. tracking page at https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
jhu_data(), usa_facts_data()
Other data-import:
acaps_government_measures_data(), acaps_secondary_impact_data(), apple_mobility_data(), beoutbreakprepared_data(), cci_us_vaccine_data(), cdc_aggregated_projections(), cdc_excess_deaths(), cdc_social_vulnerability_index(), coronadatascraper_data(), coronanet_government_response_data(), cov_glue_lineage_data(), cov_glue_newick_data(), cov_glue_snp_lineage(), covidtracker_data(), descartes_mobility_data(), ecdc_data(), econ_tracker_consumer_spending, econ_tracker_employment, econ_tracker_unemp_data, economist_excess_deaths(), financial_times_excess_deaths(), google_mobility_data(), government_policy_timeline(), jhu_data(), jhu_us_data(), kff_icu_beds(), oecd_unemployment_data(), owid_data(), param_estimates_published(), test_and_trace_data(), us_county_geo_details(), us_county_health_rankings(), us_healthcare_capacity(), us_hospital_details(), us_state_distancing_policy(), usa_facts_data(), who_cases()
Other case-tracking:
align_to_baseline(), beoutbreakprepared_data(), bulk_estimate_Rt(), combined_us_cases_data(), coronadatascraper_data(), covidtracker_data(), ecdc_data(), estimate_Rt(), jhu_data(), owid_data(), plot_epicurve(), test_and_trace_data(), usa_facts_data(), who_cases()
# state data
res <- nytimes_state_data()
colnames(res)
#> [1] "date" "state" "fips" "count" "subset"
dplyr::glimpse(res)
#> Rows: 88,156
#> Columns: 5
#> $ date <date> 2020-01-21, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-24, 20…
#> $ state <chr> "Washington", "Washington", "Washington", "Illinois", "Washingt…
#> $ fips <chr> "00053", "00053", "00053", "00017", "00053", "00006", "00017", …
#> $ count <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, …
#> $ subset <chr> "confirmed", "confirmed", "confirmed", "confirmed", "confirmed"…
# county data
res <- nytimes_county_data()
colnames(res)
#> [1] "date" "county" "state" "fips" "count" "subset"
dplyr::glimpse(res)
#> Rows: 4,930,632
#> Columns: 6
#> $ date <date> 2020-01-21, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-24, 20…
#> $ county <chr> "Snohomish", "Snohomish", "Snohomish", "Cook", "Snohomish", "Or…
#> $ state <chr> "Washington", "Washington", "Washington", "Illinois", "Washingt…
#> $ fips <chr> "53061", "53061", "53061", "17031", "53061", "06059", "17031", …
#> $ count <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ subset <chr> "confirmed", "confirmed", "confirmed", "confirmed", "confirmed"…
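The two tables can be compared by deriving a state key from the county fips. The sketch below relies on the zero-padded state codes visible in the output above ("00053" for Washington) and uses small made-up tables rather than downloaded data; the state_fips, county_sum, and cmp names are assumptions for illustration. It is one way to check where summing counties disagrees with the state-level series.

```r
# Toy tables in the layouts returned by nytimes_county_data() and
# nytimes_state_data(); the counts are invented for illustration only.
county <- data.frame(
  date   = as.Date("2020-04-01"),
  county = c("Snohomish", "King", "Cook"),
  state  = c("Washington", "Washington", "Illinois"),
  fips   = c("53061", "53033", "17031"),
  count  = c(5, 8, 3),
  subset = "confirmed",
  stringsAsFactors = FALSE
)
state <- data.frame(
  date   = as.Date("2020-04-01"),
  state  = c("Washington", "Illinois"),
  fips   = c("00053", "00017"),
  count  = c(13, 3),
  subset = "confirmed",
  stringsAsFactors = FALSE
)

# The first two digits of a county FIPS code identify the state; pad with
# zeros to match the five-character codes in the state table.
# Rows with a missing county fips would need separate handling.
county$state_fips <- paste0("000", substr(county$fips, 1, 2))

# Sum county counts per day, state key, and subset.
county_sum <- aggregate(count ~ date + state_fips + subset,
                        data = county, FUN = sum)

# Join to the state table and compute the discrepancy.
cmp <- merge(county_sum, state,
             by.x = c("date", "state_fips", "subset"),
             by.y = c("date", "fips", "subset"),
             suffixes = c("_county_sum", "_state"))
cmp$diff <- cmp$count_county_sum - cmp$count_state
cmp
```

Nonzero values in diff would flag state and date combinations where the county sum and the state series disagree; the toy tables here were constructed to agree.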