I’ve created a GitHub repository based on the New York Times’ COVID-19 database.
The New York Times has been compiling COVID-19 data from various state health departments. They have published the data as a GitHub repository. There are two files of interest in their repository:
In these CSV files, each row represents the number of (confirmed) cases and deaths in a particular geographical area as of a particular date. For example, the first ten rows of
date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
2020-01-22,Washington,53,1,0
2020-01-23,Washington,53,1,0
2020-01-24,Illinois,17,1,0
2020-01-24,Washington,53,1,0
2020-01-25,California,06,1,0
2020-01-25,Illinois,17,1,0
2020-01-25,Washington,53,1,0
2020-01-26,Arizona,04,1,0
These are easy enough to analyze using a tool like Pandas. However, this kind of file might be hard to analyze if you want to use a spreadsheet program.
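To give a sense of what "easy enough" means, here is a minimal sketch of loading data in this layout with Pandas and pulling out the most recent case count for each state. It is only an illustration, not the script from my repo: the `SAMPLE` string and the `load_states` helper are names made up for this example, and the sample simply repeats the rows quoted above.

```python
import io

import pandas as pd

# Hypothetical inline sample: the same rows quoted above, so the sketch
# runs without downloading anything.
SAMPLE = """date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
2020-01-22,Washington,53,1,0
2020-01-23,Washington,53,1,0
2020-01-24,Illinois,17,1,0
2020-01-24,Washington,53,1,0
2020-01-25,California,06,1,0
2020-01-25,Illinois,17,1,0
2020-01-25,Washington,53,1,0
2020-01-26,Arizona,04,1,0
"""


def load_states(source):
    """Read the long-format CSV (hypothetical helper for this example).

    Reading fips as a string keeps leading zeros such as "06".
    """
    return pd.read_csv(source, dtype={"fips": str}, parse_dates=["date"])


df = load_states(io.StringIO(SAMPLE))

# Counts are cumulative, so the last row per state (by date) is its latest total.
latest = df.sort_values("date").groupby("state")["cases"].last()
print(latest)
```

Passing a file path or URL to `load_states` instead of the `StringIO` buffer should work the same way, since `pd.read_csv` accepts either.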
So, I wrote a Python script that uses Pandas to output a bunch of CSV files, so people can analyze the data using spreadsheet software. The script also creates a bunch of plots, including one for each state.
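The core idea can be sketched in a few lines: pivot the long table (one row per date and state) into a wide one (one row per date, one column per state), which is the shape spreadsheet programs handle comfortably. This is a simplified illustration under my own assumptions rather than the actual script; the `to_wide` helper and the inline `SAMPLE` data are hypothetical names used only here.

```python
import io

import pandas as pd

# Hypothetical inline sample in the same long format as the Times' files.
SAMPLE = """date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
2020-01-22,Washington,53,1,0
2020-01-23,Washington,53,1,0
2020-01-24,Illinois,17,1,0
2020-01-24,Washington,53,1,0
2020-01-25,California,06,1,0
2020-01-25,Illinois,17,1,0
2020-01-25,Washington,53,1,0
2020-01-26,Arizona,04,1,0
"""


def to_wide(df, value="cases"):
    """Return one row per date and one column per state (hypothetical helper).

    Dates are left as ISO strings here, which sort correctly as text.
    """
    wide = df.pivot(index="date", columns="state", values=value)
    return wide.sort_index()


long_df = pd.read_csv(io.StringIO(SAMPLE), dtype={"fips": str})
wide = to_wide(long_df, value="cases")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = wide.to_csv()
print(csv_text)
```

States with no row on a given date come out as empty cells in the CSV. Calling `wide.to_csv` with a file name instead would write a file that opens directly in a spreadsheet program.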
The script and all of its output are available in my new repo. I have licensed the script under an MIT-style license. Feel free to fork and improve! (I have not applied that license to the script’s outputs, since they are subject to the license restrictions imposed by the Times.)
A big disclaimer: I’m not an epidemiologist (my expertise is in noncommutative algebra, not communicable disease). So, please don’t make any policy decisions based on my plots, or on those of any other amateurs who don’t know what they are doing.
Stay safe and healthy!