MEmilio Epidata
MEmilio Epidata provides modules and scripts to download epidemiological data from various sources. The package, as well as links to the sources, can be found in the pycode/memilio-epidata directory.
Dependencies
Required python packages:
pandas>=2.0.0
matplotlib
tables
numpy>=1.22, !=1.25.*
pyarrow
openpyxl
xlrd
requests
pyxlsb
wget
Usage
After installation, the following functions are available:
- clean_data: Deletes all data files generated by the MEmilio Epidata package.
- get_case_data: Downloads SARS-CoV-2 case data from the Robert Koch-Institut (RKI-C).
- get_commuter_data: Computes a DataFrame of commuter mobility patterns based on data from the Federal Agency of Work.
- get_divi_data: Downloads ICU data from the German DIVI Intensivregister (DIVI).
- get_hospitalization_data: Downloads data on COVID-19 hospitalizations from the Robert Koch-Institut (RKI-H).
- get_jh_data: Downloads COVID-19 case data from Johns Hopkins University (JH).
- get_population_data: Downloads population data for German federal states and counties from various public sources (P).
- get_simulation_data: Downloads all data required for a simulation with the graph-metapopulation model: SARS-CoV-2 case data (RKI-C), population data (P), ICU data (DIVI), and COVID-19 vaccination data from the Robert Koch-Institut (RKI-V).
- get_testing_data: Downloads data on SARS-CoV-2 PCR tests from the Robert Koch-Institut (RKI-T).
- get_vaccination_data: Downloads the RKI vaccination data and provides different kinds of structured data.
- updateMobility2022: Merges the rows and columns of Eisenach into Wartburgkreis, which became a single county by July 2021.
- createFederalStatesMobility: Creates mobility matrices for German federal states based on county mobility.
- transformWeatherData: Transforms weather data.
For a detailed description of the run options and the resulting data files, see the epidata subfolder.
The downloaded data is written either to HDF5 or JSON files.
Additional Tools
Some additional tools for processing or analysing data can be found in the tools directory.
Notes for developers
If new functionality shall be added, please stick to the following instructions:
When you start creating a new script:
- Have a look at getDataIntoPandasDataFrame.py; the main functionality that should be used is implemented there.
  - get_file is used to read in data.
  - The Conf class sets the relevant download options.
  - Use write_dataframe to write a pandas DataFrame to file.
  - Use check_dir if you want to create a new folder to write data to.
- Use the dictionaries in defaultDict.py to rename the existing columns of your data.
  - Add new column names to one of the existing language dictionaries; English, German, and Spanish translations exist at the moment.
  - For non-English languages, always use the EngEng dictionary entries as keys; this way, names can be changed by editing just one line.
  - defaultDict.py also contains dictionaries mapping ids to state and county names, respectively. Please use them.
  - After renaming columns, do not use dataframe.column; use dataframe[column] instead, where column is given by the dictionaries in defaultDict.py. Example: ID_County = dd.GerEng['IdLandkreis'] or dd.EngEng['idCounty'].
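As a sketch of this renaming pattern, the snippet below defines minimal stand-in dictionaries; real scripts should import the actual dd.GerEng and dd.EngEng from defaultDict.py, whose full contents are not reproduced here:

```python
import pandas as pd

# Minimal stand-ins for the dictionaries in defaultDict.py
# (illustrative only; use the real dd.GerEng / dd.EngEng in scripts).
GerEng = {'IdLandkreis': 'ID_County'}
EngEng = {'idCounty': 'ID_County'}

# Raw data with German column names, as delivered by the source.
df = pd.DataFrame({'IdLandkreis': [1001, 1002]})

# Rename the raw German columns to the shared English names ...
df = df.rename(columns=GerEng)

# ... and access columns via the dictionary, not via attribute access.
counties = df[EngEng['idCounty']]
```

Because every dictionary maps to the same English name, a column can be renamed package-wide by changing a single entry.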
- For extensive operations, use the progress indicator to give feedback to the user.
- ALWAYS use Copy-on-Write for pandas DataFrames.
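Copy-on-Write can be switched on globally via the pandas option (available from pandas 2.0); a minimal sketch of the resulting behaviour:

```python
import pandas as pd

# Enable Copy-on-Write globally for this process.
pd.options.mode.copy_on_write = True

df = pd.DataFrame({'cases': [1, 2, 3]})
subset = df.iloc[0:2]

# Under Copy-on-Write, writing into the subset triggers a copy,
# so the original DataFrame is never modified through it.
subset.loc[0, 'cases'] = 99
```

Without Copy-on-Write, such a write could silently propagate to `df` (or raise a SettingWithCopyWarning); with it, the behaviour is deterministic.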
- Use doxygen-like comments in code, as follows:
- Add a description at the beginning of the file:
## Header
# @brief name descr
# longer description
- Add a description at the beginning of every function, directly after the definition:
  - start and end with """
  - add a short description in the first line
  - afterwards, add a longer description
  - # @param name of parameter
  - # @return type description
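Putting the comment conventions above together, a new script could start like this (the file name, function name, and parameter are purely illustrative):

```python
## getExampleData.py (hypothetical file name)
# @brief Downloads example data and writes it to file.
# Longer description of what the script does and which source it uses.

def get_example_data(out_folder):
    """ Downloads example data (hypothetical function).

    Longer description: reads the raw data, renames the columns via
    the dictionaries in defaultDict.py and writes the result to file.

    # @param out_folder Folder where the resulting file is written.
    # @return Placeholder dictionary describing the written file.
    """
    return {'folder': out_folder}
```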
When you add a new script:
- Add a console script entry to the pycode/memilio-epidata/pyproject.toml file.
- Add it to the cli_dict in getDataIntoPandasDataFrame.py:
  - Add a meaningful key for the new script.
  - For the dict value, add a list of the form [comment to print when the script is started, list of used parser arguments (optional)].
  - If more than the default parsers should be used, add these parsers to the list of used parsers.
- Add tests.
- Add an entry "executablename -h" to .github/test-py/action.yml.
- Add an entry "executablename -o data_dl" to .github/workflows/main.yml.
- Add the generated data to cleanData.
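A cli_dict entry of the form described above could look like the following sketch (the keys, messages, and parser names are hypothetical, not the actual dictionary contents of getDataIntoPandasDataFrame.py):

```python
# Hypothetical excerpt of the cli_dict in getDataIntoPandasDataFrame.py.
# Key: a meaningful script name.
# Value: [comment printed when the script starts,
#         optional list of extra parser arguments].
cli_dict = {
    'example': ['Download example data', ['start_date', 'end_date']],
    'cases': ['Download case data'],  # uses only the default parsers
}
```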
Adding a new parser:
- Add a default value to the defaultDict in defaultDict.py.
- In the cli_dict in getDataIntoPandasDataFrame.py, record which scripts use this parser.
- Add an if 'new parser' in what_list branch and call parser.add_argument().
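A sketch of the what_list pattern with Python's argparse ('start_date' and its default value are illustrative names, not the actual parser or defaultDict contents):

```python
import argparse

def build_parser(what_list):
    """ Builds a parser with only the arguments requested in what_list. """
    parser = argparse.ArgumentParser()
    # Only scripts that list 'start_date' in the cli_dict get this argument.
    if 'start_date' in what_list:
        parser.add_argument(
            '--start-date', type=str,
            default='2020-01-01',  # would come from defaultDict.py
            help='First date to download data for (illustrative).')
    return parser
```

Scripts then pass their cli_dict argument list as what_list, so each script only exposes the options it actually supports.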
General
- Always add unit tests.
- Check the test coverage report to make sure every new feature is covered.
- Check the pylint report; only messages of type "refactor" are allowed.
Troubleshooting
HDF5 errors during installation (mostly on Windows): one of the dependencies of the epidata package requires HDF5 to be installed on the system. If HDF5 is not discovered properly, this Stack Overflow thread may help resolve the issue.