MEmilio Epidata
================

MEmilio Epidata provides modules and scripts to download epidemiological data from various sources. The package as well as links to the sources can be found in the `pycode/memilio-epidata `_.

Dependencies
------------

Required Python packages:

* pandas>=2.0.0
* matplotlib
* tables
* numpy>=1.22, !=1.25.*
* pyarrow
* openpyxl
* xlrd
* requests
* pyxlsb
* wget

Usage
-----

After installation, the following functions are available:

* ``clean_data``: Deletes all data files generated by the MEmilio Epidata package.
* ``get_case_data``: Downloads SARS-CoV-2 case data from the Robert Koch-Institut (RKI-C).
* ``get_commuter_data``: Computes a DataFrame of commuter mobility patterns based on data from the Federal Employment Agency.
* ``get_divi_data``: Downloads ICU data from the German DIVI Intensivregister (DIVI).
* ``get_hospitalization_data``: Downloads data about COVID-19 hospitalizations from the Robert Koch-Institut (RKI-H).
* ``get_jh_data``: Downloads COVID-19 case data from Johns Hopkins University (JH).
* ``get_population_data``: Downloads population data for German federal states and counties from various public sources (P).
* ``get_simulation_data``: Downloads all data required for a simulation with the graph-metapopulation model, i.e., SARS-CoV-2 case data (RKI-C), population data (P), ICU data (DIVI) and COVID-19 vaccination data from the Robert Koch-Institut (RKI-V).
* ``get_testing_data``: Downloads data about SARS-CoV-2 PCR tests from the Robert Koch-Institut (RKI-T).
* ``get_vaccination_data``: Downloads the RKI vaccination data and provides different kinds of structured data.
* ``updateMobility2022``: Merges rows and columns of Eisenach into Wartburgkreis, which became a single county in July 2021.
* ``createFederalStatesMobility``: Creates mobility matrices for German federal states based on county mobility.
* ``transformWeatherData``: Transforms weather data.
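As a quick illustration, one of the installed console scripts can be invoked from the shell as follows. The executable name ``getcasedata`` is assumed here for illustration; the actual names are defined as console script entries in ``pycode/memilio-epidata/pyproject.toml``:

```shell
# Show the available run options for the script (assumed executable name):
getcasedata -h

# Download the case data and write it to a custom output directory:
getcasedata -o data_dl
```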
For a detailed description of the run options and the resulting data files, see the `epidata subfolder `_. The downloaded data is written to either HDF5 or JSON files.

Additional Tools
----------------

Some additional tools for processing or analysing data can be found in the `tools directory `_.

Notes for developers
--------------------

If new functionality is to be added, please stick to the following instructions:

When you start creating a new script:

- Have a look into getDataIntoPandasDataFrame.py; the main functionality which should be used is implemented there.

  - ``get_file`` is used to read in data.
  - The ``Conf`` class sets the relevant download options.
  - Use ``write_dataframe`` to write a pandas DataFrame to file.
  - Use ``check_dir`` if you want to create a new folder to write data to.

- Use the dictionaries in defaultDict.py to rename the existing columns of your data.

  - Add new column names to one of the existing language dictionaries; English, German and Spanish translations exist at the moment.
  - For non-English languages, always use the EngEng dictionary as the key; thus names can easily be changed by editing just one line.
  - In defaultDict.py, dictionaries for ids as well as state and county names exist. Please use them.

- After renaming columns, do not use ``dataframe.column`` but instead ``dataframe[column]``, where ``column`` is given by the dictionaries in defaultDict.py. Example: ``ID_County = dd.GerEng['IdLandkreis']`` or ``dd.EngEng['idCounty']``.
- For extensive operations, use the progress indicator to give feedback to the user.
- ALWAYS use Copy-on-Write for pandas DataFrames.
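The column-renaming pattern described above can be sketched in plain pandas as follows. The ``GerEng`` dictionary here is a stand-in for the ones in defaultDict.py, and the sample data is made up; ``get_file`` and ``write_dataframe`` from getDataIntoPandasDataFrame.py are not reproduced:

```python
import pandas as pd

# Copy-on-Write is mandatory for all DataFrame handling (pandas >= 2.0).
pd.options.mode.copy_on_write = True

# Stand-in for a dd.GerEng-style dictionary from defaultDict.py:
# maps raw (German) column names to the English column names.
GerEng = {'IdLandkreis': 'ID_County', 'Landkreis': 'County'}

# Hypothetical raw data as it might arrive from a source.
raw = pd.DataFrame({'IdLandkreis': [16063], 'Landkreis': ['Wartburgkreis']})

# Rename columns via the dictionary instead of hard-coding names.
df = raw.rename(columns=GerEng)

# Access renamed columns via df[column], not df.column,
# with column taken from the dictionary (cf. dd.GerEng['IdLandkreis']).
id_county = GerEng['IdLandkreis']
print(df[id_county].tolist())  # [16063]
```

Keeping every column name behind the dictionary means a rename only ever touches one line in defaultDict.py.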
- Use doxygen-like comments in the code as follows:

  - Add a description at the beginning of the file:

    - ## Header
    - # @brief name descr
    - # longer description

  - Add a description at the beginning of every function, directly after the definition:

    - start and end with """
    - add a short description in the first line
    - afterwards add a longer description
    - # @param name of parameter
    - # @return type description

When you add a new script:

- Add a console script entry to the ``pycode/memilio-epidata/pyproject.toml`` file.
- Add it to the cli_dict in getDataIntoPandasDataFrame.py:

  - Add a meaningful key for the new script.
  - For the dict value, add a list in the form [comment to print when the script is started, list of used parser arguments (optional)].
  - If more than the default parser should be added, add these parsers to the list of used parsers.

- Add tests.
- Add an entry "executablename -h" to the .github/test-py/action.yml.
- Add an entry "executablename -o data_dl" to the .github/workflows/main.yml.
- Add the generated data to cleanData.

Adding a new parser:

- Add a default value to the defaultDict in defaultDict.py.
- Add which scripts use this parser to the cli_dict in getDataIntoPandasDataFrame.py.
- Add an ``if 'new parser' in what_list`` branch and add ``parser.add_argument()``.

General:

- Always add unit tests.
- Check the test coverage report to see whether every new feature is covered.
- Check the pylint report; only comments with "refactor" are allowed.

Troubleshooting
---------------

- HDF5 errors during installation (mostly on Windows): one of the dependencies of the epidata package requires HDF5 to be installed on the system. If HDF5 is not discovered properly, this `stack overflow thread `_ may help resolve the issue.
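To illustrate the comment convention from the developer notes above, a file and function documented in the described doxygen-like style might look as follows. The function itself is a made-up example, not part of the package:

```python
## Header
# @brief Example script illustrating the doxygen-like comment style.
# This longer description explains what the file as a whole does.

import pandas as pd


def scale_column(df, column, factor):
    """ Scales a single column of a DataFrame.

    Longer description: returns a copy of the DataFrame in which the
    given column has been multiplied by the given factor.

    @param df pandas DataFrame to operate on.
    @param column Name of the column to scale.
    @param factor Numeric scaling factor.
    @return DataFrame with the scaled column.
    """
    df = df.copy()
    df[column] = df[column] * factor
    return df
```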