MEmilio Epidata
================

MEmilio Epidata provides modules and scripts to download epidemiological data from various sources. The package as well as links to the sources can be found in the `pycode/memilio-epidata `_.

Dependencies
------------

Required Python packages:

* pandas>=2.0.0
* matplotlib
* tables
* numpy>=1.22, !=1.25.*
* pyarrow
* openpyxl
* xlrd
* requests
* pyxlsb
* wget

Usage
-----

After installation, the following functions are available:

* ``clean_data``: Deletes all data files generated by the MEmilio Epidata package.
* ``get_case_data``: Downloads SARS-CoV-2 case data from the Robert Koch-Institut (RKI-C).
* ``get_commuter_data``: Computes a DataFrame of commuter mobility patterns based on data from the Federal Employment Agency.
* ``get_divi_data``: Downloads ICU data from the German DIVI Intensivregister (DIVI).
* ``get_hospitalization_data``: Downloads data about COVID-19 hospitalizations from the Robert Koch-Institut (RKI-H).
* ``get_jh_data``: Downloads COVID-19 case data from Johns Hopkins University (JH).
* ``get_population_data``: Downloads population data for German federal states and counties from various public sources (P).
* ``get_simulation_data``: Downloads all data required for a simulation with the graph-metapopulation model, i.e., SARS-CoV-2 case data (RKI-C), population data (P), ICU data (DIVI) and COVID-19 vaccination data from the Robert Koch-Institut (RKI-V).
* ``get_testing_data``: Downloads data about SARS-CoV-2 PCR tests from the Robert Koch-Institut (RKI-T).
* ``get_vaccination_data``: Downloads the RKI vaccination data and provides different kinds of structured data.
* ``updateMobility2022``: Merges rows and columns of Eisenach into Wartburgkreis, which became a single county in July 2021.
* ``createFederalStatesMobility``: Creates mobility matrices for German federal states based on county mobility.
* ``transformWeatherData``: Transforms weather data.
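As a quick illustration, one of the installed console scripts can be invoked from the shell as follows. The executable name ``getcasedata`` is assumed here for illustration; the actual names are defined as console script entries in ``pycode/memilio-epidata/pyproject.toml``:

```shell
# Show the available run options for the script (assumed executable name):
getcasedata -h

# Download the case data and write it to a custom output directory:
getcasedata -o data_dl
```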
For a detailed description of the run options and the resulting data files, see the `epidata subfolder `_. The downloaded data is written to either HDF5 or JSON files.

Additional Tools
----------------

Some additional tools for processing or analysing data can be found in the `tools directory `_.

Notes for developers
--------------------

If new functionality is to be added, please stick to the following instructions:

When you start creating a new script:

- Have a look into getDataIntoPandasDataFrame.py; the main functionality which should be used is implemented there.

  - ``get_file`` is used to read in data.
  - The ``Conf`` class sets the relevant download options.
  - Use ``write_dataframe`` to write a pandas DataFrame to file.
  - Use ``check_dir`` if you want to create a new folder to write data to.

- Use the dictionaries in defaultDict.py to rename the existing columns of your data.

  - Add new column names to one of the existing language dictionaries; English, German and Spanish translations exist at the moment.
  - For non-English languages, always use the EngEng dictionary as the key; thus names can easily be changed by editing just one line.
  - In defaultDict.py, dictionaries for ids as well as state and county names exist. Please use them.

- After renaming columns, do not use ``dataframe.column`` but instead ``dataframe[column]``, where ``column`` is given by the dictionaries in defaultDict.py. Example: ``ID_County = dd.GerEng['IdLandkreis']`` or ``dd.EngEng['idCounty']``.
- For extensive operations, use the progress indicator to give feedback to the user.
- ALWAYS use Copy-on-Write for pandas DataFrames.
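The column-renaming pattern described above can be sketched in plain pandas as follows. The ``GerEng`` dictionary here is a stand-in for the ones in defaultDict.py, and the sample data is made up; ``get_file`` and ``write_dataframe`` from getDataIntoPandasDataFrame.py are not reproduced:

```python
import pandas as pd

# Copy-on-Write is mandatory for all DataFrame handling (pandas >= 2.0).
pd.options.mode.copy_on_write = True

# Stand-in for a dd.GerEng-style dictionary from defaultDict.py:
# maps raw (German) column names to the English column names.
GerEng = {'IdLandkreis': 'ID_County', 'Landkreis': 'County'}

# Hypothetical raw data as it might arrive from a source.
raw = pd.DataFrame({'IdLandkreis': [16063], 'Landkreis': ['Wartburgkreis']})

# Rename columns via the dictionary instead of hard-coding names.
df = raw.rename(columns=GerEng)

# Access renamed columns via df[column], not df.column,
# with column taken from the dictionary (cf. dd.GerEng['IdLandkreis']).
id_county = GerEng['IdLandkreis']
print(df[id_county].tolist())  # [16063]
```

Keeping every column name behind the dictionary means a rename only ever touches one line in defaultDict.py.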
- Use doxygen-like comments in the code as follows:

  - Add a description at the beginning of the file:

    - ## Header
    - # @brief name descr
    - # longer description

  - Add a description at the beginning of every function, directly after the definition:

    - start and end with """
    - add a short description in the first line
    - afterwards add a longer description
    - # @param name of parameter
    - # @return type description

When you add a new script:

- Add a console script entry to the ``pycode/memilio-epidata/pyproject.toml`` file.
- Add it to the cli_dict in getDataIntoPandasDataFrame.py:

  - Add a meaningful key for the new script.
  - For the dict value, add a list in the form [comment to print when the script is started, list of used parser arguments (optional)].
  - If more than the default parser should be added, add these parsers to the list of used parsers.

- Add tests.
- Add an entry "executablename -h" to the .github/test-py/action.yml.
- Add an entry "executablename -o data_dl" to the .github/workflows/main.yml.
- Add the generated data to cleanData.

Adding a new parser:

- Add a default value to the defaultDict in defaultDict.py.
- Add which scripts use this parser to the cli_dict in getDataIntoPandasDataFrame.py.
- Add an ``if 'new parser' in what_list`` branch and add ``parser.add_argument()``.

General:

- Always add unit tests.
- Check the test coverage report to see whether every new feature is covered.
- Check the pylint report; only comments with "refactor" are allowed.

Troubleshooting
---------------

- HDF5 errors during installation (mostly on Windows): one of the dependencies of the epidata package requires HDF5 to be installed on the system. If HDF5 is not discovered properly, this `stack overflow thread `_ may help resolve the issue.
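To illustrate the comment convention from the developer notes above, a file and function documented in the described doxygen-like style might look as follows. The function itself is a made-up example, not part of the package:

```python
## Header
# @brief Example script illustrating the doxygen-like comment style.
# This longer description explains what the file as a whole does.

import pandas as pd


def scale_column(df, column, factor):
    """ Scales a single column of a DataFrame.

    Longer description: returns a copy of the DataFrame in which the
    given column has been multiplied by the given factor.

    @param df pandas DataFrame to operate on.
    @param column Name of the column to scale.
    @param factor Numeric scaling factor.
    @return DataFrame with the scaled column.
    """
    df = df.copy()
    df[column] = df[column] * factor
    return df
```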