
harmonize-tools/data4health


data4health

Lifecycle: maturing License: GPL v3

Overview

data4health is a tool developed as part of the HARMONIZE project to facilitate the access, preprocessing, and aggregation of health data at customized spatiotemporal resolutions. Originally designed for data from Colombia, Brazil, Peru, and the Dominican Republic, the tool is intended to be adaptable for any linelist health data.

The R package offers two modes of operation, depending on the user's coding experience:

  • For users with coding experience: A wide range of functions can be directly used within R.
  • For non-coding users: A graphical user interface (GUI) guides users through the data processing pipeline in an intuitive, user-friendly way.

Key Features of the R Package:

  • Instructions on how to access health data
  • Functions for cleaning and preprocessing health data
  • Spatial harmonization, allowing aggregation to any coarser administrative unit
  • Temporal harmonization, enabling aggregation to epidemiological weeks or months
  • Data visualization capabilities
  • Output as a .csv file, formatted to meet user-specified requirements

Dependencies

packages <- c("foreign", "readxl", "writexl", "shiny", "jsonlite")
install.packages(setdiff(packages, rownames(installed.packages())), repos = "http://cran.us.r-project.org")

Installation

Since the package is not yet published, you need to contact one of the developers and request a tarball of the package. You can then install it with the following line:

install.packages("/local/path/to/R-packages/harmonize.data4health_0.0.0.9000.tar.gz", repos = NULL, type="source")

How to Use it

data4health has two main modes of use. Users with coding experience get a series of functions that support health data analysis and can be slotted into an existing data pipeline. Users with less coding experience can use the graphical user interface to clean and aggregate their data in a user-friendly way.

Functions

Loading

This function loads a file into a dataframe in the R environment. Using it is optional; you can also load a dataframe on your own. Currently it accepts .csv, .rds, .xls, .xlsx, and .dbf files.

data_loaded <- data4health_load("path/data.csv")

It is also possible to load multiple files into one dataframe by passing a list of filenames. In this case, all column names need to match.
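As a rough illustration of what combining several files into one dataframe involves, the base R sketch below reads two CSV files and stacks them after checking that their column names match. It does not call data4health_load(); the file contents and the helper name read_and_stack() are hypothetical and only serve the example.

```r
# Hypothetical example files, written to a temporary directory for the demo.
tmp_dir <- tempdir()
file_a <- file.path(tmp_dir, "cases_2018.csv")
file_b <- file.path(tmp_dir, "cases_2019.csv")
write.csv(data.frame(DT_NOTIFIC = c("2018-01-03", "2018-02-10"),
                     ID_MUNICIP = c("312710", "310620")),
          file_a, row.names = FALSE)
write.csv(data.frame(DT_NOTIFIC = "2019-03-05", ID_MUNICIP = "312710"),
          file_b, row.names = FALSE)

# Hypothetical helper (not part of data4health): read every file as text,
# check that all files share the same column names, then stack the rows.
read_and_stack <- function(paths) {
  tables <- lapply(paths, read.csv, colClasses = "character")
  same_cols <- vapply(tables,
                      function(x) identical(names(x), names(tables[[1]])),
                      logical(1))
  if (!all(same_cols)) stop("All files must have the same column names.")
  do.call(rbind, tables)
}

data_stacked <- read_and_stack(c(file_a, file_b))
```

Reading every column as character avoids type mismatches between files; column types can be converted afterwards.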

Cleaning

The data4health_clean() function accepts the following arguments:

  • cols_to_remove / cols_to_include: a vector of the column names to be removed or to be included, respectively
  • remove_cols_missing: removes columns whose share of missing data is above a certain threshold
  • remove_rows_missing: removes entries where a certain column has a missing value (e.g. delete all entries that have no date)
  • remove_rows_threshold: removes rows based on threshold values (works similarly to data4health_filter)
  • rename_columns: renames columns
  • rename_values: renames values within different columns
  • week_to_date: converts an epiweek to a Date object
  • date_to_week: converts a date to the first date of its epiweek
  • date_to_month: converts a date to the first date of its month

data_cleaned <- data4health_clean(data = data_loaded,
                                  cols_to_include = c("DT_NOTIFIC", "ID_MUNICIP", "CS_SEXO"),
                                  remove_rows_missing = c("DT_NOTIFIC"),
                                  rename_columns = c(DT_NOTIFIC = "notification_date",
                                                     ID_MUNICIP = "municipality_code",
                                                     CS_SEXO = "sex"),
                                  date_to_week = "notification_date")

You can add save = TRUE to save the resulting dataframe to your local disk.
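To make the date conversions more concrete, here is a minimal base R sketch of mapping a date to the first day of its week and of its month. It is not the package's implementation. The helper names are hypothetical, and the sketch assumes that epiweeks start on Sunday, which may differ from the convention data4health uses.

```r
# Hypothetical helpers (not the package's implementation).
# Assumption: epiweeks start on Sunday; data4health may use another convention.
first_day_of_epiweek <- function(x) {
  x <- as.Date(x)
  # as.POSIXlt()$wday counts days since Sunday (0 = Sunday, ..., 6 = Saturday).
  x - as.POSIXlt(x)$wday
}

first_day_of_month <- function(x) {
  x <- as.Date(x)
  # Keep year and month, force the day to 01.
  as.Date(format(x, "%Y-%m-01"))
}

dates <- c("2018-01-03", "2018-01-07", "2018-02-15")
week_start <- first_day_of_epiweek(dates)
month_start <- first_day_of_month(dates)
```

Storing the first day of the period as a Date keeps the column sortable and easy to plot.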

Filtering

You can filter any column by passing a list that specifies how you want to filter. The options depend on the column type:

  • numeric: "over", "under", "between"
  • Date: "after", "before", "during"
  • character: "include", "exclude"

data_filtered <- data4health_filter(data = data_cleaned,
                                    municipality_code = list(include = c("312710")),
                                    sex = list(include = c("F")),
                                    notification_date = list(during = c("2018-01-01","2018-12-31")))
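For readers who want to see what such a filter amounts to, the base R sketch below applies equivalent conditions to a small, made-up dataframe. It does not call data4health_filter(). The demo data are hypothetical, and the sketch assumes that "during" keeps dates inside the interval with both endpoints included.

```r
# Hypothetical cleaned data using the column names from the example above.
data_cleaned_demo <- data.frame(
  municipality_code = c("312710", "312710", "310620", "312710"),
  sex = c("F", "M", "F", "F"),
  notification_date = as.Date(c("2018-03-04", "2018-05-20",
                                "2018-07-01", "2019-01-06")),
  stringsAsFactors = FALSE
)

# "include" keeps rows whose value is in the given set;
# "during" is assumed here to keep dates inside the closed interval.
keep <- data_cleaned_demo$municipality_code %in% c("312710") &
  data_cleaned_demo$sex %in% c("F") &
  data_cleaned_demo$notification_date >= as.Date("2018-01-01") &
  data_cleaned_demo$notification_date <= as.Date("2018-12-31")

data_filtered_demo <- data_cleaned_demo[keep, ]
```

All conditions are combined with a logical AND, so a row is kept only if it satisfies every filter.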

Aggregating

You can aggregate the data temporally and spatially using the data4health_aggregate() function. The following arguments determine how the data are aggregated:

  • space_col: the column by which to aggregate spatially
  • time_col: the column by which to aggregate temporally
  • add_col: any additional column(s) by which you would like to aggregate

To avoid any missing timesteps or missing regions, you can also pass any of the following

  • all_times: a vector of all timesteps (using the seq() function to specify the start date, end date, and timestep is highly recommended)
  • all_spaces: a vector that contains all regions
data_aggregated <- data4health_aggregate(data= data_cleaned,
                                         space_col = "municipality_code",
                                         time_col = "notification_date_week")
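The role of all_times and all_spaces can be illustrated with a small base R sketch that counts cases per region and week and keeps combinations with zero cases. It does not call data4health_aggregate(), and its output format is not meant to match the package's. The demo data and the column name n_cases are hypothetical.

```r
# Hypothetical line list: one row per case, week start already computed.
cases <- data.frame(
  municipality_code = c("312710", "312710", "310620"),
  notification_date_week = as.Date(c("2018-01-07", "2018-01-07", "2018-01-21")),
  stringsAsFactors = FALSE
)

# Complete set of weeks and regions that should appear in the output.
all_times <- seq(as.Date("2018-01-07"), as.Date("2018-01-21"), by = "week")
all_spaces <- c("310620", "312710")

# Count cases per (region, week). Treating both columns as factors with the
# full set of levels makes table() keep combinations with zero cases.
counts <- as.data.frame(
  table(
    municipality_code = factor(cases$municipality_code, levels = all_spaces),
    notification_date_week = factor(as.character(cases$notification_date_week),
                                    levels = as.character(all_times))
  ),
  responseName = "n_cases",
  stringsAsFactors = FALSE
)
counts$notification_date_week <- as.Date(counts$notification_date_week)
```

Without the complete vectors, the week of 2018-01-14 would be absent from the result instead of appearing with a count of zero.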

Visualise

To visualise the results, it is recommended to use GHRexplore. Here are a few example plots.

Yet to come!

Graphical user interface

Load GUI

Once data4health is loaded, the user interface can be loaded with the following command:

data4health_ui()

A browser window will automatically open. There you can see several tabs:

Clean

In the cleaning tab, you can perform every cleaning step available in data4health_clean(). Each step is explained, and graphs show the content of the data.

Aggregate

Aggregation does the same as data4health_aggregate().

Visualise

Finally, within the visualisation tab, you can visualise the data using plots produced by GHRexplore.

Resources

Project Website

HARMONIZE is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America & the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The HARMONIZE digital toolkits will allow local researchers and users, including national disease control programs, to link, interrogate and use multi-scale spatiotemporal data, to understand the links between environmental change and infectious disease risk in their local context, and to build robust early warning and response systems in low-resource settings.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

Other HARMONIZE tools

Within HARMONIZE, each data source has its own digital toolkit that allows local researchers and users to prepare, interrogate, and eventually merge the data spatiotemporally, to understand the links between environmental change and infectious disease risk in their local context, and to build robust early warning and response systems in low-resource settings. The other toolkits are:

CRAN Website

The package website includes a function reference, a model outline, and case studies using the package. The site mainly concerns the release version, but you can also find documentation for the latest development version.

Organizations

Global Health Resilience (GHR)

Authors / Contact information

Daniela Lührsen
AI4S Fellow – Health & Climate Data Scientist
Barcelona Supercomputing Center
Global Health Resilience

Raquel Martins Lana
Marie Curie Fellow – Recognised Researcher
Barcelona Supercomputing Center
Global Health Resilience

Citation

  • APA Format:
    • TBD
