data4health

Overview

data4health is a tool developed as part of the HARMONIZE project to facilitate the access, preprocessing, and aggregation of health data at customized spatiotemporal resolutions. Originally designed for data from Colombia, Brazil, Peru, and the Dominican Republic, the tool is intended to be adaptable for any linelist health data.

The R package offers two modes of operation based on the user's coding experience:

For users with coding experience: A wide range of functions can be directly used within R.
For non-coding users: A graphical user interface (GUI) guides users through the data processing pipeline in an intuitive, user-friendly way.

Key Features of the R Package:

Instructions on how to access health data
Functions for cleaning and preprocessing health data
Spatial harmonization, allowing aggregation to any coarser administrative unit
Temporal harmonization, enabling aggregation to epidemiological weeks or months
Data visualization capabilities
Output as a .csv file, formatted to meet user-specified requirements

Dependencies

packages <- c("foreign", "readxl", "writexl", "shiny", "jsonlite")
install.packages(setdiff(packages, rownames(installed.packages())), repos = "http://cran.us.r-project.org")

Installation

Since the package is not yet published, you need to contact one of the developers and request a tarball of the package. You can then install it with the following line:

install.packages("/local/path/to/R-packages/harmonize.data4health_0.0.0.9000.tar.gz", repos = NULL, type="source")

How to Use it

There are two main ways to use data4health. For users with coding experience, a series of functions to support health data analysis is provided, which can be integrated into an existing data pipeline to simplify it. Users with less coding experience can use the graphical user interface to clean and aggregate their data in a user-friendly way.

Functions

Loading

This function loads a dataframe from a file into the R environment. Using it is optional; you can also load a dataframe on your own. It currently accepts .csv, .rds, .xls, .xlsx, and .dbf files.

data_loaded <- data4health_load("path/data.csv")

It is also possible to load multiple files into one dataframe by passing a list of filenames. In this case, all column names need to match.
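
To illustrate what combining several files with matching column names involves, here is a minimal base R sketch. The helper `load_many_csv` is hypothetical and not part of data4health; it only handles .csv files and is meant to show the idea, not the package's actual implementation.

```r
# Hypothetical helper (not part of data4health): read several CSV files
# and stack them into one data frame, checking column names first.
load_many_csv <- function(paths) {
  frames <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  reference <- names(frames[[1]])
  for (i in seq_along(frames)) {
    if (!setequal(names(frames[[i]]), reference)) {
      stop("Column names in '", paths[i], "' do not match the first file.")
    }
    # Reorder columns so every frame has the same column order.
    frames[[i]] <- frames[[i]][, reference, drop = FALSE]
  }
  do.call(rbind, frames)
}

# Usage with two small temporary files (columns in a different order)
file_a <- tempfile(fileext = ".csv")
file_b <- tempfile(fileext = ".csv")
write.csv(data.frame(date = "2018-01-01", code = "312710"), file_a, row.names = FALSE)
write.csv(data.frame(code = "312720", date = "2018-01-08"), file_b, row.names = FALSE)
combined <- load_many_csv(c(file_a, file_b))
```

Stopping on a column mismatch, rather than silently filling missing columns, mirrors the requirement stated above that all column names match.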

Cleaning

Using the following arguments of data4health_clean(), you can clean the data:

cols_to_remove / cols_to_include: pass a vector of the column names to be removed or included, respectively
remove_cols_missing: remove columns that have missing data above a certain threshold
remove_rows_missing: remove entries where a certain column has a missing value (e.g. delete all entries that have no date)
remove_rows_threshold: remove rows based on threshold values (works similarly to data4health_filter)
rename_columns: rename columns
rename_values: rename values within different columns
week_to_date: convert an epiweek to a Date object
date_to_week: convert a date to the first date of its epiweek
date_to_month: convert a date to the first date of its month

data_cleaned <- data4health_clean(data = data_loaded,
                                  cols_to_include = c("DT_NOTIFIC", "ID_MUNICIP", "CS_SEXO"),
                                  remove_rows_missing = c("DT_NOTIFIC"),
                                  rename_columns = c(DT_NOTIFIC = "notification_date",
                                                     ID_MUNICIP = "municipality_code",
                                                     CS_SEXO = "sex"),
                                  date_to_week = "notification_date")

You can add save = TRUE to permanently save the resultant dataframe to your local disk.
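
For readers who want to see what some of these cleaning steps amount to, the following base R sketch may help. All three helper functions are hypothetical and are not the data4health internals. Two assumptions are made: that the missing-data threshold is a proportion of missing values per column, and that epiweeks start on Sunday. The package may define either differently.

```r
# Illustrative helpers in base R; these are not the data4health internals.

# Assumption: the threshold is a proportion of missing values per column.
drop_sparse_columns <- function(data, threshold = 0.5) {
  missing_share <- colMeans(is.na(data))
  data[, missing_share <= threshold, drop = FALSE]
}

# Assumption: epiweeks start on Sunday.
first_day_of_epiweek <- function(dates) {
  dates <- as.Date(dates)
  # format(..., "%w") gives the weekday as 0 (Sunday) to 6 (Saturday)
  dates - as.integer(format(dates, "%w"))
}

first_day_of_month <- function(dates) {
  as.Date(format(as.Date(dates), "%Y-%m-01"))
}

raw <- data.frame(
  notification_date = as.Date(c("2018-01-03", "2018-01-07", "2018-02-15")),
  mostly_empty = c(NA, NA, 1)
)
cleaned <- drop_sparse_columns(raw, threshold = 0.5)
cleaned$week_start <- first_day_of_epiweek(cleaned$notification_date)
cleaned$month_start <- first_day_of_month(cleaned$notification_date)
```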

Filtering

You can filter any column by passing a list specifying how you want to filter it. The possibilities are:

numeric: "over", "under", "between"
Date: "after", "before", "during"
character: "include", "exclude"

data_filtered <- data4health_filter(data = data_cleaned,
                                    municipality_code = list(include = c("312710")),
                                    sex = list(include = c("F")),
                                    notification_date = list(during = c("2018-01-01","2018-12-31")))
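
The filter keywords can be read as simple logical conditions on a column. The sketch below re-creates them in base R to make the semantics concrete. The function `keep_rows` is hypothetical, and whether the bounds of "between" and "during" are inclusive in data4health is an assumption here.

```r
# Hypothetical re-implementation of the filter keywords in base R;
# inclusive bounds for "between"/"during" are an assumption.
keep_rows <- function(values, rule) {
  keyword <- names(rule)[1]
  target <- rule[[1]]
  if (inherits(values, "Date")) target <- as.Date(target)
  switch(keyword,
    over    = values > target,
    after   = values > target,
    under   = values < target,
    before  = values < target,
    between = ,
    during  = values >= target[1] & values <= target[2],
    include = values %in% target,
    exclude = !(values %in% target),
    stop("Unknown filter keyword: ", keyword)
  )
}

cases <- data.frame(
  municipality_code = c("312710", "312710", "310620", "312710"),
  sex = c("F", "M", "F", "F"),
  notification_date = as.Date(c("2018-03-04", "2018-05-20", "2018-07-01", "2019-01-06")),
  stringsAsFactors = FALSE
)

# Combine the three conditions, as in the data4health_filter() call above
selected <- keep_rows(cases$municipality_code, list(include = "312710")) &
  keep_rows(cases$sex, list(include = "F")) &
  keep_rows(cases$notification_date, list(during = c("2018-01-01", "2018-12-31")))
filtered <- cases[selected, ]
```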

Aggregating

It is possible to aggregate the data temporally and spatially using the data4health_aggregate() function. The following arguments determine how the data are aggregated:

space_col: selects the column by which to aggregate spatially
time_col: selects the column by which to aggregate temporally
add_col: selects any additional column(s) by which you would like to aggregate

To avoid any missing timesteps or missing regions, you can also pass any of the following

all_times: a vector of all timesteps (using the seq() function to specify the start date, end date, and timestep is highly recommended)
all_spaces: a vector that contains all regions

data_aggregated <- data4health_aggregate(data = data_cleaned,
                                         space_col = "municipality_code",
                                         time_col = "notification_date_week")
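
To show why all_times and all_spaces matter, the following base R sketch counts cases per municipality and week and then fills in the combinations that have no cases. It is an illustration of the idea only, not the data4health_aggregate() implementation. The example data and the count column `n` are invented for this sketch.

```r
# Invented example data: three cases in two municipalities
cases <- data.frame(
  municipality_code = c("312710", "312710", "310620"),
  notification_date_week = as.Date(c("2018-01-07", "2018-01-07", "2018-01-21")),
  stringsAsFactors = FALSE
)

# Full set of timesteps (built with seq()) and regions
all_times <- seq(as.Date("2018-01-07"), as.Date("2018-01-21"), by = "week")
all_spaces <- c("310620", "312710")

# Count cases per observed municipality-week combination
cases$n <- 1
observed <- aggregate(cases["n"],
                      by = cases[c("municipality_code", "notification_date_week")],
                      FUN = sum)

# Build the full grid and fill unobserved combinations with zero
grid <- expand.grid(municipality_code = all_spaces,
                    notification_date_week = all_times,
                    stringsAsFactors = FALSE)
aggregated <- merge(grid, observed, all.x = TRUE)
aggregated$n[is.na(aggregated$n)] <- 0
```

Without the full grid, weeks or municipalities with no reported cases would simply be absent from the result instead of appearing with a count of zero.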

Visualise

To visualise the results, using GHRexplore is recommended. Here are a few example plots.

Yet to come!

Graphic user interface

Load GUI

Once data4health is loaded, the user interface can be loaded with the following command:

data4health_ui()

A browser window will automatically open. There you can see several tabs:

Clean

In the cleaning tab, you can perform all the cleaning steps available in data4health_clean. Every step is explained, and graphs show the content of the data.

Aggregate

Aggregation does the same as data4health_aggregate().

Visualise

Finally, within the visualisation tab, you can visualise the data using plots produced by GHRexplore.

Resources

Project Website

HARMONIZE is an international project to develop cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America & the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The HARMONIZE digital toolkits will allow local researchers and users, including national disease control programs, to link, interrogate and use multi-scale spatiotemporal data, to understand the links between environmental change and infectious disease risk in their local context, and to build robust early warning and response systems in low-resource settings.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

Other HARMONIZE tools

Within HARMONIZE, each data source has its own digital toolkit that allows local researchers and users to prepare, interrogate, and eventually merge the data spatiotemporally, to understand the links between environmental change and infectious disease risk in their local context, and to build robust early warning and response systems in low-resource settings. The other toolkits are:

CRAN Website

The package website includes a function reference, a model outline, and case studies using the package. The site mainly concerns the release version, but you can also find documentation for the latest development version.

Organizations

GHR
Global Health Resilience

Authors / Contact information

List the authors/contributors of the package and provide contact information if users have questions or feedback.

	Daniela Lührsen, AI4S Fellow – Health & Climate Data Scientist, Barcelona Supercomputing Center, Global Health Resilience, Climate & Health Data Scientist
	Raquel Martins Lana, Marie Curie Fellow – Recognised Researcher, Barcelona Supercomputing Center, Global Health Resilience, Recognized Researcher

Citation

APA Format:
- TBD
