An R package for causal inference by using G-computation

chupverse/gcomputation


gcomputation: an R Package for Estimating Marginal Effects Using G-Computation

Description

The R package ‘gcomputation’ provides functions that use G-computation (GC) to estimate marginal effects. It handles binary, time-to-event, continuous and count outcomes when two exposure groups are compared. The package implements GC with various working models or algorithms, referred to as Q-models.
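To make the idea concrete, here is a minimal base R sketch of GC for a binary outcome. It does not use the package; the simulated data and every name in it (d, group, age, y, q_model) are assumptions made for illustration, and the package's internal implementation may differ.

```r
# Minimal G-computation sketch for a binary outcome (base R only).
# All data and names here are hypothetical, not taken from the package.
set.seed(1)
n     <- 2000
age   <- rnorm(n, mean = 60, sd = 10)
group <- rbinom(n, 1, plogis(-0.5 + 0.03 * (age - 60)))  # exposure depends on age
y     <- rbinom(n, 1, plogis(-1 + 0.8 * group + 0.04 * (age - 60)))
d     <- data.frame(y = y, group = group, age = age)

# 1. Fit the Q-model: the outcome given exposure and covariates.
q_model <- glm(y ~ group * age, family = binomial, data = d)

# 2. Predict every individual's outcome probability under each exposure.
p1 <- predict(q_model, newdata = transform(d, group = 1), type = "response")
p0 <- predict(q_model, newdata = transform(d, group = 0), type = "response")

# 3. Average the predictions to obtain marginal probabilities, then contrast them.
m1 <- mean(p1)
m0 <- mean(p0)
gc_estimates <- c(
  p0         = m0,
  p1         = m1,
  difference = m1 - m0,
  odds_ratio = (m1 / (1 - m1)) / (m0 / (1 - m0))
)
print(round(gc_estimates, 3))
```

The contrast is marginal because the predictions are averaged over the covariate distribution of the sample before being compared, rather than read off a regression coefficient.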

Key Features

gc_binary, gc_times, gc_continuous and gc_count are the main functions. They implement GC to estimate marginal effects for binary, time-to-event, continuous and count outcomes, respectively, comparing two exposure groups with a variety of modeling strategies.

The package supports several methods to construct the Q-model:

  • "all": Fits a standard unpenalized regression model (for example a logistic or Cox model) that includes all variables provided in the formula.
  • "lasso": Fits an L1-regularized regression, which can perform predictor selection. It uses the glmnet package.
  • "ridge": Fits an L2-regularized regression, also using the glmnet package. This is equivalent to Elastic Net with an alpha value of 0.
  • "elasticnet": Combines L1 and L2 regularization, also using the glmnet package. The alpha parameter controls the mix and ranges from 0 (ridge) to 1 (lasso).
  • "aic": Performs forward selection based on the Akaike Information Criterion (AIC), using stepAIC.
  • "bic": Performs forward selection based on the Bayesian Information Criterion (BIC), also using stepAIC with k=log(nrow(data)).
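The difference between the "aic" and "bic" options can be sketched directly with MASS::stepAIC, outside the package. The simulated data, the variable names, and the choice to force the exposure into every candidate model (the lower scope) are assumptions made for this illustration; the package may set up the search differently.

```r
library(MASS)  # provides stepAIC()

# Hypothetical data: only 'group' and 'x1' truly affect the outcome.
set.seed(2)
n <- 500
d <- data.frame(
  group = rbinom(n, 1, 0.5),
  x1    = rnorm(n),
  x2    = rnorm(n),
  x3    = rnorm(n)
)
d$y <- rbinom(n, 1, plogis(-0.5 + 0.7 * d$group + 0.8 * d$x1))

# Forward selection starts from a small model and adds terms from 'upper'.
# Keeping 'group' in 'lower' is an assumption of this sketch.
start <- glm(y ~ group, family = binomial, data = d)
scope <- list(lower = ~ group, upper = ~ group + x1 + x2 + x3)

# AIC uses the default penalty k = 2; BIC uses k = log(n).
fit_aic <- stepAIC(start, scope = scope, direction = "forward", trace = 0)
fit_bic <- stepAIC(start, scope = scope, direction = "forward", trace = 0,
                   k = log(nrow(d)))

print(formula(fit_aic))
print(formula(fit_bic))
```

Because log(n) exceeds 2 once n is larger than about 7, the BIC penalty is the stricter one, so the BIC search never keeps more of these single-degree-of-freedom terms than the AIC search.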

The package offers estimation of three types of marginal effects:

  • "ATE" (Average Treatment effect on the Entire population): The marginal effect if the entire sample were treated versus entirely untreated.
  • "ATT" (Average Treatment effect on the Treated): The marginal effect, among the treated patients (group = 1), of being treated versus what would have happened had they been untreated.
  • "ATU" (Average Treatment effect on the Untreated): The marginal effect, among the untreated patients (group = 0), of what would have happened had they been treated versus remaining untreated.
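In GC terms, the three estimands differ only in the population over which the predictions are averaged. The base R sketch below illustrates this with a risk difference; the data and the helper marginal_effect are hypothetical and do not come from the package.

```r
# Hypothetical data with confounding by age.
set.seed(3)
n     <- 2000
age   <- rnorm(n, 60, 10)
group <- rbinom(n, 1, plogis(0.05 * (age - 60)))
y     <- rbinom(n, 1, plogis(-1 + 0.8 * group + 0.04 * (age - 60)))
d     <- data.frame(y = y, group = group, age = age)

q_model <- glm(y ~ group * age, family = binomial, data = d)

# Marginal risk difference, averaged over a chosen target population.
marginal_effect <- function(model, target) {
  p1 <- predict(model, newdata = transform(target, group = 1), type = "response")
  p0 <- predict(model, newdata = transform(target, group = 0), type = "response")
  mean(p1) - mean(p0)
}

effects <- c(
  ATE = marginal_effect(q_model, d),                  # whole sample
  ATT = marginal_effect(q_model, d[d$group == 1, ]),  # treated only
  ATU = marginal_effect(q_model, d[d$group == 0, ])   # untreated only
)
print(round(effects, 3))
```

With this construction the ATE is the size-weighted average of the ATT and the ATU, which is a quick sanity check on any implementation.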

S3 methods are included for objects returned by the main functions (such as gc_binary and gc_times), allowing for:

  • print: To print a summary of the results.
  • summary: To provide a more detailed summary of the prognostic capacities, including confidence intervals.
  • plot: To visualize the results through calibration plots or effect-specific plots (proportions for binary outcomes, survival curves for time-to-event outcomes).

Other Exported Functions and Data:

  • transport: Applies an already fitted GC model (an object of class gcbinary, gctimes, gccontinuous or gccount) to a new dataset (newdata) to estimate marginal effects in a new population.
  • Datasets: Includes dataPROPHYVAP (simulated randomized clinical trial data) and dataCOHORT (simulated observational cohort data).
  • Multiple Imputations (MI-BOOT): The MI-BOOT approach is enabled with the boot.mi argument and relies on the mice package to handle missing data.

The package uses bootstrapping to estimate confidence intervals, with boot.type set to "bcv" (default) or "boot" and 500 bootstrap resamples by default. Users can also choose, through boot.tune, whether tuning parameters are re-estimated within each bootstrap iteration or estimated once on the total population.
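As a rough illustration of bootstrap confidence intervals around a GC estimate, the base R sketch below resamples individuals with replacement and refits the Q-model each time. It is a generic nonparametric bootstrap written for this example, not the package's "bcv" or "boot" procedure; the data, the helper gc_difference, and the way the normal-approximation interval is built from the bootstrap standard deviation are all assumptions.

```r
# Hypothetical data.
set.seed(4)
n     <- 1000
age   <- rnorm(n, 60, 10)
group <- rbinom(n, 1, plogis(0.03 * (age - 60)))
y     <- rbinom(n, 1, plogis(-1 + 0.8 * group + 0.04 * (age - 60)))
d     <- data.frame(y = y, group = group, age = age)

# GC estimate of the marginal risk difference on a given dataset.
gc_difference <- function(data) {
  fit <- glm(y ~ group + age, family = binomial, data = data)
  p1  <- predict(fit, newdata = transform(data, group = 1), type = "response")
  p0  <- predict(fit, newdata = transform(data, group = 0), type = "response")
  mean(p1) - mean(p0)
}

# Resample rows with replacement and re-estimate on each resample.
B <- 200  # kept small so the sketch runs quickly
boot_estimates <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  gc_difference(d[idx, ])
})

estimate <- gc_difference(d)
ci_perc  <- quantile(boot_estimates, c(0.025, 0.975))                # percentile interval
ci_norm  <- estimate + c(-1, 1) * qnorm(0.975) * sd(boot_estimates)  # normal approximation

print(round(c(estimate = estimate, ci_perc, norm_low = ci_norm[1], norm_high = ci_norm[2]), 3))
```

Refitting the model inside every resample is what makes the interval reflect the uncertainty of the whole procedure, which is the same reason for letting tuning parameters be re-estimated in each bootstrap iteration.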

Basic Usage

For a binary outcome:

library(gcomputation)
data("dataPROPHYVAP")

.f <- formula(VAP ~ GROUP * (AGE + SEX + BMI + DIABETES))

# 1. Standard execution
# boot.tune=TRUE estimates tuning parameters inside each bootstrap iteration.
gc_bin <- gc_binary(formula=.f, model="ridge", data=dataPROPHYVAP, group="GROUP",
                 cv=10, boot.type="bcv", boot.number=500, boot.tune=TRUE,
                 effect="ATE", progress=TRUE, seed=5192)
gc_bin

# Summary specifying Asymptotic CIs ("norm")
summary(gc_bin, ci.type="norm")

# Calibration plot
plot(gc_bin, method="calibration")


# 2. Execution with multiple imputation
# Uses boot.mi=TRUE with m=5 imputations.
# boot.tune=FALSE estimates the tuning parameters only once, on the complete data set.
.f_mi <- formula(VAP ~ GROUP * (AGE + SEX + BMI + DIABETES + GLASGOW + INJURY))
gc_mi <- gc_binary(formula=.f_mi, model="elasticnet", data=dataPROPHYVAP,
                   group="GROUP", cv=10, boot.type="bcv", boot.number=500, boot.tune=FALSE,
                   effect="ATE", progress=TRUE, seed=8051, boot.mi=TRUE, m=5)

# Plotting the calibration curve, smoothed across m imputations
plot(gc_mi, method="calibration", smooth=TRUE) 

# Summary specifying Non-parametric CIs ("perc")
summary(gc_mi, ci.type="perc")


# 3. Transportability
# Define a new dataset (e.g., a subset of younger patients, AGE<=50)
newdata_binary <- subset(dataPROPHYVAP, AGE<=50)

# Transport the fitted gc_bin model to the new dataset
gc_transport <- transport(object=gc_bin, newdata=newdata_binary,
                          boot.number=500)

summary(gc_transport, ci.type="norm")
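
The idea behind transporting a fitted model can be sketched in base R: the Q-model is estimated once on the source data, and the predictions are then averaged over the covariates of the new population. This is only an illustration of the principle under simulated data; the names source_data, target_data and risk_difference are hypothetical, and transport itself may do more (for example bootstrapping).

```r
# Hypothetical source data in which the exposure effect grows with age.
set.seed(5)
n     <- 2000
age   <- rnorm(n, 60, 10)
group <- rbinom(n, 1, 0.5)
y     <- rbinom(n, 1, plogis(-1 + 0.3 * group + 0.03 * (age - 60) +
                             0.05 * group * (age - 60)))
source_data <- data.frame(y = y, group = group, age = age)

# The Q-model is fitted once, on the source data only.
q_model <- glm(y ~ group * age, family = binomial, data = source_data)

# Target population: younger patients, as in the example above.
target_data <- subset(source_data, age <= 50)

# Average the model's predictions over the covariates of a given population.
risk_difference <- function(model, population) {
  p1 <- predict(model, newdata = transform(population, group = 1), type = "response")
  p0 <- predict(model, newdata = transform(population, group = 0), type = "response")
  mean(p1) - mean(p0)
}

rd_source <- risk_difference(q_model, source_data)
rd_target <- risk_difference(q_model, target_data)
print(round(c(source = rd_source, target = rd_target), 3))
```

The two estimates differ here because the simulated effect depends on age and the target population is younger; without such effect modification, transporting would change the estimate much less.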

For a survival outcome:

library(gcomputation)
library(survival)  # provides Surv(); harmless if already attached
data(dataPROPHYVAP)

.ft <- formula(Surv(TIME_DEATH, DEATH) ~ GROUP * (AGE + BMI + GLASGOW + LEUKO))

gc_surv <- gc_times(formula=.ft, model="lasso", data=dataPROPHYVAP, group="GROUP",
              param.tune=0.03, boot.type="bcv", boot.number=500, boot.tune=FALSE,
              effect="ATE", pro.time=30, seed=5312)

gc_surv
summary(gc_surv, ci.type="perc")
plot(gc_surv)
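
For intuition about what a marginal survival estimate at a prognostic time means, the sketch below applies the same predict-and-average logic with a Cox model from the survival package. It is an assumption-laden illustration on simulated data (names such as marginal_survival and the 60-day censoring are hypothetical), not a reproduction of gc_times.

```r
library(survival)

# Hypothetical data: exponential event times, administrative censoring at day 60.
set.seed(6)
n          <- 500
age        <- rnorm(n, 60, 10)
group      <- rbinom(n, 1, 0.5)
rate       <- 0.03 * exp(-0.7 * group + 0.02 * (age - 60))
event_time <- rexp(n, rate)
d <- data.frame(
  time   = pmin(event_time, 60),
  status = as.integer(event_time <= 60),
  group  = group,
  age    = age
)

# Q-model: Cox regression of the outcome on exposure and covariates.
q_model <- coxph(Surv(time, status) ~ group + age, data = d)

# Predict one survival curve per individual under a fixed exposure,
# read each curve at time 'at', and average across individuals.
marginal_survival <- function(model, data, group_value, at) {
  curves <- survfit(model, newdata = transform(data, group = group_value),
                    se.fit = FALSE)
  mean(summary(curves, times = at)$surv)
}

s1 <- marginal_survival(q_model, d, group_value = 1, at = 30)
s0 <- marginal_survival(q_model, d, group_value = 0, at = 30)
print(round(c(survival_treated = s1, survival_untreated = s0,
              difference = s1 - s0), 3))
```

The value 30 plays the role of pro.time in the call above: it is the time at which the two marginal survival probabilities are compared.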

Installation

To install the version from GitHub:

remotes::install_github("chupverse/gcomputation")

Reporting bugs

You can report any issues through the issue tracker of the GitHub repository (chupverse/gcomputation).
