gcomputation: an R Package for Estimating Marginal Effects Using G-Computation
================
The R package ‘gcomputation’ implements G-computation (GC) to estimate marginal effects. It provides functions for binary, time-to-event, continuous and count outcomes when comparing two exposure groups. GC can be run with various working models or algorithms, referred to as Q-models.
The main functions are gc_binary, gc_times, gc_continuous and gc_count, one for each outcome type. Each accepts a variety of modeling strategies.
The package supports several methods to construct the Q-model:
- "all": Fits a standard (unpenalized) logistic or Cox model that includes all variables provided in the formula.
- "lasso": Fits an L1-regularized regression, which can perform predictor selection, using the glmnet package.
- "ridge": Fits an L2-regularized regression, also using the glmnet package. This is equivalent to Elastic Net with an alpha value of 0.
- "elasticnet": Combines L1 and L2 regularization, also using the glmnet package. The alpha parameter, between 0 and 1, controls the mix between L1 and L2.
- "aic": Performs forward selection based on the Akaike Information Criterion (AIC), using stepAIC from the MASS package.
- "bic": Performs forward selection based on the Bayesian Information Criterion (BIC), also using stepAIC, with k=log(nrow(data)).
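As a rough illustration of how the "aic" and "bic" options differ, the sketch below runs forward selection with MASS::stepAIC on simulated data, once with the default penalty k = 2 and once with k = log(n). The data, the variable names (x1, x2, x3, y) and the logistic model are hypothetical choices made for this example; this is not the package's internal code.

```r
library(MASS)  # provides stepAIC()

set.seed(1)
n <- 500
# Simulated data (hypothetical): x1 and x2 drive the outcome, x3 is pure noise
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * d$x1 + 0.8 * d$x2))

# Forward selection starts from the intercept-only model
null_fit <- glm(y ~ 1, family = binomial, data = d)
search   <- list(lower = ~ 1, upper = ~ x1 + x2 + x3)

# AIC-style selection: default penalty k = 2 per parameter
fit_aic <- stepAIC(null_fit, scope = search, direction = "forward",
                   trace = FALSE)

# BIC-style selection: same search, penalty k = log(n) per parameter
fit_bic <- stepAIC(null_fit, scope = search, direction = "forward",
                   k = log(nrow(d)), trace = FALSE)

formula(fit_aic)
formula(fit_bic)
```

Because log(n) exceeds 2 once n is larger than about 7, the BIC penalty is the stricter of the two and tends to retain fewer predictors.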
The package offers estimation of three types of marginal effects:
- "ATE" (Average Treatment effect on the entire population): The marginal effect if the entire sample were treated versus entirely untreated.
- "ATT" (Average Treatment effect on the treated): The marginal effect if the treated patients (group = 1) had been untreated.
- "ATU" (Average Treatment effect on the untreated): The marginal effect if the untreated patients (group = 0) had been treated.
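The three estimands differ only in the population over which counterfactual predictions are averaged. The following base R sketch shows the idea with a logistic Q-model on simulated data. The data, the helper names (risk_if, marginal_rd) and the choice of the risk difference as the effect measure are assumptions made for illustration, not the package's implementation.

```r
set.seed(42)
n <- 2000
# Simulated observational data (hypothetical): older patients are treated more often
age   <- rnorm(n, mean = 60, sd = 10)
group <- rbinom(n, 1, plogis((age - 60) / 10))
y     <- rbinom(n, 1, plogis(-1 + 0.7 * group + 0.04 * (age - 60)))
d     <- data.frame(y, group, age)

# Q-model: outcome regression with a group-by-covariate interaction
q_model <- glm(y ~ group * age, family = binomial, data = d)

# Predicted risk for every row when the exposure is forced to the value g
risk_if <- function(data, g) {
  data$group <- g
  predict(q_model, newdata = data, type = "response")
}

# Average the individual counterfactual risk differences over a target population
marginal_rd <- function(target) mean(risk_if(target, 1) - risk_if(target, 0))

ate <- marginal_rd(d)                       # entire sample
att <- marginal_rd(subset(d, group == 1))   # treated patients only
atu <- marginal_rd(subset(d, group == 0))   # untreated patients only

round(c(ATE = ate, ATT = att, ATU = atu), 3)
```

In this construction the ATE is the average of the ATT and the ATU, weighted by the proportions of treated and untreated patients.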
S3 methods are included for objects generated by the gc_binary and gc_times functions, allowing for:
- print: Prints a summary of the results.
- summary: Provides a more detailed summary of the prognostic capacities, including confidence intervals.
- plot: Visualizes the results through calibration plots or effect-specific plots (proportions for a binary outcome, survival curves for a time-to-event outcome).
Other Exported Functions and Data:
- transport: Applies an already fitted GC model (an object of class gcbinary, gctimes, gccontinuous or gccount) to a new dataset to estimate marginal effects in a new population.
- Datasets: Includes dataPROPHYVAP (simulated randomized clinical trial data) and dataCOHORT (simulated observational cohort data).
- Multiple Imputations (MI-BOOT): Support for the MI-BOOT approach is added using the boot.mi argument, integrating the mice package for handling missing data.
The package also supports bootstrapping for confidence interval estimation, with options for "bcv" (default) or "boot" types and a default of 500 bootstrap resamples. Users can also control whether tuning parameters are estimated within each bootstrap iteration or on the total population.
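To make the bootstrap step concrete, here is a minimal base R sketch of an ordinary bootstrap around a G-computation estimate, followed by a normal-approximation interval and a percentile interval. It is only loosely analogous to the "boot" type with ci.type "norm" and "perc": the "bcv" type and the tuning-parameter handling are not covered, and the simulated data and the gc_ate helper are hypothetical.

```r
set.seed(2024)
n <- 1000
# Simulated data (hypothetical): x confounds the group-outcome relationship
x     <- rnorm(n)
group <- rbinom(n, 1, plogis(0.5 * x))
y     <- rbinom(n, 1, plogis(-0.5 + 0.8 * group + 0.6 * x))
d     <- data.frame(y, group, x)

# G-computation ATE (risk difference) from a logistic Q-model
gc_ate <- function(data) {
  fit <- glm(y ~ group * x, family = binomial, data = data)
  d1 <- data; d1$group <- 1   # everyone treated
  d0 <- data; d0$group <- 0   # everyone untreated
  mean(predict(fit, newdata = d1, type = "response") -
       predict(fit, newdata = d0, type = "response"))
}

estimate <- gc_ate(d)

# Ordinary bootstrap: resample rows with replacement and refit the Q-model
B <- 200   # kept small for speed in this sketch
boot_est <- replicate(B, gc_ate(d[sample(nrow(d), replace = TRUE), ]))

# Normal-approximation interval based on the bootstrap standard error
ci_norm <- estimate + c(-1, 1) * qnorm(0.975) * sd(boot_est)
# Percentile interval taken directly from the bootstrap distribution
ci_perc <- quantile(boot_est, c(0.025, 0.975))

ci_norm
ci_perc
```

Refitting the Q-model inside every resample is what lets the interval reflect the uncertainty of the model fit, not just of the averaging step.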
For a binary outcome:
library(gcomputation)
data("dataPROPHYVAP")
.f <- formula(VAP ~ GROUP * (AGE + SEX + BMI + DIABETES))
# 1. Standard execution
# boot.tune=TRUE estimates tuning parameters inside each bootstrap iteration.
gc_bin <- gc_binary(formula=.f, model="ridge", data=dataPROPHYVAP, group="GROUP",
                    cv=10, boot.type="bcv", boot.number=500, boot.tune=TRUE,
                    effect="ATE", progress=TRUE, seed=5192)
gc_bin
# Summary specifying Asymptotic CIs ("norm")
summary(gc_bin, ci.type="norm")
# Calibration plot
plot(gc_bin, method="calibration")
# 2. Execution with multiple imputation
# Uses boot.mi=TRUE with m=5 imputations. boot.tune=FALSE estimates the tuning
# parameters only once, on the complete data set
.f_mi <- formula(VAP ~ GROUP * (AGE + SEX + BMI + DIABETES + GLASGOW + INJURY))
gc_mi <- gc_binary(formula=.f_mi, model="elasticnet", data=dataPROPHYVAP,
                   group="GROUP", cv=10, boot.type="bcv", boot.number=500, boot.tune=FALSE,
                   effect="ATE", progress=TRUE, seed=8051, boot.mi=TRUE, m=5)
# Plotting the calibration curve, smoothed across m imputations
plot(gc_mi, method="calibration", smooth=TRUE)
# Summary specifying Non-parametric CIs ("perc")
summary(gc_mi, ci.type="perc")
# 3. Transportability
# Define a new dataset (e.g., a subset of younger patients, AGE<=50)
newdata_binary <- subset(dataPROPHYVAP, AGE<=50)
# Transport the fitted gc_bin model to the new dataset
gc_transport <- transport(object=gc_bin, newdata=newdata_binary,
                          boot.number=500)
summary(gc_transport, ci.type="norm")

For a survival outcome:
library(survival)  # provides Surv(), used in the formula below
data("dataPROPHYVAP")
.ft <- formula(Surv(TIME_DEATH, DEATH) ~ GROUP * (AGE + BMI + GLASGOW + LEUKO))
gc_surv <- gc_times(formula=.ft, model="lasso", data=dataPROPHYVAP, group="GROUP",
                    param.tune=0.03, boot.type="bcv", boot.number=500, boot.tune=FALSE,
                    effect="ATE", pro.time=30, seed=5312)
gc_surv
summary(gc_surv, ci.type="perc")
plot(gc_surv)

To install the version from GitHub:
remotes::install_github("chupverse/gcomputation")

You can report any issues at this link.