Causal Mediation Analysis with Time-Varying Exposures, Mediators, and Confounders
causalMed implements the parametric g-formula for total effect
estimation and extends it with the survival mediational g-formula to
decompose causal effects into direct and indirect components. The
package supports:
- Interventional direct and indirect effects (IDE/IIE) — Lin et al. (2017)
- Natural direct and indirect effects (NDE/NIE) — Zheng & van der Laan (2017)
Both approaches handle time-varying exposures, mediators, and
confounders in longitudinal data, including survival outcomes. The
standard g-formula component (total effect estimation) is
cross-validated against the gfoRmula CRAN package.
# Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("adayim/causalMed")- G-formula for total effects: Monte Carlo simulation under user-defined static or dynamic (threshold) interventions for binary, continuous, and survival outcomes
- Mediational g-formula: decomposes total effects into direct and indirect pathways with time-varying mediators and exposure-induced mediator–outcome confounders
- Two mediation estimands:
mediation_type = "I": interventional IDE/IIE (VanderWeele & Tchetgen Tchetgen 2017; Lin et al. 2017; Yamamuro et al. 2021) — cross-world mediator drawn as a joint M(1:T) trajectory by row-permuting the reference (a*) cohort, matching the SASmGFORMULAmacro and the algorithm of Yamamuro et al. 2021. Does not require cross-world independence. Multiple mediator models are supported under this estimand (Yamamuro 2021 §4); per-mediator IIE(M_k) is reported alongside the overall IDE/TE.mediation_type = "N": natural NDE/NIE (Zheng & van der Laan 2017) — conditional mediator distribution with exposure swapping; requires stronger assumptions (single mediator only)
- Flexible model specification: logistic regression (binary), linear regression (normal), multinomial logistic (categorical), and custom simulation functions
- Recode hooks (
init_recode,in_recode,out_recode) for lag creation, cumulative variables, and other within-loop transformations - Bootstrap confidence intervals: non-parametric bootstrap resampling individuals (preserving longitudinal structure), with percentile and normal-approximation CIs
- Parallel bootstrap: plug in
future::plan(multisession)on Windows orfuture::plan(multicore)on Unix to parallelise across bootstrap replicates
library(causalMed)
data("nonsurvivaldata", package = "causalMed")
# Manage lagged variables
init_rc <- recodes(lag1_A = 0, # At t=0, all lags initialised to 0
lag1_L1 = 0,
lag1_L2 = 0)
in_rc <- recodes(lag1_A = A, # At each subsequent step, copy current values
lag1_L1 = L1,
lag1_L2 = L2)
# Specify models in temporal order: L1 → L2 → A → Y
m_L1 <- spec_model(L1 ~ lag1_A + lag1_L1 + V + time,
var_type = "normal", mod_type = "covariate")
m_L2 <- spec_model(L2 ~ lag1_A + lag1_L2 + V + time,
var_type = "binary", mod_type = "covariate")
m_A <- spec_model(A ~ lag1_A + L1 + L2 + V + time,
var_type = "binary", mod_type = "exposure")
m_Y <- spec_model(Y_bin ~ A + L1 + L2,
var_type = "binary", mod_type = "outcome")
models_bin <- list(m_L1, m_L2, m_A, m_Y)
# Define intervention strategies
# NULL = natural course (draw exposure from its fitted model)
# 1 / 0 = always treat / never treat
ints <- list(natural = NULL, always_treat = 1, never_treat = 0)
# ── 3. Run g-formula ────────────────────────────────────────────────────────
fit_bin <- gformula(
data = nonsurvivaldata,
id_var = "id",
time_var = "time",
base_vars = "V",
exposure = "A",
models = models_bin,
intervention = ints,
ref_int = "natural",
init_recode = init_rc,
in_recode = in_rc,
mc_sample = 10000,
R = 100, # set R > 1 for bootstrap CIs; kept low here for speed
quiet = TRUE,
seed = 2025
)
#> Warning: package 'future' was built under R version 4.5.3
print(fit_bin)
#> Call:
#> gformula(data = nonsurvivaldata, id_var = "id", base_vars = "V",
#> exposure = "A", time_var = "time", models = models_bin, intervention = ints,
#> ref_int = "natural", init_recode = init_rc, in_recode = in_rc,
#> mc_sample = 10000, R = 100, quiet = TRUE, seed = 2025)
#>
#> --- Analysis setup ---
#> Exposure : A
#> Time variable: time
#> ID variable : id
#> Baseline vars: V
#> MC sample : 10000
#> Bootstrap R : 100
#> Seed : 2025
#> Reference : natural
#>
#> --- Mean outcome by intervention ---
#> Intervention Est Sd RD 2.5%(pct) RD 97.5%(pct) RD 2.5%(norm)
#> <fctr> <num> <num> <num> <num> <num>
#> 1: natural 0.1405 0.0067 0.1296 0.1547 0.1275
#> 2: always_treat 0.1508 0.0074 0.1380 0.1652 0.1362
#> 3: never_treat 0.0867 0.0139 0.0617 0.1137 0.0596
#> RD 97.5%(norm)
#> <num>
#> 1: 0.1536
#> 2: 0.1654
#> 3: 0.1139
#>
#> --- Contrasts vs. reference intervention ---
#> Intervention Risk_type Estimate Sd RD 2.5%(pct) RD 97.5%(pct)
#> <char> <char> <num> <num> <num> <num>
#> 1: always_treat - natural Difference 0.0103 0.0024 0.0054 0.0146
#> 2: always_treat / natural Ratio 1.0731 0.0170 1.0395 1.1024
#> 3: never_treat - natural Difference -0.0538 0.0132 -0.0780 -0.0258
#> 4: never_treat / natural Ratio 0.6171 0.0929 0.4514 0.8109
#> RD 2.5%(norm) RD 97.5%(norm)
#> <num> <num>
#> 1: 0.0055 0.0150
#> 2: 1.0398 1.1065
#> 3: -0.0797 -0.0279
#> 4: 0.4350 0.7992library(causalMed)
data("nonsurvivaldata", package = "causalMed")
# Model list must include a mediator model (mod_type = "mediator")
# List order must reflect the temporal DAG: A -> L -> M -> Y
init_med <- recodes(lag1_A = 0, lag1_L1 = 0, lag1_L2 = 0, lag1_M = 0)
in_med <- recodes(lag1_A = A, lag1_L1 = L1, lag1_L2 = L2, lag1_M = M)
models_med <- list(
spec_model(A ~ V + lag1_L1 + lag1_L2 + lag1_A + time,
var_type = "binary", mod_type = "exposure"),
spec_model(L1 ~ V + A + lag1_L1 + time,
var_type = "normal", mod_type = "covariate"),
spec_model(L2 ~ V + A + lag1_L2 + time,
var_type = "binary", mod_type = "covariate"),
spec_model(M ~ V + A + L1 + L2 + lag1_M + time,
var_type = "normal", mod_type = "mediator"), # <-- mediator
spec_model(Y_bin ~ V + A + M + L1 + L2,
var_type = "binary", mod_type = "outcome")
)
fit_med <- mediation(
data = nonsurvivaldata,
id_var = "id",
time_var = "time",
base_vars = "V",
exposure = "A",
outcome = "Y_bin",
models = models_med,
init_recode = init_med,
in_recode = in_med,
mediation_type = "I", # Interventional IDE/IIE
mc_sample = 10000,
R = 100,
quiet = TRUE,
seed = 2025
)
print(fit_med)
#> Call:
#> mediation(data = nonsurvivaldata, id_var = "id", base_vars = "V",
#> exposure = "A", outcome = "Y_bin", time_var = "time", models = models_med,
#> init_recode = init_med, in_recode = in_med, mc_sample = 10000,
#> mediation_type = "I", R = 100, quiet = TRUE, seed = 2025)
#>
#> --- Analysis setup ---
#> Exposure : A
#> Time variable: time
#> ID variable : id
#> Baseline vars: V
#> MC sample : 10000
#> Bootstrap R : 100
#> Seed : 2025
#> Mediation : Interventional effects (IDE/IIE) -- Lin et al. (2017)
#>
#> --- Marginal mean outcome per arm (Q-functionals) ---
#> Interventional arms draw mediators from independently-permuted pools (G):
#> Phi11 = E[Y(a=1, G1)]: exposure=1, mediators ~ a=1 pool [reference]
#> Phi10 = E[Y(a=1, G0)]: exposure=1, mediators ~ a=0 pool [cross-world]
#> Phi00 = E[Y(a=0, G0)]: exposure=0, mediators ~ a=0 pool [reference]
#> nat1/nat0 = E[Y(a=1)]/E[Y(a=0)]: natural course (used for the total effect)
#> Intervention Est Sd RD 2.5%(pct) RD 97.5%(pct) RD 2.5%(norm)
#> <char> <num> <num> <num> <num> <num>
#> 1: nat0 0.0847 0.0135 0.0607 0.1128 0.0582
#> 2: nat1 0.1519 0.0075 0.1383 0.1661 0.1371
#> 3: Phi00 0.0848 0.0135 0.0608 0.1126 0.0584
#> 4: Phi10 0.1456 0.0074 0.1329 0.1586 0.1312
#> 5: Phi11 0.1517 0.0075 0.1389 0.1662 0.1370
#> RD 97.5%(norm)
#> <num>
#> 1: 0.1112
#> 2: 0.1667
#> 3: 0.1112
#> 4: 0.1600
#> 5: 0.1664
#>
#> --- Effect decomposition ---
#> Direct effect (IDE) = Phi10 - Phi00
#> Indirect effect (IIE) = Phi11 - Phi10 (sequential per mediator when N>=2)
#> IDE + IIE = Phi11 - Phi00 (interventional overall effect)
#> Total effect (TE) = nat1 - nat0 (natural plug-in g-formula)
#> TE - (Direct+Indirect)= mediated-interaction residual (TE - overall)
#> Mediation Prop. = (Total - Direct) / Total (percentage; RR not applicable)
#> RD = risk difference; RR = risk ratio
#> Effect RD RR Sd RD 2.5%(pct)
#> <char> <num> <num> <num> <num>
#> 1: Indirect effect 0.0061 1.0421 0.0021 0.0025
#> 2: Direct effect 0.0608 1.7168 0.0150 0.0287
#> 3: Total effect 0.0672 1.7931 0.0153 0.0323
#> 4: TE - (Direct + Indirect) 0.0003 NA 0.0004 -0.0009
#> 5: Mediation Proportion 9.5299 NA 3.7089 4.0617
#> 6: Mediation Proportion (multiplicative) 9.1585 NA 3.7171 4.0309
#> RD 97.5%(pct) Sd_RR RR 2.5%(pct) RR 97.5%(pct) RD 2.5%(norm) RD 97.5%(norm)
#> <num> <num> <num> <num> <num> <num>
#> 1: 0.0108 0.0149 1.0178 1.0732 0.0020 0.0103
#> 2: 0.0856 0.2924 1.2632 2.3606 0.0313 0.0903
#> 3: 0.0916 0.3076 1.2961 2.4283 0.0371 0.0973
#> 4: 0.0007 NA NA NA -0.0005 0.0011
#> 5: 17.6448 NA NA NA 2.2606 16.7991
#> 6: 17.7527 NA NA NA 1.8732 16.4438
#> RR 2.5%(norm) RR 97.5%(norm)
#> <num> <num>
#> 1: 1.0129 1.0713
#> 2: 1.1437 2.2899
#> 3: 1.1903 2.3959
#> 4: NA NA
#> 5: NA NA
#> 6: NA NAThe estimate component of fit_med contains:
| Effect | Definition |
|---|---|
| Indirect effect | Q(1,1) − Q(1,0): effect through the mediator pathway |
| Direct effect | Q(1,0) − Q(0,0): effect not through the mediator |
| Total effect | natural plug-in g-formula contrast E[Y₁] − E[Y₀] |
| TE − (Direct + Indirect) | mediated-interaction residual (interventional only; exactly 0, and omitted, for natural effects) |
| Mediation proportion | (Total − Direct) / Total × 100% |
For mediation_type = "I", Q(a₁, a₂) is the mean outcome when exposure
is set to a₁ and each mediator is a stochastic draw from its
marginal distribution under exposure a₂. The direct and indirect effects
then sum to the interventional overall effect Q(1,1) − Q(0,0), which
generally differs from the total effect; their gap is the residual row.
For mediation_type = "N" the references use each subject’s own
mediator and the decomposition sums exactly to the total effect (no
residual row).
Under mediation_type = "I", the mediators in every arm — the
references Q(0,0) and Q(1,1) as well as the cross-world Q(1,0) — are
drawn as joint trajectories: a natural-course cohort is simulated
under each a*, and the arm assigns each subject the full M(1:T) of a
randomly permuted reference-cohort individual (each mediator permuted
independently). This matches the SAS mGFORMULA macro (Lin et al. 2017,
eAppendix) and Yamamuro et al. 2021 (Figure 3, step 3). Mediator values
are not survival-weighted; the full reference cohort is used. See
?mediation for details.
Change one argument:
fit_natural <- mediation(
..., # same arguments as above
mediation_type = "N" # natural NDE/NIE (Zheng & van der Laan 2017)
)Natural effects condition the mediator model on the individual’s own covariate history but evaluate it at the alternative exposure level (exposure swapping). They require stronger sequential no-unmeasured-confounding assumptions than interventional effects.
library(future)
plan(multisession) # Windows; use plan(multicore) on Unix/macOS
fit_parallel <- mediation(..., R = 500)
plan(sequential) # reset after useInput data must be in long format with one row per subject per time
point. The time variable should be an ordered integer starting at 0.
Variables used in models as lags (e.g., A_lag1) must be pre-computed
and present in the data, or created via init_recode / in_recode
hooks. The nonsurvivaldata bundled with the package illustrates the
required structure.
data("nonsurvivaldata", package = "causalMed")
head(nonsurvivaldata)
#> id time V L A M Y M_lag1 L_lag1 A_lag1
#> 1 1 0 1 -0.4557851 0 0.1847927 0 0 0.000000 0
#> 2 1 1 1 0.4651968 0 0.9559358 0 0 -0.455785 0
#> 3 1 2 1 0.2498987 1 1.3503895 0 1 0.465197 0-
Westreich, D., Cole, S. R., Young, J. G., et al. (2012). The parametric g-formula to estimate the effect of HAART on incident AIDS or death. Statistics in Medicine, 31, 2000–2009. doi:10.1002/sim.5316
-
McGrath, S., Lin, V., Zhang, Z., et al. (2020). gfoRmula: An R Package for Estimating the Effects of Sustained Treatment Strategies via the Parametric g-Formula. Patterns, 1, 100008. doi:10.1016/j.patter.2020.100008
-
Lin, S. H., Young, J. G., Logan, R., & VanderWeele, T. J. (2017). Mediation analysis for a survival outcome with time-varying exposures, mediators, and confounders. Statistics in Medicine, 36(26), 4153–4166. doi:10.1002/sim.7426
-
Zheng, W., & van der Laan, M. (2017). Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. Journal of Causal Inference, 5(2). doi:10.1515/jci-2016-0006
-
VanderWeele, T. J., & Tchetgen Tchetgen, E. J. (2017). Mediation analysis with time varying exposures and mediators. Journal of the Royal Statistical Society: Series B, 79(3), 917–938. doi:10.1111/rssb.12194
-
Yamamuro, S., Shinozaki, T., Iimuro, S., & Matsuyama, Y. (2021). Mediational g-formula for time-varying treatment and repeated-measured multiple mediators: Application to atorvastatin’s effect on cardiovascular disease via cholesterol lowering and anti-inflammatory actions in elderly type 2 diabetics. Statistical Methods in Medical Research, 30(8), 1782–1799. doi:10.1177/09622802211025988