Workflow·4 steps

Bayesian principal stratification (PStrata)

Summary by StatsOtter

Bayesian mixture modeling for principal stratification: causal effects within latent strata (e.g. compliers) under post-treatment confounding.

1

Input · what goes in

One row per unit: treatment Z, a post-treatment intermediate D, outcome Y, and any covariates.

Z  D  Y
1  1   2.31
0  0  -0.4
1  0   0.88
0  0   1.05
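
A minimal base-R sketch of how data in this format could be simulated, assuming a randomized treatment, monotonicity, and the exclusion restriction. The stratum shares (0.3, 0.4, 0.3), the complier effect of 1, and the name `toy` are illustrative assumptions and are not taken from the package.

```r
# Illustrative simulation; every numeric setting below is an assumption.
set.seed(1)
n <- 1000

# Latent principal stratum: never-taker (n), complier (c), always-taker (a)
S <- sample(c("n", "c", "a"), size = n, replace = TRUE, prob = c(0.3, 0.4, 0.3))

# Randomized binary treatment
Z <- rbinom(n, size = 1, prob = 0.5)

# Observed intermediate D = D(Z): always-takers take, never-takers do not,
# compliers follow their assignment
D <- ifelse(S == "a", 1L, ifelse(S == "c", Z, 0L))

# Outcome: only compliers respond to Z (exclusion restriction for n and a)
Y <- rnorm(n, mean = ifelse(S == "c", 1 * Z, 0), sd = 1)

toy <- data.frame(Z = Z, D = D, Y = Y)  # S is latent, so it is not stored
head(toy)
```

The stratum label S is deliberately left out of `toy`: in real data it is never observed, which is why the analysis needs a mixture model.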

2

Pipeline · the recipe


1
Data prep

Specify the principal stratification model

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

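PStrataModel() declares each principal stratum by its pair of potential intermediate values D(0)D(1): never-takers '00', compliers '01', always-takers '11'. Leaving out defiers ('10') imposes monotonicity. The same call sets the outcome family, the priors, and which strata obey the exclusion restriction.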

Formula
Y_i(1)=Y_i(0)\ \text{for}\ S_i\in\{n,a\}\quad(\text{exclusion restriction})
Reads from the input data · Feeds into #2
Key code
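# Install:  install.packages("PStrata")
library(PStrata)

model <- PStrataModel(
  S.formula = Z + D ~ 1,                        # treatment + intermediate; intercept-only stratum model
  Y.formula = Y ~ 1,                            # intercept-only outcome model
  Y.family  = gaussian(link = "identity"),      # continuous outcome
  strata    = c(n = "00", c = "01", a = "11"),  # D(0)D(1) codes; no "10", so no defiers
  ER        = c("n", "a"),                      # exclusion restriction for never- and always-takers
  prior_intercept = prior_normal(0, 1),         # prior on the intercepts
  prior_sigma     = prior_inv_gamma(1)          # prior on the Gaussian outcome's scale
)
summary(model)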
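
To make the strata codes concrete, here is a small base-R sketch that lists which strata are compatible with an observed (Z, D) cell. `compatible_strata` is a hypothetical helper written for illustration, not a PStrata function; it assumes each code is read as D(0) followed by D(1).

```r
# Hypothetical helper (not part of PStrata): which strata could have produced
# an observed treatment z and intermediate d?
compatible_strata <- function(z, d, strata = c(n = "00", c = "01", a = "11")) {
  # Character z + 1 of each code is the potential value D(z)
  d_under_z <- as.integer(substr(strata, z + 1, z + 1))
  names(strata)[d_under_z == d]
}

compatible_strata(0, 0)  # never-takers and compliers
compatible_strata(1, 1)  # compliers and always-takers
compatible_strata(0, 1)  # always-takers only
compatible_strata(1, 0)  # never-takers only
```

Two of the four observed cells mix two strata, so stratum membership is only partly revealed by the data and has to be inferred.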

2
Estimation

Fit the model with MCMC

The core estimate — where the causal quantity itself is computed.

What happens here

fit() compiles the model to Stan and samples the posterior over stratum membership probabilities and outcome parameters across multiple chains.

Formula
\pi_s(X_i)=\Pr(S_i=s\mid X_i),\quad \sum_{s}\pi_s(X_i)=1
Reads from #1 · Feeds into #3
Key code
ps_fit <- fit(model, data = sim_data_normal,
              chains = 4, warmup = 500, iter = 1000)
ps_fit
diagnostics(ps_fit)
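
For orientation, the observed-data likelihood that such a sampler targets can be sketched as follows. This is the generic principal-stratification mixture likelihood, not a transcription of the package's Stan code; the symbols \mathcal S(z,d) (strata compatible with an observed cell), f (outcome density), and \theta (all model parameters) are notation introduced here.

```latex
L(\theta)=\prod_{i=1}^{n}\ \sum_{s\in\mathcal S(Z_i,D_i)}
  \Pr(S_i=s\mid X_i;\theta)\; f\big(Y_i\mid S_i=s,\,Z_i,\,X_i;\theta\big),
\qquad
\mathcal S(z,d)=\{\,s : D_s(z)=d\,\}
```

Here D_s(z) is the value of the intermediate that stratum s would show under treatment z. The inner sum is what makes this a mixture model: each unit contributes a weighted sum over every stratum it could belong to.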

3
Inference

Estimate stratum-specific potential outcomes

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

estimate() summarizes the posterior of the mean potential outcome E[Y(z) | S = s] for each principal stratum s and treatment arm z; summary(est, "data.frame") returns the result as a data frame.

Formula
\mathbb E[Y_i(z)\mid S_i=s]
Reads from #2 · Feeds into #4
Key code
est <- estimate(ps_fit)
summary(est, "data.frame")
plot(est)

4
Inference

Contrast treatment effects within strata

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

contrast() with Z = TRUE takes the difference between treatment arms within each stratum, E[Y(1) - Y(0) | S = s], and reports posterior summaries for it. For compliers this difference is the complier average causal effect (CACE).

Formula
\mathrm{CACE}=\mathbb E[Y_i(1)-Y_i(0)\mid S_i=c]
Reads from #3 · Feeds into the final output
Key code
ctr <- contrast(ps_fit, Z = TRUE)
summary(ctr, "data.frame")
plot(ctr)
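
Conceptually, a posterior contrast is formed draw by draw and only then summarized. The base-R sketch below illustrates that idea with simulated draws; the vectors `mu_c0` and `mu_c1` are hypothetical stand-ins for posterior draws of the complier means, and nothing here reproduces the package's internals.

```r
# Hypothetical posterior draws of the complier mean potential outcomes;
# the values are simulated purely for illustration.
set.seed(3)
mu_c0 <- rnorm(2000, mean = 0.0, sd = 0.10)  # draws of E[Y(0) | S = c]
mu_c1 <- rnorm(2000, mean = 1.0, sd = 0.10)  # draws of E[Y(1) | S = c]

# Contrast within each draw, then summarize the resulting draws
cace_draws <- mu_c1 - mu_c0
cace_summary <- c(
  mean = mean(cace_draws),
  sd   = sd(cace_draws),
  quantile(cace_draws, probs = c(0.025, 0.5, 0.975))
)
cace_summary
```

Differencing within draws keeps any posterior correlation between the two arms, which subtracting two separately computed intervals would ignore.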

3

Output · what you get

Fig 1. Posterior distribution of the complier average causal effect (mean ≈ 1.04, 95% credible interval shaded).

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers

\tau_s=\mathbb E\big[Y(1)-Y(0)\,\big|\,S=s\big],\qquad S=\big(D(0),\,D(1)\big)

⚠️ Unofficial community showcase of PStrata. Not affiliated with the authors; all credit to Fan Li & coauthors. This summarizes public documentation.

What it does. PStrata estimates principal causal effects — effects defined within latent principal strata such as compliers, always-takers and never-takers — when an intermediate (post-treatment) variable confounds the treatment-outcome relationship. It handles continuous, binary, count, and time-to-event outcomes (Liu & Li, 2023).

How it works. Units are modeled as a finite mixture over principal strata, which are defined by the joint potential values of the intermediate variable. A Bayesian model (compiled to Stan) jointly estimates the stratum membership probabilities and the outcome model within each stratum via MCMC. The workflow has four calls: PStrataModel() specifies the strata, the exclusion-restriction (ER) and monotonicity assumptions, and the priors; fit() runs MCMC; estimate() and contrast() then summarize stratum-specific potential outcomes and effects. Users can toggle assumptions (e.g. drop ER) to probe sensitivity.
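
The mixture idea can be illustrated with Bayes' rule for a single unit. The sketch below computes the probability that a unit observed with Z = 0, D = 0 is a never-taker or a complier, given its outcome. All parameter values (`pi_s`, `mu_z0`, `sigma`) are hypothetical and chosen only for illustration; in a real fit they would be posterior draws, and this calculation would be repeated for each draw.

```r
# Hypothetical parameter values for illustration (not estimates from any data)
pi_s  <- c(n = 0.3, c = 0.4, a = 0.3)  # stratum proportions
mu_z0 <- c(n = 0.0, c = 0.5)           # mean of Y under Z = 0 for n and c
sigma <- 1                             # common outcome standard deviation

# A unit with Z = 0 and D = 0 can only be a never-taker or a complier.
# Weight each candidate stratum by prior share times outcome density.
membership_z0_d0 <- function(y) {
  w <- pi_s[c("n", "c")] * dnorm(y, mean = mu_z0, sd = sigma)
  w / sum(w)
}

membership_z0_d0(1.05)
```

With these assumed values, an outcome closer to the complier mean shifts probability toward the complier stratum, which is how the outcome model helps separate strata that share an observed (Z, D) cell.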

Assumptions. SUTVA, ignorable treatment assignment, plus user-chosen structural assumptions (monotonicity, exclusion restriction) and outcome-model/prior specification; identification is driven by the mixture model.

What you get. Posterior stratum membership probabilities and stratum-specific causal contrasts (e.g. the complier average causal effect, also called the LATE) with credible intervals.

Example output

# Posterior summary of stratum effects (contrast Z=1 vs Z=0)

  stratum    mean      sd    2.5%  median   97.5%  Rhat
1       n  0.0000   0.000   0.000  0.0000  0.0000  1.00
2       c  1.0382  0.1471   0.751  1.0376  1.3271  1.00
3       a  0.0000   0.000   0.000  0.0000  0.0000  1.00

# Stratum proportions
  stratum    mean     sd    2.5%   97.5%
1       n  0.3013  0.0205  0.262   0.342
2       c  0.4021  0.0231  0.357   0.448
3       a  0.2966  0.0198  0.259   0.336
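
One rough way to sanity-check a complier effect like the one reported above is the moment-based (Wald) ratio of the effect of Z on Y to the effect of Z on D. It equals the CACE only under randomized Z, monotonicity, and the exclusion restriction. `wald_cace` is a hypothetical helper, not a PStrata function, and the data are simulated with an assumed true complier effect of 1.

```r
# Hypothetical helper (not a PStrata function): moment-based CACE estimate.
# Valid only under randomized Z, monotonicity, and the exclusion restriction.
wald_cace <- function(data) {
  itt_y <- mean(data$Y[data$Z == 1]) - mean(data$Y[data$Z == 0])  # effect of Z on Y
  itt_d <- mean(data$D[data$Z == 1]) - mean(data$D[data$Z == 0])  # estimated complier share
  itt_y / itt_d
}

# Simulated check data; the true complier effect is 1 by construction
set.seed(4)
n <- 20000
S <- sample(c("n", "c", "a"), n, replace = TRUE, prob = c(0.3, 0.4, 0.3))
Z <- rbinom(n, 1, 0.5)
D <- ifelse(S == "a", 1L, ifelse(S == "c", Z, 0L))
Y <- rnorm(n, mean = ifelse(S == "c", Z, 0))

wald_est <- wald_cace(data.frame(Z = Z, D = D, Y = Y))
wald_est
```

A large gap between this ratio and the posterior mean for the complier stratum would be a prompt to revisit the outcome model or the priors, not evidence about which estimate is right.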
