Simulation-based inference for any model (clarify)

1. Input · what goes in

A fitted regression model (lm/glm/…) and the data used to fit it.

re78>0	treat	age	educ	married
1	1	37	11	1
0	0	22	9	0
1	1	30	12	1
0	0	45	14	0

2. Pipeline · the recipe

#1 Estimation · Fit any regression model

The core estimate — where the causal quantity itself is computed.

What happens here

Fit the substantive model — here a logit for the probability of positive earnings with a treatment-by-covariate interaction.

Formula

\hat\theta=\frac1S\sum_{s=1}^{S} g(\tilde\beta_s),\quad \tilde\beta_s\sim\mathcal N(\hat\beta,\widehat{\mathrm{Var}}(\hat\beta))

Reads from the input data · Feeds into #2

Key code

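# Install:  install.packages(c("clarify", "MatchIt"))
library(clarify)
# clarify does not attach lalonde; its documented examples load it from MatchIt
data("lalonde", package = "MatchIt")
# Logit for P(re78 > 0), with treat interacted with each covariate
fit <- glm(re78 > 0 ~ treat * (age + educ + married),
           data = lalonde, family = binomial)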

#2 Inference · Simulate the coefficient distribution

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

Draw 1000 coefficient vectors from the model's (approximately normal) sampling distribution.

Reads from #1 · Feeds into #3

Key code

sim_coefs <- sim(fit, n = 1000)
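Conceptually, this step is a multivariate normal draw centred on the fitted coefficients. The following is a minimal base-R sketch of that idea, not clarify's internal code: the toy data, `dat`, `fit_toy`, and `draws` are hypothetical names introduced here, and `MASS::mvrnorm()` stands in for whatever sampler the package uses.

```r
library(MASS)  # provides mvrnorm(); MASS ships with standard R installations

set.seed(123)

# Hypothetical toy data standing in for the real input (illustration only)
n <- 500
dat <- data.frame(treat = rbinom(n, 1, 0.5),
                  age   = round(runif(n, 18, 55)))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.4 * dat$treat + 0.02 * dat$age))

# Logit with a treatment-by-covariate interaction, as in step #1
fit_toy <- glm(y ~ treat * age, data = dat, family = binomial)

# Draw 1000 coefficient vectors from N(beta_hat, Var_hat(beta_hat)):
# one row per draw, one column per model coefficient
draws <- mvrnorm(n = 1000, mu = coef(fit_toy), Sigma = vcov(fit_toy))

dim(draws)       # 1000 rows, 4 columns
colnames(draws)  # same names as coef(fit_toy)
```

Each row of `draws` plays the role of one simulated coefficient vector in the formula above.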

#3 Reporting · Average marginal effect of treatment

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

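Push every draw through the prediction function twice, once with treat set to 0 and once with treat set to 1 for every unit, and average the predictions over units. This gives E[Y(0)] and E[Y(1)] with percentile intervals; the AME of treatment is their difference. As written, the call below reports only the two average potential outcomes; per the sim_ame() documentation, adding contrast = "diff" also reports the difference.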

Reads from #2 · Feeds into the final output

Key code

est <- sim_ame(sim_coefs, var = "treat", verbose = FALSE)
summary(est)
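The same calculation can be written out by hand to show what happens to each draw. This is a rough base-R sketch under stated assumptions, not the package's implementation: the toy data and the names `dat`, `fit_toy`, `draws`, `X0`, `X1`, and `g` are hypothetical, and the point estimate is taken at the fitted coefficients.

```r
library(MASS)  # provides mvrnorm()

set.seed(123)

# Hypothetical toy data and model (illustration only)
n <- 500
dat <- data.frame(treat = rbinom(n, 1, 0.5),
                  age   = round(runif(n, 18, 55)))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.4 * dat$treat + 0.02 * dat$age))
fit_toy <- glm(y ~ treat * age, data = dat, family = binomial)
draws <- mvrnorm(n = 1000, mu = coef(fit_toy), Sigma = vcov(fit_toy))

# Design matrices with every unit set to control (0) and to treated (1)
rhs <- delete.response(terms(fit_toy))
X0 <- model.matrix(rhs, data = transform(dat, treat = 0))
X1 <- model.matrix(rhs, data = transform(dat, treat = 1))

# Quantity of interest for one coefficient vector:
# average predicted probability under each treatment level
g <- function(beta) {
  c(EY0 = mean(plogis(X0 %*% beta)),
    EY1 = mean(plogis(X1 %*% beta)))
}

# Apply g to every draw: 1000 x 2 matrix of simulated E[Y(0)], E[Y(1)]
sims <- t(apply(draws, 1, g))
ame_sims <- sims[, "EY1"] - sims[, "EY0"]

# Point estimate at the fitted coefficients, interval from the draws
est_hat <- g(coef(fit_toy))
ame_hat <- unname(est_hat["EY1"] - est_hat["EY0"])
ame_ci  <- quantile(ame_sims, probs = c(0.025, 0.975))

ame_hat
ame_ci
```

Because the toy data are simulated, the numbers differ from the example output on this page; only the structure of the calculation carries over.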

3. Output · what you get

Fig 1. Simulated E[Y(0)] vs E[Y(1)] with percentile confidence intervals from 1000 coefficient draws.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers

\hat\theta=\frac1S\sum_{s=1}^{S} g(\tilde\beta_s),\quad \tilde\beta_s\sim\mathcal N(\hat\beta,\widehat{\mathrm{Var}}(\hat\beta))

⚠️ Unofficial community showcase of clarify (docs). Not affiliated with the authors — all credit to Noah Greifer, Steven Worthington, Stefano Iacus & Gary King; this summarizes public documentation.

What it does. clarify is the R successor to Zelig and takes its name from Clarify, the Stata package by Tomz, Wittenberg & King; all three implement the simulation approach of King, Tomz & Wittenberg (2000). After you fit a supported regression model, it post-processes the model into substantively meaningful quantities (expected values, average marginal effects, first differences) with uncertainty from parameter simulation instead of the delta method.

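How it works. sim() draws many coefficient vectors from the model's sampling distribution; sim_ame() and sim_setx() compute the quantity of interest for each draw (sim_ame() also averages the predictions over the units in the data), giving a simulation distribution for that quantity; summary() reports an estimate and a percentile interval. As documented, the reported estimate is the quantity evaluated at the fitted coefficients, while the simulation draws supply the interval. This sidesteps delta-method approximations, which can be poor for nonlinear quantities.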
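For the average marginal effect in this example, the function g applied to each coefficient vector can be written out as below. This is a sketch assuming the logit link from step #1; the notation x_i(t), meaning unit i's model-matrix row with treat set to t, and the symbol n for the number of units are introduced here for illustration.

g(\beta)=\frac1n\sum_{i=1}^{n}\left[\operatorname{expit}\{x_i(1)^\top\beta\}-\operatorname{expit}\{x_i(0)^\top\beta\}\right],\quad \operatorname{expit}(u)=\frac{1}{1+e^{-u}}

The two averages inside the difference are the quantities labelled E[Y(1)] and E[Y(0)] in the output.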

Assumptions. The model is correctly specified and its coefficient sampling distribution is approximately multivariate normal (or you supply a bootstrap).

What you get. Simulation-based point estimates and percentile confidence intervals for predictions and average marginal effects.

Example output

         Estimate 2.5 % 97.5 %
E[Y(0)]    0.7666 0.683  0.847
E[Y(1)]    0.7799 0.677  0.879

Using 1000 simulated values per estimate.
