Workflow · 3 steps

Causal forests for heterogeneous effects (grf)

Summary by StatsOtter

Generalized random forests that estimate conditional average treatment effects τ(x) non-parametrically, with valid confidence intervals.

1

Input · what goes in

A covariate matrix X, a treatment vector W, and an outcome vector Y.

Data format & example

Y     W    X1     X2
1.2   1    0.4   -0.7
0.3   0   -1.1    0.2
2.1   1    0.9    1.3
0.1   0    0.0   -0.4

2

Pipeline · the recipe


1
Data prep

Assemble X, W, Y

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Provide covariates, a treatment indicator, and an outcome (observational or experimental).

Reads from: the input data · Feeds into: #2
Key code
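# Install once:  install.packages("grf")
library(grf)

n <- 2000; p <- 10                            # sample size, number of covariates
X <- matrix(rnorm(n * p), n, p)               # covariate matrix (n x p)
W <- rbinom(n, 1, 0.5)                        # randomized binary treatment
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)  # simulated outcome; true effect is max(X1, 0)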
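When the data arrive as a table in the Input layout rather than as simulated vectors, the same three objects can be pulled out of a data frame. A minimal sketch in base R, assuming a data frame named df (a hypothetical name) with columns Y, W, X1 and X2; the values are the four example rows from the Input section:

```r
# Hypothetical data frame in the Input layout.
df <- data.frame(
  Y  = c(1.2, 0.3, 2.1, 0.1),
  W  = c(1, 0, 1, 0),
  X1 = c(0.4, -1.1, 0.9, 0.0),
  X2 = c(-0.7, 0.2, 1.3, -0.4)
)

# Covariates go into a numeric matrix; treatment and outcome stay plain vectors.
X <- as.matrix(df[, c("X1", "X2")])
W <- df$W
Y <- df$Y

# Conservative sanity checks before fitting anything.
stopifnot(nrow(X) == length(W), length(W) == length(Y))
stopifnot(all(W %in% c(0, 1)), !anyNA(W), !anyNA(Y))
```

Four rows are far too few to fit a forest; the point is only the shape of X, W and Y.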

2
Estimation

Fit a causal forest

The core estimate — where the causal quantity itself is computed.

What happens here

causal_forest grows honest, heterogeneity-seeking trees and predicts each unit's conditional effect τ̂(x).

Formula
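\tau(x)=\mathbb E[\,Y(1)-Y(0)\mid X=x\,],\qquad \hat\tau(x)=\frac{\sum_i \alpha_i(x)\,\big(Y_i-\hat\mu(X_i)\big)\,\big(W_i-\hat e(X_i)\big)}{\sum_i \alpha_i(x)\,\big(W_i-\hat e(X_i)\big)^2}
Here \alpha_i(x) are the forest weights, \hat\mu(x) estimates \mathbb E[Y\mid X=x], and \hat e(x) estimates \mathbb E[W\mid X=x].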
Reads from: #1 · Feeds into: #3
Key code
cf <- causal_forest(X, Y, W)          # fit the forest; argument order is X, Y, W
tau.hat <- predict(cf)$predictions    # out-of-bag CATE estimates, one per training unit
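To attach uncertainty to the per-unit estimates, predict() can also return variance estimates. A sketch, assuming the estimate.variance argument of grf's predict method and a normal approximation for pointwise 95% intervals; the smaller n, p and num.trees are chosen here only to keep the example quick:

```r
library(grf)
set.seed(1)

n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W, num.trees = 1000)

# Out-of-bag CATE predictions together with their estimated variances.
pred    <- predict(cf, estimate.variance = TRUE)
tau.hat <- pred$predictions
se.hat  <- sqrt(pred$variance.estimates)

# Pointwise 95% intervals under the normal approximation.
ci <- cbind(lower = tau.hat - qnorm(0.975) * se.hat,
            upper = tau.hat + qnorm(0.975) * se.hat)
head(ci)
```

These intervals are pointwise: each one covers its own τ(x), not all units simultaneously.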

3
Inference

Aggregate to the ATE

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

average_treatment_effect aggregates the forest to a doubly-robust ATE with a standard error.

Reads from: #2 · Feeds into: the final output
Key code
# ATE over the full sample; returns a named vector with estimate and std.err
average_treatment_effect(cf, target.sample = "all")
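Because the result is a named numeric vector with elements estimate and std.err, a Wald-type interval takes one more line. A sketch on freshly simulated data (smaller than the main example, for speed), which also shows the "treated" value of target.sample as an alternative estimand:

```r
library(grf)
set.seed(1)

n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W, num.trees = 1000)

# Average treatment effect over the whole sample.
ate  <- average_treatment_effect(cf, target.sample = "all")

# 95% interval from the estimate and its standard error (normal approximation).
ci95 <- ate[["estimate"]] + c(-1, 1) * qnorm(0.975) * ate[["std.err"]]

# Average effect among treated units only.
att <- average_treatment_effect(cf, target.sample = "treated")

print(ate); print(ci95); print(att)
```

With a randomized treatment, as in this simulation, the two estimands should be close; they can differ when treatment assignment depends on X.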

3

Output · what you get

Fig 1. Distribution of causal-forest CATE estimates τ̂(x) across units. Spread away from the ATE suggests effect heterogeneity, although some of the spread reflects estimation noise.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers


⚠️ Unofficial community showcase of grf (docs). Not affiliated with the authors — all credit to Susan Athey, Julie Tibshirani, Stefan Wager & Erik Sverdrup; this summarizes public documentation.

What it does. grf estimates heterogeneous treatment effects, that is, how the effect τ(x) varies with covariates, using forests of honest trees. The resulting estimates are asymptotically normal, which supports valid confidence intervals. causal_forest() is the workhorse; average_treatment_effect() aggregates to the ATE.

How it works. It grows many honest trees whose splits are chosen to maximize heterogeneity in the treatment effect (a gradient-based generalization of random forests), then uses the forest as a set of weights to solve a local moment equation at each x. Out-of-bag predictions give each training unit's τ̂(x); the forest can also return variance estimates for inference.
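The weighted local estimator can be illustrated without a forest at all. The sketch below is not grf's algorithm: a Gaussian kernel in X1 stands in for the forest weights α_i(x), and simple parametric fits stand in for the nuisance estimates (both are assumptions made for illustration). What it keeps is the final step, a weighted residual-on-residual regression of outcome residuals on treatment residuals:

```r
set.seed(1)
n <- 2000
X <- matrix(rnorm(n * 2), n, 2)
W <- rbinom(n, 1, 0.5)
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

# Nuisance estimates (stand-ins for the regression forests a causal forest would fit).
e.hat <- rep(mean(W), n)       # propensity; treatment is randomized here
m.hat <- fitted(lm(Y ~ X))     # rough linear estimate of E[Y | X]

# local_tau is a hypothetical helper: a kernel-weighted residual-on-residual estimate.
local_tau <- function(x1, bandwidth = 0.3) {
  alpha <- dnorm((X[, 1] - x1) / bandwidth)  # stand-in for forest weights alpha_i(x)
  alpha <- alpha / sum(alpha)
  y.res <- Y - m.hat                         # outcome residuals
  w.res <- W - e.hat                         # treatment residuals
  sum(alpha * y.res * w.res) / sum(alpha * w.res^2)
}

local_tau(-1)  # true effect at X1 = -1 is 0 in this simulation
local_tau(1)   # true effect at X1 =  1 is 1 in this simulation
```

A causal forest replaces the fixed kernel with data-adaptive weights, roughly how often unit i falls in the same leaf as x, so the neighborhood adapts to where the effect actually changes.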

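Assumptions. Unconfoundedness (treatment is as good as random conditional on X) and overlap (every unit has a positive probability of receiving either treatment); honesty (separate subsamples for choosing splits and for estimating leaf effects) underpins the valid intervals.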
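Overlap is the one assumption that can be inspected directly. A minimal base-R sketch, assuming a logistic propensity model and a 0.05 to 0.95 band as a rule-of-thumb threshold (both are illustrative choices, not part of grf); a fitted causal forest also keeps its own propensity estimates in cf$W.hat, which could be examined the same way:

```r
set.seed(1)
n <- 2000
X <- matrix(rnorm(n * 2), n, 2)
W <- rbinom(n, 1, plogis(0.5 * X[, 1]))  # treatment probability depends on X1 (hypothetical)

# Estimated propensity scores from a logistic regression.
e.hat <- fitted(glm(W ~ X, family = binomial()))

# Propensities piling up near 0 or 1 would signal poor overlap.
print(summary(e.hat))
hist(e.hat, breaks = 30, main = "Estimated propensity scores", xlab = "e.hat")

# Share of units outside the rule-of-thumb band.
share.extreme <- mean(e.hat < 0.05 | e.hat > 0.95)
print(share.extreme)
```

Unconfoundedness, by contrast, cannot be checked from the data and has to be argued from the study design.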

This forest-based approach builds on Susan Athey and Guido Imbens's earlier work on causal trees, part of a wider program on machine learning for causal inference.

What you get. A per-unit CATE estimate τ̂(x), plus the aggregated ATE with a standard error.

Example output

  estimate    std.err 
 0.4982731  0.0461145 

