StatsOtter Causal inference workflows
Workflow·3 steps

Double machine learning in Python (econml)

Summary by StatsOtter

A Python toolkit for heterogeneous treatment effects from observational data — double ML, doubly-robust, orthogonal forests, meta-learners.

1

Input · what goes in

Outcome Y, treatment T, effect-modifier features X, and optional controls W (arrays or DataFrames). The example table has no separate W column.

Y     T    X1     X2
1.0   1    0.5   -0.3
0.2   0   -0.8    0.1
1.7   1    0.9    1.2
0.0   0    0.1   -0.6
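
As a minimal sketch of the expected shapes, the four example rows can be split into the arrays an estimator would receive. The column order (Y, T, X1, X2) is taken from the table; treating W as absent is an assumption of this toy example.

```python
import numpy as np

# Rows copied from the example table: columns are Y, T, X1, X2.
data = np.array([
    [1.0, 1, 0.5, -0.3],
    [0.2, 0, -0.8, 0.1],
    [1.7, 1, 0.9, 1.2],
    [0.0, 0, 0.1, -0.6],
])

Y = data[:, 0]              # outcome, shape (n,)
T = data[:, 1].astype(int)  # binary treatment, shape (n,)
X = data[:, 2:]             # effect modifiers X1 and X2, shape (n, 2)
W = None                    # assumption: no separate controls in this table

print(Y.shape, T.shape, X.shape)
```

Four rows are far too few to estimate anything; the point is only the layout of one row per unit and one column per variable.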
2

Pipeline · the recipe


1
Data prep

Build the data

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

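Provide an outcome Y, a discrete (here binary) treatment T, and effect-modifier features X. The code simulates them: treatment is randomized, and the true effect is 1 when X[:, 0] > 0 and 0 otherwise, so the true average effect is about 0.5.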

Reads from the input data · Feeds into #2
Key code
# Install:  pip install econml
import numpy as np
from econml.dml import LinearDML
from sklearn.ensemble import GradientBoostingRegressor

n = 1000
X = np.random.normal(size=(n, 5))        # effect-modifier features
T = np.random.binomial(1, 0.5, size=n)   # randomized binary treatment
# True effect is 1 when X[:, 0] > 0 and 0 otherwise; X[:, 1] shifts the baseline
Y = (X[:, 0] > 0) * T + X[:, 1] + np.random.normal(size=n)

2
Estimation

Fit a Linear Double-ML model

The core estimate — where the causal quantity itself is computed.

What happens here

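LinearDML fits cross-fitted ML models for Y and T given X (and W), subtracts their predictions to form residuals, and regresses the outcome residuals on the treatment residuals to recover an effect that is linear in X.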

Formula
\tau(x)=\mathbb E[Y(1)-Y(0)\mid X=x];\quad \tilde Y=\tilde T\,\theta(x)+\varepsilon\ \ (\text{double ML residuals})
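
The residuals in the formula can be written out explicitly. The block below is a sketch of the standard partialling-out setup rather than a transcription of econml's internals; the hats denote cross-fitted ML predictions, and the feature map \phi is an assumed notation for whatever featurization of X is used.

```latex
\begin{align*}
\tilde Y &= Y - \hat{\mathbb E}[Y \mid X, W], &
\tilde T &= T - \hat{\mathbb E}[T \mid X, W], \\
\hat\theta &= \arg\min_{\theta}\ \frac{1}{n}\sum_{i=1}^{n}
  \bigl(\tilde Y_i - \theta(X_i)\,\tilde T_i\bigr)^2, &
\theta(x) &= \langle \beta, \phi(x) \rangle .
\end{align*}
```

For a binary treatment under unconfoundedness, Y = T\,\tau(X) + \mathbb E[Y(0)\mid X] + \varepsilon holds exactly, so \theta(x) in the residual equation coincides with \tau(x).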
Reads from #1 · Feeds into #3
Key code
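from sklearn.ensemble import GradientBoostingClassifier

# With discrete_treatment=True the treatment model must be a classifier
est = LinearDML(model_y=GradientBoostingRegressor(),
                model_t=GradientBoostingClassifier(),
                discrete_treatment=True)
est.fit(Y, T, X=X)  # no separate controls W in this example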

3
Inference

Read the average effect

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

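const_marginal_ate(X) averages the estimated CATE over the rows of X, giving a point estimate of the average treatment effect. It returns no standard error or interval on its own; econml exposes interval variants such as const_marginal_ate_interval for that.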

Reads from #2 · Feeds into the final output
Key code
print(est.const_marginal_ate(X))
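
To make the uncertainty part of this step concrete, here is a minimal numpy-only sketch of a standard error and 95% interval for a constant effect from a residual-on-residual regression. It is not econml's internal computation: it uses OLS nuisance models fitted in-sample (no cross-fitting) for brevity, a fixed seed, and the same simulated design as step 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
T = rng.binomial(1, 0.5, size=n)
Y = (X[:, 0] > 0) * T + X[:, 1] + rng.normal(size=n)

# Residualize Y and T on X with OLS (assumption: linear nuisance models).
A = np.column_stack([np.ones(n), X])
Y_res = Y - A @ np.linalg.lstsq(A, Y, rcond=None)[0]
T_res = T - A @ np.linalg.lstsq(A, T, rcond=None)[0]

# Constant-effect final stage: slope of Y_res on T_res.
theta = (T_res @ Y_res) / (T_res @ T_res)

# Heteroskedasticity-robust (HC0) standard error for that slope.
eps = Y_res - theta * T_res
se = np.sqrt(np.sum((T_res * eps) ** 2)) / (T_res @ T_res)

# Normal-approximation 95% confidence interval.
ci = (theta - 1.96 * se, theta + 1.96 * se)
print(f"ATE estimate {theta:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because treatment is randomized in this design, the slope targets the average effect of about 0.5; the interval width shrinks at the usual 1/sqrt(n) rate.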

3

Output · what you get

Fig 1. Double-ML CATE estimates across units from econml's LinearDML, centered on the average treatment effect.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers


⚠️ Unofficial community showcase of econml (docs). Not affiliated with the authors — all credit to Microsoft Research / PyWhy (Lewis, Syrgkanis & collaborators); this summarizes public documentation.

What it does. econml brings the Athey–Imbens-style econometrics-meets-ML estimators to Python: it estimates conditional average treatment effects with machine-learning nuisance models while keeping valid inference on the causal parameter.

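How it works. Estimators like LinearDML fit flexible models for the outcome and the treatment, subtract their predictions from both (Neyman-orthogonal / double ML partialling-out), and regress the outcome residuals on the treatment residuals to recover the effect, optionally as a function of effect-modifiers X for a CATE. Cross-fitting, which predicts each unit's nuisance values from models trained on other folds, guards against overfitting bias.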
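
The partialling-out and cross-fitting steps described above can be sketched without econml. This is an illustration under stated assumptions, not the library's implementation: the nuisance models are plain OLS instead of gradient boosting, the data are simulated with a constant true effect of 2.0 and confounding through X[:, 0], and the helper `ols_fit_predict` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
# Treatment probability depends on X[:, 0], which also drives the outcome.
p = 1.0 / (1.0 + np.exp(-X[:, 0]))
T = rng.binomial(1, p)
Y = 2.0 * T + 1.5 * X[:, 0] - X[:, 1] + rng.normal(size=n)

def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef = np.linalg.lstsq(A_train, y_train, rcond=None)[0]
    return A_test @ coef

# Cross-fitting: each unit's nuisance predictions come from a model
# trained on the other fold.
folds = np.array_split(rng.permutation(n), 2)
Y_res = np.empty(n)
T_res = np.empty(n)
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    Y_res[test_idx] = Y[test_idx] - ols_fit_predict(X[train_idx], Y[train_idx], X[test_idx])
    T_res[test_idx] = T[test_idx] - ols_fit_predict(X[train_idx], T[train_idx], X[test_idx])

# Final stage: regress outcome residuals on treatment residuals.
theta_hat = (T_res @ Y_res) / (T_res @ T_res)
print(round(theta_hat, 2))
```

A naive difference in means between treated and untreated units would be biased here because X[:, 0] raises both the treatment probability and the outcome; residualizing both sides on X removes that shared component.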

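Assumptions. Unconfoundedness given X and the controls W, plus overlap (every unit has a positive probability of each treatment level). Orthogonalization makes the estimate insensitive, to first order, to errors in the first-stage ML models.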
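
For a binary treatment, the two identification assumptions are usually stated as follows. This is the textbook formulation, written here with the same symbols as the formula in step 2.

```latex
\begin{align*}
\{Y(0),\, Y(1)\} &\perp\!\!\!\perp T \mid X, W && \text{(unconfoundedness)} \\
0 < \Pr(T = 1 \mid X, W) &< 1 && \text{(overlap)}
\end{align*}
```

Neither condition can be verified from the data alone, although overlap can be probed by inspecting the estimated treatment probabilities for values near 0 or 1.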

econml packages the double-ML and heterogeneous-effects methods from the causal-ML agenda Imbens helped build; it is authored by the PyWhy/Microsoft team.

What you get — A CATE model you can evaluate at any X, and the average treatment effect implied by it.
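
To illustrate what "a CATE model you can evaluate at any X" means, the sketch below fits a linear-in-X effect by hand and evaluates it at new points. It is a numpy-only approximation of the idea, not econml's code: the simulated true effect 1 + 0.5 * x1, the in-sample OLS nuisance fits, and the helper `cate` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
T = rng.binomial(1, 0.5, size=n)
# Assumed true CATE for this illustration: tau(x) = 1 + 0.5 * x1
Y = (1.0 + 0.5 * X[:, 0]) * T + X[:, 1] + rng.normal(size=n)

# Residualize Y and T on X with OLS (in-sample, for brevity).
A = np.column_stack([np.ones(n), X])
Y_res = Y - A @ np.linalg.lstsq(A, Y, rcond=None)[0]
T_res = T - A @ np.linalg.lstsq(A, T, rcond=None)[0]

# Final stage: Y_res ~ T_res * (b0 + b1 * x1 + b2 * x2).
Z = A * T_res[:, None]
beta = np.linalg.lstsq(Z, Y_res, rcond=None)[0]

def cate(x_new):
    """Evaluate the fitted linear effect model at one or more feature rows."""
    x_new = np.atleast_2d(x_new)
    return np.column_stack([np.ones(len(x_new)), x_new]) @ beta

x_grid = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
print(cate(x_grid))      # effect at three hypothetical feature rows
print(cate(X).mean())    # average effect implied by the model on the sample
```

The last line mirrors the idea behind the average effect: it is the mean of the fitted effect function over the sample's feature rows.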

Example output

0.5123847716

Links: package · paper
