Workflow · 5 steps

Matching for causal inference (MatchIt)

@gary_king D · Jun 29, 2026 · 920 views

Summary by StatsOtter

Preprocesses observational data by matching treated and control units on covariates, so downstream models depend less on modeling assumptions.

Input · what goes in

A data frame with a binary treatment indicator and the covariates to balance on.

treat	age	educ	married
1	37	11	1
0	22	9	0
1	30	12	1
0	45	14	0
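As a rough illustration, the four example rows above can be typed into R and sanity-checked before matching. The object name `dat` and the particular checks are assumptions made for this sketch; they are not steps the package asks you to run.

```r
# The four example rows from the input table, entered by hand
dat <- data.frame(
  treat   = c(1, 0, 1, 0),
  age     = c(37, 22, 30, 45),
  educ    = c(11, 9, 12, 14),
  married = c(1, 0, 1, 0)
)

# Treatment indicator should be binary (0 = control, 1 = treated)
stopifnot(all(dat$treat %in% c(0, 1)))

# Cautious pre-check: no missing values in treatment or covariates
stopifnot(!anyNA(dat))

# Group sizes: how many controls and treated units are available
table(dat$treat)
```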

Pipeline · the recipe

#1 · Data prep

Load MatchIt and the lalonde data

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Bring in the package and the canonical Lalonde job-training observational dataset.

Reads from: the input data · Feeds into: #2

Key code

# Install:  install.packages("MatchIt")
library("MatchIt")
data("lalonde")
head(lalonde)

#2 · Estimation

Run 1:1 nearest-neighbor PS matching

The matching step: propensity scores are estimated and matched controls are selected. The outcome is not used at this stage.

What happens here

Estimate each unit's propensity score with a logistic regression, then match every treated unit to the control with the nearest score, without replacement (each control is used at most once).

Formula

e(X_i) = \Pr(T_i = 1 \mid X_i)

Reads from: #1 · Feeds into: #3

Key code

m.out1 <- matchit(treat ~ age + educ + race + married +
                    nodegree + re74 + re75,
                  data = lalonde,
                  method = "nearest",
                  distance = "glm")
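To make the step concrete, here is a minimal base-R sketch of the two ideas involved: estimating e(X) by logistic regression and greedily pairing each treated unit with the closest unused control. It runs on simulated data with a single hypothetical covariate `x`; the processing order and tie handling are simplifying assumptions and are not claimed to reproduce matchit() exactly.

```r
set.seed(1)

# Simulated data (illustrative only): one covariate, roughly 30% treated
n     <- 200
x     <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + x))
dat   <- data.frame(treat, x)

# Propensity score e(X) = Pr(T = 1 | X) from a logistic regression
ps <- fitted(glm(treat ~ x, data = dat, family = binomial))

treated  <- which(dat$treat == 1)
controls <- which(dat$treat == 0)

# Assumption: process treated units from highest to lowest score
treated <- treated[order(ps[treated], decreasing = TRUE)]

# Greedy 1:1 matching without replacement
match_of <- integer(0)
for (i in treated) {
  if (length(controls) == 0) break
  j <- controls[which.min(abs(ps[controls] - ps[i]))]  # closest control
  match_of[as.character(i)] <- j
  controls <- setdiff(controls, j)                     # control is used up
}

# Names are treated row indices, values are matched control row indices
head(match_of)
```

Because each control is removed once used, early matches can take the best partner of a later treated unit, which is why balance should always be checked after matching.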

#3 · Diagnostic / pre-tests

Assess covariate balance

A pre-flight check — run this before trusting any estimate downstream.

What happens here

Inspect standardized mean differences before and after matching to check that matching reduced imbalance.

Formula

\text{SMD} = \frac{\bar{X}_t - \bar{X}_c}{s_t}

Reads from: #2 · Feeds into: #4

Key code

summary(m.out1)
plot(m.out1, type = "density", interactive = FALSE)
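The SMD formula for this step can also be computed by hand, which helps when reading the summary table. The sketch below uses simulated ages and a hypothetical helper named `smd`; it standardizes by the treated-group standard deviation, as in the formula, and is not the package's internal code.

```r
set.seed(3)

# Simulated example (illustrative only): 50 treated, 150 controls
treat <- rep(c(1, 0), times = c(50, 150))
age   <- c(rnorm(50, mean = 26, sd = 7), rnorm(150, mean = 28, sd = 10))

# Hypothetical helper: difference in means divided by the treated-group SD
smd <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

smd(age, treat)
```

A common rule of thumb treats an absolute SMD below about 0.1 as acceptable balance, though the appropriate threshold depends on the application.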

#4 · Data prep

Extract the matched dataset

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Pull out the matched sample with matching weights and subclass identifiers for the outcome model.

Reads from: #3 · Feeds into: #5

Key code

m.data <- match.data(m.out1)
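A quick way to see what this step produces is to look at the extra columns in the matched data. The sketch below repeats the matching call from the earlier step so that it runs on its own, and it assumes the MatchIt package is installed.

```r
library("MatchIt")
data("lalonde")

# Same 1:1 nearest-neighbor specification as in the matching step
m.out1 <- matchit(treat ~ age + educ + race + married +
                    nodegree + re74 + re75,
                  data = lalonde, method = "nearest", distance = "glm")
m.data <- match.data(m.out1)

# Columns added by match.data(): propensity score, matching weight, pair ID
head(m.data[, c("treat", "distance", "weights", "subclass")])

# Number of units per matched pair (expected to be 2 for 1:1 matching)
table(table(m.data$subclass))
```

Unmatched controls are dropped from `m.data`, so any outcome model fit to it sees only the matched sample.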

#5 · Estimation

Estimate the ATT

The core estimate — where the causal quantity itself is computed.

What happens here

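Fit a weighted outcome regression on the matched data, then use marginaleffects to average the treated-versus-untreated contrast over the treated units. This gives the average treatment effect on the treated (ATT), with standard errors clustered on the matched pair (subclass).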

Formula

\tau_{ATT} = E[Y_1 - Y_0 \mid T = 1]

Reads from: #4 · Feeds into: the final output

Key code

library("marginaleffects")
fit <- lm(re78 ~ treat * (age + educ + race + married +
                           nodegree + re74 + re75),
          data = m.data, weights = weights)
avg_comparisons(fit, variables = "treat",
                vcov = ~subclass,
                newdata = subset(m.data, treat == 1))
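The code above fits a regression with treatment-by-covariate interactions and then averages a treated-versus-untreated contrast over the treated units. A base-R sketch of that averaging step (often called g-computation) may help show what is being estimated. The data are simulated, the variable names are made up, and matching weights and cluster-robust standard errors are left out for brevity.

```r
set.seed(2)

# Simulated data (illustrative only); the effect of treat varies with x
n     <- 300
x     <- rnorm(n)
treat <- rbinom(n, 1, plogis(x))
y     <- 1 + 2 * treat + x + 0.5 * treat * x + rnorm(n)
d     <- data.frame(y, treat, x)

# Outcome model with a treatment-by-covariate interaction
fit <- lm(y ~ treat * x, data = d)

# Predict both potential outcomes for the treated units only
d1 <- subset(d, treat == 1)
p1 <- predict(fit, newdata = transform(d1, treat = 1))
p0 <- predict(fit, newdata = transform(d1, treat = 0))

# ATT point estimate: average predicted difference among the treated
att_gcomp <- mean(p1 - p0)
att_gcomp
```

For a linear model this average equals the `treat` coefficient plus the interaction coefficient times the treated-group mean of `x`, which is why the interaction terms matter when effects vary with covariates.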

Output · what you get · 4 figures

Fig 1. Love plot: after matching, standardized mean differences fall below the balance threshold, well away from their unmatched values.

Fig 2. Jitter plot of propensity scores: no treated units are dropped, while many low-propensity controls are pruned.

Fig 3. Empirical-CDF plots for educ, married and re75, comparing treated and control units in the matched sample.

Fig 4. Mirrored propensity-score histogram for the treated and control groups (drawn with cobalt).

Figures reproduced from the package's official documentation — unofficial community showcase; all credit to the original authors.

Result · the numbers

\hat\tau_{\mathrm{ATT}}=\frac{1}{n_1}\sum_{i:\,T_i=1}\Big(Y_i-\sum_{j:\,T_j=0} w_{ij}\,Y_j\Big)
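A small worked example may make the estimator easier to read. The outcome values and the weight matrix `W` below are invented for illustration. Each row of `W` holds the weights w_ij that one treated unit places on the controls; with 1:1 matching each row has a single 1.

```r
# Hypothetical outcomes: three treated units, four controls
y_t <- c(10, 12, 9)
y_c <- c(8, 11, 10, 20)   # the fourth control is unmatched

# Weight matrix: row i gives the weights of treated unit i on each control
W <- rbind(c(1, 0, 0, 0),
           c(0, 1, 0, 0),
           c(0, 0, 1, 0))

# Each treated outcome minus its weighted control outcome, then averaged
att_hat <- mean(y_t - as.vector(W %*% y_c))
att_hat
```

The pairwise differences here are 2, 1 and -1, so the estimate is 2/3. The unmatched control receives zero weight and does not affect the result.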

⚠️ Unofficial community showcase of MatchIt (docs). Not affiliated with the authors — all credit to Gary King & coauthors; this summarizes public documentation.

What it does: MatchIt selects matched subsamples of treated and control units with similar covariate distributions, so that a subsequent parametric model (e.g. a regression) is less sensitive to specification.

How it works: It supports many methods, including nearest-neighbor and optimal propensity-score matching, exact and coarsened exact matching, genetic matching, full matching, and subclassification. It then reports covariate balance (standardized mean differences, eCDF statistics, Love plots) before and after matching. Effects are estimated on the matched data, typically with weights and robust or cluster-robust standard errors.

Assumptions: Causal interpretation requires unconfoundedness (selection on observables) and overlap (common support) between groups; matching addresses only observed covariates, not unmeasured confounding. The package implements the 'matching as nonparametric preprocessing' recommendations of Ho, Imai, King & Stuart.

What you get — A matched dataset (with weights/subclasses) plus balance diagnostics for estimating treatment effects.

Example output

Call:
matchit(formula = treat ~ age + educ + race + married + nodegree +
    re74 + re75, data = lalonde, method = "nearest", distance = "glm")

Summary of Balance for All Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
distance          0.5774        0.1822          1.7941     0.9211    0.3774   0.6444
age              25.8162       28.0303         -0.3094     0.4400    0.0813   0.1577
educ             10.3459       10.2354          0.0550     0.4959    0.0347   0.1114
raceblack         0.8432        0.2028          1.7615          .    0.6404   0.6404
married           0.1892        0.5128         -0.8263          .    0.3236   0.3236
re74           2095.5737     5619.2365         -0.7211     0.5181    0.2248   0.4470

Summary of Balance for Matched Data:
           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
distance          0.5774        0.3629          0.9739     0.7566    0.1321   0.4216
age              25.8162       25.3027          0.0718     0.4568    0.0847   0.2541
married           0.1892        0.2108         -0.0552          .    0.0216   0.0216
re74           2095.5737     2342.1076         -0.0505     1.3289    0.0469   0.2757

Sample Sizes:
          Control Treated
All           429     185
Matched       185     185
Unmatched     244       0
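The printed balance table can be checked with a little arithmetic. The sketch below recomputes the standardized mean difference for `married` from the group means shown above. It assumes a binary covariate is standardized by sqrt(p(1 - p)) in the treated group, which is an inference from the printed numbers rather than something stated in the text.

```r
# Group means for `married`, copied from the example output above
p_treated       <- 0.1892
p_control_all   <- 0.5128
p_control_match <- 0.2108

# Assumed standardization factor for a binary covariate
s_treated <- sqrt(p_treated * (1 - p_treated))

smd_all     <- (p_treated - p_control_all)   / s_treated
smd_matched <- (p_treated - p_control_match) / s_treated
round(c(all = smd_all, matched = smd_matched), 3)

# Sample-size bookkeeping: all controls minus matched controls
429 - 185
```

Under that assumption the recomputed values agree with the table's -0.8263 and -0.0552 to about three decimal places, and the subtraction reproduces the 244 unmatched controls.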

Links: package · paper

Discussion (2)

@calibrator_cleo · Jun 29, 2026

Matching as the design stage — outcome-free — is the discipline people skip. MatchIt makes it the path of least resistance.

@aipw_amir · Jun 29, 2026

And match.data() → any outcome model. Pairs perfectly with cobalt for the balance plots.

@targeting_tara · Jun 29, 2026

Nearest, optimal, full, genetic — all behind one matchit() call. Great teaching tool.
