StatsOtter Causal inference workflows
Workflow·3 steps

IV with heterogeneous effects: the LATE (ivreg)

Summary by StatsOtter

Two-stage least squares for instrumental-variables regression, with the modern LATE interpretation and rich diagnostics.

1

Input · what goes in

An outcome, one or more endogenous regressors, any exogenous covariates, and at least as many excluded instruments as endogenous regressors.

Example data format:
y      educ (endog)   nearcollege (instr)
6.1    12             0
6.9    16             1
5.8    11             0
7.2    14             1
2

Pipeline · the recipe


1
Data prep

Load data and specify the IV formula

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

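Use the classic Kmenta market data (20 observations) and write the two-part formula y ~ regressors | instruments. Every regressor goes left of the bar and every instrument right of it, so exogenous regressors such as D appear on both sides (they act as their own instruments), while the endogenous P appears only on the left.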

Formula
Y_i = \beta_0 + \beta_1 P_i + \beta_2 D_i + \varepsilon_i,\quad \operatorname{Cov}(P_i,\varepsilon_i)\neq 0
Reads from the input data · Feeds into #2
Key code
# Install:  install.packages("ivreg")
library("ivreg")
data("Kmenta", package = "ivreg")
# Demand equation Q ~ P + D with P endogenous; D is exogenous (its own instrument), F and A are excluded instruments
f <- Q ~ P + D | D + F + A
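Why P needs instruments at all can be shown with a small simulation. The sketch below is base R on made-up data: the names `x`, `w`, `z`, `u` and every coefficient are hypothetical stand-ins, not Kmenta variables. An unobserved confounder `u` enters both the regressor and the outcome's error, so Cov(x, ε) ≠ 0 as in the formula for P, and OLS misses the true slope of 2.

```r
# Simulated data (hypothetical): x is endogenous because the unobserved u
# enters both x and the outcome's error; z is an excluded instrument and
# w an exogenous covariate (the role D plays in the Kmenta formula).
set.seed(1)
n <- 10000
u <- rnorm(n)                            # unobserved confounder
w <- rnorm(n)                            # exogenous covariate
z <- rnorm(n)                            # instrument: shifts x, excluded from y
x <- z + 0.5 * w + u + rnorm(n)          # endogenous regressor
y <- 1 + 2 * x + 0.5 * w + u + rnorm(n)  # true coefficient on x is 2

# OLS ignores the endogeneity; its slope on x converges to 2 + 1/3, not 2
b_ols <- unname(coef(lm(y ~ x + w))["x"])
print(b_ols)
```

In the two-part notation this model's IV fit would be written `y ~ x + w | z + w`: the exogenous `w` sits on both sides of the bar and the endogenous `x` only on the left, just as D and P do in `f`.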

2
Estimation

Fit the 2SLS / IV model

The core estimate — where the causal quantity itself is computed.

What happens here

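ivreg() estimates the structural equation by two-stage least squares using the supplied instruments. In the formula below, X holds all regressors (intercept, P, D) and Z all instruments (intercept, D, F, A).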

Formula
\hat\beta_{2SLS} = (X'P_Z X)^{-1} X'P_Z y,\quad P_Z = Z(Z'Z)^{-1}Z'
Reads from #1 · Feeds into #3
Key code
fit <- ivreg(f, data = Kmenta)
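To see what the matrix formula computes, 2SLS can be written out in base R. This is a sketch on simulated data (hypothetical names and coefficients, true slope on `x` equal to 2), not ivreg's internal code. It never forms the n × n matrix P_Z: the product P_Z X = Z(Z'Z)^{-1}Z'X is just the matrix of first-stage fitted values, and because P_Z is symmetric and idempotent, X'P_Z X = (P_Z X)'X. The same point estimates are then reproduced with two explicit `lm()` stages.

```r
set.seed(1)
n <- 10000
u <- rnorm(n); w <- rnorm(n); z <- rnorm(n)
x <- z + 0.5 * w + u + rnorm(n)          # endogenous regressor (hypothetical)
y <- 1 + 2 * x + 0.5 * w + u + rnorm(n)  # true coefficient on x is 2

X <- cbind(1, x, w)  # all regressors: intercept, endogenous x, exogenous w
Z <- cbind(1, z, w)  # all instruments: the exogenous w instruments itself

# P_Z X = Z (Z'Z)^{-1} Z'X: first-stage fitted values for every column of X
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
# X'P_Z X = X_hat'X and X'P_Z y = X_hat'y, so beta = (X_hat'X)^{-1} X_hat'y
beta_2sls <- drop(solve(crossprod(X_hat, X), crossprod(X_hat, y)))
names(beta_2sls) <- c("(Intercept)", "x", "w")

# Two explicit OLS stages give the same point estimates
# (but stage 2's standard errors are not valid 2SLS standard errors)
x_fit  <- fitted(lm(x ~ z + w))  # stage 1: project x on the instruments
stage2 <- lm(y ~ x_fit + w)      # stage 2: replace x by its fitted values
beta_2step <- unname(coef(stage2))

print(beta_2sls)
```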

3
Inference

Summary with diagnostic tests

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

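diagnostics = TRUE appends three tests: a weak-instruments F test of the excluded instruments in the first stage (a large F with a small p indicates relevant instruments); the Wu-Hausman test of the null that the endogenous regressors are in fact exogenous (a small p suggests OLS is inconsistent, favoring IV); and the Sargan test of the over-identifying restrictions (a large p means no evidence against instrument validity, not proof of it; the test needs more instruments than endogenous regressors).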

Formula
\tau_{\mathrm{LATE}}=\frac{\mathrm{Cov}(Y,Z)}{\mathrm{Cov}(D,Z)}=\frac{\mathbb E[Y\mid Z{=}1]-\mathbb E[Y\mid Z{=}0]}{\mathbb E[D\mid Z{=}1]-\mathbb E[D\mid Z{=}0]}
Reads from #2 · Feeds into the final output
Key code
summary(fit, diagnostics = TRUE)
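The three tests can be approximated by hand to see what each one measures. The sketch below uses base R on simulated, over-identified data: two hypothetical excluded instruments `z1` and `z2` for one endogenous `x`, mirroring the two excluded instruments F and A in the Kmenta formula. These are textbook versions of the tests (Wu-Hausman in its regression-based form) and may differ in small-sample details from ivreg's own implementation.

```r
set.seed(2)
n  <- 5000
u  <- rnorm(n); w <- rnorm(n)
z1 <- rnorm(n); z2 <- rnorm(n)  # two excluded instruments (over-identified)
x  <- 0.8 * z1 + 0.5 * z2 + 0.5 * w + u + rnorm(n)
y  <- 1 + 2 * x + 0.5 * w + u + rnorm(n)

# 1. Weak instruments: F test that the excluded instruments are jointly zero
#    in the first-stage regression of x on all instruments.
first  <- lm(x ~ z1 + z2 + w)
weak   <- anova(lm(x ~ w), first)
weak_F <- weak$F[2]

# 2. Wu-Hausman (regression-based): add the first-stage residuals to the
#    OLS equation; a significant coefficient signals that x is endogenous.
aux  <- lm(y ~ x + w + resid(first))
wh_p <- summary(aux)$coefficients[4, "Pr(>|t|)"]

# 3. Sargan: n * R^2 from regressing the 2SLS residuals on all instruments,
#    chi-squared with (#instruments - #regressors) degrees of freedom.
X <- cbind(1, x, w)
Z <- cbind(1, z1, z2, w)
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
b_iv  <- solve(crossprod(X_hat, X), crossprod(X_hat, y))
e_iv  <- drop(y - X %*% b_iv)  # residuals use the actual x, not X_hat
sargan    <- n * summary(lm(e_iv ~ z1 + z2 + w))$r.squared
sargan_df <- ncol(Z) - ncol(X)
sargan_p  <- pchisq(sargan, df = sargan_df, lower.tail = FALSE)

print(c(weak_F = weak_F, wu_hausman_p = wh_p, sargan = sargan, sargan_p = sargan_p))
```

With a single endogenous regressor the Wu-Hausman F statistic equals the squared t statistic of the residual term, so the t-test p-value used here carries the same information.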

3

Output · what you get (2 figures)

Fig 1. Effects plot: the estimated marginal effect of education on wages from a 2SLS instrumental-variables fit.
Fig 2. Quantile–quantile diagnostic of the studentized residuals from the ivreg 2SLS model.

Figures reproduced from the package's official documentation — unofficial community showcase; all credit to the original authors.

Result · the numbers

\tau_{\mathrm{LATE}}=\frac{\mathrm{Cov}(Y,Z)}{\mathrm{Cov}(D,Z)}=\frac{\mathbb E[Y\mid Z{=}1]-\mathbb E[Y\mid Z{=}0]}{\mathbb E[D\mid Z{=}1]-\mathbb E[D\mid Z{=}0]}
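With a binary instrument, the two expressions for τ_LATE give the same number in any sample, not just in expectation: the sample covariance of a variable with a 0/1 instrument equals n₁n₀/(n(n−1)) times the difference in group means, and that factor cancels in the ratio. A quick base-R check on simulated data (all names and parameters are hypothetical):

```r
set.seed(3)
n <- 2000
z <- rbinom(n, 1, 0.5)                       # binary instrument
d <- rbinom(n, 1, ifelse(z == 1, 0.7, 0.3))  # binary treatment, more likely when z = 1
y <- 1 + 2 * d + rnorm(n)                    # outcome

wald_cov   <- cov(y, z) / cov(d, z)
wald_means <- (mean(y[z == 1]) - mean(y[z == 0])) /
              (mean(d[z == 1]) - mean(d[z == 0]))
print(c(wald_cov, wald_means))  # equal up to floating-point rounding
```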

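⚠️ Unofficial community showcase of ivreg (docs). Not affiliated with the authors; all credit to the ivreg package authors (John Fox, Christian Kleiber and Achim Zeileis) and, for the LATE framework, to Guido Imbens and Joshua Angrist. This summarizes public documentation.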

What it does. Estimates causal effects when a treatment (regressor) is endogenous, using an instrument that shifts the treatment but affects the outcome only through it. ivreg fits the model by two-stage least squares (2SLS) with a two-part outcome ~ regressors | instruments formula and reports diagnostics (weak instruments, Wu-Hausman, Sargan).

How it works. The first stage regresses each endogenous regressor on all instruments (the excluded instruments plus the exogenous covariates); the second stage regresses the outcome on the first-stage fitted values and the exogenous covariates. Running the two stages by hand reproduces the point estimates but not the standard errors, which ivreg computes from residuals based on the actual regressors.

Assumptions. Instrument relevance, exclusion, independence, and, for the causal interpretation under heterogeneous effects, monotonicity (no defiers).

Imbens's contribution. Imbens & Angrist (1994, Econometrica) showed that, with a binary instrument and a binary treatment and under monotonicity, the IV (Wald) estimand identifies the Local Average Treatment Effect: the effect for compliers, whose treatment status responds to the instrument. This reframed what 2SLS actually estimates when effects are heterogeneous.
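The complier interpretation can be seen in a small simulation. The sketch below (base R; the type shares and effect sizes are hypothetical) draws three compliance types that satisfy monotonicity, gives each a different treatment effect, and computes the Wald/IV estimate. It tracks the compliers' effect of 2 rather than the population-average effect of about 2.5.

```r
set.seed(4)
n <- 100000
# Hypothetical compliance types; no defiers, so monotonicity holds
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.5, 0.25, 0.25))
z <- rbinom(n, 1, 0.5)  # randomly assigned binary instrument
# Take-up: compliers follow z, always-takers always treat, never-takers never do
d <- ifelse(type == "always", 1, ifelse(type == "never", 0, z))
# Heterogeneous treatment effects by type (illustrative values)
tau <- c(complier = 2, always = 5, never = 1)[type]
y <- rnorm(n) + tau * d  # exclusion: z affects y only through d

wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(d[z == 1]) - mean(d[z == 0]))
ate  <- mean(tau)        # average effect over everyone, about 2.5 here
print(c(wald = wald, late_true = 2, ate = ate))
```

Always-takers and never-takers have the same treatment status under both values of z, so they drop out of the numerator; only the compliers' effect is identified.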

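What you get: 2SLS coefficients with standard errors and IV diagnostic tests. The simple complier (LATE) reading of a coefficient applies to a binary treatment with a binary instrument; in the Kmenta example below the endogenous regressor P is continuous, so its coefficient is read as a 2SLS slope.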

Example output

Call:
ivreg(formula = Q ~ P + D | D + F + A, data = Kmenta)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 94.63330    7.92084  11.947 1.08e-09 ***
P           -0.24356    0.09648  -2.524   0.0218 *  
D            0.31399    0.04694   6.689 3.81e-06 ***

Diagnostic tests:
                 df1 df2 statistic  p-value    
Weak instruments   2  16    88.025 2.32e-09 ***
Wu-Hausman         1  16    11.422  0.00382 ** 
Sargan             1  NA     2.983  0.08414 .  
---
Residual standard error: 1.966 on 17 degrees of freedom
Multiple R-Squared: 0.7548, Adjusted R-squared: 0.726 
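Each row of the coefficient table follows from the estimate and its standard error: t = Estimate / Std. Error, with p-values from a t distribution on the 17 residual degrees of freedom shown in the output. A base-R check of the row for P, using the rounded printed values (so the last digit can differ slightly); the 95% interval is computed by hand as an illustration and is not output copied from ivreg.

```r
est    <- -0.24356  # coefficient on P, as printed
se     <- 0.09648   # its standard error, as printed
res_df <- 17        # residual degrees of freedom, as printed

t_val <- est / se
p_val <- 2 * pt(-abs(t_val), res_df)               # two-sided p-value
ci95  <- est + c(-1, 1) * qt(0.975, res_df) * se   # t-based 95% interval
print(round(c(t = t_val, p = p_val, lower = ci95[1], upper = ci95[2]), 4))
```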
