Design-based estimators done fast (estimatr)

1

Input · what goes in

Treatment Z, outcome Y and pre-treatment covariates X from a randomized experiment.

Y	Z	X (age)
5.2	1	34
4.1	0	51
6.0	1	29
3.8	0	60
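As a quick check on the input format, the four example rows can be typed into R and the unadjusted difference in means computed by hand. This is only an illustration: the data-frame name `toy` is hypothetical, and four units are far too few for Lin's estimator, which fits four coefficients.

```r
# The four example rows: outcome Y, treatment Z, covariate X (age)
toy <- data.frame(
  Y = c(5.2, 4.1, 6.0, 3.8),
  Z = c(1, 0, 1, 0),
  X = c(34, 51, 29, 60)
)

# Difference in means: average Y among treated minus average Y among controls
dim_hat <- mean(toy$Y[toy$Z == 1]) - mean(toy$Y[toy$Z == 0])
print(dim_hat)  # (5.2 + 6.0)/2 - (4.1 + 3.8)/2 = 5.6 - 3.95
```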

2

Pipeline · the recipe

1

Data prep

Assemble the experiment

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Build a small completely-randomized experiment with one pre-treatment covariate.

Reads from the input data · Feeds into #2

Key code

# Install:  install.packages("estimatr")
library(estimatr)
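set.seed(42)  # make the simulated draw reproducible
# Complete randomization: exactly 50 of the 100 units are treated
dat <- data.frame(Y = rnorm(100), Z = sample(rep(c(0, 1), each = 50)), X = rnorm(100))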

2

Estimation

Fit Lin's estimator

The core estimate — where the causal quantity itself is computed.

What happens here

lm_lin centers covariates, adds the full treatment-by-covariate interactions, and uses HC2 standard errors.

Formula

Y_i=\alpha+\tau Z_i+\beta^\top\tilde X_i+\gamma^\top Z_i\tilde X_i+\varepsilon_i,\quad \tilde X_i=X_i-\bar X
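To make the formula concrete, here is a minimal base-R sketch of Lin's estimator with HC2 standard errors computed by hand. It assumes a single covariate and uses hypothetical simulated data (the seed, effect sizes and names such as `d2`, `tau_hat`, `se_tau` are illustrative only). The optional block at the end compares against `estimatr::lm_lin()` when the package is installed; it prints `TRUE` if the numbers agree rather than asserting it.

```r
# Hypothetical simulated experiment: 100 units, exactly 50 treated, one covariate X
set.seed(1)
n <- 100
dat <- data.frame(Z = sample(rep(c(0, 1), each = n / 2)), X = rnorm(n))
dat$Y <- 1 + 0.5 * dat$Z + 0.8 * dat$X + 0.4 * dat$Z * dat$X + rnorm(n)

# Center the covariate at its full-sample mean: X_tilde = X - mean(X)
d2 <- transform(dat, X_c = X - mean(X))

# Fully interacted OLS: columns (Intercept), Z, X_c, Z:X_c; tau is the Z coefficient
fit <- lm(Y ~ Z * X_c, data = d2)
tau_hat <- unname(coef(fit)["Z"])

# HC2 sandwich: (X'X)^-1 [sum_i e_i^2 / (1 - h_ii) x_i x_i'] (X'X)^-1
Xmat  <- model.matrix(fit)
w     <- residuals(fit)^2 / (1 - hatvalues(fit))
bread <- solve(crossprod(Xmat))
vc    <- bread %*% crossprod(Xmat * w, Xmat) %*% bread
j     <- which(colnames(Xmat) == "Z")
se_tau <- sqrt(vc[j, j])

# t-based 95% CI with n - p residual degrees of freedom (100 - 4 = 96)
df_res <- n - ncol(Xmat)
ci <- tau_hat + c(-1, 1) * qt(0.975, df_res) * se_tau
print(c(estimate = tau_hat, std.error = se_tau, lower = ci[1], upper = ci[2]))

# Optional cross-check against estimatr, only if it is installed
if (requireNamespace("estimatr", quietly = TRUE)) {
  fit_lin <- estimatr::lm_lin(Y ~ Z, covariates = ~ X, data = dat)
  print(isTRUE(all.equal(fit_lin$coefficients[["Z"]], tau_hat)))
  print(isTRUE(all.equal(fit_lin$std.error[["Z"]], se_tau)))
}
```

Centering at the full-sample mean is what makes the Z coefficient equal to the ATE estimate; without centering, the Z coefficient would be the effect at X = 0.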

Reads from #1 · Feeds into the final output

Key code

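fit <- lm_lin(Y ~ Z, covariates = ~ X, data = dat)  # se_type defaults to "HC2" without clusters
fit  # the Z row is the Lin-adjusted ATE with its HC2 SE and 95% CI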

3

Output · what you get

Fig 1. Lin-adjusted ATE (treatment coefficient) with its HC2 robust 95% confidence interval.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers

⚠️ Unofficial community showcase of estimatr (docs). Not affiliated with the authors — all credit to the DeclareDesign team (Blair, Cooper, Coppock, Humphreys & Sonnet); this summarizes public documentation.

What it does. estimatr is a widely used R package implementing the design-based estimators central to Peng Ding's work. lm_lin() implements Lin's (2013) covariate adjustment (treatment fully interacted with centered covariates); difference_in_means() and lm_robust() (regressing Y on Z alone) give the Neyman difference in means with its conservative HC2 variance.
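Why regressing Y on Z alone with HC2 errors gives the Neyman variance can be checked numerically. This base-R sketch uses hypothetical simulated data (40 treated, 60 controls, heteroskedastic outcomes) and computes HC2 by hand, so it does not depend on estimatr; with a binary regressor the hat values are 1/n1 and 1/n0, which turns HC2 into s1^2/n1 + s0^2/n0 exactly.

```r
# Hypothetical completely randomized experiment: 40 treated, 60 controls
set.seed(2)
n1 <- 40; n0 <- 60
Z <- sample(rep(c(1, 0), c(n1, n0)))
Y <- rnorm(n1 + n0, mean = 0.3 * Z, sd = 1 + Z)  # unequal variances by arm

# Neyman: difference in means and conservative variance s1^2/n1 + s0^2/n0
tau_dim   <- mean(Y[Z == 1]) - mean(Y[Z == 0])
se_neyman <- sqrt(var(Y[Z == 1]) / n1 + var(Y[Z == 0]) / n0)

# OLS of Y on Z with HC2 standard errors, computed by hand
fit   <- lm(Y ~ Z)
Xmat  <- model.matrix(fit)
w     <- residuals(fit)^2 / (1 - hatvalues(fit))
bread <- solve(crossprod(Xmat))
vc    <- bread %*% crossprod(Xmat * w, Xmat) %*% bread
se_hc2 <- sqrt(vc[2, 2])

print(c(tau_dim = tau_dim, coef_Z = unname(coef(fit)[2]),
        se_neyman = se_neyman, se_hc2 = se_hc2))
```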

How it works. lm_lin() centers the covariates at their sample means, adds all treatment×covariate interactions, fits OLS, and reads the ATE off the treatment coefficient with HC2 standard errors. This is the estimator that Ding's textbook shows is never less precise, in large samples, than the unadjusted difference in means. The core computations are implemented in C++ for speed.
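The large-sample precision claim can be illustrated (not proved) with a small randomization simulation. Under the assumptions below, a hypothetical finite population whose potential outcomes depend strongly on X and whose unit-level effects vary with X, only the treatment assignment is re-drawn each time, and the randomization variances of the difference in means and of Lin's estimator are compared. The helper `lin_est` and all numeric settings are illustrative.

```r
# Hypothetical finite population: fixed potential outcomes Y0, Y1 and covariate X
set.seed(3)
n <- 200; n1 <- 100
X  <- rnorm(n)
Y0 <- 2 * X + rnorm(n)
Y1 <- Y0 + 1 + 1.5 * X          # unit-level effects vary with X
tau_true <- mean(Y1 - Y0)

# Lin's estimator: OLS of Y on Z, centered X and their interaction
lin_est <- function(Y, Z, X) {
  X_c <- X - mean(X)
  unname(coef(lm(Y ~ Z * X_c))["Z"])
}

# Re-randomize many times; only the assignment Z is random
draws <- replicate(1000, {
  Z <- sample(rep(c(1, 0), c(n1, n - n1)))
  Y <- ifelse(Z == 1, Y1, Y0)
  c(dim = mean(Y[Z == 1]) - mean(Y[Z == 0]), lin = lin_est(Y, Z, X))
})

bias <- rowMeans(draws) - tau_true  # both should be close to zero
v    <- apply(draws, 1, var)         # randomization variance of each estimator
print(rbind(bias = bias, variance = v))
```

Because X explains most of the variation in the outcomes here, the gain is large; with a weakly predictive covariate the two variances would be close.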

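Assumptions. Complete randomization (or another known design), SUTVA, and covariates measured before treatment. The HC2 variance estimator is conservative: for the difference in means it equals Neyman's variance estimator, which is conservative in expectation in finite samples; for Lin's estimator the conservativeness holds asymptotically.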
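Where the conservativeness comes from can be seen from Neyman's exact formula, Var(tau_hat) = S1^2/n1 + S0^2/n0 - S_tau^2/n, while the estimator s1^2/n1 + s0^2/n0 has expectation S1^2/n1 + S0^2/n0. The gap S_tau^2/n is zero only when unit-level effects are constant. The sketch below assumes a hypothetical finite population with heterogeneous effects, computes both sides exactly, and confirms them by re-randomization.

```r
# Hypothetical finite population with heterogeneous unit-level effects
set.seed(4)
n <- 100; n1 <- 50; n0 <- n - n1
Y0 <- rnorm(n)
Y1 <- Y0 + rnorm(n, mean = 1, sd = 1)
S1 <- var(Y1); S0 <- var(Y0); Stau <- var(Y1 - Y0)

var_true            <- S1 / n1 + S0 / n0 - Stau / n  # exact randomization variance
var_neyman_expected <- S1 / n1 + S0 / n0             # expectation of the Neyman estimator
gap <- var_neyman_expected - var_true                # equals Stau / n >= 0

# Monte Carlo check over re-randomizations
est <- replicate(5000, {
  Z <- sample(rep(c(1, 0), c(n1, n0)))
  Y <- ifelse(Z == 1, Y1, Y0)
  c(tau  = mean(Y[Z == 1]) - mean(Y[Z == 0]),
    vhat = var(Y[Z == 1]) / n1 + var(Y[Z == 0]) / n0)
})
print(c(var_true = var_true, mc_var = var(est["tau", ]),
        expected_vhat = var_neyman_expected, mc_mean_vhat = mean(est["vhat", ])))
```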

Implements methods Prof. Ding is known for; package authored by the DeclareDesign team.

What you get: the covariate-adjusted ATE (the Z coefficient) with its HC2 robust standard error and 95% confidence interval.

Example output

            Estimate Std. Error t value Pr(>|t|)  CI Lower CI Upper  DF
(Intercept)   0.0123     0.1402   0.088    0.930   -0.266    0.290  96
Z            -0.0457     0.1989  -0.230    0.819   -0.440    0.349  96
X_c           0.0712     0.1421   0.501    0.617   -0.211    0.353  96
Z:X_c         0.0331     0.2011   0.165    0.870   -0.366    0.432  96
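The remaining columns of the Z row follow from the estimate, standard error and degrees of freedom alone. This sketch re-derives them from the printed (rounded) values, so the last digit of the interval may differ slightly from the table; DF is 100 observations minus 4 coefficients.

```r
# Values copied from the Z row of the example output (rounded as printed)
est <- -0.0457
se  <- 0.1989
df  <- 96   # 100 observations - 4 coefficients

t_stat <- est / se                              # t value
p_val  <- 2 * pt(-abs(t_stat), df)              # two-sided p-value
ci     <- est + c(-1, 1) * qt(0.975, df) * se   # t-based 95% CI
print(round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 3))
```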
