StatsOtter Causal inference workflows
Workflow · 4 steps

Ecological inference (ei)

Summary by StatsOtter

Infers individual-level behavior from aggregate (precinct- or district-level) data. The classic example is estimating voting rates by race from precinct totals.

1

Input · what goes in

Aggregate unit-level data: for each precinct or district, the two observed margins X_i (e.g., fraction black) and T_i (e.g., turnout), plus the population size N_i.

Data format and example:
precinct x_black t_turnout n
1 0.20 0.55 800
2 0.65 0.48 1200
3 0.10 0.61 950
4 0.40 0.52 700
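To experiment without the package data, the example table above can be entered as an R data frame. This is a minimal sketch: the column names follow the illustrative table (they are not necessarily those of matproii), and the two count columns are added only to show what the margins imply.

```r
# Example precinct data from the table above (illustrative values).
precincts <- data.frame(
  precinct  = 1:4,
  x_black   = c(0.20, 0.65, 0.10, 0.40),  # row margin X_i: fraction black
  t_turnout = c(0.55, 0.48, 0.61, 0.52),  # column margin T_i: turnout
  n         = c(800, 1200, 950, 700)      # precinct total N_i
)

# Counts implied by the margins. These are all the data reveal: how the
# voters split by race inside each precinct is not observed.
precincts$n_black  <- precincts$x_black * precincts$n
precincts$n_voters <- precincts$t_turnout * precincts$n
print(precincts)
```

With these column names, the ei formula would presumably be written t_turnout ~ x_black, still with total = "n".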
2

Pipeline · the recipe


1
Data prep

Load ei and prepare the 2x2 margins

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Load the package and the matproii example data, then specify the formula t ~ x, which relates the column margin t (e.g., turnout) to the row margin x (e.g., fraction black).

Formula
T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)
Reads from the input data · Feeds into #2
Key code
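# Install once: install.packages("ei")
library(ei)
data(matproii)  # example 2x2 data set shipped with ei
form <- t ~ x   # column margin t (e.g. turnout) ~ row margin x (e.g. fraction black)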

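The accounting identity in the formula above is deterministic. Any pair of precinct rates (β_i^b, β_i^w) fixes T_i, and once X_i and T_i are observed, choosing β_i^b pins down β_i^w along a straight line (King's "tomography line"). The minimal base-R sketch below illustrates this; the two function names are hypothetical helpers, not part of ei.

```r
# Accounting identity: T_i = beta_b * X_i + beta_w * (1 - X_i)
turnout_from_betas <- function(x, beta_b, beta_w) {
  beta_b * x + beta_w * (1 - x)
}

# The identity solved for beta_w given beta_b (the tomography line).
# Requires x < 1, i.e. the precinct has some white residents.
beta_w_given_beta_b <- function(x, t, beta_b) {
  (t - beta_b * x) / (1 - x)
}

# Precinct 1 of the example: X = 0.20; suppose beta_b = 0.5 and beta_w = 0.6.
t1 <- turnout_from_betas(0.20, 0.5, 0.6)   # 0.1 + 0.48 = 0.58
beta_w_given_beta_b(0.20, t1, 0.5)         # recovers 0.6
```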
2
Estimation

Fit the ecological inference model

The core estimate: where the quantities of interest (the precinct-level betas) are computed.

What happens here

Fit the 2x2 EI model, passing the name of the precinct-total column (total = "n") so that precincts can be weighted by size and counts can be recovered.

Formula
T_i=\beta_i^{b}\,X_i+\beta_i^{w}\,(1-X_i),\qquad 0\le\beta_i^{b},\beta_i^{w}\le 1
Reads from #1 · Feeds into #3
Key code
# total = "n" names the column holding each precinct's total N_i
dbuf <- ei(formula = form, total = "n", data = matproii)

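Because both rates must lie in [0, 1], the identity confines each precinct's unknowns to an interval: the method of bounds (Duncan and Davis), which the EI model respects as hard constraints. Below is a base-R sketch of those precinct bounds, applied to the example table from the Input section. The helper name ei_bounds() is hypothetical and is not an ei function.

```r
# Deterministic bounds implied by the accounting identity
# T = beta_b * X + beta_w * (1 - X) with 0 <= beta_b, beta_w <= 1.
# Assumes 0 < X < 1 in every precinct.
ei_bounds <- function(x, t) {
  data.frame(
    bb_lower = pmax(0, (t - (1 - x)) / x),
    bb_upper = pmin(1, t / x),
    bw_lower = pmax(0, (t - x) / (1 - x)),
    bw_upper = pmin(1, t / (1 - x))
  )
}

x_black   <- c(0.20, 0.65, 0.10, 0.40)  # from the example table
t_turnout <- c(0.55, 0.48, 0.61, 0.52)
bounds <- ei_bounds(x_black, t_turnout)
round(bounds, 3)
```

In the example, precinct 2 (65% black) narrows β^b to roughly [0.20, 0.74]. The three precincts with smaller black populations leave β^b unrestricted on [0, 1], and there the β^w intervals carry the information.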
3
Inference

Summarize aggregate estimates

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

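Print the maximum likelihood estimates, the aggregate bounds, and the aggregate quantities of interest Bb and Bw (population-weighted averages of the precinct-level betas) with their standard deviations.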

Formula
\hat{B}^b = \frac{\sum_i X_i N_i \hat{\beta}_i^b}{\sum_i X_i N_i}
Reads from #2 · Feeds into #4
Key code
summary(dbuf)

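The aggregate quantity in the formula above is a weighted average of the precinct rates, with weights X_i N_i (each precinct's black population). The white rate is defined analogously:
\hat{B}^w = \frac{\sum_i (1 - X_i) N_i \hat{\beta}_i^w}{\sum_i (1 - X_i) N_i}
The base-R sketch below uses hypothetical precinct-level β^b values, not ei output. The β^w values are filled in from the accounting identity so the numbers are internally consistent.

```r
# Example precinct data (from the Input table).
x_black   <- c(0.20, 0.65, 0.10, 0.40)
t_turnout <- c(0.55, 0.48, 0.61, 0.52)
n         <- c(800, 1200, 950, 700)

# Hypothetical precinct-level estimates of beta_b (illustrative only);
# beta_w then follows from the accounting identity.
beta_b <- c(0.60, 0.40, 0.80, 0.50)
beta_w <- (t_turnout - beta_b * x_black) / (1 - x_black)

# Aggregate rates: weight each precinct by its group population.
agg_b <- sum(x_black * n * beta_b) / sum(x_black * n)
agg_w <- sum((1 - x_black) * n * beta_w) / sum((1 - x_black) * n)

c(Bb = agg_b, Bw = agg_w)
```

As a check, agg_b times the 1315 black residents plus agg_w times the 2335 white residents reproduces the 1959.5 total voters implied by the margins. The identity holds in aggregate as well as in each precinct.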
4
Reporting

Read out precinct-level betas

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Use eiread() to extract the precinct-specific point estimates of betab and betaw for mapping or further analysis.

Reads from #3 · Feeds into the final output
Key code
# Precinct-level point estimates, one value per precinct
betab <- eiread(dbuf, "betab")
betaw <- eiread(dbuf, "betaw")
head(cbind(betab, betaw))  # first few precincts side by side

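Maps often show counts alongside rates. The sketch below converts precinct-level rates into estimated numbers of black and white voters. The betab and betaw vectors are hypothetical stand-ins for what the eiread() calls above return, so the snippet runs without the package. The output file name is likewise only an illustration.

```r
# Example precinct data (from the Input table).
report <- data.frame(
  precinct = 1:4,
  x_black  = c(0.20, 0.65, 0.10, 0.40),
  n        = c(800, 1200, 950, 700)
)

# Hypothetical precinct-level rates standing in for
# eiread(dbuf, "betab") and eiread(dbuf, "betaw"); not real ei output.
report$betab <- c(0.60, 0.40, 0.80, 0.50)
report$betaw <- c(0.5375, 0.6286, 0.5889, 0.5333)

# Estimated voters in each group: rate times group population.
report$black_voters <- report$betab * report$x_black * report$n
report$white_voters <- report$betaw * (1 - report$x_black) * report$n

# Mapping-ready table, keyed by precinct for joining to boundary files.
print(report)
out_file <- file.path(tempdir(), "ei_precinct_estimates.csv")
write.csv(report, out_file, row.names = FALSE)
```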
3

Output · what you get

Fig 1. Aggregate group-specific rates with their logical bounds and 95% intervals.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers


⚠️ Unofficial community showcase of ei. Not affiliated with the authors; all credit to Gary King and coauthors. This page summarizes public documentation.

What it does: The ei package implements Gary King's solution to the ecological inference problem. It estimates unobserved individual-level relationships (e.g., how subgroups voted) using only aggregate data such as precinct or district marginals.

How it works: For 2x2 tables it combines the deterministic accounting identity, which bounds each unit's unknowns given its margins, with a statistical model. The model is a truncated bivariate normal distribution over the unit-level parameters, estimated by maximum likelihood and simulation. The result is unit-level and aggregate point estimates with uncertainty that respect the logical bounds each precinct imposes.

Assumptions: the truncated bivariate normal form for the unit-level parameters, no spatial autocorrelation across units, and no aggregation bias (the unit-level parameters are unrelated to X_i). Diagnostics and extended models address some violations.

Output: estimated internal cell quantities (e.g., the fraction of each group voting), with bounds and credible intervals, at both the unit and aggregate levels.
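The sketch below makes the distributional assumption concrete. The model treats each precinct's (β_i^b, β_i^w) as a draw from a bivariate normal truncated to the unit square. This base-R example draws from such a distribution by rejection sampling. The parameter values are arbitrary illustrations, and the helper rtbvn_unit() is hypothetical. This is not how ei itself fits the model, which per the description above estimates the distribution's parameters by maximum likelihood and then simulates.

```r
# Draw n pairs (beta_b, beta_w) from a bivariate normal truncated to [0, 1]^2
# by rejection: sample the untruncated normal, keep draws inside the square.
rtbvn_unit <- function(n, mean, sd, rho) {
  sigma <- matrix(c(sd[1]^2, rho * sd[1] * sd[2],
                    rho * sd[1] * sd[2], sd[2]^2), nrow = 2)
  L <- t(chol(sigma))                     # lower-triangular: L %*% t(L) = sigma
  out <- matrix(numeric(0), ncol = 2)
  while (nrow(out) < n) {
    z <- matrix(rnorm(2 * n), nrow = 2)   # independent standard normals, 2 x n
    draws <- t(L %*% z + mean)            # correlated draws, n x 2
    keep <- draws[, 1] >= 0 & draws[, 1] <= 1 &
            draws[, 2] >= 0 & draws[, 2] <= 1
    out <- rbind(out, draws[keep, , drop = FALSE])
  }
  colnames(out) <- c("beta_b", "beta_w")
  out[seq_len(n), ]
}

set.seed(1)
betas <- rtbvn_unit(1000, mean = c(0.35, 0.60), sd = c(0.20, 0.15), rho = 0.3)
colMeans(betas)
```

Rejection sampling is the simplest correct choice here because most of the untruncated mass lies inside the square. With means or spreads that push much of the mass outside [0, 1], it would become slow.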

What you get — Estimated internal cell quantities (e.g. group-specific turnout) with bounds, at unit and aggregate levels.

Example output

Maximum likelihood results in scale of estimation (and se's)
           [,1]   [,2]
Zb        0.314  0.184
Zw        0.733  0.114

Aggregate Bounds
         betab betaw
lower    0.025 0.601
upper    0.444 0.762

Estimates of Aggregate Quantities of Interest
        mean     sd
Bb     0.358  0.028
Bw     0.612  0.014

