Workflow·4 steps

Ecological inference (ei)

@gary_king D · Jun 29, 2026 · 800 views

Summary by StatsOtter

Infers individual-level behavior from aggregate (district-level) data, the classic example being voting rates by race from precinct totals.

Input · what goes in

Aggregate unit-level data: for each district, two observed margins (e.g. fraction black and turnout) and the population size.

precinct	x_black	t_turnout	n
1	0.20	0.55	800
2	0.65	0.48	1200
3	0.10	0.61	950
4	0.40	0.52	700
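
For readers who want to experiment with these rows, a minimal R sketch (an illustration, not taken from the ei documentation) builds them as a data frame. Renaming the columns to `x` and `t` is an assumption, made so they line up with a formula of the form `t ~ x`; the two count columns are derived quantities added only for illustration.

```r
# The four example precincts from the table above.
precincts <- data.frame(
  precinct  = 1:4,
  x_black   = c(0.20, 0.65, 0.10, 0.40),  # fraction Black
  t_turnout = c(0.55, 0.48, 0.61, 0.52),  # fraction turning out
  n         = c(800, 1200, 950, 700)      # precinct population
)

# Assumption: shorten the names to match a formula of the form t ~ x.
names(precincts)[names(precincts) == "x_black"]   <- "x"
names(precincts)[names(precincts) == "t_turnout"] <- "t"

# Both margins are proportions and must lie in [0, 1].
stopifnot(all(precincts$x >= 0 & precincts$x <= 1),
          all(precincts$t >= 0 & precincts$t <= 1))

# Implied observed counts (illustrative derived columns).
precincts$n_black   <- precincts$x * precincts$n
precincts$n_turnout <- precincts$t * precincts$n
print(precincts)
```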

Pipeline · the recipe

Data prep

Load ei and prepare the 2x2 margins

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Load the package and the matproii data, then specify the formula relating the row margin x to the column margin t.

Formula

T_i = \beta_i^b X_i + \beta_i^w (1 - X_i)
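
Solving the identity for \beta_i^w shows why a single precinct cannot pin down both unknowns: every pair (\beta_i^b, \beta_i^w) on the line below reproduces the observed T_i exactly. As a worked illustration, assuming 0 < X_i < 1, precinct 1 of the example input has X_1 = 0.20 and T_1 = 0.55:

\beta_i^w = \frac{T_i}{1 - X_i} - \frac{X_i}{1 - X_i}\,\beta_i^b

\beta_1^w = \frac{0.55}{0.80} - \frac{0.20}{0.80}\,\beta_1^b = 0.6875 - 0.25\,\beta_1^b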

Reads from the input data Feeds into #2

Key code

# Install once if needed: install.packages("ei")
library(ei)
data(matproii)   # example data set shipped with ei
form <- t ~ x    # column margin t modeled on row margin x

Estimation

Fit the ecological inference model

The core estimate: where the quantity of interest itself is computed.

What happens here

Estimate the 2x2 EI model, supplying the precinct totals so counts can be recovered.

Formula

T_i=\beta_i^{b}\,X_i+\beta_i^{w}\,(1-X_i),\qquad 0\le\beta_i^{b},\beta_i^{w}\le 1
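
The constraint that both rates lie in [0, 1], combined with the identity, already bounds each precinct's unknowns before any model is fitted. A minimal base-R sketch of these deterministic bounds follows; `ei_bounds` is a hypothetical helper name, not a function from the ei package, and the inputs are the four rows of the example input.

```r
# Deterministic (accounting-identity) bounds for one or more precincts.
# ei_bounds is a hypothetical helper, not part of the ei package.
ei_bounds <- function(x, t) {
  # The formulas divide by x and (1 - x), so require 0 < x < 1 here.
  stopifnot(all(x > 0 & x < 1), all(t >= 0 & t <= 1))
  data.frame(
    betab_lower = pmax(0, (t - (1 - x)) / x),
    betab_upper = pmin(1, t / x),
    betaw_lower = pmax(0, (t - x) / (1 - x)),
    betaw_upper = pmin(1, t / (1 - x))
  )
}

x_obs <- c(0.20, 0.65, 0.10, 0.40)  # fraction Black
t_obs <- c(0.55, 0.48, 0.61, 0.52)  # turnout
bounds <- ei_bounds(x_obs, t_obs)
print(round(bounds, 3))
```

Precincts dominated by one group give narrow bounds for that group and wide bounds for the other, which is why the statistical model is needed to say anything more precise.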

Reads from #1 Feeds into #3

Key code

dbuf <- ei(formula = form, total = "n", data = matproii)

Inference

Summarize aggregate estimates

Uncertainty quantification — standard errors, intervals, and aggregation.

What happens here

Print the aggregate (district-wide) estimates Bb and Bw with their standard deviations, along with the aggregate bounds on betab and betaw.

Formula

\hat{B}^b = \frac{\sum_i X_i N_i \hat{\beta}_i^b}{\sum_i X_i N_i}
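
The formula is a weighted mean of the precinct-level estimates, with each precinct weighted by its number of Black residents X_i N_i. A short base-R sketch makes the arithmetic concrete; the `betab_hat` values are made up for illustration and are not output from ei.

```r
# Margins and sizes of the four example precincts.
x <- c(0.20, 0.65, 0.10, 0.40)
n <- c(800, 1200, 950, 700)

# Hypothetical precinct-level estimates, made up for illustration only.
betab_hat <- c(0.30, 0.45, 0.25, 0.40)

# Weight each precinct by its Black population X_i * N_i.
weights <- x * n
Bb_hat <- sum(weights * betab_hat) / sum(weights)

# weighted.mean() from base R computes the same quantity.
stopifnot(isTRUE(all.equal(Bb_hat, weighted.mean(betab_hat, weights))))
print(Bb_hat)
```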

Reads from #2 Feeds into #4

Key code

summary(dbuf)

Reporting

Read out precinct-level betas

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Use eiread() to extract the precinct-specific point estimates of betab and betaw for mapping or further analysis.

Reads from #3 Feeds into the final output

Key code

betab <- eiread(dbuf, "betab")
betaw <- eiread(dbuf, "betaw")
head(cbind(betab, betaw))
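
Before mapping, one might join the precinct-level estimates back onto the input rows and check that they behave like proportions. The sketch below uses stand-in vectors in place of real `eiread()` output: the `betab` values are hypothetical, and `betaw` is derived from the accounting identity so the example is self-consistent.

```r
# Stand-in inputs: in a real run, betab and betaw would come from eiread().
x <- c(0.20, 0.65, 0.10, 0.40)
t_obs <- c(0.55, 0.48, 0.61, 0.52)
betab <- c(0.30, 0.45, 0.25, 0.40)      # hypothetical values
betaw <- (t_obs - betab * x) / (1 - x)  # implied by the accounting identity

est <- data.frame(precinct = 1:4, x = x, t = t_obs,
                  betab = betab, betaw = betaw)

# Turnout implied by the estimates, for comparison with the observed margin.
est$t_implied <- est$betab * est$x + est$betaw * (1 - est$x)

# Sanity checks: estimates are proportions and reproduce observed turnout.
stopifnot(all(est$betab >= 0 & est$betab <= 1),
          all(est$betaw >= 0 & est$betaw <= 1),
          isTRUE(all.equal(est$t_implied, est$t)))
print(est)
```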

Output · what you get

Fig 1. Aggregate group-specific rates with their logical bounds and 95% intervals.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers

⚠️ Unofficial community showcase of ei (docs). Not affiliated with the authors — all credit to Gary King & coauthors; this summarizes public documentation.

What it does: The ei package implements Gary King's solution to the ecological inference problem: estimating unobserved individual-level relationships (e.g. how subgroups voted) using only aggregate data such as precinct or district marginals.

How it works: For 2x2 tables it combines the deterministic accounting identity (which bounds each unit's unknowns given its margins) with a statistical model: a truncated bivariate normal distribution over the unit-level parameters, estimated by maximum likelihood and simulation. This yields district-level and aggregate point estimates with uncertainty, while respecting the logical bounds each precinct imposes.

Assumptions: The model's distributional form (truncated bivariate normal), no spatial autocorrelation beyond what's modeled, and absence of certain aggregation bias; diagnostics and extended models address some violations.

Output: Estimated internal cell quantities (e.g. fraction of each group voting), with bounds and credible intervals, at both unit and aggregate levels.
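
To give a feel for the truncated bivariate normal mentioned under "How it works", the base-R sketch below draws (betab, betaw) pairs from a bivariate normal and keeps only those inside the unit square. This is a simple rejection sampler for illustration, not the package's estimation routine, and the mean vector and covariance matrix are hypothetical values chosen for the example.

```r
set.seed(1)

# Hypothetical parameters of the untruncated bivariate normal.
mu    <- c(0.4, 0.7)                      # means of (betab, betaw)
sigma <- matrix(c(0.04, 0.01,
                  0.01, 0.02), nrow = 2)  # covariance matrix

# Draw from the untruncated normal via the Cholesky factor of sigma.
n_draws <- 5000
z <- matrix(rnorm(2 * n_draws), ncol = 2)
draws <- sweep(z %*% chol(sigma), 2, mu, "+")

# Rejection step: keep only draws inside the unit square [0, 1] x [0, 1].
inside <- draws[, 1] >= 0 & draws[, 1] <= 1 &
          draws[, 2] >= 0 & draws[, 2] <= 1
beta <- draws[inside, , drop = FALSE]
colnames(beta) <- c("betab", "betaw")

print(colMeans(beta))
```

Rejection sampling is wasteful when much of the normal's mass lies outside the square, but it keeps the sketch short and dependency-free.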

What you get — Estimated internal cell quantities (e.g. group-specific turnout) with bounds, at unit and aggregate levels.

Example output

Maximum likelihood results in scale of estimation (and se's)
           [,1]   [,2]
Zb        0.314  0.184
Zw        0.733  0.114

Aggregate Bounds
         betab betaw
lower    0.025 0.601
upper    0.444 0.762

Estimates of Aggregate Quantities of Interest
        mean     sd
Bb     0.612  0.028
Bw     0.358  0.014

Links: package · paper
