Fills in missing values via fast bootstrap-EM multiple imputation, producing several complete datasets you analyze and combine.
Input · what goes in
A data frame with missing values (NA), optionally with declared time-series/cross-section ID variables.
| year | country | gdp | trade |
|---|---|---|---|
| 1990 | A | 5.2 | NA |
| 1991 | A | NA | 41.0 |
| 1990 | B | 3.1 | 28.4 |
| 1991 | B | 3.4 | NA |
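As a minimal sketch, the example table above can be built as an R data frame and its missingness inspected with base R before any imputation. The object names `toy`, `frac_missing`, and `n_complete` are hypothetical; the values are the ones shown in the table.

```r
# Toy panel from the table above; NA marks a missing cell.
toy <- data.frame(
  year    = c(1990, 1991, 1990, 1991),
  country = c("A", "A", "B", "B"),
  gdp     = c(5.2, NA, 3.1, 3.4),
  trade   = c(NA, 41.0, 28.4, NA)
)

# Share of missing values in each column.
frac_missing <- colMeans(is.na(toy))
print(frac_missing)

# Number of rows that would survive listwise deletion (complete cases only).
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

Only one of the four rows is fully observed, which illustrates how much data listwise deletion can discard and why imputing the missing cells may be preferable.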
Pipeline · the recipe
Load Amelia and the freetrade data
Data preparation — shapes the raw inputs into what the estimator expects.
Load the package and the bundled freetrade data, a country-year panel on trade policy in nine Asian countries with missing values in tariff and several other columns.
# Install: install.packages("Amelia")
library(Amelia)
data(freetrade)
summary(freetrade)
Run multiple imputation with m = 5
Imputation step: fills in the missing values; no model quantity is estimated yet.
Impute five completed datasets, declaring the panel/time structure with cs and ts.
a.out <- amelia(freetrade, m = 5, ts = "year", cs = "country")
Check imputation diagnostics
A pre-flight check — run this before trusting any estimate downstream.
Compare the observed and imputed densities and run an overimputation check to evaluate imputation quality.
compare.density(a.out, var = "tariff")
overimpute(a.out, var = "tariff")
Fit the model on each imputed dataset
The core estimate: where the quantity of interest is computed, once per imputed dataset.
Run the same regression on all five completed datasets and collect coefficients and standard errors.
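# Collect coefficients and standard errors, one row per imputed dataset.
b.out <- NULL
se.out <- NULL
for (i in seq_len(a.out$m)) {
  # Same analysis model on the i-th completed dataset.
  ols <- lm(tariff ~ polity + pop + gdp.pc + year + country,
            data = a.out$imputations[[i]])
  b.out <- rbind(b.out, coef(ols))
  # Column 2 of the coefficient table holds the standard errors.
  se.out <- rbind(se.out, coef(summary(ols))[, 2])
}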
Combine estimates with Rubin's rules
Uncertainty quantification — standard errors, intervals, and aggregation.
Use mi.meld() to pool the per-imputation estimates into a single point estimate and standard error.
combined <- mi.meld(q = b.out, se = se.out)
combined$q.mi
combined$se.mi
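To make the pooling step less of a black box, here is a minimal base-R sketch of Rubin's rules for point estimates and standard errors. The helper `pool_rubin()` is hypothetical and written for illustration; it is not part of Amelia, although on the same inputs it should agree with the documented behaviour of mi.meld(). The input numbers are made up and are not Amelia output.

```r
# Pool m estimates of each coefficient with Rubin's rules.
# b:  m x k matrix of point estimates (one row per imputed dataset)
# se: m x k matrix of the matching standard errors
pool_rubin <- function(b, se) {
  m <- nrow(b)
  q_bar   <- colMeans(b)         # pooled point estimate
  within  <- colMeans(se^2)      # average within-imputation variance
  between <- apply(b, 2, var)    # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  list(q = q_bar, se = sqrt(total))
}

# Made-up numbers for illustration only: three imputations, one coefficient.
b_toy  <- matrix(c(1.0, 1.2, 0.8), ncol = 1, dimnames = list(NULL, "x"))
se_toy <- matrix(c(0.5, 0.5, 0.5), ncol = 1, dimnames = list(NULL, "x"))

pooled <- pool_rubin(b_toy, se_toy)
print(pooled)
```

The pooled standard error is larger than any single-dataset standard error because the between-imputation term carries the extra uncertainty that comes from not knowing the missing values.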
Output · what you get
Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.
Result · the numbers
⚠️ Unofficial community showcase of Amelia (docs). Not affiliated with the authors; all credit to James Honaker, Gary King, and Matthew Blackwell. This summarizes public documentation.
What it does: Amelia (Amelia II) performs multiple imputation of missing data for cross-sectional, time-series, and time-series-cross-section datasets, generating m completed datasets whose differences reflect imputation uncertainty.
How it works: It assumes the data are jointly multivariate normal and missing at random, then uses an EMB (Expectation-Maximization with Bootstrapping) algorithm: it draws bootstrap resamples of the data and runs EM on each one. The authors describe this as considerably faster and more robust than MCMC-based approaches while giving comparable answers. It supports priors, transformations for skewed or bounded variables, and time and cross-section structure via polynomials of time and lags/leads.
Assumptions: Multivariate normality (after transformation) and missing at random (MAR).
Output: m completed datasets plus diagnostics (overimputation, density comparisons) to assess imputation quality. Analysts fit their model on each imputed dataset and combine results with Rubin's rules.
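The time-structure and transformation options mentioned above correspond to arguments of amelia(). The sketch below assumes Amelia is installed and uses the bundled freetrade data. The object names (`a.time`, `a.lag`, `a.log`) are hypothetical, and the specific choices (a quadratic time trend, a lag and lead of tariff, logging tariff) are illustrative rather than recommendations.

```r
library(Amelia)
data(freetrade)

# Quadratic time trend (polytime = 2) that is allowed to differ
# by country (intercs = TRUE). p2s = 0 silences progress output.
a.time <- amelia(freetrade, m = 5, ts = "year", cs = "country",
                 polytime = 2, intercs = TRUE, p2s = 0)

# Add a lag and a lead of tariff to the imputation model.
a.lag <- amelia(freetrade, m = 5, ts = "year", cs = "country",
                lags = "tariff", leads = "tariff", p2s = 0)

# Log-transform a skewed variable during imputation; the imputed
# values are returned on the original scale.
a.log <- amelia(freetrade, m = 5, ts = "year", cs = "country",
                logs = "tariff", p2s = 0)
```

Other arguments such as `sqrts`, `lgstc`, `noms`, `ords`, and `empri` follow the same pattern; which ones suit a given dataset is a modelling decision that the diagnostics can help assess.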
What you get — m completed datasets (imputations) plus diagnostics; analyze each and pool with Rubin's rules.
Example output of summary(a.out)
Amelia output with 5 imputed datasets.
Return code: 1
Message: Normal EM convergence.
Chain Lengths:
--------------
Imputation 1: 17
Imputation 2: 20
Imputation 3: 16
Imputation 4: 18
Imputation 5: 19
Rows after Listwise Deletion: 96
Rows after Imputation: 171
Patterns of missingness in the data: 8
Fraction Missing for original variables:
-----------------------------------------
Fraction Missing
tariff 0.34502924
polity 0.01169591
intresmi 0.07602339
signed 0.01754386
fiveop 0.10526316
