Flags whether a counterfactual question is a safe interpolation or a model-dependent extrapolation far from your data.
Input · what goes in
Observed covariates plus one or more counterfactual covariate combinations to evaluate.
| row | gdpcap | democ |
|---|---|---|
| observed | 4200 | 6 |
| observed | 8800 | 9 |
| observed | 1500 | 2 |
| counterfactual (cfact) | 5000 | 8 |
Pipeline · the recipe
Load data and define the counterfactual
Data preparation: puts the raw inputs into the form whatif() expects.
Take the observed covariates and specify the counterfactual covariate values you want to evaluate.
# Install: install.packages("WhatIf")
library(WhatIf)
data("unga", package = "WhatIf")
cf <- data.frame(gdpcap = 5000, democ = 8)
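If you want to evaluate several scenarios at once, a minimal sketch is to build the counterfactuals as a grid with base R's `expand.grid()`. The grid values and the name `cf_grid` are hypothetical; the only assumption carried over from the example is that the columns are named after the covariates in the `whatif()` formula (`gdpcap` and `democ`).

```r
# Hypothetical grid of counterfactual scenarios: every combination of
# three income levels and two democracy scores (values are illustrative).
cf_grid <- expand.grid(
  gdpcap = c(2000, 5000, 9000),
  democ  = c(2, 8)
)

# Column names must match the covariates named in the whatif() formula.
stopifnot(identical(names(cf_grid), c("gdpcap", "democ")))

print(cf_grid)  # one row per scenario
```

The result has six rows and could be passed as `cfact = cf_grid` in place of the single-row `cf`, so that each scenario gets its own hull and distance diagnostics.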
Run the convex-hull / distance test
A pre-flight check — run this before trusting any estimate downstream.
whatif() reports convex-hull membership and Gower-distance support for each counterfactual point.
wi <- whatif(formula = ~ gdpcap + democ, data = unga, cfact = cf)
summary(wi)
Output · what you get
Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.
Result · the numbers
⚠️ Unofficial community showcase of WhatIf (docs). Not affiliated with the authors — all credit to Heather Stoll, Gary King & Langche Zeng; this summarizes public documentation.
What it does. Before you trust a predicted counterfactual, WhatIf (King & Zeng 2006) asks whether that covariate combination is actually supported by the data, or whether the answer is an extrapolation that depends heavily on the model's functional form.
How it works. whatif() checks whether each counterfactual point lies inside the convex hull of the observed covariates (an exact linear-programming test), and computes Gower-distance summaries — the share of data nearby and the mean distance — so you can rank counterfactuals from well-supported to far-off.
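For the two-covariate case used here, the hull test can be illustrated without linear programming. The sketch below deliberately swaps in a different technique from the package's general-dimension LP test: it takes the 2-D hull from base R's `grDevices::chull()` and checks that the point lies on the same side of every hull edge. The helper `in_hull_2d` is hypothetical, and the three observed rows are the ones from the input table.

```r
# Sketch only: 2-D convex-hull membership via grDevices::chull() and edge
# cross products. This is NOT the package's algorithm, which uses linear
# programming and works in any number of dimensions.
in_hull_2d <- function(X, p, tol = 1e-9) {
  X <- as.matrix(X)
  h <- grDevices::chull(X[, 1], X[, 2])     # hull vertex indices (clockwise)
  stopifnot(length(h) >= 3)                 # need a non-degenerate polygon
  V <- X[h, , drop = FALSE]                 # hull vertices in order
  W <- V[c(2:nrow(V), 1), , drop = FALSE]   # successor of each vertex
  # z-component of (W - V) x (p - V) for every hull edge
  cross <- (W[, 1] - V[, 1]) * (p[2] - V[, 2]) -
           (W[, 2] - V[, 2]) * (p[1] - V[, 1])
  # Inside or on the boundary iff p is on the same side of every edge
  all(cross <= tol) || all(cross >= -tol)
}

obs <- data.frame(gdpcap = c(4200, 8800, 1500), democ = c(6, 9, 2))
in_hull_2d(obs, c(5000, 8))  # FALSE: the counterfactual from the table
in_hull_2d(obs, c(4800, 6))  # TRUE: a point between the observations
```

A point exactly on the hull boundary counts as inside here. The tolerance `tol` is absolute, so it may need adjusting when covariates are on very different scales.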
Assumptions. It is model-free: a purely geometric statement about how far a counterfactual sits from the data, used to gauge how model-dependent any downstream inference would be.
What you get. For each counterfactual: whether it lies inside the convex hull, the percentage of observed data nearby, and the mean distance to the data.
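The distance summaries can be sketched for numeric covariates as follows. The Gower distance between two rows averages |x_k - y_k| / r_k over the K covariates, where r_k is the range of covariate k in the observed data. As a loudly labeled assumption, this sketch uses the mean pairwise Gower distance among observed rows as the "nearby" cutoff; that approximates, but may not match, the package's own geometric-variability cutoff. The function name `gower_support` is hypothetical.

```r
# Sketch: Gower-distance support for one counterfactual (numeric covariates
# only). ASSUMPTION: "nearby" means within the mean pairwise Gower distance
# among observed rows, standing in for the package's own cutoff.
gower_support <- function(X, p) {
  X <- as.matrix(X)
  r <- apply(X, 2, function(v) diff(range(v)))  # range of each covariate
  # Gower distance from each observed row to p: mean over k of |x_k - p_k| / r_k
  d <- rowMeans(sweep(abs(sweep(X, 2, p, "-")), 2, r, "/"))
  # Pairwise Gower distances among observed rows, used for the cutoff
  pair <- as.matrix(dist(sweep(X, 2, r, "/"), method = "manhattan")) / ncol(X)
  cutoff <- mean(pair[upper.tri(pair)])
  list(pct_nearby = 100 * mean(d <= cutoff),  # percent of data "nearby"
       mean_dist  = mean(d))                  # mean Gower distance to data
}

obs <- data.frame(gdpcap = c(4200, 8800, 1500), democ = c(6, 9, 2))
res <- gower_support(obs, c(5000, 8))
res$pct_nearby  # about 66.7 (two of three observed rows are nearby)
res$mean_dist   # about 0.399
```

Ranking several counterfactuals by `pct_nearby` (higher is better supported) or `mean_dist` (lower is better supported) gives the well-supported-to-far-off ordering described above.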
Example output
Call:
whatif(formula = ~gdpcap + democ, data = unga, cfact = ...)
Counterfactual 1: in convex hull = FALSE
Percent of data nearby = 12.4%
Mean distance to data = 0.318
