Workflow·2 steps

Is your counterfactual an extrapolation? (WhatIf)

Summary by StatsOtter

Flags when a counterfactual question is a safe interpolation versus a model-dependent extrapolation far from your data.

1

Input · what goes in

Observed covariates plus one or more counterfactual covariate combinations to evaluate.

Data format & example
gdpcap democ
4200 6
8800 9
1500 2
cfact: 5000 8
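
As a rough illustration, the toy rows above could be held in R as two data frames with matching column names. The object names `obs` and `cf` are placeholders chosen here for the example, not something the package requires.

```r
# Observed covariates from the toy example (one row per observation)
obs <- data.frame(
  gdpcap = c(4200, 8800, 1500),
  democ  = c(6, 9, 2)
)

# Counterfactual combination(s) to evaluate; add rows for more than one
cf <- data.frame(gdpcap = 5000, democ = 8)

# Both frames should carry the same covariates in the same order
stopifnot(identical(names(obs), names(cf)))
```

Keeping the counterfactuals in their own data frame makes it easy to evaluate several "what if" combinations in one call by adding rows.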

2

Pipeline · the recipe


1
Data prep

Load data and define the counterfactual

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

Take the observed covariates and specify the counterfactual covariate values you want to evaluate.

Reads from: the input data · Feeds into: #2
Key code
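# Install once:  install.packages("WhatIf")
library(WhatIf)
# Observed covariates, loaded as in the source example
# (assumes `unga` is available and has gdpcap and democ columns)
data("unga", package = "WhatIf")
# One row per counterfactual; columns named like the observed covariates
cf <- data.frame(gdpcap = 5000, democ = 8)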


2
Diagnostic / pre-tests

Run the convex-hull / distance test

A pre-flight check — run this before trusting any estimate downstream.

What happens here

whatif() reports convex-hull membership and Gower-distance support for each counterfactual point.

Reads from: #1 · Feeds into: the final output
Key code
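# Test each counterfactual row against the observed covariates named in the formula
wi <- whatif(formula = ~ gdpcap + democ, data = unga, cfact = cf)
# Print hull membership and distance summaries for each counterfactual
summary(wi)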


3

Output · what you get

Fig 1. Counterfactual points inside the convex hull of the data are safe interpolations; points outside are model-dependent extrapolations.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers

\[
x_{\text{cf}} \in \mathrm{conv}(X) \iff \exists\, w_i \ge 0,\ \sum_i w_i = 1 :\ \sum_i w_i x_i = x_{\text{cf}}
\]
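
For the three toy observations in the input example, this condition can be checked by hand, because three non-collinear points in two dimensions determine the weights uniquely. The sketch below solves for them in base R. It only illustrates the definition and is not how WhatIf performs the test (the general case needs a linear program); the names `X`, `x_cf`, `w` and `in_hull` are chosen here for illustration.

```r
# Toy observations (rows) and the counterfactual point
X    <- rbind(c(4200, 6), c(8800, 9), c(1500, 2))
x_cf <- c(5000, 8)

# Solve sum_i w_i x_i = x_cf together with sum_i w_i = 1.
# With 3 non-collinear points in 2-D this 3x3 system has a unique solution.
A <- rbind(t(X), 1)   # rows: gdpcap, democ, sum-to-one constraint
b <- c(x_cf, 1)
w <- solve(A, b)

# The point is in the hull only if every weight is non-negative
# (small tolerance for floating-point error)
in_hull <- all(w >= -1e-12)
print(round(w, 3))
print(in_hull)
```

Two of the three weights come out negative here, so the toy counterfactual falls outside the triangle spanned by the observations. With more observations than dimensions plus one, the weights are no longer unique and a linear-programming feasibility check is needed instead.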

⚠️ Unofficial community showcase of WhatIf. Not affiliated with the authors; all credit to Heather Stoll, Gary King & Langche Zeng. This page summarizes public documentation.

What it does. Before trusting a predicted counterfactual, WhatIf (King & Zeng 2006) asks whether that covariate combination is actually supported by the data or whether the answer is an extrapolation that depends entirely on the model's functional form.

How it works. whatif() checks whether each counterfactual point lies inside the convex hull of the observed covariates (an exact linear-programming test), and computes Gower-distance summaries — the share of data nearby and the mean distance — so you can rank counterfactuals from well-supported to far-off.
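
To make the distance summaries concrete, here is a minimal base-R sketch of a Gower-type distance for numeric covariates: the absolute difference on each variable, scaled by that variable's observed range, then averaged across variables. It is a sketch of the general idea under those assumptions, not the package's code; WhatIf's own handling of ranges, missing data and the "nearby" threshold may differ, and `gower_to_data` is a hypothetical helper name.

```r
# Hypothetical helper: Gower-type distance from one point to every data row.
# Assumes all covariates are numeric and none has zero range.
gower_to_data <- function(X, x_cf) {
  X <- as.matrix(X)
  rng <- apply(X, 2, function(col) diff(range(col)))    # observed range per covariate
  scaled <- sweep(abs(sweep(X, 2, x_cf)), 2, rng, "/")  # |x_ik - x_cf,k| / range_k
  unname(rowMeans(scaled))                              # average over covariates
}

obs <- data.frame(gdpcap = c(4200, 8800, 1500), democ = c(6, 9, 2))
d <- gower_to_data(obs, c(5000, 8))

print(round(d, 3))  # distance to each observed row
print(mean(d))      # mean distance to the data
```

Scaling by the range puts covariates measured in very different units (here GDP per capita and a democracy score) on a comparable 0 to 1 footing before averaging.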

Assumptions. It is model-free: a purely geometric statement about how far a counterfactual sits from the data, used to gauge how model-dependent any downstream inference would be.

What you get. For each counterfactual: whether it is inside the convex hull, the percent of data nearby, and the mean distance to the data.

Example output

Call:
whatif(formula = ~gdpcap + democ, data = unga, cfact = ...)

Counterfactual 1:  in convex hull = FALSE
  Percent of data nearby = 12.4%
  Mean distance to data  = 0.318

