StatsOtter Causal inference workflows
Workflow·4 steps

Synthetic control for comparative case studies (Synth)

Summary by StatsOtter

Build a weighted 'synthetic' control from untreated units to estimate the effect of a single treated case over time.

1

Input · what goes in

Panel data: unit IDs, time periods, an outcome, and predictor variables; one treated unit and a donor pool.

unit  year  gdp  invest
1     1990  100  22
1     1991  104  23
2     1990   88  19
2     1991   90  20
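Before handing a panel like this to dataprep(), it can help to confirm its shape. The following is a minimal base-R sketch on the four example rows above; the object names (panel, n_duplicates, is_balanced, unit_is_numeric) are illustrative and not part of Synth.

```r
# Toy panel copied from the example rows (long format: one row per unit-year).
panel <- data.frame(
  unit   = c(1, 1, 2, 2),
  year   = c(1990, 1991, 1990, 1991),
  gdp    = c(100, 104, 88, 90),
  invest = c(22, 23, 19, 20)
)

# Each unit-year pair should appear exactly once.
n_duplicates <- sum(duplicated(panel[, c("unit", "year")]))

# A balanced panel has the same number of periods for every unit.
periods_per_unit <- table(panel$unit)
is_balanced <- length(unique(as.vector(periods_per_unit))) == 1

# dataprep() expects the unit identifier to be numeric, so check that early.
unit_is_numeric <- is.numeric(panel$unit)

print(c(duplicates = n_duplicates,
        balanced = is_balanced,
        numeric_id = unit_is_numeric))
```

If any of these checks fails on real data, fixing the panel first is usually easier than debugging errors raised later in the pipeline.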
2

Pipeline · the recipe


1
Data prep

Prepare the panel with dataprep()

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

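Package the Basque panel into the matrices synth() expects: X1 and X0 hold the predictor values for the treated unit and the donors, and Z1 and Z0 hold their pre-treatment outcomes. The Basque Country (region 17) is treated, and the other 16 regions (2 to 16 and 18) form the donor pool; region 1 is the national aggregate for Spain and is left out.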

Reads from the input data Feeds into #2
Key code
# Install:  install.packages("Synth")
library(Synth)
data(basque)
dp <- dataprep(
  foo = basque,
  predictors = c("school.illit","school.prim","school.med",
                 "school.high","school.post.high","invest"),
  predictors.op = "mean",
  dependent = "gdpcap",
  unit.variable = "regionno",
  time.variable = "year",
  treatment.identifier = 17,
  controls.identifier = c(2:16,18),
  time.predictors.prior = 1964:1969,
  time.optimize.ssr = 1960:1969,
  unit.names.variable = "regionname",
  time.plot = 1955:1997)
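To make the X and Z matrices less abstract, here is a small base-R sketch that builds them by hand for a made-up three-unit panel. It only mimics the idea behind predictors.op = "mean" and the pre-treatment outcome window; the toy data and all object names are assumptions for illustration, and dataprep() itself handles more cases, such as separate windows for predictors and outcomes.

```r
# Made-up panel: unit 1 is treated, units 2 and 3 are donors.
panel <- data.frame(
  unit   = rep(1:3, each = 3),
  year   = rep(1990:1992, times = 3),
  gdp    = c(100, 104, 108,  88, 90, 92,  120, 121, 125),
  invest = c(22, 23, 24,  19, 20, 21,  30, 31, 32)
)
treated   <- 1
donors    <- c(2, 3)
pre_years <- 1990:1991

pre <- panel[panel$year %in% pre_years, ]

# Predictor means over the pre-treatment window, one value per unit.
invest_means <- tapply(pre$invest, pre$unit, mean)
X1 <- matrix(invest_means[as.character(treated)], nrow = 1,
             dimnames = list("invest", as.character(treated)))
X0 <- matrix(invest_means[as.character(donors)], nrow = 1,
             dimnames = list("invest", as.character(donors)))

# Pre-treatment outcomes: one row per year, one column per unit.
Z  <- tapply(pre$gdp, list(pre$year, pre$unit), mean)
Z1 <- Z[, as.character(treated), drop = FALSE]
Z0 <- Z[, as.character(donors), drop = FALSE]

print(list(X1 = X1, X0 = X0, Z1 = Z1, Z0 = Z0))
```

In the toy layout, X matrices have one row per predictor and Z matrices one row per pre-treatment period, with donors in columns.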

Reference / docs ↗

2
Estimation

Optimize the donor weights

The core estimate — where the causal quantity itself is computed.

What happens here

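synth() solves a nested optimization. For a given set of predictor weights V, it finds the unit weights W that minimize the V-weighted distance between the treated unit's predictors and those of the synthetic control (the formula below). V itself is chosen so that the resulting W gives the smallest pre-treatment outcome error over the time.optimize.ssr window.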

Formula
W^{*} = \arg\min_{W}\;(X_1 - X_0 W)' V (X_1 - X_0 W),\quad w_j\ge 0,\;\textstyle\sum_j w_j = 1
Reads from #1 Feeds into #3
Key code
so <- synth(data.prep.obj = dp)
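As a rough illustration of the inner problem in the formula above, the sketch below finds W for a fixed V on made-up numbers. It is not how synth() is implemented: the softmax reparameterization and the call to optim() are stand-ins chosen here to keep the weights non-negative and summing to one, and X0, X1, and V are toy values.

```r
# Two predictors (rows) and two donors (columns); values are invented.
X0 <- matrix(c(10, 2,    # donor 1
               20, 6),   # donor 2
             nrow = 2)
# Treated unit built as an exact 0.7 / 0.3 mix of the donors.
X1 <- 0.7 * X0[, 1] + 0.3 * X0[, 2]
V  <- diag(2)  # equal predictor weights, held fixed in this sketch

# Map unconstrained parameters to weights on the simplex.
to_weights <- function(theta) exp(theta) / sum(exp(theta))

# Objective from the formula: (X1 - X0 W)' V (X1 - X0 W).
loss <- function(theta) {
  d <- X1 - X0 %*% to_weights(theta)
  as.numeric(t(d) %*% V %*% d)
}

fit   <- optim(c(0, 0), loss, method = "BFGS")
w_hat <- to_weights(fit$par)
print(round(w_hat, 3))
```

Because the treated unit lies exactly inside the donors' convex hull here, the recovered weights should sit close to 0.7 and 0.3; with real data the fit is usually only approximate.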

Reference / docs ↗

3
Reporting

Donor weights and predictor balance tables

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

synth.tab() returns several tables, including tab.w (donor weights) and tab.pred (predictor means for the treated unit, the synthetic control, and the donor-pool sample).

Reads from #2 Feeds into #4
Key code
tabs <- synth.tab(dataprep.res = dp, synth.res = so)
tabs$tab.w     # donor weights
tabs$tab.pred  # predictor balance
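The balance table can be read as three simple computations on the predictor matrices. The sketch below reproduces that layout on invented numbers so each column is easy to trace; X1, X0, and w are toy values, and the column names are chosen here to mirror the Treated / Synthetic / Sample Mean headings in the example output.

```r
# Invented predictor values: rows are predictors, columns are units.
X1 <- matrix(c(13, 3.2), ncol = 1,
             dimnames = list(c("pred_a", "pred_b"), "treated"))
X0 <- matrix(c(10, 2,  20, 6,  40, 1), nrow = 2,
             dimnames = list(c("pred_a", "pred_b"), c("d1", "d2", "d3")))
w  <- c(d1 = 0.7, d2 = 0.3, d3 = 0)  # assumed donor weights

balance <- cbind(
  Treated       = X1[, 1],
  Synthetic     = as.vector(X0 %*% w),  # weighted donor average
  `Sample Mean` = rowMeans(X0)          # unweighted donor average
)
print(round(balance, 3))
```

A Synthetic column that sits closer to Treated than Sample Mean does suggests the weights are doing useful work; large remaining differences are a warning sign about pre-treatment fit.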

Reference / docs ↗

4
Reporting

Path and gaps plots

Reporting — turn the numbers into a figure or table a reader can act on.

What happens here

Visualize treated vs synthetic GDP over time and the estimated treatment gap after 1970.

Reads from #3 Feeds into the final output
Key code
path.plot(synth.res = so, dataprep.res = dp,
          Ylab = "real per-capita GDP", Xlab = "year")
gaps.plot(synth.res = so, dataprep.res = dp,
          Ylab = "gap in per-capita GDP", Xlab = "year")

Reference / docs ↗

3

Output · what you get (2 figures)

Fig 1. Observed California cigarette sales vs its synthetic control, diverging after the 1988 Proposition 99 law (reproduced on the canonical Prop 99 data).
Fig 2. Gaps plot: the difference between observed and synthetic California sales, widening after Proposition 99.

Figures reproduced from the package's official documentation — unofficial community showcase; all credit to the original authors.

Result · the numbers

\hat\tau_{1t}=Y_{1t}-\sum_{j=2}^{J+1} w_j^{*}\,Y_{jt},\qquad w_j^{*}\ge 0,\ \ \textstyle\sum_j w_j^{*}=1
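The estimator above is just the treated outcome minus a weighted average of donor outcomes, period by period. Here is a minimal worked example in base R with invented outcomes and assumed weights. With fitted Synth objects the analogous quantity would be dp$Y1plot - dp$Y0plot %*% so$solution.w.

```r
years <- 1990:1993
Y1 <- c(100, 104, 110, 118)        # treated unit's outcome (invented)
Y0 <- cbind(d1 = c(96, 100, 102, 104),   # donor outcomes (invented)
            d2 = c(104, 108, 110, 112))
w  <- c(d1 = 0.5, d2 = 0.5)        # assumed donor weights

# Weights must be non-negative and sum to one.
stopifnot(all(w >= 0), abs(sum(w) - 1) < 1e-12)

synthetic <- as.vector(Y0 %*% w)   # synthetic control path
gap <- Y1 - synthetic              # estimated effect in each period
names(gap) <- years

post <- years >= 1992              # treatment assumed to start in 1992
mean_post_gap <- mean(gap[post])
print(gap)
print(mean_post_gap)
```

In this toy case the gap is zero before treatment, which is what a well-fitting synthetic control should look like, and the post-treatment gaps are read as the effect path.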

⚠️ Unofficial community showcase of Synth (docs). Not affiliated with the authors. All credit to Abadie, Diamond & Hainmueller; this summarizes public documentation.

What it does. When one unit (a state, country, firm) is treated, Synth constructs a counterfactual by optimally weighting comparison units so the synthetic unit matches the treated unit's pre-treatment outcomes and predictors; the post-treatment gap is the estimated effect.

How it works. It solves a nested optimization for non-negative weights that sum to one, minimizing pre-period discrepancy, then projects the synthetic control forward.

Assumptions. No interference/spillovers to controls, no anticipation, and a convex combination of donors that tracks the treated unit pre-treatment.

Imbens's contribution. The Synth package implements Abadie, Diamond & Hainmueller (2010); Imbens is a leading figure in this design-based panel literature and co-developed Synthetic Difference-in-Differences (Arkhangelsky, Athey, Hirshberg, Imbens & Wager, 2021), which unifies synthetic control with difference-in-differences.
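The convex-combination assumption can be illustrated with a one-predictor toy case: when the treated unit's value lies between the donors' values a perfect match exists, and when it lies outside their range no set of non-negative weights summing to one can reach it. The grid search, donor values, and function name below are all assumptions made for this sketch.

```r
donor_values <- c(d1 = 10, d2 = 20)  # invented donor predictor values

# Best convex combination of the two donors for a given target value.
best_fit <- function(target) {
  grid <- seq(0, 1, by = 0.01)  # weight on d1; d2 receives 1 - weight
  synthetic <- grid * donor_values[["d1"]] +
    (1 - grid) * donor_values[["d2"]]
  i <- which.min(abs(target - synthetic))
  c(w_d1 = grid[i], error = abs(target - synthetic[i]))
}

inside  <- best_fit(13)  # target within the donor range
outside <- best_fit(25)  # target above every donor
print(rbind(inside, outside))
```

The second case ends up with all weight on the closest donor and a residual that cannot be removed, which is why a treated unit with extreme pre-treatment values tends to get a poorly fitting synthetic control.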

What you get — Donor weights and a treated-vs-synthetic outcome gap (the estimated treatment effect path).

Example output

$tab.pred
                          Treated Synthetic Sample Mean
school.illit               39.888   256.337     170.786
school.prim              1031.742  2730.104    1127.186
school.med                 90.359   223.340      76.260
school.high                25.728    63.437      24.235
invest                     24.647    21.583      21.424

$tab.w
        w.weights unit.names unit.numbers
2           0.000  Andalucia            2
10          0.851  Cataluna            10
14          0.149     Madrid           14

Links: package · paper
