"Who Are You?" predicts an individual's probable race/ethnicity from surname, first/middle name, and geolocation using Bayesian (BISG) updating.
Input · what goes in
A data frame of individuals with surname (and optionally first/middle name) plus geographic identifiers (state, county, tract, or block FIPS codes).
| surname | state | county | tract |
|---|---|---|---|
| Smith | NJ | 021 | 000100 |
| Garcia | CA | 037 | 207103 |
| Nguyen | TX | 201 | 412900 |
| Lee | NY | 061 | 010300 |
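
A minimal sketch of building such an input in base R, assuming the column names from the table above. The object name `my_voters` is hypothetical. Storing the FIPS codes as character strings keeps leading zeros intact (for example, county `021`), which numeric columns would silently drop.

```r
# Hypothetical input mirroring the table above; all codes are stored as
# character strings so that leading zeros (e.g. county "021") survive.
my_voters <- data.frame(
  surname = c("Smith", "Garcia", "Nguyen", "Lee"),
  state   = c("NJ", "CA", "TX", "NY"),
  county  = c("021", "037", "201", "061"),
  tract   = c("000100", "207103", "412900", "010300"),
  stringsAsFactors = FALSE
)

# Basic width checks: 3-character county codes, 6-character tract codes.
stopifnot(all(nchar(my_voters$county) == 3),
          all(nchar(my_voters$tract) == 6))
str(my_voters)
```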
Pipeline · the recipe
Load wru and the voter file
Data preparation — shapes the raw inputs into what the estimator expects.
Load the package and the bundled example voter file containing surnames and geographic identifiers.
# Install: install.packages("wru")
library(wru)
data(voters)
Predict race with BISG
The core estimate: where the posterior race probabilities are computed.
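Call predict_race() to combine surname-based race probabilities with Census tract-level racial composition (Bayesian Improved Surname Geocoding). In the call below, party = "PID" additionally conditions the prediction on the party registration column, and the Census API key is read from the CENSUS_API_KEY environment variable.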
predict_race(voter.file = voters, census.geo = "tract",
census.key = Sys.getenv("CENSUS_API_KEY"), party = "PID")
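The update that this step describes can be sketched by hand in base R for a single person. This is only an illustration of Bayes' rule under the conditional independence assumption, not the package's internal code, and every number below is made up rather than taken from Census tables.

```r
# Toy BISG update for one person. All numbers are hypothetical and are
# not drawn from any Census table.

# P(race | surname): surname-based prior over the five categories.
p_race_given_surname <- c(whi = 0.05, bla = 0.01, his = 0.90,
                          asi = 0.02, oth = 0.02)

# P(tract | race): share of each group's population that lives in
# this person's tract.
p_geo_given_race <- c(whi = 0.002, bla = 0.001, his = 0.004,
                      asi = 0.001, oth = 0.002)

# Bayes' rule, assuming surname and geography are independent given race:
# P(race | surname, tract) is proportional to
# P(race | surname) * P(tract | race).
unnormalized <- p_race_given_surname * p_geo_given_race
posterior <- unnormalized / sum(unnormalized)

round(posterior, 4)
```

Normalizing by the sum is what makes the five posterior values add up to one for each person.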
Inspect posterior probabilities
Reporting — turn the numbers into a figure or table a reader can act on.
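The returned data.frame appends one posterior probability column per category (pred.whi, pred.bla, pred.his, pred.asi, pred.oth); the five values sum to one for each voter. The call below sets surname.only = TRUE, so it predicts from surnames alone and needs no Census API key.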
head(predict_race(voter.file = voters, surname.only = TRUE))
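Two quick checks are often useful when inspecting the result: that each row of probabilities sums to one, and which category is most probable for each person. The sketch below runs without the package by copying three rows from the example output shown on this page; the names `preds`, `prob_cols`, and `top_category` are hypothetical.

```r
# Three rows copied from the example output shown on this page.
preds <- data.frame(
  surname  = c("Khanna", "Velasco", "Fifield"),
  pred.whi = c(0.0676, 0.0594, 0.9356),
  pred.bla = c(0.0043, 0.0026, 0.0022),
  pred.his = c(0.0082, 0.8227, 0.0285),
  pred.asi = c(0.8668, 0.1051, 0.0078),
  pred.oth = c(0.0531, 0.0102, 0.0259)
)

prob_cols <- c("pred.whi", "pred.bla", "pred.his", "pred.asi", "pred.oth")

# Each row should sum to one, up to rounding in the printed values.
row_totals <- rowSums(preds[, prob_cols])
stopifnot(all(abs(row_totals - 1) < 1e-3))

# Most probable category per person (a convenience label, not a certainty).
prob_matrix <- as.matrix(preds[, prob_cols])
preds$top_category <- prob_cols[max.col(prob_matrix, ties.method = "first")]
preds[, c("surname", "top_category")]
```

A hard label like `top_category` discards the uncertainty in the posterior, so it is best treated as a reading aid rather than as an input to further analysis.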
Output · what you get
Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.
Result · the numbers
⚠️ Unofficial community showcase of wru (docs). Not affiliated with the authors — all credit to Kosuke Imai & coauthors; this summarizes public documentation.
What it does: wru (Who Are You) produces probabilistic predictions of an individual's racial/ethnic category when race is unobserved, as is common in voter files, administrative records, and audits of disparities.
How it works: it applies Bayesian Improved Surname Geocoding (BISG), combining Census surname (and optionally first/middle-name) race distributions with the racial composition of the person's geographic unit (state, county, tract, or block) via Bayes' rule. Newer versions add fully Bayesian name-and-geography models (fBISG). The core call predict_race() returns, per person, posterior probabilities of being White, Black, Hispanic, Asian, or Other.
Assumptions: accuracy depends on the conditional independence of surname and geography given race, on correct and current Census reference tables, and on representative geocoding. Predictions are population-level probabilities, not certainties, and can be biased for groups or regions where the reference data fit poorly.
What you get — Per-individual posterior probabilities for each racial/ethnic category (pred.whi, pred.bla, pred.his, pred.asi, pred.oth).
Example output
| surname | state | county | tract | age | sex | pred.whi | pred.bla | pred.his | pred.asi | pred.oth |
|---|---|---|---|---|---|---|---|---|---|---|
| Khanna | NJ | 021 | 004000 | 29 | 0 | 0.0676 | 0.0043 | 0.0082 | 0.8668 | 0.0531 |
| Imai | NJ | 021 | 004501 | 40 | 0 | 0.0812 | 0.0024 | 0.0689 | 0.7375 | 0.1100 |
| Velasco | NY | 061 | 004800 | 33 | 0 | 0.0594 | 0.0026 | 0.8227 | 0.1051 | 0.0102 |
| Fifield | NJ | 021 | 004501 | 27 | 0 | 0.9356 | 0.0022 | 0.0285 | 0.0078 | 0.0259 |
| Zhou | NJ | 021 | 004501 | 28 | 1 | 0.0098 | 0.0018 | 0.0007 | 0.9820 | 0.0058 |
| Ratkovic | NJ | 021 | 004000 | 35 | 0 | 0.9187 | 0.0108 | 0.0108 | 0.0108 | 0.0488 |
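
Because the predictions are meant to be read at the population level, one common way to use them is to sum the probabilities rather than assign each person a single label. The sketch below does this for the six example rows above, using only base R; the names `post`, `expected_counts`, and `expected_shares` are hypothetical, and six people are far too few for a meaningful estimate.

```r
# Posterior probabilities copied from the six example rows above.
post <- rbind(
  Khanna   = c(0.0676, 0.0043, 0.0082, 0.8668, 0.0531),
  Imai     = c(0.0812, 0.0024, 0.0689, 0.7375, 0.1100),
  Velasco  = c(0.0594, 0.0026, 0.8227, 0.1051, 0.0102),
  Fifield  = c(0.9356, 0.0022, 0.0285, 0.0078, 0.0259),
  Zhou     = c(0.0098, 0.0018, 0.0007, 0.9820, 0.0058),
  Ratkovic = c(0.9187, 0.0108, 0.0108, 0.0108, 0.0488)
)
colnames(post) <- c("pred.whi", "pred.bla", "pred.his", "pred.asi", "pred.oth")

# Expected number of people in each category: sum the probabilities
# instead of counting hard assignments.
expected_counts <- colSums(post)

# Expected shares across the six people.
expected_shares <- expected_counts / nrow(post)
round(expected_shares, 3)
```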
