StatsOtter Causal inference workflows
Workflow · 2 steps

Sensitive questions, protected answers (rr)

Summary by StatsOtter

Regression for randomized-response surveys — recover predictors of a sensitive behavior while every respondent's individual answer stays private.

1. Input · what goes in

Survey data with the randomized-response item and predictor covariates.

rr.q1  cov.asset.index  cov.married  cov.age
    1              0.4            1       41
    0             -1.1            0       33
    1              0.9            1       52
    0              0.2            0       29
2. Pipeline · the recipe


Step 1 · Data prep

Load the survey data

Data preparation — shapes the raw inputs into what the estimator expects.

What happens here

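Load the nigeria survey data bundled with rr. Its randomized-response item, rr.q1, was collected under a forced-response design whose probabilities are known (design = "forced-known"); step 2 passes those probabilities to rrreg().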

Reads from the input data · Feeds into #2
Key code
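# Install once if needed: install.packages("rr")
library(rr)

# Load the Nigeria randomized-response survey that ships with rr
data(nigeria)

# Fix the RNG seed so results are reproducible
set.seed(1)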
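The step description mentions the design probabilities, but the key code above does not define them. A minimal base-R sketch, assuming the values used later in the rrreg() call (p = 2/3, p1 = 1/6, p0 = 1/6) and reading them as the probabilities of a truthful answer, a forced "yes", and a forced "no"; the pr_yes_* names are illustrative only:

```r
# Forced-response design probabilities (values taken from the rrreg() call in step 2)
p  <- 2/3  # probability the device asks for a truthful answer
p1 <- 1/6  # probability the device forces a "yes"
p0 <- 1/6  # probability the device forces a "no"

# The three device outcomes are exhaustive, so they must sum to 1
stopifnot(isTRUE(all.equal(p + p1 + p0, 1)))

# Probability of an observed "yes", by the respondent's true answer
# (illustrative names, not part of the rr package)
pr_yes_given_true_yes <- p + p1  # truthful "yes" or forced "yes"
pr_yes_given_true_no  <- p1      # only a forced "yes"
```

Because neither probability is 0 or 1, an observed "yes" never reveals an individual's true answer, which is the privacy guarantee the design relies on.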

Step 2 · Estimation

Fit the randomized-response regression

The core estimate: where the regression coefficients for the latent sensitive trait are computed.

What happens here

rrreg() uses the known design probabilities to separate the randomization noise from the observed answers, then fits a logistic regression for the latent sensitive trait.

Formula
\Pr(Z_i^*=1\mid x_i)=\mathrm{logit}^{-1}(x_i^\top\beta),\quad Y_i=\text{noisy RR of }Z_i^*
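For the forced-response design used in this workflow, the "noisy RR" link can be written out explicitly. The following is a sketch, assuming p is the probability of a truthful answer and p_1 the probability of a forced "yes" (the reading of the p and p1 arguments assumed here); \pi_i and \ell are notation introduced for illustration:

```latex
\pi_i(\beta) \equiv \Pr(Y_i = 1 \mid x_i)
  = p_1 + p\,\Pr(Z_i^* = 1 \mid x_i)
  = p_1 + p\,\mathrm{logit}^{-1}(x_i^\top\beta),
\qquad
\ell(\beta) = \sum_{i=1}^{n} \Big[ Y_i \log \pi_i(\beta) + (1 - Y_i)\log\big(1 - \pi_i(\beta)\big) \Big]
```

With p = 2/3 and p_1 = 1/6, the observed "yes" probability \pi_i is confined to the interval (1/6, 5/6), whatever the value of x_i^\top\beta.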
Reads from #1 · Feeds into the final output
Key code
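# p = P(truthful answer), p1 = P(forced "yes"), p0 = P(forced "no")
out <- rrreg(rr.q1 ~ cov.asset.index + cov.married + cov.age,
             data = nigeria, p = 2/3, p1 = 1/6, p0 = 1/6,
             design = "forced-known")
# Coefficients and standard errors on the logit scale
summary(out)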

3. Output · what you get

Fig 1. Randomized-response regression coefficients (95% CIs) for predictors of the sensitive behavior.

Result figure rendered by StatsOtter from the package's documented example — unofficial community showcase; all credit to the original authors.

Result · the numbers


⚠️ Unofficial community showcase of rr (docs). Not affiliated with the authors — all credit to Graeme Blair, Yang-Yang Zhou & Kosuke Imai; this summarizes public documentation.

What it does. The randomized-response technique lets people answer a sensitive yes/no question truthfully without revealing their individual answer, because a randomizing device with known probabilities (a coin or die, for example) sometimes dictates the response. rr (Blair, Imai & Zhou 2015) provides the multivariate regression that recovers how covariates predict the latent truthful response.
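To make the privacy mechanism concrete, here is a small base-R simulation of a forced-response device. It is a sketch on hypothetical data: the sample size, the 20% prevalence, and all variable names are assumptions, and the design probabilities mirror the ones used in this workflow.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n <- 100000
true_prev <- 0.20              # hypothetical prevalence of the sensitive trait
z <- rbinom(n, 1, true_prev)   # latent truthful answers (never observed directly)

p <- 2/3; p1 <- 1/6; p0 <- 1/6 # truthful / forced "yes" / forced "no"
device <- sample(c("truth", "yes", "no"), n, replace = TRUE, prob = c(p, p1, p0))

# Observed answer: the truth only when the device says so, otherwise the forced reply
y <- ifelse(device == "truth", z, ifelse(device == "yes", 1L, 0L))

# Pr(Y = 1) = p1 + p * prevalence, so the prevalence can be recovered by inversion
est_prev <- (mean(y) - p1) / p
est_prev
```

No single value of y identifies a respondent's true answer, yet the aggregate estimate should land close to the simulated prevalence.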

How it works. Given the design probabilities (mirrored, forced-known, or unrelated-question), rrreg() fits a maximum-likelihood logistic model for the latent sensitive trait, deconvolving the randomization noise. Companion functions predict prevalence and combine RR with direct questions.
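The maximum-likelihood idea can be sketched in base R for the forced-response case. This is an illustration on simulated data, not the rr implementation: the helper negloglik(), the simulated coefficients, and the use of optim() are all assumptions made for the example, and rrreg() may use a different fitting routine internally.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Simulate one covariate and a latent trait from a logistic model (hypothetical values)
n <- 20000
x <- rnorm(n)
beta_true <- c(-1, 0.5)
z <- rbinom(n, 1, plogis(beta_true[1] + beta_true[2] * x))

# Apply the forced-response device: truthful w.p. p, forced "yes" w.p. p1, else forced "no"
p <- 2/3; p1 <- 1/6
u <- runif(n)
y <- ifelse(u < p, z, as.integer(u < p + p1))

# Negative log-likelihood of the observed answers:
# Pr(Y = 1 | x) = p1 + p * logit^{-1}(x' beta)
negloglik <- function(beta, y, X, p, p1) {
  pi_obs <- p1 + p * plogis(drop(X %*% beta))
  -sum(dbinom(y, 1, pi_obs, log = TRUE))
}

X <- cbind(1, x)
fit <- optim(c(0, 0), negloglik, y = y, X = X, p = p, p1 = p1,
             method = "BFGS", hessian = TRUE)

beta_hat <- fit$par                       # estimates on the logit scale
se_hat <- sqrt(diag(solve(fit$hessian)))  # SEs from the inverse observed information
cbind(beta_true, beta_hat, se_hat)
```

The estimates should sit near the simulated coefficients, with standard errors wider than an ordinary logistic fit on z would give; that extra width is the statistical price of the privacy noise.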

Assumptions. Respondents follow the randomization device and answer truthfully under its protection; the design probabilities are known.

What you get — Logistic-regression coefficients (with SEs) linking covariates to the latent sensitive response, despite the privacy noise.

Example output

Randomized Response Technique Regression 

Estimated coefficients (logistic):
                     est     se      z    p
(Intercept)        -0.91  0.27  -3.37  0.00
cov.asset.index     0.14  0.06   2.33  0.02
cov.married         0.32  0.18   1.78  0.08
cov.age             0.01  0.01   1.02  0.31

Design: forced-known   N = 2400
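The figure reports 95% CIs, while the printed table shows only estimates and standard errors. A short sketch of how normal-approximation intervals follow from the example output above; the numbers are copied from that table, the term labels are shorthand chosen here, and the Wald formula is an assumption about how the intervals in the figure were formed:

```r
# Estimates and standard errors copied from the example output
term <- c("intercept", "asset index", "married", "age")
est  <- c(-0.91, 0.14, 0.32, 0.01)
se   <- c( 0.27, 0.06, 0.18, 0.01)

# Wald 95% interval: estimate +/- z * SE, on the logit scale
z_crit <- qnorm(0.975)  # about 1.96
ci <- cbind(lower = est - z_crit * se, upper = est + z_crit * se)
rownames(ci) <- term
round(ci, 2)
```

Under this approximation the asset-index interval excludes zero while the married and age intervals include it, matching the p-values in the table.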

Links: package · paper
