Causal Inference for Product
Move beyond correlation to understand causation in product decisions. Covers A/B test limitations, difference-in-differences, instrumental variables, regression discontinuity, propensity score matching, and when to use each causal inference technique.
A/B testing is the gold standard for causal inference, but it is not always feasible. You cannot randomly assign users to a data breach to study its impact on churn. You cannot randomly withhold features from paying customers. Causal inference techniques let you identify cause-and-effect relationships from observational data when experiments are impossible or impractical.
When A/B Tests Are Not Enough
Cannot A/B test:
- Pricing changes (legal and brand risk)
- Outages and incidents (ethical issues)
- Competitor actions (not in your control)
- Long-term behavior changes (experiment duration limits)
- Network effects (treatment group affects control group)
Observational data alternatives:
- Difference-in-differences
- Instrumental variables
- Regression discontinuity
- Propensity score matching
- Synthetic control
Difference-in-Differences (DiD)
Compares the change over time between a treatment and control group:
```python
import statsmodels.formula.api as smf

# Setup: feature launched in Region A (treatment), not Region B (control).
# Measure: revenue before and after launch in both regions.
# Region fixed effects absorb the treatment main effect and month fixed
# effects absorb the post_launch main effect, so only the interaction enters.
model = smf.ols(
    'revenue ~ treatment:post_launch + C(region) + C(month)',
    data=df,
).fit()

# The treatment:post_launch coefficient is the DiD estimate of the
# feature's causal effect on revenue.
print(f"Feature effect: ${model.params['treatment:post_launch']:.2f}")
print(f"p-value: {model.pvalues['treatment:post_launch']:.4f}")
```
Key assumption: Parallel trends
Without treatment, both groups would have followed the same trend over time.
Validate: check that pre-treatment trends are parallel (see the sketch below). If they are not, the DiD estimate is biased.
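A minimal pre-trends check, assuming the same hypothetical df with a numeric month column and binary treatment and post_launch columns: if trends are parallel, the treatment-by-month slope in the pre-period should be indistinguishable from zero.

```python
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Restrict to the pre-launch period (hypothetical column names).
pre = df[df['post_launch'] == 0]

# Parallel trends implies equal slopes: the treatment:month interaction
# should be near zero and statistically insignificant.
pretrend = smf.ols('revenue ~ treatment * month', data=pre).fit()
print(pretrend.summary().tables[1])

# Eyeball it too: plot mean revenue per month for each group.
(pre.groupby(['month', 'treatment'])['revenue'].mean()
    .unstack('treatment')
    .plot(marker='o', title='Pre-launch revenue by group'))
plt.show()
```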
Propensity Score Matching
Match treated users to similar untreated users based on observable characteristics:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Step 1: Estimate propensity scores, P(treatment | features), for each user.
# 'features' is a list of observable covariate columns in the DataFrame X.
propensity_model = LogisticRegression()
propensity_model.fit(X[features], X['treated'])
X['propensity_score'] = propensity_model.predict_proba(X[features])[:, 1]

# Step 2: Match each treated user to the untreated user with the closest
# propensity score (1-nearest-neighbor matching, with replacement).
treated = X[X['treated'] == 1]
untreated = X[X['treated'] == 0]
nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
nn.fit(untreated[['propensity_score']])
distances, indices = nn.kneighbors(treated[['propensity_score']])
matched_untreated = untreated.iloc[indices.flatten()]

# Step 3: Compare outcomes between treated users and their matches.
att = treated['outcome'].mean() - matched_untreated['outcome'].mean()
print(f"Average Treatment Effect on Treated: {att:.3f}")
```
Regression Discontinuity
Exploit a threshold that determines treatment:
```python
import statsmodels.formula.api as smf

# Users who score above 80 get premium features.
# Compare users just above 80 (treated) vs. just below (untreated).
threshold = 80
bandwidth = 5  # keep only users scoring 75-85

near_threshold = df[
    (df['score'] >= threshold - bandwidth) &
    (df['score'] <= threshold + bandwidth)
].copy()
near_threshold['treated'] = (near_threshold['score'] >= threshold).astype(int)
near_threshold['score_centered'] = near_threshold['score'] - threshold

# Fit separate slopes on each side of the cutoff; the 'treated'
# coefficient is the jump in retention at the threshold.
model = smf.ols(
    'retention ~ treated + score_centered + treated:score_centered',
    data=near_threshold,
).fit()
print(f"Effect of premium features on retention: {model.params['treated']:.3f}")
```
Choosing the Right Method
| Method | When to Use | Key Assumption |
|---|---|---|
| A/B Test | Random assignment possible | Random assignment |
| DiD | Treatment at specific time, control group exists | Parallel trends |
| Propensity Matching | Many observables, no time dimension | No unobserved confounders |
| Regression Discontinuity | Treatment based on a threshold | No manipulation around threshold |
| Instrumental Variables | Unobserved confounders exist | Valid instrument available |
| Synthetic Control | One treated unit, many controls | Weighted combination matches pre-treatment |
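Of the methods in the table, instrumental variables is the only one without a worked example above. A minimal two-stage least squares sketch, under an invented scenario where a randomly sent email prompt (got_prompt) instruments for confounded feature adoption (adopted_feature):

```python
import statsmodels.formula.api as smf

# Stage 1: predict the endogenous treatment from the instrument.
stage1 = smf.ols('adopted_feature ~ got_prompt', data=df).fit()
df['adoption_hat'] = stage1.fittedvalues

# Stage 2: regress the outcome on the predicted treatment.
# Caveat: standard errors from a manual second stage are too small;
# use a dedicated IV routine (e.g., IV2SLS in the linearmodels package)
# for inference in practice.
stage2 = smf.ols('retention ~ adoption_hat', data=df).fit()
print(f"IV estimate of adoption effect: {stage2.params['adoption_hat']:.3f}")
```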
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Confusing correlation with causation | Wrong product decisions | Use causal inference techniques |
| Not checking parallel trends | Biased DiD estimates | Plot pre-treatment trends |
| Matching on too many variables | Overfitting, poor matches | Use propensity score, not exact matching |
| Small sample near discontinuity | Low statistical power | Increase bandwidth (with trade-offs) |
| Reporting results without confidence intervals | Overconfidence in estimates | Always report uncertainty |
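On the last row: the matching ATT above is a single point estimate. A minimal bootstrap sketch for its 95% confidence interval, reusing the hypothetical X, propensity_score, and outcome columns from the matching example (re-matching inside each resample so the interval reflects matching variability):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
atts = []
for _ in range(1000):
    # Resample users with replacement, then redo the matching step.
    sample = X.sample(frac=1, replace=True, random_state=rng)
    t = sample[sample['treated'] == 1]
    u = sample[sample['treated'] == 0]
    nn = NearestNeighbors(n_neighbors=1).fit(u[['propensity_score']])
    _, idx = nn.kneighbors(t[['propensity_score']])
    atts.append(t['outcome'].mean() - u.iloc[idx.flatten()]['outcome'].mean())

lo, hi = np.percentile(atts, [2.5, 97.5])
print(f"ATT 95% CI: [{lo:.3f}, {hi:.3f}]")
```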
Causal inference is how data teams move from “Feature X users have higher retention” (correlation) to “Feature X causes 5% higher retention” (causation). The distinction determines whether the next $1M investment is well-spent or wasted.