Causal Inference for Product

Move beyond correlation to understand causation in product decisions. Covers A/B test limitations, difference-in-differences, instrumental variables, regression discontinuity, propensity score matching, and when to use each causal inference technique.

A/B testing is the gold standard for causal inference, but it is not always feasible. You cannot randomly assign users to a data breach to study its impact on churn. You cannot randomly withhold features from paying customers. Causal inference techniques let you identify cause-and-effect relationships from observational data when experiments are impossible or impractical.


When A/B Tests Are Not Enough

Cannot A/B test:
  - Pricing changes (legal, brand risk)
  - Outages and incidents (ethical issues)
  - Competitor actions (not in your control)
  - Long-term behavior changes (experiment duration limits)
  - Network effects (treatment group affects control group)

Observational data alternatives:
  - Difference-in-differences
  - Instrumental variables
  - Regression discontinuity
  - Propensity score matching
  - Synthetic control

Difference-in-Differences (DiD)

DiD compares the change over time in a treatment group with the change in a control group:

import statsmodels.formula.api as smf

# Setup: Feature launched in Region A (treatment), not Region B (control)
# Measure: Revenue before and after launch in both regions

# Two-way fixed effects: C(region) absorbs the treatment main effect and
# C(month) absorbs the post-launch main effect, so the interaction
# (the DiD term) is the only treatment regressor
model = smf.ols(
    'revenue ~ treatment:post_launch + C(region) + C(month)',
    data=df
).fit()

# treatment:post_launch coefficient = causal effect of feature on revenue
print(f"Feature effect: ${model.params['treatment:post_launch']:.2f}")
print(f"p-value: {model.pvalues['treatment:post_launch']:.4f}")

Key assumption: Parallel trends
  Without treatment, both groups would have followed the same trend.

  Validate: Check that pre-treatment trends are parallel (see the sketch below).
  If not parallel → the DiD estimate is biased.
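
A quick way to probe parallel trends is to test whether the two groups were already diverging before launch. The sketch below reuses the same df and assumes a numeric month_index column (hypothetical, not defined above); a large, significant pre-period interaction is a warning that DiD is the wrong tool:

# Restrict to the pre-launch period only
pre = df[df['post_launch'] == 0]

# Under parallel trends, treatment and control move together before
# launch, so the treatment:month_index interaction should be near zero
trend_check = smf.ols(
    'revenue ~ treatment * month_index',
    data=pre
).fit()

print(f"Pre-trend gap per month: {trend_check.params['treatment:month_index']:.2f}")
print(f"p-value: {trend_check.pvalues['treatment:month_index']:.4f}")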

Propensity Score Matching

Match treated users to similar untreated users based on observable characteristics:

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Step 1: Estimate propensity scores P(treatment | features) for each user
# (X is a user-level DataFrame with covariate columns listed in `features`,
# plus 'treated' and 'outcome' columns)
propensity_model = LogisticRegression()
propensity_model.fit(X[features], X['treated'])
X['propensity_score'] = propensity_model.predict_proba(X[features])[:, 1]

# Step 2: Match treated to untreated with similar propensity
treated = X[X['treated'] == 1]
untreated = X[X['treated'] == 0]

nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
nn.fit(untreated[['propensity_score']])
distances, indices = nn.kneighbors(treated[['propensity_score']])

matched_untreated = untreated.iloc[indices.flatten()]

# Step 3: Compare outcomes
att = treated['outcome'].mean() - matched_untreated['outcome'].mean()
print(f"Average Treatment Effect on Treated: {att:.3f}")

Regression Discontinuity

Exploit a threshold that determines treatment:

# Users who score above 80 get premium features
# Compare users just above 80 (treated) vs just below (untreated)

# Bandwidth: Users scoring 75-85
cutoff = 80
bandwidth = 5
near_threshold = df[
    (df['score'] >= cutoff - bandwidth) &
    (df['score'] <= cutoff + bandwidth)
].copy()

# Build the treatment indicator and center the running variable,
# so the 'treated' coefficient is the jump exactly at the cutoff
near_threshold['treated'] = (near_threshold['score'] >= cutoff).astype(int)
near_threshold['score_centered'] = near_threshold['score'] - cutoff

model = smf.ols(
    'retention ~ treated + score_centered + treated:score_centered',
    data=near_threshold
).fit()

# 'treated' coefficient = causal effect at the threshold
print(f"Effect of premium features on retention: {model.params['treated']:.3f}")

Choosing the Right Method

Method                   | When to Use                                         | Key Assumption
A/B Test                 | Random assignment possible                          | Random assignment
DiD                      | Treatment at a specific time, control group exists  | Parallel trends
Propensity Matching      | Many observables, no time dimension                 | No unobserved confounders
Regression Discontinuity | Treatment based on a threshold                      | No manipulation around threshold
Instrumental Variables   | Unobserved confounders exist                        | Valid instrument available
Synthetic Control        | One treated unit, many controls                     | Weighted combination matches pre-treatment
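
Instrumental variables appear in the table but not in the examples above, so here is a minimal two-stage least squares sketch. The column names are hypothetical: suppose a staggered rollout makes eligibility for a feature ('eligible', the instrument) effectively random, while actual adoption ('adopted') is self-selected. The instrument must affect the outcome only through adoption:

# Stage 1: predict the confounded treatment from the instrument
first_stage = smf.ols('adopted ~ eligible', data=df).fit()
df['adopted_hat'] = first_stage.fittedvalues

# Stage 2: regress the outcome on the predicted treatment
second_stage = smf.ols('retention ~ adopted_hat', data=df).fit()
print(f"IV estimate of adoption effect: {second_stage.params['adopted_hat']:.3f}")

# Note: these second-stage standard errors are not valid for inference;
# use a dedicated 2SLS routine (e.g. linearmodels' IV2SLS) in practice.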

Anti-Patterns

Anti-Pattern                                   | Consequence                  | Fix
Confusing correlation with causation           | Wrong product decisions      | Use causal inference techniques
Not checking parallel trends                   | Biased DiD estimates         | Plot pre-treatment trends
Matching on too many variables                 | Overfitting, poor matches    | Use propensity score, not exact matching
Small sample near discontinuity                | Low statistical power        | Increase bandwidth (with trade-offs)
Reporting results without confidence intervals | Overconfidence in estimates  | Always report uncertainty
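
The last fix costs one line in statsmodels. For example, reusing the DiD fit from earlier (the model object with the treatment:post_launch term):

# 95% confidence interval for the DiD effect
ci_low, ci_high = model.conf_int().loc['treatment:post_launch']
print(f"Feature effect: 95% CI [${ci_low:.2f}, ${ci_high:.2f}]")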

Causal inference is how data teams move from “Feature X users have higher retention” (correlation) to “Feature X causes 5% higher retention” (causation). The distinction determines whether the next $1M investment is well-spent or wasted.
