Bayesian Statistics for Data Scientists
Apply Bayesian methods to make better decisions with uncertainty. Covers prior selection, posterior inference, Bayesian A/B testing, credible intervals, hierarchical models, and the patterns that quantify what you don't know as precisely as what you do.
Frequentist statistics answers: “If the null hypothesis is true, how surprised should I be by this data?” Bayesian statistics answers: “Given the data, what is the probability that the treatment works?” The Bayesian answer is what decision-makers actually want. It converts data into calibrated beliefs.
Bayesian vs. Frequentist
```
Frequentist A/B Test:
  Result:         "p-value = 0.04, statistically significant"
  Interpretation: "If there were NO difference, there's a 4% chance
                   of observing data this extreme"
  Decision-maker asks: "So is B better than A?"
  Statistician:        "I can only say the data is unlikely under H0"
  Decision-maker:      "...what?"

Bayesian A/B Test:
  Result:         "P(B > A) = 94.2%, expected lift = 3.8% [1.2%, 6.5%]"
  Interpretation: "There is a 94.2% probability that B is better,
                   with expected improvement between 1.2% and 6.5%"
  Decision-maker asks: "So is B better than A?"
  Statistician:        "94.2% probability, yes, with 3.8% expected lift"
  Decision-maker:      "Ship it"
```
Bayesian A/B Testing
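The workhorse here is Beta-Binomial conjugacy: conversions are Binomial, and the Beta distribution is the conjugate prior, so a Beta(α, β) prior combined with k conversions out of n visitors yields a Beta(α + k, β + n − k) posterior in closed form. No MCMC is needed, which is why the class below stays so small.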
```python
import numpy as np
from scipy import stats


class BayesianABTest:
    """Bayesian approach to A/B testing."""

    def __init__(self, prior_alpha=1, prior_beta=1):
        # Uninformative prior: Beta(1, 1) = Uniform
        # Informative prior: Beta(10, 90) if you expect ~10% conversion
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta

    def update(self, successes, trials):
        """Update prior with observed data to get posterior."""
        posterior_alpha = self.prior_alpha + successes
        posterior_beta = self.prior_beta + (trials - successes)
        return stats.beta(posterior_alpha, posterior_beta)

    def compare(self, control_data, treatment_data, n_samples=100_000):
        """Estimate probability that treatment beats control."""
        control_post = self.update(
            control_data["conversions"],
            control_data["visitors"],
        )
        treatment_post = self.update(
            treatment_data["conversions"],
            treatment_data["visitors"],
        )

        # Sample from posteriors
        control_samples = control_post.rvs(n_samples)
        treatment_samples = treatment_post.rvs(n_samples)

        # P(treatment > control)
        prob_treatment_better = np.mean(treatment_samples > control_samples)

        # Expected lift
        lift_samples = (treatment_samples - control_samples) / control_samples
        expected_lift = np.mean(lift_samples)

        # 95% credible interval for lift
        ci_lower = np.percentile(lift_samples, 2.5)
        ci_upper = np.percentile(lift_samples, 97.5)

        return {
            "prob_treatment_better": prob_treatment_better,
            "expected_lift": expected_lift,
            "credible_interval": (ci_lower, ci_upper),
            "risk_of_choosing_treatment": np.mean(
                np.minimum(lift_samples, 0)  # Expected loss if wrong
            ),
        }


# Usage:
test = BayesianABTest()
result = test.compare(
    control_data={"conversions": 120, "visitors": 2000},
    treatment_data={"conversions": 145, "visitors": 2000},
)
# P(B > A) = 94.2%
# Expected lift = 3.8%
# 95% CI: [1.2%, 6.5%]
# Risk: -0.1% (tiny downside risk)
```
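One way to act on these numbers is an expected-loss threshold: ship only when the treatment is probably better and the expected cost of being wrong is tolerable. A minimal sketch on top of the `result` dict above; `should_ship`, the 95% probability bar, and the 0.5% loss tolerance are illustrative assumptions, not fixed standards.

```python
# Illustrative decision rule; the thresholds are assumptions to tune
# against your cost of a wrong call, not standards.
def should_ship(result, min_prob=0.95, loss_tolerance=0.005):
    """Ship if treatment is probably better AND the downside is cheap."""
    expected_loss = -result["risk_of_choosing_treatment"]  # flip sign to positive loss
    return (
        result["prob_treatment_better"] >= min_prob
        and expected_loss <= loss_tolerance
    )

print(should_ship(result))  # False for the data above: 94.2% misses the 95% bar
```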
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Flat priors when you have knowledge | Slower convergence, wasted data | Informative priors from historical data |
| Only report point estimates | Decision-makers miss uncertainty | Always report credible intervals |
| Ignore prior sensitivity | Conclusions driven by prior choice | Sensitivity analysis with multiple priors (see the sketch below) |
| Bayesian with tiny datasets | Prior dominates the posterior | Acknowledge prior influence, collect more data |
| Misinterpret credible intervals | "95% means the true value is there" | "95% probability given the data and prior" |
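To apply the prior-sensitivity fix from the table, rerun the same comparison under several priors and check that the conclusion survives. A minimal sketch reusing the `BayesianABTest` class above; the three priors are illustrative choices for a ~6% baseline conversion rate, not recommendations.

```python
# Rerun the comparison under several priors. If P(treatment better)
# moves a lot between rows, the data is too thin to overrule the prior.
priors = {
    "uniform Beta(1, 1)": (1, 1),
    "weak ~6% rate Beta(6, 94)": (6, 94),
    "strong ~6% rate Beta(60, 940)": (60, 940),
}

for name, (a, b) in priors.items():
    test = BayesianABTest(prior_alpha=a, prior_beta=b)
    r = test.compare(
        control_data={"conversions": 120, "visitors": 2000},
        treatment_data={"conversions": 145, "visitors": 2000},
    )
    print(f"{name}: P(treatment better) = {r['prob_treatment_better']:.1%}")
```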
Bayesian statistics gives decision-makers what they actually need: probabilities of outcomes, expected values, and calibrated uncertainty. It speaks the language of decisions, not the language of p-values.