Bayesian Statistics for Data Scientists
Apply Bayesian methods to make better decisions with uncertainty. Covers prior selection, posterior inference, Bayesian A/B testing, credible intervals, hierarchical models, and the patterns that quantify what you don't know as precisely as what you do.
Frequentist statistics answers: “If the null hypothesis is true, how surprised should I be by this data?” Bayesian statistics answers: “Given the data, what is the probability that the treatment works?” The Bayesian answer is what decision-makers actually want. It converts data into calibrated beliefs.
Bayesian vs. Frequentist
```
Frequentist A/B Test:
  Result:         "p-value = 0.04, statistically significant"
  Interpretation: "If there were NO difference, there's a 4% chance
                   of observing data this extreme"
  Decision-maker asks: "So is B better than A?"
  Statistician:        "I can only say the data is unlikely under H0"
  Decision-maker:      "...what?"

Bayesian A/B Test:
  Result:         "P(B > A) = 94.2%, expected lift = 3.8% [1.2%, 6.5%]"
  Interpretation: "There is a 94.2% probability that B is better,
                   with expected improvement between 1.2% and 6.5%"
  Decision-maker asks: "So is B better than A?"
  Statistician:        "94.2% probability, yes, with 3.8% expected lift"
  Decision-maker:      "Ship it"
```
Bayesian A/B Testing
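The workhorse here is Beta-Binomial conjugacy: conversions are Binomial, and the Beta distribution is the conjugate prior, so a Beta(α, β) prior combined with k conversions out of n visitors yields a Beta(α + k, β + n − k) posterior in closed form. No MCMC is needed, which is why the class below stays so small.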
```python
import numpy as np
from scipy import stats


class BayesianABTest:
    """Bayesian approach to A/B testing."""

    def __init__(self, prior_alpha=1, prior_beta=1):
        # Uninformative prior: Beta(1, 1) = Uniform
        # Informative prior: Beta(10, 90) if you expect ~10% conversion
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta

    def update(self, successes, trials):
        """Update prior with observed data to get posterior."""
        posterior_alpha = self.prior_alpha + successes
        posterior_beta = self.prior_beta + (trials - successes)
        return stats.beta(posterior_alpha, posterior_beta)

    def compare(self, control_data, treatment_data, n_samples=100_000):
        """Estimate probability that treatment beats control."""
        control_post = self.update(
            control_data["conversions"],
            control_data["visitors"],
        )
        treatment_post = self.update(
            treatment_data["conversions"],
            treatment_data["visitors"],
        )

        # Sample from posteriors
        control_samples = control_post.rvs(n_samples)
        treatment_samples = treatment_post.rvs(n_samples)

        # P(treatment > control)
        prob_treatment_better = np.mean(treatment_samples > control_samples)

        # Expected lift
        lift_samples = (treatment_samples - control_samples) / control_samples
        expected_lift = np.mean(lift_samples)

        # 95% credible interval for lift
        ci_lower = np.percentile(lift_samples, 2.5)
        ci_upper = np.percentile(lift_samples, 97.5)

        return {
            "prob_treatment_better": prob_treatment_better,
            "expected_lift": expected_lift,
            "credible_interval": (ci_lower, ci_upper),
            "risk_of_choosing_treatment": np.mean(
                np.minimum(lift_samples, 0)  # Expected loss if wrong
            ),
        }


# Usage:
test = BayesianABTest()
result = test.compare(
    control_data={"conversions": 120, "visitors": 2000},
    treatment_data={"conversions": 145, "visitors": 2000},
)
# P(B > A) = 94.2%
# Expected lift = 3.8%
# 95% CI: [1.2%, 6.5%]
# Risk: -0.1% (tiny downside risk)
```
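One way to act on these numbers is an expected-loss threshold: ship only when the treatment is probably better and the expected cost of being wrong is tolerable. A minimal sketch on top of the `result` dict above; `should_ship`, the 95% probability bar, and the 0.5% loss tolerance are illustrative assumptions, not fixed standards.

```python
# Illustrative decision rule; the thresholds are assumptions to tune
# against your cost of a wrong call, not standards.
def should_ship(result, min_prob=0.95, loss_tolerance=0.005):
    """Ship if treatment is probably better AND the downside is cheap."""
    expected_loss = -result["risk_of_choosing_treatment"]  # flip sign to positive loss
    return (
        result["prob_treatment_better"] >= min_prob
        and expected_loss <= loss_tolerance
    )

print(should_ship(result))  # False for the data above: 94.2% misses the 95% bar
```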
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Flat priors when you have knowledge | Slower convergence, wasted data | Informative priors from historical data |
| Only report point estimates | Decision-makers miss uncertainty | Always report credible intervals |
| Ignore prior sensitivity | Conclusions driven by prior choice | Sensitivity analysis with multiple priors (see the sketch below) |
| Bayesian with tiny datasets | Prior dominates the posterior | Acknowledge prior influence, collect more data |
| Misinterpret credible intervals | "95% means the true value is there" | "95% probability given the data and prior" |
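To apply the prior-sensitivity fix from the table, rerun the same comparison under several priors and check that the conclusion survives. A minimal sketch reusing the `BayesianABTest` class above; the three priors are illustrative choices for a ~6% baseline conversion rate, not recommendations.

```python
# Rerun the comparison under several priors. If P(treatment better)
# moves a lot between rows, the data is too thin to overrule the prior.
priors = {
    "uniform Beta(1, 1)": (1, 1),
    "weak ~6% rate Beta(6, 94)": (6, 94),
    "strong ~6% rate Beta(60, 940)": (60, 940),
}

for name, (a, b) in priors.items():
    test = BayesianABTest(prior_alpha=a, prior_beta=b)
    r = test.compare(
        control_data={"conversions": 120, "visitors": 2000},
        treatment_data={"conversions": 145, "visitors": 2000},
    )
    print(f"{name}: P(treatment better) = {r['prob_treatment_better']:.1%}")
```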
Bayesian statistics gives decision-makers what they actually need: probabilities of outcomes, expected values, and calibrated uncertainty. It speaks the language of decisions, not the language of p-values.