📊

Data Science

Bayesian statistics, NLP pipelines, data mesh, experiment design, and exploratory analysis.

54 guides
01

Data Pipeline Architecture That Scales Without Rewriting Everything

Design data pipelines that survive growing data volumes, changing schemas, and the inevitable 3 AM failure. Covers batch vs streaming, orchestration, schema evolution, data quality gates, and the patterns that prevent 'Big Rewrite 2.0.'

02

A/B Testing Infrastructure: Making Data-Driven Decisions Without Breaking Production

Build experimentation infrastructure that produces trustworthy results. Covers statistical foundations, feature flag integration, sample size calculations, metric selection, guardrail metrics, and the organizational patterns that prevent HiPPO-driven decisions.

03

SQL Performance Tuning: Making Queries Fast Without Rewriting Everything

Diagnose and fix slow SQL queries using systematic analysis. Covers EXPLAIN plans, index design, query anti-patterns, N+1 problems, connection pooling, and the performance investigation workflow that finds the root cause instead of guessing.

04

Data Warehouse Design: From Raw Events to Business Insights

Design a data warehouse that transforms raw event streams into analytics that drive business decisions. Covers dimensional modeling, the medallion architecture, slowly changing dimensions, ETL vs ELT, data quality frameworks, and the warehouse design that scales without becoming an unmaintainable mess.

05

Feature Engineering for Machine Learning Pipelines

Production feature engineering patterns for ML pipelines. Covers feature stores, temporal features, automated feature selection, and data leakage prevention.

06

Experiment Tracking with MLflow

Production MLflow setup for experiment tracking, model versioning, and artifact management. Covers local and remote tracking servers, model registry, and CI/CD integration.

07

A/B Testing Statistical Framework

Rigorous A/B testing for product decisions. Covers sample size calculation, statistical significance, Bayesian vs frequentist approaches, and common pitfalls that invalidate experiments.

08

Time Series Forecasting in Production

End-to-end time series forecasting for production systems. Covers classical methods, modern ML approaches, forecast evaluation, and building automated forecasting pipelines.

09

Data Quality Monitoring in Production

How to monitor data quality in production pipelines. Covers data contracts, schema validation, anomaly detection, lineage tracking, and building a data quality culture.

10

Causal Inference for Product

Move beyond correlation to understand causation in product decisions. Covers A/B test limitations, difference-in-differences, instrumental variables, regression discontinuity, propensity score matching, and when to use each causal inference technique.

11

Time Series Forecasting

Build reliable time series forecasting models for business metrics. Covers statistical methods (ARIMA, ETS), machine learning approaches, Prophet, neural forecasting, feature engineering for temporal data, and the evaluation frameworks that separate useful forecasts from noise.

12

A/B Testing at Scale

Design and run rigorous A/B tests that produce trustworthy results. Covers experiment design, sample size calculation, statistical significance, guardrail metrics, multi-variant testing, and the common statistical mistakes that lead to wrong conclusions.

13

Recommendation Systems

Build recommendation engines that surface relevant content, products, and experiences. Covers collaborative filtering, content-based filtering, hybrid approaches, evaluation metrics, the cold-start problem, and the patterns that power personalized recommendations at scale.

14

Survival Analysis for Churn

Apply survival analysis to predict customer churn, subscription retention, and time-to-event outcomes. Covers Kaplan-Meier estimators, Cox proportional hazards, censored data handling, and the patterns that turn retention data into actionable insights.

15

Feature Engineering at Scale

Transform raw data into predictive features for machine learning at production scale. Covers feature stores, feature pipelines, temporal features, encoding strategies, feature drift detection, and the patterns that make feature engineering systematic rather than ad hoc.

16

Bayesian Statistics for Data Scientists

Apply Bayesian methods to make better decisions with uncertainty. Covers prior selection, posterior inference, Bayesian A/B testing, credible intervals, hierarchical models, and the patterns that quantify what you don't know as precisely as what you do.

17

Natural Language Processing Pipelines

Build production NLP systems that extract meaning from text. Covers text preprocessing, tokenization strategies, named entity recognition, sentiment analysis, text classification, and the patterns that turn unstructured text into actionable structured data.

18

Data Mesh Architecture

Decentralize data ownership using data mesh principles. Covers domain-oriented data ownership, data as a product, self-serve data infrastructure, federated governance, and the patterns that scale data systems with organizational growth.

19

Reinforcement Learning Fundamentals

Train agents to make sequential decisions through trial and error. Covers Markov decision processes, Q-learning, policy gradients, reward shaping, and the patterns that let AI systems learn optimal behavior from interaction with an environment.

20

Data Lakehouse Architecture

Combine data lake flexibility with data warehouse performance. Covers lakehouse design principles, Delta Lake, Apache Iceberg, table formats, schema evolution, time travel, and the patterns that eliminate the data lake vs. warehouse tradeoff.

21

Explainable AI (XAI) Methods

Make machine learning model decisions interpretable and transparent. Covers SHAP values, LIME explanations, feature importance, model-agnostic methods, and the patterns that bridge the gap between model accuracy and human understanding.

22

Anomaly Detection at Scale

Detect unusual patterns in high-volume data streams. Covers statistical anomaly detection, isolation forests, time-series anomaly detection, and the patterns that find needles in the haystack of millions of data points per second.

23

Feature Engineering for Machine Learning: From Raw Data to Predictive Power

A practitioner's guide to feature engineering — transforming raw data into features that improve model performance through encoding, scaling, creation, and selection techniques.

24

Online Learning Pipeline for Real-Time Predictions

Production-ready guide to building an online learning pipeline for real-time predictions, with implementation patterns, code examples, and anti-patterns for enterprise engineering teams.

25

Statistical Power Analysis for Sample Size Planning

Production-ready guide to statistical power analysis for sample size planning, with implementation patterns, code examples, and anti-patterns for enterprise engineering teams.

26

A/B Test Power Analysis

Production engineering guide for A/B test power analysis covering patterns, implementation strategies, and operational best practices.

27

Bayesian Optimization

Production engineering guide for Bayesian optimization covering patterns, implementation strategies, and operational best practices.

28

Causal Inference Methods

Production engineering guide for causal inference methods covering patterns, implementation strategies, and operational best practices.

29

Clustering Algorithms

Production engineering guide for clustering algorithms covering patterns, implementation strategies, and operational best practices.

30

Cohort Analysis

Production engineering guide for cohort analysis covering patterns, implementation strategies, and operational best practices.

31

Cross Validation Strategies

Production engineering guide for cross validation strategies covering patterns, implementation strategies, and operational best practices.

32

Data Visualization Best Practices

Production engineering guide for data visualization, covering patterns, implementation strategies, and operational best practices.

33

Dimensionality Reduction

Production engineering guide for dimensionality reduction covering patterns, implementation strategies, and operational best practices.

34

Ensemble Methods

Production engineering guide for ensemble methods covering patterns, implementation strategies, and operational best practices.

35

Experiment Design Patterns

Production engineering guide for experiment design, covering common patterns, implementation strategies, and operational best practices.

36

Feature Store Design

Production engineering guide for feature store design covering patterns, implementation strategies, and operational best practices.

37

Geospatial Analytics

Production engineering guide for geospatial analytics covering patterns, implementation strategies, and operational best practices.

38

Hypothesis Testing Framework

Production engineering guide for building a hypothesis testing framework, covering patterns, implementation strategies, and operational best practices.

39

Metric Design Patterns

Production engineering guide for metric design, covering common patterns, implementation strategies, and operational best practices.

40

Propensity Score Matching

Production engineering guide for propensity score matching covering patterns, implementation strategies, and operational best practices.

41

Regression Diagnostics

Production engineering guide for regression diagnostics covering patterns, implementation strategies, and operational best practices.

42

Sampling Strategies

Production engineering guide for sampling strategies covering patterns, implementation strategies, and operational best practices.

43

Statistical Process Control

Production engineering guide for statistical process control covering patterns, implementation strategies, and operational best practices.

44

Survival Analysis

Production engineering guide for survival analysis covering patterns, implementation strategies, and operational best practices.

45

Text Mining Techniques

Production engineering guide for text mining techniques covering patterns, implementation strategies, and operational best practices.

46

A/B Testing Statistical Rigor

Production-grade guide to A/B testing statistical rigor covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

47

Data Drift Monitoring

Production-grade guide to data drift monitoring covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

48

Experiment Tracking Systems

Production-grade guide to experiment tracking systems covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

49

Feature Store Engineering

Production-grade guide to feature store engineering covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

50

Interpretable ML Techniques

Production-grade guide to interpretable ML techniques covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

51

MLOps Production Patterns

Production-grade guide to MLOps production patterns covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

52

Model Fairness Auditing

Production-grade guide to model fairness auditing covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

53

Model Registry Design

Production-grade guide to model registry design covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.

54

Recommendation System Architecture

Production-grade guide to recommendation system architecture covering architecture patterns, implementation strategies, testing approaches, and operational best practices for enterprise engineering teams.