Time Series Forecasting in Production
End-to-end time series forecasting for production systems. Covers classical methods, modern ML approaches, forecast evaluation, and building automated forecasting pipelines.
Time series forecasting powers demand planning, capacity management, financial projections, and anomaly detection. Yet most forecasting projects fail in production — not because the model is wrong, but because the pipeline around it is fragile. A production forecasting system needs automated retraining, forecast evaluation, uncertainty quantification, and graceful handling of missing data and regime changes.
Method Selection Guide
| Data Characteristics | Recommended Method | When It Fails |
|---|---|---|
| Strong seasonality, stable trend | SARIMA / ETS | Regime changes, multiple seasonalities |
| Multiple seasonalities (daily+weekly) | Prophet / MSTL | Complex non-linear patterns |
| Many related series | LightGBM / XGBoost | Short series (< 100 points) |
| Long history, complex patterns | N-BEATS / TFT | Limited data, need interpretability |
| Hierarchical data | Reconciliation + any base model | Incoherent base forecasts |
The practical rule: Start with a seasonal naive baseline (repeat last year’s pattern). If you can’t beat seasonal naive, your model isn’t learning anything useful. Then try ETS and Prophet. Only reach for deep learning if simpler methods plateau and you have > 1,000 data points per series.
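A minimal sketch of that baseline, assuming a pandas Series with a regularly spaced DatetimeIndex; the function name and `season_length` parameter are illustrative (7 for a weekly cycle, 365 for a yearly one):

import numpy as np
import pandas as pd

def seasonal_naive_forecast(series: pd.Series, horizon: int, season_length: int = 7) -> pd.Series:
    """Repeat the last observed seasonal cycle forward for `horizon` steps."""
    last_cycle = series.iloc[-season_length:].to_numpy()
    # Tile the last cycle until it covers the horizon, then trim
    values = np.tile(last_cycle, horizon // season_length + 1)[:horizon]
    freq = series.index.freq or pd.infer_freq(series.index)
    future_index = pd.date_range(series.index[-1], periods=horizon + 1, freq=freq)[1:]
    return pd.Series(values, index=future_index)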
Production Pipeline Architecture
Data Ingestion → Preprocessing → Feature Engineering →
Model Training → Forecast Generation → Evaluation →
Reconciliation → Storage → API / Dashboard
Preprocessing for Real-World Data
Real time series data is messy: missing values, outliers, calendar effects, and structural breaks.
import pandas as pd


class TimeSeriesPreprocessor:
    def __init__(self, freq='D'):
        self.freq = freq

    def process(self, series: pd.Series) -> pd.Series:
        # 1. Enforce a regular frequency (missing timestamps become NaN)
        series = series.asfreq(self.freq)
        # 2. Interpolate missing values (at most 3 consecutive points)
        series = series.interpolate(method='time', limit=3)
        # 3. Cap outliers with a wide IQR fence (3x IQR)
        q1, q3 = series.quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 3 * iqr, q3 + 3 * iqr
        series = series.clip(lower, upper)
        # 4. Fill any remaining gaps with the day-of-week average
        if series.isna().any():
            seasonal_avg = series.groupby(series.index.dayofweek).transform('mean')
            series = series.fillna(seasonal_avg)
        return series
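For example, on a synthetic daily series (illustrative data only):

import numpy as np

idx = pd.date_range('2024-01-01', periods=120, freq='D')
raw = pd.Series(100 + 10 * np.sin(np.arange(120) * 2 * np.pi / 7), index=idx)
raw.iloc[[10, 11, 50]] = np.nan   # simulate missing days
raw.iloc[30] = 500                # simulate a spike
clean = TimeSeriesPreprocessor(freq='D').process(raw)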
Forecast Evaluation
Metrics That Matter
| Metric | Full Name | Use When |
|---|---|---|
| MAPE | Mean Absolute Percentage Error | Comparing across different scales |
| RMSSE | Root Mean Squared Scaled Error | M5/M6 competition standard |
| WAPE | Weighted Absolute Percentage Error | Aggregated business reporting |
| Coverage | % of actuals within prediction interval | Evaluating uncertainty |
Never use MAPE alone. It's undefined when actuals are zero and it rewards under-forecasting, because over-forecasts can incur unbounded percentage errors while under-forecasts are capped at 100%. Use WAPE for business reporting and RMSSE for model comparison.
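For reference, a minimal sketch of WAPE and RMSSE on plain NumPy arrays; the formulas follow the standard definitions, with RMSSE scaled by the in-sample lag-`m` naive errors (m=1 gives the non-seasonal variant used in the M-competitions):

import numpy as np

def wape(actual, forecast):
    """Weighted Absolute Percentage Error: sum of absolute errors over sum of actuals."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.abs(actual - forecast).sum() / np.abs(actual).sum()

def rmsse(train, actual, forecast, m=1):
    """Root Mean Squared Scaled Error: RMSE scaled by the in-sample naive (lag-m) RMSE."""
    train, actual, forecast = (np.asarray(a, float) for a in (train, actual, forecast))
    naive_mse = np.mean((train[m:] - train[:-m]) ** 2)
    return np.sqrt(np.mean((actual - forecast) ** 2) / naive_mse)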
Backtesting Protocol
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

def time_series_cv(series, model_fn, n_splits=5, horizon=30, gap=7):
    """Walk-forward cross-validation with a gap between train and test."""
    results = []
    for i in range(n_splits):
        # Training window: all data up to this fold's cutoff
        cutoff = len(series) - (n_splits - i) * horizon - gap
        train = series.iloc[:cutoff]
        # Gap: skip `gap` periods to simulate the production data delay
        test_start = cutoff + gap
        test = series.iloc[test_start:test_start + horizon]
        # Fit on the training window, then forecast the test horizon
        model = model_fn(train)
        forecast = model.predict(horizon)
        results.append({
            'fold': i,
            'mape': mean_absolute_percentage_error(test, forecast),
            # prediction_interval_coverage is a user-supplied helper
            'coverage': prediction_interval_coverage(test, forecast),
        })
    return pd.DataFrame(results)
The gap parameter simulates the real-world delay between data availability and forecast generation. If your data pipeline has a 2-day lag, set gap=2. Omitting the gap overestimates accuracy.
Uncertainty Quantification
Point forecasts are almost always wrong. Prediction intervals communicate the range of likely outcomes, which is far more useful for decision-making.
Methods:
- Parametric: Assume residuals follow a distribution (normal, Student-t)
- Bootstrap: Resample residuals and regenerate forecasts
- Quantile regression: Directly model quantiles (10th, 50th, 90th)
- Conformal prediction: Distribution-free, guaranteed coverage
For business use: Report the 10th, 50th, and 90th percentile forecasts. The 10th percentile is your conservative plan, the 50th is the expected outcome, and the 90th is the optimistic scenario. This maps cleanly to “worst case / base case / best case” planning.
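Of the methods above, split conformal prediction is usually the easiest to add to an existing point forecaster. A minimal sketch, assuming you hold out a recent calibration window with both actuals and point forecasts (strictly, conformal coverage guarantees assume exchangeability, which autocorrelated series only approximate):

import numpy as np

def conformal_interval(calib_actual, calib_forecast, new_forecast, alpha=0.1):
    """Split conformal interval: widen the point forecast by the (1 - alpha)
    quantile of absolute calibration residuals."""
    residuals = np.abs(np.asarray(calib_actual, float) - np.asarray(calib_forecast, float))
    n = len(residuals)
    # Finite-sample corrected quantile level, capped at 1.0 for small n
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, q_level)
    new_forecast = np.asarray(new_forecast, float)
    return new_forecast - q, new_forecast + q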
Monitoring and Retraining
Forecast Monitoring
Track these metrics in production (a sketch of the bias and coverage checks follows the list):
- Forecast accuracy decay: Does accuracy degrade over the forecast horizon?
- Bias detection: Are forecasts systematically high or low?
- Coverage calibration: Do 90% prediction intervals actually contain 90% of actuals?
- Concept drift: Has the underlying data distribution changed?
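A minimal sketch of the bias and coverage checks over a recent window of actuals, point forecasts, and interval bounds; the alert thresholds here are illustrative, not standard values:

import numpy as np

def monitoring_report(actual, forecast, lower, upper, target_coverage=0.9):
    """Rolling-window checks for systematic bias and interval calibration."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    bias = np.mean(forecast - actual)                     # > 0 means over-forecasting
    relative_bias = bias / np.mean(np.abs(actual))
    covered = (actual >= np.asarray(lower, float)) & (actual <= np.asarray(upper, float))
    coverage = np.mean(covered)
    return {
        'relative_bias': relative_bias,
        'coverage': coverage,
        'bias_alert': abs(relative_bias) > 0.05,          # illustrative 5% threshold
        'coverage_alert': coverage < target_coverage - 0.05,
    }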
Retraining Triggers
- Scheduled: Weekly or monthly, depending on data velocity
- Performance-based: Retrain when rolling MAPE exceeds threshold
- Event-driven: After known structural changes (new product launch, policy change)
- Drift-detected: When statistical drift tests fail (PSI > 0.2; see the sketch after this list)
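For the drift trigger, a minimal PSI sketch comparing a recent window of the series against a reference window; values falling outside the reference bins are simply ignored in this simplified version:

import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference window and a recent window of the same series."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the proportions to avoid division by zero and log(0)
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct))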
The best forecasting systems aren’t the ones with the most sophisticated models — they’re the ones that detect when their models are wrong and adapt automatically.