Time Series Forecasting
Build reliable time series forecasting models for business metrics. Covers statistical methods (ARIMA, ETS), machine learning approaches, Prophet, neural forecasting, feature engineering for temporal data, and the evaluation frameworks that separate useful forecasts from noise.
Time series forecasting predicts future values from historical patterns. Revenue forecasting, capacity planning, demand prediction, and anomaly detection all depend on accurate time series models. The challenge is that time series data has properties (trend, seasonality, autocorrelation) that break the independent-observations assumption behind most standard ML models, so those models handle it poorly out of the box.
Time Series Components
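Raw signal = Trend + Seasonality + Residual (additive form; when seasonal swings grow with the level of the series, a multiplicative form, Trend × Seasonality × Residual, usually fits better)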
Trend: Long-term direction (growing, declining, flat)
Seasonality: Repeating patterns (daily, weekly, monthly, yearly)
Residual: Random noise after removing trend and seasonality
Example (e-commerce revenue):
Trend: 15% YoY growth
Weekly seasonality: Peak on weekends
Yearly seasonality: Holiday spikes (Nov-Dec)
Residual: Day-to-day variation
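The additive split above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: `decompose_additive` is a hypothetical helper, the series is synthetic, and the trend estimate assumes an odd period so the moving average can be centered.

```python
from statistics import mean

def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts (additive model)."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average over one full period (odd period assumed).
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(series[t - half : t + half + 1])
    # Seasonality: average detrended value at each position in the cycle,
    # re-centered so the seasonal effects average to zero.
    by_pos = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_pos[t % period].append(series[t] - trend[t])
    raw = [mean(vals) for vals in by_pos]
    offset = mean(raw)
    pattern = [v - offset for v in raw]
    seasonal = [pattern[t % period] for t in range(n)]
    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual

# Synthetic daily revenue: linear growth plus a weekend bump (made-up numbers).
weekly_bump = [0, 0, 0, 0, 0, 20, 20]
series = [100 + 2 * t + weekly_bump[t % 7] for t in range(28)]
trend, seasonal, residual = decompose_additive(series, period=7)
```

Because the synthetic series contains no noise, the residual comes out as zero wherever the trend is defined. On real data, the residual is what the model still has to explain. The first and last `period // 2` points have no trend estimate because the centered window does not fit there.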
Statistical Methods
ARIMA
from statsmodels.tsa.arima.model import ARIMA
# ARIMA(p, d, q)
# p: autoregressive order (how many past values)
# d: differencing order (how many times to difference)
# q: moving average order (how many past errors)
model = ARIMA(train_data, order=(2, 1, 2))
fitted = model.fit()
# Forecast next 30 days
forecast = fitted.forecast(steps=30)
# With confidence intervals
forecast_result = fitted.get_forecast(steps=30)
ci = forecast_result.conf_int(alpha=0.05) # 95% CI
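To make the `d` parameter concrete, here is a small sketch of what differencing does to a trending series. The helpers `difference` and `undifference` are hypothetical names used for illustration only; statsmodels performs the equivalent step internally when `d > 0`.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def undifference(diffs, first_value):
    """Invert one round of differencing by cumulative summation."""
    out = [first_value]
    for delta in diffs:
        out.append(out[-1] + delta)
    return out

trending = [10, 12, 15, 19, 24, 30]  # made-up series with accelerating growth
once = difference(trending, d=1)     # still increasing, so one difference is not enough
twice = difference(trending, d=2)    # constant, so d=2 removes the trend here
restored = undifference(once, trending[0])
```

Each round of differencing shortens the series by one observation. In practice `d` is rarely larger than 2, and over-differencing adds noise rather than removing trend.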
Prophet (Meta)
from prophet import Prophet
# Prophet fits trend and seasonality automatically; holidays must be added explicitly
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    changepoint_prior_scale=0.05,  # Flexibility of trend (higher = more flexible)
)
# Add custom seasonality
model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
# Add holidays
model.add_country_holidays(country_name='US')
# Fit
model.fit(df[['ds', 'y']]) # ds: date, y: value
# Forecast
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)
# Components (trend, seasonality breakdown)
model.plot_components(forecast)
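Prophet represents each seasonality as a truncated Fourier series, and `fourier_order` sets how many sine/cosine pairs it uses. The sketch below shows the shape of those features for the custom monthly seasonality. It illustrates the idea and is not Prophet's internal code; `fourier_features` is a hypothetical helper.

```python
import math

def fourier_features(t, period, fourier_order):
    """Return [sin, cos] pairs for harmonics 1..fourier_order at time t (in days)."""
    features = []
    for k in range(1, fourier_order + 1):
        angle = 2.0 * math.pi * k * t / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features

# Same settings as the custom monthly seasonality: period=30.5, fourier_order=5.
row = fourier_features(t=10.0, period=30.5, fourier_order=5)
next_cycle = fourier_features(t=10.0 + 30.5, period=30.5, fourier_order=5)
```

A higher `fourier_order` lets the seasonal curve bend more sharply, at the cost of a greater risk of fitting noise. The features repeat exactly every `period` days, which is what makes the fitted pattern periodic.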
Feature Engineering for Temporal Data
def create_temporal_features(df, date_col='date'):
    """Create features that capture temporal patterns.
    Assumes df is sorted by date, date_col is a datetime column, and the target column is 'value'."""
    df = df.copy()
    # Calendar features
    df['day_of_week'] = df[date_col].dt.dayofweek  # Monday=0 ... Sunday=6
    df['month'] = df[date_col].dt.month
    df['quarter'] = df[date_col].dt.quarter
    df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)
    df['is_month_end'] = df[date_col].dt.is_month_end.astype(int)
    # Lag features
    for lag in [1, 7, 14, 28]:
        df[f'lag_{lag}'] = df['value'].shift(lag)
    # Rolling statistics over past values only: shift(1) keeps the current
    # row's value (the forecast target) out of its own features
    past = df['value'].shift(1)
    for window in [7, 14, 28]:
        df[f'rolling_mean_{window}'] = past.rolling(window).mean()
        df[f'rolling_std_{window}'] = past.rolling(window).std()
    # Expanding mean of all earlier values
    df['expanding_mean'] = past.expanding().mean()
    return df
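When `value` is also the forecast target, a rolling window that includes the current row leaks the answer into the feature. The sketch below spells out a leakage-free rolling mean in plain Python so the arithmetic is visible. `lagged_rolling_mean` is a hypothetical helper, and it corresponds to `shift(1)` followed by `rolling(window).mean()` in pandas.

```python
def lagged_rolling_mean(values, window):
    """Mean of the `window` observations strictly before each position."""
    out = []
    for t in range(len(values)):
        past = values[max(0, t - window):t]  # excludes values[t] itself
        out.append(sum(past) / window if len(past) == window else None)
    return out

values = [10, 20, 30, 40, 50]  # made-up data
features = lagged_rolling_mean(values, window=2)
```

The first `window` positions have no feature value because there is not enough history. Those rows are typically dropped before training rather than filled in.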
Evaluation
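# Time series cross-validation (walk-forward): each fold trains on the past
# and tests on the block that immediately follows it
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
errors = []
# `data` is a time-ordered DataFrame; `model` stands in for any forecaster with fit/predict
for train_idx, test_idx in tscv.split(data):
    train = data.iloc[train_idx]
    test = data.iloc[test_idx]
    model.fit(train)  # refit from scratch on each fold
    predictions = model.predict(test)
    mape = mean_absolute_percentage_error(test['y'], predictions)  # returned as a fraction
    errors.append(mape)
print(f"Average MAPE across folds: {np.mean(errors):.2%}")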
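To see which rows each fold uses, the expanding-window split can be written out by hand. This is a sketch under stated assumptions: `walk_forward_splits` is a hypothetical helper that mimics the expanding-window idea, not scikit-learn's implementation.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) with an expanding training window."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx

splits = list(walk_forward_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Every training index precedes every test index in the same fold, so no fold is evaluated on data older than what it was trained on.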
| Metric | Formula | When to Use |
|---|---|---|
| MAE | mean(abs(actual - predicted)) | When all errors matter equally |
| MAPE | mean(abs((actual - predicted) / actual)) | When relative error matters and actuals are never zero |
| RMSE | sqrt(mean((actual - predicted)²)) | When large errors are costly |
| sMAPE | mean(2 * abs(actual - predicted) / (abs(actual) + abs(predicted))) | When actuals can be close to zero and MAPE becomes unstable |
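The four metrics are short enough to write out directly, which makes their different sensitivities easier to compare. This is a minimal sketch in plain Python with made-up numbers. The sMAPE variant shown is one common definition, and counting a point as zero error when both actual and predicted are zero is an assumed convention.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Undefined if any actual value is zero.
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def smape(actual, predicted):
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Assumed convention: a point where both values are zero counts as zero error.
        total += 0.0 if denom == 0 else 2.0 * abs(a - p) / denom
    return total / len(actual)

actual = [100.0, 200.0, 400.0]     # made-up values
predicted = [110.0, 180.0, 400.0]
scores = {
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "sMAPE": smape(actual, predicted),
}
```

On this data MAE is 10 while RMSE is about 12.9, because squaring gives the single 20-unit miss more weight than the 10-unit miss.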
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Random train/test split | Data leakage, inflated metrics | Walk-forward / time series CV |
| Ignoring seasonality | Model misses repeating patterns | Decompose and model seasonality |
| No baseline comparison | Cannot assess model value | Compare vs naive forecast (last value, or same period last season) |
| Forecasting too far ahead | Accuracy degrades rapidly | Limit horizon, quantify uncertainty |
| Single point forecast | False precision, no uncertainty | Prediction intervals with confidence levels |
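The baseline comparison in the table takes only a few lines. The sketch below scores a last-value forecast and a seasonal naive forecast on synthetic weekly data. The helper names and the numbers are illustrative assumptions, not part of any library.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def seasonal_naive_forecast(history, horizon, period):
    """Repeat the most recent full seasonal cycle."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

week = [100, 100, 100, 100, 100, 150, 150]  # made-up pattern with a weekend peak
history = week * 4
actual = list(week)                         # the following week repeats the pattern

naive_mae = mae(actual, naive_forecast(history, horizon=7))
seasonal_mae = mae(actual, seasonal_naive_forecast(history, horizon=7, period=7))
```

On strongly seasonal data the seasonal naive baseline is much harder to beat than the last-value baseline. A candidate model is only worth deploying if its error is clearly lower than both.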
Time series forecasting is not about predicting the future perfectly — it is about quantifying uncertainty and providing decision-makers with ranges and probabilities.
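One simple way to turn a point forecast into a range is to reuse the errors observed during backtesting. The sketch below builds an empirical interval from the quantiles of past errors. `empirical_interval` is a hypothetical helper, the quantile rule is a crude nearest-rank approximation, and the approach assumes future errors at this horizon will resemble past ones.

```python
def empirical_interval(point_forecast, past_errors, coverage=0.8):
    """Interval from empirical quantiles of past errors (actual - predicted)."""
    errors = sorted(past_errors)
    tail = (1.0 - coverage) / 2.0
    lo_idx = round(tail * (len(errors) - 1))  # nearest-rank lower quantile
    hi_idx = len(errors) - 1 - lo_idx         # mirrored upper quantile
    return point_forecast + errors[lo_idx], point_forecast + errors[hi_idx]

# Made-up one-step-ahead backtest errors.
past_errors = [-12, -8, -5, -3, -1, 0, 2, 4, 6, 9, 15]
low, high = empirical_interval(point_forecast=200, past_errors=past_errors, coverage=0.8)
```

With only eleven errors the interval is coarse, and a longer backtest gives more stable quantiles. Errors usually grow with the forecast horizon, so each horizon needs its own set of past errors.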