
On-Call Compensation and Burnout Prevention

Design on-call programs that compensate engineers fairly, prevent burnout, and maintain team morale. Covers compensation models, workload balancing, schedule design, burnout indicators, escalation policies, and the organizational practices that make on-call sustainable long-term.

On-call is a tax on engineers’ personal time. When that tax goes unacknowledged (no extra pay, no time off, no reduction in sprint work), engineers burn out or leave. Teams with the worst on-call programs tend to have the highest attrition, and replacing an experienced engineer can cost 6-12 months of productivity.

This guide covers how to build an on-call program that is sustainable, fair, and does not require heroism.


Compensation Models

| Model | How It Works | Pros | Cons |
|---|---|---|---|
| Flat stipend | Fixed amount per on-call week | Simple, predictable | Does not reflect actual workload |
| Hourly rate | Extra pay for hours actively responding | Directly proportional to effort | Complex to track |
| Comp time | Paid time off after on-call week | Addresses fatigue directly | Team needs coverage during comp time |
| Combination | Stipend + overtime for incidents | Fair and predictable | Most complex to administer |
| Nothing | “It’s part of the job” | None | Attrition, burnout, resentment |

Example Combination Policy

Base on-call stipend: $500/week
  Covers: carrying the pager, being available, responding to pages

Incident compensation:
  During business hours: included in salary (no extra)
  After hours (6 PM - 9 AM): $100/hour for active incident work
  Weekends/holidays: $150/hour for active incident work

Comp time:
  After every on-call shift: 1 comp day (taken within 2 weeks)
  After a major incident (> 2 hours after-hours): additional comp day

Sprint load:
  On-call week = 50% sprint capacity (not 100%)
  The other 50% is reserved for incident response and toil
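To make the arithmetic of the example policy concrete, here is a minimal sketch of a weekly compensation calculator. The rates are copied from the policy above; the `IncidentWork` record, the period labels, and the `weekly_compensation` function are hypothetical names for illustration, not part of any payroll or paging tool. Two assumptions are made explicit in the comments: weekend/holiday work counts as after-hours for the major-incident rule, and each qualifying major incident earns its own extra comp day.

```python
from dataclasses import dataclass

# Rates copied from the example policy above; adjust to your own policy.
STIPEND_PER_WEEK = 500
AFTER_HOURS_RATE = 100      # per hour of active incident work, weekdays 6 PM - 9 AM
WEEKEND_HOLIDAY_RATE = 150  # per hour of active incident work, weekends/holidays
MAJOR_INCIDENT_HOURS = 2    # out-of-hours incident longer than this earns a comp day

RATES = {
    "business": 0,  # business-hours work is covered by salary
    "after_hours": AFTER_HOURS_RATE,
    "weekend_holiday": WEEKEND_HOLIDAY_RATE,
}


@dataclass
class IncidentWork:
    hours: float
    period: str  # one of the keys in RATES (hypothetical labels)


def weekly_compensation(incidents: list[IncidentWork]) -> tuple[float, int]:
    """Return (cash pay, comp days) earned for one on-call week."""
    pay = float(STIPEND_PER_WEEK)
    comp_days = 1  # every on-call shift earns one comp day
    for work in incidents:
        if work.period not in RATES:
            raise ValueError(f"unknown period: {work.period!r}")
        pay += work.hours * RATES[work.period]
        # Assumption: weekend/holiday work counts as after-hours, and each
        # major incident over the threshold adds its own comp day.
        if work.period != "business" and work.hours > MAJOR_INCIDENT_HOURS:
            comp_days += 1
    return pay, comp_days
```

For a week with 1.5 after-hours hours, a 3-hour weekend incident, and 2 business-hours hours, `weekly_compensation` returns `(1100.0, 2)`: $500 stipend + $150 + $450, plus the standard comp day and one extra for the weekend incident.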

Burnout Indicators

| Indicator | Healthy | Warning | Crisis |
|---|---|---|---|
| Pages per shift | < 5 per week (business hours) | 5-15 per week | > 15 per week |
| After-hours pages | < 1 per week | 1-3 per week | > 3 per week |
| Mean time to resolve | < 30 minutes | 30-60 minutes | > 60 minutes |
| On-call frequency | Once every 4-6 weeks | Once every 3 weeks | Every other week or more often |
| Team sentiment | “On-call is manageable” | “On-call is annoying but OK” | “I dread my on-call week” |
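The numeric rows of the table can be checked automatically. The sketch below encodes the paging-load and resolution-time thresholds and reports the worst level as the overall status; the metric keys and function names are illustrative assumptions. Boundary values (exactly 5 pages, 1 after-hours page, 30 or 60 minutes) are treated as Warning, matching the table's inclusive "5-15", "1-3", and "30-60" ranges. The frequency and sentiment rows are left out because they are better judged by people than by a script.

```python
# Thresholds from the burnout indicator table: (healthy if below, crisis if above).
THRESHOLDS = {
    "pages_per_week": (5, 15),
    "after_hours_pages_per_week": (1, 3),
    "mttr_minutes": (30, 60),
}
SEVERITY = ["healthy", "warning", "crisis"]


def classify(metric: str, value: float) -> str:
    healthy_below, crisis_above = THRESHOLDS[metric]
    if value < healthy_below:
        return "healthy"
    if value > crisis_above:
        return "crisis"
    return "warning"  # inclusive middle band, e.g. 5-15 pages per week


def assess(metrics: dict[str, float]) -> tuple[dict[str, str], str]:
    """Classify each metric and report the worst level as the overall status."""
    levels = {name: classify(name, value) for name, value in metrics.items()}
    overall = max(levels.values(), key=SEVERITY.index, default="healthy")
    return levels, overall
```

Fed the averages from the Checkout report below (5.75 pages/week, 1 after-hours page/week), `assess` marks both metrics as `"warning"` and the overall status as `"warning"`.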

On-Call Health Dashboard

On-Call Health Report — March 2024

  Team: Checkout
  Rotation size: 6 engineers
  On-call frequency: once every 6 weeks ✅

  This month's paging load:
    Total pages: 23 (avg 5.75/week)
    After-hours pages: 4 (avg 1/week) ⚠️
    False alarms: 8 (35%, counted across all sources below) ❌ (needs alert tuning)

  Top paging sources:
    1. checkout-api timeout alerts: 12 (52%) — investigate root cause
    2. payment webhook failures: 5 (22%) — upstream issue, add retry
    3. database connection pool: 4 (17%) — tune pool settings
    4. false alarms: 2 (9%) — remove or tune these alerts

  Action items:
    □ Investigate checkout-api timeout root cause (reduces 52% of pages)
    □ Add retry logic for payment webhooks
    □ Tune database connection pool alert threshold
    □ Remove 2 false alarm alerts
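A report like this can be generated from the raw page log rather than compiled by hand. This sketch assumes each page has been exported into a hypothetical `Page` record carrying its source, an after-hours flag, and a false-alarm flag set during review; it does not use any real pager API. The `weeks` argument is the divisor for the weekly averages; the report above uses 4 (23 / 4 = 5.75). Percentages are rounded to whole numbers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    source: str        # alert that fired, e.g. "checkout-api timeout"
    after_hours: bool  # fired outside business hours
    false_alarm: bool  # marked as not actionable during review


def pct(part: int, whole: int) -> int:
    return round(100 * part / whole) if whole else 0


def health_report(pages: list[Page], weeks: float) -> dict:
    total = len(pages)
    return {
        "total_pages": total,
        "pages_per_week": total / weeks,
        "after_hours_per_week": sum(p.after_hours for p in pages) / weeks,
        "false_alarm_pct": pct(sum(p.false_alarm for p in pages), total),
        # Sources ordered by page count, each with its share of all pages.
        "top_sources": [
            (source, count, pct(count, total))
            for source, count in Counter(p.source for p in pages).most_common()
        ],
    }
```

Sorting sources by count is what surfaces the highest-leverage fix: in the report above, a single alert source accounts for over half the pages.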

Schedule Design

6-person rotation (recommended minimum):

  Week 1: Alice (primary), Dave (secondary)
  Week 2: Bob (primary), Eve (secondary)
  Week 3: Carol (primary), Frank (secondary)
  Week 4: Dave (primary), Alice (secondary)
  Week 5: Eve (primary), Bob (secondary)
  Week 6: Frank (primary), Carol (secondary)

  Result: each person is primary once and secondary once every 6 weeks,
  with at least two free weeks between any two of their shifts

  Secondary on-call:
  - Backup if primary is unavailable
  - Paged automatically if the primary does not acknowledge within 5 minutes
  - NOT expected to wake up for every page
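Rotations like this are easy to generate and check in code. The sketch below assumes that secondary weeks count as on-call weeks for the no-back-to-back rule, and that the secondary is offset by half the rotation length so nobody's secondary week sits next to their own primary week. `build_rotation` and `has_back_to_back` are hypothetical helper names, not a scheduling tool's API.

```python
def build_rotation(engineers: list[str]) -> list[tuple[str, str]]:
    """Return one (primary, secondary) pair per week for one full cycle."""
    n = len(engineers)
    if n < 4:
        raise ValueError("need at least 4 engineers to keep shifts non-adjacent")
    # With n >= 4, a half-cycle offset is never +/- 1 week from primary duty.
    offset = n // 2
    return [(engineers[week], engineers[(week - offset) % n]) for week in range(n)]


def has_back_to_back(rotation: list[tuple[str, str]]) -> bool:
    """True if anyone holds either role in two consecutive weeks (cycle wraps)."""
    n = len(rotation)
    return any(
        set(rotation[week]) & set(rotation[(week + 1) % n]) for week in range(n)
    )
```

With six engineers, `build_rotation` starts with `("Alice", "Dave")` and `("Bob", "Eve")`. A pattern where each week's secondary becomes next week's primary fails `has_back_to_back`, because that person covers two consecutive weeks.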

Protected Time Rules

| Rule | Why |
|---|---|
| No on-call during PTO | On-call and vacation are incompatible |
| No back-to-back on-call weeks | Consecutive weeks cause acute burnout |
| Swap requests honored within 48 hours | People have lives; emergencies happen |
| New hires exempt for first 3 months | Need ramp-up time and shadow shifts first |
| On-call week = reduced sprint work | Cannot do 100% project work AND respond to incidents |
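The date-based rules in this table can be enforced before a shift is published. Here is a minimal sketch of such a check for one proposed one-week shift; `rule_violations` is a hypothetical helper, and "first 3 months" is approximated as 90 days. The swap and sprint-capacity rules depend on people and planning, so they are not modeled.

```python
from datetime import date, timedelta
from typing import Optional

NEW_HIRE_EXEMPTION = timedelta(days=90)  # "first 3 months", approximated as 90 days


def rule_violations(
    week_start: date,
    *,
    hire_date: date,
    pto_days: set[date],
    last_shift_start: Optional[date] = None,
) -> list[str]:
    """List the protected-time rules a proposed one-week shift would break."""
    problems = []
    shift_days = {week_start + timedelta(days=i) for i in range(7)}
    if shift_days & pto_days:
        problems.append("overlaps PTO")
    # Shifts starting within 7 days of each other are consecutive (or overlapping).
    if last_shift_start is not None and abs((week_start - last_shift_start).days) <= 7:
        problems.append("back-to-back with previous shift")
    if week_start < hire_date + NEW_HIRE_EXEMPTION:
        problems.append("new hire still exempt")
    return problems
```

An empty list means the assignment respects all three checks. Any other result is a signal to pick a different engineer for that week rather than ask someone to absorb the conflict.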

Escalation Policy

Incident occurs:
  T+0:    Primary on-call paged (PagerDuty, phone call)
  T+5min: If no acknowledgment → escalate to secondary
  T+15min: If no acknowledgment → escalate to engineering manager
  T+30min: If unresolved → engage senior engineer or architect
  T+60min: If still unresolved → incident commander, broader team

  At any point: on-call engineer can request help without stigma
  "I need help" is a positive signal, not a failure
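The escalation timeline can be expressed as data, which makes it easy to review and to mirror in whatever paging tool a team uses. The sketch below answers "who has been pulled in by now?" given when (or whether) the page was acknowledged and the incident resolved, all in minutes after the initial page. `ESCALATION_STEPS` and `engaged` are hypothetical names, not PagerDuty's API, and the voluntary "I need help" requests, which can happen at any point, are deliberately not modeled.

```python
from typing import Optional

# (minutes after the page, condition, who gets engaged), from the policy above.
ESCALATION_STEPS = [
    (0, "always", "primary on-call"),
    (5, "unacknowledged", "secondary on-call"),
    (15, "unacknowledged", "engineering manager"),
    (30, "unresolved", "senior engineer or architect"),
    (60, "unresolved", "incident commander and broader team"),
]


def engaged(now: float, acked_at: Optional[float] = None,
            resolved_at: Optional[float] = None) -> list[str]:
    """Who has been engaged `now` minutes after the page."""

    def not_yet(event_at: Optional[float], deadline: float) -> bool:
        # The event had not happened when this step's deadline arrived.
        return event_at is None or event_at >= deadline

    people = []
    for at_minute, condition, who in ESCALATION_STEPS:
        if now < at_minute:
            break  # later steps have not been reached yet
        if (condition == "always"
                or (condition == "unacknowledged" and not_yet(acked_at, at_minute))
                or (condition == "unresolved" and not_yet(resolved_at, at_minute))):
            people.append(who)
    return people
```

For example, a page acknowledged at minute 8 and still open at minute 45 has engaged the primary, the secondary (the 5-minute deadline was missed), and a senior engineer, but not the manager, because it was acknowledged before the 15-minute mark.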

Implementation Checklist

  • Compensate on-call: stipend + hourly rate for after-hours incidents
  • Give comp time after every on-call shift (1 day off within 2 weeks)
  • Reduce sprint capacity to 50% during on-call weeks
  • Maintain 6+ person rotation (no more frequent than once every 4 weeks)
  • Track paging load: target < 5 pages/week, < 1 after-hours page/week
  • Audit false alarms monthly and eliminate them aggressively
  • Shadow new engineers for 2 on-call shifts before adding to rotation
  • Never schedule on-call during PTO or back-to-back weeks
  • Run quarterly on-call retrospectives: what caused the most pain?
  • Make “asking for help” explicitly encouraged and non-penalized
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
