Graceful Shutdown: Draining Connections Without Dropping Requests
Implement graceful shutdown patterns that let your services restart, deploy, and scale without dropping in-flight requests. Covers signal handling, connection draining, health check coordination, Kubernetes preStop hooks, and timeout strategies for zero-downtime deployments.
Every deployment is a controlled crash. You terminate a process that is actively handling requests, and you hope nothing breaks. Without graceful shutdown, “nothing breaks” is a hope, not a guarantee.
Graceful shutdown converts that hope into a protocol: stop accepting new work, finish in-flight work, release resources, then exit. It is the difference between a deployment that drops 0.01% of requests and one that drops zero.
The Shutdown Sequence
A properly implemented shutdown follows this order:
- Receive termination signal (SIGTERM)
- Fail health checks (tell the load balancer to stop routing)
- Stop accepting new connections (close the listening socket)
- Drain in-flight requests (wait for active handlers to complete)
- Close downstream connections (database pools, message brokers, caches)
- Flush buffers (logs, metrics, traces)
- Exit with status 0
The critical insight is that steps 2 and 3 must happen, in that order, before step 4. If the load balancer keeps routing while you are draining, new requests arrive on a dying process; if you close the listening socket before the load balancer has noticed you are unhealthy, those requests are refused outright.
Signal Handling
Unix processes receive signals for lifecycle events. The two that matter for shutdown:
- SIGTERM (15): “Please shut down gracefully.” Sent by Kubernetes, systemd, and most orchestrators.
- SIGKILL (9): “Die immediately.” Cannot be caught or handled. Sent after SIGTERM times out.
import signal
import asyncio
import time

class GracefulServer:
    def __init__(self):
        self.shutting_down = False
        self.active_requests = 0
        self.server = None  # set to the listening server once it is started

    def setup_signals(self):
        signal.signal(signal.SIGTERM, self._handle_shutdown)
        signal.signal(signal.SIGINT, self._handle_shutdown)

    def _handle_shutdown(self, signum, frame):
        print(f"Received signal {signum}, initiating graceful shutdown")
        self.shutting_down = True
        # Stop accepting new connections
        self.server.close()
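A minimal wiring sketch, assuming an asyncio application; handle_client is a hypothetical per-connection handler, not something defined above:

async def main():
    graceful = GracefulServer()
    # handle_client is a hypothetical per-connection handler
    graceful.server = await asyncio.start_server(handle_client, "0.0.0.0", 8000)
    graceful.setup_signals()
    # Keep the loop alive until the signal handler flips the flag;
    # the sections below add draining and cleanup after this point
    while not graceful.shutting_down:
        await asyncio.sleep(0.1)

asyncio.run(main())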
Common Mistakes
Ignoring SIGTERM entirely. Many frameworks handle it by default, but custom scripts and workers often do not. The process receives SIGTERM, ignores it, then gets SIGKILL 30 seconds later — dropping everything.
Calling sys.exit() in the signal handler. This raises SystemExit in the main thread, which can interrupt critical sections. Instead, set a flag and let the main loop exit cleanly.
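For a plain script or worker without a framework, the flag pattern looks roughly like this (do_work is a hypothetical unit of work):

import signal

stop_requested = False

def handle_sigterm(signum, frame):
    global stop_requested
    stop_requested = True  # set a flag; do not call sys.exit() here

signal.signal(signal.SIGTERM, handle_sigterm)

while not stop_requested:
    do_work()  # hypothetical unit of work; it finishes before the flag is re-checked

# Fell out of the loop cleanly: flush and release resources here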
Health Check Coordination
Load balancers and service meshes use health checks to decide where to route traffic. During shutdown, your health check must start failing before you stop processing:
@app.get("/healthz")
async def health():
if server.shutting_down:
return Response(status_code=503, content="shutting down")
return Response(status_code=200, content="ok")
The timing matters. Most load balancers check health every 5-10 seconds. If you stop accepting connections before the load balancer detects you are unhealthy, requests hit a closed port and fail with connection refused.
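As a rough sizing rule, the delay before closing the listener should cover at least one full detection cycle. A sketch with assumed numbers; substitute your load balancer's or readiness probe's actual settings:

# Illustrative numbers only; read them from your LB or probe configuration
check_interval = 5        # seconds between health checks
failure_threshold = 2     # consecutive failures before de-registration
propagation_margin = 2    # time for the LB to apply the change

min_delay_before_close = check_interval * failure_threshold + propagation_margin
print(min_delay_before_close)  # 12 seconds with these assumptions

In Kubernetes, endpoint removal rather than probe failure is what takes the pod out of rotation, which is what the preStop hook covered later accounts for.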
The Grace Period
Add a delay between failing health checks and closing connections:
def _handle_shutdown(self, signum, frame):
    self.shutting_down = True  # Health checks start returning 503
    # A signal handler cannot await; hand the rest of the shutdown to the event loop
    asyncio.get_event_loop().create_task(self._shutdown())

async def _shutdown(self):
    # Wait for the load balancer to detect the failing health check and de-register us
    await asyncio.sleep(5)
    # Now stop accepting new connections
    self.server.close()
    # Wait for in-flight requests to complete
    await self._drain_connections(timeout=25)
    # Clean up resources
    await self._cleanup()
Connection Draining
Once you stop accepting new connections, you must wait for in-flight requests to complete:
async def _drain_connections(self, timeout: int = 30):
    """Wait for all active requests to complete, with timeout."""
    start = time.time()
    while self.active_requests > 0:
        elapsed = time.time() - start
        if elapsed > timeout:
            print(f"Drain timeout: {self.active_requests} requests abandoned")
            break
        print(f"Draining: {self.active_requests} requests remaining "
              f"({timeout - elapsed:.0f}s left)")
        await asyncio.sleep(0.5)
    print("All connections drained")
Request Tracking
Track active requests with a counter:
@app.middleware("http")
async def track_requests(request, call_next):
if server.shutting_down:
return Response(status_code=503, content="Service shutting down")
server.active_requests += 1
try:
response = await call_next(request)
return response
finally:
server.active_requests -= 1
Kubernetes-Specific Patterns
The preStop Hook
Kubernetes sends SIGTERM and updates the Endpoints object concurrently, not sequentially. This means traffic can arrive after SIGTERM. The preStop hook adds a delay:
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
terminationGracePeriodSeconds: 35
The sequence becomes:
- Pod marked for termination
- preStop hook runs (5s sleep)
- During the sleep, kube-proxy removes the pod from the Service endpoints
- SIGTERM sent to the container
- Application drains connections (up to 30s)
- If still running after 35s total, SIGKILL
Setting terminationGracePeriodSeconds
This is the total budget: preStop delay + drain time + cleanup time. Set it higher than the sum:
- preStop: 5s
- Drain timeout: 25s
- Cleanup: 3s
- Total needed: 33s
- Set to: 35s (with buffer)
Background Workers and Queues
Long-running background jobs need different handling than HTTP requests:
class GracefulWorker:
    def run(self):
        while not self.shutting_down:
            job = self.queue.get(timeout=1)
            if job is None:
                continue
            try:
                self.process(job)
                self.queue.ack(job)
            except Exception:
                self.queue.nack(job)  # Return to queue for retry
        # Shutdown: stop pulling new jobs but finish the current one
        print("Worker shutdown: no new jobs will be pulled")
The key principle: never acknowledge a job you have not completed. If SIGKILL arrives mid-processing, an unacknowledged job returns to the queue and gets picked up by another worker.
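The shutting_down flag on the worker needs the same signal wiring as the HTTP server. A minimal sketch, assuming the GracefulWorker above initializes the flag to False in its __init__:

import signal

worker = GracefulWorker()

def request_stop(signum, frame):
    # Let the current job finish; the run() loop exits before pulling another
    worker.shutting_down = True

signal.signal(signal.SIGTERM, request_stop)
signal.signal(signal.SIGINT, request_stop)
worker.run()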
Database and Resource Cleanup
Close connections in reverse order of dependency:
async def _cleanup(self):
    # 1. Flush outgoing buffers
    await self.metrics_exporter.flush()
    await self.logger.flush()
    # 2. Close downstream clients
    await self.redis_pool.close()
    await self.http_session.close()
    # 3. Close database pool last (other cleanup may need it)
    await self.db_pool.close()
    print("All resources released")
Connection Pool Draining
Database connection pools may have outstanding queries. Close them with a timeout:
# SQLAlchemy
engine.dispose()
# asyncpg
await pool.close() # Waits for in-flight queries
# Redis
await redis.close()
await redis.connection_pool.disconnect()
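If a stuck query could hold the pool open past the grace budget, bound the wait. The asyncpg documentation suggests wrapping close() in asyncio.wait_for(); the 10-second timeout here is an assumption to be budgeted inside terminationGracePeriodSeconds:

import asyncio

try:
    # pool.close() waits for in-flight queries to finish; cap that wait
    await asyncio.wait_for(pool.close(), timeout=10)
except asyncio.TimeoutError:
    print("Pool close timed out; force-closing remaining connections")
    pool.terminate()  # asyncpg: immediately close all remaining connections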
Testing Graceful Shutdown
Graceful shutdown is one of the hardest things to test because it only matters under load during a deployment.
Load Test + Rolling Restart
- Start a sustained load test (100 RPS for 5 minutes)
- Trigger a rolling deployment midway
- Assert zero 5xx errors during the deployment window
- Assert zero dropped connections in client logs
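A rough client-side sketch of those assertions, using aiohttp (the URL, rate, and duration are assumptions; any load tool that reports non-2xx counts and connection errors works just as well):

import asyncio
import aiohttp

async def hammer(url: str, duration_s: int = 300, rps: int = 100) -> None:
    failures, total = 0, 0
    async with aiohttp.ClientSession() as session:
        for _ in range(duration_s * rps):
            total += 1
            try:
                async with session.get(url) as resp:
                    if resp.status >= 500:
                        failures += 1
            except aiohttp.ClientError:
                # Connection refused or reset counts as a dropped request
                failures += 1
            await asyncio.sleep(1 / rps)  # sequential, so the real rate is latency-bound
    print(f"{failures} failures out of {total} requests")
    assert failures == 0, "requests were dropped during the deployment"

asyncio.run(hammer("http://localhost:8000/api/health"))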
Signal Test
# Start the service
python server.py &
PID=$!
# Send requests in a loop
for i in $(seq 1 100); do
curl -s http://localhost:8000/api/health &
done
# Send SIGTERM while requests are in flight
kill -TERM $PID
# Wait for exit
wait $PID
echo "Exit code: $?" # Should be 0
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No signal handler | SIGKILL after timeout, dropped requests | Catch SIGTERM |
| Immediate socket close | Connection refused errors | Delay after health check flip |
| No drain timeout | Process hangs forever on stuck request | Always set a max drain duration |
| Acking jobs before completion | Lost work on crash | Ack after processing |
| Cleanup before drain | Resource errors in in-flight requests | Drain first, then cleanup |
Graceful shutdown is not glamorous work. But it is the difference between deployments that require a maintenance window and deployments that nobody notices.