
Graceful Shutdown: Draining Connections Without Dropping Requests

Implement graceful shutdown patterns that let your services restart, deploy, and scale without dropping in-flight requests. Covers signal handling, connection draining, health check coordination, Kubernetes preStop hooks, and timeout strategies for zero-downtime deployments.

Every deployment is a controlled crash. You terminate a process that is actively handling requests, and you hope nothing breaks. Without graceful shutdown, “nothing breaks” is a hope, not a guarantee.

Graceful shutdown converts that hope into a protocol: stop accepting new work, finish in-flight work, release resources, then exit. It is the difference between a deployment that drops 0.01% of requests and one that drops zero.


The Shutdown Sequence

A properly implemented shutdown follows this order:

  1. Receive termination signal (SIGTERM)
  2. Stop accepting new connections (close the listening socket)
  3. Fail health checks (tell the load balancer to stop routing)
  4. Drain in-flight requests (wait for active handlers to complete)
  5. Close downstream connections (database pools, message brokers, caches)
  6. Flush buffers (logs, metrics, traces)
  7. Exit with status 0

The critical insight is that steps 2 and 3 must happen before step 4. If the load balancer continues routing while you are draining, new requests arrive on a dying process.


Signal Handling

Unix processes receive signals for lifecycle events. The two that matter for shutdown:

  • SIGTERM (15): “Please shut down gracefully.” Sent by Kubernetes, systemd, and most orchestrators.
  • SIGKILL (9): “Die immediately.” Cannot be caught or handled. Sent after SIGTERM times out.

A minimal handler in Python:

import signal
import asyncio

class GracefulServer:
    def __init__(self):
        self.shutting_down = False
        self.active_requests = 0
        self.server = None  # Set when the listening socket is created

    def setup_signals(self):
        signal.signal(signal.SIGTERM, self._handle_shutdown)
        signal.signal(signal.SIGINT, self._handle_shutdown)

    def _handle_shutdown(self, signum, frame):
        print(f"Received signal {signum}, initiating graceful shutdown")
        self.shutting_down = True
        # Stop accepting new connections
        self.server.close()

Common Mistakes

Ignoring SIGTERM entirely. Many frameworks handle it by default, but custom scripts and workers often do not. The process receives SIGTERM, ignores it, then gets SIGKILL 30 seconds later — dropping everything.

Calling sys.exit() in the signal handler. This raises SystemExit in the main thread, which can interrupt critical sections. Instead, set a flag and let the main loop exit cleanly.


Health Check Coordination

Load balancers and service meshes use health checks to decide where to route traffic. During shutdown, your health check must start failing before you stop processing:

@app.get("/healthz")
async def health():
    if server.shutting_down:
        return Response(status_code=503, content="shutting down")
    return Response(status_code=200, content="ok")

The timing matters. Most load balancers check health every 5-10 seconds. If you stop accepting connections before the load balancer detects you are unhealthy, requests hit a closed port and fail with connection refused.
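In Kubernetes terms, the worst-case detection delay is roughly periodSeconds × failureThreshold of the readiness probe. A probe configured like this (values are illustrative) takes up to ~10 seconds to mark the pod unready, so the grace delay before closing the socket should exceed that:

```yaml
readinessProbe:
  httpGet:
    path: /healthz
    port: 8000
  periodSeconds: 5
  failureThreshold: 2   # Unready after ~10s of failing checks
```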

The Grace Period

Add a delay between failing health checks and closing connections:

def _handle_shutdown(self, signum, frame):
    self.shutting_down = True  # Health checks start returning 503
    # Hand the rest off to the event loop; a signal handler must not block
    asyncio.get_running_loop().create_task(self._shutdown())

async def _shutdown(self):
    # Wait for the load balancer to detect the failing check and de-register
    await asyncio.sleep(5)

    # Now stop accepting new connections
    self.server.close()

    # Wait for in-flight requests to complete
    await self._drain_connections(timeout=25)

    # Clean up resources
    await self._cleanup()

Connection Draining

Once you stop accepting new connections, you must wait for in-flight requests to complete:

async def _drain_connections(self, timeout: int = 30):
    """Wait for all active requests to complete, with timeout."""
    start = time.time()
    
    while self.active_requests > 0:
        elapsed = time.time() - start
        if elapsed > timeout:
            print(f"Drain timeout: {self.active_requests} requests abandoned")
            break
        
        print(f"Draining: {self.active_requests} requests remaining "
              f"({timeout - elapsed:.0f}s left)")
        await asyncio.sleep(0.5)
    
    print("All connections drained")

Request Tracking

Track active requests with a counter:

@app.middleware("http")
async def track_requests(request, call_next):
    if server.shutting_down:
        return Response(status_code=503, content="Service shutting down")
    
    server.active_requests += 1
    try:
        response = await call_next(request)
        return response
    finally:
        server.active_requests -= 1
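A variation that avoids the 0.5s polling loop: pair the counter with an asyncio.Event so the drain wakes the moment the last request finishes (a sketch; the names are illustrative):

```python
import asyncio

class RequestTracker:
    """Counts in-flight requests and signals when the count reaches zero."""

    def __init__(self):
        self.active = 0
        self._idle = asyncio.Event()
        self._idle.set()  # Idle until the first request arrives

    def enter(self):
        self.active += 1
        self._idle.clear()

    def leave(self):
        self.active -= 1
        if self.active == 0:
            self._idle.set()

    async def drain(self, timeout: float = 30.0) -> bool:
        """Wait for in-flight requests to finish; False if the timeout expired."""
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```

The middleware calls enter()/leave() in the same try/finally positions as the counter above, and shutdown simply awaits drain().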

Kubernetes-Specific Patterns

The preStop Hook

Kubernetes sends SIGTERM and updates the Endpoints object concurrently, not sequentially. This means traffic can arrive after SIGTERM. The preStop hook adds a delay:

spec:
  terminationGracePeriodSeconds: 35
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]

The sequence becomes:

  1. Pod marked for termination
  2. preStop hook runs (5s sleep)
  3. During sleep, kube-proxy removes the pod from service endpoints
  4. SIGTERM sent to the container
  5. Application drains connections (up to 30s)
  6. If still running after 35s total, SIGKILL

Setting terminationGracePeriodSeconds

This is the total budget: preStop delay + drain time + cleanup time. Set it higher than the sum:

preStop:          5s
Drain timeout:   25s
Cleanup:          3s
Total needed:    33s
Set to:          35s  (with buffer)

Background Workers and Queues

Long-running background jobs need different handling than HTTP requests:

class GracefulWorker:
    def run(self):
        # self.queue is a broker client exposing get/ack/nack
        while not self.shutting_down:
            job = self.queue.get(timeout=1)  # Returns None on timeout
            if job is None:
                continue

            try:
                self.process(job)
                self.queue.ack(job)
            except Exception:
                self.queue.nack(job)  # Return to queue for retry

        # Shutdown: stop pulling new jobs but finish the current one
        print("Worker shutdown: no new jobs will be pulled")

The key principle: never acknowledge a job you have not completed. If SIGKILL arrives mid-processing, an unacknowledged job returns to the queue and gets picked up by another worker.
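A toy in-process queue makes the ack-after-processing contract concrete (illustrative only; real brokers such as RabbitMQ or SQS provide ack/nack natively):

```python
import queue

class ToyJobQueue:
    """In-process stand-in for a broker queue with ack/nack semantics."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, job):
        self._q.put(job)

    def get(self):
        return self._q.get_nowait()

    def ack(self, job):
        pass  # Acked: the job is gone for good

    def nack(self, job):
        self._q.put(job)  # Failed: return the job for retry

def process_one(q, handler):
    """Pull one job; ack only after the handler succeeds."""
    job = q.get()
    try:
        handler(job)
    except Exception:
        q.nack(job)  # Work failed mid-flight: the job goes back on the queue
        return False
    q.ack(job)
    return True
```

If the handler dies, the job is nacked and survives for the next attempt; the same property is what protects you when SIGKILL arrives mid-job.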


Database and Resource Cleanup

Close connections in reverse order of dependency:

async def _cleanup(self):
    # 1. Flush outgoing buffers
    await self.metrics_exporter.flush()
    await self.logger.flush()
    
    # 2. Close downstream clients
    await self.redis_pool.close()
    await self.http_session.close()
    
    # 3. Close database pool last (other cleanup may need it)
    await self.db_pool.close()
    
    print("All resources released")

Connection Pool Draining

Database connection pools may have outstanding queries. Close them with a timeout:

# SQLAlchemy
engine.dispose()

# asyncpg
await pool.close()  # Waits for in-flight queries

# Redis
await redis.close()
await redis.connection_pool.disconnect()

Testing Graceful Shutdown

Graceful shutdown is one of the hardest things to test because it only matters under load during a deployment.

Load Test + Rolling Restart

  1. Start a sustained load test (100 RPS for 5 minutes)
  2. Trigger a rolling deployment midway
  3. Assert zero 5xx errors during the deployment window
  4. Assert zero dropped connections in client logs

Signal Test

# Start the service
python server.py &
PID=$!

# Send requests in a loop
for i in $(seq 1 100); do
    curl -s http://localhost:8000/healthz &
done

# Send SIGTERM while requests are in flight
kill -TERM $PID

# Wait for exit
wait $PID
echo "Exit code: $?"  # Should be 0

Anti-Patterns

Anti-Pattern                    Consequence                              Fix
No signal handler               SIGKILL after timeout, dropped requests  Catch SIGTERM
Immediate socket close          Connection refused errors                Delay after health check flip
No drain timeout                Process hangs forever on stuck request   Always set a max drain duration
Acking jobs before completion   Lost work on crash                       Ack after processing
Cleanup before drain            Resource errors in in-flight requests    Drain first, then cleanup

Graceful shutdown is not glamorous work. But it is the difference between deployments that require a maintenance window and deployments that nobody notices.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
