Serverless Container Architecture
Run containers without managing infrastructure using serverless container platforms. Covers AWS Fargate, Google Cloud Run, Azure Container Apps, cold start optimization, scaling patterns, cost comparison, and when serverless containers beat Kubernetes.
Serverless containers give you the packaging benefits of containers (consistent runtime, dependency isolation) without server management (no nodes, no patching, no capacity planning). You push a container image, the platform runs it, and you pay per request or per second of execution.
Platform Comparison
| Feature | AWS Fargate | Cloud Run | Azure Container Apps |
|---|---|---|---|
| Max memory | 120 GB | 32 GB | 4 GB |
| Max vCPU | 16 | 8 | 4 |
| Max timeout | 24 hours | 60 min (HTTP), 24h (jobs) | 30 min |
| Scale to zero | No (an ECS service's desired count can be set to 0, but there is no request-driven wake-up) | Yes | Yes |
| Min instances | 0 | 0 | 0 |
| GPU support | No | Yes (NVIDIA L4) | No |
| Cold start | 10-30s | 1-5s | 5-15s |
| Pricing | Per vCPU-second + memory | Per request + vCPU-second | Per vCPU-second |
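The pricing rows above can be turned into a back-of-envelope estimate. The sketch below uses Cloud Run's request-based billing model, where you pay for vCPU-seconds and GiB-seconds only while an instance is handling requests; the unit prices here are illustrative assumptions, not current list prices, so check the pricing page for your region and tier.

```python
# Illustrative (assumed) unit prices -- verify against current pricing.
VCPU_PER_SEC = 0.000024       # assumed $/vCPU-second
GIB_PER_SEC = 0.0000025       # assumed $/GiB-second
PER_MILLION_REQ = 0.40        # assumed $/1M requests

def monthly_cost(requests, avg_seconds, vcpu, gib, concurrency):
    # Billable instance time: total request time divided by how many
    # requests share one instance (the concurrency setting).
    instance_seconds = requests * avg_seconds / concurrency
    return (instance_seconds * (vcpu * VCPU_PER_SEC + gib * GIB_PER_SEC)
            + requests / 1_000_000 * PER_MILLION_REQ)

# 10M requests/month, 100 ms each, 1 vCPU / 0.5 GiB, concurrency 80
cost = monthly_cost(10_000_000, 0.1, 1, 0.5, 80)
print(f"${cost:.2f}/month")
```

Note how much the concurrency setting matters: with 80 requests sharing each instance, the request fee dominates the compute fee.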
Cloud Run
```bash
# Deploy with minimal config
gcloud run deploy order-service \
  --image gcr.io/project/order-service:v1.2.3 \
  --region us-central1 \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 1 \
  --max-instances 100 \
  --concurrency 80 \
  --timeout 30s \
  --set-env-vars "DB_HOST=10.0.0.5,CACHE_HOST=10.0.0.6"
```
```yaml
# Cloud Run service YAML
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: order-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "100"
        run.googleapis.com/cpu-throttling: "false"
    spec:
      containerConcurrency: 80
      timeoutSeconds: 30
      containers:
      - image: gcr.io/project/order-service:v1.2.3
        resources:
          limits:
            memory: 512Mi
            cpu: "1"
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /healthz
          initialDelaySeconds: 0
          periodSeconds: 1
          failureThreshold: 30
```
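For comparison, a rough AWS Fargate counterpart is an ECS task definition. A minimal sketch follows; the account ID, registry path, and role name are placeholders, and note that Fargate only accepts specific CPU/memory pairings (256 CPU units, i.e. 0.25 vCPU, pairs with 512 MB):

```json
{
  "family": "order-service",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "order-service",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/order-service:v1.2.3",
      "portMappings": [{ "containerPort": 8080 }],
      "environment": [
        { "name": "DB_HOST", "value": "10.0.0.5" }
      ]
    }
  ]
}
```

Unlike the Cloud Run YAML, this defines only the task; scaling and routing come from a separate ECS service and load balancer.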
Cold Start Optimization
Cold start timeline:
1. Pull container image (1-10s depending on size)
2. Start container runtime (0.5-2s)
3. Application initialization (0.5-30s)
4. Ready to serve traffic
Optimization strategies:
Image size:
- Use a slim base image: node:20-alpine is ~150 MB vs ~1 GB for node:20
- Multi-stage builds: ship only runtime artifacts, not the toolchain
- Distroless base images: minimal attack surface and size
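The image-size tactics combine naturally. A multi-stage Node.js Dockerfile sketch, assuming a conventional `npm run build` step that emits to `dist/` (adjust paths for your project):

```dockerfile
# Build stage: full toolchain and dev dependencies
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: production dependencies and build output only
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
EXPOSE 8080
CMD ["node", "dist/server.js"]
```

The final image carries none of the build toolchain, which directly shortens the image-pull step of the cold start timeline.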
Startup time:
- Lazy initialization: connect to the database on first request, not at boot
- Connection pooling: pre-warm and reuse connections across requests
- Avoid heavy framework startup: prefer lightweight frameworks
- Pre-compile/cache: avoid JIT compilation work at startup
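Lazy initialization can be as simple as memoizing the expensive setup so it runs on the first request rather than at boot. A minimal Python sketch, where `connect_db` stands in for any slow setup (a real service would open a connection pool here):

```python
import functools
import time

def connect_db():
    # Stand-in for expensive setup work (~50 ms here).
    time.sleep(0.05)
    return {"connected": True}

@functools.lru_cache(maxsize=1)
def get_db():
    # First call pays the setup cost; later calls reuse the result.
    return connect_db()

def handle_request():
    db = get_db()  # cheap after the first request
    return db["connected"]

start = time.perf_counter()
handle_request()               # cold: pays the setup cost
cold = time.perf_counter() - start

start = time.perf_counter()
handle_request()               # warm: cached
warm = time.perf_counter() - start
print(f"cold={cold:.3f}s warm={warm:.6f}s")
```

The trade-off: the container reports ready sooner, but the first request absorbs the setup latency, so pair this with a startup probe that does not trigger the lazy path.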
Min instances:
- Keep 1-3 instances warm: zero cold starts for baseline traffic
- Costs more, but eliminates p99 latency spikes
When to Choose Serverless Containers vs Kubernetes
Choose serverless containers when:
✅ Traffic is spiky or unpredictable
✅ Team is small (< 5 engineers)
✅ No need for custom scheduling
✅ Simple request-response workloads
✅ Cost optimization matters (scale to zero)
✅ No Kubernetes expertise in house
Choose Kubernetes when:
✅ Persistent workloads (always running)
✅ Complex networking (service mesh)
✅ Custom scheduling requirements
✅ GPU/specialized hardware
✅ Compliance requirements (control plane)
✅ Large team with Kubernetes expertise
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Large container images (2GB+) | Slow cold starts (10-30s) | Multi-stage builds, Alpine/distroless |
| Heavy startup initialization | Cold start latency spike | Lazy init, min instances |
| Stateful in-memory data | Lost on scale-down | External state (Redis, database) |
| Long-running background jobs | Timeout kills process | Use job/task mode, not HTTP mode |
| No concurrency tuning | Under-utilized instances | Set concurrency to match app capacity |
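For the concurrency-tuning row, Little's law gives a starting point: requests in flight per instance ≈ per-instance request rate × average latency. A hedged sketch with illustrative numbers (the headroom factor is an assumption to avoid running instances at full capacity):

```python
def suggested_concurrency(target_rps, avg_latency_s, max_instances, headroom=0.7):
    # Little's law: L = lambda * W. Spread the target load evenly
    # across max_instances, then pad by a headroom factor so each
    # instance is not driven to 100% of its capacity.
    per_instance_rps = target_rps / max_instances
    in_flight = per_instance_rps * avg_latency_s
    return max(1, round(in_flight / headroom))

# 2000 rps total, 200 ms average latency, 10 instances
print(suggested_concurrency(2000, 0.2, 10))  # -> 57
```

Treat the result as a first guess, then load-test: CPU-bound apps usually need a lower setting than the formula suggests, I/O-bound apps can often go higher.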
Serverless containers are the sweet spot between serverless functions (limited runtime) and Kubernetes (operational overhead). You get container flexibility with serverless simplicity.