Service Mesh Deep Dive
Implement a service mesh for secure, observable, and resilient microservice communication. Covers Istio, Linkerd, sidecar proxies, mTLS, traffic management, observability, and the patterns that manage the complexity of service-to-service communication.
A service mesh handles the networking between microservices so your application code does not have to. Instead of every service implementing its own retries, circuit breakers, mTLS, and observability, the mesh handles it transparently through sidecar proxies. Your code just makes HTTP calls; the mesh handles the rest.
Why Service Mesh
Without service mesh:
Every service must implement:
☐ Retry logic
☐ Circuit breakers
☐ Timeouts
☐ Load balancing
☐ mTLS certificates
☐ Request tracing
☐ Metrics collection
☐ Rate limiting
Result: the same logic reimplemented in 50 services, each slightly different
A bug in the retry logic? Fix it in 50 places.
With service mesh:
Sidecar proxy handles ALL of the above
Application code: simple HTTP call
Mesh handles: encryption, retries, routing, metrics
┌──────────────────────────────────┐
│ Pod                              │
│ ┌──────────┐    ┌──────────────┐ │
│ │ Your App │ →→ │ Envoy Proxy  │ │ →→→ Network
│ │ (code)   │    │ (sidecar)    │ │
│ └──────────┘    └──────────────┘ │
└──────────────────────────────────┘
The app makes a plain HTTP call to the service name, as if nothing were in the way
iptables rules in the pod redirect that call to the local proxy, which handles everything else
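How does the proxy get there in the first place? Through automatic sidecar injection: you opt a namespace into the mesh, and an admission webhook adds the proxy container to every pod scheduled in it. A minimal sketch using Istio's standard namespace label (the namespace name is illustrative):
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled  # mutating webhook injects the Envoy sidecar into new pods
Existing pods are not modified; they pick up the sidecar on their next restart.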
Istio vs Linkerd
Istio:
Sidecar: Envoy proxy
Features: Most comprehensive
Complexity: High
Resource overhead: ~50MB per sidecar
Best for: Large enterprises, complex routing needs
Linkerd:
Sidecar: linkerd2-proxy (Rust)
Features: Core mesh features
Complexity: Lower
Resource overhead: ~10MB per sidecar
Best for: Teams wanting simplicity, lower overhead
Comparison:
| Feature | Istio | Linkerd |
|---|---|---|
| mTLS | ✓ | ✓ |
| Traffic management | ✓✓✓ | ✓✓ |
| Observability | ✓✓✓ | ✓✓ |
| Multi-cluster | ✓✓ | ✓✓ |
| Resource usage | Higher | Lower |
| Learning curve | Steep | Moderate |
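Opting workloads into each mesh follows the same pattern with different knobs: Istio keys off the istio-injection namespace label shown earlier, while Linkerd looks for an annotation on the namespace or workload. A sketch (namespace name illustrative):
apiVersion: v1
kind: Namespace
metadata:
  name: production
  annotations:
    linkerd.io/inject: enabled  # Linkerd's proxy injector adds the sidecar to new pods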
Traffic Management
# Canary deployment: Route 10% of traffic to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: product-service
spec:
  hosts:
    - product-service
  http:
    - route:
        - destination:
            host: product-service
            subset: v1
          weight: 90
        - destination:
            host: product-service
            subset: v2
          weight: 10
      retries:
        attempts: 3
        retryOn: "5xx,reset,connect-failure"
      timeout: 5s
# Circuit breaker
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: product-service
spec:
  host: product-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5   # eject a pod after 5 consecutive 5xx responses
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
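One detail the canary VirtualService above depends on: the v1 and v2 subsets it routes to must be declared in a DestinationRule that maps each subset to pod labels. A sketch, assuming the two deployments carry a version label with values v1 and v2:
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: product-service-subsets
spec:
  host: product-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
In practice you would fold these subsets into the circuit-breaker DestinationRule above rather than keep two rules for the same host.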
mTLS (Mutual TLS)
# Enforce mTLS for all services in a namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # all traffic must be encrypted
# Istio automatically:
# 1. Provisions certificates for every pod (SPIFFE identity)
# 2. Rotates certificates (24-hour default)
# 3. Encrypts all pod-to-pod traffic
# 4. Verifies identity on both ends
#
# Your code doesn't change at all.
# Service A calls Service B over plain HTTP.
# The sidecars intercept the call, encrypt it with mTLS in transit, and decrypt it on the receiving side.
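STRICT mode breaks any caller that is not yet in the mesh, so a common migration path is to start in PERMISSIVE mode, which accepts both mTLS and plaintext, and tighten to STRICT once every client has a sidecar. A sketch (namespace name illustrative):
# Migration step: accept both mTLS and plaintext traffic
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: staging
spec:
  mtls:
    mode: PERMISSIVE
This pairs with the namespace-by-namespace adoption advice in the anti-patterns below.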
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Service mesh for 3 services | Overhead not justified | Mesh at 10+ services |
| No resource limits on sidecars | Sidecar memory leak affects app | Set sidecar resource limits |
| Mesh replaces all application logic | Over-reliance on mesh | Mesh for infra concerns, app logic in code |
| No gradual rollout | Mesh breaks all services at once | Namespace-by-namespace adoption |
| Ignoring sidecar latency | P99 latency spikes | Measure and tune proxy resources |
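For the sidecar resource limits called out above, Istio reads per-pod annotations from the workload's pod template at injection time. A sketch with illustrative values and placeholder names (tune the numbers against your own traffic):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
spec:
  selector:
    matchLabels:
      app: product-service
  template:
    metadata:
      labels:
        app: product-service
      annotations:
        # Istio's injector uses these to size the Envoy sidecar container
        sidecar.istio.io/proxyCPU: "100m"
        sidecar.istio.io/proxyMemory: "128Mi"
        sidecar.istio.io/proxyCPULimit: "500m"
        sidecar.istio.io/proxyMemoryLimit: "256Mi"
    spec:
      containers:
        - name: app
          image: product-service:1.0  # placeholder image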
A service mesh is infrastructure, not magic. It solves networking concerns so your services do not have to — but it adds operational complexity that must be justified by the scale of your microservice architecture.