
Container Networking Deep Dive: From Pods to Service Mesh

Understand container networking from first principles — how pods communicate, how Kubernetes services route traffic, how network policies enforce security, and when a service mesh is worth the operational complexity.

Container networking is the plumbing that connects your microservices. When a pod in namespace A sends a request to a pod in namespace B, the packet traverses virtual interfaces, bridge networks, overlay protocols, and iptables rules — all invisible until something breaks. Understanding these layers is the difference between debugging a network issue in minutes and debugging it for days.


Pod Networking Fundamentals

Every pod in Kubernetes gets its own IP address. Containers within a pod share that IP and communicate via localhost. This simplifies networking: no port mapping, no NAT, no container-to-container port conflicts.
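
A minimal sketch of this, assuming an nginx image and a curl-capable image (the pod and image names here are illustrative): the second container reaches the first over localhost because both share the pod's network namespace.

apiVersion: v1
kind: Pod
metadata:
  name: shared-network-demo        # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27            # assumption: any image serving on port 80
      ports:
        - containerPort: 80
    - name: sidecar
      image: curlimages/curl:8.9.1 # assumption: any image with curl and sh
      # No pod IP or Service needed: localhost works inside the shared netns
      command: ["sh", "-c", "sleep 5 && curl -s http://localhost:80 && sleep 3600"]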

The Container Network Interface (CNI)

The CNI plugin is responsible for assigning IPs to pods and configuring routes:

Popular CNI plugins:
  Calico:       Network policies + BGP routing (widely deployed in production)
  Cilium:       eBPF-based networking + observability
  Flannel:      Simple overlay networking (VXLAN); does not implement network policies
  AWS VPC CNI:  Native AWS VPC IPs for pods
  Weave:        Mesh overlay with encryption
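
To check which plugin a node is running, look at the directory the kubelet reads CNI configuration from. A quick sketch (the Calico filename below is illustrative; it varies by plugin):

# On a node: the kubelet loads CNI configuration from /etc/cni/net.d/
ls /etc/cni/net.d/
# 10-calico.conflist            <- filename varies by plugin

# The conflist names the plugin binary (in /opt/cni/bin) and its IPAM settings
cat /etc/cni/net.d/10-calico.conflist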

How Pod-to-Pod Communication Works

  1. Pod A sends a packet to Pod B’s IP
  2. The packet hits the virtual ethernet (veth) pair connecting the pod to the host
  3. The host routes the packet based on the CNI plugin’s routing table
  4. If Pod B is on a different node, the packet is encapsulated (overlay) or routed (native)
  5. The destination node decapsulates and delivers to Pod B

Pod A (10.244.1.5) → veth → Node 1 bridge → overlay/route → Node 2 bridge → veth → Pod B (10.244.2.8)
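
Each hop in that flow can be inspected with standard iproute2 tooling. A debugging sketch, reusing the IPs from the diagram:

# Which node is each pod on, and what IP did the CNI assign?
kubectl get pods -o wide

# On the node: the host side of each pod's veth pair
ip -br link show type veth

# How does this node reach a pod on another node (overlay device or next hop)?
ip route get 10.244.2.8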

Kubernetes Services

Services provide stable endpoints for pod groups that scale and restart frequently:

ClusterIP (Internal)

apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  type: ClusterIP
  selector:
    app: order-service
  ports:
    - port: 80
      targetPort: 8080

kube-proxy programs iptables (or IPVS) rules that load-balance traffic across all matching pods. The ClusterIP is a virtual IP: it is never assigned to any network interface. Packets addressed to it are DNAT-rewritten to a pod IP by those kernel rules on the node where the traffic originates.
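
Both halves of that mechanism are directly observable. A sketch (run the node-side commands on any node; exact iptables output varies by Kubernetes version):

# The pod IPs kube-proxy is balancing across
kubectl get endpoints order-service

# On a node, iptables mode: the NAT rules programmed for the service
sudo iptables-save -t nat | grep order-service

# On a node, IPVS mode: virtual servers and their backends
sudo ipvsadm -Ln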

NodePort (External via Node)

Exposes the service on every node’s IP at a static port (30000-32767). Useful for development; rarely used in production.
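
A sketch of the same service exposed as a NodePort (the nodePort value is illustrative; omit it to let Kubernetes auto-assign one):

spec:
  type: NodePort
  selector:
    app: order-service
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080   # must fall within 30000-32767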

LoadBalancer (Cloud Provider)

Provisions a cloud load balancer (ALB, NLB, GCP LB) that routes external traffic to the service.
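
A sketch assuming AWS with the legacy in-tree provider (the annotation below requests an NLB instead of a Classic ELB; the newer AWS Load Balancer Controller uses different annotations):

apiVersion: v1
kind: Service
metadata:
  name: order-service
  annotations:
    # Assumption: in-tree AWS cloud provider
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: order-service
  ports:
    - port: 80
      targetPort: 8080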

Headless Services

spec:
  clusterIP: None

No load balancing — DNS returns individual pod IPs. Used for stateful workloads (databases, Kafka) that need direct pod addressing.
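
The difference is visible in DNS. A sketch with a hypothetical Kafka headless service (names and IPs are illustrative):

# A ClusterIP service resolves to one virtual IP; a headless service
# returns one A record per ready pod
nslookup kafka-headless.payments.svc.cluster.local
# Address: 10.244.1.12
# Address: 10.244.2.7
# Address: 10.244.3.19

# StatefulSet pods additionally get stable per-pod names:
#   kafka-0.kafka-headless.payments.svc.cluster.local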


DNS in Kubernetes

CoreDNS resolves service names to ClusterIPs:

order-service                          → 10.96.0.15 (same namespace)
order-service.payments                 → 10.96.0.15 (cross-namespace)
order-service.payments.svc.cluster.local → 10.96.0.15 (fully qualified)

DNS Debugging

# From inside a pod
nslookup order-service.payments.svc.cluster.local

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

# Common issues:
# - Pod DNS policy not set correctly
# - CoreDNS out of memory (increase resources)
# - ndots:5 default causing excessive DNS queries

ndots Optimization

Kubernetes defaults to ndots:5, meaning any name with fewer than 5 dots gets search domain suffixes appended. A lookup for api.stripe.com generates:

api.stripe.com.payments.svc.cluster.local  → NXDOMAIN
api.stripe.com.svc.cluster.local           → NXDOMAIN
api.stripe.com.cluster.local               → NXDOMAIN
api.stripe.com                             → resolved

Four DNS queries instead of one. Fix with:

spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
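
To confirm the override took effect, check the pod's resolver configuration (pod name hypothetical):

kubectl exec -it my-pod -- cat /etc/resolv.conf
# search payments.svc.cluster.local svc.cluster.local cluster.local
# options ndots:2

A trailing dot (api.stripe.com.) also bypasses the search list entirely, at the cost of touching every call site.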

Network Policies

By default, all pods can communicate with all other pods. Network policies restrict this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-service-policy
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: order-service
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: api-gateway
        - podSelector:
            matchLabels:
              app: checkout
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
    - to:  # Allow DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP

This policy says: order-service pods accept traffic on port 8080 only from namespaces labeled name: api-gateway or from checkout pods in the same namespace (selectors listed as separate items under one from entry are OR'd together), and can only send traffic to postgres pods on port 5432, plus DNS to any namespace.
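
Policies are additive, so the usual starting point is a namespace-wide default deny, with policies like the one above layered on as explicit allows. A minimal sketch:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}     # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress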

Critical: not every CNI plugin enforces network policies. Flannel, for example, does not implement them at all; policy objects are accepted by the API server but silently ignored. Verify enforcement with your CNI's documentation.


Service Mesh

A service mesh adds a sidecar proxy (Envoy, Linkerd-proxy) to every pod that intercepts all network traffic:

Pod A                      Pod B
┌───────────────┐          ┌───────────────┐
│ App Container │          │ App Container │
│       ↕       │          │       ↕       │
│ Sidecar Proxy │ ───────▶ │ Sidecar Proxy │
└───────────────┘          └───────────────┘
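
How the sidecar gets there depends on the mesh. A sketch assuming Istio, where labeling a namespace opts its pods into automatic injection on their next restart:

kubectl label namespace payments istio-injection=enabled
kubectl rollout restart deployment/order-service -n payments

# Pods now report 2/2 containers: the app plus istio-proxy (Envoy)
kubectl get pods -n payments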

What a Service Mesh Provides

  • mTLS everywhere: Automatic encryption and authentication between all services
  • Traffic management: Canary deployments, traffic splitting, retries, timeouts (sketched after this list)
  • Observability: Request-level metrics, distributed tracing, access logs
  • Circuit breaking: Per-service failure thresholds
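
For example, a 90/10 canary split expressed in Istio's API (a sketch; the subsets referenced here would be defined in a separate DestinationRule, not shown):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: stable    # assumption: defined in a DestinationRule
          weight: 90
        - destination:
            host: order-service
            subset: canary
          weight: 10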

When You Need a Service Mesh

Scenario                             Service Mesh?
5 services, single team              No — too much overhead
20+ services, multiple teams         Maybe — evaluate complexity budget
Regulatory requirement for mTLS      Yes — mesh provides this automatically
Need canary deployments              Maybe — Argo Rollouts may suffice
Zero-trust networking requirement    Yes — mutual TLS is a core feature

When You Do Not

A service mesh adds latency to every request (each hop now traverses two extra proxies; figures around 10-15% are commonly cited for sidecar meshes), increases memory usage (each sidecar consumes roughly 50-100 MB), and significantly increases operational complexity. If you do not need the features, the cost is not justified.


Troubleshooting

Common Network Issues

Symptom                               Likely Cause                           Debug Command
Pod cannot reach another pod          CNI misconfiguration, network policy   kubectl exec -it pod -- ping <IP>
Service name does not resolve         CoreDNS issue, wrong namespace         kubectl exec -it pod -- nslookup svc-name
Intermittent timeouts                 Conntrack table full                   conntrack -S on the node
High DNS latency                      ndots too high, CoreDNS overloaded     Reduce ndots, scale CoreDNS
Cross-node pod communication fails    Overlay network issue (VXLAN/Calico)   Check node-to-node connectivity on overlay port
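
When the table points at an in-pod check but the application image ships no tooling, an ephemeral debug container is the fastest route. A sketch (pod and container names are hypothetical; netshoot is a community image bundling dig, tcpdump, conntrack, and friends):

kubectl debug -it order-service-7d4b9-x2k1p \
  --image=nicolaka/netshoot --target=app -- bash

# Inside the debug shell:
dig order-service.payments.svc.cluster.local
nc -zv 10.244.2.8 8080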

Anti-Patterns

Anti-Pattern                   Consequence                                 Fix
No network policies            Any compromised pod can reach everything    Default deny + explicit allow
Hardcoded pod IPs              Breaks on pod restart                       Use service names
Service mesh for 3 services    Massive overhead for minimal benefit        Re-evaluate at 15+ services
Ignoring DNS performance       4x DNS queries per external lookup          Set ndots:2, enable DNS caching
No egress controls             Pods can reach the internet freely          Egress network policies + NAT gateway logging

Container networking is layers of abstraction built on top of Linux networking primitives. When it works, it is invisible. When it breaks, understanding the layers — from veth pairs to iptables rules to CNI plugins — is what gets you from “it’s not working” to “I know exactly why” in minutes instead of hours.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
