Multi-Cloud Architecture
Design systems that run across multiple cloud providers to avoid vendor lock-in, improve resilience, and optimize costs. Covers abstraction layers, data sovereignty, multi-cloud networking, and the real-world trade-offs of multi-cloud strategies.
Multi-cloud means running workloads across two or more cloud providers, such as AWS, GCP, or Azure. It promises freedom from vendor lock-in, improved resilience, and access to best-of-breed services. The reality is more nuanced: multi-cloud adds operational complexity, so it is worthwhile only when the benefits outweigh that cost.
Multi-Cloud Strategies
Active-Active
AWS (us-east-1): 50% traffic
├── Order Service
├── Payment Service
└── PostgreSQL (primary)
GCP (us-central1): 50% traffic
├── Order Service
├── Payment Service
└── PostgreSQL (replica)
Global Load Balancer → Route by latency/geography
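Benefit: A provider outage directly affects only the half of traffic routed to that provider. Note that in the layout above an AWS outage also takes out the PostgreSQL primary, so writes stall until the GCP replica is promoted. Cost: Roughly double the operational surface (two platforms to deploy, monitor, and secure), plus cross-cloud data sync.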
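The routing rule of the global load balancer can be sketched in a few lines of Python. This is an illustration only: `Region`, `pick_region`, and the latency figures are hypothetical names and values, and a real deployment would rely on a managed DNS or load-balancing service rather than application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str          # e.g. "aws:us-east-1"
    healthy: bool      # result of the most recent health check
    latency_ms: float  # measured client-to-region latency (made-up values below)


def pick_region(regions: list[Region]) -> Region:
    """Return the healthy region with the lowest latency."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


regions = [
    Region("aws:us-east-1", healthy=True, latency_ms=18.0),
    Region("gcp:us-central1", healthy=True, latency_ms=31.0),
]
print(pick_region(regions).name)

# Simulate an AWS outage: traffic moves to the remaining healthy region.
degraded = [Region("aws:us-east-1", healthy=False, latency_ms=18.0), regions[1]]
print(pick_region(degraded).name)
```

Filtering on health before comparing latency is what turns latency-based routing into automatic failover: an unhealthy region is never selected, however close it is to the client.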
Active-Passive
AWS (primary): 100% traffic
└── Full application stack
GCP (standby): 0% traffic (warm standby)
└── Data replicated, services ready to activate
Failover: DNS switch to GCP on AWS outage
Benefit: DR protection without daily multi-cloud complexity. Cost: Standby infrastructure cost, failover testing required.
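The failover trigger is usually guarded so that a single failed health check does not flip DNS. A minimal sketch, assuming a threshold of consecutive failures and manual failback; `FailoverController` and its parameters are hypothetical, and the actual DNS update is left out:

```python
class FailoverController:
    """Switch from the primary to the standby after N consecutive failed checks."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0
        self.active = "aws"  # primary, as in the layout above

    def record_health_check(self, primary_ok: bool) -> str:
        """Record one health check of the primary and return the active cloud."""
        if self.active != "aws":
            # Already failed over; failing back is a manual step in this sketch.
            return self.active
        if primary_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.active = "gcp"  # here a real system would update DNS
        return self.active


controller = FailoverController(threshold=3)
checks = [False, False, True, False, False, False]
history = [controller.record_health_check(ok) for ok in checks]
print(history)
```

The successful check in the middle resets the counter, so only the final run of three failures triggers the switch. Running the same logic in a scheduled drill is one way to verify that the standby actually takes traffic.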
Best-of-Breed
AWS: Core application (EC2, RDS, ECS)
GCP: ML/AI workloads (Vertex AI, BigQuery)
Cloudflare: Edge/CDN (Workers, R2)
Each cloud for what it does best.
Benefit: Optimize each workload for the best platform. Cost: Multiple billing, multiple expertise requirements.
Abstraction Layers
Infrastructure as Code
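# Terraform with multi-cloud modules.
# A module "source" must be a literal string, so the provider is selected with count.
module "eks" {
  source             = "./modules/eks"
  count              = var.cloud_provider == "aws" ? 1 : 0
  cluster_name       = "production"
  node_count         = 5
  node_type          = var.instance_type
  kubernetes_version = "1.28"
}
module "gke" {
  source             = "./modules/gke"
  count              = var.cloud_provider == "aws" ? 0 : 1
  cluster_name       = "production"
  node_count         = 5
  node_type          = var.instance_type
  kubernetes_version = "1.28"
}
# Kubernetes workloads are cloud-agnostic
resource "kubernetes_deployment" "order_service" {
  metadata { name = "order-service" }
  spec {
    replicas = 3
    # Same deployment spec works on EKS or GKE
    # (selector and pod template omitted here for brevity)
  }
}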
Container Orchestration
Kubernetes is the de facto multi-cloud abstraction:
Application → Kubernetes API → [EKS | GKE | AKS]
Same YAML manifests deploy to any cloud's managed Kubernetes.
Cloud-specific only: storage classes, load balancer annotations, IAM.
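One way to keep the cloud-specific parts contained is to hold a shared base manifest and merge a small per-cloud overlay at render time. The sketch below is an assumption about how that might look, not a prescribed tool (Kustomize overlays or Helm values serve the same purpose). `CLOUD_OVERRIDES`, `render_service`, and `render_pvc` are hypothetical names, and the storage class names and annotations are illustrative values that should be checked against each provider's documentation.

```python
import copy

# Illustrative per-cloud settings (assumptions; verify against provider docs).
CLOUD_OVERRIDES = {
    "eks": {
        "storage_class": "gp2",
        "lb_annotations": {"service.beta.kubernetes.io/aws-load-balancer-type": "nlb"},
    },
    "gke": {
        "storage_class": "standard-rwo",
        "lb_annotations": {"networking.gke.io/load-balancer-type": "Internal"},
    },
    "aks": {
        "storage_class": "managed-csi",
        "lb_annotations": {"service.beta.kubernetes.io/azure-load-balancer-internal": "true"},
    },
}


def render_service(base: dict, cloud: str) -> dict:
    """Return a copy of a Service manifest with the cloud's LB annotations merged in."""
    manifest = copy.deepcopy(base)
    annotations = manifest.setdefault("metadata", {}).setdefault("annotations", {})
    annotations.update(CLOUD_OVERRIDES[cloud]["lb_annotations"])
    return manifest


def render_pvc(base: dict, cloud: str) -> dict:
    """Return a copy of a PersistentVolumeClaim with the cloud's storage class set."""
    manifest = copy.deepcopy(base)
    manifest.setdefault("spec", {})["storageClassName"] = CLOUD_OVERRIDES[cloud]["storage_class"]
    return manifest


base_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "order-service"},
    "spec": {"type": "LoadBalancer", "ports": [{"port": 80}]},
}
gke_service = render_service(base_service, "gke")
print(gke_service["metadata"]["annotations"])
```

The base manifest stays identical across clouds; only the overlay table changes when a provider is added.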
Data Strategy
The hardest part of multi-cloud is data:
Option 1: Data in one cloud, compute in multiple
+ Simple data management
- Cross-cloud latency and egress costs
Option 2: Data replicated across clouds
+ Low-latency access everywhere
- Replication lag, conflict resolution, 2x storage cost
Option 3: Distributed data layer (CockroachDB; Spanner offers a similar model but runs only on GCP)
+ Transparent multi-cloud data
- Lock-in shifts to the data-layer vendor, operational complexity
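The trade-off between Option 1 and Option 2 can be made concrete with a rough monthly cost model. This is a back-of-the-envelope sketch: the function names are hypothetical, and the prices (`STORAGE_PRICE`, `EGRESS_PRICE`) and volumes are placeholder assumptions, not quotes from any provider.

```python
# Placeholder prices in USD (assumptions for illustration only).
STORAGE_PRICE = 0.02  # per GB-month
EGRESS_PRICE = 0.09   # per GB transferred out of a cloud


def option1_monthly_cost(stored_gb: float, cross_cloud_read_gb: float) -> float:
    """Data in one cloud: single copy, but every remote read pays egress."""
    return stored_gb * STORAGE_PRICE + cross_cloud_read_gb * EGRESS_PRICE


def option2_monthly_cost(stored_gb: float, changed_gb: float) -> float:
    """Data replicated: two copies, but only changed data crosses clouds."""
    return 2 * stored_gb * STORAGE_PRICE + changed_gb * EGRESS_PRICE


# Hypothetical workload: 10 TB stored, 50 TB read remotely, 1 TB changed per month.
single_copy = option1_monthly_cost(stored_gb=10_000, cross_cloud_read_gb=50_000)
replicated = option2_monthly_cost(stored_gb=10_000, changed_gb=1_000)
print(f"Option 1: ${single_copy:,.2f}  Option 2: ${replicated:,.2f}")
```

For a read-heavy workload like this one, paying for a second copy is far cheaper than paying egress on every read. With few remote reads the comparison reverses, which is why the access pattern should drive the choice.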
Cross-Cloud Networking
# Cloud interconnect options
networking:
  aws_gcp:
    type: "Cloud Interconnect + Direct Connect"
    bandwidth: "10 Gbps"
    latency: "5-10ms"
    cost: "$0.02/GB transfer"
  vpn_tunnel:
    type: "Site-to-site VPN"
    bandwidth: "1.25 Gbps per tunnel"
    latency: "10-30ms"
    cost: "$0.05/GB + hourly VPN cost"
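To compare the two options for a given data volume, the figures above can be plugged into a small calculation. This is a sketch under stated assumptions: the per-GB prices and bandwidths come from the config above, while the hourly VPN fee of $0.05 and the 730-hour month are placeholder assumptions. Port fees for the dedicated interconnect and protocol overhead are ignored.

```python
def transfer_hours(data_gb: float, bandwidth_gbps: float) -> float:
    """Time to move data_gb at full line rate (GB * 8 = gigabits)."""
    return data_gb * 8 / bandwidth_gbps / 3600


def monthly_cost(data_gb: float, price_per_gb: float,
                 hourly_fee: float = 0.0, hours: float = 730) -> float:
    """Per-GB transfer charge plus any fixed hourly connection fee."""
    return data_gb * price_per_gb + hourly_fee * hours


DATA_GB = 50_000  # hypothetical 50 TB of cross-cloud traffic per month

interconnect_hours = transfer_hours(DATA_GB, bandwidth_gbps=10)
interconnect_cost = monthly_cost(DATA_GB, price_per_gb=0.02)

vpn_hours = transfer_hours(DATA_GB, bandwidth_gbps=1.25)
vpn_cost = monthly_cost(DATA_GB, price_per_gb=0.05, hourly_fee=0.05)  # fee is assumed

print(f"Interconnect: {interconnect_hours:.1f} h, ${interconnect_cost:,.2f}")
print(f"VPN tunnel:   {vpn_hours:.1f} h, ${vpn_cost:,.2f}")
```

At this volume the dedicated link is both faster and cheaper per GB. At small volumes its fixed fees, which are not modelled here, usually make the VPN the more economical choice.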
Decision Framework
Should you go multi-cloud?
YES if:
✅ Regulatory requirement (data sovereignty)
✅ Proven vendor reliability concern
✅ Best-of-breed services matter significantly
✅ Organization has multi-cloud expertise
✅ Workload justifies the complexity
NO if:
❌ "Avoiding lock-in" is the only reason
❌ Team lacks operational expertise for one cloud
❌ Workload is small (< $100K/year cloud spend)
❌ Adding complexity without clear business benefit
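The checklist can be encoded as a simple rule for use in an architecture review. The sketch below is one possible reading of it, not the only one: it treats regulation, reliability, and best-of-breed needs as drivers (at least one is required) and treats expertise and spend as preconditions. `Assessment` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    regulatory_requirement: bool
    proven_reliability_concern: bool
    best_of_breed_matters: bool
    has_multicloud_expertise: bool
    annual_cloud_spend_usd: float


def evaluate(a: Assessment) -> tuple[bool, list[str]]:
    """Return (go_multicloud, blockers) under the interpretation described above."""
    has_driver = (
        a.regulatory_requirement
        or a.proven_reliability_concern
        or a.best_of_breed_matters
    )
    blockers = []
    if not has_driver:
        blockers.append("no driver beyond avoiding lock-in")
    if not a.has_multicloud_expertise:
        blockers.append("team lacks multi-cloud expertise")
    if a.annual_cloud_spend_usd < 100_000:
        blockers.append("workload is small (< $100K/year cloud spend)")
    return (not blockers, blockers)


regulated = Assessment(True, False, False, True, 500_000)
small_shop = Assessment(False, False, False, True, 50_000)
print(evaluate(regulated))
print(evaluate(small_shop))
```

Returning the list of blockers alongside the verdict keeps the reasoning visible, which matters more in a review than the yes/no answer itself.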
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Multi-cloud to avoid lock-in | Locked into lowest common denominator | Use cloud-native services, manage portability risk |
| No abstraction layer | Cloud-specific code everywhere | Kubernetes + Terraform + cloud-neutral data layer |
| Ignoring egress costs | Cross-cloud transfer costs explode | Minimize cross-cloud data movement |
| Same architecture on all clouds | Suboptimal use of each platform | Optimize per cloud, abstract at orchestration layer |
| No DR testing | Failover does not work when needed | Monthly failover drills |
Multi-cloud is a strategy, not a goal. The goal is resilience, cost optimization, or regulatory compliance. If a single cloud achieves those goals with less complexity, that is the better choice.