Cloud Networking: VPC Design & Connectivity
Design enterprise cloud networking. Covers VPC architecture, subnet strategies, transit gateways, hybrid connectivity, DNS, security groups, network segmentation, and multi-account networking.
Cloud networking mistakes are among the most expensive to fix because they force you to re-architect infrastructure that everything else depends on. A VPC CIDR block that overlaps with your on-premises network can’t be routed to it without NAT or re-addressing. A flat network without segmentation gives a compromised instance lateral access to everything. This guide covers the networking patterns you need to get right the first time.
VPC Architecture
Multi-Account Network Topology
┌─────────────────────────────────────────────────────┐
│                   Transit Gateway                   │
│ (Central hub for all VPC-to-VPC and hybrid traffic) │
└─────────────────┬───────────────────────────────────┘
                  │
    ┌─────────────┼─────────────┬──────────────┐
    │             │             │              │
┌───┴────┐    ┌───┴───┐     ┌───┴───┐   ┌──────┴──────┐
│Shared  │    │ Prod  │     │ Dev   │   │ On-Premises │
│Services│    │ VPC   │     │ VPC   │   │  (VPN/DX)   │
│ VPC    │    │       │     │       │   │             │
├────────┤    ├───────┤     ├───────┤   └─────────────┘
│DNS     │    │App    │     │App    │
│Logging │    │tier   │     │tier   │
│CI/CD   │    │Data   │     │Data   │
│VPN     │    │tier   │     │tier   │
└────────┘    └───────┘     └───────┘
CIDR Planning
# Enterprise CIDR allocation plan
network_plan:
  supernet: "10.0.0.0/8"
  regions:
    us-east-1:
      range: "10.0.0.0/12"  # ~1M addresses (1,048,576)
      accounts:
        shared-services: "10.0.0.0/16"
        production: "10.1.0.0/16"
        staging: "10.2.0.0/16"
        development: "10.3.0.0/16"
    eu-west-1:
      range: "10.16.0.0/12"
      accounts:
        shared-services: "10.16.0.0/16"
        production: "10.17.0.0/16"
  on_premises:
    range: "172.16.0.0/12"  # Non-overlapping with cloud
  reserved:
    future_expansion: "10.32.0.0/11"
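A plan like this is easy to break when someone adds a new range by hand, so it helps to check it mechanically. The following is a minimal sketch using Python's standard `ipaddress` module. The CIDR values are copied from the plan above; the flat dictionary layout and the function names `find_overlaps` and `outside_region` are illustrative assumptions, not part of any tool.

```python
import ipaddress
from itertools import combinations

# CIDR values copied from the plan above; the "region/account" key format
# is an assumption made for this sketch.
ALLOCATIONS = {
    "us-east-1/shared-services": "10.0.0.0/16",
    "us-east-1/production": "10.1.0.0/16",
    "us-east-1/staging": "10.2.0.0/16",
    "us-east-1/development": "10.3.0.0/16",
    "eu-west-1/shared-services": "10.16.0.0/16",
    "eu-west-1/production": "10.17.0.0/16",
    "on-premises": "172.16.0.0/12",
    "reserved/future-expansion": "10.32.0.0/11",
}
REGION_RANGES = {"us-east-1": "10.0.0.0/12", "eu-west-1": "10.16.0.0/12"}


def find_overlaps(allocations):
    """Return pairs of allocation names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()}
    return sorted(
        (a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])
    )


def outside_region(allocations, region_ranges):
    """Return names of account blocks that fall outside their region's range."""
    stray = []
    for name, cidr in allocations.items():
        region = name.split("/")[0]
        if region in region_ranges:
            parent = ipaddress.ip_network(region_ranges[region])
            if not ipaddress.ip_network(cidr).subnet_of(parent):
                stray.append(name)
    return stray


if __name__ == "__main__":
    print("overlaps:", find_overlaps(ALLOCATIONS))
    print("outside region range:", outside_region(ALLOCATIONS, REGION_RANGES))
```

Running a check like this in CI before any VPC is provisioned is one way to enforce central CIDR planning; both lists should be empty for a valid plan.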
Subnet Strategy
| Subnet Type | Purpose | CIDR Size | Route Table |
|---|---|---|---|
| Public | Load balancers, NAT gateways, bastions | /24 | IGW route |
| Private App | Application servers, containers | /20 | NAT GW route |
| Private Data | Databases, caches, queues | /22 | No internet route |
| Isolated | Sensitive workloads (PCI, HIPAA) | /24 | Local VPC routes only |
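What separates these subnet types in practice is where the default route (`0.0.0.0/0`) points. The sketch below classifies a route table on that basis. It assumes routes are given as a simple mapping of destination CIDR to target ID, and it relies on the AWS ID prefixes `igw-` and `nat-`; the function name and the returned labels are hypothetical.

```python
from typing import Mapping

DEFAULT_ROUTE = "0.0.0.0/0"


def classify_route_table(routes: Mapping[str, str]) -> str:
    """Classify a subnet by the target of its default route.

    `routes` maps destination CIDR to target ID,
    e.g. {"10.1.0.0/16": "local", "0.0.0.0/0": "igw-0abc"}.
    """
    target = routes.get(DEFAULT_ROUTE)
    if target is None:
        return "no-internet"  # Private Data / Isolated: no default route
    if target.startswith("igw-"):
        return "public"       # Default route to an internet gateway
    if target.startswith("nat-"):
        return "private-nat"  # Outbound-only access through a NAT gateway
    return "other"            # e.g. transit gateway or appliance; inspect manually


if __name__ == "__main__":
    print(classify_route_table({"10.1.0.0/16": "local", "0.0.0.0/0": "nat-0abc"}))
```

A check like this can flag a data-tier subnet that has accidentally been associated with a route table containing an internet route.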
# VPC subnet layout (per AZ)
# Usable counts assume AWS, which reserves 5 addresses in every subnet
vpc:
  cidr: "10.1.0.0/16"
  availability_zones:
    az-a:
      public: "10.1.0.0/24"         # 251 usable IPs
      private_app: "10.1.16.0/20"   # 4091 usable IPs
      private_data: "10.1.32.0/22"  # 1019 usable IPs
      isolated: "10.1.36.0/24"      # 251 usable IPs
    az-b:
      public: "10.1.1.0/24"
      private_app: "10.1.48.0/20"
      private_data: "10.1.64.0/22"
      isolated: "10.1.68.0/24"
    az-c:
      public: "10.1.2.0/24"
      private_app: "10.1.80.0/20"
      private_data: "10.1.96.0/22"
      isolated: "10.1.100.0/24"
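The layout above can be verified the same way as the account-level plan: every subnet must sit inside the VPC CIDR and no two subnets may overlap. The sketch below also computes usable addresses per subnet, assuming AWS, which reserves five addresses in each subnet. The CIDRs are copied from the layout; the helper names are illustrative.

```python
import ipaddress
from itertools import combinations

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5

VPC_CIDR = "10.1.0.0/16"
LAYOUT = {
    "az-a": {"public": "10.1.0.0/24", "private_app": "10.1.16.0/20",
             "private_data": "10.1.32.0/22", "isolated": "10.1.36.0/24"},
    "az-b": {"public": "10.1.1.0/24", "private_app": "10.1.48.0/20",
             "private_data": "10.1.64.0/22", "isolated": "10.1.68.0/24"},
    "az-c": {"public": "10.1.2.0/24", "private_app": "10.1.80.0/20",
             "private_data": "10.1.96.0/22", "isolated": "10.1.100.0/24"},
}


def usable_ips(cidr: str) -> int:
    """Addresses available to workloads in an AWS subnet of this size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def layout_problems(vpc_cidr: str, layout: dict) -> list:
    """Return human-readable problems: subnets outside the VPC or overlapping."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {
        f"{az}/{tier}": ipaddress.ip_network(cidr)
        for az, tiers in layout.items()
        for tier, cidr in tiers.items()
    }
    problems = [
        f"{name} is outside {vpc}"
        for name, net in subnets.items()
        if not net.subnet_of(vpc)
    ]
    problems += [
        f"{a} overlaps {b}"
        for a, b in combinations(subnets, 2)
        if subnets[a].overlaps(subnets[b])
    ]
    return problems


if __name__ == "__main__":
    for tier, cidr in LAYOUT["az-a"].items():
        print(f"{tier:13} {cidr:15} {usable_ips(cidr)} usable")
    print("problems:", layout_problems(VPC_CIDR, LAYOUT))
```

Note that `ipaddress.ip_network` also rejects misaligned blocks (for example `10.1.40.0/20`) with a `ValueError`, which catches a common hand-editing mistake.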
Network Security
Security Group Design
# Layered security groups
# Security groups are stateful: return traffic for an allowed connection is permitted automatically
security_groups:
  web_tier:
    ingress:
      - port: 443
        source: "0.0.0.0/0"  # Public HTTPS
      - port: 80
        source: "0.0.0.0/0"  # Public HTTP (redirect to HTTPS)
    egress:
      - port: 8080
        destination: sg-app-tier
      - port: 443
        destination: "0.0.0.0/0"  # External API calls
  app_tier:
    ingress:
      - port: 8080
        source: sg-web-tier  # Only from web tier
      - port: 8080
        source: sg-internal-lb  # Or from internal LB
    egress:
      - port: 5432
        destination: sg-data-tier
      - port: 6379
        destination: sg-data-tier
      - port: 443
        destination: "0.0.0.0/0"  # External services
  data_tier:
    ingress:
      - port: 5432
        source: sg-app-tier  # Only from app tier
      - port: 6379
        source: sg-app-tier
    egress: []  # No outbound rules, so no outbound internet access
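To reason about which tier-to-tier paths this design actually permits, you can model the rules as data and test them. The sketch below is a deliberately simplified model, not an AWS API: a connection is considered allowed only when the source group has a matching egress rule and the destination group has a matching ingress rule. The tier keys, the `(port, peer)` tuple format, and the `can_connect` function are assumptions made for illustration.

```python
ANY = "0.0.0.0/0"

# Rules transcribed from the layered design above as (port, peer) tuples.
# Peers are either another group's key or ANY (simplified for this sketch).
SECURITY_GROUPS = {
    "web_tier": {
        "ingress": [(443, ANY), (80, ANY)],
        "egress": [(8080, "app_tier"), (443, ANY)],
    },
    "app_tier": {
        "ingress": [(8080, "web_tier"), (8080, "internal_lb")],
        "egress": [(5432, "data_tier"), (6379, "data_tier"), (443, ANY)],
    },
    "data_tier": {
        "ingress": [(5432, "app_tier"), (6379, "app_tier")],
        "egress": [],
    },
}


def _matches(rules, port, peer):
    """True if any rule allows this port for this peer (or for anyone)."""
    return any(p == port and target in (peer, ANY) for p, target in rules)


def can_connect(groups, src, dst, port):
    """A new connection needs egress at the source AND ingress at the destination."""
    return (
        _matches(groups[src]["egress"], port, dst)
        and _matches(groups[dst]["ingress"], port, src)
    )


if __name__ == "__main__":
    print(can_connect(SECURITY_GROUPS, "web_tier", "app_tier", 8080))   # allowed
    print(can_connect(SECURITY_GROUPS, "web_tier", "data_tier", 5432))  # blocked
```

Encoding expectations such as "the web tier must never reach the database port" as assertions gives you a regression test for future rule changes.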
Hybrid Connectivity
| Option | Bandwidth | Latency | Cost | Setup Time |
|---|---|---|---|---|
| Site-to-Site VPN | Up to 1.25 Gbps per tunnel | 10-50 ms | Low ($0.05/hr) | Hours |
| AWS Direct Connect | 1-100 Gbps | 1-5 ms | High (port + data) | 2-4 weeks |
| Azure ExpressRoute | 1-100 Gbps | 1-5 ms | High (port + data) | 2-4 weeks |
| GCP Interconnect | 10-200 Gbps | 1-5 ms | High (port + data) | 2-4 weeks |
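Bandwidth figures are easier to compare when translated into transfer time for a concrete workload. The sketch below does that arithmetic. The `utilization` factor (how much of the nominal link rate you actually sustain) is an assumption you should replace with a measured value, and the function name is illustrative.

```python
def transfer_hours(size_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate hours needed to move `size_tb` terabytes over a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s).
    `utilization` is an assumed fraction of nominal bandwidth actually achieved.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_bits = size_tb * 1e12 * 8
    effective_bps = link_gbps * 1e9 * utilization
    return total_bits / effective_bps / 3600


if __name__ == "__main__":
    for label, gbps in [("VPN tunnel", 1.25), ("10 Gbps dedicated link", 10)]:
        print(f"{label}: {transfer_hours(10, gbps):.1f} h for 10 TB")
```

Under these assumptions, moving 10 TB takes roughly 22 hours over a single 1.25 Gbps VPN tunnel and under 3 hours over a 10 Gbps dedicated link, which is often the deciding factor for migration or replication workloads.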
DNS Architecture
┌──────────────────────────────────┐
│          Corporate DNS           │
│         corp.example.com         │
│    (on-prem Active Directory)    │
└───────────────┬──────────────────┘
                │ Conditional forwarding
                ↓
┌──────────────────────────────────┐
│            Cloud DNS             │
│         aws.example.com          │
│ (Route 53 Private Hosted Zones)  │
│                                  │
│   ├── prod.aws.example.com       │
│   ├── dev.aws.example.com        │
│   └── shared.aws.example.com     │
└──────────────────────────────────┘
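Conditional forwarding boils down to a suffix match on the query name: the resolver picks the most specific configured zone and forwards the query to that zone's target. The sketch below illustrates that decision only. The zone names come from the diagram; the target labels (`on-prem-dns`, `cloud-inbound-endpoint`, `public-recursive`) and the function name are hypothetical placeholders, not real endpoints.

```python
# Zone -> forwarding target. Zones are from the diagram above;
# the target labels are hypothetical placeholders.
FORWARD_RULES = {
    "corp.example.com": "on-prem-dns",
    "aws.example.com": "cloud-inbound-endpoint",
}


def pick_resolver(query_name: str, rules: dict, default: str = "public-recursive") -> str:
    """Return the forwarding target for the longest zone that matches the query."""
    name = query_name.rstrip(".").lower()
    best = None
    for zone in rules:
        # Match the zone itself or any name beneath it, on a label boundary.
        if name == zone or name.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone
    return rules[best] if best is not None else default


if __name__ == "__main__":
    for q in ["db.prod.aws.example.com", "dc01.corp.example.com", "example.org"]:
        print(q, "->", pick_resolver(q, FORWARD_RULES))
```

Matching on a label boundary matters: a name like `notaws.example.com` must not be forwarded to the cloud resolver just because it ends with the same characters.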
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Overlapping CIDRs | Can’t connect VPCs or on-prem networks | Plan CIDR allocation centrally before provisioning |
| Flat network | No segmentation, lateral movement risk | Tier subnets: public, app, data, isolated |
| One big VPC | Everything in one blast radius | Multi-account, multi-VPC with transit gateway |
| Public subnets for everything | Unnecessary attack surface | Only load balancers, NAT gateways, and bastions in public subnets |
| No VPC endpoints | Traffic to S3/DynamoDB goes through NAT ($$) | Gateway endpoints for S3/DynamoDB, interface endpoints for other AWS services |
| IP address exhaustion | Undersized subnets (e.g., /24) run out of IPs as workloads scale | Size subnets appropriately (/20 for app tier) |
Checklist
- CIDR planning: non-overlapping ranges across all environments
- Subnet strategy: public, private app, private data, isolated tiers
- Security groups: deny-by-default, explicit allow per tier
- Transit gateway: central hub for VPC-to-VPC routing
- VPC endpoints: S3, DynamoDB, SSM, ECR (avoid NAT costs)
- NAT gateway: HA (one per AZ) for private subnet internet access
- DNS: private hosted zones, conditional forwarding for hybrid
- Hybrid connectivity: VPN or Direct Connect to on-prem
- Network monitoring: VPC Flow Logs, Traffic Mirroring
- DDoS protection: Shield Advanced on public endpoints
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For cloud networking consulting, visit garnetgrid.com.
:::