Infrastructure Testing: Validating Before You Deploy
Test infrastructure changes before they reach production. This guide covers Terraform plan analysis, policy-as-code with OPA, integration testing for IaC, chaos engineering for infrastructure, and building confidence in infrastructure changes.
Infrastructure code deserves the same testing rigor as application code. A misconfigured security group, an incorrect IAM policy, or a missing resource tag can cause outages, security breaches, or compliance violations. Testing infrastructure changes before deployment catches these issues when the cost of fixing them is minutes, not hours.
The Testing Pyramid for Infrastructure
          ╱╲
         ╱  ╲          Chaos / Integration Tests
        ╱    ╲         (real cloud resources, slow, expensive)
       ╱──────╲
      ╱        ╲       Policy Tests
     ╱          ╲      (OPA/Sentinel, fast, comprehensive)
    ╱────────────╲
   ╱              ╲    Static Analysis / Linting
  ╱________________╲   (tflint, checkov, fastest)
Layer 1: Static Analysis
Catch syntax errors, deprecated resources, and security misconfigurations without deploying anything:
# Terraform linting
tflint --recursive
# Security scanning
checkov -d .
trivy config .
# Format validation
terraform fmt -check -recursive
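To use these checks as a single gate, one option is a small wrapper that runs every command and reports which ones failed. The following is a minimal sketch, assuming the tools are installed and on PATH; `CHECKS`, `run_command`, and `run_checks` are illustrative names, not part of any tool. Note that `trivy` exits with status 0 by default even when it reports findings, so the sketch passes `--exit-code 1` to make findings fail the gate.

```python
import subprocess

# The commands from above, expressed as argument lists.
CHECKS = [
    ["tflint", "--recursive"],
    ["checkov", "-d", "."],
    # Without --exit-code, trivy returns 0 even when it finds problems.
    ["trivy", "config", "--exit-code", "1", "."],
    ["terraform", "fmt", "-check", "-recursive"],
]


def run_command(cmd):
    """Run one command and return its exit code (127 if it is not installed)."""
    try:
        return subprocess.run(cmd).returncode
    except FileNotFoundError:
        return 127


def run_checks(checks, runner=run_command):
    """Run every check without short-circuiting; return the ones that failed."""
    failed = []
    for cmd in checks:
        if runner(cmd) != 0:
            failed.append(" ".join(cmd))
    return failed
```

In a pipeline, the result can drive the job status, for example `sys.exit(1 if run_checks(CHECKS) else 0)`. Running all checks before failing, rather than stopping at the first error, gives the author the full list of problems in one pass.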
Layer 2: Policy-as-Code
Enforce organizational rules before apply:
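package main

# `import rego.v1` keeps this syntax valid on OPA 0.59+ and OPA 1.x
import rego.v1

# OPA policy: all S3 buckets must have encryption
# (assumes encryption is declared inline on the bucket; adjust if your
# provider version uses a separate encryption resource)
deny contains msg if {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.after != null  # skip buckets that are being destroyed
    not resource.change.after.server_side_encryption_configuration
    msg := sprintf("S3 bucket '%s' must have encryption enabled", [resource.name])
}

# All resources must have required tags
deny contains msg if {
    resource := input.resource_changes[_]
    resource.change.after != null  # skip resources that are being destroyed
    required_tags := {"Environment", "Team", "CostCenter"}
    provided_tags := {tag | resource.change.after.tags[tag]}
    missing := required_tags - provided_tags
    count(missing) > 0
    msg := sprintf("Resource '%s' missing tags: %v", [resource.name, missing])
}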
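Both rules evaluate the JSON form of a plan, where each entry in `resource_changes` carries its planned values under `change.after`. To make that input shape concrete, here is a rough Python equivalent of the tag rule run against a hand-written sample. The sample resources and the `missing_tags` helper are made up for illustration, and a real plan contains many more fields.

```python
REQUIRED_TAGS = {"Environment", "Team", "CostCenter"}

# Hand-written fragment in the shape of `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket.logs",
            "type": "aws_s3_bucket",
            "name": "logs",
            "change": {
                "actions": ["create"],
                "after": {"tags": {"Environment": "test", "Team": "platform"}},
            },
        },
        {
            "address": "aws_s3_bucket.old",
            "type": "aws_s3_bucket",
            "name": "old",
            # Destroyed resources have no planned values, so `after` is null.
            "change": {"actions": ["delete"], "after": None},
        },
    ]
}


def missing_tags(plan, required=REQUIRED_TAGS):
    """Map each resource address to the sorted required tags it lacks."""
    violations = {}
    for rc in plan.get("resource_changes", []):
        after = rc["change"].get("after")
        if after is None:  # resource is being destroyed; nothing to tag
            continue
        provided = set(after.get("tags") or {})
        missing = required - provided
        if missing:
            violations[rc["address"]] = sorted(missing)
    return violations


print(missing_tags(sample_plan))  # {'aws_s3_bucket.logs': ['CostCenter']}
```

The second sample entry shows why a policy should look at `change.after` before testing its fields: for a resource that is being destroyed the value is null, and a naive "field is missing" check would flag the deletion itself.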
Layer 3: Integration Tests
Deploy to ephemeral environments and validate:
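// Terratest example
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/vpc",
		Vars: map[string]interface{}{
			"environment": "test",
			"cidr_block":  "10.99.0.0/16",
		},
	}

	// Register cleanup first so resources are destroyed even if an assertion fails.
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcID)

	// The module is expected to create three subnets.
	subnets := terraform.OutputList(t, terraformOptions, "subnet_ids")
	assert.Len(t, subnets, 3)
}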
Terraform Plan Analysis
Automated Plan Review
import json
import sys


def analyze_plan(plan_file):
    with open(plan_file) as f:
        plan = json.load(f)

    # A plan with no changes may omit the key entirely
    changes = plan.get('resource_changes', [])

    creates = [c for c in changes if c['change']['actions'] == ['create']]
    updates = [c for c in changes if 'update' in c['change']['actions']]
    # Replacements contain both 'delete' and 'create'; count them separately
    replaces = [c for c in changes
                if {'delete', 'create'} <= set(c['change']['actions'])]
    deletes = [c for c in changes if c['change']['actions'] == ['delete']]

    print(f"Plan: {len(creates)} to create, {len(updates)} to update")

    # Alert on destructive changes
    if deletes:
        print(f"WARNING: {len(deletes)} resources will be DESTROYED")
        for d in deletes:
            print(f"  - {d['address']}")

    # Alert on replacement (destroy + create)
    if replaces:
        print(f"CRITICAL: {len(replaces)} resources will be REPLACED")
        for r in replaces:
            print(f"  - {r['address']}")


if __name__ == "__main__":
    analyze_plan(sys.argv[1])
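Terraform encodes each planned operation as a list in `change.actions`: a single entry such as `["create"]` or `["delete"]`, or a two-entry list for a replacement (`["delete", "create"]`, or `["create", "delete"]` when the new resource is created first). One way to turn the review into a CI gate is to classify each list exactly once and return a non-zero status when anything destructive appears. The sketch below uses illustrative names (`classify`, `gate`) and sample data, and blocking on every delete or replace is an assumed policy rather than a rule that fits every team.

```python
def classify(actions):
    """Collapse a Terraform `change.actions` list into one label."""
    kinds = set(actions)
    if {"create", "delete"} <= kinds:
        return "replace"
    if "delete" in kinds:
        return "delete"
    if "update" in kinds:
        return "update"
    if "create" in kinds:
        return "create"
    return "other"  # "no-op" or "read"


def gate(resource_changes, blocking=("delete", "replace")):
    """Return (exit_status, summary); status is 1 if a blocking change exists."""
    summary = {}
    for rc in resource_changes:
        label = classify(rc["change"]["actions"])
        summary.setdefault(label, []).append(rc["address"])
    status = 1 if any(summary.get(label) for label in blocking) else 0
    return status, summary


# Hand-written sample entries, trimmed to the fields the gate reads.
changes = [
    {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
]
status, summary = gate(changes)
print(status, summary)
```

Because each resource gets exactly one label, a replacement is never also counted as a plain delete, and the returned status can be passed straight to `sys.exit` so the pipeline stops until someone approves the destructive change.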
CI Pipeline Integration
jobs:
  plan:
    steps:
      - run: terraform init
      - run: terraform plan -out=tfplan
      - run: terraform show -json tfplan > plan.json
      - run: python analyze_plan.py plan.json
      - run: conftest test plan.json --policy policies/
      - run: checkov -f plan.json
Drift Detection
Infrastructure drift occurs when real-world resources differ from the Terraform state:
# Detect drift
terraform plan -detailed-exitcode
# Exit code 0: Succeeded, no changes
# Exit code 1: Error
# Exit code 2: Succeeded, changes present (drift, if the config is already applied)
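A scheduled job has to treat these three codes differently: 0 should stay quiet, 2 should raise an alert, and 1 means the check itself is broken. A minimal sketch of that branching, assuming `terraform` is on PATH and the working directory is already initialised; `interpret` and `check_drift` are hypothetical helper names.

```python
import subprocess


def interpret(exit_code):
    """Translate a `terraform plan -detailed-exitcode` status into a label."""
    if exit_code == 0:
        return "in-sync"
    if exit_code == 2:
        return "changes-pending"
    return "error"  # 1, or anything unexpected


def check_drift(workdir="."):
    """Run the plan in `workdir` and return (label, plan_output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret(result.returncode), result.stdout
```

Treating "error" separately from "changes-pending" matters in practice: an expired credential or a locked state file should page whoever owns the pipeline, not be reported as drift.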
Automated Drift Monitoring
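# Run daily via cron
- name: Drift Detection
  schedule: "0 6 * * *"
  steps:
    - run: |
        # Capture the exit code instead of letting a non-zero status fail the step
        set +e
        terraform plan -detailed-exitcode -no-color > drift-report.txt
        exit_code=$?
        set -e
        if [ "$exit_code" -eq 2 ]; then
          # notify_slack is a placeholder for your own alerting command
          notify_slack "Infrastructure drift detected" drift-report.txt
        elif [ "$exit_code" -ne 0 ]; then
          exit "$exit_code"
        fi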
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No plan review | Unexpected destructive changes | Automated plan analysis in CI |
| Manual infrastructure changes | State drift, “who changed this?” | All changes through IaC + CI |
| No policy enforcement | Security/compliance violations | OPA/Sentinel policies in pipeline |
| Testing in production only | Outages from untested changes | Ephemeral test environments |
| No drift detection | Reality diverges from code | Daily drift scans with alerting |
Infrastructure testing is not optional. Every terraform apply without prior testing is a production deployment without tests, and the blast radius of an infrastructure change is typically larger than that of an application change.