Infrastructure Testing: Validating Before You Deploy

Infrastructure code deserves the same testing rigor as application code. A misconfigured security group, an incorrect IAM policy, or a missing resource tag can cause outages, security breaches, or compliance violations. Testing infrastructure changes before deployment catches these issues when the cost of fixing them is minutes, not hours.

The Testing Pyramid for Infrastructure

          ╱╲
         ╱  ╲       Chaos / Integration Tests
        ╱    ╲      (real cloud resources, slow, expensive)
       ╱──────╲
      ╱        ╲    Policy Tests
     ╱          ╲   (OPA/Sentinel, fast, comprehensive)
    ╱────────────╲
   ╱              ╲  Static Analysis / Linting
  ╱________________╲ (tflint, checkov, fastest)

Layer 1: Static Analysis

Catch syntax errors, deprecated resources, and security misconfigurations without deploying anything:

# Terraform linting
tflint --recursive

# Security scanning
checkov -d .
trivy config .

# Format and syntax validation
terraform fmt -check -recursive
terraform validate   # needs a prior `terraform init -backend=false`
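
When these tools run one after another in a shell script, the first failure usually hides the rest. The sketch below is one way to run every check and report all failures together. The command list simply mirrors the commands above, and the function names are illustrative rather than part of any tool.

```python
import subprocess
import sys

# Commands mirror the static-analysis steps in this section; adjust to your toolchain.
CHECKS = [
    ["tflint", "--recursive"],
    ["checkov", "-d", "."],
    ["trivy", "config", "."],
    ["terraform", "fmt", "-check", "-recursive"],
]

def run_checks(checks):
    """Run every check (no early exit) and return the commands that failed."""
    failed = []
    for cmd in checks:
        try:
            ok = subprocess.run(cmd).returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failure instead of being skipped silently
            ok = False
        if not ok:
            failed.append(" ".join(cmd))
    return failed

def main():
    """Return a process exit code: 1 if any check failed, else 0."""
    failures = run_checks(CHECKS)
    for name in failures:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failures else 0
```

In a pre-commit hook or CI step, calling `sys.exit(main())` would surface every failing tool in a single run.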

Layer 2: Policy-as-Code

Enforce organizational rules before apply:

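package main  # conftest evaluates the "main" package by default

# Syntax note: OPA 1.0+ defaults to `deny contains msg if { ... }` in place of `deny[msg] { ... }`.

# OPA policy: All S3 buckets must have encryption.
# Checks the inline block on aws_s3_bucket; AWS provider v4+ moves this setting to
# the separate aws_s3_bucket_server_side_encryption_configuration resource.
deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    resource.change.after != null  # skip buckets that are being deleted
    count(object.get(resource.change.after, "server_side_encryption_configuration", [])) == 0
    msg := sprintf("S3 bucket '%s' must have encryption enabled", [resource.address])
}

# All resources must have required tags.
# Applies to every resource type; add a resource.type filter to exempt types without tags.
deny[msg] {
    resource := input.resource_changes[_]
    resource.change.after != null  # skip resources that are being deleted
    required_tags := {"Environment", "Team", "CostCenter"}
    provided_tags := {tag | resource.change.after.tags[tag]}
    missing := required_tags - provided_tags
    count(missing) > 0
    msg := sprintf("Resource '%s' missing tags: %v", [resource.address, missing])
}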
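
To make the tag rule's set arithmetic concrete, here is the same check sketched in Python against a hand-written plan fragment. The fragment follows the shape of `terraform show -json` output, but the resource names and tag values are invented for illustration. It is a way to reason about expected results, not a replacement for the policy engine.

```python
REQUIRED_TAGS = {"Environment", "Team", "CostCenter"}

def missing_tags(resource_change):
    """Return the required tags absent from a planned resource (set difference)."""
    after = resource_change["change"].get("after") or {}
    provided = set((after.get("tags") or {}).keys())
    return REQUIRED_TAGS - provided

# Hand-written sample in the shape of plan JSON; names and values are made up.
sample_plan = {
    "resource_changes": [
        {"type": "aws_instance", "name": "web",
         "change": {"actions": ["create"],
                    "after": {"tags": {"Environment": "test", "Team": "platform"}}}},
        {"type": "aws_instance", "name": "db",
         "change": {"actions": ["create"],
                    "after": {"tags": {"Environment": "test", "Team": "data",
                                       "CostCenter": "1234"}}}},
    ]
}

violations = {
    rc["name"]: sorted(missing_tags(rc))
    for rc in sample_plan["resource_changes"]
    if missing_tags(rc)
}
print(violations)  # {'web': ['CostCenter']}
```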

Layer 3: Integration Tests

Deploy to ephemeral environments and validate:

// Terratest example
package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "environment": "test",
            "cidr_block":  "10.99.0.0/16",
        },
    }

    // Always tear down the ephemeral environment, even if an assertion fails
    defer terraform.Destroy(t, terraformOptions)
    terraform.InitAndApply(t, terraformOptions)

    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    assert.NotEmpty(t, vpcId)

    subnets := terraform.OutputList(t, terraformOptions, "subnet_ids")
    assert.Len(t, subnets, 3)
}
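
Counting subnets confirms that something was created, not that it was laid out correctly. A stronger post-deploy assertion might check that each subnet falls inside the VPC range and that none overlap. The sketch below assumes the module also exposes its subnet CIDR blocks, for example through a hypothetical `subnet_cidrs` output read with `terraform output -json`. The example above only exposes `subnet_ids`, so that output is an assumption.

```python
import ipaddress

def check_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems: subnets outside the VPC range or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    # Every subnet must sit inside the VPC's address range
    for s in subnets:
        if not s.subnet_of(vpc):
            problems.append(f"{s} is outside {vpc}")
    # No two subnets may share addresses
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems
```

An empty list means the layout passed both checks, so a test can assert `check_subnets(...) == []`.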

Terraform Plan Analysis

Automated Plan Review

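import json
import sys

def analyze_plan(plan_file):
    """Print a summary of a plan exported with `terraform show -json`."""
    with open(plan_file) as f:
        plan = json.load(f)

    # resource_changes may be missing when the plan contains no changes
    changes = plan.get('resource_changes', [])

    def with_actions(*wanted):
        # Compare as sets: a replacement lists both actions, in either order
        return [c for c in changes if set(c['change']['actions']) == set(wanted)]

    creates = with_actions('create')
    updates = with_actions('update')
    deletes = with_actions('delete')
    replaces = with_actions('delete', 'create')

    print(f"{len(creates)} to create, {len(updates)} to update")

    # Alert on destructive changes
    if deletes:
        print(f"WARNING: {len(deletes)} resources will be DESTROYED")
        for d in deletes:
            print(f"  - {d['address']}")

    # Alert on replacement (destroy + create)
    if replaces:
        print(f"CRITICAL: {len(replaces)} resources will be REPLACED")
        for r in replaces:
            print(f"  - {r['address']}")

if __name__ == '__main__':
    # Invoked in CI as: python analyze_plan.py plan.json
    analyze_plan(sys.argv[1])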
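
A script that only prints always exits 0, so on its own it cannot block a pipeline. One possible extension is a gate that returns a non-zero exit code when the plan would destroy or replace a stateful resource. The list of protected types below is a hypothetical example and should be replaced with whatever an organization considers unsafe to delete without review.

```python
import json
import sys

# Hypothetical list of stateful resource types that need review before deletion.
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket", "aws_dynamodb_table"}

def destructive_changes(plan, protected_types=PROTECTED_TYPES):
    """Return addresses of protected resources the plan would destroy or replace."""
    hits = []
    for rc in plan.get("resource_changes", []):
        # Both a plain destroy and a replacement include "delete" in actions
        if "delete" in rc["change"]["actions"] and rc["type"] in protected_types:
            hits.append(rc["address"])
    return hits

def main(plan_file):
    """Return 1 if the plan touches protected resources destructively, else 0."""
    with open(plan_file) as f:
        hits = destructive_changes(json.load(f))
    for address in hits:
        print(f"BLOCKED: {address} would be destroyed", file=sys.stderr)
    return 1 if hits else 0
```

Ending the script with `sys.exit(main(sys.argv[1]))` would make the CI step fail whenever a protected resource is at risk.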

CI Pipeline Integration

jobs:
  plan:
    steps:
      - run: terraform init -input=false
      - run: terraform plan -input=false -out=tfplan
      - run: terraform show -json tfplan > plan.json
      - run: python analyze_plan.py plan.json
      - run: conftest test plan.json --policy policies/
      - run: checkov -f plan.json

Drift Detection

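Infrastructure drift occurs when real-world resources no longer match what the Terraform configuration and state describe, usually because someone changed them outside Terraform:

# Detect drift
terraform plan -detailed-exitcode
# Exit code 0: No changes
# Exit code 1: Error
# Exit code 2: Changes detected (drift, or configuration changes not yet applied)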
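
Because exit code 2 is non-zero, a script that treats every non-zero status as a failure cannot tell drift apart from a real error. The following sketch wraps the command and maps the exit code to a status. The function names and status strings are illustrative choices, not part of Terraform.

```python
import subprocess

def interpret(exit_code):
    """Translate a `terraform plan -detailed-exitcode` result into a status string."""
    if exit_code == 0:
        return "in-sync"
    if exit_code == 2:
        return "drift"
    return "error"  # exit code 1, or anything unexpected

def check_drift(workdir="."):
    """Run the plan in workdir and return (status, combined plan output)."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"],
        cwd=workdir, capture_output=True, text=True,
    )
    return interpret(proc.returncode), proc.stdout + proc.stderr
```

A scheduler can then alert only on `"drift"` and page on `"error"`, instead of treating both as the same failure.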

Automated Drift Monitoring

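# Run daily via cron (GitHub Actions syntax; assumes terraform is installed on the runner)
name: Drift Detection
on:
  schedule:
    - cron: "0 6 * * *"
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform init -input=false
      - id: plan
        run: |
          set +e  # keep going so the exit code can be recorded
          terraform plan -detailed-exitcode -no-color > drift-report.txt
          echo "exit_code=$?" >> "$GITHUB_OUTPUT"
      - if: steps.plan.outputs.exit_code == '2'
        run: notify_slack "Infrastructure drift detected" drift-report.txt  # placeholder for your alerting command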

Anti-Patterns

Anti-Pattern	Consequence	Fix
No plan review	Unexpected destructive changes	Automated plan analysis in CI
Manual infrastructure changes	State drift, “who changed this?”	All changes through IaC + CI
No policy enforcement	Security/compliance violations	OPA/Sentinel policies in pipeline
Testing in production only	Outages from untested changes	Ephemeral test environments
No drift detection	Reality diverges from code	Daily drift scans with alerting

Infrastructure testing is not optional. Every terraform apply without prior testing is an untested production deployment, and the blast radius of an infrastructure change is typically larger than that of an application change.