
Infrastructure as Code Testing Strategies

How to test infrastructure as code before deployment. Covers unit testing Terraform, policy-as-code with OPA, integration testing, drift detection, and CI/CD for infrastructure.

Infrastructure as Code (IaC) changed how we provision infrastructure. But most teams still deploy IaC changes without testing — they run terraform plan, eyeball the diff, and hope for the best. This is the equivalent of deploying application code without running tests. At scale, untested infrastructure changes cause outages that are harder to diagnose and longer to resolve than application bugs.

Testing IaC requires different strategies than testing application code. You can’t unit test a VPC the same way you unit test a function. But you can validate configurations, simulate plans, enforce policies, and integration test in ephemeral environments before anything touches production.


The IaC Testing Pyramid

| Layer | What It Tests | Speed | Cost |
|---|---|---|---|
| Static Analysis | Syntax, formatting, security rules | Seconds | Free |
| Unit Tests | Module logic, variable validation | Seconds | Free |
| Policy Tests | Compliance rules, guardrails | Seconds | Free |
| Plan Tests | Expected resource changes | Minutes | Free |
| Integration Tests | Actual infrastructure behavior | 10-30 min | Cloud costs |
| E2E Tests | Full stack deployment | 30-60 min | Cloud costs |

Layer 1: Static Analysis

Run these on every commit. They catch 40% of issues before any cloud API is called.

# Terraform
terraform fmt -check -recursive
terraform validate
tflint --recursive

# Security scanning
tfsec .
checkov -d .

These tools catch:

  • Insecure defaults (S3 bucket without encryption, security group open to 0.0.0.0/0)
  • Syntax errors and deprecated features
  • Missing required tags
  • Resource naming convention violations
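One way to wire these checks into a single CI step is a small runner that executes every command and reports which ones failed. The following is a minimal sketch in Python, not a prescribed setup: `run_checks` and `STATIC_CHECKS` are illustrative names, and the command list simply mirrors the commands shown above.

```python
import subprocess

# Commands from this section; trim the list to the tools your team has installed.
STATIC_CHECKS = [
    ["terraform", "fmt", "-check", "-recursive"],
    ["terraform", "validate"],
    ["tflint", "--recursive"],
    ["tfsec", "."],
    ["checkov", "-d", "."],
]


def run_checks(commands, cwd="."):
    """Run every command and return the ones that failed.

    All checks run even if an early one fails, so a single CI run
    surfaces every problem at once.
    """
    failed = []
    for cmd in commands:
        try:
            ok = subprocess.run(cmd, cwd=cwd).returncode == 0
        except FileNotFoundError:
            # Treat a missing tool as a failure rather than silently skipping it.
            ok = False
        if not ok:
            failed.append(cmd)
    return failed
```

A CI job could call `run_checks(STATIC_CHECKS)` and exit non-zero when the returned list is not empty.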

Layer 2: Policy-as-Code with OPA

Open Policy Agent (OPA) lets you write compliance rules that block non-compliant infrastructure before it’s created.

# policy/security.rego
# The package name must match the query path used with opa eval (data.terraform.deny)
package terraform

# Enables the "contains" / "if" rule syntax (OPA 0.59+; the default in OPA 1.0)
import rego.v1

# Deny public S3 buckets
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_s3_bucket"
    resource.values.acl == "public-read"
    msg := sprintf("S3 bucket '%s' cannot be public", [resource.address])
}

# Require encryption on all RDS instances
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_db_instance"
    not resource.values.storage_encrypted
    msg := sprintf("RDS instance '%s' must have encryption enabled", [resource.address])
}

# Enforce instance size limits
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_instance"
    allowed := {"t3.micro", "t3.small", "t3.medium", "t3.large", "t3.xlarge"}
    not allowed[resource.values.instance_type]
    msg := sprintf("Instance '%s' uses disallowed type '%s'",
        [resource.address, resource.values.instance_type])
}

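Run OPA against the JSON form of the plan. terraform plan -out writes a binary plan file, so convert it with terraform show -json first. The --fail-defined flag makes opa exit non-zero when any deny message is produced, which lets the pipeline block on violations:

terraform plan -out=tfplan
terraform show -json tfplan > plan.json
opa eval --data policy/ --input plan.json --fail-defined "data.terraform.deny[msg]"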
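The Rego rules above iterate only over `planned_values.root_module.resources`, so resources declared inside child modules are not visited. In Terraform's JSON plan output, those resources sit under nested `child_modules` entries. The sketch below shows the recursive walk in Python, applied to the instance-size rule. The function names are illustrative, and the sample plan is a hand-written fragment rather than real Terraform output.

```python
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small", "t3.medium", "t3.large", "t3.xlarge"}


def iter_planned_resources(module):
    """Yield every resource in a plan module and, recursively, in its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_planned_resources(child)


def disallowed_instances(plan):
    """Return a deny message for each aws_instance outside the allow-list."""
    root = plan.get("planned_values", {}).get("root_module", {})
    messages = []
    for resource in iter_planned_resources(root):
        if resource.get("type") != "aws_instance":
            continue
        instance_type = resource.get("values", {}).get("instance_type")
        if instance_type not in ALLOWED_INSTANCE_TYPES:
            messages.append(
                f"Instance '{resource['address']}' uses disallowed type '{instance_type}'"
            )
    return messages


# Hand-written sample: one compliant root resource, one oversized child-module resource.
sample_plan = {
    "planned_values": {
        "root_module": {
            "resources": [
                {"address": "aws_instance.web", "type": "aws_instance",
                 "values": {"instance_type": "t3.small"}},
            ],
            "child_modules": [
                {"address": "module.batch", "resources": [
                    {"address": "module.batch.aws_instance.worker", "type": "aws_instance",
                     "values": {"instance_type": "m5.4xlarge"}},
                ]},
            ],
        }
    }
}
violations = disallowed_instances(sample_plan)
```

The same recursion can be expressed in Rego; the Python form is used here only to keep the traversal easy to follow and test.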

Layer 3: Integration Testing

For critical infrastructure, spin up real resources in an isolated test account, validate they work, then tear everything down.

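# test_network.py (pytest, following Terratest's apply / verify / destroy pattern)
import json
import subprocess
from uuid import uuid4

import pytest

MODULE_DIR = "modules/network"


class TestNetworkModule:
    @pytest.fixture(autouse=True, scope="class")
    def setup_teardown(self):
        # Unique environment name so parallel runs do not collide
        env_var = f"-var=env=test-{uuid4().hex[:8]}"
        subprocess.run(["terraform", "init", "-input=false"],
                       cwd=MODULE_DIR, check=True)
        try:
            # Apply once for the whole class rather than once per test
            subprocess.run(["terraform", "apply", "-auto-approve", "-input=false", env_var],
                           cwd=MODULE_DIR, check=True)
            yield
        finally:
            # Always destroy, even if apply or a test failed; pass the same variable
            subprocess.run(["terraform", "destroy", "-auto-approve", "-input=false", env_var],
                           cwd=MODULE_DIR, check=True)

    def test_vpc_has_correct_cidr(self):
        output = subprocess.run(
            ["terraform", "output", "-json"],
            cwd=MODULE_DIR, capture_output=True, text=True, check=True
        )
        outputs = json.loads(output.stdout)
        assert outputs["vpc_cidr"]["value"] == "10.0.0.0/16"

    def test_private_subnets_not_publicly_accessible(self):
        # Use AWS SDK to verify subnet routing.
        # Skip instead of passing vacuously until the check is written.
        pytest.skip("subnet routing check not implemented yet")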
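The subnet routing test above is left unimplemented. One possible check is that a private subnet's route table contains no route targeting an internet gateway. The sketch below assumes the route table is a dict in the shape returned by the EC2 `describe_route_tables` call (a `Routes` list whose entries may carry a `GatewayId`). Fetching it with the AWS SDK is left out so the logic can be tested offline, and both function names are hypothetical.

```python
def routes_to_internet_gateway(route_table):
    """Return the routes in a route table whose target is an internet gateway.

    Internet gateway IDs start with "igw-". The implicit local route uses
    GatewayId "local", and NAT gateway routes use NatGatewayId instead.
    """
    return [
        route
        for route in route_table.get("Routes", [])
        if route.get("GatewayId", "").startswith("igw-")
    ]


def is_private(route_table):
    """Treat a route table as private when it has no internet gateway route."""
    return not routes_to_internet_gateway(route_table)


# Hand-written samples with made-up IDs.
private_table = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc"},
]}
public_table = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0def"},
]}
```

Inside the test, the assertion would then be `assert is_private(table)` for each route table associated with a private subnet.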

Cost control: Integration tests run in a dedicated test account with aggressive auto-cleanup. Set a maximum test duration (30 minutes) with automatic terraform destroy on timeout.
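The timeout-plus-cleanup rule can be sketched as a small wrapper around the test command. This is an illustration under assumptions rather than a complete harness: `run_with_cleanup` is a hypothetical helper, and the commands passed to it are whatever your pipeline actually uses.

```python
import subprocess


def run_with_cleanup(test_cmd, cleanup_cmd, timeout_s=30 * 60, cwd="."):
    """Run test_cmd with a time limit and always run cleanup_cmd afterwards.

    Returns True only if the tests finished in time with exit code 0.
    """
    try:
        result = subprocess.run(test_cmd, cwd=cwd, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising; processes that
        # child spawned may need separate handling.
        return False
    finally:
        # Runs on success, failure, and timeout alike.
        subprocess.run(cleanup_cmd, cwd=cwd, check=True)
```

For example, `run_with_cleanup(["pytest", "test_network.py"], ["terraform", "destroy", "-auto-approve"], cwd="modules/network")` acts as a safety net on top of the fixture's own teardown. A scheduled sweep of the test account is still worth having for resources that survive a crashed runner.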


Drift Detection

Infrastructure drift — when reality diverges from your IaC definitions — is the silent killer. Common causes: manual console changes, out-of-band scripts, and cloud provider auto-updates.

Detection Protocol:

# Run daily via CI/CD
terraform plan -detailed-exitcode
# Exit code 0: No changes (in sync)
# Exit code 1: Error
# Exit code 2: Changes detected (drift!)
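A scheduled job can turn those exit codes into a status that an alerting step acts on. The following is a minimal sketch in Python; `classify_plan_exit` and `check_drift` are illustrative names. Exit code 2 signals any non-empty plan, so the job should run against the same revision that was last applied, otherwise unapplied configuration changes will also be reported as drift.

```python
import subprocess

# Meanings of the `terraform plan -detailed-exitcode` exit codes listed above.
EXIT_STATUS = {0: "in_sync", 1: "error", 2: "drift"}


def classify_plan_exit(code):
    """Map a plan exit code to a status; anything unexpected counts as an error."""
    return EXIT_STATUS.get(code, "error")


def check_drift(workdir="."):
    """Run the plan in workdir and return "in_sync", "drift", or "error"."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Treating unknown exit codes as errors keeps a crashed or killed run from being mistaken for an in-sync result.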

When drift is detected:

  1. Alert the infrastructure team immediately
  2. Determine if drift is intentional (emergency fix) or accidental
  3. Either update IaC to match reality or revert the drift
  4. Document the root cause and prevent recurrence

CI/CD Pipeline for Infrastructure

Commit → Lint → Validate → Policy Check → Plan → Review → Apply → Verify

Non-negotiable rules:

  1. plan output must be reviewed by a human before apply (for production)
  2. Policy check failures block the pipeline (no exceptions)
  3. Apply to staging before production (always)
  4. Keep plan and apply in the same pipeline run, and apply the exact saved plan file that was reviewed (prevents plan staleness)
  5. Lock state files during applies (prevent concurrent modifications)
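Human review of a plan goes faster when the pipeline highlights what the plan would do, especially deletions and replacements. The sketch below reads the `resource_changes` section of the JSON plan (the output of terraform show -json), where each entry carries a `change.actions` list. The function names and the sample data are illustrative.

```python
from collections import Counter


def summarize_plan(plan):
    """Count planned actions by kind, ignoring no-op and read entries."""
    counts = Counter()
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue
        # A replacement is reported as ["delete", "create"] or ["create", "delete"].
        kind = "replace" if len(actions) == 2 else actions[0]
        counts[kind] += 1
    return dict(counts)


def destructive_addresses(plan):
    """Return the addresses of resources the plan would delete or replace."""
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if "delete" in change["change"]["actions"]
    ]


# Hand-written sample of the relevant part of a JSON plan.
sample_plan = {"resource_changes": [
    {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    {"address": "aws_subnet.a", "change": {"actions": ["create"]}},
    {"address": "aws_db_instance.db", "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.old", "change": {"actions": ["delete"]}},
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
]}
summary = summarize_plan(sample_plan)
at_risk = destructive_addresses(sample_plan)
```

Posting `summary` and `at_risk` on the pull request gives the reviewer a short list to check first. It supplements the review of the full plan and does not replace it.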

The teams that test their infrastructure as rigorously as their application code deploy with confidence. Everyone else deploys with anxiety — and anxiety scales poorly.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
