Infrastructure as Code: Terraform Patterns That Scale

Terraform starts simple: write some HCL, run terraform apply, infrastructure appears. At 50 resources, it is manageable. At 500 resources, it is a single state file that takes 10 minutes to plan, where one developer’s change blocks everyone else, and nobody is sure which resources are managed by Terraform and which were created manually in the console.

This guide covers how to structure Terraform for organizations where multiple teams need to manage infrastructure without creating a central bottleneck or an unmaintainable monolith.

State Management: The Foundation

State is the core concept of Terraform. Get it wrong and everything else falls apart.

Remote State Configuration

# backend.tf — Always use remote state in production
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "services/checkout-api/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"    # Prevents concurrent applies
    encrypt        = true
  }
}
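
The backend block assumes the bucket and lock table already exist. A minimal sketch of bootstrapping them, usually in a small separate configuration applied once, might look like the following. The file location and resource labels are illustrative assumptions, and an AWS provider for `us-east-1` is assumed to be configured elsewhere. The bucket and table names match the backend block above, and the `LockID` string hash key is what the S3 backend's DynamoDB locking expects.

```hcl
# bootstrap/main.tf (hypothetical location): creates the state bucket and lock table
resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state"
}

# Versioning keeps earlier state revisions recoverable after a bad apply
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # The S3 backend requires this exact attribute name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```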

State Splitting Strategy

Strategy	When to Use	Risk
Monolithic (one state)	< 50 resources, single team	Slow plans, single point of failure
Per-environment	Multiple environments (staging/prod)	Each environment's state is still large
Per-service	Microservices, team ownership	More state files to manage
Per-layer	Separate network, compute, data	Cross-layer dependencies

Recommended: Per-service + Per-environment

terraform/
├── modules/                        # Shared, reusable modules
│   ├── vpc/
│   ├── ecs-service/
│   └── rds/
├── environments/
│   ├── staging/
│   │   ├── networking/             # VPC, subnets, NAT
│   │   │   ├── main.tf
│   │   │   └── backend.tf          # Separate remote state key
│   │   ├── checkout-api/           # Service infrastructure
│   │   │   ├── main.tf
│   │   │   └── backend.tf          # Separate remote state key
│   │   └── shared-database/
│   │       ├── main.tf
│   │       └── backend.tf          # Separate remote state key
│   └── production/
│       ├── networking/
│       ├── checkout-api/
│       └── shared-database/
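
Splitting state means a service sometimes needs values owned by another state, such as subnet IDs from the networking layer. One common way to read them is the `terraform_remote_state` data source. The sketch below assumes the networking configuration stores its state under the hypothetical key `staging/networking/terraform.tfstate` and exports an output named `private_subnet_ids`; neither name comes from the layout above.

```hcl
# environments/staging/checkout-api/main.tf (illustrative excerpt)

# Read-only view of the outputs published by the networking state
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "staging/networking/terraform.tfstate" # hypothetical key
    region = "us-east-1"
  }
}

locals {
  # Assumes networking declares: output "private_subnet_ids" { ... }
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

Only root-level outputs of the other state are visible this way, which keeps the dependency between the two states explicit and narrow.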

Module Design

Good modules are reusable, versioned, and have clear interfaces.

Module Interface Rules

# modules/ecs-service/variables.tf

variable "service_name" {
  description = "Name of the ECS service"
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]+$", var.service_name))
    error_message = "Service name must start with a lowercase letter and contain only lowercase letters, digits, and hyphens."
  }
}

variable "container_image" {
  description = "Docker image URI (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3)"
  type        = string
}

variable "cpu" {
  description = "CPU units (1 vCPU = 1024 units)"
  type        = number
  default     = 256
}

variable "memory" {
  description = "Memory in MB"
  type        = number
  default     = 512
}

variable "environment" {
  description = "Environment name (staging, production)"
  type        = string

  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "Environment must be 'staging' or 'production'."
  }
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}
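
Inputs are only half of a module's interface; outputs are the other half. A minimal sketch of a matching `outputs.tf` is shown below. It assumes the module's `main.tf` declares `aws_ecs_service.this` and `aws_ecs_task_definition.this`, which are illustrative resource labels not taken from the source.

```hcl
# modules/ecs-service/outputs.tf (hypothetical)
# Assumes main.tf declares resource "aws_ecs_service" "this"
# and resource "aws_ecs_task_definition" "this".

output "service_name" {
  description = "Name of the ECS service created by this module"
  value       = aws_ecs_service.this.name
}

output "task_definition_arn" {
  description = "ARN of the task definition revision managed by this module"
  value       = aws_ecs_task_definition.this.arn
}
```

Exposing a small, described set of outputs lets callers depend on the module without reaching into its internal resources.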

Module Versioning

# Pin module versions: never use unversioned source references

# ✅ Good: pinned to a release tag
module "checkout_api" {
  source = "git::https://github.com/company/terraform-modules.git//ecs-service?ref=v2.3.1"

  service_name    = "checkout-api"
  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/checkout:v1.5.0"
  environment     = "production"
}

# ❌ Bad: no ref, so a fresh checkout or `terraform init -upgrade` pulls the
# default branch's latest commit and can break without warning
module "checkout_api_unpinned" {
  source = "git::https://github.com/company/terraform-modules.git//ecs-service"
  # ... same arguments as above
}
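
If modules are published to a module registry rather than referenced by Git tag, the `version` argument accepts semver constraints. This argument only works with registry sources, not `git::` sources. The registry address below is a hypothetical placeholder, shown as a sketch rather than a recommendation for any particular registry.

```hcl
module "checkout_api" {
  source  = "app.terraform.io/company/ecs-service/aws" # hypothetical registry address
  version = "~> 2.3"                                   # accepts >= 2.3.0 and < 3.0.0

  service_name    = "checkout-api"
  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/checkout:v1.5.0"
  environment     = "production"
}
```

A tighter constraint such as `~> 2.3.1` would accept only patch releases within 2.3.x.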

Terraform CI/CD Integration

Never run terraform apply from a laptop in production. All infrastructure changes should flow through CI/CD.

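# GitHub Actions: Terraform CI/CD
# Cloud credentials (for example via OIDC) must be configured before init; omitted here.
name: Terraform
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

permissions:
  contents: read
  pull-requests: write   # Lets the plan job comment on the PR

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false   # Keep stdout clean for redirection

      - name: Terraform Init
        run: terraform init -input=false
        working-directory: terraform/environments/production/checkout-api

      - name: Terraform Plan
        # Save the binary plan, then render it as text for the PR comment
        run: |
          terraform plan -input=false -out=tfplan -no-color
          terraform show -no-color tfplan > tfplan.txt
        working-directory: terraform/environments/production/checkout-api

      - name: Post plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const plan = require('fs').readFileSync(
              'terraform/environments/production/checkout-api/tfplan.txt', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Terraform Init
        run: terraform init -input=false
        working-directory: terraform/environments/production/checkout-api
      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve
        working-directory: terraform/environments/production/checkout-api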

Drift Detection

Infrastructure drift happens when someone changes a resource outside of Terraform — through the cloud console, CLI, or another tool. If you do not detect drift, your Terraform state becomes a lie.

# Run plan regularly to detect drift
# Schedule this in CI (daily or on-commit)
terraform plan -detailed-exitcode

# Exit codes:
#   0 = No changes (configuration, state, and real infrastructure agree)
#   1 = Error
#   2 = Changes pending (drift, or configuration changes not yet applied)

Drift Response	When to Use
Auto-remediate	Non-critical resources, well-tested modules
Alert and investigate	Production resources, security-sensitive configs
Import into state	Resource legitimately created outside Terraform (for changed attributes, update the configuration to match)
Ignore	Never. Drift always gets worse.
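
For the "Import into state" response, Terraform 1.5 and later support declarative `import` blocks, so the adoption of an existing resource goes through the same plan and review flow as any other change. The sketch below assumes a hypothetical S3 bucket named `company-assets-bucket` that was created by hand in the console.

```hcl
# Tells Terraform to adopt the existing bucket instead of creating a new one
import {
  to = aws_s3_bucket.assets
  id = "company-assets-bucket" # hypothetical bucket created outside Terraform
}

# The configuration must describe the resource being adopted
resource "aws_s3_bucket" "assets" {
  bucket = "company-assets-bucket"
}
```

After a successful apply, the `import` block can be removed; the resource stays in state.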

Common Mistakes

Mistake	Consequence	Prevention
Hardcoded values	Cannot reuse across environments	Use variables with validation
No state locking	Concurrent applies corrupt state	DynamoDB lock table (AWS), GCS lock (GCP)
Giant state files	10-minute plans, blast radius of entire infrastructure	Split state per-service or per-layer
No module versioning	Module changes break consumers	Pin versions, use semver
Manual console changes	Drift between state and reality	Run daily drift detection
Secrets in `.tf` files	Credentials in version control	Use `data` sources, SSM Parameter Store, Vault
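
As a sketch of the last row, a secret can be read from SSM Parameter Store at plan time instead of being written into a `.tf` file. The parameter path is a hypothetical example. Note that the value is still written to the state file, which is one reason the backend above sets `encrypt = true`.

```hcl
# Look up the secret at plan/apply time; nothing sensitive lives in version control
data "aws_ssm_parameter" "db_password" {
  name            = "/production/checkout-api/db-password" # hypothetical parameter path
  with_decryption = true
}

locals {
  # The provider marks this value as sensitive, so plans redact it
  db_password = data.aws_ssm_parameter.db_password.value
}
```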

