
Infrastructure as Code: Terraform Patterns That Scale

Write Terraform code that multiple teams can maintain without stepping on each other. Covers module design, state management, workspace strategies, drift detection, CI/CD integration, and the organizational patterns that prevent Terraform from becoming a bottleneck.

Terraform starts simple: write some HCL, run terraform apply, infrastructure appears. At 50 resources, it is manageable. At 500 resources, it is a single state file that takes 10 minutes to plan, where one developer’s change blocks everyone else, and nobody is sure which resources are managed by Terraform and which were created manually in the console.

This guide covers how to structure Terraform for organizations where multiple teams need to manage infrastructure without creating a central bottleneck or an unmaintainable monolith.


State Management: The Foundation

State is the core concept of Terraform. Get it wrong and everything else falls apart.

Remote State Configuration

# backend.tf — Always use remote state in production
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "services/checkout-api/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"    # Prevents concurrent applies
    encrypt        = true
  }
}
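
The backend block assumes the bucket and lock table already exist. Below is a minimal sketch of bootstrapping them from a separate configuration, reusing the names from the backend above. The file location and the resource labels (`state`, `lock`) are hypothetical. The S3 backend expects the lock table to have a string partition key named `LockID`.

```hcl
# bootstrap/main.tf (hypothetical location): creates the state bucket and lock table.
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"
}

# Versioning keeps earlier copies of each state file for recovery.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend writes its lock entries under the "LockID" key.
resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases also support S3-native locking through the backend's `use_lockfile` argument. Check the S3 backend documentation for your version before choosing between the two.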

State Splitting Strategy

| Strategy | When to Use | Risk |
|---|---|---|
| Monolithic (one state) | < 50 resources, single team | Slow plans, single point of failure |
| Per-environment | Multiple environments (staging/prod) | Still large per-environment |
| Per-service | Microservices, team ownership | More state files to manage |
| Per-layer | Separate network, compute, data | Cross-layer dependencies |

Recommended: Per-service + Per-environment

terraform/
├── modules/                        # Shared, reusable modules
│   ├── vpc/
│   ├── ecs-service/
│   └── rds/
└── environments/
    ├── staging/
    │   ├── networking/             # VPC, subnets, NAT
    │   │   ├── main.tf
    │   │   └── backend.tf          # Separate remote state key
    │   ├── checkout-api/           # Service infrastructure
    │   │   ├── main.tf
    │   │   └── backend.tf          # Separate remote state key
    │   └── shared-database/
    │       ├── main.tf
    │       └── backend.tf
    └── production/
        ├── networking/
        ├── checkout-api/
        └── shared-database/
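
Splitting state means a service configuration can no longer reference networking resources directly. One common way to bridge the gap is the `terraform_remote_state` data source, sketched below. The file name, the state key layout, and the `private_subnet_ids` output are assumptions for illustration. The networking configuration would have to declare that output itself.

```hcl
# environments/staging/checkout-api/data.tf (hypothetical file)
data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "staging/networking/terraform.tfstate" # assumed key layout
    region = "us-east-1"
  }
}

locals {
  # Only values exported as root-level outputs of the networking
  # configuration are visible through this data source.
  private_subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

Reading another state file requires read access to that whole state, so some teams publish shared values through a parameter store instead.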

Module Design

Good modules are reusable, versioned, and have clear interfaces.

Module Interface Rules

# modules/ecs-service/variables.tf

variable "service_name" {
  description = "Name of the ECS service"
  type        = string

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]+$", var.service_name))
    error_message = "Service name must start with a lowercase letter and contain only lowercase letters, digits, and hyphens."
  }
}

variable "container_image" {
  description = "Docker image URI (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.2.3)"
  type        = string
}

variable "cpu" {
  description = "CPU units (1 vCPU = 1024 units)"
  type        = number
  default     = 256
}

variable "memory" {
  description = "Memory in MiB"
  type        = number
  default     = 512
}

variable "environment" {
  description = "Environment name (staging, production)"
  type        = string

  validation {
    condition     = contains(["staging", "production"], var.environment)
    error_message = "Environment must be 'staging' or 'production'."
  }
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}
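
Variables are only the input half of a module interface. The other half is a small set of documented outputs. Below is a sketch of what `outputs.tf` might look like, assuming the module's `main.tf` defines resources labeled `aws_ecs_service.this` and `aws_ecs_task_definition.this`. Those labels are hypothetical.

```hcl
# modules/ecs-service/outputs.tf
# Assumes main.tf defines aws_ecs_service.this and aws_ecs_task_definition.this.

output "service_name" {
  description = "Name of the ECS service created by this module"
  value       = aws_ecs_service.this.name
}

output "service_arn" {
  description = "ARN of the ECS service"
  value       = aws_ecs_service.this.id
}

output "task_definition_arn" {
  description = "ARN of the active task definition revision"
  value       = aws_ecs_task_definition.this.arn
}
```

Exposing only what consumers need keeps the interface stable, so internal resources can be refactored without breaking callers.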

Module Versioning

# Pin module versions — never use unversioned source references

# ✅ Good: pinned version
module "checkout_api" {
  source  = "git::https://github.com/company/terraform-modules.git//ecs-service?ref=v2.3.1"

  service_name    = "checkout-api"
  container_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/checkout:v1.5.0"
  environment     = "production"
}

# ❌ Bad: unpinned (tracks the repository's default branch, so upstream changes can break you without warning)
module "checkout_api" {
  source = "git::https://github.com/company/terraform-modules.git//ecs-service"
}
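
The same pinning logic applies to Terraform itself and to providers. Below is a minimal sketch of a `versions.tf` that a module or root configuration might carry. The version numbers are illustrative assumptions, not recommendations.

```hcl
# versions.tf: constrain the Terraform CLI and provider versions.
terraform {
  required_version = ">= 1.5.0" # assumed minimum, adjust to your baseline

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, never 6.0
    }
  }
}
```

Committing the generated `.terraform.lock.hcl` file alongside this pins the exact provider builds that CI and developers use.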

Terraform CI/CD Integration

Never run terraform apply from a laptop in production. All infrastructure changes should flow through CI/CD.

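# GitHub Actions: Terraform CI/CD
# Cloud credentials (for example, OIDC role assumption) must be configured
# before the init steps. They are omitted here.
name: Terraform
on:
  pull_request:
    paths: ['terraform/**']
  push:
    branches: [main]
    paths: ['terraform/**']

jobs:
  plan:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    permissions:
      contents: read
      pull-requests: write    # Required to comment on the PR
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false    # Keep stdout clean for redirection

      - name: Terraform Init
        run: terraform init -input=false
        working-directory: terraform/environments/production/checkout-api

      - name: Terraform Plan
        # The saved plan is binary; render it to text for the PR comment
        run: |
          terraform plan -input=false -out=tfplan -no-color
          terraform show -no-color tfplan > tfplan.txt
        working-directory: terraform/environments/production/checkout-api

      - name: Post plan to PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('terraform/environments/production/checkout-api/tfplan.txt', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `## Terraform Plan\n\`\`\`\n${plan}\n\`\`\``
            });

  apply:
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init -input=false
        working-directory: terraform/environments/production/checkout-api

      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve
        working-directory: terraform/environments/production/checkout-api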

Drift Detection

Infrastructure drift happens when someone changes a resource outside of Terraform — through the cloud console, CLI, or another tool. If you do not detect drift, your Terraform state becomes a lie.

# Run plan regularly to detect drift
# Schedule this in CI (daily or on-commit)
terraform plan -detailed-exitcode

# Exit codes:
#   0 = No changes (state matches reality)
#   1 = Error
#   2 = Succeeded with changes pending (drift, or configuration not yet applied)

| Drift Response | When to Use |
|---|---|
| Auto-remediate | Non-critical resources, well-tested modules |
| Alert and investigate | Production resources, security-sensitive configs |
| Import into state | Legitimate change made outside Terraform |
| Ignore | Never. Drift always gets worse. |
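
For the "Import into state" response, Terraform 1.5 and later can express the import declaratively, so it goes through the same plan and review flow as any other change. Below is a sketch for a security group that was created by hand. The resource label, the ID, and the `vpc_id` variable are placeholders.

```hcl
variable "vpc_id" {
  description = "VPC that contains the manually created security group"
  type        = string
}

# Tells Terraform to adopt the existing resource on the next apply.
import {
  to = aws_security_group.checkout_api_manual
  id = "sg-0123456789abcdef0" # placeholder ID
}

# The configuration must describe the resource as it exists today,
# otherwise the plan will propose changes right after the import.
resource "aws_security_group" "checkout_api_manual" {
  name        = "checkout-api-manual"
  description = "Created in the console, now managed by Terraform"
  vpc_id      = var.vpc_id
}
```

Once the apply succeeds, the `import` block can be removed. The resource stays in state.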

Common Mistakes

| Mistake | Consequence | Prevention |
|---|---|---|
| Hardcoded values | Cannot reuse across environments | Use variables with validation |
| No state locking | Concurrent applies corrupt state | DynamoDB lock table (AWS), GCS lock (GCP) |
| Giant state files | 10-minute plans, blast radius of entire infrastructure | Split state per-service or per-layer |
| No module versioning | Module changes break consumers | Pin versions, use semver |
| Manual console changes | Drift between state and reality | Run daily drift detection |
| Secrets in .tf files | Credentials in version control | Use data sources, SSM Parameter Store, Vault |
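
To illustrate the last row, a secret can be looked up at plan time instead of being written into a `.tf` file. This is a minimal sketch using the AWS provider's `aws_ssm_parameter` data source. The parameter path is hypothetical.

```hcl
# Reads an existing SecureString parameter; the value never appears in version control.
data "aws_ssm_parameter" "db_password" {
  name            = "/production/checkout-api/db-password" # hypothetical path
  with_decryption = true
}

locals {
  # The provider marks this value as sensitive, so it is redacted in plan output.
  db_password = data.aws_ssm_parameter.db_password.value
}
```

The value is still recorded in the state file, which is one more reason to encrypt remote state and restrict who can read it.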

Implementation Checklist

  • Use remote state with locking (S3 + DynamoDB or GCS + built-in locking)
  • Split state files: per-service or per-layer, never one giant state
  • Create reusable modules with clear interfaces (variables, outputs, validation)
  • Pin module versions — never use unversioned source references
  • Run terraform plan on every PR, post results as PR comment
  • Run terraform apply only from CI/CD, never from laptops (production)
  • Schedule daily drift detection and alert on any unexpected changes
  • Never store secrets in .tf files — use parameter store or Vault
  • Tag all resources with owner, service, environment, and managed-by
  • Review state files quarterly: remove dead resources, consolidate where useful
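
For the tagging item above, the AWS provider's `default_tags` block is one way to apply the four tags without repeating them on every resource. The tag values below are hypothetical.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      "owner"       = "checkout-team" # hypothetical team name
      "service"     = "checkout-api"
      "environment" = "production"
      "managed-by"  = "terraform"
    }
  }
}
```

Tags set directly on a resource override a default tag with the same key, so a module's `tags` variable can still add or adjust values where needed.
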
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
