Environment Management: Reproducible Dev, Staging, and Production
Build environment management that guarantees consistency between development, staging, and production. Covers environment parity, infrastructure-as-code patterns, data management, feature toggles across environments, and preventing the 'works on staging' category of production outages.
“It worked in staging” is the most expensive sentence in software engineering. It means the deployment to production failed, and the difference between staging and production was the root cause. The purpose of environment management is to make that sentence impossible by guaranteeing that environments are reproducible, consistent, and explicitly differentiated only where necessary.
Environment Tiers
| Attribute | Development | Staging | Production |
|---|---|---|---|
| Purpose | Individual developer work | Pre-production validation | Serving real users |
| Lifecycle | Created/destroyed per feature branch | Long-running, continuously deployed | Permanent |
| Data | Synthetic or seeded | Anonymized production subset | Real user data |
| Scale | Single instance | Production-equivalent (or close) | Full |
| Cost | Minimal | Moderate | Primary infrastructure budget |
The Critical Rule
Staging must match production in architecture, not just code. Same database engine, same cache version, same networking topology, same IAM permissions. The differences should be limited to:
- Scale (fewer instances is acceptable)
- Data (anonymized or subset)
- External integrations (sandbox APIs, not production)
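One way to make this rule checkable is to describe each environment as data and flag anything that diverges outside the allowed list. The sketch below is illustrative only: the attribute names and the `ALLOWED_DIFFERENCES` set are assumptions, not a standard schema.

```python
# Attributes that may legitimately differ: scale, data, external integrations.
# All keys here are hypothetical examples.
ALLOWED_DIFFERENCES = {"instance_count", "dataset", "payment_api"}


def parity_violations(staging: dict, production: dict) -> dict:
    """Return {attribute: (staging_value, production_value)} for disallowed differences."""
    violations = {}
    for key in sorted(set(staging) | set(production)):
        if key in ALLOWED_DIFFERENCES:
            continue
        staging_value, production_value = staging.get(key), production.get(key)
        if staging_value != production_value:
            violations[key] = (staging_value, production_value)
    return violations


staging = {
    "db_engine": "postgres-15",
    "cache_version": "redis-7.0",
    "instance_count": 1,
    "payment_api": "sandbox",
}
production = {
    "db_engine": "postgres-15",
    "cache_version": "redis-7.2",
    "instance_count": 3,
    "payment_api": "live",
}

print(parity_violations(staging, production))
# {'cache_version': ('redis-7.0', 'redis-7.2')}
```

Keeping the allowed differences in an explicit list means that every new divergence has to be either fixed or consciously added to it.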
Infrastructure-as-Code for Environments
Parameterized Configurations
# variables.tf
variable "environment" {
  type = string
  validation {
    # Fixed tiers, plus the preview-<branch> names used by ephemeral environments
    condition = (
      contains(["dev", "staging", "production"], var.environment) ||
      can(regex("^preview-", var.environment))
    )
    error_message = "Must be dev, staging, production, or preview-<branch>."
  }
}

# Shared module, different parameters
module "api_service" {
  source      = "./modules/api-service"
  environment = var.environment

  # Only scale differs between environments; the architecture stays the same
  instance_count = var.environment == "production" ? 3 : 1
  instance_type  = var.environment == "production" ? "m5.xlarge" : "t3.medium"
  database_size  = var.environment == "production" ? "db.r5.2xlarge" : "db.t3.medium"

  monitoring = {
    alerting_enabled = var.environment == "production"
    log_retention    = var.environment == "production" ? 90 : 7 # days
  }
}
Environment Drift Detection
# Generate a plan per environment (each environment needs its own state, e.g. a separate workspace or backend)
terraform plan -var="environment=staging" -out=staging.plan
terraform plan -var="environment=production" -out=production.plan
# Diff the plans to find configuration drift
terraform show -json staging.plan > staging.json
terraform show -json production.plan > production.json
python compare_environments.py staging.json production.json
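Run this comparison on a schedule (daily is a reasonable default) and alert on any difference that is not one of the intentional ones: scale, data, and external integrations.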
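The article does not show `compare_environments.py`, so the following is only a sketch of what such a script might contain. It assumes the JSON produced by `terraform show -json`, where planned resources sit under `planned_values.root_module` (with nested `child_modules`). The `EXPECTED_DIFFERENCES` set is a hypothetical allowlist to adapt to your own modules.

```python
# Attributes expected to differ between environments (assumed; adjust as needed).
EXPECTED_DIFFERENCES = {"instance_type", "instance_class", "tags"}


def flatten_resources(module: dict) -> dict:
    """Map resource address -> planned attribute values, walking child modules."""
    resources = {
        resource["address"]: resource.get("values") or {}
        for resource in module.get("resources", [])
    }
    for child in module.get("child_modules", []):
        resources.update(flatten_resources(child))
    return resources


def compare_plans(staging_plan: dict, production_plan: dict) -> list:
    """Return human-readable differences that are not in EXPECTED_DIFFERENCES."""
    staging = flatten_resources(staging_plan.get("planned_values", {}).get("root_module", {}))
    production = flatten_resources(production_plan.get("planned_values", {}).get("root_module", {}))
    findings = []
    for address in sorted(set(staging) | set(production)):
        if address not in staging or address not in production:
            findings.append(f"{address}: present in only one environment")
            continue
        s_values, p_values = staging[address], production[address]
        for attr in sorted(set(s_values) | set(p_values)):
            if attr in EXPECTED_DIFFERENCES:
                continue
            if s_values.get(attr) != p_values.get(attr):
                findings.append(
                    f"{address}.{attr}: staging={s_values.get(attr)!r} "
                    f"production={p_values.get(attr)!r}"
                )
    return findings
```

To use it from the command line, load `staging.json` and `production.json` with `json.load`, pass both to `compare_plans`, and exit non-zero when the list is non-empty. Resources created with `count` will show up as "present in only one environment" when instance counts differ, so a real script would probably normalize indexed addresses first.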
Data Management Across Environments
Production Data for Staging
Restore a production snapshot into staging, then anonymize it before anyone gets access. Run these statements against the staging copy, never against production:
-- Anonymize personal data in the restored copy
UPDATE users SET
  email   = 'user_' || id || '@example.com',
  name    = 'User ' || id,
  phone   = '555-' || LPAD(id::text, 7, '0'),
  address = '123 Test St, Testville, TS 00000';

-- Remove sensitive records entirely
DELETE FROM payment_methods;
DELETE FROM audit_logs WHERE created_at > NOW() - INTERVAL '7 days';
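It can be worth verifying the result before staging is opened up. The sketch below checks rows against the patterns produced by the SQL above. It works on plain dictionaries, and how the rows are fetched from the database is left out as an assumption about your stack.

```python
import re

# Patterns produced by the anonymization statements above
ANON_EMAIL = re.compile(r"^user_\d+@example\.com$")
ANON_PHONE = re.compile(r"^555-\d{7}$")


def unanonymized_user_ids(rows):
    """Return ids of rows whose email or phone does not look anonymized."""
    return [
        row["id"]
        for row in rows
        if not ANON_EMAIL.match(row["email"]) or not ANON_PHONE.match(row["phone"])
    ]


rows = [
    {"id": 1, "email": "user_1@example.com", "phone": "555-0000001"},
    {"id": 2, "email": "jane.doe@corp.example", "phone": "555-0000002"},
]
print(unanonymized_user_ids(rows))  # [2]
```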
Synthetic Data for Development
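import random

from faker import Faker

fake = Faker()

USER_COUNT = 100
ORDER_COUNT = 500


def seed_dev_data():
    # create_user and create_order are the application's own helpers
    for _ in range(USER_COUNT):
        create_user(
            name=fake.name(),
            email=fake.email(),
            company=fake.company(),
        )
    for _ in range(ORDER_COUNT):
        create_order(
            # Assumes a fresh database where user IDs run from 1 to USER_COUNT
            user_id=random.randint(1, USER_COUNT),
            items=random.randint(1, 5),
            total=round(random.uniform(10, 5000), 2),
            status=random.choice(['pending', 'completed', 'cancelled']),
        )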
Feature Toggles Across Environments
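Feature toggles enable different behavior per environment without maintaining separate code branches. In the example below, new_checkout_flow is on everywhere except production because it has not been rolled out yet, and experimental_pricing is on only in development because it is not ready for staging validation:
{
  "new_checkout_flow": {
    "development": true,
    "staging": true,
    "production": false
  },
  "experimental_pricing": {
    "development": true,
    "staging": false,
    "production": false
  }
}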
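Reading such a configuration at runtime can be very small. The sketch below assumes the toggle file has the shape shown above. The `is_enabled` helper is hypothetical, and defaulting unknown features or environments to off is a design choice, not something the toggle format requires.

```python
import json

TOGGLES_JSON = """
{
  "new_checkout_flow": {"development": true, "staging": true, "production": false},
  "experimental_pricing": {"development": true, "staging": false, "production": false}
}
"""


def is_enabled(toggles: dict, feature: str, environment: str) -> bool:
    """Look up a toggle; unknown features or environments default to off."""
    return bool(toggles.get(feature, {}).get(environment, False))


toggles = json.loads(TOGGLES_JSON)
print(is_enabled(toggles, "new_checkout_flow", "staging"))     # True
print(is_enabled(toggles, "new_checkout_flow", "production"))  # False
print(is_enabled(toggles, "missing_feature", "production"))    # False
```

Defaulting to off means a typo in a feature name disables the feature instead of accidentally exposing it in production.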
Promotion Workflow
Feature coded → Toggle ON in dev
  → QA validates → Toggle ON in staging
  → Performance verified → Toggle ON for 5% of production traffic
  → Metrics OK → Toggle ON for 100% of production traffic
  → Stable for 7 days → Remove the toggle and make the new code path permanent
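The 5% step needs a stable way to decide which users see the feature. One common approach, sketched here as an assumption and not as the article's own mechanism, is to hash the feature name and user ID into a bucket so that the same user always gets the same answer and stays included as the percentage grows.

```python
import hashlib


def in_rollout(feature: str, user_id: str, percent: float) -> bool:
    """Deterministically place a user in one of 10,000 buckets per feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    # percent=5 admits buckets 0..499; percent=100 admits every bucket
    return bucket < percent * 100


users = [f"user-{i}" for i in range(10_000)]
share = sum(in_rollout("new_checkout_flow", u, 5) for u in users) / len(users)
print(f"Share of users in the 5% rollout: {share:.1%}")
```

Including the feature name in the hash keeps the same users from always being the first cohort for every new feature.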
Ephemeral Environments
On-demand environments for feature branches, demos, and testing:
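# Create a preview environment when a PR opens or updates; destroy it when the PR closes
name: Ephemeral Environment
on:
  pull_request:
    types: [opened, synchronize, closed]
jobs:
  deploy:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Deploy preview environment
        # Assumes the backend keeps separate state for each preview environment
        run: |
          BRANCH_NAME="${GITHUB_HEAD_REF//\//-}"
          terraform init
          terraform apply -var="environment=preview-${BRANCH_NAME}" -auto-approve
      - name: Comment PR with URL
        uses: actions/github-script@v6
        with:
          script: |
            const branchName = process.env.GITHUB_HEAD_REF.replace(/\//g, '-');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `Preview environment: https://preview-${branchName}.example.com`
            });
  cleanup:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Destroy preview environment
        run: |
          BRANCH_NAME="${GITHUB_HEAD_REF//\//-}"
          terraform init
          terraform destroy -var="environment=preview-${BRANCH_NAME}" -auto-approve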
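Replacing slashes is enough for branch names like `feature/login`, but names with uppercase letters, underscores, or many characters can still produce invalid hostnames. A stricter normalization might look like the sketch below. The `preview_env_name` helper and its 40-character limit are assumptions chosen for illustration (a DNS label allows up to 63 characters).

```python
import re


def preview_env_name(branch: str, max_length: int = 40) -> str:
    """Turn a Git branch name into a lowercase, DNS-label-safe environment name."""
    # Collapse every run of characters outside a-z and 0-9 into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # Truncate, then drop any hyphen left dangling at the end
    return f"preview-{slug}"[:max_length].rstrip("-")


print(preview_env_name("feature/ABC-123_new_checkout"))
# preview-feature-abc-123-new-checkout
```

Truncation can make two long branch names collide, so appending a short hash of the full branch name is a reasonable extension.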
Common Environment Parity Failures
| Failure | Consequence | Prevention |
|---|---|---|
| Different DB versions | Query behavior differences | Pin versions in IaC |
| Missing env variables | Service crashes on deploy | Validate env vars in CI |
| Different network topology | Latency and timeout differences | Mirror network architecture |
| Stale staging data | Tests pass on old data patterns | Weekly data refresh |
| Manual staging config | Undocumented differences | All config in Git |
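As an illustration of the "Validate env vars in CI" row, a check can be as small as the sketch below. The variable names in `REQUIRED_VARS` are hypothetical; a real list would come from the service's own configuration.

```python
import os

# Hypothetical variable names, for illustration only
REQUIRED_VARS = ["DATABASE_URL", "CACHE_URL", "PAYMENT_API_KEY"]


def missing_env_vars(required, environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


example_env = {"DATABASE_URL": "postgres://staging-db/app", "CACHE_URL": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))
# ['CACHE_URL', 'PAYMENT_API_KEY']
```

In a CI step, calling `missing_env_vars(REQUIRED_VARS)` without the second argument checks the real process environment, and the job can fail when the returned list is non-empty.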
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Snowflake environments | “Works in staging” failures | Infrastructure as code, drift detection |
| Shared staging database | Tests interfere with each other | Isolated databases per environment |
| No environment teardown | Costs accumulate, stale resources | Automated lifecycle management |
| Production data in dev | Security/compliance risk | Synthetic data generation |
| Manual environment setup | Inconsistent, slow, error-prone | Automated provisioning via templates |
Environment management is not glamorous work. But every hour invested in environment parity saves days of debugging “works on staging” production incidents.