Verified by Garnet Grid

Infrastructure Drift Detection

Detect and remediate unauthorized changes to infrastructure. Covers drift detection tools, reconciliation strategies, policy enforcement, and the patterns that ensure your actual infrastructure matches your declared state.

Infrastructure drift occurs when the actual state of your infrastructure diverges from the state declared in code. Someone SSH’d into a server and changed a config file. A security group rule was added by hand during an incident and never reverted. A developer clicked through the cloud console to test something and forgot to clean up. Drift is inevitable; detecting and fixing it is an engineering discipline.


Drift Detection

Sources of Drift:

Manual Changes (most common):
  ☐ Console/portal clicks that bypass IaC
  ☐ SSH into servers for "quick fixes"
  ☐ Manual security group or IAM changes
  ☐ Emergency changes during incidents

Automation Gaps:
  ☐ Terraform apply failed partway through
  ☐ Resources created outside of Terraform
  ☐ State file out of sync with reality

External Forces:
  ☐ Cloud provider changes defaults
  ☐ Auto-scaling creates resources not in state
  ☐ Third-party integrations modify resources
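
Two of the sources above, resources created outside of Terraform and a state file out of sync with reality, come down to comparing two inventories: what the state tracks and what the cloud account actually contains. The following is a minimal sketch of that comparison, assuming the resource IDs have already been collected from the state and from the provider's API. The function name and the sample IDs are hypothetical.

```python
def compare_inventories(state_ids, cloud_ids):
    """Split resource IDs into unmanaged (cloud only) and missing (state only)."""
    state_ids, cloud_ids = set(state_ids), set(cloud_ids)
    return {
        # Exist in the cloud account but are unknown to the IaC state.
        "unmanaged": sorted(cloud_ids - state_ids),
        # Tracked in state but no longer present in the cloud account.
        "missing": sorted(state_ids - cloud_ids),
    }


# Hypothetical IDs for illustration only.
report = compare_inventories(
    state_ids=["sg-0a1", "sg-0b2", "i-0c3"],
    cloud_ids=["sg-0a1", "i-0c3", "i-0d4"],
)
print(report)
```

Unmanaged resources are candidates for import or deletion; missing ones usually mean the state needs a refresh.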

Detection approaches:
  1. terraform plan (shows the diff between declared config and real infrastructure)
  2. AWS Config rules (continuous monitoring)
  3. Cloud Custodian (policy-based scanning)
  4. Firefly / env0 / Spacelift (drift as a service)
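
For the first approach, `terraform plan -detailed-exitcode` makes the result machine-readable: exit code 0 means no changes, 1 means the plan failed, and 2 means the plan succeeded with changes present. The sketch below wraps that in a small check, assuming `workdir` (a hypothetical parameter) points at an initialized Terraform directory. Note that exit code 2 also fires for code changes that have not been applied yet, so it signals "declared and actual differ" rather than drift specifically.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANING = {0: "in_sync", 1: "error", 2: "drift"}


def interpret_plan_exit(code: int) -> str:
    """Map a plan exit code to a status; unknown codes count as errors."""
    return PLAN_EXIT_MEANING.get(code, "error")


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report 'in_sync', 'drift', or 'error'."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit(proc.returncode)
```

A scheduler (cron, a CI pipeline) could call `check_drift` once per directory and alert on anything other than `"in_sync"`.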

Terraform Drift Detection

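import json
import logging
import subprocess

log = logging.getLogger("drift")


class DriftDetector:
    """Automated drift detection using Terraform."""

    # Resource types whose drift is treated as security-critical.
    CRITICAL_TYPES = frozenset({
        "aws_security_group_rule",
        "aws_iam_policy",
        "aws_s3_bucket_policy",
        "aws_kms_key",
    })

    def run_terraform_plan(self, workspace: str) -> dict:
        """Return the plan as parsed JSON.

        Assumption: `workspace` is the path to an initialized Terraform
        root module; adapt this if it names a Terraform workspace instead.
        """
        subprocess.run(
            ["terraform", "plan", "-input=false", "-out=drift.tfplan"],
            cwd=workspace, check=True, capture_output=True,
        )
        shown = subprocess.run(
            ["terraform", "show", "-json", "drift.tfplan"],
            cwd=workspace, check=True, capture_output=True, text=True,
        )
        return json.loads(shown.stdout)

    @staticmethod
    def diff_fields(before, after):
        """Return top-level attribute names whose values differ."""
        before, after = before or {}, after or {}
        return sorted(
            key for key in before.keys() | after.keys()
            if before.get(key) != after.get(key)
        )

    def detect_drift(self, workspace: str):
        """Run terraform plan and parse for drift."""
        result = self.run_terraform_plan(workspace)

        changes = []
        for resource in result.get("resource_changes", []):
            change = resource["change"]
            # Anything other than a no-op means declared and actual differ.
            if change["actions"] != ["no-op"]:
                changes.append({
                    "resource": resource["address"],
                    "type": resource["type"],
                    "action": change["actions"],
                    "before": change.get("before"),
                    "after": change.get("after"),
                    "drift_fields": self.diff_fields(
                        change.get("before"),
                        change.get("after"),
                    ),
                })

        if changes:
            severity = self.classify_severity(changes)
            self.alert(
                channel="#infrastructure",
                message=f"Drift detected in {workspace}: "
                        f"{len(changes)} resources changed",
                severity=severity,
                changes=changes,
            )

            if severity == "critical":
                # Auto-remediate critical drift
                self.auto_remediate(workspace, changes)
            else:
                # Create ticket for non-critical drift
                self.create_ticket(workspace, changes)

        return changes

    def classify_severity(self, changes):
        """Classify drift severity by resource type."""
        if any(change["type"] in self.CRITICAL_TYPES for change in changes):
            return "critical"
        return "warning"

    # Integration hooks: placeholders to replace with your own chat,
    # remediation, and ticketing tooling.
    def alert(self, channel, message, severity, changes):
        log.warning("[%s] %s: %s", severity, channel, message)

    def auto_remediate(self, workspace, changes):
        log.warning("Remediation requested for %s (%d changes)",
                    workspace, len(changes))

    def create_ticket(self, workspace, changes):
        log.info("Ticket requested for %s (%d changes)",
                 workspace, len(changes))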
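
The detector above relies on a `diff_fields` helper to name the attributes that drifted. One possible shape for such a helper is sketched below: it walks nested dictionaries and reports dotted paths, so a change buried inside `tags` is reported as `tags.owner` rather than just `tags`. The function body and the sample values are illustrative assumptions, not part of any Terraform API.

```python
def diff_fields(before, after, prefix=""):
    """Return dotted paths of attributes whose values differ, in key order."""
    before = before or {}   # `before` is None when a resource is being created
    after = after or {}     # `after` is None when a resource is being deleted
    paths = []
    for key in sorted(before.keys() | after.keys()):
        path = f"{prefix}{key}"
        old, new = before.get(key), after.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse so nested changes are reported at their exact path.
            paths.extend(diff_fields(old, new, prefix=f"{path}."))
        elif old != new:
            paths.append(path)
    return paths


# Hypothetical before/after attribute maps for one resource.
before = {"description": "web", "tags": {"env": "prod", "owner": "ops"}, "ingress_port": 443}
after = {"description": "web", "tags": {"env": "prod", "owner": "dev"}, "ingress_port": 22}
print(diff_fields(before, after))  # ['ingress_port', 'tags.owner']
```

Lists are compared as whole values here; diffing list elements individually would need a rule for matching them up.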

Anti-Patterns

| Anti-Pattern | Consequence | Fix |
| --- | --- | --- |
| No drift detection at all | Infrastructure diverges silently | Scheduled drift scans (daily minimum) |
| Detect but never remediate | Drift accumulates, state becomes fiction | Auto-remediate critical drift, ticket the rest |
| Console access without guardrails | Every console user can cause drift | Read-only console, SCPs preventing manual changes |
| No incident drift tracking | Emergency changes become permanent drift | Post-incident drift review: revert or codify |
| Ignore state file health | Corrupt state = no drift detection | State file versioning, locking, regular validation |
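
The "regular validation" fix in the last row can start small. Terraform state files carry a `serial` that increases with each write and a `lineage` ID assigned when the state is first created, so a versioned history of snapshots can be checked for consistency. The sketch below assumes the snapshots have already been loaded as dictionaries, oldest first; the function name and sample values are hypothetical.

```python
def check_state_snapshots(snapshots):
    """Return a list of problems found across state snapshots, oldest first."""
    problems = []
    previous = None
    for index, snap in enumerate(snapshots):
        # Every state snapshot should carry these top-level fields.
        for field in ("version", "serial", "lineage"):
            if field not in snap:
                problems.append(f"snapshot {index}: missing '{field}'")
        if previous is not None:
            if "serial" in snap and "serial" in previous:
                # A lower serial suggests an older state overwrote a newer one.
                if snap["serial"] < previous["serial"]:
                    problems.append(f"snapshot {index}: serial went backwards")
            # A different lineage suggests the state was replaced wholesale.
            if snap.get("lineage") != previous.get("lineage"):
                problems.append(f"snapshot {index}: lineage changed")
        previous = snap
    return problems


# Hypothetical snapshot history for illustration only.
history = [
    {"version": 4, "serial": 12, "lineage": "aaa"},
    {"version": 4, "serial": 13, "lineage": "aaa"},
    {"version": 4, "serial": 11, "lineage": "bbb"},
]
print(check_state_snapshots(history))
```

An empty result does not prove the state is accurate; it only rules out the two failure modes checked here.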

Infrastructure drift is the gap between what you think you have and what you actually have. The smaller that gap, the safer your infrastructure. Detect drift daily, remediate immediately, and prevent it at the source with policy enforcement.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
