Event-Driven Automation: Reacting to Changes Instead of Polling
Design automation systems that respond to events in real time rather than running on fixed schedules. Covers event sources, webhook patterns, event buses, EventBridge (formerly CloudWatch Events) rules, dead letter queues, and building reactive pipelines that scale without cron-job sprawl.
Most automation starts with cron. A script runs every 5 minutes, checks if something changed, and acts if it did. This works until you have 200 cron jobs, half of which run for no reason because nothing changed, and the other half miss events because they happened between polling intervals.
Event-driven automation inverts the model. Instead of asking “has anything changed?” every N minutes, you subscribe to change events and react immediately. The infrastructure scales naturally because compute only runs when there is work to do.
Event Sources in Practice
Infrastructure Events
Cloud platforms emit events for virtually every state change:
- AWS EventBridge: EC2 state changes, S3 object creation, ECS task status, CloudFormation stack updates
- Azure Event Grid: Resource group modifications, blob storage events, service health changes
- GCP Eventarc: Cloud Storage, Pub/Sub, Cloud Audit Logs, Cloud Build
Example: Auto-tag any new EC2 instance that lacks a cost-center tag. The EventBridge rule pattern below matches any instance entering the `running` state:
```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["running"]
  }
}
```
When a matching event fires, a Lambda function checks whether the instance has the required tags and applies defaults if any are missing. No polling. No cron. Instant compliance.
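A minimal sketch of that Lambda handler, assuming boto3; the tag key and default value are illustrative:

```python
import boto3

ec2 = boto3.client("ec2")
DEFAULT_TAGS = {"cost-center": "unallocated"}  # assumed org default

def handler(event, context):
    instance_id = event["detail"]["instance-id"]
    # Fetch the tags currently attached to this instance
    resp = ec2.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [instance_id]}]
    )
    existing = {tag["Key"] for tag in resp["Tags"]}
    missing = [
        {"Key": k, "Value": v} for k, v in DEFAULT_TAGS.items() if k not in existing
    ]
    if missing:
        ec2.create_tags(Resources=[instance_id], Tags=missing)
```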
Application Events
- Webhooks: Stripe payment events, GitHub push/PR events, Slack interactions
- Database CDC: PostgreSQL logical replication, DynamoDB Streams, MongoDB Change Streams (see the consumer sketch after this list)
- Message Queues: RabbitMQ, Kafka, SQS — application-generated domain events
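As one concrete CDC shape: DynamoDB Streams can invoke a Lambda with batches of change records. A minimal sketch of such a consumer; `react_to_new_item` is a hypothetical app-provided function:

```python
def handle_change_records(event, context):
    # DynamoDB Streams delivers batches of INSERT / MODIFY / REMOVE records
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            item = record["dynamodb"]["NewImage"]  # DynamoDB attribute-value format
            react_to_new_item(item)  # hypothetical app-provided reaction
```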
Observability Events
- Alert triggers: PagerDuty, Datadog, Prometheus Alertmanager
- Log patterns: CloudWatch Logs subscription filters, Loki alert rules
- Metric thresholds: Auto-scaling on CPU, queue depth, error rate
Webhook Architecture
Webhooks are the simplest form of event-driven automation. A source system sends an HTTP POST to your endpoint when something happens.
Building Reliable Webhook Receivers
```python
import hashlib
import hmac
import json
import os
import time

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()
WEBHOOK_SECRET = os.environ["STRIPE_WEBHOOK_SECRET"]  # assumed env var

def verify_stripe_signature(payload: bytes, header: str, tolerance: int = 300) -> bool:
    # Stripe signs payloads as `Stripe-Signature: t=<ts>,v1=<hex hmac>`;
    # the signed message is "<ts>.<raw body>" keyed with the endpoint secret
    try:
        parts = dict(p.split("=", 1) for p in header.split(","))
        timestamp, signature = parts["t"], parts["v1"]
        if abs(time.time() - int(timestamp)) > tolerance:  # limit replay window
            return False
        signed = f"{timestamp}.".encode() + payload
        expected = hmac.new(WEBHOOK_SECRET.encode(), signed, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)
    except (KeyError, ValueError):
        return False

@app.post("/webhooks/stripe")
async def handle_stripe(request: Request):
    # 1. Verify signature before trusting anything in the payload
    payload = await request.body()
    sig = request.headers.get("Stripe-Signature", "")
    if not verify_stripe_signature(payload, sig):
        raise HTTPException(status_code=401)
    # 2. Parse event
    event = json.loads(payload)
    # 3. Idempotency check: providers retry, so dedupe on the event ID
    if await already_processed(event["id"]):  # app-provided (sketch below)
        return {"status": "duplicate"}
    # 4. Hand off to a worker; never block the response on business logic
    await queue.enqueue("process_stripe_event", event)  # app-provided task queue
    # 5. Acknowledge immediately
    return {"status": "accepted"}
```
Webhook Best Practices
- Verify signatures — Every reputable webhook provider signs payloads. Always verify.
- Respond fast — Return a 2xx quickly; many providers time out after a few seconds. Do the real work asynchronously.
- Handle retries — Providers retry on failure. Use idempotency keys; a sketch of the `already_processed` helper follows this list.
- Store raw payloads — Log the raw webhook body before processing. Invaluable for debugging.
- Version your handlers — Webhook schemas change. Handle unknown fields gracefully.
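The `already_processed` helper in the receiver above is left to the application. One possible sketch backed by Redis; the key prefix and TTL are illustrative:

```python
import redis.asyncio as redis

r = redis.Redis()  # assumes a reachable Redis instance

async def already_processed(event_id: str, ttl_seconds: int = 86400) -> bool:
    # SET with NX succeeds only for the first writer; duplicates get None back
    first_writer = await r.set(f"webhook:{event_id}", "1", nx=True, ex=ttl_seconds)
    return not first_writer
```

The atomic SET NX avoids the check-then-write race you would get from a separate GET followed by a SET.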
Event Bus Patterns
For internal event-driven automation, an event bus decouples producers from consumers:
```
[Service A] --publish--> [Event Bus] --subscribe--> [Automation 1]
[Service B] --publish--> [Event Bus] --subscribe--> [Automation 2]
[Service C] --publish--> [Event Bus] --subscribe--> [Automation 3]
```
AWS EventBridge Rules
```json
{
  "source": ["custom.deployment"],
  "detail-type": ["DeploymentCompleted"],
  "detail": {
    "environment": ["production"],
    "status": ["success"]
  }
}
```
Target: Lambda function that posts to Slack, updates a status page, and triggers smoke tests.
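The producer side is a single API call. A minimal sketch with boto3; note that the PutEvents request uses PascalCase keys, while the rule pattern above matches on the lowercase fields of the delivered event:

```python
import json

import boto3

events = boto3.client("events")  # the boto3 service name for EventBridge

def publish_deployment_completed(environment: str, status: str) -> None:
    # Deliver a custom event to the default bus; matching rules fan it out
    events.put_events(Entries=[{
        "Source": "custom.deployment",
        "DetailType": "DeploymentCompleted",
        "Detail": json.dumps({"environment": environment, "status": status}),
    }])
```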
Dead Letter Queues
Events that fail processing must not be lost:
```yaml
EventRule:
  Type: AWS::Events::Rule
  Properties:
    Targets:
      - Arn: !GetAtt ProcessFunction.Arn
        Id: process-function  # CloudFormation requires a target Id
        DeadLetterConfig:
          Arn: !GetAtt FailedEventsQueue.Arn
        RetryPolicy:
          MaximumRetryAttempts: 3
          MaximumEventAgeInSeconds: 86400
```
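Once events land in the DLQ, something has to look at them. A minimal sketch of draining the queue for inspection, assuming SQS; the queue URL and `inspect_failed_event` are illustrative:

```python
import boto3

sqs = boto3.client("sqs")
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/failed-events"  # illustrative

def drain_dlq(max_messages: int = 10) -> None:
    resp = sqs.receive_message(QueueUrl=DLQ_URL, MaxNumberOfMessages=max_messages)
    for msg in resp.get("Messages", []):
        inspect_failed_event(msg["Body"])  # hypothetical: re-emit, archive, or page
        sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=msg["ReceiptHandle"])
```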
Replacing Cron with Events
Before: Polling Pattern
```bash
# Check for new S3 files every 5 minutes
*/5 * * * * python check_new_uploads.py
```
Problems:
- 5-minute latency on detection
- Runs 288 times per day even if zero uploads
- Must track “already processed” state
- Fails silently if the host is down
After: Event-Driven
```python
# Triggered when S3 emits an "Object Created" event (e.g. PutObject) via EventBridge
def handle_new_upload(event):
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    process_file(bucket, key)        # app-provided processing logic
    notify_team(f"Processed {key}")  # app-provided notification helper
```
Benefits:
- Sub-second latency
- Runs only when files are uploaded
- No state tracking needed (each event is self-contained)
- Retry and dead-letter built into the platform
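The wiring on the AWS side is a single EventBridge rule. A sketch of the pattern, assuming the bucket has EventBridge notifications enabled; the bucket name is illustrative:

```json
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["uploads-bucket"]
    }
  }
}
```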
When to Keep Cron
Not everything should be event-driven:
- Daily reports/summaries: No triggering event, time-based by nature
- Data cleanup/archival: Periodic maintenance
- Health checks: Regular heartbeat verification
- Batch aggregation: Collecting data before processing
The rule: if there is a clear triggering event, use events. If the trigger is “it’s Tuesday at 3am,” use cron.
Building Reactive Pipelines
Chain events into pipelines where the output of one automation triggers the next:
```
[Code Push]
  → [CI Build]
    → [Tests Pass Event]
      → [Deploy to Staging]
        → [Smoke Test Pass Event]
          → [Deploy to Production]
            → [Post-Deploy Event]
              → [Slack Notification + Metrics Reset]
```
Pipeline Design Principles
- Each stage is independently retriable — Failure at stage 3 does not require re-running stages 1-2
- Events carry context — Each event includes enough data for the next stage to operate without querying back
- Stages are idempotent — Re-processing the same event produces the same result
- Timeouts exist everywhere — A stage that does not emit a completion event within N minutes triggers an alert
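A sketch of an event envelope that satisfies these principles; the field names are illustrative, not a standard:

```python
import time
import uuid

def make_stage_event(event_type: str, parent: dict | None, payload: dict) -> dict:
    # Correlation ID and lineage travel with the event, so downstream stages
    # can act without querying back and consumers can spot repeat visits
    return {
        "id": str(uuid.uuid4()),
        "type": event_type,
        "timestamp": time.time(),
        "correlation_id": parent["correlation_id"] if parent else str(uuid.uuid4()),
        "lineage": parent["lineage"] + [parent["type"]] if parent else [],
        "payload": payload,
    }
```

A consumer that finds its own event type in `lineage` can refuse to emit, which breaks the circular A→B→A loops listed in the anti-patterns below.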
Observability for Event-Driven Systems
Event-driven systems are harder to debug because there is no linear request flow. Invest in:
- Event tracing: Correlation IDs that follow events through the entire pipeline
- Event logs: Every event received, processed, or failed — with the full payload
- Lag monitoring: Time between event emission and processing completion
- Dead letter monitoring: Alerts when the DLQ receives items
```python
import logging
import time

logger = logging.getLogger("events")

# Always log the event lifecycle
logger.info("event_received", extra={
    "event_id": event["id"],
    "event_type": event["type"],
    "correlation_id": event.get("correlation_id"),
    "age_seconds": time.time() - event["timestamp"],  # assumes epoch-seconds timestamps
})
```
Anti-Patterns
| Anti-Pattern | Risk | Fix |
|---|---|---|
| Fire and forget | Lost events, silent failures | Acknowledge only after processing |
| Unbounded fan-out | One event triggers thousands of actions | Rate limit consumers, batch where possible |
| Circular events | A→B→A infinite loop | Include event lineage, detect cycles |
| Tight coupling to event schema | Breaking changes cascade | Version events, use schema registry |
| No dead letter queue | Failed events vanish | Always configure DLQ with alerting |
Event-driven automation is not a silver bullet — it adds complexity in exchange for responsiveness and scalability. Start by replacing your most painful cron jobs. Prove the pattern works. Then expand.