Workflow Orchestration Patterns
Build reliable multi-step automated workflows. Covers orchestration vs. choreography, workflow engines, and the retry, compensation, and observability patterns that keep complex automation reliable.
Simple automation is a single script. Complex automation involves multiple steps, conditional branching, error handling, retries, and rollbacks across distributed systems. Workflow orchestration provides the framework to define, execute, monitor, and recover multi-step processes — turning fragile scripts into reliable production workflows.
Orchestration vs. Choreography
Orchestration (centralized coordinator):
 ┌──────────────┐
 │ Orchestrator │  Knows the full workflow
 │ (Conductor)  │  Controls execution order
 └──────┬───────┘
        │
  ┌─────┼─────┬─────┐
  ▼     ▼     ▼     ▼
Step1 Step2 Step3 Step4
Pros: Full visibility, easy error handling, clear flow
Cons: Single point of failure, can become bottleneck
Best for: Business processes, data pipelines, deployments
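To make the centralized model concrete, here is a minimal sketch of an orchestrator in plain Python. The `Orchestrator` class, the step names, and the lambda steps are all hypothetical and exist only for illustration; a real workflow engine adds persistence, retries, and scheduling on top of this idea.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical step type: takes the shared context dict and returns a result.
Step = Callable[[dict[str, Any]], Any]


@dataclass
class Orchestrator:
    """Central coordinator: owns the step order and records every outcome."""

    steps: list[tuple[str, Step]]
    history: list[tuple[str, str]] = field(default_factory=list)

    def run(self, context: dict[str, Any]) -> dict[str, Any]:
        for name, step in self.steps:
            try:
                context[name] = step(context)
                self.history.append((name, "ok"))
            except Exception as exc:
                # One place to see which step failed and to stop the flow.
                self.history.append((name, f"failed: {exc}"))
                raise
        return context


def demo_orchestration() -> list[tuple[str, str]]:
    """Run three illustrative steps and return the recorded history."""
    orchestrator = Orchestrator(
        steps=[
            ("validate", lambda ctx: ctx["order_id"].startswith("ord-")),
            ("charge", lambda ctx: {"payment_id": "pay-1"}),
            ("ship", lambda ctx: {"tracking": "trk-1"}),
        ]
    )
    orchestrator.run({"order_id": "ord-42"})
    return orchestrator.history
```

Because the coordinator holds both the step list and the history, tracing a failure means reading one object. That is the "full visibility" benefit listed above, and also why the coordinator itself becomes the component that must not fail.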
Choreography (decentralized events):
Step1 ──event──► Step2 ──event──► Step3 ──event──► Step4
Each step reacts to events; there is no central coordinator
Pros: Loosely coupled, independently deployable
Cons: Hard to trace, complex error handling, eventual consistency
Best for: Microservice interaction, event-driven systems
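The choreographed flow can be sketched with an in-memory event bus. `EventBus`, `wire_services`, and the event names are assumptions made for this example; in practice the bus would be a message broker and each handler a separate service.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (an assumption of this sketch)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in self._handlers[event]:
            handler(payload)


def wire_services(bus: EventBus) -> None:
    # Each service knows only the event it consumes and the event it emits.
    bus.subscribe("order.placed", lambda p: bus.publish("order.validated", p))
    bus.subscribe("order.validated", lambda p: bus.publish("payment.charged", p))
    bus.subscribe("payment.charged", lambda p: bus.publish("order.shipped", p))


def demo_choreography() -> list[str]:
    """Publish one event and return the chain of events it triggered."""
    bus = EventBus()
    wire_services(bus)
    bus.publish("order.placed", {"order_id": "ord-42"})
    return bus.log
```

No function in this sketch describes the whole workflow; the sequence only emerges from the subscriptions. That is what makes the services loosely coupled, and also what makes the end-to-end flow harder to trace.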
Recommendation:
< 5 steps → Simple orchestration (scripts, Makefile)
5-20 steps → Workflow engine (Temporal, Step Functions)
Event-driven → Choreography with saga pattern
Complex + critical → Orchestration with Temporal/Conductor
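The saga pattern named in the recommendation pairs every step with a compensating action that undoes it. The sketch below shows the idea in a simple sequential form rather than an event-driven one; `run_saga` and the step names are hypothetical, and the no-op actions stand in for real service calls.

```python
from typing import Callable

Action = Callable[[], None]


def run_saga(steps: list[tuple[str, Action, Action]]) -> list[str]:
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    log: list[str] = []
    completed: list[tuple[str, Action]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # Compensate in reverse order so later effects are undone first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return log
        completed.append((name, compensate))
        log.append(f"{name}: done")
    return log


def demo_saga() -> list[str]:
    """Simulate a payment failure after inventory was reserved."""

    def noop() -> None:
        pass

    def fail_payment() -> None:
        raise RuntimeError("card declined")

    return run_saga(
        [
            ("reserve_inventory", noop, noop),
            ("charge_payment", fail_payment, noop),
            ("ship_order", noop, noop),
        ]
    )
```

This sketch returns a log instead of re-raising so the outcome is easy to inspect. A production saga would also need the compensations themselves to be retried and idempotent, which is omitted here.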
Temporal Workflow Example
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy

# get_order, check_inventory, payment_gateway, shipping_service, and the
# release_inventory activity are assumed to be defined elsewhere in the application.


@activity.defn
async def validate_order(order_id: str) -> dict:
    """Validate order data and inventory."""
    order = await get_order(order_id)
    if not order.items:
        raise ValueError("Empty order")
    await check_inventory(order.items)
    # Include the total (assumed to be available as order.total) so the
    # workflow can pass it to process_payment.
    return {"order_id": order_id, "status": "validated", "total": order.total}


@activity.defn
async def process_payment(order_id: str, amount: float) -> dict:
    """Charge customer payment method."""
    result = await payment_gateway.charge(order_id, amount)
    return {"payment_id": result.id, "status": "charged"}


@activity.defn
async def ship_order(order_id: str) -> dict:
    """Create shipping label and dispatch."""
    tracking = await shipping_service.create_shipment(order_id)
    return {"tracking_number": tracking.number}


@workflow.defn
class OrderFulfillmentWorkflow:
    """Orchestrate the full order fulfillment process."""

    @workflow.run
    async def run(self, order_id: str) -> dict:
        # Step 1: Validate
        validation = await workflow.execute_activity(
            validate_order,
            order_id,
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )

        # Step 2: Payment
        try:
            payment = await workflow.execute_activity(
                process_payment,
                # Multiple activity arguments are passed via args=[...].
                args=[order_id, validation["total"]],
                start_to_close_timeout=timedelta(seconds=60),
                retry_policy=RetryPolicy(maximum_attempts=3),
            )
        except Exception:
            # Compensation: release inventory hold.
            # Activity calls require a timeout, so one is set here as well.
            await workflow.execute_activity(
                release_inventory,
                order_id,
                start_to_close_timeout=timedelta(seconds=30),
            )
            raise

        # Step 3: Ship
        shipping = await workflow.execute_activity(
            ship_order,
            order_id,
            start_to_close_timeout=timedelta(minutes=5),
        )

        return {
            "order_id": order_id,
            "payment_id": payment["payment_id"],
            "tracking": shipping["tracking_number"],
        }
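To show what a bounded retry policy such as `maximum_attempts=3` buys you, here is a standard-library sketch of retry with exponential backoff. It illustrates the concept only and is not how Temporal implements retries, which the engine schedules durably on the server side. The `retry` helper is hypothetical; its parameter names loosely mirror `RetryPolicy` fields, and the injectable `sleep` argument is an assumption added so the example can run without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    maximum_attempts: int = 3,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or maximum_attempts is reached."""
    delay = initial_interval
    for attempt in range(1, maximum_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == maximum_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(delay)
            delay *= backoff_coefficient
    raise ValueError("maximum_attempts must be at least 1")


def demo_retry() -> tuple[str, list[float]]:
    """Fail twice, succeed on the third call, and record the waits."""
    calls = {"n": 0}
    waits: list[float] = []

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")
        return "charged"

    # waits.append replaces time.sleep so the demo records delays instead of sleeping.
    return retry(flaky, sleep=waits.append), waits
```

With the defaults above, the demo waits 1 second and then 2 seconds before the third attempt succeeds. If the third attempt had also failed, the error would propagate, which is the point at which the workflow's compensation branch takes over.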
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| No retry logic | Transient failures kill entire workflow | Configurable retry policies per step |
| No compensation (rollback) | Failed workflow leaves partial state | Saga pattern: compensating actions for each step |
| Workflow state in memory only | Process restart loses workflow state | Durable execution (Temporal, Step Functions) |
| Monolithic workflow | Any change requires full redeployment | Composable activities, versioned workflows |
| No workflow observability | Cannot debug stuck or slow workflows | Workflow dashboard, step-level metrics |
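The "workflow state in memory only" row is easiest to appreciate with a small example. The sketch below checkpoints each step's result to a JSON file so that a restarted process skips work it already finished. `run_durable`, the file name, and the step names are hypothetical; durable execution engines do this with an event history rather than a local file, so treat it as an illustration of the principle only.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_durable(steps: list[tuple[str, Callable[[], str]]], state_file: Path) -> dict:
    """Run steps in order, persisting each result so a restart can resume."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    for name, step in steps:
        if name in state:
            continue  # Finished before the restart; do not run it again.
        state[name] = step()
        state_file.write_text(json.dumps(state))  # Checkpoint after each step.
    return state


def demo_durable() -> tuple[list[str], dict]:
    """Crash during the first run, then resume from the checkpoint."""
    executed: list[str] = []

    def make(name: str, fail: bool = False) -> Callable[[], str]:
        def step() -> str:
            if fail:
                raise RuntimeError("simulated crash")
            executed.append(name)
            return "ok"

        return step

    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "workflow_state.json"
        try:
            # First run "crashes" at the payment step.
            run_durable(
                [("validate", make("validate")), ("pay", make("pay", fail=True))],
                state_file,
            )
        except RuntimeError:
            pass
        # Second run resumes: validate is skipped, pay and ship execute.
        state = run_durable(
            [("validate", make("validate")), ("pay", make("pay")), ("ship", make("ship"))],
            state_file,
        )
    return executed, state
```

Each step runs exactly once across both runs. The file write here is not atomic and a crash between running a step and saving its result would repeat that step, which is why steps in durable workflows are usually designed to be idempotent.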
Workflow orchestration is the difference between “it works on my machine” automation and production-grade automation. The investment in a workflow engine pays for itself the first time a step fails and the system automatically retries, compensates, and alerts — instead of leaving orphaned state across multiple systems.