Background Job Processing
Design reliable background job systems that handle retries, priorities, rate limiting, and failure recovery. Covers job queue architectures, idempotency, dead letter queues, and the patterns that prevent your background jobs from losing work or running twice.
Not everything belongs in the request-response cycle. Sending emails, processing images, generating reports, syncing data with third-party APIs — these operations are too slow, too unreliable, or too resource-intensive to run synchronously. Background job processing moves this work out of the critical path, improving response times and system resilience.
Architecture
```
Web Request → API Server → Job Queue → Worker Process → External Systems
                               ↓              ↓
                          Job Storage   Dead Letter Queue
                          (persistent)    (failed jobs)
```
Components
- Producer: The API server that enqueues jobs
- Queue: The ordered list of pending jobs (Redis, RabbitMQ, SQS)
- Worker: The process that dequeues and executes jobs
- Storage: Persistent job metadata for monitoring and replay
- DLQ: Dead letter queue for jobs that fail all retries
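The interplay of these components can be sketched in-process. This is a minimal illustration only, assuming an in-memory `queue.Queue` standing in for the broker; a real system needs a persistent queue (Redis, RabbitMQ, SQS), since `queue.Queue` loses everything on a crash:

```python
import queue

job_queue = queue.Queue()   # stands in for the broker
dead_letter_queue = []      # stands in for the DLQ

def enqueue(job_type, payload):
    """Producer: the API server pushes a job description onto the queue."""
    job_queue.put({"type": job_type, "payload": payload, "attempts": 0})

def work_once(handlers, max_attempts=3):
    """Worker: pop one job and run it; failures retry or land in the DLQ."""
    job = job_queue.get()
    try:
        handlers[job["type"]](job["payload"])
    except Exception:
        job["attempts"] += 1
        if job["attempts"] < max_attempts:
            job_queue.put(job)             # requeue for another attempt
        else:
            dead_letter_queue.append(job)  # exhausted: park for inspection
```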
Job Queue Selection
| Queue | Strengths | Scale | Persistence |
|---|---|---|---|
| Redis (Sidekiq/BullMQ) | Fast, simple, real-time | Medium | AOF/RDB |
| RabbitMQ | Routing, exchanges, reliability | High | Disk |
| AWS SQS | Managed, infinite scale | Unlimited | Managed |
| PostgreSQL (SKIP LOCKED) | No extra infrastructure | Moderate | Full ACID |
| Kafka | Streaming, replay, ordering | Unlimited | Disk |
PostgreSQL as a Job Queue
For small to medium workloads, your database is a perfectly good job queue:
```sql
CREATE TABLE jobs (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    queue        TEXT NOT NULL DEFAULT 'default',
    job_type     TEXT NOT NULL,
    payload      JSONB NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending',
    priority     INTEGER DEFAULT 0,
    run_at       TIMESTAMPTZ DEFAULT NOW(),
    attempts     INTEGER DEFAULT 0,
    max_attempts INTEGER DEFAULT 3,
    last_error   TEXT,
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    locked_at    TIMESTAMPTZ,
    locked_by    TEXT
);

-- Worker fetches next job atomically
UPDATE jobs
SET status = 'running', locked_at = NOW(), locked_by = 'worker-1'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
      AND run_at <= NOW()
    ORDER BY priority DESC, created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```
FOR UPDATE SKIP LOCKED is the key — it allows multiple workers to poll concurrently without blocking each other.
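The worker side of this pattern can be sketched with the standard-library `sqlite3` module, using a simplified copy of the table above. SQLite has no SKIP LOCKED (its writers serialize, so the UPDATE itself is the claim); in Postgres the inner SELECT would carry FOR UPDATE SKIP LOCKED so that concurrent workers skip claimed rows instead of blocking:

```python
import sqlite3

def make_queue():
    """In-memory stand-in for the jobs table above (simplified columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE jobs (
            id        INTEGER PRIMARY KEY,
            job_type  TEXT NOT NULL,
            payload   TEXT NOT NULL,
            status    TEXT NOT NULL DEFAULT 'pending',
            priority  INTEGER DEFAULT 0,
            locked_by TEXT
        )""")
    return conn

def claim_next_job(conn, worker_id):
    """Atomically flip one pending job to 'running' and return it."""
    cur = conn.execute(
        """UPDATE jobs
           SET status = 'running', locked_by = ?
           WHERE id = (SELECT id FROM jobs
                       WHERE status = 'pending'
                       ORDER BY priority DESC, id ASC
                       LIMIT 1)""",
        (worker_id,),
    )
    if cur.rowcount == 0:
        conn.rollback()
        return None  # nothing pending
    row = conn.execute(
        """SELECT id, job_type, payload FROM jobs
           WHERE status = 'running' AND locked_by = ?
           ORDER BY id DESC LIMIT 1""",
        (worker_id,),
    ).fetchone()
    conn.commit()
    return row
```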
Retry Strategies
Exponential Backoff
```python
import random

def calculate_retry_delay(attempt, base_delay=60):
    """Double the delay on each attempt, with jitter to spread retries out."""
    delay = base_delay * (2 ** attempt)
    jitter = random.uniform(0, delay * 0.1)
    return min(delay + jitter, 3600)  # Cap at 1 hour
```
Retry Classification
Not every error should be retried:
```python
RETRYABLE = [ConnectionError, TimeoutError, RateLimitError]
NOT_RETRYABLE = [ValidationError, AuthenticationError, NotFoundError]
```
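A runnable sketch of that classification follows. `RateLimitError` and `ValidationError` are placeholder exception types (real jobs would import them from the HTTP or validation library in use), and auth/not-found errors are mapped onto the built-in `PermissionError` and `LookupError` for illustration:

```python
class RateLimitError(Exception): pass
class ValidationError(Exception): pass

RETRYABLE = (ConnectionError, TimeoutError, RateLimitError)
NOT_RETRYABLE = (ValidationError, PermissionError, LookupError)

def should_retry(exc):
    """Transient faults are retried; permanent failures fail fast."""
    if isinstance(exc, NOT_RETRYABLE):
        return False
    return isinstance(exc, RETRYABLE)
```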
Idempotency
Jobs must be safe to run more than once. Network failures, worker crashes, and queue redelivery all cause duplicate execution:
```python
def process_payment(job):
    idempotency_key = f"payment-{job.order_id}"
    if already_processed(idempotency_key):
        return  # Already done, skip
    result = stripe.charges.create(
        amount=job.amount,
        idempotency_key=idempotency_key,
    )
    mark_processed(idempotency_key, result)
```
| Pattern | How | Use When |
|---|---|---|
| Deduplication table | Store processed job IDs | Any job type |
| Natural idempotency key | Use business identifier (order_id) | Payment, notification |
| Conditional update | UPDATE ... WHERE status != 'done' | State machine transitions |
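The deduplication-table pattern from the first row can be sketched with `sqlite3`; the schema and `run_once` helper here are illustrative, not from any particular library. `INSERT OR IGNORE` lets the primary key reject duplicate keys, and the marker row commits in the same transaction as the work, so a crash mid-action leaves the key unclaimed and the job retryable (external side effects, of course, are not rolled back):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY)")

def run_once(conn, key, action):
    """Run action at most once per key, using a deduplication table."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (idempotency_key) VALUES (?)",
        (key,),
    )
    if cur.rowcount == 0:
        conn.rollback()
        return False  # duplicate delivery: skip
    action()
    conn.commit()
    return True
```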
Dead Letter Queues
Jobs that exhaust all retries go to a dead letter queue for manual investigation:
```python
def execute_job(job):
    try:
        run_job(job)
        mark_completed(job)
    except RetryableError:
        if job.attempts < job.max_attempts:
            reschedule(job, delay=calculate_retry_delay(job.attempts))
        else:
            move_to_dlq(job, reason="max retries exceeded")
    except NonRetryableError as e:
        move_to_dlq(job, reason=str(e))
```
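The other half of the DLQ story is replay after the underlying failure is fixed. A minimal sketch, assuming dead-lettered jobs are plain dicts with an `attempts` counter and `reschedule` is whatever re-enqueues a job:

```python
def replay_dlq(dlq, reschedule):
    """Drain the dead letter queue back into the live queue.

    Attempt counters reset to zero so each replayed job gets a full
    retry budget; run this only after fixing the underlying failure."""
    replayed = []
    while dlq:
        job = dlq.pop(0)
        job["attempts"] = 0
        reschedule(job)
        replayed.append(job)
    return replayed
```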
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Fire-and-forget (no persistence) | Lost jobs on crash | Persistent queue with acknowledgment |
| No idempotency | Double charges, duplicate emails | Idempotency keys on every job |
| Retry without backoff | Thundering herd on recovering service | Exponential backoff with jitter |
| No dead letter queue | Failed jobs disappear silently | DLQ with monitoring and replay |
| Giant job payloads | Queue memory pressure | Store payload in DB, pass ID in queue |
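The fix for the last anti-pattern, pass an ID instead of the payload, can be sketched in a few lines. The names here are illustrative, with a dict and list standing in for the payload table and the queue:

```python
payload_store = {}  # stands in for a database table holding full payloads

def enqueue_by_id(queue, job_id, payload):
    """Persist the heavy payload, then enqueue only a small reference."""
    payload_store[job_id] = payload
    queue.append({"job_id": job_id})

def run_next(queue, handler):
    """Worker hydrates the payload from storage just before execution."""
    message = queue.pop(0)
    handler(payload_store[message["job_id"]])
```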
Background jobs are the workhorses of production systems. Designed well, they handle millions of operations reliably.