
Message Queue Architecture

Design reliable message-driven systems using queues and event streaming. Covers queue selection, delivery guarantees, dead letter handling, consumer patterns, and production monitoring.

Message queues decouple producers from consumers, allowing systems to handle traffic spikes, process work asynchronously, and survive temporary failures. The difference between a well-designed queue system and a poorly designed one is the difference between graceful degradation and silent data loss.
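The decoupling described above can be sketched with Python's standard-library `queue.Queue`: the producer enqueues work and returns immediately, while a worker thread drains the queue at its own pace. This is an in-process stand-in for a real broker, meant only to show the shape of the interaction.

```python
import queue
import threading

# The producer enqueues and moves on; the consumer processes asynchronously.
work_queue = queue.Queue()
results = []

def worker():
    while True:
        item = work_queue.get()
        if item is None:          # sentinel: shut down cleanly
            break
        results.append(item * 2)  # stand-in for real processing
        work_queue.task_done()

t = threading.Thread(target=worker)
t.start()

for n in [1, 2, 3]:               # producer: fire and continue
    work_queue.put(n)

work_queue.put(None)              # signal shutdown
t.join()
print(results)                    # → [2, 4, 6]
```

A real broker adds what this sketch lacks: durability across crashes, delivery across processes, and backpressure when the consumer falls behind.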


When to Use Queues vs. Direct Calls

| Use Queues When | Use Direct Calls When |
| --- | --- |
| Work can be processed asynchronously | Response is needed immediately |
| Producer and consumer have different throughput rates | Low latency is critical |
| Downstream service may be temporarily unavailable | Target service is always available |
| Work is expensive and should be rate-limited | Processing is fast and cheap |
| You need durability (work survives crashes) | Fire-and-forget is acceptable |
| Multiple consumers need the same message | Single target consumer |

Queue Technology Comparison

| Technology | Model | Best For | Ordering |
| --- | --- | --- | --- |
| RabbitMQ | Message broker | Task queues, routing patterns | Per-queue FIFO |
| Apache Kafka | Event streaming | Event sourcing, high-throughput logs | Per-partition |
| AWS SQS | Managed queue | Serverless, AWS-native | Best-effort (FIFO available) |
| AWS SNS + SQS | Fan-out | One-to-many notification | Per-subscription |
| Redis Streams | Lightweight streaming | Simple streaming, existing Redis | Per-stream |
| Azure Service Bus | Managed broker | Enterprise, .NET integration | FIFO with sessions |
| Google Pub/Sub | Managed pub/sub | GCP-native, global distribution | Per-subscription |
| NATS | Lightweight messaging | Microservices, low-latency | Per-subject |

Delivery Guarantees

| Guarantee | Meaning | Tradeoff |
| --- | --- | --- |
| At-most-once | Message delivered 0 or 1 times | Fast, but may lose messages |
| At-least-once | Message delivered 1+ times | Reliable, but may duplicate |
| Exactly-once | Message delivered exactly 1 time | Expensive, complex, often impossible at scale |
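To see why at-least-once duplicates, consider the failure window between processing and acknowledgment. The sketch below simulates it in memory (the broker and "charge" are hypothetical stand-ins): the consumer applies a side effect, crashes before acking, and the broker redelivers.

```python
# Simulated at-least-once delivery: a side effect applied before the ack
# is lost happens again on redelivery.
processed = []

def consume(message, crash_before_ack):
    processed.append(message)       # side effect happens first
    if crash_before_ack:
        raise RuntimeError("crashed before ack")
    return "acked"

try:
    consume("charge-card", crash_before_ack=True)   # first delivery: no ack
except RuntimeError:
    pass                                            # broker sees no ack...
consume("charge-card", crash_before_ack=False)      # ...and redelivers

print(processed)   # the card was "charged" twice
```

This is exactly the gap that idempotency keys, covered next, are designed to close.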

Making At-Least-Once Safe (Idempotency)

Producer:
    message = {
        id: "order-12345-payment",    // Idempotency key
        action: "process_payment",
        payload: { order_id: 12345, amount: 99.99 }
    }

Consumer:
    if already_processed(message.id):
        acknowledge and skip    // Duplicate detected
    else:
        process(message.payload)
        mark_processed(message.id)
        acknowledge
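The pseudocode above can be made runnable as a minimal Python sketch. A `set` stands in for the processed-keys store; in production this would be a database table or Redis set, typically with a TTL.

```python
# Idempotent consumer: duplicates are detected by id and skipped.
processed_ids = set()
payments = []

def handle(message):
    if message["id"] in processed_ids:      # duplicate detected
        return "skipped"                    # acknowledge and skip
    payments.append(message["payload"])     # the actual side effect
    processed_ids.add(message["id"])        # mark processed, then ack
    return "processed"

msg = {"id": "order-12345-payment",
       "payload": {"order_id": 12345, "amount": 99.99}}

print(handle(msg))    # → processed  (first delivery)
print(handle(msg))    # → skipped    (redelivery detected)
print(len(payments))  # → 1
```

Note the ordering: marking the id processed before acknowledging means a crash between the two causes a harmless skip on redelivery, not a duplicate side effect.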

Dead Letter Queue (DLQ) Design

Main Queue → Consumer attempts processing
    ↓ (success) → Acknowledge, done
    ↓ (failure) → Retry (up to N times)
        ↓ (max retries exceeded) → Dead Letter Queue
            ↓ → Alert, manual investigation, replay

| Configuration | Recommended Value | Rationale |
| --- | --- | --- |
| Max retries | 3-5 | Enough for transient errors, not infinite loops |
| Retry backoff | Exponential (1s, 5s, 25s) | Gives downstream time to recover |
| DLQ retention | 14 days | Time for investigation and replay |
| DLQ alerting | Immediate on first message | Every DLQ message is a failure worth investigating |
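The retry-then-DLQ flow can be sketched as a wrapper around the processing function. The in-memory `dead_letter_queue` list and the parameter values are illustrative; a real broker (RabbitMQ DLX, SQS redrive policy) handles this routing for you.

```python
# Retry with exponential backoff, then park the message in a DLQ.
dead_letter_queue = []

def handle_with_retries(message, process, max_retries=3, backoff_base=1):
    delay = backoff_base
    for _ in range(max_retries):
        try:
            process(message)
            return "acked"
        except Exception:
            # In production you would time.sleep(delay) between attempts.
            delay *= 5                      # exponential backoff: 1s, 5s, 25s
    dead_letter_queue.append(message)       # retries exhausted: dead-letter it
    return "dead-lettered"                  # alert fires here

def always_fails(msg):
    raise ValueError("poison message")

print(handle_with_retries({"id": "m-1"}, always_fails))  # → dead-lettered
print(dead_letter_queue)                                 # → [{'id': 'm-1'}]
```

The key property is that a poison message exits the retry loop after a bounded number of attempts instead of blocking the queue forever.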

Consumer Patterns

| Pattern | Description | When to Use |
| --- | --- | --- |
| Competing consumers | Multiple consumers on same queue | Scale processing horizontally |
| Fan-out | One message to multiple queues | Multiple services need same event |
| Request-reply | Response sent to reply queue | Async RPC |
| Saga | Sequence of queue-based steps with compensation | Distributed transactions |
| Priority queue | Higher-priority messages processed first | Mixed-urgency workloads |
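The competing-consumers pattern from the table can be sketched with worker threads sharing one queue: each message goes to exactly one worker, so adding workers scales throughput. This is an in-process illustration; with a real broker, the workers would be separate processes or machines.

```python
import queue
import threading

# Two workers compete for messages on one queue; each message is
# delivered to exactly one of them.
tasks = queue.Queue()
handled = {"worker-0": 0, "worker-1": 0}
lock = threading.Lock()

def worker(name):
    while True:
        item = tasks.get()
        if item is None:          # per-worker shutdown sentinel
            break
        with lock:
            handled[name] += 1    # stand-in for real processing
        tasks.task_done()

threads = [threading.Thread(target=worker, args=(f"worker-{i}",))
           for i in range(2)]
for t in threads:
    t.start()
for n in range(10):               # producer enqueues 10 messages
    tasks.put(n)
for _ in threads:                 # one sentinel per worker
    tasks.put(None)
for t in threads:
    t.join()

print(sum(handled.values()))      # → 10: each message handled exactly once
```

How the 10 messages split between workers is nondeterministic, which is the point: the queue balances load without coordination between consumers.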

Monitoring Metrics

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| Queue depth | Messages waiting to be processed | Growing over time = consumer can't keep up |
| Processing latency | Time from enqueue to dequeue | > SLA target |
| Consumer lag | How far behind the consumer is | Growing lag = falling behind |
| Error rate | Failed message processing | > 1% sustained |
| DLQ depth | Messages that failed permanently | > 0 (every DLQ message is an issue) |
| Throughput | Messages produced/consumed per second | Dropping below expected rate |
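Consumer lag in Kafka-style terms is the gap, per partition, between the newest offset the producer wrote and the offset the consumer has committed. The sketch below computes it from two offset maps; the numbers are illustrative.

```python
# Per-partition consumer lag: end offset minus committed offset.
def consumer_lag(end_offsets, committed_offsets):
    return {p: end_offsets[p] - committed_offsets.get(p, 0)
            for p in end_offsets}

end = {0: 1_500, 1: 1_480}       # latest offset per partition
committed = {0: 1_500, 1: 900}   # consumer's committed offsets

lag = consumer_lag(end, committed)
print(lag)   # → {0: 0, 1: 580}: partition 1 is falling behind
```

A single lag snapshot is less useful than its trend: lag that grows across consecutive measurements is the "falling behind" signal from the table above.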

Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| No DLQ | Failed messages silently disappear | Always configure a DLQ with alerting |
| No idempotency | Duplicate processing causes data corruption | Idempotency keys on every message |
| Unbounded retries | Poison messages retry forever | Max retry count + exponential backoff + DLQ |
| Queue as database | Storing state in queue messages | Use a database for state, a queue for events |
| Oversized messages | Queue performance degrades | Store payload in S3/blob storage, pass a reference in the message |
| No monitoring | Silent queue failures | Monitor depth, lag, error rate, DLQ depth |
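The fix for oversized messages is often called the claim-check pattern: store the payload in blob storage and put only a reference on the queue. The sketch below uses a dict as a stand-in for the blob store, and the key name is illustrative.

```python
# Claim-check sketch: large payloads live in blob storage; the queue
# message carries only a small reference.
blob_store = {}

def put_blob(key, data):
    blob_store[key] = data
    return key

def enqueue_large(payload):
    ref = put_blob("orders/12345.json", payload)
    return {"type": "order_created", "payload_ref": ref}   # small message

def consume(message):
    return blob_store[message["payload_ref"]]              # fetch on demand

msg = enqueue_large({"order_id": 12345, "items": ["x"] * 1000})
print(len(consume(msg)["items"]))   # → 1000
```

One design consequence: blob retention must outlive queue retention (including DLQ retention), or a replayed message will reference a payload that no longer exists.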

Checklist

  • Queue technology selected based on delivery guarantees and team expertise
  • Dead letter queue configured with alerting
  • Idempotency implemented in all consumers
  • Retry policy: exponential backoff with max retry count
  • Monitoring: queue depth, consumer lag, error rate, DLQ depth
  • Message serialization format standardized (JSON, Protobuf, Avro)
  • Consumer scaling strategy defined (competing consumers)
  • Message ordering requirements documented
  • Poison message handling tested
  • Capacity planned for peak traffic

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For messaging architecture consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
