
Log Management & Centralized Logging

Build a centralized logging system. Covers structured logging, log aggregation with ELK/Loki, log levels, retention policies, and searching logs effectively in production.

Centralized logging is the difference between debugging in production and guessing in production. When a user reports “my order disappeared,” you need to search across 20 services to trace what happened. Without centralized logging, that means SSH-ing into 20 servers and grepping log files. With centralized logging, it’s a single query.


Centralized Logging Architecture

Application Pods                Log Agent            Log Store           UI
┌──────────┐                 ┌──────────┐        ┌──────────────┐   ┌──────────┐
│ Service A├── stdout/err ──▶│          │        │              │   │          │
│ Service B├── stdout/err ──▶│ Fluent   │───────▶│ Elasticsearch│──▶│ Kibana   │
│ Service C├── stdout/err ──▶│ Bit /    │        │ (or Loki)    │   │ (or      │
│ Service D├── stdout/err ──▶│ Fluentd  │        │              │   │  Grafana)│
└──────────┘                 └──────────┘        └──────────────┘   └──────────┘

Structured Logging

import logging

logger = logging.getLogger(__name__)

# BAD: Unstructured logging
logger.info(f"Order {order_id} created by user {user_id} for ${amount}")
# Output: "Order 12345 created by user 789 for $99.99"
# Problem: order_id, user_id, and amount can only be recovered by parsing text

# GOOD: Structured logging (JSON)
# Note: the standard library only attaches `extra` fields to the log record;
# a JSON formatter on the handler is what writes them out as JSON.
logger.info("order_created", extra={
    "order_id": order_id,
    "user_id": user_id,
    "amount": amount,
    "currency": "USD",
    "items_count": len(items),
    "payment_method": "credit_card",
})
# Output (with a JSON formatter): {"message": "order_created", "order_id": 12345,
#          "user_id": 789, "amount": 99.99, ...}
# Benefit: Search by any field, aggregate, alert
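
The JSON output shown above depends on how the handler formats records. As a minimal sketch using only the standard library, the formatter below copies every field passed through `extra=` into a one-line JSON object. The names `JsonFormatter` and `build_logger`, and the `order-service` logger name, are illustrative assumptions and not part of the original guide; a maintained JSON-logging library may be a better fit in practice.

```python
import io
import json
import logging
import sys

# Attributes present on every LogRecord; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Promote the fields passed through `extra=` to top-level keys.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                payload[key] = value
        return json.dumps(payload, default=str)


def build_logger(stream=sys.stdout) -> logging.Logger:
    """Return a logger that writes one JSON line per event to `stream`."""
    logger = logging.getLogger("order-service")  # assumed service name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: capture the output in memory so it can be inspected.
buffer = io.StringIO()
log = build_logger(buffer)
log.info("order_created", extra={"order_id": 12345, "user_id": 789, "amount": 99.99})
entry = json.loads(buffer.getvalue())
```

In a real service the default `sys.stdout` stream would be used, so that the log agent in the architecture diagram can collect the lines.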

Log Levels

| Level | When to Use | Example |
|-------|-------------|---------|
| ERROR | Something broke, needs attention | Database connection failed |
| WARN | Something unexpected, not broken yet | Retry succeeded after failure |
| INFO | Normal business events | Order created, user logged in |
| DEBUG | Development/troubleshooting details | Query parameters, cache hit/miss |
| TRACE | Very verbose, rarely needed | Full request/response bodies |
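
As a small illustration of how these levels behave at runtime, the sketch below checks which of them a logger set to INFO would emit. Python's standard `logging` module has no built-in TRACE level, so the numeric value 5 and the `level-demo` logger name are assumptions made for this example.

```python
import logging

TRACE = 5  # assumed value, below DEBUG (10); not defined by the standard library
logging.addLevelName(TRACE, "TRACE")

logger = logging.getLogger("level-demo")  # hypothetical logger name
logger.setLevel(logging.INFO)

# Which levels from the table would this logger actually emit?
enabled = {
    name: logger.isEnabledFor(level)
    for name, level in [
        ("ERROR", logging.ERROR),
        ("WARN", logging.WARNING),
        ("INFO", logging.INFO),
        ("DEBUG", logging.DEBUG),
        ("TRACE", TRACE),
    ]
}
```

With the threshold at INFO, DEBUG and TRACE calls are discarded before a record is even created, which is why an INFO default keeps volume down.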

Production Log Level Strategy

production:
  default_level: INFO
  
  per_service_override:
    order-service: INFO
    payment-service: INFO   # Keep for compliance audit
    recommendation-service: WARN  # High volume, reduce noise
  
  temporary_debug:
    # Enable DEBUG for specific user/request for troubleshooting
    mechanism: feature_flag
    example: "debug_logging_user_789 = true"
    auto_expire: "30 minutes"
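
One way the `temporary_debug` idea could be implemented is a logging filter that lets DEBUG records through only for users with an unexpired flag. This is a sketch under assumptions: `TemporaryDebugFilter`, `ListHandler`, and the in-memory flag store are hypothetical stand-ins for whatever feature-flag system is actually in use, and the injected clock exists only to make the example deterministic.

```python
import logging
import time


class TemporaryDebugFilter(logging.Filter):
    """Pass DEBUG records only for users with an unexpired debug flag."""

    def __init__(self, base_level=logging.INFO, clock=time.monotonic):
        super().__init__()
        self.base_level = base_level
        self._clock = clock
        self._expiry = {}  # user_id -> time at which the flag expires

    def enable_debug(self, user_id, ttl_seconds=30 * 60):
        self._expiry[user_id] = self._clock() + ttl_seconds

    def filter(self, record):
        if record.levelno >= self.base_level:
            return True  # normal production traffic
        deadline = self._expiry.get(getattr(record, "user_id", None))
        if deadline is None:
            return False
        if self._clock() >= deadline:
            del self._expiry[record.user_id]  # flag auto-expires
            return False
        return True


class ListHandler(logging.Handler):
    """Collect messages in memory so the example can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


now = [0.0]  # fake clock, in seconds
debug_filter = TemporaryDebugFilter(clock=lambda: now[0])
handler = ListHandler()
handler.addFilter(debug_filter)

logger = logging.getLogger("payment-service")
logger.setLevel(logging.DEBUG)  # let DEBUG reach the filter
logger.addHandler(handler)
logger.propagate = False

logger.debug("cache miss", extra={"user_id": 789})      # dropped: no flag yet
debug_filter.enable_debug(789)
logger.debug("query params", extra={"user_id": 789})    # kept: flag active
logger.debug("query params 42", extra={"user_id": 42})  # dropped: other user
now[0] = 31 * 60
logger.debug("after expiry", extra={"user_id": 789})    # dropped: flag expired
logger.info("payment captured", extra={"user_id": 42})  # kept: INFO and above
```

The trade-off is that the logger itself must run at DEBUG so the filter can see those records, which costs some CPU even when nothing is flagged.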

Platform Comparison

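| Feature | ELK Stack | Grafana Loki | Datadog Logs | CloudWatch |
|---------|-----------|--------------|--------------|------------|
| Cost model | Self-hosted + storage | Low (indexes labels only, not full text) | Per GB ingested | Per GB ingested |
| Full-text search | Excellent | Limited (label index; log lines are filtered at query time) | Good | Basic |
| Scalability | Complex to scale | Simple (S3 backend) | Managed | Managed |
| Best for | Large-scale search | Kubernetes + Grafana | All-in-one platform | AWS-native |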

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Unstructured log messages | Can’t search or aggregate | Structured JSON logging |
| DEBUG level in production | Storage explosion, noise | INFO default, DEBUG via feature flag |
| Logging PII | Compliance violation | Sanitize PII before logging |
| No correlation ID | Can’t trace request across services | Inject trace ID in every log entry |
| No log retention policy | Storage grows forever | Retention: hot (7 days), warm (30 days), cold (90 days) |
| Alerting on log patterns | Fragile, breaks on message changes | Alert on structured fields, not text patterns |
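
For the correlation ID fix, one possible approach in Python is to hold the trace ID in a `contextvars.ContextVar` and stamp it onto every record with a logging filter. The sketch below assumes the ID arrives from an upstream caller or is generated locally. `TraceIdFilter`, `handle_request`, and `CaptureHandler` are hypothetical names, and how the ID is passed between services (for example, an HTTP header) is left out.

```python
import contextvars
import logging
import uuid

# Holds the trace ID for the request currently being processed.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record that passes through."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


class CaptureHandler(logging.Handler):
    """Collect formatted lines in memory so the example can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def handle_request(logger, incoming_trace_id=None):
    # Reuse the caller's ID when present, otherwise start a new trace.
    trace_id = incoming_trace_id or uuid.uuid4().hex
    token = trace_id_var.set(trace_id)
    try:
        logger.info("order_lookup_started")
        logger.info("order_lookup_finished")
    finally:
        trace_id_var.reset(token)  # do not leak the ID into the next request
    return trace_id


handler = CaptureHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))

logger = logging.getLogger("order-lookup-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

handle_request(logger, incoming_trace_id="abc123")
```

Because the ID lives in a context variable, call sites do not need to pass it explicitly, and each thread or asyncio task sees its own value.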

Checklist

  • Centralized logging platform deployed (ELK, Loki, or managed)
  • All services: structured JSON logging to stdout
  • Correlation ID (trace ID) in every log entry
  • Log levels: INFO in production, DEBUG via feature flag
  • PII sanitized from all log output
  • Retention policy: hot/warm/cold tiers
  • Dashboards: error rates, log volume by service
  • Alerts: structured field-based, not text pattern matching
  • Log access controls: who can see which logs
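
The PII item in the checklist can be approached with a filter that scrubs records before any handler writes them. The following is a minimal sketch and not a complete sanitizer: the `SENSITIVE_KEYS` list, the email pattern, and the `PiiRedactionFilter` and `RecordCollector` names are all assumptions, and real deployments usually need a broader set of rules agreed with the compliance team.

```python
import logging
import re

SENSITIVE_KEYS = {"email", "password", "card_number", "ssn"}  # assumed list
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PiiRedactionFilter(logging.Filter):
    """Mask sensitive structured fields and email addresses in messages."""

    def filter(self, record):
        # Mask structured fields that were passed through `extra=`.
        for key in SENSITIVE_KEYS:
            if hasattr(record, key):
                setattr(record, key, "[REDACTED]")
        # Resolve %-style arguments first so interpolated values are scrubbed too.
        record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", record.getMessage())
        record.args = ()
        return True


class RecordCollector(logging.Handler):
    """Keep records in memory so the example can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


collector = RecordCollector()
collector.addFilter(PiiRedactionFilter())

logger = logging.getLogger("pii-demo")
logger.setLevel(logging.INFO)
logger.addHandler(collector)
logger.propagate = False

logger.info(
    "password reset requested for %s",
    "ada@example.com",
    extra={"email": "ada@example.com", "user_id": 789},
)
scrubbed = collector.records[0]
```

The filter mutates the record in place, so any handler that processes the record afterwards also sees the redacted values.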

:::note[Source] This guide is derived from operational intelligence at Garnet Grid Consulting. For logging architecture consulting, visit garnetgrid.com. :::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
