
Service Catalog Design Patterns

Design and operate an internal service catalog that gives developers and platform teams visibility into every service, its ownership, health, and dependencies. Covers catalog data models, ownership enforcement, health scoring, Backstage integration, and service maturity frameworks.

An internal service catalog answers the question every engineer asks during an incident: “Who owns this service, what does it do, and how do I contact the team?” Without a catalog, this information lives in Slack history, wiki pages last updated two years ago, and one person’s head.


What a Service Catalog Tracks

Per service:
  Identity:        Name, description, team, tier
  Ownership:       Team, on-call rotation, escalation path
  Health:          SLO status, error rate, latency p95
  Dependencies:    What it calls, what calls it
  Infrastructure:  Where it runs, how many instances
  Documentation:   Runbooks, architecture docs, API docs
  Code:            Repository, language, framework
  Deployment:      Last deploy, deploy frequency, rollback status
  Cost:            Monthly infrastructure cost

Data Model

apiVersion: catalog/v1
kind: Service
metadata:
  name: order-service
  description: "Manages order lifecycle from creation to fulfillment"
  tier: 1

spec:
  owner: order-team
  lifecycle: production
  
  links:
    - title: API Docs
      url: https://docs.internal/order-service/api
    - title: Runbook
      url: https://runbooks.internal/order-service
    - title: Dashboard
      url: https://grafana.internal/d/order-service

  dependencies:
    consumes:
      - payment-service
      - inventory-service
    provides:
      - order-api (REST)
      - order-events (Kafka)

  slos:
    - name: availability
      target: 99.95%
    - name: latency_p95
      target: 200ms

  contacts:
    oncall: order-team-oncall
    slack: "#order-team"

Health Scoring

health_score_components:
  reliability:
    weight: 30%
    checks:
      - SLO compliance (28 days)
      - Incident count and severity

  operational_readiness:
    weight: 25%
    checks:
      - "Has runbook: yes/no"
      - "Has on-call rotation: yes/no"
      - "Has alerting configured: yes/no"

  code_quality:
    weight: 20%
    checks:
      - Test coverage > 80%
      - No critical security vulnerabilities

  documentation:
    weight: 15%
    checks:
      - API docs up to date
      - Architecture diagram exists

  deployment_health:
    weight: 10%
    checks:
      - Deploy frequency (weekly+)
      - Rollback rate (< 5%)

scores:
  A (90-100): Excellent
  B (75-89):  Good
  C (60-74):  Needs attention
  D (< 60):   Critical risk
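
The weights and grade bands above combine into a single number as a weighted sum. The sketch below is one way to compute it in Python. The source does not say how each component is scored, so this assumes every component has already been reduced to a 0-100 value (for example, the percentage of its checks that pass) and that a missing component counts as 0; `health_score` and `grade` are hypothetical helper names.

```python
# Weights copied from the health_score_components list above; they sum to 1.0.
WEIGHTS = {
    "reliability": 0.30,
    "operational_readiness": 0.25,
    "code_quality": 0.20,
    "documentation": 0.15,
    "deployment_health": 0.10,
}


def health_score(component_scores: dict[str, float]) -> float:
    """Weighted sum of per-component scores, each expected in the range 0-100.

    Assumption: a component with no data contributes 0 rather than being skipped.
    """
    return sum(
        weight * component_scores.get(name, 0.0)
        for name, weight in WEIGHTS.items()
    )


def grade(score: float) -> str:
    """Map a 0-100 score onto the A-D bands listed above."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"


# Example: all reliability, readiness, and deployment checks pass,
# but only half of the code quality and documentation checks do.
example = {
    "reliability": 100,
    "operational_readiness": 100,
    "code_quality": 50,
    "documentation": 50,
    "deployment_health": 100,
}
score = health_score(example)
print(round(score, 1), grade(score))
```

Treating missing data as 0 penalizes services that do not report at all, which fits the goal of surfacing what needs attention; a team could equally choose to exclude unknown components and renormalize the weights.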

Ownership Enforcement

catalog_validation:
  required_fields:
    - metadata.name
    - spec.owner
    - spec.lifecycle
    - spec.contacts.oncall
    
  enforcement:
    - PR check: Validates descriptor
    - Weekly audit: Flags missing info
    - Quarterly: Review orphan services
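
A PR check for the required fields can be quite small. The following Python sketch resolves each dotted path from `required_fields` against a parsed descriptor and reports whichever are missing or blank. It assumes the descriptor is already a dict; `lookup`, `is_blank`, and `missing_fields` are hypothetical helper names.

```python
# Dotted paths copied from catalog_validation.required_fields above.
REQUIRED_FIELDS = [
    "metadata.name",
    "spec.owner",
    "spec.lifecycle",
    "spec.contacts.oncall",
]


def lookup(doc: dict, dotted_path: str):
    """Follow a dotted path through nested dicts; return None if any key is absent."""
    node = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def is_blank(value) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    return False


def missing_fields(doc: dict) -> list[str]:
    """Return the required paths that are absent or blank in this descriptor."""
    return [path for path in REQUIRED_FIELDS if is_blank(lookup(doc, path))]


# A descriptor with no owner and an empty on-call contact fails the check.
incomplete = {
    "metadata": {"name": "order-service"},
    "spec": {"lifecycle": "production", "contacts": {"oncall": ""}},
}
problems = missing_fields(incomplete)
if problems:
    print("descriptor invalid, missing:", ", ".join(problems))
```

In a CI job, a non-empty result would typically translate into a non-zero exit code so the pull request cannot merge until the descriptor is complete.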

Anti-Patterns

| Anti-Pattern | Consequence | Fix |
| --- | --- | --- |
| Static wiki page as catalog | Outdated within weeks | Automated catalog from code + infra |
| No ownership enforcement | Orphan services | Owner required for all services |
| Catalog without health data | No signal on what needs attention | Integrate SLO, incident, deploy data |
| Only infra team maintains | Bottleneck, stale data | Each team owns their entries |
| No dependency mapping | Cannot assess blast radius | Auto-discover from traffic + config |
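
To illustrate why dependency mapping matters for blast radius, the sketch below inverts each service's `consumes` list into a "who calls me" index and walks it to find every service that depends on a failing one, directly or transitively. This is a minimal Python example under stated assumptions: the input is a dict of declared dependencies rather than auto-discovered traffic, `checkout-web` is a hypothetical extra service added so the example has more than one hop, and `build_callers` and `blast_radius` are illustrative names.

```python
from collections import deque


def build_callers(consumes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'service -> what it calls' into 'service -> what calls it'."""
    callers: dict[str, set[str]] = {}
    for service, deps in consumes.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(service)
    return callers


def blast_radius(service: str, consumes: dict[str, list[str]]) -> set[str]:
    """Return all services that depend on `service`, directly or transitively."""
    callers = build_callers(consumes)
    affected: set[str] = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            # Skip services already seen so dependency cycles terminate.
            if caller != service and caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


# Declared dependencies; checkout-web is a hypothetical caller of order-service.
consumes = {
    "checkout-web": ["order-service"],
    "order-service": ["payment-service", "inventory-service"],
    "payment-service": [],
    "inventory-service": [],
}
print(sorted(blast_radius("payment-service", consumes)))
```

With the sample data, an outage in `payment-service` reaches `order-service` and, through it, `checkout-web`, which is the kind of answer a catalog without dependency data cannot give.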

A service catalog is a living system, not a project. If it is not continuously updated, it becomes another stale wiki page.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
