Service Catalog Design Patterns

An internal service catalog answers the question every engineer asks during an incident: “Who owns this service, what does it do, and how do I contact the team?” Without a catalog, this information lives in Slack history, wiki pages last updated two years ago, and one person’s head.

What a Service Catalog Tracks

Per service:
  Identity:       Name, description, team, tier
  Ownership:      Team, on-call rotation, escalation path
  Health:         SLO status, error rate, latency p95
  Dependencies:   What it calls, what calls it
  Infrastructure: Where it runs, how many instances
  Documentation:  Runbooks, architecture docs, API docs
  Code:           Repository, language, framework
  Deployment:     Last deploy, deploy frequency, rollback status
  Cost:           Monthly infrastructure cost

Data Model

apiVersion: catalog/v1
kind: Service
metadata:
  name: order-service
  description: "Manages order lifecycle from creation to fulfillment"
  tier: 1

spec:
  owner: order-team
  lifecycle: production
  
  links:
    - title: API Docs
      url: https://docs.internal/order-service/api
    - title: Runbook
      url: https://runbooks.internal/order-service
    - title: Dashboard
      url: https://grafana.internal/d/order-service

  dependencies:
    consumes:
      - payment-service
      - inventory-service
    provides:
      - order-api (REST)
      - order-events (Kafka)

  slos:
    - name: availability
      target: 99.95%
    - name: latency_p95
      target: 200ms

  contacts:
    oncall: order-team-oncall
    slack: "#order-team"
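An availability target such as 99.95% is easier to reason about once it is converted into an error budget. The sketch below is one way to do that conversion, assuming a 28-day window (the same window the health score uses for SLO compliance). The function name `error_budget_minutes` is illustrative and not part of any catalog tool.

```python
def error_budget_minutes(target: str, window_days: int = 28) -> float:
    """Return the allowed downtime in minutes for a target like '99.95%'."""
    availability = float(target.rstrip("%")) / 100.0
    if not 0.0 < availability <= 1.0:
        raise ValueError(f"availability target out of range: {target!r}")
    window_minutes = window_days * 24 * 60
    # The budget is the fraction of the window the service may be unavailable.
    return window_minutes * (1.0 - availability)


if __name__ == "__main__":
    budget = error_budget_minutes("99.95%")
    print(f"{budget:.2f} minutes of downtime allowed per 28 days")
```

For order-service this works out to roughly 20 minutes of downtime per 28 days, which is a useful number to show next to the SLO in the catalog entry.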

Health Scoring

health_score_components:
  reliability:
    weight: 30%
    checks:
      - SLO compliance (28 days)
      - Incident count and severity

  operational_readiness:
    weight: 25%
    checks:
      - Has runbook (yes/no)
      - Has on-call rotation (yes/no)
      - Has alerting configured (yes/no)

  code_quality:
    weight: 20%
    checks:
      - Test coverage > 80%
      - No critical security vulnerabilities

  documentation:
    weight: 15%
    checks:
      - API docs up to date
      - Architecture diagram exists

  deployment_health:
    weight: 10%
    checks:
      - Deploy frequency (weekly+)
      - Rollback rate (< 5%)

scores:
  A (90-100): Excellent
  B (75-89):  Good
  C (60-74):  Needs attention
  D (< 60):   Critical risk
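The weights and grade bands above can be combined in a few lines. This is a minimal sketch, assuming each component has already been reduced to a 0-100 score (for example, the percentage of its checks that pass); how that per-component score is derived is not specified here. The names `WEIGHTS`, `health_score`, and `grade` are illustrative.

```python
# Component weights from the health scoring breakdown; they sum to 1.0.
WEIGHTS = {
    "reliability": 0.30,
    "operational_readiness": 0.25,
    "code_quality": 0.20,
    "documentation": 0.15,
    "deployment_health": 0.10,
}


def health_score(component_scores: dict[str, float]) -> float:
    """Combine per-component scores (each 0-100) into a weighted 0-100 total."""
    missing = WEIGHTS.keys() - component_scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


def grade(score: float) -> str:
    """Map a 0-100 score to a letter grade (assumed lower bounds: 90, 75, 60)."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "D"


if __name__ == "__main__":
    example = {
        "reliability": 90,
        "operational_readiness": 100,
        "code_quality": 50,
        "documentation": 50,
        "deployment_health": 100,
    }
    total = health_score(example)
    print(f"score={total:.1f} grade={grade(total)}")
```

In this example, strong reliability and operations cannot fully offset weak code quality and documentation: the weighted total is 79.5, a B. Treating the band edges as lower bounds is an assumption that avoids gaps for fractional scores such as 89.5.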

Ownership Enforcement

catalog_validation:
  required_fields:
    - metadata.name
    - spec.owner
    - spec.lifecycle
    - spec.contacts.oncall
    
  enforcement:
    - PR check: Validates descriptor
    - Weekly audit: Flags missing info
    - Quarterly: Review orphan services
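The PR check can be as small as a function that walks the required dotted paths. The sketch below assumes the descriptor has already been parsed into a Python dict (YAML parsing is omitted because it needs a third-party library); `missing_fields` is a hypothetical helper name.

```python
REQUIRED_FIELDS = [
    "metadata.name",
    "spec.owner",
    "spec.lifecycle",
    "spec.contacts.oncall",
]


def missing_fields(descriptor: dict, required=REQUIRED_FIELDS) -> list[str]:
    """Return the dotted paths that are absent or empty in the descriptor."""
    missing = []
    for path in required:
        node = descriptor
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        # Treat empty values the same as missing ones.
        if node in (None, "", [], {}):
            missing.append(path)
    return missing


if __name__ == "__main__":
    descriptor = {
        "metadata": {"name": "order-service"},
        "spec": {
            "owner": "order-team",
            "lifecycle": "production",
            "contacts": {"slack": "#order-team"},  # no on-call rotation set
        },
    }
    problems = missing_fields(descriptor)
    if problems:
        raise SystemExit(f"descriptor is missing required fields: {problems}")
```

Exiting with a non-zero status is what lets a CI job block the pull request until the owner and on-call fields are filled in.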

Anti-Patterns

Anti-Pattern	Consequence	Fix
Static wiki page as catalog	Outdated within weeks	Automated catalog from code + infra
No ownership enforcement	Orphan services	Owner required for all services
Catalog without health data	No signal on what needs attention	Integrate SLO, incident, deploy data
Only infra team maintains	Bottleneck, stale data	Each team owns their entries
No dependency mapping	Cannot assess blast radius	Auto-discover from traffic + config
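Once every descriptor declares what it consumes, blast radius becomes a graph traversal. The following is a rough sketch, assuming the catalog can export a mapping from each service to the services it consumes; `blast_radius` and the `checkout-web` service are hypothetical and used only for illustration.

```python
from collections import deque


def blast_radius(consumes: dict[str, list[str]], failed: str) -> set[str]:
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the 'consumes' edges so we can ask: who calls this service?
    callers: dict[str, set[str]] = {}
    for service, deps in consumes.items():
        for dep in deps:
            callers.setdefault(dep, set()).add(service)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            # Skip services already seen so dependency cycles terminate.
            if caller not in affected and caller != failed:
                affected.add(caller)
                queue.append(caller)
    return affected


if __name__ == "__main__":
    consumes = {
        "order-service": ["payment-service", "inventory-service"],
        "checkout-web": ["order-service"],  # hypothetical upstream caller
        "payment-service": [],
    }
    print(sorted(blast_radius(consumes, "payment-service")))
```

Here an outage in payment-service reaches order-service directly and checkout-web through it. Declared dependencies only cover what teams remember to write down, which is why the table recommends auto-discovery from traffic and config as the source for this mapping.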

A service catalog is a living system, not a project. If it is not continuously updated, it becomes another stale wiki page.
