Platform Engineering: Building Your Internal Developer Platform

Platform engineering reduces cognitive load on developers by abstracting infrastructure complexity behind self-service interfaces. Instead of filing tickets and waiting days for a database, developers run a CLI command and get a fully provisioned service in minutes — complete with CI/CD, monitoring, and DNS.

A 100-developer organization with a good platform team can ship an estimated 30-50% faster than one without. The key insight is that platform engineering is not about building infrastructure; it is about building products for developers. Treat your fellow engineers as customers, measure their satisfaction, and iterate like a product team.

The Internal Developer Platform (IDP)

An IDP is the set of tools, workflows, and self-service capabilities that abstract away infrastructure complexity.

Developers                    Internal Developer Platform
┌──────────────────┐    ┌──────────────────────────────────┐
│ Self-Service UI, │───▶│ Service Catalog (Backstage/Port) │
│ CLI, or PR       │    │                                  │
└──────────────────┘    │ ┌──────────────────────────────┐ │
                        │ │ Golden Paths (Templates)     │ │
                        │ │ - Node.js microservice       │ │
                        │ │ - Python data pipeline       │ │
                        │ │ - React frontend             │ │
                        │ └──────────────────────────────┘ │
                        │ ┌──────────────────────────────┐ │
                        │ │ Infrastructure Orchestration │ │
                        │ │ (Terraform, Crossplane, K8s) │ │
                        │ └──────────────────────────────┘ │
                        │ ┌──────────────────────────────┐ │
                        │ │ Observability (Built-in)     │ │
                        │ │ Traces, Metrics, Logs, Alerts│ │
                        │ └──────────────────────────────┘ │
                        └──────────────────────────────────┘

IDP vs Traditional Ops

Dimension	Traditional Ops (Ticket-Based)	Platform Engineering (Self-Service)
New service deployment	2-5 days (ticket → approval → manual setup)	15-30 minutes (self-service)
Database provisioning	1-3 days	5 minutes
Environment creation	Days	Minutes
Monitoring setup	Manual per service	Automatic with golden paths
Developer satisfaction	Low (waiting, blocked)	High (autonomous, fast)

Step 1: Define Golden Paths

Golden paths are opinionated, pre-paved roads for common developer tasks. They encode your organization’s best practices into templates that produce production-ready services out of the box.

What a Golden Path Includes

A golden path template should provision everything a developer needs to go from zero to production:

✅ Source code repository (from skeleton template)
✅ CI/CD pipeline (GitHub Actions / Azure Pipelines)
✅ Kubernetes namespace with RBAC
✅ Database (if requested)
✅ Monitoring dashboard (Grafana / Datadog)
✅ Alerting rules (error rate, latency, CPU)
✅ DNS entry (service.internal.company.com)
✅ TLS certificate (auto-renewed)
✅ Service catalog entry (Backstage)
✅ Documentation template (README, ADR template)
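One way to keep a template honest against this list is to compare what it actually provisioned with what the checklist expects. The sketch below is illustrative only: the component keys and the `missing_components` helper are hypothetical names, not part of Backstage or any other tool.

```python
# Hypothetical component keys mirroring the checklist above.
REQUIRED_COMPONENTS = (
    "repository",
    "ci_cd_pipeline",
    "k8s_namespace",
    "monitoring_dashboard",
    "alerting_rules",
    "dns_entry",
    "tls_certificate",
    "catalog_entry",
    "docs_template",
)
# The database is only expected when the developer asked for one.
OPTIONAL_COMPONENTS = ("database",)


def missing_components(provisioned, requested_optional=()):
    """Return expected checklist items that were not provisioned, in checklist order."""
    expected = list(REQUIRED_COMPONENTS) + [
        c for c in OPTIONAL_COMPONENTS if c in requested_optional
    ]
    return [c for c in expected if c not in provisioned]


if __name__ == "__main__":
    provisioned = {"repository", "ci_cd_pipeline", "k8s_namespace", "dns_entry"}
    print(missing_components(provisioned, requested_optional={"database"}))
```

A check like this could run as the last step of a template, failing the run if anything on the list is absent.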

Service Template (Backstage Scaffolder)

# backstage/templates/node-microservice/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: node-microservice
  title: Node.js Microservice
  description: Production-ready Node.js service with observability, CI/CD, and database
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Details
      required: [name, description, owner]
      properties:
        name:
          type: string
          description: Service name (lowercase, hyphens)
          pattern: '^[a-z][a-z0-9-]*$'
        description:
          type: string
        owner:
          type: string
          ui:field: OwnerPicker
    - title: Infrastructure
      properties:
        database:
          type: string
          enum: [postgres, none]
          default: postgres
        cache:
          type: string
          enum: [redis, none]
          default: none
        tier:
          type: string
          enum: [standard, high-availability]
          default: standard
  steps:
    - id: scaffold
      name: Generate Code
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
    - id: publish
      name: Create Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=yourorg&repo=${{ parameters.name }}
        defaultBranch: main
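    - id: deploy
      name: Provision Infrastructure
      # custom:terraform-apply is not a built-in scaffolder action; it must be
      # implemented and registered in your Backstage backend.
      action: custom:terraform-apply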
      input:
        template: microservice
        vars:
          name: ${{ parameters.name }}
          database: ${{ parameters.database }}
          cache: ${{ parameters.cache }}
          tier: ${{ parameters.tier }}
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
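The parameter rules in the template can be exercised outside Backstage before a template ships. The following is a minimal sketch using only the Python standard library. It reimplements the `pattern`, `enum`, and `default` rules by hand and is not how the Backstage scaffolder itself validates input; `resolve_parameters` is a hypothetical helper name.

```python
import re

# Same pattern as the `name` property in the template.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")

# (allowed values, default) copied from the Infrastructure step.
INFRA_OPTIONS = {
    "database": (("postgres", "none"), "postgres"),
    "cache": (("redis", "none"), "none"),
    "tier": (("standard", "high-availability"), "standard"),
}


def resolve_parameters(params):
    """Validate user input and fill in defaults; raise ValueError on bad input."""
    for field in ("name", "description", "owner"):
        if not params.get(field):
            raise ValueError(f"missing required parameter: {field}")
    if not NAME_PATTERN.fullmatch(params["name"]):
        raise ValueError(f"invalid service name: {params['name']!r}")

    resolved = dict(params)
    for field, (allowed, default) in INFRA_OPTIONS.items():
        value = resolved.setdefault(field, default)
        if value not in allowed:
            raise ValueError(f"{field} must be one of {allowed}, got {value!r}")
    return resolved


if __name__ == "__main__":
    print(resolve_parameters(
        {"name": "order-api", "description": "Order service", "owner": "team-orders"}
    ))
```

Note that the pattern as written accepts a trailing hyphen (for example `order-`). If the service name is reused as a Kubernetes namespace, which must end in an alphanumeric character, a stricter pattern may be worth considering.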

Golden Path Catalog

Template	Target Audience	Provisions
Node.js Microservice	Backend developers	Repo, CI/CD, K8s, PostgreSQL, Grafana
Python Data Pipeline	Data engineers	Repo, Airflow DAG, S3 bucket, monitoring
React Frontend	Frontend developers	Repo, CI/CD, CDN, feature flags
Event Consumer	Backend developers	Repo, Kafka consumer group, DLQ, alerts
Batch Job	All	Repo, CronJob, logging, error alerting

Step 2: Self-Service Infrastructure

The developer experience should feel like a consumer product — simple, fast, and predictable.

# Developer experience — what the CLI looks like
$ platform create service --name order-api --database postgres --cache redis

Creating service 'order-api'...
✅ GitHub repo created: github.com/org/order-api
✅ CI/CD pipeline configured (GitHub Actions)
✅ Kubernetes namespace: order-api-dev
✅ PostgreSQL database provisioned (order-api-db)
✅ Redis cache provisioned (order-api-redis)
✅ Grafana dashboard: https://grafana.internal/d/order-api
✅ PagerDuty service created with default alert rules
✅ DNS: order-api.internal.company.com
✅ Catalog entry: https://backstage.internal/catalog/order-api

Service ready! Run `git clone` and start coding.
Total time: 4 minutes
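A command like the one above could be wired up with a thin argument parser that turns flags into an ordered provisioning plan. This is a sketch under assumptions: the `platform` CLI is this article's illustration rather than a published tool, the step names and the `build_plan` helper are hypothetical, and the flag defaults are borrowed from the Backstage template. Nothing here calls real infrastructure.

```python
import argparse


def build_parser():
    """Parser for the hypothetical `platform create service` command."""
    parser = argparse.ArgumentParser(prog="platform")
    commands = parser.add_subparsers(dest="command", required=True)
    create = commands.add_parser("create")
    resources = create.add_subparsers(dest="resource", required=True)
    service = resources.add_parser("service")
    service.add_argument("--name", required=True)
    service.add_argument("--database", choices=["postgres", "none"], default="postgres")
    service.add_argument("--cache", choices=["redis", "none"], default="none")
    return parser


def build_plan(args):
    """Translate parsed flags into an ordered list of (step, target) pairs."""
    plan = [
        ("github_repo", args.name),
        ("ci_cd_pipeline", args.name),
        ("k8s_namespace", f"{args.name}-dev"),
    ]
    # Optional resources are only added when requested.
    if args.database != "none":
        plan.append((args.database, f"{args.name}-db"))
    if args.cache != "none":
        plan.append((args.cache, f"{args.name}-redis"))
    plan += [
        ("grafana_dashboard", args.name),
        ("alerting", args.name),
        ("dns", f"{args.name}.internal.company.com"),
        ("catalog_entry", args.name),
    ]
    return plan


def main(argv=None):
    # argv=None makes argparse read sys.argv, as a real CLI would.
    args = build_parser().parse_args(argv)
    for step, target in build_plan(args):
        print(f"{step}: {target}")


if __name__ == "__main__":
    # Demo invocation mirroring the transcript above.
    main(["create", "service", "--name", "order-api",
          "--database", "postgres", "--cache", "redis"])
```

Keeping the plan as plain data, separate from execution, makes it easy to add a dry-run mode and to test the flag handling without touching any cloud account.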

Self-Service Capabilities to Prioritize

Build self-service capabilities in this order (highest developer pain points first):

Priority	Capability	Developer Pain Eliminated
1	New service creation	Multi-day ticket process → minutes
2	Development environments	“It works on my machine” → consistent parity
3	Database provisioning	DBA bottleneck → self-service with guardrails
4	Secret management	Manual secret distribution → Vault integration
5	Feature flags	Full deploys for toggles → instant flag flips
6	Temporary elevated access	Permanent broad permissions → JIT access

Step 3: Platform Team Structure

Role	Responsibility	Count (per 100 devs)
Platform Lead	Strategy, roadmap, stakeholder management, budget	1
Platform Engineer	IDP development, tooling, automation, IaC	2-3
SRE	Reliability, incident response, observability	1-2
DevX Engineer	Developer experience, documentation, onboarding	1
Total		5-7 (5-7% of engineering)

Team Principles

Treat developers as customers — Run quarterly satisfaction surveys, track NPS, hold office hours
Build products, not projects — The platform is a long-lived product with a roadmap, not a series of one-off scripts
Paved roads, not walls — Guide developers with golden paths, but allow departures when justified. Don’t block; make the right choice the easy choice
Measure everything — Time to first deploy, developer satisfaction, incident MTTR, platform adoption rate
Start small, iterate — Launch with 1-2 golden paths. Expand based on demand, not speculation
Documentation is product — If developers can’t find how to use it, it doesn’t exist

Platform Maturity Levels

Level	Capabilities	Time to Build
Level 1: Foundation	CI/CD templates, basic monitoring, Terraform modules	1-3 months
Level 2: Self-Service	Service templates (Backstage), self-service DBs, secrets management	3-6 months
Level 3: Product	Developer portal, cost visibility, compliance automation, SLO tracking	6-12 months
Level 4: Autonomous	AI-assisted operations, automated remediation, predictive scaling	12+ months

Step 4: Measure Platform Success

Metric	Target	How to Measure	Why It Matters
Time to first deploy (new service)	< 30 minutes	Track from `platform create` to first prod deploy	Measures platform’s core value proposition
Developer satisfaction (NPS)	> 30	Quarterly survey (0-10 scale)	Leading indicator of adoption
Platform adoption	> 80% of services	Services registered in the catalog / total services in production	Low adoption = platform not solving real problems
Incident MTTR	< 1 hour	Incident tracking system	Built-in observability reduces MTTR
Change failure rate	< 5%	Deploys that cause incidents / total deploys	Golden paths should produce reliable services
Deployment frequency	Daily per team	CI/CD metrics	Friction removal enables faster shipping
Self-service ratio	> 90% of provisioning	Self-service events / (self-service + tickets)	Measures ticket elimination
Cost per service	Trending down	Infrastructure cost / number of services	Platform should enable efficiency
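Several of the metrics in the table reduce to simple ratios. The sketch below shows the arithmetic with invented sample numbers. The function names are hypothetical, and the NPS calculation follows the standard definition: percentage of promoters (scores of 9-10) minus percentage of detractors (scores of 0-6).

```python
def nps(scores):
    """Net Promoter Score from 0-10 survey responses, in the range -100..100."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def ratio(part, whole):
    """Fraction part/whole, or 0.0 when there is nothing to measure yet."""
    return part / whole if whole else 0.0


def platform_adoption(catalog_services, total_services):
    """Share of services that are registered in the catalog."""
    return ratio(catalog_services, total_services)


def change_failure_rate(failed_deploys, total_deploys):
    """Share of deploys that caused an incident."""
    return ratio(failed_deploys, total_deploys)


def self_service_ratio(self_service_events, tickets):
    """Share of provisioning requests handled without a ticket."""
    return ratio(self_service_events, self_service_events + tickets)


if __name__ == "__main__":
    # Sample numbers are invented for illustration only.
    print("NPS:", nps([10, 9, 9, 8, 7, 6, 3, 10, 9, 5]))
    print("Adoption:", platform_adoption(40, 50))
    print("Change failure rate:", change_failure_rate(2, 100))
    print("Self-service ratio:", self_service_ratio(90, 10))
```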

Platform Team Sizing Guide

Engineering Org Size	Platform Team Size	Focus Areas
10-30 engineers	0 (shared responsibility)	Standardize CI/CD, basic templates
30-80 engineers	2-4 platform engineers	IDP foundation, golden paths, CI/CD
80-200 engineers	5-10 platform engineers	Full IDP, self-service infrastructure, developer portal
200+ engineers	10-20+ platform engineers	Multiple platform sub-teams (infra, CI/CD, developer experience)
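The sizing table can be read as a lookup. A minimal sketch, with two assumptions stated up front: the table's ranges share their endpoints (30, 80, 200), so this version assigns each boundary value to the larger bracket, and organizations below 10 engineers are treated like the first row. The `platform_team_size` name is hypothetical.

```python
def platform_team_size(engineers):
    """Return the suggested (min, max) number of platform engineers for an org size."""
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    if engineers < 30:
        return (0, 0)    # shared responsibility, no dedicated team
    if engineers < 80:
        return (2, 4)
    if engineers < 200:
        return (5, 10)
    return (10, 20)      # the table says 10-20+, so 20 is a soft ceiling


if __name__ == "__main__":
    for size in (20, 50, 100, 400):
        low, high = platform_team_size(size)
        print(f"{size} engineers -> {low}-{high} platform engineers")
```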

Platform Engineering Anti-Patterns

Building before understanding — Build the platform for problems developers actually have, not problems you think they should have
Mandating adoption — If engineers do not voluntarily use your platform, it is not solving their problems
Over-engineering — Start with CLI tools and templates, not a full Kubernetes-based IDP
No product mindset — Treat the platform as a product. Run user research, measure adoption, iterate based on feedback
Ignoring existing tools — Do not rebuild what GitHub Actions, Argo CD, or Backstage already provide

Platform Engineering Checklist

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For DevOps maturity assessments, visit garnetgrid.com.
:::