Platform Engineering: Building Your Internal Developer Platform
Build an IDP that accelerates developer productivity. Covers golden paths, service catalogs, self-service infrastructure, and platform team structure.
Platform engineering reduces cognitive load on developers by abstracting infrastructure complexity behind self-service interfaces. Instead of filing tickets and waiting days for a database, developers run a CLI command and get a fully provisioned service in minutes — complete with CI/CD, monitoring, and DNS.
As a rough rule of thumb, a 100-developer organization with an effective platform team can ship 30-50% faster than one without, although the actual gain depends on how much friction existed beforehand. The key insight is that platform engineering is not about building infrastructure; it is about building products for developers. Treat your fellow engineers as customers, measure their satisfaction, and iterate like a product team.
The Internal Developer Platform (IDP)
An IDP is the set of tools, workflows, and self-service capabilities that abstract away infrastructure complexity.
```
Developers              Internal Developer Platform
┌──────────────────┐    ┌──────────────────────────────────┐
│ Self-Service UI, │───▶│ Service Catalog (Backstage/Port) │
│ CLI, or PR       │    │                                  │
└──────────────────┘    │ ┌──────────────────────────────┐ │
                        │ │ Golden Paths (Templates)     │ │
                        │ │ - Node.js microservice       │ │
                        │ │ - Python data pipeline       │ │
                        │ │ - React frontend             │ │
                        │ └──────────────────────────────┘ │
                        │ ┌──────────────────────────────┐ │
                        │ │ Infrastructure Orchestration │ │
                        │ │ (Terraform, Crossplane, K8s) │ │
                        │ └──────────────────────────────┘ │
                        │ ┌──────────────────────────────┐ │
                        │ │ Observability (Built-in)     │ │
                        │ │ Traces, Metrics, Logs, Alerts│ │
                        │ └──────────────────────────────┘ │
                        └──────────────────────────────────┘
```
IDP vs Traditional Ops
| Dimension | Traditional Ops (Ticket-Based) | Platform Engineering (Self-Service) |
|---|---|---|
| New service deployment | 2-5 days (ticket → approval → manual setup) | 15-30 minutes (self-service) |
| Database provisioning | 1-3 days | 5 minutes |
| Environment creation | Days | Minutes |
| Monitoring setup | Manual per service | Automatic with golden paths |
| Developer satisfaction | Low (waiting, blocked) | High (autonomous, fast) |
Step 1: Define Golden Paths
Golden paths are opinionated, pre-paved roads for common developer tasks. They encode your organization’s best practices into templates that produce production-ready services out of the box.
What a Golden Path Includes
A golden path template should provision everything a developer needs to go from zero to production:
✅ Source code repository (from skeleton template)
✅ CI/CD pipeline (GitHub Actions / Azure Pipelines)
✅ Kubernetes namespace with RBAC
✅ Database (if requested)
✅ Monitoring dashboard (Grafana / Datadog)
✅ Alerting rules (error rate, latency, CPU)
✅ DNS entry (service.internal.company.com)
✅ TLS certificate (auto-renewed)
✅ Service catalog entry (Backstage)
✅ Documentation template (README, ADR template)
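One way to keep templates honest against this list is a small completeness check run in the platform's own CI. The sketch below is illustrative only: the component identifiers are hypothetical names chosen to mirror the checklist, and the database is treated as optional because it is provisioned only on request.

```python
# Hypothetical component identifiers mirroring the checklist above.
REQUIRED_COMPONENTS = frozenset({
    "repository",
    "ci_cd_pipeline",
    "k8s_namespace",
    "monitoring_dashboard",
    "alerting_rules",
    "dns_entry",
    "tls_certificate",
    "catalog_entry",
    "documentation",
})
# The database is optional: it is provisioned only when requested.
OPTIONAL_COMPONENTS = frozenset({"database"})


def missing_components(provisioned: set[str]) -> list[str]:
    """Return required components the template does not provision, sorted."""
    return sorted(REQUIRED_COMPONENTS - provisioned)


def unknown_components(provisioned: set[str]) -> list[str]:
    """Return components that are neither required nor optional (likely typos)."""
    return sorted(provisioned - REQUIRED_COMPONENTS - OPTIONAL_COMPONENTS)
```

For a draft template that only declares `{"repository", "ci_cd_pipeline", "database"}`, `missing_components` lists the seven remaining required items, which makes gaps visible before the template reaches developers.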
Service Template (Backstage Scaffolder)
```yaml
# backstage/templates/node-microservice/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: node-microservice
  title: Node.js Microservice
  description: Production-ready Node.js service with observability, CI/CD, and database
spec:
  owner: platform-team
  type: service

  # Form fields shown to the developer in the Backstage UI
  parameters:
    - title: Service Details
      required: [name, description, owner]
      properties:
        name:
          type: string
          description: Service name (lowercase, hyphens)
          pattern: '^[a-z][a-z0-9-]*$'
        description:
          type: string
        owner:
          type: string
          ui:field: OwnerPicker
    - title: Infrastructure
      properties:
        database:
          type: string
          enum: [postgres, none]
          default: postgres
        cache:
          type: string
          enum: [redis, none]
          default: none
        tier:
          type: string
          enum: [standard, high-availability]
          default: standard

  steps:
    # Render the skeleton directory with the submitted values
    - id: scaffold
      name: Generate Code
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}  # lets the skeleton's catalog-info.yaml set the owner
          database: ${{ parameters.database }}

    - id: publish
      name: Create Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=yourorg&repo=${{ parameters.name }}
        defaultBranch: main

    # custom:terraform-apply is not a built-in scaffolder action;
    # it must be implemented and registered in your Backstage backend
    - id: deploy
      name: Provision Infrastructure
      action: custom:terraform-apply
      input:
        template: microservice
        vars:
          name: ${{ parameters.name }}
          database: ${{ parameters.database }}
          cache: ${{ parameters.cache }}
          tier: ${{ parameters.tier }}

    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```
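The same parameter rules can be enforced outside Backstage, for example in a CLI or a pre-merge check, so that every entry point rejects bad input the same way. The following is a minimal sketch, not Backstage's own validation: it copies the name regex, enums, defaults, and required fields from the template above, and the function name `validate_params` is hypothetical.

```python
import re

# Same regex, enums, defaults, and required fields as the template above.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]*$")
ENUMS = {
    "database": ("postgres", "none"),
    "cache": ("redis", "none"),
    "tier": ("standard", "high-availability"),
}
DEFAULTS = {"database": "postgres", "cache": "none", "tier": "standard"}
REQUIRED = ("name", "description", "owner")


def validate_params(params: dict[str, str]) -> list[str]:
    """Return human-readable errors; an empty list means the input is valid."""
    merged = {**DEFAULTS, **params}  # explicit values override defaults
    errors = [
        f"missing required field: {field}"
        for field in REQUIRED
        if not merged.get(field)
    ]
    name = merged.get("name", "")
    if name and not NAME_PATTERN.fullmatch(name):
        errors.append(f"invalid name: {name!r}")
    for field, allowed in ENUMS.items():
        if merged[field] not in allowed:
            errors.append(f"invalid {field}: {merged[field]!r}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a form or CLI show every problem at once, which matters when the goal is a fast, low-friction experience.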
Golden Path Catalog
| Template | Target Audience | Provisions |
|---|---|---|
| Node.js Microservice | Backend developers | Repo, CI/CD, K8s, PostgreSQL, Grafana |
| Python Data Pipeline | Data engineers | Repo, Airflow DAG, S3 bucket, monitoring |
| React Frontend | Frontend developers | Repo, CI/CD, CDN, feature flags |
| Event Consumer | Backend developers | Repo, Kafka consumer group, DLQ, alerts |
| Batch Job | All | Repo, CronJob, logging, error alerting |
Step 2: Self-Service Infrastructure
The developer experience should feel like a consumer product — simple, fast, and predictable.
```bash
# Developer experience: what the CLI looks like
$ platform create service --name order-api --database postgres --cache redis

Creating service 'order-api'...
✅ GitHub repo created: github.com/org/order-api
✅ CI/CD pipeline configured (GitHub Actions)
✅ Kubernetes namespace: order-api-dev
✅ PostgreSQL database provisioned (order-api-db)
✅ Redis cache provisioned (order-api-redis)
✅ Grafana dashboard: https://grafana.internal/d/order-api
✅ PagerDuty service created with default alert rules
✅ DNS: order-api.internal.company.com
✅ Catalog entry: https://backstage.internal/catalog/order-api

Service ready! Run `git clone` and start coding.
Total time: 4 minutes
```
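A CLI like this can start life as a thin argument parser that turns flags into a provisioning plan. The sketch below uses Python's standard `argparse` and only prints the plan; it provisions nothing. The `platform` command, its flags, and the resource naming (`<name>-dev`, `<name>-db`, `<name>-redis`) are taken from the transcript above, while the function names and the defaults for omitted flags are assumptions.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Parser for `platform create service ...` (flags from the transcript)."""
    parser = argparse.ArgumentParser(prog="platform")
    commands = parser.add_subparsers(dest="command", required=True)
    create = commands.add_parser("create", help="create a platform resource")
    resources = create.add_subparsers(dest="resource", required=True)
    service = resources.add_parser("service", help="create a new service")
    service.add_argument("--name", required=True)
    # Defaults below are assumptions for when a flag is omitted.
    service.add_argument("--database", choices=["postgres", "none"], default="postgres")
    service.add_argument("--cache", choices=["redis", "none"], default="none")
    return parser


def provisioning_plan(args: argparse.Namespace) -> list[str]:
    """List the steps a real implementation would hand to its orchestrator."""
    steps = [
        f"create GitHub repo {args.name}",
        "configure CI/CD pipeline",
        f"create Kubernetes namespace {args.name}-dev",
    ]
    if args.database == "postgres":
        steps.append(f"provision PostgreSQL database {args.name}-db")
    if args.cache == "redis":
        steps.append(f"provision Redis cache {args.name}-redis")
    steps += [
        "create monitoring dashboard and default alert rules",
        f"create DNS entry {args.name}.internal.company.com",
        "register service in catalog",
    ]
    return steps


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Print the plan; pass argv explicitly in tests, or None to use sys.argv."""
    args = build_parser().parse_args(argv)
    for step in provisioning_plan(args):
        print(f"[plan] {step}")
    return 0
```

Calling `main(["create", "service", "--name", "order-api", "--cache", "redis"])` prints eight planned steps. Separating the plan from its execution keeps the CLI easy to test and leaves room for a dry-run mode.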
Self-Service Capabilities to Prioritize
Build self-service capabilities in this order (highest developer pain points first):
| Priority | Capability | Developer Pain Eliminated |
|---|---|---|
| 1 | New service creation | Multi-day ticket process → minutes |
| 2 | Development environments | “It works on my machine” → consistent environment parity |
| 3 | Database provisioning | DBA bottleneck → self-service with guardrails |
| 4 | Secret management | Manual secret distribution → Vault integration |
| 5 | Feature flags | Full deploys for toggles → instant flag flips |
| 6 | Temporary elevated access | Permanent broad permissions → JIT access |
Step 3: Platform Team Structure
| Role | Responsibility | Count (per 100 devs) |
|---|---|---|
| Platform Lead | Strategy, roadmap, stakeholder management, budget | 1 |
| Platform Engineer | IDP development, tooling, automation, IaC | 2-3 |
| SRE | Reliability, incident response, observability | 1-2 |
| DevX Engineer | Developer experience, documentation, onboarding | 1 |
| Total | | 5-7 (5-7% of engineering) |
Team Principles
- Treat developers as customers — Run quarterly satisfaction surveys, track NPS, hold office hours
- Build products, not projects — The platform is a long-lived product with a roadmap, not a series of one-off scripts
- Paved roads, not walls — Guide developers with golden paths, but allow departures when justified. Don’t block; make the right choice the easy choice
- Measure everything — Time to first deploy, developer satisfaction, incident MTTR, platform adoption rate
- Start small, iterate — Launch with 1-2 golden paths. Expand based on demand, not speculation
- Documentation is product — If developers can’t find how to use it, it doesn’t exist
Platform Maturity Levels
| Level | Capabilities | Time to Build |
|---|---|---|
| Level 1: Foundation | CI/CD templates, basic monitoring, Terraform modules | 1-3 months |
| Level 2: Self-Service | Service templates (Backstage), self-service DBs, secrets management | 3-6 months |
| Level 3: Product | Developer portal, cost visibility, compliance automation, SLO tracking | 6-12 months |
| Level 4: Autonomous | AI-assisted operations, automated remediation, predictive scaling | 12+ months |
Step 4: Measure Platform Success
| Metric | Target | How to Measure | Why It Matters |
|---|---|---|---|
| Time to first deploy (new service) | < 30 minutes | Elapsed time from running `platform create` to the first production deploy | Measures the platform’s core value proposition |
| Developer satisfaction (NPS) | > 30 | Quarterly survey (0-10 scale) | Leading indicator of adoption |
| Platform adoption | > 80% of services | Services registered in the catalog / total services in production | Low adoption = platform not solving real problems |
| Incident MTTR | < 1 hour | Incident tracking system | Built-in observability reduces MTTR |
| Change failure rate | < 5% | Deploys that cause incidents / total deploys | Golden paths should produce reliable services |
| Deployment frequency | Daily per team | CI/CD metrics | Friction removal enables faster shipping |
| Self-service ratio | > 90% of provisioning | Self-service events / (self-service + tickets) | Measures ticket elimination |
| Cost per service | Trending down | Infrastructure cost / number of services | Platform should enable efficiency |
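Several of these metrics reduce to simple arithmetic once the raw counts are available. The sketch below shows one way to compute them and compare the results with the targets in the table. It assumes the standard NPS definition (responses on a 0-10 scale, promoters score 9-10, detractors score 0-6); the function names are hypothetical, and collecting the inputs from your survey tool, CI/CD system, and ticket tracker is left out.

```python
from datetime import datetime, timedelta


def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def ratio(numerator: int, denominator: int) -> float:
    """Fraction that returns 0.0 when there is no activity to measure."""
    return numerator / denominator if denominator else 0.0


def self_service_ratio(self_service_events: int, tickets: int) -> float:
    """Self-service events / (self-service events + tickets)."""
    return ratio(self_service_events, self_service_events + tickets)


def change_failure_rate(incident_deploys: int, total_deploys: int) -> float:
    """Deploys that caused incidents / total deploys."""
    return ratio(incident_deploys, total_deploys)


def time_to_first_deploy(created_at: datetime, first_prod_deploy_at: datetime) -> timedelta:
    """Elapsed time between service creation and its first production deploy."""
    return first_prod_deploy_at - created_at


def scorecard(nps_score: float, adoption: float, cfr: float,
              self_service: float, ttfd: timedelta) -> dict[str, bool]:
    """Compare measured values with the targets from the table above."""
    return {
        "nps": nps_score > 30,
        "adoption": adoption > 0.80,
        "change_failure_rate": cfr < 0.05,
        "self_service_ratio": self_service > 0.90,
        "time_to_first_deploy": ttfd < timedelta(minutes=30),
    }
```

For example, six survey responses of 10, 10, 9, 8, 7, and 6 contain three promoters and one detractor, so the NPS is 100 × (3 − 1) / 6 ≈ 33, just above the target of 30.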
Platform Team Sizing Guide
| Engineering Org Size | Platform Team Size | Focus Areas |
|---|---|---|
| 10-30 engineers | 0 (shared responsibility) | Standardize CI/CD, basic templates |
| 30-80 engineers | 2-4 platform engineers | IDP foundation, golden paths, CI/CD |
| 80-200 engineers | 5-10 platform engineers | Full IDP, self-service infrastructure, developer portal |
| 200+ engineers | 10-20+ platform engineers | Multiple platform sub-teams (infra, CI/CD, developer experience) |
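The sizing table can be encoded as a lookup for use in planning spreadsheets or headcount scripts. This is a sketch under one stated assumption: the table's bands share their boundary values (30, 80, and 200 each appear in two rows), so the code assigns each boundary to the larger band. The open-ended "10-20+" range is represented with `None` as the upper bound, and organizations below 10 engineers are treated like the smallest band.

```python
from typing import Optional

# (minimum org size, (min team size, max team size or None if open-ended), focus)
# Ordered from largest to smallest so the first matching threshold wins.
SIZING_BANDS = [
    (200, (10, None), "Multiple platform sub-teams (infra, CI/CD, developer experience)"),
    (80, (5, 10), "Full IDP, self-service infrastructure, developer portal"),
    (30, (2, 4), "IDP foundation, golden paths, CI/CD"),
    (0, (0, 0), "Standardize CI/CD, basic templates"),
]


def recommend_team_size(engineers: int) -> tuple[tuple[int, Optional[int]], str]:
    """Return ((min, max), focus) for an engineering org of the given size."""
    if engineers < 0:
        raise ValueError("engineers must be non-negative")
    for threshold, size, focus in SIZING_BANDS:
        if engineers >= threshold:
            return size, focus
    raise AssertionError("unreachable: the last band starts at 0")
```

`recommend_team_size(100)` returns `((5, 10), "Full IDP, self-service infrastructure, developer portal")`, which is consistent with the 5-7 people per 100 developers suggested in the team structure table.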
Platform Engineering Anti-Patterns
- Building before understanding — Build the platform for problems developers actually have, not problems you think they should have
- Mandating adoption — If engineers do not voluntarily use your platform, it is not solving their problems
- Over-engineering — Start with CLI tools and templates, not a full Kubernetes-based IDP
- No product mindset — Treat the platform as a product. Run user research, measure adoption, iterate based on feedback
- Ignoring existing tools — Do not rebuild what GitHub Actions, Argo CD, or Backstage already provide
Platform Engineering Checklist
- Service catalog deployed (Backstage, Port, Cortex, or custom)
- Golden path templates for top 3 service types (backend, frontend, data)
- Self-service database provisioning with guardrails
- Self-service environment creation (dev, staging, production)
- Built-in observability for all golden path services (traces, metrics, logs)
- Automated CI/CD pipelines generated from templates
- Developer documentation site with search and API reference
- Platform NPS tracked quarterly with action items
- Time-to-first-deploy measured and < 30 minutes
- Platform team staffed at 5-7% of engineering headcount
- Cost visibility per service/team implemented
- Office hours and support channel active for developer questions
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For DevOps maturity assessments, visit garnetgrid.com.
:::