Internal Developer Platforms: Building the Self-Service Layer Your Engineers Actually Want
Design and build an Internal Developer Platform (IDP) that eliminates toil. Covers golden paths, self-service infrastructure, developer portals, and the organizational patterns that make platforms succeed.
Platform engineering exists because DevOps made a promise it could not keep: that every developer would become an infrastructure expert. They will not. They should not have to. Your frontend engineer should not need to understand Kubernetes networking to deploy a React application, and your data scientist should not need to write Terraform to spin up a GPU instance.
An Internal Developer Platform (IDP) is the self-service layer that abstracts infrastructure complexity behind simple, opinionated interfaces. Done right, it gives developers autonomy without chaos. Done wrong, it becomes another layer of bureaucracy that nobody uses, built by a team that nobody asked for.
This guide covers how to build one that people actually use.
The Platform Maturity Model
Most organizations think they need a platform. Most organizations are wrong. Here is how to tell where yours stands:
Level 0: Wild West
- Every team provisions their own infrastructure.
- 10 teams = 10 different ways to deploy.
- "Works on my machine" is a deployment strategy.
Level 1: Shared Scripts
- A platform team maintains shared CI/CD pipelines.
- Teams can deploy, but configuration is tribal knowledge.
- The "platform" is a Confluence page nobody reads.
Level 2: Golden Paths
- Opinionated templates for common workloads.
- Self-service for 80% of use cases.
- Escape hatches for the other 20%.
Level 3: Full IDP
- Developer portal with service catalog.
- Self-service databases, queues, storage.
- Infrastructure as product with SLOs.
Level 4: Platform as Competitive Advantage
- Platform enables capabilities competitors cannot match.
- New services launch in hours, not weeks.
- Developer satisfaction is a tracked metric.
| Level | Engineering Org Size | Platform/Infra Engineers | Developer Wait Time for Infrastructure |
|---|---|---|---|
| 0 | Any | 0 (everyone does it) | Variable (minutes to weeks) |
| 1 | 20-50 | 1-3 | Days |
| 2 | 50-200 | 3-8 | Hours |
| 3 | 200-1000 | 8-20 | Minutes |
| 4 | 1000+ | 20+ (but the ratio to developers improves) | Near zero (fully self-service) |
The most important insight: aim for Level 2 first. Do not try to build a Level 3 platform with a Level 0 organization. Golden paths typically deliver most of the value (the 80/20 rule) for a fraction of the effort of a full IDP.
Golden Paths: The Heart of Platform Engineering
A golden path is an opinionated, pre-configured way to accomplish a common task. It is not a mandate — developers can go off-path — but the golden path should be so good that going off-path feels like unnecessary work.
Example Golden Path: Deploy a New Microservice
```yaml
# service-template.yaml: everything a developer needs to provide
apiVersion: platform.garnet.io/v1
kind: ServiceTemplate
metadata:
  name: payment-validator
spec:
  # Developer provides these 4 fields. Platform handles everything else.
  name: payment-validator
  team: payments
  language: python
  tier: critical  # critical | standard | experimental

# Everything below is set by the template defaults:
# ✅ Kubernetes deployment + service + ingress
# ✅ CI/CD pipeline (build, test, scan, deploy)
# ✅ Monitoring dashboards (RED metrics)
# ✅ Alerting rules (error rate, latency, saturation)
# ✅ Log aggregation pipeline
# ✅ mTLS certificates
# ✅ Resource limits and autoscaling
# ✅ Network policies
```
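The "4 fields in, everything else out" contract starts with validating those fields and merging opinionated defaults. A minimal sketch, assuming hypothetical per-tier defaults (`TIER_DEFAULTS`), a hypothetical set of languages with scaffolds, and that the service name doubles as a Kubernetes object name:

```python
import re
from dataclasses import dataclass

# Hypothetical per-tier defaults; tune these for your own workloads.
TIER_DEFAULTS = {
    "critical": {"min_replicas": 3, "max_replicas": 20, "cpu_limit": "2", "memory_limit": "2Gi"},
    "standard": {"min_replicas": 2, "max_replicas": 10, "cpu_limit": "1", "memory_limit": "1Gi"},
    "experimental": {"min_replicas": 1, "max_replicas": 2, "cpu_limit": "500m", "memory_limit": "512Mi"},
}
# Hypothetical set of languages that have a repository scaffold.
SUPPORTED_LANGUAGES = {"python", "go", "typescript", "java"}
# The name becomes a Kubernetes object name: lowercase DNS label, starts with a letter, <= 63 chars.
NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,61}[a-z0-9])?")


@dataclass(frozen=True)
class ServiceSpec:
    name: str
    team: str
    language: str
    tier: str


def validate(spec: ServiceSpec) -> list[str]:
    """Return human-readable errors; an empty list means the spec is valid."""
    errors = []
    if not NAME_PATTERN.fullmatch(spec.name):
        errors.append(f"name {spec.name!r} must be a lowercase DNS label")
    if not spec.team:
        errors.append("team is required")
    if spec.language not in SUPPORTED_LANGUAGES:
        errors.append(f"no scaffold for language {spec.language!r}")
    if spec.tier not in TIER_DEFAULTS:
        errors.append(f"tier must be one of {sorted(TIER_DEFAULTS)}")
    return errors


def expand(spec: ServiceSpec) -> dict:
    """Merge the developer-provided fields with the tier's defaults."""
    errors = validate(spec)
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "name": spec.name,
        "namespace": spec.name,
        "labels": {"app": spec.name, "team": spec.team, "tier": spec.tier},
        "scaffold": f"{spec.language}-service",
        **TIER_DEFAULTS[spec.tier],
    }
```

Collecting every error before failing lets the CLI report all problems in one pass instead of one per attempt.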
What happens when a developer submits this template:
Developer runs: platform create service --from service-template.yaml
Platform does:
1. Creates GitHub repository from language-specific scaffold
2. Provisions Kubernetes namespace with resource quotas
3. Generates CI/CD pipeline (GitHub Actions / GitLab CI)
4. Creates Datadog/Grafana dashboards
5. Configures PagerDuty escalation based on tier
6. Registers service in service catalog (Backstage)
7. Generates initial README with runbook template
8. Outputs: "Your service is running at https://payment-validator.internal"
Total time: < 5 minutes
Previous time: 2-3 days of tickets and Slack messages
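The steps above behave like a small workflow: if a later step fails, the half-created repository and namespace should not linger. A minimal sketch, assuming each step is a hypothetical pair of do/undo callables (real ones would call GitHub, Kubernetes, Backstage, and so on); completed steps are compensated in reverse order when one fails:

```python
from typing import Callable

# A step is (name, do, undo); both callables receive the service spec.
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_steps(spec: dict, steps: list[Step]) -> tuple[bool, list[str]]:
    """Run provisioning steps in order; on failure, undo completed steps in reverse."""
    completed: list[Step] = []
    log: list[str] = []
    for step in steps:
        name, do, _ = step
        try:
            do(spec)
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            # Reverse order: remove later resources before the ones they depend on.
            for done_name, _, undo in reversed(completed):
                try:
                    undo(spec)
                    log.append(f"rolled back: {done_name}")
                except Exception as undo_exc:
                    # Keep going so one bad undo does not strand the rest.
                    log.append(f"rollback failed: {done_name} ({undo_exc})")
            return False, log
        completed.append(step)
        log.append(f"done: {name}")
    return True, log
```

Returning the log instead of raising lets the CLI show developers exactly what was created and cleaned up.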
What Makes a Golden Path Successful
| Property | Good Golden Path | Bad Golden Path |
|---|---|---|
| Opinionated | Makes decisions for you | Exposes 100 config options |
| Escape hatches | Allows overrides when needed | Locks you in completely |
| Documentation | Self-documenting via templates | Requires reading a wiki |
| Maintenance | Updated by platform team | Abandoned after initial launch |
| Feedback loop | Users report friction, team iterates | “We built it, they will come” |
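The "escape hatches" row can be made concrete with an override merge. A minimal sketch, assuming template defaults are a nested dict and that the platform team publishes a hypothetical allowlist of overridable keys (`OVERRIDABLE`); listed keys are merged, anything else is rejected:

```python
import copy

# Hypothetical escape hatches: the only key paths teams may override.
OVERRIDABLE = {
    ("resources", "cpu_limit"),
    ("resources", "memory_limit"),
    ("autoscaling", "max_replicas"),
}


def apply_overrides(defaults: dict, overrides: dict, path: tuple = ()) -> dict:
    """Return a copy of defaults with allowed overrides merged in."""
    result = copy.deepcopy(defaults)  # never mutate the shared template
    for key, value in overrides.items():
        key_path = path + (key,)
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Recurse into nested sections such as "resources".
            result[key] = apply_overrides(result[key], value, key_path)
        elif key_path in OVERRIDABLE:
            result[key] = value
        else:
            raise ValueError(f"{'.'.join(key_path)} is not overridable")
    return result
```

Rejecting unknown keys keeps the path opinionated; a team that needs more than the allowlist is genuinely going off-path, and that request is useful feedback for the platform team.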
Developer Portal: The Service Catalog
A developer portal (typically built with Backstage or similar) is the single pane of glass for your entire engineering organization.
What Belongs in the Portal
```text
┌────────────────────────────────────────────────────────┐
│                    DEVELOPER PORTAL                    │
│                                                        │
│  ┌─────────────┐  ┌─────────────┐  ┌────────────────┐  │
│  │ Service     │  │ API         │  │ Documentation  │  │
│  │ Catalog     │  │ Registry    │  │ Hub            │  │
│  │             │  │             │  │                │  │
│  │ - Owner     │  │ - Endpoints │  │ - Runbooks     │  │
│  │ - SLOs      │  │ - Schemas   │  │ - ADRs         │  │
│  │ - Deps      │  │ - Versions  │  │ - Tutorials    │  │
│  │ - Health    │  │ - Auth      │  │ - On-call      │  │
│  └─────────────┘  └─────────────┘  └────────────────┘  │
│                                                        │
│  ┌─────────────┐  ┌─────────────┐  ┌────────────────┐  │
│  │ Templates   │  │ Scorecards  │  │ Cost Dashboard │  │
│  │ (Golden     │  │ (Production │  │ (Per-team      │  │
│  │ Paths)      │  │ Readiness)  │  │ cloud spend)   │  │
│  └─────────────┘  └─────────────┘  └────────────────┘  │
│                                                        │
└────────────────────────────────────────────────────────┘
```
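The Cost Dashboard panel boils down to grouping billing data by an ownership tag. A minimal sketch, assuming billing line items have already been exported as dicts with a `cost` string and a `tags` map, and that resources carry a hypothetical `team` tag; untagged spend gets its own bucket so it stays visible:

```python
from collections import defaultdict
from decimal import Decimal


def cost_by_team(line_items: list[dict], tag_key: str = "team") -> dict[str, Decimal]:
    """Sum billing line items per owning team, largest spender first."""
    totals = defaultdict(Decimal)
    for item in line_items:
        # Missing or empty tags fall into an explicit "untagged" bucket.
        team = item.get("tags", {}).get(tag_key) or "untagged"
        # Decimal avoids float rounding drift when summing currency amounts.
        totals[team] += Decimal(item["cost"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

A growing "untagged" total is itself a signal that the golden path is not applying ownership tags everywhere.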
Backstage Catalog Descriptor
```yaml
# catalog-info.yaml: lives in every service repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-validator
  description: Validates payment transactions against fraud rules
  annotations:
    github.com/project-slug: garnet/payment-validator
    pagerduty.com/service-id: P123ABC
    grafana/dashboard-selector: "app=payment-validator"
  tags:
    - python
    - payments
    - critical
  links:
    - url: https://grafana.internal/d/payment-validator
      title: Grafana Dashboard
    - url: https://runbooks.internal/payment-validator
      title: Runbook
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: payment-processing
  dependsOn:
    - component:fraud-engine
    - resource:postgres-payments
  providesApis:
    - payment-validation-api
```
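The Scorecards panel can grade entities straight from this descriptor. A minimal sketch, assuming catalog-info.yaml has already been parsed into a dict (for example with a YAML library) and using a hypothetical set of production-readiness checks built from the fields and annotations shown above:

```python
def readiness_checks(entity: dict) -> dict[str, bool]:
    """Evaluate hypothetical production-readiness checks on a parsed descriptor."""
    metadata = entity.get("metadata", {})
    spec = entity.get("spec", {})
    annotations = metadata.get("annotations", {})
    link_titles = {link.get("title", "").lower() for link in metadata.get("links", [])}
    return {
        "has_owner": bool(spec.get("owner")),
        "has_description": bool(metadata.get("description")),
        "has_oncall": "pagerduty.com/service-id" in annotations,
        "has_dashboard": "grafana/dashboard-selector" in annotations,
        "has_runbook": "runbook" in link_titles,
        "declares_dependencies": bool(spec.get("dependsOn")),
    }


def readiness_score(checks: dict[str, bool]) -> float:
    """Fraction of checks that pass, from 0.0 to 1.0."""
    return sum(checks.values()) / len(checks)
```

Returning the individual checks, not just the score, lets the portal tell a team exactly which item to fix next.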
Self-Service Infrastructure
The most impactful self-service capabilities, in order of developer demand:
Tier 1: Build These First
| Capability | Developer Experience | Platform Implementation |
|---|---|---|
| Deploy a service | platform deploy | K8s + Helm + ArgoCD |
| Create a database | platform db create --type postgres | Terraform + Cloud SQL/RDS |
| Add a secret | platform secret set KEY=value | Vault / AWS Secrets Manager |
| View logs | Portal link → Grafana/Loki | Centralized log pipeline |
| Check service health | Portal → dashboard | Prometheus + Grafana |
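The `platform` commands in this table are the platform team's own CLI, not an existing tool. A minimal sketch of how that hypothetical command surface could be parsed with Python's `argparse`; each branch only describes the backend action it would trigger (ArgoCD sync, Terraform apply, secrets write) rather than calling those systems:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="platform")
    commands = parser.add_subparsers(dest="command", required=True)

    commands.add_parser("deploy", help="deploy the service in the current repo")

    db = commands.add_parser("db").add_subparsers(dest="action", required=True)
    db_create = db.add_parser("create", help="provision a managed database")
    # Hypothetical choices; the table only names postgres.
    db_create.add_argument("--type", choices=["postgres", "mysql"], required=True)

    secret = commands.add_parser("secret").add_subparsers(dest="action", required=True)
    secret_set = secret.add_parser("set", help="store a secret")
    secret_set.add_argument("assignment", metavar="KEY=value")
    return parser


def plan(argv: list[str]) -> str:
    """Parse argv and describe the backend action; nothing is executed."""
    args = build_parser().parse_args(argv)
    if args.command == "deploy":
        return "sync the ArgoCD application for this repo"
    if args.command == "db":
        return f"apply the Terraform module for a {args.type} database"
    key, sep, _value = args.assignment.partition("=")
    if not key or not sep:
        raise ValueError("expected KEY=value")
    # Never echo the value itself: it is the secret.
    return f"write {key} to the secrets backend"
```

Keeping parsing separate from execution makes each command's intent easy to test and to show in a dry-run mode.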
Tier 2: Build These Next
| Capability | Developer Experience | Platform Implementation |
|---|---|---|
| Create a message queue | platform queue create --type kafka | Terraform + managed Kafka |
| Spin up preview environments | Automatic on PR | Namespace-per-PR + teardown |
| Run load tests | platform loadtest --rps 1000 | k6/Locust + shared infra |
| Create a cron job | platform cron create --schedule "0 * * * *" | K8s CronJob template |
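Namespace-per-PR previews need two small pieces of logic: a deterministic namespace name and a teardown rule. A minimal sketch, assuming a hypothetical `<service>-pr-<number>` naming scheme and that the set of open PR numbers comes from your Git host; Kubernetes namespace names must be DNS labels of at most 63 characters, so the service part is sanitized and truncated:

```python
import re

MAX_NAMESPACE_LEN = 63  # Kubernetes namespace names are DNS labels (<= 63 chars)
PR_SUFFIX = re.compile(r"-pr-(\d+)$")


def preview_namespace(service: str, pr_number: int) -> str:
    """Deterministic namespace for a PR preview, e.g. payment-validator-pr-42."""
    suffix = f"-pr-{pr_number}"
    base = re.sub(r"[^a-z0-9-]+", "-", service.lower()).strip("-")
    base = base[: MAX_NAMESPACE_LEN - len(suffix)].rstrip("-")
    if not base:
        raise ValueError(f"cannot derive a namespace from {service!r}")
    return base + suffix


def namespaces_to_teardown(existing: set[str], service: str, open_prs: set[int]) -> set[str]:
    """Preview namespaces for this service whose PR is no longer open."""
    stale = set()
    for ns in existing:
        match = PR_SUFFIX.search(ns)
        if not match:
            continue
        pr = int(match.group(1))
        # Recompute the expected name so other services' previews never match.
        if pr not in open_prs and ns == preview_namespace(service, pr):
            stale.add(ns)
    return stale
```

Deriving the name instead of storing it means teardown needs no extra state: any namespace the function would not produce for an open PR is fair game.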
Measuring Platform Success
A platform that nobody measures is a platform that nobody improves.
Key Metrics
| Metric | What It Measures | Target |
|---|---|---|
| Time to first deploy | How long from “I need a new service” to running in production | < 1 hour |
| Deployment frequency | How often teams deploy per day | > 1/day |
| Golden path adoption | % of services using platform templates | > 80% |
| Developer satisfaction | Quarterly survey (NPS for platform) | > 40 NPS |
| Ticket volume | Requests to platform team per week | Decreasing trend |
| Self-service ratio | % of infra requests handled without platform team | > 70% |
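Most of these metrics are plain percentages; NPS needs one more step. It is the percentage of promoters (scores 9-10) minus the percentage of detractors (0-6) on a 0-10 "how likely are you to recommend" question. A minimal sketch, with targets copied from the table and hypothetical metric names:

```python
def platform_nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores or any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be a non-empty list of integers from 0 to 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def percentage(part: int, whole: int) -> float:
    """Generic ratio helper for adoption and self-service rates."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100 * part / whole


# Targets from the table; each must be strictly exceeded.
TARGETS = {"golden_path_adoption": 80.0, "self_service_ratio": 70.0, "platform_nps": 40.0}


def against_targets(metrics: dict[str, float]) -> dict[str, bool]:
    """Map each tracked metric to whether it beats its target."""
    return {name: metrics[name] > target for name, target in TARGETS.items()}
```

Scores of 7-8 ("passives") count toward the total but neither add nor subtract, which is why NPS ranges from -100 to +100.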
The Anti-Metrics (What to Watch For)
| Anti-Pattern | Signal | Root Cause |
|---|---|---|
| Shadow IT | Teams building their own deployment pipelines | Golden paths do not cover their use case |
| Platform avoidance | Teams requesting exceptions to skip the platform | Platform adds friction instead of removing it |
| Ticket queue growth | Platform team becomes a bottleneck | Not enough self-service automation |
| Feature creep | Platform supports every edge case | No discipline about what is in scope |
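"Ticket queue growth" (and the "decreasing trend" target for ticket volume) comes down to the direction of weekly ticket counts. A minimal sketch, assuming weekly counts are already exported from your ticketing system; it fits an ordinary least-squares slope rather than comparing two arbitrary weeks, so one noisy week does not flip the signal:

```python
def weekly_trend(counts: list[float]) -> float:
    """Least-squares slope of weekly ticket counts (change in tickets per week)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    mean_x = (n - 1) / 2  # mean of week indices 0..n-1
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def ticket_queue_growing(counts: list[float], tolerance: float = 0.0) -> bool:
    """Flag the anti-pattern when tickets trend upward beyond the tolerance."""
    return weekly_trend(counts) > tolerance
```

A small positive `tolerance` avoids alerting on a flat queue with ordinary week-to-week noise.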
Implementation Checklist
- Survey developers: What are the top 5 things that waste your time? (Build for those first)
- Define 3-5 golden paths for your most common workloads
- Build a service template that deploys a working service in < 5 minutes
- Set up a service catalog (Backstage or equivalent) and register all existing services
- Implement self-service for databases and secrets (highest demand)
- Establish platform SLOs: “95% of self-service requests complete in < 10 minutes”
- Run quarterly developer satisfaction surveys and publish results
- Resist the urge to build for edge cases; the 80/20 rule applies
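The example SLO, “95% of self-service requests complete in < 10 minutes”, can be checked directly from request durations. A minimal sketch, assuming durations in seconds for one measurement window are available from the platform's own request logs (the function names are hypothetical):

```python
def slo_attainment(durations_s: list[float], threshold_s: float = 600.0) -> float:
    """Fraction of self-service requests that finished in under threshold_s seconds."""
    if not durations_s:
        raise ValueError("no requests in the window")
    within = sum(1 for d in durations_s if d < threshold_s)
    return within / len(durations_s)


def meets_slo(durations_s: list[float], target: float = 0.95, threshold_s: float = 600.0) -> bool:
    """True when the attained fraction reaches the SLO target."""
    return slo_attainment(durations_s, threshold_s) >= target
```

Publishing the attainment number alongside the pass/fail result shows how close the platform is to breaching its own SLO.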