Self-Service Infrastructure: Empowering Developers Without Losing Control
Build self-service infrastructure provisioning that gives developers speed while maintaining security, compliance, and cost controls. Covers Terraform modules, Crossplane, guardrails, approval workflows, and the balance between autonomy and governance.
The platform team’s job is to get out of the developer’s way — without getting out of the way of security, compliance, and cost control. Self-service infrastructure achieves this by providing pre-approved, guardrailed pathways for common provisioning tasks. Developers get what they need in minutes instead of days. Platform teams maintain control without becoming a ticket-processing bottleneck.
The Self-Service Spectrum
Not everything should be self-service. The spectrum looks like this:
Full Self-Service (no approval needed):
- Dev/staging environments
- Feature branch databases
- S3 buckets for development
- CI/CD pipeline modifications
Guardrailed Self-Service (automated guardrails, no human approval):
- Production microservice deployment
- Database creation within size limits
- DNS record creation
- Load balancer configuration
Assisted Self-Service (requires approval):
- Production database schema changes
- Cross-account network peering
- New AWS account creation
- Resources exceeding cost thresholds
Manual Request (platform team executes):
- Multi-region failover setup
- Security group modifications
- Compliance-sensitive infrastructure
- Vendor integrations
The goal over time is to move items up this spectrum — converting manual requests into assisted, assisted into guardrailed, guardrailed into full self-service.
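One way to make the spectrum operational is to encode it as data, so every request type resolves to a tier. The following is a minimal Python sketch under stated assumptions: the request-type keys are hypothetical names derived from the examples above, and `Tier`, `TIER_BY_REQUEST`, `tier_for`, and `needs_human` are illustrative, not part of any existing tool.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Self-service tiers, ordered from most to least autonomous."""
    FULL = 1         # no approval needed
    GUARDRAILED = 2  # automated guardrails, no human approval
    ASSISTED = 3     # requires approval
    MANUAL = 4       # platform team executes


# Hypothetical request-type keys, taken from the examples in the list above.
TIER_BY_REQUEST = {
    "dev_environment": Tier.FULL,
    "feature_branch_database": Tier.FULL,
    "production_microservice_deployment": Tier.GUARDRAILED,
    "dns_record": Tier.GUARDRAILED,
    "production_schema_change": Tier.ASSISTED,
    "new_aws_account": Tier.ASSISTED,
    "multi_region_failover": Tier.MANUAL,
    "vendor_integration": Tier.MANUAL,
}


def tier_for(request_type: str) -> Tier:
    """Unknown request types fall back to the most restrictive tier."""
    return TIER_BY_REQUEST.get(request_type, Tier.MANUAL)


def needs_human(request_type: str) -> bool:
    """True when a person must approve or execute the request."""
    return tier_for(request_type) >= Tier.ASSISTED
```

Moving an item up the spectrum then becomes a one-line change to the mapping, which can be reviewed like any other code change.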
Implementation Patterns
Terraform Module Catalog
Publish curated, hardened Terraform modules that encode best practices:
# Developer writes this
module "api_service" {
  source  = "registry.internal/platform/api-service/aws"
  version = "3.2.0"

  name        = "checkout-api"
  team        = "commerce"
  environment = "production"

  # The module handles:
  # - ECS Fargate service with proper IAM roles
  # - ALB with TLS termination
  # - CloudWatch log group with retention policy
  # - Auto-scaling configuration
  # - Security groups with least-privilege
  # - Tags for cost allocation
  # - Monitoring and alerting
}
The developer specifies what they want. The module specifies how it is built — including all the security, monitoring, and compliance details they would otherwise forget.
Crossplane (Kubernetes-Native)
Crossplane lets developers provision infrastructure using Kubernetes resources:
apiVersion: database.platform.example.com/v1alpha1
kind: PostgresDatabase
metadata:
  name: checkout-db
  namespace: commerce
spec:
  size: small  # Predefined sizes: small, medium, large
  version: "15"
  backups: daily
  environment: staging
The platform team defines CompositeResourceDefinitions (XRDs) that expose these simple specs as an API, plus Compositions that map them to complex cloud resources with all required configurations.
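The kind of expansion involved can be illustrated in plain Python. This sketch is not Crossplane code. The instance classes and storage sizes are assumed placeholders rather than values from any real platform, and `SIZE_PRESETS` and `expand_database_spec` are hypothetical names.

```python
# Assumed size presets; real values would live in the platform team's definitions.
SIZE_PRESETS = {
    "small": {"instance_class": "db.t3.small", "storage_gb": 20},
    "medium": {"instance_class": "db.m5.large", "storage_gb": 100},
    "large": {"instance_class": "db.m5.2xlarge", "storage_gb": 500},
}


def expand_database_spec(spec: dict) -> dict:
    """Expand a developer-facing spec into a fuller, hypothetical resource config."""
    size = spec["size"]
    if size not in SIZE_PRESETS:
        raise ValueError(f"size must be one of {sorted(SIZE_PRESETS)}, got {size!r}")
    preset = SIZE_PRESETS[size]
    return {
        "engine": "postgres",
        "engine_version": spec["version"],
        "instance_class": preset["instance_class"],
        "allocated_storage_gb": preset["storage_gb"],
        # Settings the developer never has to remember:
        "storage_encrypted": True,
        "publicly_accessible": False,
        "backup_schedule": spec.get("backups", "daily"),
        "tags": {"environment": spec["environment"]},
    }
```

The developer chooses among three sizes. Everything else, including encryption and network exposure, is fixed by the platform side of the mapping.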
Internal Developer Portal
A web UI that wraps the Terraform/Crossplane backend:
┌─────────────────────────────────────┐
│ Create New Service                  │
│                                     │
│ Service Name: [checkout-api        ]│
│ Team:         [Commerce ▼]          │
│ Environment:  [●Staging ○Production]│
│ Database:     [☑ PostgreSQL]        │
│ Cache:        [☑ Redis]             │
│ Queue:        [☐ SQS]               │
│                                     │
│ Estimated Cost: $142/month          │
│                                     │
│          [Create Service]           │
└─────────────────────────────────────┘
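Behind a form like this, the portal mainly translates a few selections into requests for the backend. A minimal sketch, assuming a hypothetical form payload and reusing the `PostgresDatabase` resource shown earlier. The `form_to_manifests` function, the `-db` naming rule, and the default spec values are illustrative assumptions. Cache and queue are omitted because no resource kinds for them are defined in this article.

```python
def form_to_manifests(form: dict) -> list[dict]:
    """Turn portal form input into backend resource requests (hypothetical shape)."""
    namespace = form["team"].lower()  # assumed convention: namespace = team name
    manifests = []
    if form.get("database") == "postgresql":
        manifests.append({
            "apiVersion": "database.platform.example.com/v1alpha1",
            "kind": "PostgresDatabase",
            "metadata": {
                "name": f"{form['service_name']}-db",  # assumed naming rule
                "namespace": namespace,
            },
            "spec": {
                # Opinionated defaults, so the form does not need to ask for them.
                "size": "small",
                "version": "15",
                "backups": "daily",
                "environment": form["environment"],
            },
        })
    return manifests
```

Keeping the portal this thin means the guardrails stay in the backend, where they also apply to requests that bypass the UI.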
Guardrails
Self-service without guardrails is a cost overrun and security incident waiting to happen.
Cost Guardrails
# Policy: resources over $500/month need manager approval;
# resources over $5000/month are denied until a VP approves them.
# calculate_cost, deny, require_approval, and allow are assumed to be
# helpers provided by the policy engine.
def check_cost_policy(resource):
    estimated_monthly = calculate_cost(resource)
    if estimated_monthly > 5000:
        return deny("Resources over $5000/month require VP approval")
    elif estimated_monthly > 500:
        return require_approval("manager", f"Estimated cost: ${estimated_monthly}/month")
    else:
        return allow()
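The policy above depends on helpers that are not shown. A self-contained variant that can be run and unit-tested might look like the sketch below. The `Decision` type and `evaluate_cost` function are hypothetical, and the thresholds are copied from the policy above.

```python
from dataclasses import dataclass

MANAGER_APPROVAL_THRESHOLD = 500  # USD per month
VP_APPROVAL_THRESHOLD = 5000      # USD per month


@dataclass(frozen=True)
class Decision:
    action: str        # "allow", "require_approval", or "deny"
    approver: str = ""
    reason: str = ""


def evaluate_cost(estimated_monthly: float) -> Decision:
    """Apply the two cost thresholds to an already-computed monthly estimate."""
    if estimated_monthly > VP_APPROVAL_THRESHOLD:
        return Decision(
            "deny", "vp",
            f"Resources over ${VP_APPROVAL_THRESHOLD}/month require VP approval",
        )
    if estimated_monthly > MANAGER_APPROVAL_THRESHOLD:
        return Decision(
            "require_approval", "manager",
            f"Estimated cost: ${estimated_monthly:,.2f}/month",
        )
    return Decision("allow")
```

Separating the threshold logic from cost estimation keeps the policy testable without cloud pricing data. An estimate of exactly 500 is allowed, because both comparisons are strict.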
Security Guardrails
# OPA policy: Databases must have encryption at rest
deny[msg] {
    input.resource_type == "aws_db_instance"
    not input.storage_encrypted
    msg := "Database must have encryption at rest enabled"
}

# S3 buckets must not be public
deny[msg] {
    input.resource_type == "aws_s3_bucket"
    input.acl == "public-read"
    msg := "S3 buckets must not be publicly accessible"
}
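To reason about what these two rules accept and reject, it can help to mirror them in a small Python function. This is only an illustration of the rule logic, not a replacement for OPA. The function name `security_violations` and the flat input shape are assumptions based on the fields used above.

```python
def security_violations(resource: dict) -> list[str]:
    """Return the deny messages the two rules above would produce for a resource."""
    messages = []
    resource_type = resource.get("resource_type")

    # Mirrors `not input.storage_encrypted`: a missing field counts as unencrypted.
    if resource_type == "aws_db_instance" and not resource.get("storage_encrypted", False):
        messages.append("Database must have encryption at rest enabled")

    # Mirrors the rule exactly: only the "public-read" ACL is rejected.
    if resource_type == "aws_s3_bucket" and resource.get("acl") == "public-read":
        messages.append("S3 buckets must not be publicly accessible")

    return messages
```

The mirror exposes a gap: a bucket with the `public-read-write` ACL passes the second rule as written, so a stricter policy would likely need to reject that value as well.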
Resource Limits
# Per-team resource quotas (spend limits in USD per month)
team: commerce
quotas:
  staging:
    max_instances: 10
    max_databases: 5
    max_monthly_spend: 2000
  production:
    max_instances: 20
    max_databases: 5
    max_monthly_spend: 10000
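Enforcing these quotas amounts to comparing a team's projected usage against its limits before a request is provisioned. A minimal sketch, assuming the quota file has already been loaded into a dictionary and that usage figures come from some inventory source. `QUOTAS`, `quota_violations`, and the usage keys are hypothetical names.

```python
# Quotas from the configuration above, as they might look after loading.
QUOTAS = {
    "commerce": {
        "staging": {"max_instances": 10, "max_databases": 5, "max_monthly_spend": 2000},
        "production": {"max_instances": 20, "max_databases": 5, "max_monthly_spend": 10000},
    }
}

# (usage key, quota key) pairs to compare.
_CHECKS = [
    ("instances", "max_instances"),
    ("databases", "max_databases"),
    ("monthly_spend", "max_monthly_spend"),
]


def quota_violations(team: str, environment: str, projected_usage: dict) -> list[str]:
    """List every quota the projected usage (current + requested) would exceed."""
    limits = QUOTAS[team][environment]
    violations = []
    for usage_key, limit_key in _CHECKS:
        used = projected_usage.get(usage_key, 0)
        if used > limits[limit_key]:
            violations.append(f"{usage_key}={used} exceeds {limit_key}={limits[limit_key]}")
    return violations
```

Reaching a limit exactly is allowed in this sketch; only exceeding it blocks the request.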
Environment Lifecycle
Self-service environments need automated lifecycle management:
Create: Developer creates environment via portal
Active: Environment running, costs accumulating
Warning: 14 days inactive → email owner
Hibernate: 21 days inactive → stop instances, keep data
Terminate: 30 days inactive → destroy everything
This prevents the “600 forgotten dev environments” problem.
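The lifecycle stages above reduce to a function of how long an environment has been inactive. A small sketch of that mapping, using the thresholds listed above. The function names are hypothetical, and how "last activity" is measured (deployments, traffic, logins) is an assumption left to the platform.

```python
from datetime import datetime, timezone


def days_inactive(last_activity: datetime, now: datetime) -> int:
    """Whole days elapsed since the environment was last used."""
    return (now - last_activity).days


def lifecycle_state(inactive_days: int) -> str:
    """Map inactivity to the lifecycle stage, checking the harshest stage first."""
    if inactive_days >= 30:
        return "terminate"  # destroy everything
    if inactive_days >= 21:
        return "hibernate"  # stop instances, keep data
    if inactive_days >= 14:
        return "warning"    # email owner
    return "active"


# Example: an environment last touched on 1 March, evaluated on 23 March.
last_used = datetime(2024, 3, 1, tzinfo=timezone.utc)
today = datetime(2024, 3, 23, tzinfo=timezone.utc)
state = lifecycle_state(days_inactive(last_used, today))
```

In the example, 22 days have passed, so `state` is `"hibernate"`. A scheduled job could run this check daily for every environment and act on any state change.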
Measuring Success
| Metric | Target | Why |
|---|---|---|
| Time to first deployment | < 30 minutes | From deciding to build a service to having it running |
| Platform team ticket volume | Decreasing monthly | Self-service should reduce, not increase, requests |
| Developer satisfaction score | > 4.0/5.0 | Survey developers quarterly |
| Cost per developer environment | < $X/month | Self-service should not mean unlimited spending |
| Security policy violations | 0 | Guardrails should block violations before provisioning, rather than leaving them for post-hoc audits |
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Self-service without guardrails | Cost overruns, security gaps | Policy-as-code enforcement |
| Too many options | Decision paralysis, inconsistency | Opinionated defaults, limited choices |
| No cleanup automation | Zombie environments accumulate | Automatic lifecycle policies |
| Platform team still required | Self-service in name only | Automate the provisioning itself, not just the ticket workflow |
| Ignoring developer feedback | Low adoption, shadow IT | Quarterly developer surveys, usage analytics |
Self-service infrastructure is not about giving developers root access to AWS. It is about encoding your organization’s best practices — security, cost, reliability — into reusable, discoverable, guardrailed components that developers can use without waiting for anyone.