Secrets Management Done Right: Vault, SOPS, and the Practices That Actually Work
Production secrets management with HashiCorp Vault, Mozilla SOPS, and cloud-native solutions. Covers rotation, access control, emergency procedures, and the mistakes that cause breaches.
Every secret management strategy starts with good intentions and ends with a .env file committed to Git. If that sentence made your stomach drop, you are not alone — and you are in the right place.
This is not a theoretical guide to secrets management. This is the guide you read after your third developer accidentally pushes an API key to a public repository, after your “temporary” hardcoded database password has been in production for two years, and after you realize that your secrets rotation policy is a document nobody has ever read.
The Hierarchy of Secrets Management Maturity
Most teams are at Level 1. The goal is Level 4. Be honest about where you are.
Level 0: Secrets in code or .env files committed to Git
(You will get breached. It is a matter of when.)
Level 1: Secrets in environment variables, .env files in .gitignore
(Better. But no rotation, no audit trail, no access control.)
Level 2: Cloud-native secrets (AWS Secrets Manager, Azure Key Vault)
(Good. But often manually managed, rarely rotated.)
Level 3: Centralized secrets management (Vault, Doppler)
(Strong. Dynamic secrets, audit logging, access policies.)
Level 4: Zero-standing-secrets with just-in-time access
(Elite. Secrets generated on demand, expire automatically.)
| Level | Rotation | Audit Trail | Access Control | Encryption at Rest |
|---|---|---|---|---|
| 0 | ❌ Never | ❌ None | ❌ Anyone with repo access | ❌ No |
| 1 | ❌ Manual, rare | ❌ None | ⚠️ Server access = secret access | ⚠️ Depends on host |
| 2 | ⚠️ Manual or scheduled | ✅ Cloud audit logs | ✅ IAM policies | ✅ Yes |
| 3 | ✅ Automatic | ✅ Full audit trail | ✅ Fine-grained policies | ✅ Yes |
| 4 | ✅ Per-request generation | ✅ Complete lineage | ✅ Zero-standing access | ✅ Hardware-backed |
Solution Comparison: What Fits Your Team
HashiCorp Vault
The industry standard for centralized secrets management. Powerful, complex, and usually overkill for teams under 10 engineers, but transformative for larger organizations.
When to use Vault:
- You operate in multiple clouds
- You need dynamic secrets (database credentials generated on demand)
- You have compliance requirements that mandate audit trails
- Your team has a dedicated platform/security engineer to own it
When Vault is overkill:
- Single-cloud environment with < 20 services
- Team of < 10 engineers
- No compliance mandates requiring centralized audit
# Vault policy: API team can read production database creds
path "database/creds/api-production" {
  capabilities = ["read"]
}
# Vault policy: holders of this policy cannot create new tokens
path "auth/token/create" {
  capabilities = ["deny"]
}
# Dynamic database credentials
resource "vault_database_secret_backend_role" "api_role" {
  backend = vault_mount.postgres.path
  name    = "api-production"
  db_name = vault_database_secret_backend_connection.postgres.name
  creation_statements = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
    "GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"
  ]
  default_ttl = "1h"
  max_ttl     = "24h"
}
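The key setting here is default_ttl = "1h". Every database credential Vault issues under this role expires after one hour unless its lease is renewed, and max_ttl = "24h" caps how far renewals can extend it. No more shared db_user_production password that has been the same since 2019.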
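To make the interaction between the two TTL settings concrete, here is a minimal sketch that models a lease in plain Python. The `Lease` class and `issue` function are hypothetical illustrations, not part of any Vault client library, and the renewal rule (extend from "now", capped at issue time plus max TTL) is a simplified reading of how leases behave.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Lease:
    """Hypothetical model of a dynamic-credential lease (not a Vault API object)."""
    issued_at: datetime
    expires_at: datetime
    max_ttl: timedelta

    def renew(self, now: datetime, increment: timedelta) -> bool:
        """Extend the lease by `increment` from `now`, capped at issued_at + max_ttl."""
        if now >= self.expires_at:
            return False  # already expired: the client must request new credentials
        hard_limit = self.issued_at + self.max_ttl
        self.expires_at = min(now + increment, hard_limit)
        return True


def issue(now: datetime, default_ttl: timedelta, max_ttl: timedelta) -> Lease:
    """Create a lease that expires after default_ttl unless it is renewed."""
    return Lease(issued_at=now, expires_at=now + default_ttl, max_ttl=max_ttl)


t0 = datetime(2024, 1, 1, 0, 0)
lease = issue(t0, default_ttl=timedelta(hours=1), max_ttl=timedelta(hours=24))
print(lease.expires_at)  # 2024-01-01 01:00:00
```

Under this model, a client that keeps renewing still loses the credential 24 hours after issue, so a stolen credential has a bounded lifetime either way.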
Mozilla SOPS (Secrets OPerationS)
Encrypted files committed to Git. Sounds contradictory, but it solves a real problem: how do you version-control secrets alongside the code that uses them?
When to use SOPS:
- GitOps workflows where everything must be in a repository
- Small teams that want encrypted secrets without running infrastructure
- Kubernetes deployments with ArgoCD or Flux
# Encrypt a secrets file with AWS KMS
sops --encrypt \
  --kms "arn:aws:kms:us-east-1:123456:key/abc-123" \
  --encrypted-regex "^(username|password|api_key|token)$" \
  secrets.yaml > secrets.enc.yaml
# The encrypted file is safe to commit
git add secrets.enc.yaml
git commit -m "Update API credentials"
# secrets.enc.yaml (committed to Git)
database:
  host: production-db.internal # Not encrypted (not sensitive)
  port: 5432 # Not encrypted
  password: ENC[AES256_GCM,data:abc...xyz,type:str] # Encrypted
  username: ENC[AES256_GCM,data:def...uvw,type:str] # Encrypted
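Before committing, it helps to know which values a key regex would actually select. The sketch below approximates that selection in Python: a value counts as selected when its own key or any ancestor key matches the pattern. The function name `encrypted_paths` and the sample data are hypothetical, and this is an illustration of the idea rather than SOPS's own implementation.

```python
import re
from typing import Any


def encrypted_paths(tree: dict[str, Any], pattern: str) -> list[str]:
    """Return dotted paths of leaf values whose key, or an ancestor key, matches."""
    regex = re.compile(pattern)
    found: list[str] = []

    def walk(node: Any, path: tuple[str, ...], selected: bool) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                matched = bool(regex.search(str(key)))
                walk(value, path + (str(key),), selected or matched)
        elif selected:
            found.append(".".join(path))

    walk(tree, (), False)
    return found


# Hypothetical sample document; values are placeholders, not real secrets.
sample = {
    "database": {"host": "db.example.internal", "port": 5432, "password": "placeholder"},
    "api_key": "placeholder",
}
print(encrypted_paths(sample, r"^(password|api_key|token)$"))
# ['database.password', 'api_key']
```

Any sensitive key missing from the pattern stays in plaintext, so running a check like this against the real key names is a cheap safeguard.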
Cloud-Native Secrets
If you are single-cloud, these are often the right answer. Less operational burden than Vault, better than environment variables.
| Feature | AWS Secrets Manager | Azure Key Vault | GCP Secret Manager |
|---|---|---|---|
| Automatic rotation | ✅ Lambda-based | ✅ Function-based | ⚠️ Manual + Cloud Functions |
| Versioning | ✅ Automatic | ✅ Automatic | ✅ Automatic |
| Access control | IAM policies | RBAC + Access Policies | IAM policies |
| Cross-region replication | ✅ Built-in | ⚠️ Manual | ✅ Built-in (replication policy) |
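| Cost (list prices, verify current rates) | $0.40/secret/month + $0.05/10K API calls | $0.03/10K operations | $0.06/active secret version/month + $0.03/10K access operations |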
| KMS integration | ✅ Native | ✅ Native | ✅ Native |
# AWS Secrets Manager (Python SDK, boto3)
import boto3
import json

def get_database_credentials():
    client = boto3.client('secretsmanager', region_name='us-east-1')
    response = client.get_secret_value(SecretId='production/database')
    secret = json.loads(response['SecretString'])
    return {
        'host': secret['host'],
        'port': secret['port'],
        'username': secret['username'],
        'password': secret['password'],
    }

# Never cache secrets indefinitely; refresh every 5 minutes
# This ensures rotated credentials are picked up
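The five-minute refresh described in the comment can be sketched as a small time-based cache. `CachedSecret` is a hypothetical helper, not part of boto3; it wraps any zero-argument fetch function and takes an injectable clock so the expiry logic can be tested without waiting.

```python
import time
from typing import Any, Callable, Optional


class CachedSecret:
    """Cache a fetched secret and refetch it once it is older than ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached value, refetching first if it is missing or stale."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with the function defined earlier (requires AWS credentials to run):
# db_secret = CachedSecret(get_database_credentials, ttl_seconds=300)
# creds = db_secret.get()
```

One design choice worth noting: if a connection fails with an authentication error, the application may also want to discard the cached value immediately rather than wait for the TTL to elapse.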
Secrets Rotation: The Practice Nobody Actually Does
Here is the uncomfortable truth: most teams that claim to rotate secrets do not. They have a policy document that says “secrets must be rotated every 90 days” and a reality where database passwords have not changed since the cluster was created.
Automated Rotation Architecture
┌───────────────┐     ┌──────────────┐     ┌──────────────┐
│ Rotation      │────▶│ Secrets      │────▶│ Application  │
│ Trigger       │     │ Store        │     │ (fetches     │
│ (cron/event)  │     │ (Vault/ASM)  │     │ latest)      │
└───────────────┘     └──────┬───────┘     └──────────────┘
                             │
                     ┌───────▼───────┐
                     │ Target        │
                     │ (DB/API/etc)  │
                     │ Password      │
                     │ Updated       │
                     └───────────────┘
The Dual-User Rotation Pattern
The safest rotation pattern uses two credential sets:
Time T0: Active = UserA, Standby = UserB
Time T1: Rotate UserB password → UserB has new creds
Time T2: Switch Active to UserB, Standby to UserA
Time T3: Rotate UserA password → UserA has new creds
Time T4: Both users have fresh credentials, zero downtime
This avoids the terrifying moment where you change a password and discover 47 services were using it.
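The timeline above can be modeled in a few lines. This is an in-memory sketch only: `DualUserRotator` is a hypothetical class, and a real implementation would change the password in the target database and update the secrets store where this code updates a dictionary.

```python
import secrets
from dataclasses import dataclass, field


def _new_password() -> str:
    """Generate a random password for the simulation."""
    return secrets.token_urlsafe(24)


@dataclass
class DualUserRotator:
    """Hypothetical in-memory model of two credential sets that alternate roles."""
    passwords: dict[str, str] = field(
        default_factory=lambda: {"UserA": _new_password(), "UserB": _new_password()}
    )
    active: str = "UserA"

    @property
    def standby(self) -> str:
        return "UserB" if self.active == "UserA" else "UserA"

    def rotate(self) -> str:
        """Give the standby user a new password, then promote it to active."""
        target = self.standby
        self.passwords[target] = _new_password()  # T1 / T3: rotate the standby user
        self.active = target                      # T2: switch roles
        return target


rotator = DualUserRotator()
rotator.rotate()  # UserB gets new credentials and becomes active
rotator.rotate()  # UserA gets new credentials and becomes active again
```

The password of the active user is never touched, so clients holding the current credential keep working until they pick up the new one.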
Emergency Procedures
When a Secret Is Leaked
This will happen. Not if — when. Here is your runbook:
INCIDENT: Secret Exposed
SEVERITY: P1 — Immediate response required
1. REVOKE (0-5 minutes)
- Immediately revoke/rotate the exposed credential
- Do NOT wait to assess impact first
- A false alarm costs you a deployment
- A real leak costs you a breach
2. ASSESS (5-30 minutes)
- When was the secret exposed? (Git blame, access logs)
- What systems does this secret access?
- Is there evidence of unauthorized access? (audit logs)
3. ROTATE ADJACENT (30-60 minutes)
- Rotate any secrets that were stored alongside the leaked one
- If a .env file leaked, ALL secrets in that file are compromised
4. POST-INCIDENT (24-48 hours)
- Root cause: How did the secret get exposed?
- Prevention: What control would have caught this?
- Detection: How long was the secret exposed before discovery?
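For step 3, the first practical task is listing everything that sat next to the leaked value. A minimal sketch, assuming the leaked file uses plain KEY=value lines (optionally prefixed with `export`); the function name `keys_to_rotate` and the sample content are hypothetical.

```python
def keys_to_rotate(env_text: str) -> list[str]:
    """List variable names defined in leaked .env content, in file order."""
    names: list[str] = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name and name not in names:
            names.append(name)
    return names


# Hypothetical leaked file; values are placeholders.
leaked = """
# hypothetical leaked file
DATABASE_URL=postgres://example
export STRIPE_KEY=placeholder
DEBUG=true
"""
print(keys_to_rotate(leaked))  # ['DATABASE_URL', 'STRIPE_KEY', 'DEBUG']
```

The output deliberately includes every variable, secret or not. Deciding that an entry such as DEBUG needs no rotation is a human call made during the incident, not something to filter out automatically.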
Implementation Checklist
- Assess your current secrets management maturity level (be honest)
- Audit all existing secrets: where are they stored? who has access?
- Choose a secrets management solution appropriate for your team size
- Implement automated rotation for database credentials first (highest risk)
- Set up pre-commit hooks to prevent secrets from reaching Git (gitleaks, trufflehog)
- Build an emergency rotation runbook and practice it quarterly
- Enable audit logging on all secrets access
- Eliminate all hardcoded secrets — zero exceptions, zero “temporary” workarounds
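To show what the pre-commit scanners in the checklist do at their core, here is a deliberately tiny sketch of pattern-based detection. The two rules and the `scan` function are illustrative assumptions; this is not how gitleaks or trufflehog are implemented, and it is no substitute for either.

```python
import re

# Illustrative patterns only; real scanners ship far larger, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each line that matches a pattern."""
    hits: list[tuple[int, str]] = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, regex in PATTERNS.items():
            if regex.search(line):
                hits.append((number, rule))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
staged = "region = 'us-east-1'\nkey_id = 'AKIAIOSFODNN7EXAMPLE'\n"
print(scan(staged))  # [(2, 'aws_access_key_id')]
```

A hook built on this idea would exit non-zero when `scan` returns any hits, which blocks the commit before the secret reaches history.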