GitOps in Practice: Managing Infrastructure Through Pull Requests
Implement GitOps for infrastructure and application deployment. Covers ArgoCD vs Flux, repository structures, secret management in Git, drift detection, and the operational model that makes GitOps work beyond the demo.
GitOps is the practice of using Git as the single source of truth for your infrastructure and application state. Instead of running kubectl apply from your laptop or clicking buttons in a cloud console, you commit a change to a Git repository and an automated system makes reality match the desired state.
The idea is elegant. The implementation has sharp edges that the tutorials skip: secret management, multi-environment promotion, drift detection, and what happens when the automated system makes things worse.
The GitOps Operating Model
Traditional deployment:
Developer → CI → Build → Developer runs deploy command → Production
GitOps deployment:
Developer → PR to config repo → Review → Merge
↓
GitOps operator detects change
↓
Operator applies change to cluster
↓
Operator reports status back to Git
Pull vs Push Model
| Model | How It Works | Security | Complexity |
|---|---|---|---|
| Push (CI deploys) | CI pipeline pushes changes to cluster | CI needs cluster credentials | Lower |
| Pull (GitOps operator) | Operator inside cluster polls Git for changes | No external access to cluster needed | Higher |
GitOps uses the pull model. The operator (ArgoCD, Flux) runs inside the cluster and watches a Git repository. When the repo changes, the operator applies the changes. This means your CI pipeline never needs cluster credentials — a significant security improvement.
ArgoCD vs Flux: The Honest Comparison
| Factor | ArgoCD | Flux |
|---|---|---|
| UI | Rich web UI with visualization | CLI-focused, no built-in UI |
| Multi-cluster | Built-in support | Requires additional setup |
| Learning curve | Moderate (UI helps) | Steeper (CLI/CRD-heavy) |
| Helm support | Full, with UI rendering | Full, native integration |
| Kustomize | Full support | Full, first-class support |
| RBAC | Built-in, fine-grained | Kubernetes-native RBAC |
| Notifications | Built-in (Slack, email, webhook) | Notification controller (part of the default install; configured through Provider and Alert resources) |
| Community | Larger, CNCF graduated | Active, CNCF graduated |
| Best for | Teams wanting visibility | Teams wanting lightweight GitOps |
Recommendation: Start with ArgoCD if you are new to GitOps. The UI makes it dramatically easier to understand what is happening. Choose Flux if you have strong Kubernetes expertise and prefer to manage everything declaratively through CRDs and the CLI rather than a UI.
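To make the "CRD-heavy" row concrete, here is a minimal sketch of what a Flux setup might look like: one resource that tells Flux where the repository lives and one that tells it what to apply. The resource names, intervals, repository URL, and overlay path are illustrative assumptions, and a private repository would also need a credentials reference.

```yaml
# Flux sketch: a Git source plus a reconciliation of one overlay.
# All names, intervals, and the URL are assumptions for illustration.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app-deploy
  namespace: flux-system
spec:
  interval: 1m                 # how often Flux polls Git for new commits
  url: https://github.com/company/my-app-deploy
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app-production
  namespace: flux-system
spec:
  interval: 10m                # how often Flux re-applies the manifests
  sourceRef:
    kind: GitRepository
    name: my-app-deploy
  path: ./overlays/production
  prune: true                  # delete resources that were removed from Git
  targetNamespace: production
```

Everything is expressed as Kubernetes resources, which is why Flux suits teams that already think in manifests and less so teams that want to see a sync graph in a browser.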
Repository Structure
Monorepo vs Polyrepo
| Structure | When to Use | Example |
|---|---|---|
| App repo + config repo | Most teams. Separates app code from deployment config. | my-app/ + my-app-deploy/ |
| Monorepo | Small teams, few services. Everything in one place. | platform/apps/my-app/deploy/ |
| Config-only repo | Large orgs. Centralized deployment configs for all services. | k8s-manifests/services/my-app/ |
Recommended: App Repo + Config Repo
# App repo: source code + CI
my-app/
├── src/
├── tests/
├── Dockerfile
└── .github/workflows/ci.yaml      # Build + test + push image
# Config repo: deployment manifests
my-app-deploy/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   └── kustomization.yaml
├── overlays/
│   ├── staging/
│   │   ├── kustomization.yaml     # image: my-app:staging-abc123
│   │   └── replicas-patch.yaml    # replicas: 2
│   └── production/
│       ├── kustomization.yaml     # image: my-app:v1.2.3
│       ├── replicas-patch.yaml    # replicas: 5
│       └── hpa-patch.yaml         # autoscaling config
└── README.md
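As a rough illustration of how an overlay builds on the base, the staging files might look like the sketch below. The image name `my-app` and the Deployment name `my-app` are assumptions; they have to match whatever base/deployment.yaml actually declares.

```yaml
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                 # reuse every manifest from the base
images:
  - name: my-app               # assumed image name used in base/deployment.yaml
    newTag: staging-abc123     # the only line CI needs to change per build
patches:
  - path: replicas-patch.yaml
---
# overlays/staging/replicas-patch.yaml (a separate file, shown here for brevity)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # assumed Deployment name from the base
spec:
  replicas: 2
```

Keeping the image tag in the overlay's `images` list means a promotion is a one-line diff, which keeps production pull requests easy to review.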
Environment Promotion Flow
1. CI builds image, tags as staging-<commit-sha>
2. CI updates staging overlay: image tag → staging-<commit-sha>
3. ArgoCD syncs staging cluster with staging overlay
4. Staging validated (automated tests, manual QA)
5. PR opened: promote image tag to production overlay
6. PR reviewed and merged
7. ArgoCD syncs production cluster with production overlay
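Step 2 of this flow could be sketched as a job in the app repo's CI workflow. This is an excerpt under several assumptions: the image has already been built and pushed by an earlier job, the secret `CONFIG_REPO_TOKEN` is a hypothetical token with push access to the config repo, the `kustomize` binary is available on the runner, and the image is named `my-app`.

```yaml
# .github/workflows/ci.yaml (excerpt): update the staging overlay after a build
name: ci
on:
  push:
    branches: [main]
jobs:
  update-staging:
    runs-on: ubuntu-latest
    steps:
      - name: Check out the config repo
        uses: actions/checkout@v4
        with:
          repository: company/my-app-deploy
          token: ${{ secrets.CONFIG_REPO_TOKEN }}   # hypothetical secret name
      - name: Point the staging overlay at the new image tag
        working-directory: overlays/staging
        run: kustomize edit set image "my-app=my-app:staging-${GITHUB_SHA}"
      - name: Commit and push the change
        run: |
          git config user.name "ci-bot"
          git config user.email "ci-bot@example.com"
          git commit -am "staging: deploy my-app staging-${GITHUB_SHA}"
          git push
```

The pipeline holds Git credentials only; it never talks to the cluster. Promotion to production (steps 5 and 6) stays a human-reviewed pull request that changes the tag in overlays/production/kustomization.yaml.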
Secrets in GitOps: The Hard Problem
The biggest challenge with GitOps: you cannot store plaintext secrets in Git. But if Git is the source of truth for everything, where do secrets go?
| Solution | How It Works | Complexity | Security |
|---|---|---|---|
| Sealed Secrets | Encrypt secrets with a cluster key; only the cluster can decrypt | Low | Good |
| SOPS + Age/KMS | Encrypt secret files in Git; decrypt at apply time | Medium | Very good |
| External Secrets Operator | Sync secrets from Vault/AWS SM into Kubernetes | Medium | Best — secrets never in Git |
| Vault Agent Sidecar | Inject secrets at pod runtime from Vault | High | Best — secrets never on disk |
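For comparison with the lowest-complexity row, a Sealed Secrets manifest might look roughly like this. The ciphertext is a placeholder; in practice it is generated by the `kubeseal` CLI against the controller's public key, and the resource name and namespace here are assumptions.

```yaml
# SealedSecret sketch: safe to commit, because only the in-cluster
# controller holds the private key needed to decrypt it.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: api-credentials        # assumed name
  namespace: production        # assumed namespace
spec:
  encryptedData:
    API_KEY: "<ciphertext produced by kubeseal>"   # placeholder, not a real value
  template:
    metadata:
      name: api-credentials    # name of the Kubernetes Secret the controller creates
      namespace: production
```

The trade-off is that the encrypted value still lives in Git history, so rotating the cluster key or a leaked secret means re-sealing and recommitting.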
External Secrets Operator (Recommended)
# ExternalSecret CRD: references a secret in AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: api-credentials          # Kubernetes Secret name
    creationPolicy: Owner
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: production/database   # AWS Secrets Manager key
        property: url              # Field within the secret
    - secretKey: API_KEY
      remoteRef:
        key: production/api
        property: key
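The ExternalSecret above refers to a ClusterSecretStore named `aws-secrets-manager`, which has to exist in the cluster. A minimal sketch of that store follows; the region, the service account name, and its namespace are assumptions, and the service account is assumed to be bound to an IAM role that can read the referenced secrets. Newer operator releases may serve a different API version than `v1beta1`.

```yaml
# ClusterSecretStore sketch: tells the operator how to reach AWS Secrets Manager.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager        # must match secretStoreRef.name in the ExternalSecret
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1            # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets       # hypothetical service account with an IAM role
            namespace: external-secrets  # assumed namespace
```

Both manifests contain only references, so they can be committed to the config repo without exposing any secret values.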
Drift Detection and Self-Healing
GitOps operators continuously compare the desired state (Git) with the actual state (cluster). When the two diverge, the operator can either alert or auto-correct.
# ArgoCD Application with auto-sync and self-heal
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-production
spec:
  project: default
  source:
    repoURL: https://github.com/company/my-app-deploy
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Revert manual changes to match Git
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        maxDuration: 3m
| Setting | What It Does | Risk |
|---|---|---|
| selfHeal: true | Reverts any manual change to match Git | Can undo emergency hotfixes applied with kubectl |
| prune: true | Deletes resources removed from Git | Can accidentally delete resources if Git is wrong |
| Both disabled | Alert only, no auto-correction | Drift accumulates silently |
Production recommendation: Enable selfHeal and prune for non-critical environments. For production, enable selfHeal but be cautious with prune: a misconfigured Git merge could delete production resources.
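One way to express the cautious production setting is an Application that self-heals but never deletes on its own. This is a sketch reusing the names from the manifest above; the `argocd` namespace is an assumption about where ArgoCD is installed.

```yaml
# Production variant: revert manual drift, but leave deletions to a human.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-production
  namespace: argocd              # assumed ArgoCD installation namespace
spec:
  project: default
  source:
    repoURL: https://github.com/company/my-app-deploy
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      selfHeal: true             # manual kubectl changes are reverted
      prune: false               # resources removed from Git are not deleted automatically
```

With `prune: false`, a resource removed from Git is reported as out of sync and needing pruning, and it stays in the cluster until someone prunes it deliberately.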
Implementation Checklist
- Choose ArgoCD or Flux (ArgoCD for visibility, Flux for lightweight)
- Set up a separate config repository for deployment manifests
- Structure configs with base + overlays per environment (Kustomize or Helm)
- Implement environment promotion: CI updates staging image, PR promotes to production
- Solve secrets: deploy External Secrets Operator or Sealed Secrets
- Enable drift detection and self-healing for staging; alert-only for production initially
- Configure notifications: sync failures → Slack/PagerDuty
- Document the promotion flow so every engineer knows how to deploy
- Set up RBAC: limit who can merge to the production overlay
- Run a drill: manually change a production resource and verify self-heal reverts it
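For the notifications item, ArgoCD subscriptions are typically declared as annotations on the Application. The fragment below is a sketch to merge into an existing manifest, not a complete resource. It assumes a Slack service is already configured for ArgoCD notifications, that a trigger named `on-sync-failed` is defined (for example from the standard trigger catalog), and that `deploy-alerts` is a hypothetical channel name.

```yaml
# Fragment: add this annotation to the existing Application's metadata.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-production
  annotations:
    # Format: subscribe.<trigger>.<service>: <recipient>
    notifications.argoproj.io/subscribe.on-sync-failed.slack: deploy-alerts
```

Because the subscription lives in the manifest, changes to who gets alerted go through the same pull request review as any other deployment change.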