Database Vacuum and Maintenance
Keep PostgreSQL and other databases healthy with proper maintenance. Covers autovacuum tuning, bloat detection, index maintenance, statistics collection, and the patterns that prevent database performance degradation over time.
Databases degrade without maintenance. In PostgreSQL, dead rows accumulate from updates and deletes, table bloat grows, statistics become stale, and indexes fragment. VACUUM is PostgreSQL’s garbage collector — and if it falls behind, query performance degrades, storage costs grow, and eventually transaction ID wraparound threatens data loss.
Why VACUUM Matters
PostgreSQL uses MVCC (Multi-Version Concurrency Control):
UPDATE users SET name = 'Bob' WHERE id = 1;
What actually happens:
1. Old row (name='Alice') is marked as dead (not deleted)
2. New row (name='Bob') is inserted
3. Old row remains until VACUUM cleans it up
Without VACUUM:
☐ Dead rows accumulate → table bloat
☐ Index entries point to dead rows → slower index scans
☐ Planner statistics stale → suboptimal query plans
☐ Transaction ID wraparound → database shutdown!
Table: 1 million live rows, 10 million dead rows
Result: 10x storage, 10x I/O for sequential scans
Autovacuum Tuning
-- Check autovacuum activity
SELECT schemaname, relname,
n_dead_tup, n_live_tup,
round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 1) AS dead_pct,
last_vacuum, last_autovacuum,
last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- Per-table autovacuum tuning for high-churn tables
ALTER TABLE events SET (
autovacuum_vacuum_threshold = 1000, -- Default: 50
autovacuum_vacuum_scale_factor = 0.01, -- Default: 0.2 (20%)
autovacuum_analyze_threshold = 500,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_delay = 2 -- Default: 20ms (make faster)
);
-- When autovacuum triggers:
-- threshold + scale_factor × n_live_tup
-- Default: 50 + 0.2 × 1,000,000 = 200,050 dead rows
-- Tuned: 1000 + 0.01 × 1,000,000 = 11,000 dead rows (runs 18x sooner)
Bloat Detection
-- Detect table bloat (estimated)
SELECT
schemaname || '.' || tablename AS table_name,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(
pg_total_relation_size(schemaname || '.' || tablename) -
pg_relation_size(schemaname || '.' || tablename)
) AS index_size,
round(
(1 - pg_relation_size(schemaname || '.' || tablename)::numeric /
greatest(pg_total_relation_size(schemaname || '.' || tablename), 1)) * 100,
1
) AS bloat_estimate_pct
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;
-- Fix severe bloat: pg_repack (online, no locks)
-- pg_repack --table events --no-superuser-check -d mydb
-- Alternative: VACUUM FULL (locks table, use off-peak)
-- VACUUM FULL events;
Anti-Patterns
| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Disable autovacuum | Dead rows accumulate, eventual wraparound | Never disable, tune instead |
| Default settings on high-churn tables | Autovacuum cannot keep up | Per-table autovacuum tuning |
| Never check bloat | Tables grow 10x actual size | Monthly bloat monitoring + pg_repack |
| VACUUM FULL during peak hours | Table locked, queries blocked | Use pg_repack (online) or off-peak VACUUM FULL |
| Ignore ANALYZE | Query planner uses stale statistics | Ensure autoanalyze runs, run ANALYZE after bulk loads |
Database maintenance is invisible when done well and catastrophic when neglected. A healthy autovacuum configuration is the single most important performance tuning for PostgreSQL — more impactful than any index or query optimization.