
Database Vacuum and Maintenance

Keep PostgreSQL and other databases healthy with proper maintenance. Covers autovacuum tuning, bloat detection, index maintenance, statistics collection, and the patterns that prevent database performance degradation over time.

Databases degrade without maintenance. In PostgreSQL, dead rows accumulate from updates and deletes, table bloat grows, statistics become stale, and indexes fragment. VACUUM is PostgreSQL’s garbage collector — and if it falls behind, query performance degrades, storage costs grow, and eventually transaction ID wraparound threatens data loss.


Why VACUUM Matters

PostgreSQL uses MVCC (Multi-Version Concurrency Control):
  UPDATE users SET name = 'Bob' WHERE id = 1;
  
  What actually happens:
  1. Old row (name='Alice') is marked as dead (not deleted)
  2. New row (name='Bob') is inserted
  3. Old row remains until VACUUM cleans it up
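You can watch this happen through the statistics views. A minimal sketch, assuming a `users` table exists (counters in `pg_stat_user_tables` are updated asynchronously, so allow a short delay):

```sql
-- Update a row, then observe the dead tuple left behind
UPDATE users SET name = 'Bob' WHERE id = 1;

SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'users';
-- n_dead_tup grows with each UPDATE/DELETE until VACUUM reclaims the old versions
```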

Without VACUUM:
  ☐ Dead rows accumulate → table bloat
  ☐ Index entries point to dead rows → slower index scans
  ☐ Planner statistics stale → suboptimal query plans
  ☐ Transaction ID wraparound → database shutdown!

  Table: 1 million live rows, 10 million dead rows
  Result: 10x storage, 10x I/O for sequential scans
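The wraparound risk above can be monitored directly. A quick per-database check (autovacuum forces an aggressive anti-wraparound freeze at `autovacuum_freeze_max_age`, 200 million by default; hard wraparound sits near 2 billion):

```sql
-- Transaction ID age per database; investigate long before ~2 billion
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(age(datfrozenxid) / 2000000000.0 * 100, 1) AS pct_to_wraparound
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```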

Autovacuum Tuning

-- Check autovacuum activity
SELECT schemaname, relname, 
       n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / greatest(n_live_tup, 1) * 100, 1) AS dead_pct,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

-- Per-table autovacuum tuning for high-churn tables
ALTER TABLE events SET (
    autovacuum_vacuum_threshold = 1000,        -- Default: 50
    autovacuum_vacuum_scale_factor = 0.01,     -- Default: 0.2 (20%)
    autovacuum_analyze_threshold = 500,
    autovacuum_analyze_scale_factor = 0.005,
    autovacuum_vacuum_cost_delay = 2           -- Default: 2ms since PG 12 (was 20ms before; lower it there)
);

-- When autovacuum triggers:
-- threshold + scale_factor × n_live_tup
-- Default: 50 + 0.2 × 1,000,000 = 200,050 dead rows
-- Tuned:   1000 + 0.01 × 1,000,000 = 11,000 dead rows (runs 18x sooner)
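After applying per-table overrides like the `ALTER TABLE` above, you can confirm they took effect by reading them back from the catalog (the table name `events` follows the example above):

```sql
-- Confirm the per-table autovacuum overrides are in place
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events';
```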

Bloat Detection

-- Size survey: largest tables plus their index + TOAST overhead
-- (a quick proxy; this does NOT measure bloat directly)
SELECT 
    schemaname || '.' || tablename AS table_name,
    pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
    pg_size_pretty(
        pg_total_relation_size(schemaname || '.' || tablename) - 
        pg_relation_size(schemaname || '.' || tablename)
    ) AS index_toast_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 20;

-- Accurate bloat measurement: the pgstattuple extension
-- CREATE EXTENSION pgstattuple;
-- SELECT dead_tuple_percent, free_percent FROM pgstattuple('events');

-- Fix severe bloat: pg_repack (online; only brief locks at start and end)
-- pg_repack --table events --no-superuser-check -d mydb

-- Alternative: VACUUM FULL (locks table, use off-peak)
-- VACUUM FULL events;
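Indexes bloat too, and plain VACUUM never shrinks them. Since PostgreSQL 12, `REINDEX ... CONCURRENTLY` rebuilds indexes without blocking reads or writes (the index name here is illustrative):

```sql
-- Rebuild a single bloated index online (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY events_created_at_idx;

-- Or rebuild every index on a table
REINDEX TABLE CONCURRENTLY events;
```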

Anti-Patterns

Anti-Pattern                           | Consequence                               | Fix
---------------------------------------+-------------------------------------------+-----------------------------------------------
Disable autovacuum                     | Dead rows accumulate, eventual wraparound | Never disable; tune instead
Default settings on high-churn tables  | Autovacuum cannot keep up                 | Per-table autovacuum tuning
Never check bloat                      | Tables grow to 10x actual size            | Monthly bloat monitoring + pg_repack
VACUUM FULL during peak hours          | Table locked, queries blocked             | Use pg_repack (online) or off-peak VACUUM FULL
Ignore ANALYZE                         | Query planner uses stale statistics       | Ensure autoanalyze runs; ANALYZE after bulk loads
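As the last anti-pattern suggests, bulk loads deserve an explicit ANALYZE, since autoanalyze may lag well behind a large COPY (the table and file names here are illustrative):

```sql
-- After a bulk load, refresh planner statistics immediately
COPY events FROM '/tmp/events.csv' WITH (FORMAT csv);
ANALYZE events;

-- Or combine dead-row cleanup and statistics in one pass
VACUUM (ANALYZE) events;
```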

Database maintenance is invisible when done well and catastrophic when neglected. A healthy autovacuum configuration is the single most impactful performance lever in PostgreSQL — more so than any individual index or query optimization.

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
