PostgreSQL Query Optimization

A single slow query can take down a production database. PostgreSQL processes millions of rows efficiently — if you help it. Query optimization is about understanding how PostgreSQL executes queries and giving the planner the information it needs to choose the fastest path.

EXPLAIN ANALYZE

-- Always start here: see what PostgreSQL actually does
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- What to look for:
-- Seq Scan on orders  → Missing index, scanning entire table
-- Sort                → Consider index for ORDER BY
-- Hash Join           → Large join, may need index on join key
-- actual rows=1000    → 1000 rows scanned for 50 results
-- Buffers: shared hit → Data was in cache (good)
-- Buffers: shared read → Data loaded from disk (slow)

Index Strategies

-- B-tree (default): Equality and range queries
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Composite index: Multiple columns (order matters!)
CREATE INDEX idx_orders_status_created 
ON orders(status, created_at DESC);
-- Covers: WHERE status = 'pending' ORDER BY created_at DESC

-- Partial index: Index subset of rows
CREATE INDEX idx_orders_pending 
ON orders(created_at DESC) 
WHERE status = 'pending';
-- Smaller index, faster for common queries

-- Covering index (INCLUDE): Avoids table lookup
CREATE INDEX idx_orders_pending_covering 
ON orders(status, created_at DESC)
INCLUDE (total, customer_id);
-- Index-only scan: all data in index, no heap access

-- GIN index: JSONB, arrays, full-text
CREATE INDEX idx_orders_metadata ON orders USING gin(metadata);
-- Supports: metadata @> '{"priority": "high"}'

-- BRIN index: Large tables with naturally ordered data
CREATE INDEX idx_events_timestamp ON events USING brin(timestamp);
-- Tiny index for time-series data

Common Query Rewrites

-- BAD: Function on indexed column (index not used)
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2026;

-- GOOD: Range query (index used)
SELECT * FROM orders 
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

-- BAD: OR on different columns (index not used efficiently)
SELECT * FROM orders WHERE status = 'pending' OR priority = 'high';

-- GOOD: UNION of indexed queries
SELECT * FROM orders WHERE status = 'pending'
UNION ALL
SELECT * FROM orders WHERE priority = 'high' AND status != 'pending';

-- BAD: SELECT * when you need 2 columns
SELECT * FROM orders WHERE status = 'pending';

-- GOOD: Select only needed columns (enables index-only scan)
SELECT id, total FROM orders WHERE status = 'pending';

-- BAD: COUNT(*) on large table
SELECT COUNT(*) FROM orders;  -- Seq scan, slow

-- GOOD: Approximate count
SELECT reltuples::bigint FROM pg_class WHERE relname = 'orders';

Join Optimization

-- Ensure join keys are indexed
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- For many-to-many, index both sides
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Lateral join for "top-N per group"
-- Find latest 3 orders per customer (efficient)
SELECT c.id, c.name, latest_orders.*
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.id, o.total, o.created_at
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) latest_orders;

Anti-Patterns

Anti-Pattern	Consequence	Fix
No EXPLAIN ANALYZE	Guessing at performance	Always EXPLAIN before optimizing
Index everything	Write overhead, storage waste	Index only queried columns
SELECT *	Prevents index-only scans	Select needed columns only
Functions on WHERE columns	Index bypassed	Rewrite as range/equality
Missing composite indexes	Multiple single-column scans	Composite index matching query pattern

PostgreSQL is remarkably fast when you work with it. The optimizer is sophisticated — but it needs indexes, statistics, and well-structured queries to do its best work.