Database Indexing Strategies for High-Traffic Applications
Master database indexing beyond the basics. Covers B-tree, hash, GIN, GiST, partial indexes, covering indexes, and composite index design for production workloads.
Indexes are the single most impactful performance lever in any relational database. A missing index turns a 2ms query into a 30-second table scan. A wrong index wastes disk, slows writes, and gives the query planner bad options. This guide covers the indexing strategies that actually matter in production.
The biggest indexing mistake isn’t missing indexes — it’s too many of them. Every index needs to be maintained on every write. A table with 15 indexes turns every INSERT into 16 operations. The goal is the minimum set of indexes that covers your actual query patterns.
Index Fundamentals
How B-Tree Indexes Work
B-tree is the default index type in PostgreSQL, MySQL, and SQL Server. It maintains sorted data in a balanced tree structure, enabling O(log n) lookups.
                [50]
               /    \
        [20,30]      [70,80]
       /   |   \    /   |   \
    [10] [25] [35] [60] [75] [90]
Supports: =, <, >, <=, >=, BETWEEN, IN, IS NULL, LIKE 'prefix%'
Does NOT support: LIKE '%suffix', arbitrary expressions without expression indexes
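These operator rules are easy to check empirically. The sketch below uses SQLite from Python's standard sqlite3 module as a stand-in (SQLite also implements its indexes as B-trees); table and index names are illustrative, not from a real schema:

```python
import sqlite3

# In-memory database; SQLite stores indexes as B-trees, so its planner
# illustrates the same supported/unsupported operator rules.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE INDEX idx_users_email ON users (email)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable step description
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Equality: the planner can seek directly into the index
eq_plan = plan("SELECT * FROM users WHERE email = 'a@b.com'")
# Range: still index-servable (selecting only email keeps the scan in-index)
range_plan = plan("SELECT email FROM users WHERE email > 'm'")
# Suffix match: the sorted structure cannot help, so the table is scanned
suffix_plan = plan("SELECT * FROM users WHERE email LIKE '%@b.com'")
```

Printing the three plan strings shows SEARCH (index seek) for the first two and SCAN (full table scan) for the suffix match.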
Index Type Quick Reference
| Type | Engine | Best For | Size | Write Cost |
|---|---|---|---|---|
| B-tree | All | Equality, range, sorting | Medium | Medium |
| Hash | PG, MySQL (MEMORY engine) | Equality only (=) | Small | Low |
| GIN | PostgreSQL | Full-text, arrays, JSONB | Large | High |
| GiST | PostgreSQL | Geospatial, range types | Medium | Medium |
| BRIN | PostgreSQL | Append-only time-series | Very small | Very low |
| Covering | PG, SQL Server | Index-only scans | Larger | Medium |
Composite Index Design
The order of columns in a composite index determines which queries it can serve. This is the most commonly misunderstood concept in indexing.
The Equality-Sort-Range (ESR) Rule
Order composite index columns as:
- Equality columns first (WHERE col = value)
- Sort columns second (ORDER BY col)
- Range columns last (WHERE col > value)
-- Query pattern
SELECT * FROM orders
WHERE status = 'shipped' -- Equality
AND customer_id = 'cust_123' -- Equality
ORDER BY created_at DESC -- Sort
LIMIT 20;
-- Optimal index: equality columns first, then sort column
CREATE INDEX idx_orders_status_customer_date
ON orders (status, customer_id, created_at DESC);
What the Index Serves
-- ✅ This index serves ALL of these queries:
WHERE status = 'shipped'
WHERE status = 'shipped' AND customer_id = 'cust_123'
WHERE status = 'shipped' AND customer_id = 'cust_123' ORDER BY created_at DESC
-- ❌ This index does NOT serve:
WHERE customer_id = 'cust_123' -- Skips leading column
WHERE created_at > '2026-01-01' -- Skips leading columns
WHERE status = 'shipped' ORDER BY created_at DESC -- created_at values interleave across customer_ids
Rule: A composite index can be used for queries that filter on a left prefix of the index columns.
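The left-prefix rule can be demonstrated end to end. This sketch uses SQLite's planner from Python as a stand-in for Postgres (the prefix behavior of composite B-tree indexes is the same); names mirror the examples above but are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, status TEXT, customer_id TEXT, created_at TEXT)""")
# Composite index ordered per the ESR rule: equality columns, then sort column
con.execute(
    "CREATE INDEX idx_orders_main ON orders (status, customer_id, created_at)")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Left prefix (status alone) -> index is usable
prefix_plan = plan("SELECT * FROM orders WHERE status = 'shipped'")
# Full equality prefix + sort column -> filter AND ORDER BY served by the index
full_plan = plan("""SELECT * FROM orders
                    WHERE status = 'shipped' AND customer_id = 'cust_123'
                    ORDER BY created_at DESC""")
# Skips the leading column -> planner falls back to a full table scan
skip_plan = plan("SELECT * FROM orders WHERE customer_id = 'cust_123'")
```

In the plan strings, the first two queries SEARCH via idx_orders_main (and the sorted query needs no separate sort step), while the third degrades to a SCAN.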
Specialized Index Types
Partial Indexes
Index only the rows you actually query. Dramatically smaller and faster.
-- Only 5% of orders are 'pending', but you query them constantly
CREATE INDEX idx_orders_pending
ON orders (customer_id, created_at)
WHERE status = 'pending';
-- Index is ~20x smaller than a full index on the same columns
-- Queries MUST include the WHERE clause to use this index:
SELECT * FROM orders WHERE status = 'pending' AND customer_id = 'cust_123';
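The "queries must repeat the predicate" requirement is checkable. SQLite supports the same partial-index syntax, so this Python sketch (illustrative names) shows the index being used only when the query's WHERE clause implies the index's WHERE clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, status TEXT, customer_id TEXT, created_at TEXT)""")
# Partial index: only rows with status = 'pending' are indexed
con.execute("""CREATE INDEX idx_orders_pending
               ON orders (customer_id, created_at)
               WHERE status = 'pending'""")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Query predicate implies the index predicate -> partial index is usable
hit = plan("""SELECT * FROM orders
              WHERE status = 'pending' AND customer_id = 'cust_123'""")
# Predicate does not imply status = 'pending' -> index is ignored
miss = plan("SELECT * FROM orders WHERE customer_id = 'cust_123'")
```

The first plan seeks via idx_orders_pending; the second cannot use it at all.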
Covering Indexes (Index-Only Scans)
Include non-indexed columns so the database doesn’t need to visit the table at all:
-- Without covering index: index lookup → table lookup (2 I/O operations)
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- With covering index: index-only scan (1 I/O operation)
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id)
INCLUDE (status, total_amount, created_at);
-- This query can be served entirely from the index
SELECT status, total_amount, created_at
FROM orders
WHERE customer_id = 'cust_123';
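You can watch an index-only scan appear in a query plan. SQLite has no INCLUDE clause, so this sketch appends the payload columns to the index key instead (Postgres's INCLUDE keeps them out of the key, which keeps the tree shallower); SQLite reports the index-only case as a "COVERING INDEX":

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id TEXT, status TEXT,
    total_amount REAL, created_at TEXT, notes TEXT)""")
# Key columns plus the payload columns the hot query reads
con.execute("""CREATE INDEX idx_orders_customer_covering
               ON orders (customer_id, status, total_amount, created_at)""")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Every selected column lives in the index -> no table visit needed
covered = plan("""SELECT status, total_amount, created_at
                  FROM orders WHERE customer_id = 'cust_123'""")
# SELECT * also needs 'notes' -> index seek, then a table lookup per row
not_covered = plan("SELECT * FROM orders WHERE customer_id = 'cust_123'")
```

Only the first plan says COVERING INDEX; the second still uses the index but must visit the table for the missing column.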
Expression Indexes
Index computed values for queries that filter on expressions:
-- Query pattern: case-insensitive email lookup
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- JSONB field extraction
CREATE INDEX idx_products_brand ON products ((metadata->>'brand'));
-- Date truncation for daily aggregation
CREATE INDEX idx_events_day ON events (DATE(created_at));
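One subtlety worth verifying: an expression index matches only queries that use the same expression, not the raw column. SQLite supports expression indexes too, so this illustrative sketch shows both cases:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# Index on the lower-cased expression, not on the raw column
con.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Filter uses the exact same expression -> index seek
expr_hit = plan("SELECT * FROM users WHERE lower(email) = 'user@example.com'")
# Filter on the raw column -> the expression index cannot be used
expr_miss = plan("SELECT * FROM users WHERE email = 'user@example.com'")
```

The same rule applies in Postgres: `WHERE email = ...` will not touch an index built on `LOWER(email)`.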
GIN Indexes (Generalized Inverted)
For full-text search, arrays, and JSONB containment:
-- Full-text search
CREATE INDEX idx_articles_search
ON articles USING gin(to_tsvector('english', title || ' ' || body));
-- Array containment
CREATE INDEX idx_products_tags ON products USING gin(tags);
-- Query: SELECT * FROM products WHERE tags @> ARRAY['enterprise', 'saas'];
-- JSONB containment
CREATE INDEX idx_events_data ON events USING gin(payload jsonb_path_ops);
-- Query: SELECT * FROM events WHERE payload @> '{"type": "purchase"}';
BRIN Indexes (Block Range)
Extremely compact indexes for append-only, naturally ordered data:
-- Time-series data where rows are inserted in timestamp order
CREATE INDEX idx_logs_timestamp ON application_logs USING brin(created_at);
-- BRIN index on a 1B row table: ~1MB vs ~20GB for B-tree
-- Trade-off: less precise, scans more blocks, but minimal storage overhead
Anti-Patterns
1. Indexing Every Column
-- ❌ Don't do this
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_date ON orders (created_at);
CREATE INDEX idx_orders_total ON orders (total_amount);
CREATE INDEX idx_orders_type ON orders (order_type);
-- ✅ Create composite indexes that match your actual query patterns
CREATE INDEX idx_orders_main ON orders (status, customer_id, created_at DESC);
Each index slows down INSERT, UPDATE, and DELETE operations. On write-heavy tables, excessive indexes can cut write throughput by 50%+.
2. Low-Selectivity Indexes
-- ❌ Boolean column with ~50/50 distribution — index is useless
CREATE INDEX idx_users_active ON users (is_active);
-- ✅ Partial index — only index the minority case you actually query
CREATE INDEX idx_users_inactive ON users (id) WHERE is_active = false;
3. Redundant Indexes
-- ❌ Index B is redundant — Index A covers its queries
CREATE INDEX idx_a ON orders (customer_id, status, created_at); -- A
CREATE INDEX idx_b ON orders (customer_id); -- B (redundant)
CREATE INDEX idx_c ON orders (customer_id, status); -- C (redundant)
-- ✅ Keep only the most comprehensive composite index
-- Index A serves: (customer_id), (customer_id, status), (customer_id, status, created_at)
Write Impact by Index Count
| Indexes on Table | INSERT Overhead | UPDATE Overhead | DELETE Overhead |
|---|---|---|---|
| 0-2 | Minimal | Minimal | Minimal |
| 3-5 | 10-20% slower | 10-30% slower | 10-20% slower |
| 6-10 | 30-50% slower | 40-60% slower | 30-50% slower |
| 10+ | 50%+ slower | 60%+ slower | 50%+ slower |
Index Monitoring Queries
Find Unused Indexes
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 10
AND indexrelname NOT LIKE '%_pkey'
AND pg_relation_size(indexrelid) > 1048576 -- > 1MB
ORDER BY pg_relation_size(indexrelid) DESC;
Find Missing Indexes
SELECT
schemaname || '.' || relname AS table_name,
seq_scan AS sequential_scans,
seq_tup_read AS rows_scanned_sequentially,
idx_scan AS index_scans,
pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 500
AND pg_relation_size(relid) > 10485760 -- > 10MB
AND seq_tup_read / GREATEST(seq_scan, 1) > 10000
ORDER BY seq_tup_read DESC
LIMIT 20;
Index Strategy by Workload
| Workload | Primary Strategy | Index Types |
|---|---|---|
| OLTP (transactions) | Composite B-tree on query patterns | B-tree, partial |
| Search | GIN on tsvector | GIN, expression |
| Analytics/reporting | Covering indexes, BRIN for time-series | B-tree + INCLUDE, BRIN |
| Multi-tenant SaaS | tenant_id as leading column in all composites | Composite B-tree |
| Event sourcing | BRIN on timestamp, B-tree on entity_id | BRIN, B-tree |
| Geospatial | GiST on geometry columns | GiST (PostGIS) |
Index Maintenance Best Practices
Indexes degrade over time as data patterns change. Regular maintenance prevents performance regression:
| Maintenance Task | Frequency | Purpose | Tool or Command |
|---|---|---|---|
| Check index usage | Monthly | Find unused indexes | pg_stat_user_indexes (PostgreSQL), sys.dm_db_index_usage_stats (SQL Server) |
| Rebuild fragmented indexes | Weekly (high-write) | Reduce page splits, reclaim space | REINDEX (PostgreSQL), ALTER INDEX REBUILD (SQL Server) |
| Update statistics | After bulk loads | Keep query planner accurate | ANALYZE (PostgreSQL), UPDATE STATISTICS (SQL Server) |
| Review slow query log | Weekly | Identify missing indexes | pg_stat_statements, slow_query_log |
| Validate covering indexes | Quarterly | Ensure queries avoid table lookups | EXPLAIN ANALYZE with actual plans |
Index Size vs Query Speed Trade-Off
Every index you add improves read performance for specific queries but degrades write performance for all operations on that table. Use this rule of thumb:
- OLTP systems (high write, moderate read) — Limit to 5-7 indexes per table
- OLAP systems (low write, high read) — Indexes are cheap; add generously
- Mixed workloads — Profile actual query patterns before adding indexes; remove unused ones quarterly
Indexing Checklist
- Identified top 20 queries by frequency and latency (use pg_stat_statements)
- Created composite indexes following the ESR rule (Equality → Sort → Range)
- Used partial indexes for filtered queries on subsets
- Added covering indexes for index-only scans on hot queries
- Removed redundant and unused indexes (check pg_stat_user_indexes)
- Verified write impact is acceptable (target < 5 indexes per write-heavy table)
- Set up monitoring for sequential scans on large tables
- Applied BRIN indexes to time-series / append-only tables
- Created GIN indexes for full-text search and JSONB queries
- Validated all indexes with EXPLAIN ANALYZE
- Scheduled index maintenance (REINDEX for bloated indexes)
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For database performance consulting, visit garnetgrid.com.
:::