Database Indexing Strategies for High-Traffic Applications
Master database indexing beyond the basics. Covers B-tree, hash, GIN, GiST, partial indexes, covering indexes, and composite index design for production workloads.
Indexes are the single most impactful performance lever in any relational database. A missing index turns a 2ms query into a 30-second table scan. A wrong index wastes disk, slows writes, and gives the query planner bad options. This guide covers the indexing strategies that actually matter in production.
The biggest indexing mistake isn’t missing indexes — it’s too many of them. Every index needs to be maintained on every write. A table with 15 indexes turns every INSERT into 16 operations. The goal is the minimum set of indexes that covers your actual query patterns.
Index Fundamentals
How B-Tree Indexes Work
B-tree is the default index type in PostgreSQL, MySQL, and SQL Server. It maintains sorted data in a balanced tree structure, enabling O(log n) lookups.
                [50]
               /    \
        [20,30]      [70,80]
       /   |   \    /   |   \
    [10] [25] [35] [60] [75] [90]
Supports: =, <, >, <=, >=, BETWEEN, IN, IS NULL, LIKE 'prefix%'
Does NOT support: LIKE '%suffix', arbitrary expressions without expression indexes
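These operator rules are easy to check empirically. The sketch below uses SQLite from Python's standard sqlite3 module as a stand-in (SQLite also implements its indexes as B-trees); table and index names are illustrative, not from a real schema:

```python
import sqlite3

# In-memory database; SQLite stores indexes as B-trees, so its planner
# illustrates the same supported/unsupported operator rules.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE INDEX idx_users_email ON users (email)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable step description
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Equality: the planner can seek directly into the index
eq_plan = plan("SELECT * FROM users WHERE email = 'a@b.com'")
# Range: still index-servable (selecting only email keeps the scan in-index)
range_plan = plan("SELECT email FROM users WHERE email > 'm'")
# Suffix match: the sorted structure cannot help, so the table is scanned
suffix_plan = plan("SELECT * FROM users WHERE email LIKE '%@b.com'")
```

Printing the three plan strings shows SEARCH (index seek) for the first two and SCAN (full table scan) for the suffix match.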
Index Type Quick Reference
| Type | Engine | Best For | Size | Write Cost |
|---|---|---|---|---|
| B-tree | All | Equality, range, sorting | Medium | Medium |
| Hash | PG, MySQL (MEMORY engine) | Equality only (=) | Small | Low |
| GIN | PostgreSQL | Full-text, arrays, JSONB | Large | High |
| GiST | PostgreSQL | Geospatial, range types | Medium | Medium |
| BRIN | PostgreSQL | Append-only time-series | Very small | Very low |
| Covering | PG, SQL Server | Index-only scans | Larger | Medium |
Composite Index Design
The order of columns in a composite index determines which queries it can serve. This is the most commonly misunderstood concept in indexing.
The Equality-Sort-Range (ESR) Rule
Order composite index columns as:
- Equality columns first (WHERE col = value)
- Sort columns second (ORDER BY col)
- Range columns last (WHERE col > value)
-- Query pattern
SELECT * FROM orders
WHERE status = 'shipped' -- Equality
AND customer_id = 'cust_123' -- Equality
ORDER BY created_at DESC -- Sort
LIMIT 20;
-- Optimal index: equality columns first, then sort column
CREATE INDEX idx_orders_status_customer_date
ON orders (status, customer_id, created_at DESC);
What the Index Serves
-- ✅ This index serves ALL of these queries:
WHERE status = 'shipped'
WHERE status = 'shipped' AND customer_id = 'cust_123'
WHERE status = 'shipped' AND customer_id = 'cust_123' ORDER BY created_at DESC
-- ❌ This index does NOT serve:
WHERE customer_id = 'cust_123' -- Skips leading column
WHERE created_at > '2026-01-01' -- Skips leading columns
WHERE status = 'shipped' ORDER BY created_at DESC -- created_at values interleave across customer_ids
Rule: A composite index can be used for queries that filter on a left prefix of the index columns.
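The left-prefix rule can be demonstrated end to end. This sketch uses SQLite's planner from Python as a stand-in for Postgres (the prefix behavior of composite B-tree indexes is the same); names mirror the examples above but are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, status TEXT, customer_id TEXT, created_at TEXT)""")
# Composite index ordered per the ESR rule: equality columns, then sort column
con.execute(
    "CREATE INDEX idx_orders_main ON orders (status, customer_id, created_at)")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Left prefix (status alone) -> index is usable
prefix_plan = plan("SELECT * FROM orders WHERE status = 'shipped'")
# Full equality prefix + sort column -> filter AND ORDER BY served by the index
full_plan = plan("""SELECT * FROM orders
                    WHERE status = 'shipped' AND customer_id = 'cust_123'
                    ORDER BY created_at DESC""")
# Skips the leading column -> planner falls back to a full table scan
skip_plan = plan("SELECT * FROM orders WHERE customer_id = 'cust_123'")
```

In the plan strings, the first two queries SEARCH via idx_orders_main (and the sorted query needs no separate sort step), while the third degrades to a SCAN.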
Specialized Index Types
Partial Indexes
Index only the rows you actually query. Dramatically smaller and faster.
-- Only 5% of orders are 'pending', but you query them constantly
CREATE INDEX idx_orders_pending
ON orders (customer_id, created_at)
WHERE status = 'pending';
-- Index is ~20x smaller than a full index on the same columns
-- Queries MUST include the WHERE clause to use this index:
SELECT * FROM orders WHERE status = 'pending' AND customer_id = 'cust_123';
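The "queries must repeat the predicate" requirement is checkable. SQLite supports the same partial-index syntax, so this Python sketch (illustrative names) shows the index being used only when the query's WHERE clause implies the index's WHERE clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, status TEXT, customer_id TEXT, created_at TEXT)""")
# Partial index: only rows with status = 'pending' are indexed
con.execute("""CREATE INDEX idx_orders_pending
               ON orders (customer_id, created_at)
               WHERE status = 'pending'""")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Query predicate implies the index predicate -> partial index is usable
hit = plan("""SELECT * FROM orders
              WHERE status = 'pending' AND customer_id = 'cust_123'""")
# Predicate does not imply status = 'pending' -> index is ignored
miss = plan("SELECT * FROM orders WHERE customer_id = 'cust_123'")
```

The first plan seeks via idx_orders_pending; the second cannot use it at all.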
Covering Indexes (Index-Only Scans)
Include non-indexed columns so the database doesn’t need to visit the table at all:
-- Without covering index: index lookup → table lookup (2 I/O operations)
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- With covering index: index-only scan (1 I/O operation)
CREATE INDEX idx_orders_customer_covering
ON orders (customer_id)
INCLUDE (status, total_amount, created_at);
-- This query can be served entirely from the index
SELECT status, total_amount, created_at
FROM orders
WHERE customer_id = 'cust_123';
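You can watch an index-only scan appear in a query plan. SQLite has no INCLUDE clause, so this sketch appends the payload columns to the index key instead (Postgres's INCLUDE keeps them out of the key, which keeps the tree shallower); SQLite reports the index-only case as a "COVERING INDEX":

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id TEXT, status TEXT,
    total_amount REAL, created_at TEXT, notes TEXT)""")
# Key columns plus the payload columns the hot query reads
con.execute("""CREATE INDEX idx_orders_customer_covering
               ON orders (customer_id, status, total_amount, created_at)""")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Every selected column lives in the index -> no table visit needed
covered = plan("""SELECT status, total_amount, created_at
                  FROM orders WHERE customer_id = 'cust_123'""")
# SELECT * also needs 'notes' -> index seek, then a table lookup per row
not_covered = plan("SELECT * FROM orders WHERE customer_id = 'cust_123'")
```

Only the first plan says COVERING INDEX; the second still uses the index but must visit the table for the missing column.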
Expression Indexes
Index computed values for queries that filter on expressions:
-- Query pattern: case-insensitive email lookup
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- JSONB field extraction
CREATE INDEX idx_products_brand ON products ((metadata->>'brand'));
-- Date truncation for daily aggregation
CREATE INDEX idx_events_day ON events (DATE(created_at));
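One subtlety worth verifying: an expression index matches only queries that use the same expression, not the raw column. SQLite supports expression indexes too, so this illustrative sketch shows both cases:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# Index on the lower-cased expression, not on the raw column
con.execute("CREATE INDEX idx_users_email_lower ON users (lower(email))")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Filter uses the exact same expression -> index seek
expr_hit = plan("SELECT * FROM users WHERE lower(email) = 'user@example.com'")
# Filter on the raw column -> the expression index cannot be used
expr_miss = plan("SELECT * FROM users WHERE email = 'user@example.com'")
```

The same rule applies in Postgres: `WHERE email = ...` will not touch an index built on `LOWER(email)`.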
GIN Indexes (Generalized Inverted)
For full-text search, arrays, and JSONB containment:
-- Full-text search
CREATE INDEX idx_articles_search
ON articles USING gin(to_tsvector('english', title || ' ' || body));
-- Array containment
CREATE INDEX idx_products_tags ON products USING gin(tags);
-- Query: SELECT * FROM products WHERE tags @> ARRAY['enterprise', 'saas'];
-- JSONB containment
CREATE INDEX idx_events_data ON events USING gin(payload jsonb_path_ops);
-- Query: SELECT * FROM events WHERE payload @> '{"type": "purchase"}';
BRIN Indexes (Block Range)
Extremely compact indexes for append-only, naturally ordered data:
-- Time-series data where rows are inserted in timestamp order
CREATE INDEX idx_logs_timestamp ON application_logs USING brin(created_at);
-- BRIN index on a 1B row table: ~1MB vs ~20GB for B-tree
-- Trade-off: less precise, scans more blocks, but minimal storage overhead
Anti-Patterns
1. Indexing Every Column
-- ❌ Don't do this
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_date ON orders (created_at);
CREATE INDEX idx_orders_total ON orders (total_amount);
CREATE INDEX idx_orders_type ON orders (order_type);
-- ✅ Create composite indexes that match your actual query patterns
CREATE INDEX idx_orders_main ON orders (status, customer_id, created_at DESC);
Each index slows down INSERT, UPDATE, and DELETE operations. On write-heavy tables, excessive indexes can cut write throughput by 50%+.
2. Low-Selectivity Indexes
-- ❌ Boolean column with ~50/50 distribution — index is useless
CREATE INDEX idx_users_active ON users (is_active);
-- ✅ Partial index — only index the minority case you actually query
CREATE INDEX idx_users_inactive ON users (id) WHERE is_active = false;
3. Redundant Indexes
-- ❌ Index B is redundant — Index A covers its queries
CREATE INDEX idx_a ON orders (customer_id, status, created_at); -- A
CREATE INDEX idx_b ON orders (customer_id); -- B (redundant)
CREATE INDEX idx_c ON orders (customer_id, status); -- C (redundant)
-- ✅ Keep only the most comprehensive composite index
-- Index A serves: (customer_id), (customer_id, status), (customer_id, status, created_at)
Write Impact by Index Count
| Indexes on Table | INSERT Overhead | UPDATE Overhead | DELETE Overhead |
|---|---|---|---|
| 0-2 | Minimal | Minimal | Minimal |
| 3-5 | 10-20% slower | 10-30% slower | 10-20% slower |
| 6-10 | 30-50% slower | 40-60% slower | 30-50% slower |
| 10+ | 50%+ slower | 60%+ slower | 50%+ slower |
Index Monitoring Queries
Find Unused Indexes
SELECT
schemaname || '.' || relname AS table_name,
indexrelname AS index_name,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 10
AND indexrelname NOT LIKE '%_pkey'
AND pg_relation_size(indexrelid) > 1048576 -- > 1MB
ORDER BY pg_relation_size(indexrelid) DESC;
Find Missing Indexes
SELECT
schemaname || '.' || relname AS table_name,
seq_scan AS sequential_scans,
seq_tup_read AS rows_scanned_sequentially,
idx_scan AS index_scans,
pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 500
AND pg_relation_size(relid) > 10485760 -- > 10MB
AND seq_tup_read / GREATEST(seq_scan, 1) > 10000
ORDER BY seq_tup_read DESC
LIMIT 20;
Index Strategy by Workload
| Workload | Primary Strategy | Index Types |
|---|---|---|
| OLTP (transactions) | Composite B-tree on query patterns | B-tree, partial |
| Search | GIN on tsvector | GIN, expression |
| Analytics/reporting | Covering indexes, BRIN for time-series | B-tree + INCLUDE, BRIN |
| Multi-tenant SaaS | tenant_id as leading column in all composites | Composite B-tree |
| Event sourcing | BRIN on timestamp, B-tree on entity_id | BRIN, B-tree |
| Geospatial | GiST on geometry columns | GiST (PostGIS) |
Index Maintenance Best Practices
Indexes degrade over time as data patterns change. Regular maintenance prevents performance regression:
| Maintenance Task | Frequency | Purpose | Tool or Command |
|---|---|---|---|
| Check index usage | Monthly | Find unused indexes | pg_stat_user_indexes (PostgreSQL), sys.dm_db_index_usage_stats (SQL Server) |
| Rebuild fragmented indexes | Weekly (high-write) | Reduce page splits, reclaim space | REINDEX (PostgreSQL), ALTER INDEX REBUILD (SQL Server) |
| Update statistics | After bulk loads | Keep query planner accurate | ANALYZE (PostgreSQL), UPDATE STATISTICS (SQL Server) |
| Review slow query log | Weekly | Identify missing indexes | pg_stat_statements, slow_query_log |
| Validate covering indexes | Quarterly | Ensure queries avoid table lookups | EXPLAIN ANALYZE with actual plans |
Index Size vs Query Speed Trade-Off
Every index you add improves read performance for specific queries but degrades write performance for all operations on that table. Use this rule of thumb:
- OLTP systems (high write, moderate read) — Limit to 5-7 indexes per table
- OLAP systems (low write, high read) — Indexes are cheap; add generously
- Mixed workloads — Profile actual query patterns before adding indexes; remove unused ones quarterly
Indexing Checklist
- Identified top 20 queries by frequency and latency (use pg_stat_statements)
- Created composite indexes following the ESR rule (Equality → Sort → Range)
- Used partial indexes for filtered queries on subsets
- Added covering indexes for index-only scans on hot queries
- Removed redundant and unused indexes (check pg_stat_user_indexes)
- Verified write impact is acceptable (target < 5 indexes per write-heavy table)
- Set up monitoring for sequential scans on large tables
- Applied BRIN indexes to time-series / append-only tables
- Created GIN indexes for full-text search and JSONB queries
- Validated all indexes with EXPLAIN ANALYZE
- Scheduled index maintenance (REINDEX for bloated indexes)
:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For database performance consulting, visit garnetgrid.com.
:::