Database Sharding: When, Why, and How
A comprehensive guide to horizontal database sharding. Covers sharding keys, strategies (hash, range, geo), cross-shard queries, rebalancing, and when NOT to shard.
Sharding splits your database horizontally across multiple servers, each holding a subset of the data. It’s the nuclear option for scaling — incredibly powerful, but it introduces complexity that most teams underestimate. Before you shard, make sure you’ve exhausted every other optimization: indexing, query tuning, read replicas, caching, and vertical scaling.
The most common mistake: sharding too early. Teams shard at 100GB when vertical scaling easily handles 2-5TB. Premature sharding adds months of engineering overhead for problems that don’t exist yet.
When You Actually Need Sharding
| Symptom | Try First | Shard If |
|---|---|---|
| Slow queries | Indexing, query optimization | Already optimized, table has 1B+ rows |
| Write bottleneck | Vertical scaling, write batching | Single node can’t handle write IOPS |
| Storage limits | Archiving, partitioning | Dataset exceeds single-node storage |
| Multi-region latency | Read replicas per region | Writes must be region-local |
| Connection limits | Connection pooling (PgBouncer) | Still hitting single-server limits |
Rule of thumb: If your database is under 500GB and under 10K writes/sec, you probably don’t need sharding.
Optimization Ladder (Do These First)
| Step | Effort | Impact | Try Before |
|---|---|---|---|
| 1. Add proper indexes | Hours | 10-100x query speed | Anything else |
| 2. Query optimization | Days | 2-10x throughput | Read replicas |
| 3. Read replicas | Days | Scales reads to N nodes | Caching |
| 4. Application-level caching | Week | Reduces DB load 50-90% | Vertical scaling |
| 5. Vertical scaling | Hours | 2-4x capacity | Partitioning |
| 6. Table partitioning | Week | Efficient range queries | Sharding |
| 7. Sharding | Months | Infinite horizontal scale | Last resort |
Sharding Strategies
1. Hash-Based Sharding
Distribute rows by hashing the shard key. Provides even distribution.
```python
import hashlib

def get_shard(tenant_id: str, num_shards: int = 8) -> int:
    """Hash-based shard selection (simple modulo, not consistent hashing:
    changing num_shards remaps most keys)."""
    hash_val = int(hashlib.md5(tenant_id.encode()).hexdigest(), 16)
    return hash_val % num_shards

# All data for tenant_123 goes to the same shard
shard = get_shard("tenant_123")  # Returns 0-7
connection = shard_connections[shard]
connection.execute("SELECT * FROM orders WHERE tenant_id = %s", ("tenant_123",))
```
Pros: Even data distribution, simple implementation.
Cons: Range queries require scatter-gather; adding shards requires rehashing.
2. Range-Based Sharding
Split data by value ranges — common for time-series and ordered data.
```python
from datetime import datetime

# (boundary, shard_id) pairs, sorted by boundary ascending
SHARD_RANGES = [
    (datetime(2025, 1, 1), 0),
    (datetime(2025, 7, 1), 1),
    (datetime(2026, 1, 1), 2),
    (datetime(2026, 7, 1), 3),
]

def get_time_shard(created_at: datetime) -> int:
    # Scan newest boundary first; the first one at or before created_at wins
    for boundary, shard_id in reversed(SHARD_RANGES):
        if created_at >= boundary:
            return shard_id
    return 0  # timestamps before the first boundary land on shard 0
```
Pros: Range queries hit a single shard, natural data archiving.
Cons: Hot spots (the newest shard gets all writes), uneven distribution.
3. Geographic Sharding
Route data by region for compliance and latency:
```python
GEO_SHARDS = {
    "us": {"host": "db-us-east.example.com", "regions": ["US", "CA", "MX"]},
    "eu": {"host": "db-eu-west.example.com", "regions": ["DE", "FR", "GB", "NL"]},
    "apac": {"host": "db-apac.example.com", "regions": ["JP", "AU", "SG", "KR"]},
}

def get_geo_shard(user_country: str) -> dict:
    for shard_name, config in GEO_SHARDS.items():
        if user_country in config["regions"]:
            return config
    return GEO_SHARDS["us"]  # Default for unlisted countries
```
Pros: Data residency compliance (GDPR), low latency for regional users.
Cons: Cross-region queries are slow, uneven load distribution.
4. Directory-Based Sharding
A lookup table maps each entity to its shard:
```sql
-- Shard directory table (on a separate, small database)
CREATE TABLE shard_directory (
    entity_id  VARCHAR(64) PRIMARY KEY,
    shard_id   INT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Lookup before every query
SELECT shard_id FROM shard_directory WHERE entity_id = 'tenant_123';
```
Pros: Maximum flexibility, easy rebalancing.
Cons: Single point of failure (the directory DB), an extra lookup per query.
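In application code, the directory lookup is typically wrapped in a small router with an in-process cache so most queries skip the extra round-trip. A minimal sketch, assuming a hypothetical `directory_db` connection object with `query_one`/`execute` methods (not part of any specific driver):

```python
class ShardDirectory:
    """Directory-based router with an in-process cache. Invalidate the
    cached entry whenever an entity is moved to a different shard."""

    def __init__(self, directory_db):
        self.directory_db = directory_db  # connection to the small directory DB
        self.cache = {}                   # entity_id -> shard_id

    def get_shard_id(self, entity_id: str) -> int:
        if entity_id in self.cache:
            return self.cache[entity_id]
        row = self.directory_db.query_one(
            "SELECT shard_id FROM shard_directory WHERE entity_id = %s",
            (entity_id,),
        )
        self.cache[entity_id] = row["shard_id"]
        return row["shard_id"]

    def move_entity(self, entity_id: str, new_shard_id: int) -> None:
        # Rebalancing is a single row update plus cache invalidation
        self.directory_db.execute(
            "UPDATE shard_directory SET shard_id = %s WHERE entity_id = %s",
            (new_shard_id, entity_id),
        )
        self.cache.pop(entity_id, None)
```

The cache is what keeps the "extra lookup per query" cost acceptable; in a multi-process deployment you would also need a cross-process invalidation signal, which is omitted here.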
Strategy Selection Guide
| Strategy | Best For | Avoid When |
|---|---|---|
| Hash | Even distribution across all shards | Need range queries on shard key |
| Range | Time-series, ordered data, archiving | High write volume on latest range |
| Geographic | GDPR compliance, regional latency | Users access cross-region data frequently |
| Directory | Flexible tenant placement, VIP routing | Ultra-low-latency lookups required |
Shard Key Selection
The shard key is the most critical architectural decision. Choose wrong and you’ll spend months re-sharding.
Good Shard Keys
| Key | Why | Best For |
|---|---|---|
| `tenant_id` | All tenant data colocated | Multi-tenant SaaS |
| `user_id` | User data isolation | Social platforms, user-centric apps |
| `region` | Geographic isolation | Compliance-driven apps |
| `order_id` (hashed) | Even distribution | High-volume transaction systems |
Bad Shard Keys
| Key | Why It Fails |
|---|---|
| `created_at` | All writes hit the newest shard (hot spot) |
| `status` | Low cardinality, so uneven distribution |
| `email` | Changes frequently, requiring shard migration |
| Auto-increment `id` | Sequential: all writes go to one shard |
Compound Shard Keys
Shard key: `(tenant_id, HASH(order_id))`
Benefits:
- All tenant data stays on same set of shards (locality)
- Orders within tenant are distributed (no hot spots)
- Queries for a specific tenant don't scatter across all shards
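One way to realize a compound key is to let the tenant hash pick a fixed-size group of shards and the order hash pick a shard within that group. A sketch under assumed numbers (4 shards per tenant group, 32 shards total; both values and the MD5 choice are illustrative):

```python
import hashlib

def get_compound_shard(tenant_id: str, order_id: str,
                       shards_per_tenant: int = 4,
                       total_shards: int = 32) -> int:
    """Map (tenant_id, order_id) to a shard: the tenant hash selects a
    contiguous group of shards, the order hash selects one within it."""
    def h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    num_groups = total_shards // shards_per_tenant
    group = h(tenant_id) % num_groups         # tenant locality
    offset = h(order_id) % shards_per_tenant  # spread within the group
    return group * shards_per_tenant + offset
```

With this layout a tenant-scoped query fans out to 4 shards instead of 32, while a single tenant's write load is still spread across 4 nodes.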
Cross-Shard Query Patterns
Scatter-Gather
Query all shards in parallel, merge results:
```python
import asyncio

async def search_all_shards(query: str, params: dict) -> list:
    """Run the same query on every shard in parallel, then merge."""
    tasks = [shard.execute(query, params) for shard in all_shards]
    results = await asyncio.gather(*tasks)
    merged = []
    for result_set in results:
        merged.extend(result_set)
    # Re-sort globally: each shard only returns locally sorted rows
    return sorted(merged, key=lambda r: r["created_at"], reverse=True)[:100]
```
Cost: N round-trips (one per shard). Acceptable for dashboards, unacceptable for real-time APIs.
Global Tables
Keep small, frequently-joined reference data replicated across all shards:
```
Global tables (replicated to every shard):
├── countries
├── currencies
├── feature_flags
└── pricing_tiers

Sharded tables (distributed by tenant_id):
├── users
├── orders
├── invoices
└── activity_logs
```
Rebalancing
When shards become unevenly loaded, you need to move data between them.
Live Rebalancing Process
1. Create new shard(s)
2. Begin dual-writing to old and new shard
3. Backfill historical data to new shard
4. Verify data consistency (checksums)
5. Switch reads to new shard
6. Stop writing to old shard
7. Clean up old shard data
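Step 2, dual-writing, can be sketched as follows. This assumes shard clients expose a `put` method (a placeholder, not a specific driver API), keeps the old shard as the source of truth, and queues failed writes to the new shard for repair instead of failing the request:

```python
class DualWriter:
    """Dual-write during migration: the old shard remains the source of
    truth; writes to the new shard are best-effort, and failures are
    queued so the backfill/verification step can repair them."""

    def __init__(self, old_shard, new_shard):
        self.old_shard = old_shard
        self.new_shard = new_shard
        self.failed_writes = []  # (key, row) pairs to replay later

    def write(self, key: str, row: dict) -> None:
        self.old_shard.put(key, row)      # must succeed or the request fails
        try:
            self.new_shard.put(key, row)  # tolerated failure during migration
        except Exception:
            self.failed_writes.append((key, row))
```

The consistency check in step 4 is what catches any divergence this best-effort path introduces; only after it passes do reads switch over.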
Consistent Hashing
Minimizes data movement when adding/removing shards:
```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, nodes: list, virtual_nodes: int = 150):
        # Each physical node gets many virtual points on the ring,
        # which smooths the distribution of keys across nodes
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            for i in range(virtual_nodes):
                key = self._hash(f"{node}:{i}")
                self.ring[key] = node
                self.sorted_keys.append(key)
        self.sorted_keys.sort()

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping around)
        h = self._hash(key)
        idx = bisect_right(self.sorted_keys, h) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

# Adding a new node only moves ~1/N of the data
ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
ring.get_node("tenant_123")  # deterministic: always the same shard for this key
```
Alternatives to Application-Level Sharding
| Solution | How It Works | When to Use |
|---|---|---|
| PostgreSQL table partitioning | Split table by range/list/hash within one server | <1TB, need partition pruning |
| Citus (PostgreSQL) | Distributed PostgreSQL with transparent sharding | Want SQL compatibility |
| Vitess (MySQL) | Sharding middleware for MySQL | YouTube-scale MySQL workloads |
| CockroachDB | Distributed SQL with auto-sharding | Multi-region, strong consistency |
| MongoDB | Native sharding with `sh.shardCollection()` | Document workloads |
Failure Modes and Recovery
| Failure | Impact | Recovery Strategy |
|---|---|---|
| Single shard down | Partial outage (1/N of data unavailable) | Replica promotion, automatic failover |
| Routing layer down | Full outage (no requests reach any shard) | Redundant routing, health checks |
| Shard key regret | Expensive re-sharding required | Dual-write migration to new key |
| Unbalanced shards | Hot shard degrades, others idle | Rebalance with consistent hashing |
| Cross-shard deadlock | Queries hang, cascading timeouts | Timeout policies, circuit breakers |
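For the cross-shard deadlock row, a first line of defense is bounding every per-shard call with a timeout so one stuck shard cannot hang the whole scatter-gather. A sketch using `asyncio.wait_for`; the 2-second default and the drop-partial-results policy are assumptions, not recommendations for every workload:

```python
import asyncio

async def query_with_timeout(shard, query, timeout_s: float = 2.0):
    """Bound a single shard call; a timeout yields None instead of hanging."""
    try:
        return await asyncio.wait_for(shard.execute(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None

async def gather_partial(shards, query, timeout_s: float = 2.0):
    """Scatter-gather that returns whatever arrived within the deadline."""
    results = await asyncio.gather(
        *(query_with_timeout(s, query, timeout_s) for s in shards)
    )
    return [r for r in results if r is not None]
```

Whether partial results are acceptable is an application decision: a dashboard can tolerate a missing shard, a billing query cannot.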
Shard Monitoring Dashboard Metrics
Track these per shard to detect problems early:
- QPS per shard — Should be roughly equal; more than 2x difference signals imbalance
- P99 latency per shard — Rising latency on one shard indicates resource pressure
- Disk usage per shard — Uneven growth means your shard key has skewed distribution
- Connection count per shard — Connection pooling should distribute evenly
- Replication lag — If using read replicas per shard, lag over 1s needs investigation
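The 2x QPS rule above is easy to check mechanically. A sketch, assuming you can fetch a per-shard QPS map from your metrics system (the data source is not specified here):

```python
def detect_imbalance(qps_by_shard: dict, ratio: float = 2.0) -> list:
    """Return shards whose QPS exceeds `ratio` times the quietest shard's
    QPS -- the 2x rule of thumb for spotting hot shards."""
    if len(qps_by_shard) < 2:
        return []
    floor = max(min(qps_by_shard.values()), 1e-9)  # avoid divide-by-zero
    return [name for name, qps in qps_by_shard.items() if qps / floor > ratio]
```

Wiring this into an alert gives early warning that the shard key's distribution is skewing, while rebalancing is still cheap.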
Sharding Checklist
- Exhausted single-node optimizations first (indexes → replicas → caching → vertical)
- Shard key chosen based on access patterns (not just even distribution)
- Strategy selected (hash, range, geo, directory) based on query patterns
- Cross-shard query patterns identified and cost accepted
- Global/reference tables identified for replication
- Monitoring per-shard metrics (size, QPS, latency, connection count)
- Rebalancing strategy documented (consistent hashing or directory)
- Backup/restore tested per-shard
- Application routing layer tested for failover
- Data consistency verification automated (checksums, row counts)
- Rollback plan documented in case of shard key regret