Database Selection Guide: Choosing the Right Database for the Job
Navigate the database landscape without getting paralyzed by options. Covers relational vs document vs columnar vs graph vs time-series databases, when each shines, migration patterns, and the selection framework that prevents 'we chose Mongo for everything' syndrome.
The worst database choice is the one made based on what was hot on Hacker News the week the project started. The second worst is defaulting to PostgreSQL for everything — which, to be fair, is wrong far less often than you would expect.
Database selection is a consequential architectural decision because migration is expensive, painful, and rarely complete. This guide gives you a framework for making the choice deliberately instead of accidentally.
Database Categories
| Category | Data Model | Best For | Examples |
|---|---|---|---|
| Relational | Tables, rows, relationships | Structured data, complex queries, transactions | PostgreSQL, MySQL, SQL Server |
| Document | JSON/BSON documents, schemaless | Variable schemas, rapid prototyping, content management | MongoDB, DynamoDB, Firestore |
| Key-Value | Simple key → value pairs | Caching, sessions, feature flags | Redis, Memcached, DynamoDB |
| Columnar | Columns instead of rows | Analytics, time-series aggregations, data warehousing | ClickHouse, BigQuery, Redshift |
| Graph | Nodes and edges (relationships) | Social networks, recommendations, fraud detection | Neo4j, Neptune, Dgraph |
| Time-Series | Timestamped data points | Metrics, IoT, financial data | TimescaleDB, InfluxDB, Prometheus |
| Vector | High-dimensional vectors | Semantic search, AI/ML embeddings, RAG | Pinecone, pgvector, Weaviate |
| Search | Inverted index, full-text | Full-text search, log analysis | Elasticsearch, Meilisearch, Typesense |
The Selection Framework
Start here:
"What is the primary access pattern?"
├─ Complex queries with JOINs and transactions
│ └─ Relational (PostgreSQL, MySQL)
│
├─ Simple key → value lookups at extreme speed
│ └─ Key-Value (Redis, DynamoDB)
│
├─ Variable/nested document structure
│ └─ Document (MongoDB, Firestore)
│
├─ Aggregations over time-ordered data
│ └─ Time-Series (TimescaleDB, ClickHouse)
│
├─ Relationship traversal ("friends of friends who like X")
│ └─ Graph (Neo4j)
│
├─ Full-text search with ranking
│ └─ Search (Elasticsearch, Meilisearch)
│
├─ AI/ML similarity search
│ └─ Vector (Pinecone, pgvector)
│
└─ Analytics on billions of rows
└─ Columnar (BigQuery, ClickHouse, Redshift)
Deep Dive: When to Use What
PostgreSQL (The Safe Default)
Use PostgreSQL when:
✅ You need ACID transactions
✅ Your data is relational (users → orders → items)
✅ You need complex queries (JOINs, aggregations, window functions)
✅ You want one database that handles JSON, full-text search, and vectors
✅ You are unsure what to choose (PostgreSQL is rarely the wrong answer)
Avoid PostgreSQL when:
❌ You need sub-millisecond key-value lookups at millions of ops/sec (use Redis)
❌ You are doing real-time analytics on billions of rows (use ClickHouse)
❌ You need graph traversals with variable depth (use Neo4j)
❌ Your data is truly schemaless and changes constantly (consider document DB)
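The relational strengths listed above (transactions, JOINs, window functions) can be shown in a few lines. This sketch uses SQLite as a portable stand-in for PostgreSQL; the schema and data are made up for illustration.

```python
# Relational sketch: ACID transaction, JOIN, and a window function.
# SQLite stands in for PostgreSQL so the example runs anywhere.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
""")

# ACID transaction: both inserts commit together or not at all.
with conn:
    conn.execute("INSERT INTO users VALUES (1, 'ada')")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, 30.0), (2, 1, 70.0)])

# JOIN + window function: each order alongside the user's running total.
rows = conn.execute("""
    SELECT u.name, o.total,
           SUM(o.total) OVER (PARTITION BY u.id ORDER BY o.id) AS running
    FROM orders o JOIN users u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()
print(rows)  # [('ada', 30.0, 30.0), ('ada', 70.0, 100.0)]
```

The same query shape (plus JSONB, full-text search, and pgvector) is what makes PostgreSQL the one-database-for-most-things option.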
MongoDB (Documents Done Right)
Use MongoDB when:
✅ Each record is a self-contained document (no complex cross-document queries)
✅ Schema varies significantly between records
✅ You embed related data in the same document (denormalized)
✅ Your team prefers working with JSON-like structures
Avoid MongoDB when:
❌ You need multi-document transactions across collections (possible but expensive)
❌ You need complex JOINs (aggregation pipeline is powerful but verbose)
❌ Your data is inherently relational
❌ You need strong consistency guarantees without tuning read/write concerns
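Document-style modeling means embedding related data so one read fetches everything. A plain Python dict illustrates the shape; the field names are made up for this example.

```python
# Document modeling sketch: an order embeds its items and customer info
# in one self-contained document -- no JOINs needed to read it back.
import json

order_doc = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},  # denormalized copy
    "items": [
        {"sku": "A-1", "qty": 2, "price": 15.0},
        {"sku": "B-7", "qty": 1, "price": 70.0},
    ],
    "status": "shipped",
}

# The total is computed from the embedded items in a single document read.
total = sum(i["qty"] * i["price"] for i in order_doc["items"])
print(total)  # 100.0

# Flexible schema in practice: another document in the same collection can
# carry extra fields (e.g. a gift note) without any migration.
doc_json = json.dumps(order_doc)
```

The trade-off is the inverse of the relational model: reads of one document are cheap, but cross-document queries and updates to denormalized copies (that embedded customer name) become your problem.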
Redis (Speed Above All)
Use Redis when:
✅ You need sub-millisecond response times
✅ Data fits in memory (caching, sessions, leaderboards)
✅ You need pub/sub, rate limiting, or distributed locks
✅ You need a cache layer in front of your primary database
Avoid Redis when:
❌ Data does not fit in memory
❌ You need strict durability guarantees (Redis persistence via RDB/AOF is configurable, but it is not the primary design goal)
❌ You need complex queries beyond key lookups
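The most common Redis role, the cache layer in front of a primary database, follows the cache-aside pattern. This sketch uses an in-process dict with a TTL standing in for Redis (the real thing would use a Redis client, e.g. `SET` with an expiry); the function names are illustrative.

```python
# Cache-aside sketch: check the cache, fall back to the primary DB on a miss,
# then populate the cache with a TTL. A dict stands in for Redis.
import time

_cache: dict = {}          # key -> (expires_at, value)
TTL_SECONDS = 60

def get_user_name(user_id, load_from_db):
    entry = _cache.get(user_id)
    if entry is not None and entry[0] > time.monotonic():
        return entry[1]                       # cache hit
    value = load_from_db(user_id)             # cache miss: hit the primary DB
    _cache[user_id] = (time.monotonic() + TTL_SECONDS, value)
    return value

calls = []
def fake_db(user_id):       # stand-in for a PostgreSQL query
    calls.append(user_id)
    return f"user-{user_id}"

print(get_user_name("42", fake_db))  # user-42 (miss: one DB call)
print(get_user_name("42", fake_db))  # user-42 (hit: still one DB call)
print(len(calls))                    # 1
```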
The Polyglot Persistence Pattern
Most production systems use multiple databases, each optimized for its access pattern:
┌──────────────────────────────────────────────────┐
│ Application Layer │
├──────────────────────────────────────────────────┤
│ │
│ PostgreSQL │ Primary data store │
│ (users, orders, │ Transactions, relationships │
│ products) │ │
│ │ │
│ Redis │ Caching + sessions │
│ (cache, sessions, │ Sub-millisecond reads │
│ rate limiting) │ │
│ │ │
│ Elasticsearch │ Full-text search │
│ (product search, │ Faceted filtering │
│ log analysis) │ │
│ │ │
│ ClickHouse │ Analytics + dashboards │
│ (event analytics, │ Aggregations on billions │
│ business metrics) │ of rows │
│ │ │
└──────────────────────────────────────────────────┘
Polyglot Costs
| Benefit | Cost |
|---|---|
| Each database optimized for its workload | More databases to operate and monitor |
| Better performance per use case | Data synchronization between databases |
| Independent scaling | More complex deployment and backup |
| Team can specialize | Broader skill requirements |
Start with one database (PostgreSQL). Add specialized databases only when you have a proven performance problem that the primary database cannot solve. Every additional database adds operational complexity.
Migration Patterns
| Pattern | When to Use | Risk |
|---|---|---|
| Big bang | Small dataset, simple schema | High: all or nothing |
| Dual write | Write to both old and new during transition | Medium: data inconsistency risk |
| Change data capture (CDC) | Stream changes from old to new DB | Low: eventual consistency |
| Strangler fig | Gradually route queries to new DB | Lowest: incremental, reversible |
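The dual-write pattern from the table can be sketched in a few lines. Plain dicts stand in for the old and new database clients; the key point is that the old store stays the source of truth and a failed shadow write never fails the user-facing request.

```python
# Dual-write migration sketch: every write goes to both stores; reads stay
# on the old store until the new one is verified. Dicts stand in for real
# database clients.
old_db: dict = {}
new_db: dict = {}
failed_shadow_writes: list = []

def write(key, value):
    old_db[key] = value           # source of truth during the transition
    try:
        new_db[key] = value       # best-effort shadow write
    except Exception:
        # Record for later reconciliation (e.g. a nightly backfill);
        # never fail the user-facing write because the *new* store is down.
        failed_shadow_writes.append(key)

def read(key):
    return old_db[key]            # cut reads over only after validation

write("user:1", {"name": "ada"})
print(read("user:1") == new_db["user:1"])  # True -- stores converge
```

The inconsistency risk in the table is exactly that reconciliation list: without a backfill job comparing the two stores, failed shadow writes silently diverge.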
Implementation Checklist
- Identify your primary access patterns before choosing a database
- Start with PostgreSQL unless you have a specific reason not to
- Add specialized databases only for proven performance problems
- Document why each database was chosen (Architecture Decision Record)
- Plan for data synchronization if using multiple databases
- Set up monitoring for each database: query latency, connection count, storage growth
- Implement connection pooling for every database connection
- Design your schema for the most common query, not the most complex one
- Test with realistic data volumes (not 100 rows — 1 million rows minimum)
- Have a migration plan before you need one (because you will need one)