Database Selection Guide: Choosing the Right Database for the Job
Navigate the database landscape without getting paralyzed by options. Covers relational vs document vs columnar vs graph vs time-series databases, when each shines, migration patterns, and the selection framework that prevents 'we chose Mongo for everything' syndrome.
The worst database choice is the one made based on what was hot on Hacker News the week the project started. The second worst is defaulting to PostgreSQL for everything — which, to be fair, is wrong far less often than you would expect.
Database selection is a consequential architectural decision because migration is expensive, painful, and rarely complete. This guide gives you a framework for making the choice deliberately instead of accidentally.
Database Categories
| Category | Data Model | Best For | Examples |
|---|---|---|---|
| Relational | Tables, rows, relationships | Structured data, complex queries, transactions | PostgreSQL, MySQL, SQL Server |
| Document | JSON/BSON documents, schemaless | Variable schemas, rapid prototyping, content management | MongoDB, DynamoDB, Firestore |
| Key-Value | Simple key → value pairs | Caching, sessions, feature flags | Redis, Memcached, DynamoDB |
| Columnar | Columns instead of rows | Analytics, time-series aggregations, data warehousing | ClickHouse, BigQuery, Redshift |
| Graph | Nodes and edges (relationships) | Social networks, recommendations, fraud detection | Neo4j, Neptune, Dgraph |
| Time-Series | Timestamped data points | Metrics, IoT, financial data | TimescaleDB, InfluxDB, Prometheus |
| Vector | High-dimensional vectors | Semantic search, AI/ML embeddings, RAG | Pinecone, pgvector, Weaviate |
| Search | Inverted index, full-text | Full-text search, log analysis | Elasticsearch, Meilisearch, Typesense |
The Selection Framework
Start here:
"What is the primary access pattern?"
├─ Complex queries with JOINs and transactions
│ └─ Relational (PostgreSQL, MySQL)
│
├─ Simple key → value lookups at extreme speed
│ └─ Key-Value (Redis, DynamoDB)
│
├─ Variable/nested document structure
│ └─ Document (MongoDB, Firestore)
│
├─ Aggregations over time-ordered data
│ └─ Time-Series (TimescaleDB, ClickHouse)
│
├─ Relationship traversal ("friends of friends who like X")
│ └─ Graph (Neo4j)
│
├─ Full-text search with ranking
│ └─ Search (Elasticsearch, Meilisearch)
│
├─ AI/ML similarity search
│ └─ Vector (Pinecone, pgvector)
│
└─ Analytics on billions of rows
└─ Columnar (BigQuery, ClickHouse, Redshift)
Deep Dive: When to Use What
PostgreSQL (The Safe Default)
Use PostgreSQL when:
✅ You need ACID transactions
✅ Your data is relational (users → orders → items)
✅ You need complex queries (JOINs, aggregations, window functions)
✅ You want one database that handles JSON, full-text search, and vectors
✅ You are unsure what to choose (PostgreSQL is rarely the wrong answer)
Avoid PostgreSQL when:
❌ You need sub-millisecond key-value lookups at millions of ops/sec (use Redis)
❌ You are doing real-time analytics on billions of rows (use ClickHouse)
❌ You need graph traversals with variable depth (use Neo4j)
❌ Your data is truly schemaless and changes constantly (consider document DB)
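The relational strengths listed above (transactions, JOINs, window functions) can be shown in a few lines. This sketch uses SQLite as a portable stand-in for PostgreSQL; the schema and data are made up for illustration.

```python
# Relational sketch: ACID transaction, JOIN, and a window function.
# SQLite stands in for PostgreSQL so the example runs anywhere.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
""")

# ACID transaction: both inserts commit together or not at all.
with conn:
    conn.execute("INSERT INTO users VALUES (1, 'ada')")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, 1, 30.0), (2, 1, 70.0)])

# JOIN + window function: each order alongside the user's running total.
rows = conn.execute("""
    SELECT u.name, o.total,
           SUM(o.total) OVER (PARTITION BY u.id ORDER BY o.id) AS running
    FROM orders o JOIN users u ON u.id = o.user_id
    ORDER BY o.id
""").fetchall()
print(rows)  # [('ada', 30.0, 30.0), ('ada', 70.0, 100.0)]
```

The same query shape (plus JSONB, full-text search, and pgvector) is what makes PostgreSQL the one-database-for-most-things option.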
MongoDB (Documents Done Right)
Use MongoDB when:
✅ Each record is a self-contained document (no complex cross-document queries)
✅ Schema varies significantly between records
✅ You embed related data in the same document (denormalized)
✅ Your team prefers working with JSON-like structures
Avoid MongoDB when:
❌ You need multi-document transactions across collections (possible but expensive)
❌ You need complex JOINs (aggregation pipeline is powerful but verbose)
❌ Your data is inherently relational
❌ You need strong consistency guarantees without tuning read/write concerns
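Document-style modeling means embedding related data so one read fetches everything. A plain Python dict illustrates the shape; the field names are made up for this example.

```python
# Document modeling sketch: an order embeds its items and customer info
# in one self-contained document -- no JOINs needed to read it back.
import json

order_doc = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},  # denormalized copy
    "items": [
        {"sku": "A-1", "qty": 2, "price": 15.0},
        {"sku": "B-7", "qty": 1, "price": 70.0},
    ],
    "status": "shipped",
}

# The total is computed from the embedded items in a single document read.
total = sum(i["qty"] * i["price"] for i in order_doc["items"])
print(total)  # 100.0

# Flexible schema in practice: another document in the same collection can
# carry extra fields (e.g. a gift note) without any migration.
doc_json = json.dumps(order_doc)
```

The trade-off is the inverse of the relational model: reads of one document are cheap, but cross-document queries and updates to denormalized copies (that embedded customer name) become your problem.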
Redis (Speed Above All)
Use Redis when:
✅ You need sub-millisecond response times
✅ Data fits in memory (caching, sessions, leaderboards)
✅ You need pub/sub, rate limiting, or distributed locks
✅ You need a cache layer in front of your primary database
Avoid Redis when:
❌ Data does not fit in memory
❌ You need strict durability guarantees (Redis persistence via RDB/AOF is configurable, but it is not the primary design goal)
❌ You need complex queries beyond key lookups
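The most common Redis role, the cache layer in front of a primary database, follows the cache-aside pattern. This sketch uses an in-process dict with a TTL standing in for Redis (the real thing would use a Redis client, e.g. `SET` with an expiry); the function names are illustrative.

```python
# Cache-aside sketch: check the cache, fall back to the primary DB on a miss,
# then populate the cache with a TTL. A dict stands in for Redis.
import time

_cache: dict = {}          # key -> (expires_at, value)
TTL_SECONDS = 60

def get_user_name(user_id, load_from_db):
    entry = _cache.get(user_id)
    if entry is not None and entry[0] > time.monotonic():
        return entry[1]                       # cache hit
    value = load_from_db(user_id)             # cache miss: hit the primary DB
    _cache[user_id] = (time.monotonic() + TTL_SECONDS, value)
    return value

calls = []
def fake_db(user_id):       # stand-in for a PostgreSQL query
    calls.append(user_id)
    return f"user-{user_id}"

print(get_user_name("42", fake_db))  # user-42 (miss: one DB call)
print(get_user_name("42", fake_db))  # user-42 (hit: still one DB call)
print(len(calls))                    # 1
```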
The Polyglot Persistence Pattern
Most production systems use multiple databases, each optimized for its access pattern:
┌──────────────────────────────────────────────────┐
│ Application Layer │
├──────────────────────────────────────────────────┤
│ │
│ PostgreSQL │ Primary data store │
│ (users, orders, │ Transactions, relationships │
│ products) │ │
│ │ │
│ Redis │ Caching + sessions │
│ (cache, sessions, │ Sub-millisecond reads │
│ rate limiting) │ │
│ │ │
│ Elasticsearch │ Full-text search │
│ (product search, │ Faceted filtering │
│ log analysis) │ │
│ │ │
│ ClickHouse │ Analytics + dashboards │
│ (event analytics, │ Aggregations on billions │
│ business metrics) │ of rows │
│ │ │
└──────────────────────────────────────────────────┘
Polyglot Costs
| Benefit | Cost |
|---|---|
| Each database optimized for its workload | More databases to operate and monitor |
| Better performance per use case | Data synchronization between databases |
| Independent scaling | More complex deployment and backup |
| Team can specialize | Broader skill requirements |
Start with one database (PostgreSQL). Add specialized databases only when you have a proven performance problem that the primary database cannot solve. Every additional database adds operational complexity.
Migration Patterns
| Pattern | When to Use | Risk |
|---|---|---|
| Big bang | Small dataset, simple schema | High: all or nothing |
| Dual write | Write to both old and new during transition | Medium: data inconsistency risk |
| Change data capture (CDC) | Stream changes from old to new DB | Low: eventual consistency |
| Strangler fig | Gradually route queries to new DB | Lowest: incremental, reversible |
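The dual-write pattern from the table can be sketched in a few lines. Plain dicts stand in for the old and new database clients; the key point is that the old store stays the source of truth and a failed shadow write never fails the user-facing request.

```python
# Dual-write migration sketch: every write goes to both stores; reads stay
# on the old store until the new one is verified. Dicts stand in for real
# database clients.
old_db: dict = {}
new_db: dict = {}
failed_shadow_writes: list = []

def write(key, value):
    old_db[key] = value           # source of truth during the transition
    try:
        new_db[key] = value       # best-effort shadow write
    except Exception:
        # Record for later reconciliation (e.g. a nightly backfill);
        # never fail the user-facing write because the *new* store is down.
        failed_shadow_writes.append(key)

def read(key):
    return old_db[key]            # cut reads over only after validation

write("user:1", {"name": "ada"})
print(read("user:1") == new_db["user:1"])  # True -- stores converge
```

The inconsistency risk in the table is exactly that reconciliation list: without a backfill job comparing the two stores, failed shadow writes silently diverge.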
Implementation Checklist
- Identify your primary access patterns before choosing a database
- Start with PostgreSQL unless you have a specific reason not to
- Add specialized databases only for proven performance problems
- Document why each database was chosen (Architecture Decision Record)
- Plan for data synchronization if using multiple databases
- Set up monitoring for each database: query latency, connection count, storage growth
- Implement connection pooling for every database connection
- Design your schema for the most common query, not the most complex one
- Test with realistic data volumes (not 100 rows — 1 million rows minimum)
- Have a migration plan before you need one (because you will need one)