
Database Selection Guide: Choosing the Right Database for the Job

Navigate the database landscape without getting paralyzed by options. Covers relational vs document vs columnar vs graph vs time-series databases, when each shines, migration patterns, and the selection framework that prevents 'we chose Mongo for everything' syndrome.

The worst database choice is the one made based on what was hot on Hacker News the week the project started. The second worst is defaulting to PostgreSQL for everything — which, to be fair, is wrong far less often than you would expect.

Database selection is a consequential architectural decision because migration is expensive, painful, and rarely complete. This guide gives you a framework for making the choice deliberately instead of accidentally.


Database Categories

| Category | Data Model | Best For | Examples |
| --- | --- | --- | --- |
| Relational | Tables, rows, relationships | Structured data, complex queries, transactions | PostgreSQL, MySQL, SQL Server |
| Document | JSON/BSON documents, schemaless | Variable schemas, rapid prototyping, content management | MongoDB, DynamoDB, Firestore |
| Key-Value | Simple key → value pairs | Caching, sessions, feature flags | Redis, Memcached, DynamoDB |
| Columnar | Columns instead of rows | Analytics, time-series aggregations, data warehousing | ClickHouse, BigQuery, Redshift |
| Graph | Nodes and edges (relationships) | Social networks, recommendations, fraud detection | Neo4j, Neptune, Dgraph |
| Time-Series | Timestamped data points | Metrics, IoT, financial data | TimescaleDB, InfluxDB, Prometheus |
| Vector | High-dimensional vectors | Semantic search, AI/ML embeddings, RAG | Pinecone, pgvector, Weaviate |
| Search | Inverted index, full-text | Full-text search, log analysis | Elasticsearch, Meilisearch, Typesense |

The Selection Framework

Start here:
  "What is the primary access pattern?"

  ├─ Complex queries with JOINs and transactions
  │   └─ Relational (PostgreSQL, MySQL)

  ├─ Simple key → value lookups at extreme speed
  │   └─ Key-Value (Redis, DynamoDB)

  ├─ Variable/nested document structure
  │   └─ Document (MongoDB, Firestore)

  ├─ Aggregations over time-ordered data
  │   └─ Time-Series (TimescaleDB, ClickHouse)

  ├─ Relationship traversal ("friends of friends who like X")
  │   └─ Graph (Neo4j)

  ├─ Full-text search with ranking
  │   └─ Search (Elasticsearch, Meilisearch)

  ├─ AI/ML similarity search
  │   └─ Vector (Pinecone, pgvector)

  └─ Analytics on billions of rows
      └─ Columnar (BigQuery, ClickHouse, Redshift)
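The decision tree above can be sketched as a simple lookup. The pattern names below are hypothetical labels for the branches, not any library's API:

```python
# Map a primary access pattern to a database category, mirroring the
# decision tree above. Illustrative sketch only.
PATTERN_TO_CATEGORY = {
    "complex_queries_transactions": "Relational (PostgreSQL, MySQL)",
    "key_value_lookups": "Key-Value (Redis, DynamoDB)",
    "variable_documents": "Document (MongoDB, Firestore)",
    "time_ordered_aggregations": "Time-Series (TimescaleDB, ClickHouse)",
    "relationship_traversal": "Graph (Neo4j)",
    "full_text_search": "Search (Elasticsearch, Meilisearch)",
    "similarity_search": "Vector (Pinecone, pgvector)",
    "large_scale_analytics": "Columnar (BigQuery, ClickHouse, Redshift)",
}

def suggest_database(access_pattern: str) -> str:
    """Return a database category for the given primary access pattern."""
    return PATTERN_TO_CATEGORY.get(
        access_pattern,
        "Relational (PostgreSQL) -- the safe default when unsure",
    )
```

The fallback branch encodes the guide's default: when the access pattern is unclear, PostgreSQL is rarely the wrong answer.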

Deep Dive: When to Use What

PostgreSQL (The Safe Default)

Use PostgreSQL when:
  ✅ You need ACID transactions
  ✅ Your data is relational (users → orders → items)
  ✅ You need complex queries (JOINs, aggregations, window functions)
  ✅ You want one database that handles JSON, full-text search, and vectors
  ✅ You are unsure what to choose (PostgreSQL is rarely the wrong answer)

Avoid PostgreSQL when:
  ❌ You need sub-millisecond key-value lookups at millions of ops/sec (use Redis)
  ❌ You are doing real-time analytics on billions of rows (use ClickHouse)
  ❌ You need graph traversals with variable depth (use Neo4j)
  ❌ Your data is truly schemaless and changes constantly (consider document DB)
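A minimal sketch of the relational strengths listed above: an atomic multi-row write, then a JOIN with aggregation. The standard library's sqlite3 stands in for PostgreSQL so the snippet runs anywhere; the SQL itself is portable:

```python
import sqlite3

# In-memory relational database; users -> orders is the classic shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         total   REAL    NOT NULL);
""")

# ACID transaction: all three inserts commit together or not at all.
with conn:
    conn.execute("INSERT INTO users  VALUES (1, 'Ada')")
    conn.execute("INSERT INTO orders VALUES (1, 1, 42.50)")
    conn.execute("INSERT INTO orders VALUES (2, 1, 10.00)")

# Complex query: JOIN plus aggregation in a single statement.
row = conn.execute("""
    SELECT u.name, COUNT(o.id), SUM(o.total)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
""").fetchone()
print(row)  # ('Ada', 2, 52.5)
```

Getting this answer from a key-value or document store would take multiple round trips and application-side math; here it is one declarative statement.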

MongoDB (Documents Done Right)

Use MongoDB when:
  ✅ Each record is a self-contained document (no complex cross-document queries)
  ✅ Schema varies significantly between records
  ✅ You embed related data in the same document (denormalized)
  ✅ Your team prefers working with JSON-like structures

Avoid MongoDB when:
  ❌ You need multi-document transactions across collections (possible but expensive)
  ❌ You need complex JOINs (aggregation pipeline is powerful but verbose)
  ❌ Your data is inherently relational
  ❌ You need strong consistency guarantees by default
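The embedded-document point above, sketched with a plain dict standing in for a BSON document (field names are illustrative):

```python
# One self-contained, denormalized document: the customer snapshot and
# line items live inside the order instead of in separate tables.
order = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},  # embedded
    "items": [                                                # denormalized
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-7", "qty": 1, "price": 24.00},
    ],
    "status": "shipped",
}

# One read returns everything the order page needs -- no JOINs.
total = sum(item["qty"] * item["price"] for item in order["items"])
print(round(total, 2))  # 43.98
```

The trade-off is baked into the shape: reading one order is trivial, but answering "every order containing SKU A-1" now means scanning inside documents, which is where the aggregation pipeline (and its verbosity) comes in.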

Redis (Speed Above All)

Use Redis when:
  ✅ You need sub-millisecond response times
  ✅ Data fits in memory (caching, sessions, leaderboards)
  ✅ You need pub/sub, rate limiting, or distributed locks
  ✅ You need a cache layer in front of your primary database

Avoid Redis when:
  ❌ Data does not fit in memory
  ❌ You need strict durability guarantees (Redis persistence via RDB/AOF exists, but recently acknowledged writes can still be lost)
  ❌ You need complex queries beyond key lookups
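The cache-layer use case above is the cache-aside pattern. A minimal sketch with an in-process dict standing in for Redis; in production the same shape maps onto redis-py's get and set(..., ex=ttl) calls:

```python
import time

class TTLCache:
    """Tiny Redis stand-in: string keys, values, per-key TTL, lazy expiry."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

cache = TTLCache()

def get_user(user_id, db_lookup):
    """Cache-aside: try the cache first, fall back to the primary database."""
    user = cache.get(f"user:{user_id}")
    if user is None:
        user = db_lookup(user_id)                    # slow path: primary DB
        cache.set(f"user:{user_id}", user, ttl_seconds=60)
    return user
```

The TTL is the safety valve: even if invalidation logic misses a write, stale entries age out on their own.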

The Polyglot Persistence Pattern

Most production systems use multiple databases, each optimized for its access pattern:

┌─────────────────────────────────────────────────────┐
│  Application Layer                                  │
├─────────────────────────────────────────────────────┤
│                     │                               │
│  PostgreSQL         │ Primary data store            │
│  (users, orders,    │ Transactions, relationships   │
│   products)         │                               │
│                     │                               │
│  Redis              │ Caching + sessions            │
│  (cache, sessions,  │ Sub-millisecond reads         │
│   rate limiting)    │                               │
│                     │                               │
│  Elasticsearch      │ Full-text search              │
│  (product search,   │ Faceted filtering             │
│   log analysis)     │                               │
│                     │                               │
│  ClickHouse         │ Analytics + dashboards        │
│  (event analytics,  │ Aggregations on billions      │
│   business metrics) │ of rows                       │
│                     │                               │
└─────────────────────────────────────────────────────┘
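In a polyglot setup like this, the application (or a CDC pipeline) owns keeping the stores in sync. A sketch of one write fanned out across three stores, with dicts and a list standing in for PostgreSQL, Redis, and Elasticsearch:

```python
# Stand-ins for the three stores; the fan-out shape is the point,
# not the storage.
primary = {}        # PostgreSQL stand-in: the source of truth
cache = {}          # Redis stand-in
search_index = []   # Elasticsearch stand-in

def save_product(product):
    """One logical write, three physical effects."""
    primary[product["id"]] = product             # 1. write source of truth
    cache.pop(f"product:{product['id']}", None)  # 2. invalidate stale cache
    search_index.append(                         # 3. index for search
        {"id": product["id"], "text": product["name"].lower()}
    )

save_product({"id": 1, "name": "Red Chair"})
hits = [doc["id"] for doc in search_index if "chair" in doc["text"]]
print(hits)  # [1]
```

Every extra store adds another step to this write path, and another way for the stores to disagree after a partial failure, which is exactly the synchronization cost in the table below.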

Polyglot Costs

| Benefit | Cost |
| --- | --- |
| Each database optimized for its workload | More databases to operate and monitor |
| Better performance per use case | Data synchronization between databases |
| Independent scaling | More complex deployment and backup |
| Team can specialize | Broader skill requirements |

Start with one database (PostgreSQL). Add specialized databases only when you have a proven performance problem that the primary database cannot solve. Every additional database adds operational complexity.


Migration Patterns

| Pattern | When to Use | Risk |
| --- | --- | --- |
| Big bang | Small dataset, simple schema | High: all or nothing |
| Dual write | Write to both old and new during transition | Medium: data inconsistency risk |
| Change data capture (CDC) | Stream changes from old to new DB | Low: eventual consistency |
| Strangler fig | Gradually route queries to new DB | Lowest: incremental, reversible |
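The dual-write row above, sketched with dicts standing in for the old and new databases. The old store stays authoritative for reads during the transition, and a reconciliation check measures drift between the two:

```python
old_db, new_db = {}, {}  # stand-ins for the old and new databases

def write(key, value):
    """Dual write: the old DB is authoritative, the new DB is best-effort."""
    old_db[key] = value
    try:
        new_db[key] = value  # shadow write to the migration target
    except Exception:
        pass  # in practice: log and reconcile later, never fail the request

def read(key):
    return old_db.get(key)  # reads stay on the old DB until cutover

def drift():
    """Reconciliation: keys whose values disagree between the stores."""
    return {k for k in old_db if old_db.get(k) != new_db.get(k)}
```

Cutover only happens once drift() stays empty under production traffic; until then, the dual write is cheap insurance and trivially reversible.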

Implementation Checklist

  • Identify your primary access patterns before choosing a database
  • Start with PostgreSQL unless you have a specific reason not to
  • Add specialized databases only for proven performance problems
  • Document why each database was chosen (Architecture Decision Record)
  • Plan for data synchronization if using multiple databases
  • Set up monitoring for each database: query latency, connection count, storage growth
  • Implement connection pooling for every database connection
  • Design your schema for the most common query, not the most complex one
  • Test with realistic data volumes (not 100 rows — 1 million rows minimum)
  • Have a migration plan before you need one (because you will need one)
Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
