
Vector Databases: Architecture & Selection Guide

Understand vector database internals and choose the right one. Covers embedding storage, ANN algorithms, and comparisons of Pinecone, Weaviate, Qdrant, Milvus, and pgvector.

Vector databases store and search high-dimensional embeddings — the numerical representations that AI models create from text, images, audio, and other unstructured data. They power semantic search, recommendation engines, RAG (Retrieval-Augmented Generation) systems, anomaly detection, and de-duplication. As AI adoption accelerates, vector databases have become critical infrastructure.

This guide covers how vector search works internally, algorithm trade-offs, database selection criteria, and production best practices that prevent performance and cost surprises.


How Vector Search Works

Step 1: Generate Embeddings

An embedding model converts unstructured data (text, images) into fixed-length numerical vectors. Similar content produces similar vectors, enabling mathematical similarity comparison.

```python
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    input="How do I optimize PostgreSQL for high-traffic applications?",
    model="text-embedding-3-small"
)
embedding = response.data[0].embedding  # [0.012, -0.045, ...] (1536 floats)
```

Step 2: Choose a Distance Metric

The distance metric determines how “similarity” is calculated between vectors.

| Metric | Best For | How It Works |
|---|---|---|
| Cosine Similarity | Text embeddings (normalized vectors) | Measures angle between vectors — 1.0 = identical, 0.0 = unrelated |
| Euclidean (L2) | Image embeddings, spatial data | Measures straight-line distance — 0.0 = identical, higher = farther |
| Dot Product | When magnitude matters (popularity, relevance scoring) | Combines direction and magnitude — higher = more similar |

Rule of thumb: Use cosine similarity for text-based applications. Most embedding models produce normalized vectors where cosine similarity and dot product are equivalent.
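All three metrics take only a few lines of NumPy. The vectors below are made-up toy values; the final check illustrates the point above, that for unit-normalized vectors cosine similarity and dot product coincide:

```python
import numpy as np

a = np.array([0.3, 0.8, 0.5])
b = np.array([0.25, 0.9, 0.4])

# Cosine similarity: angle between vectors, magnitude ignored
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean (L2): straight-line distance
l2 = np.linalg.norm(a - b)

# Dot product: direction and magnitude combined
dot = np.dot(a, b)

# For unit-normalized vectors, cosine similarity equals the dot product
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert abs(np.dot(a_n, b_n) - cosine) < 1e-9
```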


ANN Algorithms Deep Dive

Exact nearest-neighbor search (brute force) is O(n) — too slow at scale. Approximate Nearest Neighbor (ANN) algorithms trade small accuracy losses for massive speed gains.

HNSW (Hierarchical Navigable Small World)

The most popular algorithm. HNSW builds a multi-layer graph (like a skip list) where higher layers contain fewer nodes for fast traversal, and lower layers contain all nodes for precision.

Key Parameters:

| Parameter | What It Controls | Recommended Range | Trade-off |
|---|---|---|---|
| `M` | Max connections per node | 16-64 | Higher = better recall, more memory |
| `ef_construction` | Build-time search width | 100-400 | Higher = better graph quality, slower build |
| `ef_search` | Query-time search width | 50-200 | Higher = better recall, slower queries |

When to use: Most production workloads. Best balance of speed, accuracy, and memory.

IVF (Inverted File Index)

Clusters vectors into partitions (Voronoi cells). At query time, searches only the closest clusters instead of the full dataset.

Key Parameters:

| Parameter | What It Controls | Recommended |
|---|---|---|
| `nlist` | Number of clusters | √(n) to 4×√(n) |
| `nprobe` | Clusters searched per query | 5-20% of nlist |

When to use: Very large datasets where memory is constrained. Often combined with Product Quantization (PQ) for compression.
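To make the mechanism concrete, here is a toy IVF index built from scratch in NumPy (a sketch with simplified training, not a production index): vectors are partitioned into `nlist` cells, and a query scans only the `nprobe` cells whose centroids are closest.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 64)).astype(np.float32)

# --- Build: assign every vector to its nearest of nlist centroids ---
nlist = 71  # ~ sqrt(n), per the table above
centroids = data[rng.choice(len(data), nlist, replace=False)].copy()
for _ in range(5):  # a few Lloyd iterations stand in for real k-means training
    assign = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    for c in range(nlist):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
cells = {c: np.where(assign == c)[0] for c in range(nlist)}

# --- Query: scan only the nprobe cells with the closest centroids ---
def ivf_search(q, nprobe=8, k=5):
    near = ((centroids - q) ** 2).sum(-1).argsort()[:nprobe]
    cand = np.concatenate([cells[c] for c in near])
    dists = ((data[cand] - q) ** 2).sum(-1)
    return cand[dists.argsort()[:k]]

# Probing every cell degenerates to exact search, so a stored vector
# is guaranteed to find itself as its own nearest neighbor.
ids = ivf_search(data[42], nprobe=nlist)
assert ids[0] == 42
```

Lowering `nprobe` trades recall for speed: vectors near a cell boundary may sit in a cell the query never visits, which is exactly the accuracy loss the comparison table below quantifies.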

Algorithm Comparison

| Algorithm | Speed | Memory | Accuracy | Build Time |
|---|---|---|---|---|
| HNSW | Fastest (sub-ms) | High (full vectors in memory) | Highest (99%+ recall) | Slow |
| IVF-PQ | Fast | Low (compressed vectors) | Good (90-95% recall) | Medium |
| IVF-Flat | Medium | Medium | Very Good (97%+ recall) | Fast |
| Flat (exact) | Slow (O(n)) | Low | Perfect (100%) | None |
| ScaNN | Very Fast | Medium | Very High | Medium |

Database Comparison

| Feature | Pinecone | Weaviate | Qdrant | Milvus | pgvector |
|---|---|---|---|---|---|
| Hosting | Fully managed | Both (cloud + self-hosted) | Both | Both (self-hosted + Zilliz Cloud) | PostgreSQL extension |
| Algorithm | Proprietary (optimized) | HNSW | HNSW | HNSW + IVF + ScaNN | HNSW + IVF |
| Hybrid search | Yes (dense + sparse) | Yes (BM25 + vector) | Yes (sparse + dense) | Yes | With tsvector |
| Max scale | Billions (serverless) | Billions | Billions | Billions | Tens of millions |
| Filtering | Metadata filtering (pre/post) | GraphQL-style filters | Payload filtering | Attribute filtering | Standard SQL WHERE |
| Multi-tenancy | Namespaces | Multi-tenant classes | Collection-level | Partitions | Schema-level |
| Best for | Simplicity, serverless | Multi-modal (text + images) | Raw performance | Large-scale ML pipelines | Existing PostgreSQL installations |
| Pricing | Pay-per-query (serverless) | Open source + cloud | Open source + cloud | Open source + Zilliz | Free (PostgreSQL) |

Implementation: Pinecone (Managed)

```python
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("knowledge-base")

# Upsert documents with metadata for filtering
index.upsert(vectors=[{
    "id": "doc-001",
    "values": embedding_vector,
    "metadata": {
        "source": "guide.pdf",
        "category": "cloud",
        "text": "How to optimize cloud costs...",
        "word_count": 1500,
        "published": "2025-01-15"
    }
}])

# Semantic search with metadata filter
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$eq": "cloud"}},
    include_metadata=True
)

for match in results['matches']:
    print(f"Score: {match['score']:.3f} | {match['metadata']['text'][:80]}")
```

Implementation: pgvector (Self-Hosted)

If you already run PostgreSQL, pgvector avoids adding a new database to your infrastructure.

```sql
-- Enable the extension
CREATE EXTENSION vector;

-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    category TEXT,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

-- Semantic search with SQL filtering
SELECT id, title,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE category = 'cloud'
ORDER BY embedding <=> $1::vector
LIMIT 10;
```

pgvector Limitations

pgvector works well up to 5-10 million vectors. Beyond that, query latency increases and memory requirements grow quickly. If you expect to exceed this scale, plan a migration path to a dedicated vector database.


Selection Decision Tree

Need managed, serverless, minimal ops? → Pinecone
Already using PostgreSQL and < 5M vectors? → pgvector
Need multi-modal (text + images + video)? → Weaviate
Need max query performance, self-hosted? → Qdrant or Milvus
Prototyping / local development? → Chroma or LanceDB
Need hybrid search (keyword + semantic)? → Pinecone, Weaviate, or Qdrant
Budget constrained, open-source preferred? → Qdrant or pgvector

Production Best Practices

Chunking Strategy

Chunking (splitting documents into smaller pieces before embedding) directly impacts retrieval quality. Too large and the embedding averages out important details. Too small and context is lost.

```python
# Recommended: ~500 tokens with 50-token overlap.
# Overlap prevents information loss at chunk boundaries.
# split_text is a placeholder for your chunking utility.
chunks = split_text(document, chunk_size=500, chunk_overlap=50)
```

| Content Type | Recommended Chunk Size | Overlap |
|---|---|---|
| Technical documentation | 400-600 tokens | 50-100 tokens |
| Legal contracts | 200-400 tokens | 50 tokens |
| Code files | Per function/class | 0 (natural boundaries) |
| FAQ/knowledge base | Per question-answer pair | 0 |
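A minimal sliding-window chunker, using whitespace-split words as a rough stand-in for model tokens (a real pipeline would count tokens with a tokenizer such as tiktoken):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks.

    Whitespace words approximate model tokens here; swap in a real
    tokenizer for production token counts.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(1200))
parts = chunk_text(doc, chunk_size=500, overlap=50)
assert len(parts) == 3                      # windows start at 0, 450, 900
assert parts[1].split()[0] == "word450"     # 50-word overlap with chunk 1
```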

Embedding Model Selection

| Model | Dimensions | Cost | Quality |
|---|---|---|---|
| text-embedding-3-small (OpenAI) | 1536 | $0.02/M tokens | Good |
| text-embedding-3-large (OpenAI) | 3072 | $0.13/M tokens | Better |
| Cohere embed-v3 | 1024 | $0.10/M tokens | Very Good |
| BGE-M3 (open source) | 1024 | Free (self-hosted) | Good |
| Voyage AI voyage-3 | 1024 | $0.06/M tokens | Excellent |

Cost Control

  • Dimensionality reduction — OpenAI’s embedding-3 models support dimensions parameter to reduce vector size (e.g., 1536 → 512) with minimal quality loss
  • Serverless vs provisioned — Use serverless (Pinecone) for variable workloads, provisioned for steady high throughput
  • Batch embedding — Embed documents in batches, not one at a time, to reduce API calls
  • Cache frequent queries — Cache embedding + results for repeated queries
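On the dimensionality-reduction point: OpenAI's `dimensions` parameter performs the shortening server-side, but the effect can be approximated locally by truncating a full-size embedding and re-normalizing (a sketch; this works because the embedding-3 models are trained so that leading dimensions carry most of the signal):

```python
import numpy as np

def shorten(embedding: np.ndarray, dims: int = 512) -> np.ndarray:
    """Truncate an embedding to its leading dims and re-normalize to unit length."""
    v = embedding[:dims]
    return v / np.linalg.norm(v)

full = np.random.rand(1536).astype(np.float32)   # stand-in for a real embedding
short = shorten(full, 512)

assert short.shape == (512,)
assert abs(np.linalg.norm(short) - 1.0) < 1e-5   # still unit length for cosine search
```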

Vector Database Selection Criteria

| Criteria | Pinecone | Weaviate | Milvus | Chroma | Qdrant |
|---|---|---|---|---|---|
| Deployment | Fully managed | Self-hosted + cloud | Self-hosted + Zilliz | Embedded + cloud | Self-hosted + cloud |
| Max vectors | Billions | Billions | Billions | Millions | Billions |
| Filtering | Metadata filtering | Hybrid (vector + BM25) | Attribute filtering | Metadata filtering | Payload filtering |
| Ease of use | Easiest | Good | Complex | Easiest | Good |
| Cost | Per-pod pricing | Free (self-hosted) | Free (self-hosted) | Free (self-hosted) | Free (self-hosted) |
| Best for | Production SaaS | Hybrid search | Large-scale enterprise | Prototyping, local dev | Performance-focused |

Embedding Dimension Trade-Offs

Higher dimensions capture more semantic nuance but increase storage and computation costs:

  • 256-384 dimensions — Sufficient for simple similarity search, fastest queries, lowest cost
  • 768-1024 dimensions — Good balance for most production use cases
  • 1536-3072 dimensions — Maximum quality for complex retrieval, highest cost
  • Quantization — Reduce storage 4x by converting float32 to int8 with less than 5 percent quality loss
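A sketch of the scalar int8 quantization idea, assuming a simple symmetric per-vector scale (production databases use more refined schemes, but the storage arithmetic is the same):

```python
import numpy as np

def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values onto int8 codes in [-127, 127]; returns codes + scale."""
    scale = float(np.abs(v).max()) / 127.0
    return np.round(v / scale).astype(np.int8), scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale

v = np.random.randn(1536).astype(np.float32)
codes, scale = quantize_int8(v)
restored = dequantize(codes, scale)

assert codes.nbytes == v.nbytes // 4           # 4x smaller storage
assert np.max(np.abs(restored - v)) <= scale   # bounded reconstruction error
```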

Checklist

  • Embedding model selected and benchmarked against your data
  • Chunking strategy tested with representative documents
  • Vector database selected based on scale, ops capability, and budget
  • Index parameters configured and tuned (M, ef_construction, ef_search)
  • Metadata schema designed for filtering (categories, dates, sources)
  • Retrieval quality measured (precision@k, recall@k, MRR)
  • Backup and disaster recovery planned (index snapshots, embedding re-generation)
  • Cost projections calculated for expected query volume and storage
  • Monitoring in place (query latency p99, index size, error rates)
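The retrieval-quality metrics in the checklist are a few lines each once you have a labeled evaluation set of (query, relevant document ids) pairs; a sketch with made-up ids:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none is retrieved)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

results = ["d7", "d2", "d9", "d1"]   # ranked ids returned by the vector DB
truth = {"d2", "d1"}                 # human-labeled relevant docs

assert recall_at_k(results, truth, k=2) == 0.5  # one of two relevant docs in top-2
assert mrr(results, truth) == 0.5               # first relevant hit at rank 2
```

Averaging these over a held-out query set is what makes index-parameter tuning (ef_search, nprobe) measurable rather than guesswork.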

:::note[Source]
This guide is derived from operational intelligence at Garnet Grid Consulting. For AI infrastructure consulting, visit garnetgrid.com.
:::

Jakub Dimitri Rezayev
Founder & Chief Architect • Garnet Grid Consulting

Jakub holds an M.S. in Customer Intelligence & Analytics and a B.S. in Finance & Computer Science from Pace University. With deep expertise spanning D365 F&O, Azure, Power BI, and AI/ML systems, he architects enterprise solutions that bridge legacy systems and modern technology — and has led multi-million dollar ERP implementations for Fortune 500 supply chains.
