
Embeddings and Vector DBs

RAG quality depends more on chunking and retrieval than on which vector database logo you pick — but the store still matters for ops, cost, and latency.

Last reviewed: June 2026

Embedding model dimensions and vector DB pricing change. Verify provider docs before production.

Embedding models

| Model (2026 examples) | Dimensions | Use when |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 (configurable) | Default product RAG; good cost/quality |
| OpenAI text-embedding-3-large | 3072 (configurable) | Higher retrieval quality, at higher cost |
| Cohere embed v3 | Varies by variant | Multilingual search |
| Open-source (e.g. via Ollama) | Varies by model | Air-gapped; you host inference |
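"Configurable" means text-embedding-3 vectors can be shortened. OpenAI's API exposes a `dimensions` request parameter for this, and its docs advise renormalizing if you shorten a full-length vector yourself. A minimal sketch of that client-side step, assuming the model supports shortening (text-embedding-3 does; do not assume other models do). `shortenEmbedding` is a hypothetical helper, and you must apply the same shortening at index and query time.

```typescript
// Shorten an embedding by keeping its first `dims` values, then rescale it to
// unit length so inner-product and cosine rankings agree again.
function shortenEmbedding(embedding: number[], dims: number): number[] {
  if (dims <= 0 || dims > embedding.length) {
    throw new Error(`invalid dims: ${dims}`);
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return head.map((x) => x / norm);
}

// Toy 3-dimensional vector shortened to 2 dimensions.
const short = shortenEmbedding([3, 4, 12], 2);
console.log(short); // [ 0.6, 0.8 ]
```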

For embedding code, see the OpenAI API embeddings section.

Rule: Use the same embedding model at index and query time. Re-embed the entire corpus when you change models, because vectors from different models are not comparable.
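One way to enforce this rule is to record the model name and dimension count when you build the index, then check them on every query. A sketch; `IndexMeta` and `assertCompatible` are hypothetical names, not part of any library.

```typescript
// Hypothetical metadata saved alongside the index when it is built.
interface IndexMeta {
  model: string; // embedding model used at index time
  dimensions: number; // vector length stored in the index
}

// Throws if the query vector came from a different model or has a different
// length than the indexed vectors.
function assertCompatible(meta: IndexMeta, queryModel: string, queryVector: number[]): void {
  if (queryModel !== meta.model) {
    throw new Error(`index built with ${meta.model}, query embedded with ${queryModel}; re-embed the corpus`);
  }
  if (queryVector.length !== meta.dimensions) {
    throw new Error(`expected ${meta.dimensions} dimensions, got ${queryVector.length}`);
  }
}

const meta: IndexMeta = { model: "text-embedding-3-small", dimensions: 1536 };
assertCompatible(meta, "text-embedding-3-small", new Array(1536).fill(0)); // passes
```

Checking the dimension count as well catches a mismatched `dimensions` setting even when the model name matches.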

Vector store comparison

| Store | Best for | Tradeoffs |
|---|---|---|
| pgvector (Postgres) | Already on Postgres; moderate scale | Ops you already know; you must tune indexes |
| Pinecone | Managed; fast iteration | Vendor lock-in; cost at scale |
| Weaviate / Qdrant | Self-host or cloud; hybrid search | Another service to operate |
| Redis Vector | Low-latency cache + vectors | Memory-bound; suits smaller corpora |
| S3 + batch | Offline reindex pipelines | Not for interactive, low-latency queries |

For codebase RAG in developer tools, MCP plus search often beats a full vector stack; see RAG for Codebases.

pgvector quick start

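```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_chunks (
  id bigserial PRIMARY KEY,
  source_path text NOT NULL,
  chunk_index int NOT NULL,
  content text NOT NULL,
  embedding vector(1536)  -- must match the embedding model's dimensions
);

-- Build the ivfflat index after loading data: it clusters the existing rows.
-- pgvector suggests lists ≈ rows / 1000 for up to 1M rows.
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```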
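```typescript
import { Pool } from "pg";

const db = new Pool(); // reads PGHOST, PGUSER, etc. from the environment

// Query nearest neighbors by cosine distance (<=>); queryEmbedding is a number[]
// from the index-time model. pgvector parses text like '[0.1,0.2,...]'.
const results = await db.query(
  `SELECT content, source_path, 1 - (embedding <=> $1::vector) AS score
   FROM document_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 8`,
  [JSON.stringify(queryEmbedding)],
);
```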
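With `vector_cosine_ops`, pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so the query's `1 - (embedding <=> $1)` score is cosine similarity and `ORDER BY` ascending puts the closest chunks first. The arithmetic, sketched in plain TypeScript for illustration only (the helper names are hypothetical; Postgres computes this for you):

```typescript
// Cosine distance as pgvector's <=> defines it: 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The query's score column: higher is more similar (1 = same direction).
const score = (a: number[], b: number[]): number => 1 - cosineDistance(a, b);

console.log(score([1, 0], [1, 0])); // 1
console.log(score([1, 0], [0, 1])); // 0
```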

Chunking for code

| Content | Chunk strategy |
|---|---|
| Markdown docs | By heading (~500–800 tokens) |
| Source files | Whole file if small; else by function/class |
| API specs | One endpoint per chunk |
| Tickets | One ticket per chunk |

Include source_path and line range in metadata for citations.
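A sketch of heading-based chunking for Markdown docs that keeps a 1-based line range on each chunk for citations. Assumptions: headings are ATX style (`#` to `######`), tokens are estimated at roughly 4 characters each instead of being counted with a real tokenizer, `#` lines inside fenced code are not special-cased, and the `Chunk` field names are hypothetical (the quick-start table has no line-range columns, so you would add them).

```typescript
interface Chunk {
  sourcePath: string;
  chunkIndex: number;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  content: string;
}

// Assumption: ~4 characters per token; use a real tokenizer in production.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

// Start a new chunk at every heading, and also when the current chunk
// would grow past maxTokens (800, the upper end of the range in the table).
function chunkMarkdown(sourcePath: string, text: string, maxTokens = 800): Chunk[] {
  const lines = text.split("\n");
  const chunks: Chunk[] = [];
  let start = 0; // 0-based index of the current chunk's first line

  // Emit lines[start, end) as a chunk, skipping whitespace-only runs.
  const flush = (end: number): void => {
    if (end <= start) return;
    const content = lines.slice(start, end).join("\n");
    if (content.trim() !== "") {
      chunks.push({ sourcePath, chunkIndex: chunks.length, startLine: start + 1, endLine: end, content });
    }
    start = end;
  };

  for (let i = 0; i < lines.length; i++) {
    if (/^#{1,6}\s/.test(lines[i])) {
      flush(i); // a heading opens a new section
    } else if (approxTokens(lines.slice(start, i + 1).join("\n")) > maxTokens) {
      flush(i); // budget exceeded: close the chunk before this line
    }
  }
  flush(lines.length);
  return chunks;
}

const chunks = chunkMarkdown("docs/setup.md", "# Install\nRun npm install.\n\n# Configure\nSet API_KEY.");
console.log(chunks.map((c) => `${c.sourcePath}:${c.startLine}-${c.endLine}`));
// [ 'docs/setup.md:1-3', 'docs/setup.md:4-5' ]
```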

Production concerns

| Concern | What to do |
|---|---|
| Cost | Batch embed offline; cache query embeddings |
| Latency | Top-k 5–10; rerank only if needed |
| Staleness | Reindex on merge to main; version the index |
| Security | Isolate tenants; never share an index across customers |
| Eval | Measure recall@k on golden questions (see Observability) |
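A minimal recall@k sketch for golden-question evals: for each question, the fraction of known-relevant chunk IDs that appear in the top k results, averaged over the set. `GoldenQuestion`, `Retriever`, and the toy data are hypothetical; plug in your real retrieval call.

```typescript
// A golden question plus the chunk IDs a correct answer should retrieve.
interface GoldenQuestion {
  query: string;
  relevantIds: string[];
}

// Hypothetical retriever: returns chunk IDs ranked best-first.
type Retriever = (query: string, k: number) => Promise<string[]>;

// recall@k = |relevant ∩ top-k| / |relevant|, averaged over all questions.
async function recallAtK(golden: GoldenQuestion[], retrieve: Retriever, k: number): Promise<number> {
  if (golden.length === 0) return 0;
  let total = 0;
  for (const q of golden) {
    if (q.relevantIds.length === 0) throw new Error(`no relevant IDs for: ${q.query}`);
    const topK = new Set((await retrieve(q.query, k)).slice(0, k));
    const hits = q.relevantIds.filter((id) => topK.has(id)).length;
    total += hits / q.relevantIds.length;
  }
  return total / golden.length;
}

// Toy retriever and golden set for illustration.
const fake: Retriever = async (query) =>
  query === "reset password" ? ["auth.md#2", "faq.md#7"] : ["misc.md#1"];
const golden: GoldenQuestion[] = [
  { query: "reset password", relevantIds: ["auth.md#2"] }, // hit: 1
  { query: "rotate API key", relevantIds: ["keys.md#1", "keys.md#3"] }, // miss: 0
];
recallAtK(golden, fake, 5).then((r) => console.log(r)); // 0.5
```

Track this number per index version so a reindex or model change that lowers recall shows up before release.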