Embeddings and Vector DBs
RAG quality depends more on chunking and retrieval than on which vector database logo you pick — but the store still matters for ops, cost, and latency.
Last reviewed: June 2026
Embedding model dimensions and vector DB pricing change. Verify provider docs before production.
Embedding models
| Model (2026 examples) | Dimensions | Use when |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 (configurable) | Default product RAG; good cost/quality |
| OpenAI text-embedding-3-large | 3072 | Higher retrieval precision |
| Cohere embed v3 | Varies | Multilingual search |
| Open-source (e.g. via Ollama) | Varies | Air-gapped; you host inference |
Code: see the OpenAI API embeddings section.
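Rule: Use the same embedding model at index time and at query time. Vectors from different models (or different dimension settings) are not comparable, so re-embed the entire corpus whenever the model changes.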
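One way to enforce the rule above is to store the model name and dimension next to the index and check them before every query. This is a minimal sketch: `indexMeta` and `assertCompatible` are hypothetical names, not part of any library.

```javascript
// Hypothetical metadata written once, when the index is built.
const indexMeta = { model: "text-embedding-3-small", dimensions: 1536 };

// Throws if a query embedding cannot be compared with the stored vectors.
function assertCompatible(meta, queryModel, queryEmbedding) {
  if (queryModel !== meta.model) {
    throw new Error(
      `Index built with ${meta.model}, query uses ${queryModel}: re-embed the corpus first`
    );
  }
  if (queryEmbedding.length !== meta.dimensions) {
    throw new Error(
      `Expected ${meta.dimensions} dimensions, got ${queryEmbedding.length}`
    );
  }
}
```

Calling `assertCompatible(indexMeta, modelName, queryEmbedding)` just before the database query turns a silent relevance regression into a loud error.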
Vector store comparison
| Store | Best for | Tradeoffs |
|---|---|---|
| pgvector (Postgres) | Already on Postgres; moderate scale | Ops you already know; tune indexes |
| Pinecone | Managed; fast iteration | Vendor lock-in; cost at scale |
| Weaviate / Qdrant | Self-host or cloud; hybrid search | Operate another service |
| Redis Vector | Low-latency cache + vectors | Smaller corpora |
| S3 + batch | Offline reindex pipelines | Not for interactive low-latency |
For codebase RAG in dev tools, MCP plus plain search often beats a full vector stack. See RAG for Codebases.
pgvector quick start
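```sql
-- Enable pgvector (the extension must be installed on the server)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE document_chunks (
  id bigserial PRIMARY KEY,
  source_path text NOT NULL,
  chunk_index int NOT NULL,
  content text NOT NULL,
  embedding vector(1536)  -- must match the embedding model's output dimension
);

-- Approximate nearest-neighbor index for cosine distance.
-- Create it after loading data: ivfflat builds its lists from existing rows.
CREATE INDEX ON document_chunks USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```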
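The `lists = 100` value is a tuning knob, not a constant. The sketch below encodes the starting points suggested in the pgvector README (rows / 1000 up to 1M rows, sqrt(rows) above that, and probes near sqrt(lists)). `suggestIvfflatParams` is a hypothetical helper; treat its output as a first guess to validate against your own recall and latency measurements.

```javascript
// Starting-point heuristic for ivfflat tuning; not a guarantee of good recall.
function suggestIvfflatParams(rowCount) {
  const lists =
    rowCount <= 1000000
      ? Math.max(1, Math.round(rowCount / 1000))
      : Math.round(Math.sqrt(rowCount));
  // Higher probes improves recall at the cost of query speed.
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}
```

For a table of about 100,000 chunks this yields `lists = 100`, matching the index above, with `probes = 10`, which can be applied per session with `SET ivfflat.probes = 10;`.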
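```javascript
// Query nearest neighbors by cosine distance (<=>); score is cosine similarity.
// Assumes `db` is a node-postgres-style client and `queryEmbedding` is a number[].
// pgvector parses the '[x,y,...]' text form, which JSON.stringify produces.
const results = await db.query(`
  SELECT content, source_path, 1 - (embedding <=> $1::vector) AS score
  FROM document_chunks
  ORDER BY embedding <=> $1::vector
  LIMIT 8
`, [JSON.stringify(queryEmbedding)]);
```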
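In pgvector, `<=>` is cosine distance, so `1 - (embedding <=> $1)` in the query above is cosine similarity. The following sketch computes the same quantity in plain JavaScript, which can be handy for spot-checking scores in tests. `cosineSimilarity` is an illustrative helper, not a library function.

```javascript
// Cosine similarity of two equal-length numeric vectors.
// Returns NaN if either vector is all zeros (its norm is 0).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, and opposite directions score -1, so ordering by ascending distance is the same as ordering by descending score.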
Chunking for code
| Content | Chunk strategy |
|---|---|
| Markdown docs | By heading (~500–800 tokens) |
| Source files | Whole file if small; else by function/class |
| API specs | One endpoint per chunk |
| Tickets | One ticket per chunk |
Include source_path and line range in metadata for citations.
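As an illustration of the "by heading" strategy with citation metadata, here is a minimal sketch that splits a Markdown file at headings and records 1-based line ranges. `chunkMarkdownByHeading` and the `start_line`/`end_line` field names are assumptions made for this example. It does not enforce the ~500–800 token target and would treat a `#` line inside a fenced code block as a heading, so a real pipeline needs more care.

```javascript
// Split Markdown into one chunk per heading section, with line ranges for citations.
function chunkMarkdownByHeading(sourcePath, text) {
  const lines = text.split("\n");
  const chunks = [];
  let start = 0; // 0-based index of the first line of the current section

  // Emit lines[start, end) as a chunk, trimming blank lines at both edges.
  const flush = (end) => {
    let first = start;
    let last = end;
    while (last > first && lines[last - 1].trim() === "") last--;
    while (first < last && lines[first].trim() === "") first++;
    if (first === last) return; // section was empty
    chunks.push({
      source_path: sourcePath,
      chunk_index: chunks.length,
      start_line: first + 1, // 1-based, inclusive
      end_line: last, // 1-based, inclusive
      content: lines.slice(first, last).join("\n"),
    });
  };

  for (let i = 0; i < lines.length; i++) {
    // An ATX heading (# to ######) starts a new section.
    if (/^#{1,6}\s/.test(lines[i]) && i > start) {
      flush(i);
      start = i;
    }
  }
  flush(lines.length);
  return chunks;
}
```

Each returned object maps onto a `document_chunks` row, with the line range stored in extra columns or a metadata field so answers can cite `source_path` plus lines.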
Production concerns
| Concern | What to do |
|---|---|
| Cost | Batch embed offline; cache query embeddings |
| Latency | Top-k 5–10; rerank only if needed |
| Staleness | Reindex on merge to main; version the index |
| Security | Tenant isolation; no cross-customer index |
| Eval | Measure recall@k on golden questions; see Observability |
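The Eval row can be made concrete with a small recall@k calculation. This sketch assumes each golden question has already been run through retrieval, so each example carries a list of `relevant` chunk IDs and a ranked list of `retrieved` IDs. `recallAtK` and the example shape are hypothetical, not taken from an evaluation library.

```javascript
// Mean recall@k: for each question, the fraction of relevant chunks
// that appear in the top k retrieved results, averaged over questions.
function recallAtK(examples, k) {
  if (examples.length === 0) throw new Error("No examples to evaluate");
  let total = 0;
  for (const { relevant, retrieved } of examples) {
    const topK = new Set(retrieved.slice(0, k));
    const hits = relevant.filter((id) => topK.has(id)).length;
    total += hits / relevant.length;
  }
  return total / examples.length;
}

// Tiny illustrative golden set (IDs are made up).
const golden = [
  { relevant: ["a", "b"], retrieved: ["a", "x", "b"] },
  { relevant: ["c"], retrieved: ["y", "z", "c"] },
];
```

With this data, `recallAtK(golden, 2)` is 0.25 and `recallAtK(golden, 3)` is 1, which shows how sensitive the metric is to the top-k setting in the Latency row.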