AI in production · 12 JAN 2026 · 8 min read

Choosing a vector store in 2026

The vector-store decision is the most over-discussed and under-consequential choice in the modern AI stack. We have shipped retrieval-heavy production systems on Postgres, Qdrant, Pinecone, Weaviate, and a custom in-process index, and the choice has rarely been the thing that determined whether the project succeeded. What has determined success is whether the team understood the cost envelope, the operational profile, and the failure modes of the choice they made. This piece is a 2026 working note on how we make that choice on actual engagements.

The honest default is Postgres

For a project that expects to embed and retrieve fewer than ten million chunks, with query latency requirements above 100 milliseconds at the 95th percentile, with a team that already operates a Postgres instance — the honest default is pgvector on the existing Postgres. Not because Postgres is the best vector store. It is not. It is because the operational cost of adding a second database to a project that does not strictly need one is, in our experience, the largest hidden cost in a retrieval system.

Postgres with pgvector handles the scale that most projects actually have. It runs in the customer's existing infrastructure. It backs up with the rest of the database. It is operated by people who already know how to operate it. The query language is SQL, which means the rest of the application can join vector queries against business data without a second integration. These are properties that compound over the lifetime of the system, and they are properties that the dedicated vector stores cannot match without significant operational investment.

-- pgvector with HNSW index, the 2026 default
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Raise hnsw.ef_search above its default of 40 if recall drops
-- under selective filters such as the tenant_id predicate below.
SET hnsw.ef_search = 100;

SELECT id, content
FROM documents
WHERE tenant_id = $1
ORDER BY embedding <=> $2
LIMIT 8;

When Postgres is not enough

There are real cases where pgvector is the wrong choice. The first is scale — past roughly fifty million embeddings, the index-build times and the memory pressure on a single Postgres instance become operationally painful enough that a dedicated store is justified. The second is latency — if the application is bound by single-digit-millisecond query times at high concurrency, a purpose-built store will hold up better. The third is filtering complexity — if every query needs to combine a vector search with a complex pre-filter against a large attribute space, the dedicated stores have invested more in this than pgvector has.
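The memory pressure at that scale is easy to estimate on the back of an envelope. A minimal sketch, where the corpus size, dimensionality, and per-link byte counts are illustrative assumptions rather than measurements of any particular deployment:

```python
# Rough lower bound on HNSW index memory in pgvector: raw float4
# vectors plus layer-0 neighbour links. Real indexes carry additional
# per-tuple overhead, so treat this as a floor, not a forecast.

def hnsw_memory_gb(n_vectors: int, dims: int, m: int = 16) -> float:
    """Estimate index size: dims * 4 bytes per vector, plus roughly
    2 * m layer-0 links of ~8 bytes per node."""
    vector_bytes = n_vectors * dims * 4   # float4 components
    link_bytes = n_vectors * 2 * m * 8    # layer-0 neighbour lists
    return (vector_bytes + link_bytes) / 1024**3

# Hypothetical workload: 50M chunks at 1536 dimensions.
print(f"{hnsw_memory_gb(50_000_000, 1536):.0f} GB")  # ~298 GB
```

Numbers in that range are what push a single Postgres instance into painful territory: the index no longer fits comfortably in memory, and rebuild times stretch from minutes into hours.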

The fourth case is multi-tenancy at extreme scale, which is its own discussion and is the case where Pinecone, in particular, has historically earned its keep. If the system needs to host distinct vector spaces for thousands of customers, with hard isolation guarantees and per-tenant index lifecycles, a managed service is almost always the right call.

What we actually consider

On a real engagement, the questions we ask are not 'which vector store is the best' — they are: how many embeddings will the system hold at the end of year two, what is the expected query rate, what is the latency budget, what is the existing infrastructure, what is the team's operational capability, and what is the cost ceiling. The answers usually point at a small number of viable options. The choice between them is then made on cost and on which option creates the smallest second-system effect for the rest of the architecture.

The question is not which vector store is best. The question is which vector store costs the team the least to operate over the next three years.

Embeddings are the bigger choice

We spend more time on the embedding-model choice than on the store choice, because the embedding model determines both the retrieval quality and the cost envelope, and changing it later is expensive. In 2026 we default to one of three options — OpenAI's text-embedding-3-small for cost-sensitive cases, Voyage AI's voyage-3 for cases that need higher recall on technical content, or a self-hosted bge-m3 for cases where data residency requirements rule out hosted APIs. Each of these has shipped for us multiple times, and each has a known cost and operational profile.

The mistake we see most often is teams that pick the highest-quality available embedding model on the basis of a benchmark and then discover, in production, that the cost of re-embedding their corpus quarterly is the largest line item in the entire system. The right framing is to pick the lowest-cost embedding model that meets the retrieval quality bar, not the highest-quality model that fits the budget.
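The arithmetic behind that line item is worth doing before the model is chosen, not after. A sketch with illustrative numbers — the corpus size, tokens per chunk, and per-token prices below are assumptions for the exercise, not vendor quotes:

```python
# Cost of one full re-embedding pass over a corpus, to make the
# quality-vs-cost trade-off concrete before committing to a model.

def reembed_cost_usd(chunks: int, tokens_per_chunk: int,
                     usd_per_million_tokens: float) -> float:
    """Cost in USD of embedding the entire corpus once."""
    total_tokens = chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Hypothetical 20M-chunk corpus at ~500 tokens per chunk.
cheap = reembed_cost_usd(20_000_000, 500, usd_per_million_tokens=0.02)
premium = reembed_cost_usd(20_000_000, 500, usd_per_million_tokens=0.18)
print(f"cheap: ${cheap:,.0f}, premium: ${premium:,.0f}")  # $200 vs $1,800
```

Multiply by four quarterly re-embeds a year, and the gap between a cheap model and a premium one stops being a rounding error.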

Hybrid is usually the right answer

Pure vector search rarely outperforms hybrid search — a combination of vector similarity and a traditional lexical method like BM25 — on real production data. Vector search excels at semantic similarity. Lexical search excels at exact-term matching. Real queries contain both. A hybrid system, with a small re-ranker on top, almost always outperforms either alone, and the engineering cost of the hybrid is modest.

Postgres with pgvector and the built-in tsvector lexical index is a workable hybrid system out of the box. Most production retrieval systems we have shipped use exactly this combination, with a small cross-encoder for the final re-ranking step. The total stack is two pieces of software, both operated by the existing infrastructure team, and the retrieval quality is, in most cases, competitive with a dedicated vector platform at a fraction of the operational cost.
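The fusion step that merges the two ranked lists is small enough to sketch in full. Reciprocal-rank fusion (RRF) is one common choice; the document IDs and the k constant below are illustrative:

```python
# Reciprocal-rank fusion: score each document by summing 1 / (k + rank)
# across every ranked list it appears in, then sort by total score.
# k = 60 is the value from the original RRF paper, not a tuned setting.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7", "d2"]   # from the embedding query
lexical_hits = ["d1", "d9", "d3", "d4"]  # from the tsvector query
print(rrf([vector_hits, lexical_hits])[:3])  # ['d1', 'd3', 'd9']
```

Documents that appear near the top of both lists win, which is exactly the behaviour a hybrid system wants before the cross-encoder sees the candidates.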

The 2026 advice in two sentences

Default to pgvector on your existing Postgres. Move to a dedicated vector store only when the cost of staying on Postgres is concretely larger than the operational cost of adding a second database to your stack — and on a real engagement, that crossover happens later than the marketing materials of the dedicated stores suggest.

Author
Dima Livshitz
Founder & Engineering Lead

Twelve years building production software for marketplaces, banks, and platform teams. Writes about AI engineering, delivery, and the parts of consulting nobody likes to publish.
