Vector Databases Sold You a Lie
You spent six months migrating your RAG pipeline to a dedicated vector database, and your AI agents still hallucinate like college freshmen at last call.
The irony? Your retrieval latency is actually fine. The vector search returns candidates in under 50ms. The embeddings are solid. The model’s context window isn’t even half full.
So why does your agent still answer “What’s the capital of France?” with “A city in Europe” instead of “Paris”?
Because vector databases optimize for the wrong thing. They solve nearest neighbor search beautifully. They fail at the actual problem: getting the right data to your model in a format it can use. And that problem? SQL has been solving it for fifty years.
The Similarity Score Mirage
Here’s what nobody tells you in the vector database hype cycle: approximate nearest neighbor (ANN) search doesn’t care about relevance. It cares about distance.
Think of it like airport security lines. The TSA agent (your vector DB) pulls you out of line because you sort of match a profile. But they don’t ask why you match, or whether matching matters. They just hand you off to an interrogator (your LLM) and say “This one looks suspicious.”
| Capability | Vector DB | SQL |
|---|---|---|
| Search type | Approximate | Exact |
| Relevance guarantee | None | Yes (joins, constraints) |
| Metadata filtering | After search | Before/during |
| Set operations | Manual | Native |
| Time-series aware | No | Yes (window functions) |
BEIR, a widely used retrieval benchmark, shows that even top-tier embedding models plateau around 60-70% retrieval accuracy on standard datasets. Meanwhile, a simple SQL WHERE clause hits 100% on exact matches every time.
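To make the distance-versus-relevance gap concrete, here is a minimal sketch in plain Python. The three-dimensional vectors, the document texts, and the `top_k` and `exact_lookup` helpers are all made up for illustration; real embeddings have hundreds of dimensions and real systems use an ANN index rather than a brute-force sort. The point it tries to show: a top-k similarity search always hands back k candidates, whether or not any of them answers the question, while an exact lookup returns the matching row or nothing.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical corpus of (text, toy embedding). None of these contains the answer.
CORPUS = [
    ("Berlin is the capital of Germany.", [0.9, 0.1, 0.0]),
    ("France is a country in Europe.", [0.8, 0.3, 0.1]),
    ("Croissants are a French pastry.", [0.2, 0.9, 0.1]),
]

def top_k(query_vec, corpus, k):
    """Brute-force nearest neighbors: rank by similarity, keep the first k."""
    ranked = sorted(
        corpus,
        key=lambda doc: cosine_similarity(query_vec, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def exact_lookup(capitals, country):
    """Exact match: returns the stored answer or None, never a near miss."""
    return capitals.get(country)

# Toy embedding standing in for "What's the capital of France?"
query = [0.85, 0.2, 0.05]

similar = top_k(query, CORPUS, k=2)                     # always two results
answer = exact_lookup({"Germany": "Berlin"}, "France")  # None: no such row
```

The similarity search returns two plausible-looking sentences about capitals and France, neither of which says "Paris". The exact lookup admits it has no answer, which is a far easier failure to handle downstream.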
The Filter-After-Filter Lie
Your vector database claims it supports filtering. Read the fine print. Most do filtered search by running ANN first, then applying your filters to the results.
That sub-50ms search? Still happens. But now you’re filtering 1,000 results down to 3, and the top 10 candidates from the vector scan don’t even satisfy your filter criteria. Your agent gets garbage because the most similar vectors didn’t survive the WHERE clause.
I watched a team at a Series B fintech company spend three months debugging why their agent couldn’t find “recent transactions over $1,000.” Turns out, their vector DB was returning transactions with similar text descriptions from six years ago, and the date filter then threw those candidates away. The 2024 transactions never made it into the candidate set, because the vector search never considered recency.
Failures like this come down to three points:
- Vector DBs optimize for cosine similarity, not business logic.
- Metadata filters run as a post-processing step in most implementations.
- You cannot express “find me the 5 most similar documents that also satisfy {X, Y, Z}” as a single atomic operation.
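The difference between filtering after the search and filtering before it is easy to reproduce on toy data. The sketch below is an illustration under stated assumptions, not how any particular vector database works internally: the transactions, their similarity scores, and the `post_filter` and `pre_filter` helpers are all hypothetical, and the scores are hard-coded rather than computed from embeddings.

```python
from datetime import date

# Hypothetical transactions: (id, similarity to the query, date, amount).
# Similarity scores are made up; a real system would derive them from embeddings.
TRANSACTIONS = [
    (1, 0.95, date(2018, 3, 1), 1500),
    (2, 0.93, date(2018, 7, 9), 2200),
    (3, 0.91, date(2019, 1, 15), 1800),
    (4, 0.70, date(2024, 5, 2), 1200),
    (5, 0.65, date(2024, 6, 20), 3000),
    (6, 0.60, date(2024, 6, 25), 40),
]

def matches(txn, since, min_amount):
    """Business rule: recent enough and over the amount threshold."""
    _, _, when, amount = txn
    return when >= since and amount > min_amount

def post_filter(txns, k, since, min_amount):
    """Rank by similarity, keep the top k, then apply the filter."""
    top = sorted(txns, key=lambda t: t[1], reverse=True)[:k]
    return [t[0] for t in top if matches(t, since, min_amount)]

def pre_filter(txns, k, since, min_amount):
    """Apply the filter first, then rank the survivors by similarity."""
    eligible = [t for t in txns if matches(t, since, min_amount)]
    top = sorted(eligible, key=lambda t: t[1], reverse=True)[:k]
    return [t[0] for t in top]

since = date(2024, 1, 1)
post = post_filter(TRANSACTIONS, k=3, since=since, min_amount=1000)
pre = pre_filter(TRANSACTIONS, k=3, since=since, min_amount=1000)
```

With these numbers, `post` comes back empty because the three most similar transactions are all years old, while `pre` finds the two 2024 transactions over $1,000. Same data, same filter, different order of operations.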
The Row Store You Already Own
Here’s the uncomfortable truth: your data lives in a relational database. Your embeddings live in a separate vector DB. Your application code lives in between, shuttling IDs back and forth.
This is the worst possible architecture. You’ve recreated the N+1 query problem, but now with 768-dimensional floating-point vectors instead of integer primary keys.
    # What you're probably doing: one extra round trip per candidate.
    # (vector_db, postgres, and passes_filters stand in for your own clients and helpers.)
    candidate_rows = []
    vector_ids = vector_db.search(query_embedding, top_k=10)
    for vid in vector_ids:
        row = postgres.query("SELECT * FROM documents WHERE id = $1", vid)
        # Now loop again to check business rules, time filters, user permissions...
        if passes_filters(row):
            candidate_rows.append(row)

    # What you should be doing: filters and similarity in a single query.
    candidate_rows = postgres.query("""
        SELECT * FROM documents
        WHERE embedding <=> $1 < 0.6   -- cosine distance cutoff
          AND created_at > NOW() - INTERVAL '30 days'
          AND user_id = $2
        ORDER BY embedding <=> $1      -- pgvector HNSW or IVFFlat index serves this ordering
        LIMIT 5
    """, query_embedding, user_id)
Postgres 14+ with pgvector doesn’t just support vector search. It integrates it into the query planner. The optimizer decides whether to do the vector scan first or the filter first. It respects transactions, constraints, and replication. It doesn’t ship your data over the network twice.
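If you want to try the "one query" shape without standing up Postgres, here is a rough, runnable stand-in using Python’s built-in `sqlite3` module. It is a sketch of the pattern only: SQLite has no vector index, so `cosine_distance` is a hypothetical user-defined function registered just for this example (it plays the role of pgvector’s `<=>` operator), embeddings are stored as JSON text, and every query is a brute-force scan. The table layout and sample rows are assumptions too.

```python
import json
import math
import sqlite3

def cosine_distance(a_json, b_json):
    """Cosine distance between two JSON-encoded vectors (stand-in for pgvector's <=>)."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

conn = sqlite3.connect(":memory:")
conn.create_function("cosine_distance", 2, cosine_distance)
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, embedding TEXT)"
)
rows = [
    (1, 42, "2024-06-01", [1.0, 0.0]),  # right user, recent, closest
    (2, 42, "2018-01-01", [1.0, 0.0]),  # right user, close, but stale
    (3, 7, "2024-06-01", [1.0, 0.0]),   # close and recent, wrong user
    (4, 42, "2024-06-10", [0.6, 0.8]),  # right user, recent, farther away
]
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [(i, u, d, json.dumps(e)) for i, u, d, e in rows],
)

def search(conn, query_vec, user_id, since, limit=5):
    """Permission filter, recency filter, and similarity ranking in one statement."""
    sql = (
        "SELECT id FROM documents "
        "WHERE user_id = ? AND created_at >= ? "
        "ORDER BY cosine_distance(embedding, ?) "
        "LIMIT ?"
    )
    params = (user_id, since, json.dumps(query_vec), limit)
    return [row[0] for row in conn.execute(sql, params)]

result = search(conn, [1.0, 0.0], user_id=42, since="2024-01-01")
```

The stale row and the other user’s row never reach the application, and there is no second round trip to hydrate IDs. In Postgres with pgvector the same statement would use the real distance operator and, where the planner chooses it, an approximate index instead of a full scan.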
The Hidden Cost of Yet Another Service
Your vector database needs its own cluster, its own backups, its own capacity planning, its own monitoring dashboard. Each of these is a vector for failure.
Netflix’s engineering blog documented a case where a separate vector service added 150ms of P99 latency to their recommendation pipeline. The search itself wasn’t slow; the network round-trip between their microservice and the vector store created tail latency amplification.
When your LLM needs to answer a question in under 2 seconds to feel “reactive,” every 50ms hop matters. A three-service pipeline (LLM → router → vector DB → PostgreSQL → LLM) adds 200-400ms of overhead before a single line of business logic runs.
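The arithmetic behind that overhead is worth spelling out. This back-of-the-envelope sketch assumes four sequential hops (the four arrows in the pipeline above), a per-hop cost of 50 to 100ms (the 100ms upper bound is my assumption, picked because it reproduces the 200-400ms range), and hops whose slow responses are independent of each other. Real systems will violate at least one of those assumptions, so treat the numbers as a rough guide.

```python
def added_latency_ms(hops, per_hop_ms):
    """Total overhead when the hops run one after another."""
    return hops * per_hop_ms

def chance_of_slow_hop(hops, tail_probability=0.01):
    """Probability that at least one of `hops` independent calls lands in its own tail.

    A tail_probability of 0.01 means "slower than that hop's P99".
    """
    return 1.0 - (1.0 - tail_probability) ** hops

HOPS = 4  # LLM -> router -> vector DB -> PostgreSQL -> LLM

low = added_latency_ms(HOPS, 50)    # optimistic per-hop cost
high = added_latency_ms(HOPS, 100)  # pessimistic per-hop cost (assumed)
slow = chance_of_slow_hop(HOPS)     # share of requests that hit some hop's P99
```

Under these assumptions, roughly 4% of requests hit at least one hop’s worst-case latency, even though each hop only misbehaves 1% of the time. That is the tail latency amplification in miniature: every service you add makes the slow path more common.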
Stop treating vector search like magic. Start treating it like what it is: a specialized indexing strategy that belongs inside your existing database, not a separate god-service that your entire agent depends on.
- Embeddings are just features. Store them in a column that supports nearest-neighbor indexing.
- Metadata is first-class, not an afterthought. Let the query planner handle the join.
- One database, one set of constraints, one latency budget.
The Real Talk Your Architect Won’t Have
Vector databases solve a toy problem: “find similar things.” Your production system solves a hard problem: “find similar things that are relevant to this specific user, within these specific constraints, under this time limit, without violating data consistency.”
The industry is slowly admitting this. Pinecone released metadata filtering. Weaviate added hybrid search. But these are band-aids on an architecture built around something that should have been a feature, not a product.
Next time your AI agent hallucinates a customer’s name or recommends a product that’s out of stock, don’t debug the prompt. Debug the pipeline. Your vector database isn’t the bottleneck because it’s slow. It’s the bottleneck because it’s solving the wrong problem.