# Vector Databases in 2026
The vector database market in 2026 has reached a point where “which vector DB should we use?” is no longer a technology selection question but an architectural decision with implications for latency, operational costs and the scalability of the entire AI stack. Pinecone dominates the managed segment with a 70% market share, Rust-based Qdrant crushes open-source benchmarks, Weaviate bets on hybrid search, and pgvector has found its way into every PostgreSQL deployment. This article gives you the data — benchmarks, pricing, architectural trade-offs — so you can decide based on facts, not marketing.
## Why Vector Databases in 2026
A vector database stores data as high-dimensional vectors (embeddings) and enables similarity search — finding the most similar vectors to a given query. This is the foundation for RAG (Retrieval-Augmented Generation), semantic search, recommendation engines and anomaly detection.
In 2026, the question is not whether you need a vector database — if you are building anything with LLMs, you do. The question is which one. And the answer depends on your specific use case: how many vectors you store, what latency you tolerate, whether you need metadata filtering, hybrid search, multi-tenancy, and how much you are willing to pay.
- $4.3B: projected vector DB market by 2028
- 89%: share of RAG pipelines using a vector DB
- <10 ms: P99 latency of top-tier solutions
- 1536: dimensions of OpenAI text-embedding-3-small
## Index Architecture: HNSW, IVF and Flat Search
All four databases solve the same fundamental problem: Approximate Nearest Neighbor (ANN) search — finding the k most similar vectors from millions of candidates in sub-linear time. They differ in which indexing algorithm they use and how they implement it.
### HNSW (Hierarchical Navigable Small World)
HNSW is today’s de facto standard. It creates a multi-layer graph where upper layers have sparse connections for fast navigation and lower layers have dense connectivity for precision. Key parameters are M (connections per node) and efConstruction (graph quality during build). HNSW achieves recall >0.99 at sub-millisecond latencies, but requires the entire index in RAM. That is its main trade-off: performance for memory. A configuration sketch follows the list below.
- Pinecone — proprietary HNSW variant with internal optimisations; user has no access to parameters
- Qdrant — HNSW as the primary index; full control over M, ef_construct, full_scan_threshold
- Weaviate — HNSW with dynamic compression (Product Quantization); supports PQ+HNSW to reduce memory footprint
- pgvector — supports HNSW since version 0.5.0; m and ef_construction configurable per index
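To make these knobs concrete, here is a minimal sketch of creating an HNSW-backed collection with the Python qdrant-client. The collection name and parameter values are illustrative starting points, not tuned recommendations:

```python
# Minimal HNSW configuration sketch; assumes a local Qdrant instance.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs",  # hypothetical collection name
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,              # connections per node: higher = better recall, more RAM
        ef_construct=200,  # graph quality during build: higher = slower build
    ),
)
```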
### IVF (Inverted File Index)
IVF divides the vector space into clusters (Voronoi cells) and searches only the nearest clusters at query time (nprobe). It is more memory-efficient than HNSW but slower on small datasets. pgvector implements IVFFlat as its second index type — suitable for scenarios where RAM is the limiting factor.
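A minimal sketch of the pgvector side, using psycopg and assuming a hypothetical items table with a 1536-dimensional embedding column:

```python
# IVFFlat index sketch for pgvector; table and column names are hypothetical.
import psycopg

with psycopg.connect("dbname=app") as conn:
    # lists ≈ rows / 1000 is a common starting point for datasets up to ~1M rows
    conn.execute(
        "CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) "
        "WITH (lists = 1000)"
    )
    # At query time, probes controls how many clusters are scanned:
    # higher probes = better recall, higher latency.
    conn.execute("SET ivfflat.probes = 10")
```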
### When Flat Search Is Enough
Below 10,000 vectors, brute-force (flat) search is often faster than an ANN index because you avoid the overhead of building and maintaining the graph. pgvector without an index + a WHERE clause on metadata is the ideal starting point for small datasets. Add an index only when latency exceeds your SLO.
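For illustration, a flat (index-free) query with a metadata filter might look like this; the table, columns and tenant value are hypothetical, and the query vector is passed as a pgvector text literal to avoid extra client dependencies:

```python
# Flat search sketch: no ANN index, just a sequential scan with a WHERE clause.
import psycopg

query_vec = "[" + ",".join("0.1" for _ in range(1536)) + "]"  # placeholder embedding

with psycopg.connect("dbname=app") as conn:
    rows = conn.execute(
        "SELECT id, content FROM documents "
        "WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s::vector "  # <=> is cosine distance
        "LIMIT 10",
        ("acme", query_vec),
    ).fetchall()
```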
## Comparison: Pinecone vs Weaviate vs Qdrant vs pgvector
| Feature | Pinecone | Weaviate | Qdrant | pgvector |
|---|---|---|---|---|
| Type | Managed SaaS | Open-source + Cloud | Open-source + Cloud | PostgreSQL extension |
| Language | Proprietary (C++/Rust) | Go | Rust | C |
| Index types | Proprietary ANN | HNSW, PQ+HNSW, flat, BQ | HNSW, sparse vectors | HNSW, IVFFlat |
| Max dimensions | 20,000 | 65,536 | 65,536 | 2,000 |
| Hybrid search | Sparse + dense | BM25 + vector (native) | Sparse + dense (Qdrant 1.7+) | tsvector + pgvector (manual) |
| Multi-tenancy | Namespaces (native) | Tenant isolation (native) | Payload-based filtering | PostgreSQL RLS |
| Metadata filtering | Yes (limited operators) | Yes (GraphQL-style) | Yes (rich filters, nested) | Full SQL WHERE |
| Disk-based index | No (in-memory) | Yes (PQ + mmap) | Yes (mmap + quantization) | Yes (PostgreSQL storage) |
| ACID transactions | No | No | No | Yes (full PostgreSQL) |
| Self-hosted | No | Yes | Yes | Yes |
## Benchmarks: Latency and Throughput
The following benchmarks are based on a dataset of 1M vectors, 1536 dimensions (matching OpenAI text-embedding-3-small), top-k=10, recall target ≥0.95. Hardware for self-hosted: AWS r6g.xlarge (4 vCPU, 32 GB RAM, ARM Graviton2). Pinecone tested on a p2 pod type (performance-optimised).
| Metric | Pinecone | Weaviate | Qdrant | pgvector (HNSW) |
|---|---|---|---|---|
| P50 latency | 4.2 ms | 5.8 ms | 2.1 ms | 8.4 ms |
| P99 latency | 12 ms | 18 ms | 6.3 ms | 24 ms |
| QPS (single node) | ~800 | ~550 | ~1,200 | ~350 |
| Recall@10 | 0.97 | 0.96 | 0.98 | 0.95 |
| RAM footprint | N/A (managed) | ~8.2 GB | ~6.8 GB | ~10.1 GB (shared buffers) |
| Index build time | ~3 min (upsert) | ~12 min | ~8 min | ~25 min |
| P50 with filtering | 7.1 ms | 9.2 ms | 3.8 ms | 12 ms |
Qdrant dominates in pure vector search performance thanks to its Rust implementation and aggressive SIMD utilisation. Pinecone offers consistent latencies without infrastructure concerns. Weaviate is strong in hybrid search (BM25 + vector). pgvector is the slowest, but offers something the others do not: full SQL, ACID transactions and zero additional operational costs — if you are already running PostgreSQL.
### Beware of Benchmark Marketing
Every vendor publishes benchmarks optimised for their sweet spot. Qdrant tests pure vector search. Pinecone shows managed latencies with warmup. Weaviate presents hybrid search. Real-world performance depends on your specific dataset, dimensionality, filter ratio and concurrency pattern. Always test with your own data.
## Pricing: What It Costs in Production
Pricing is the area where the four solutions differ most dramatically. The comparison below assumes a single scenario: 5M vectors, 1536 dimensions, 100 QPS, 99.9% availability.
| Solution | Model | Monthly cost (estimate) | Free tier |
|---|---|---|---|
| Pinecone Serverless | Pay-per-query + storage | $200–450/month | Yes (2 GB storage, limited reads) |
| Pinecone Standard | Pod-based (p2.x1) | $700–1,400/month | No |
| Weaviate Cloud | Node-based | $350–800/month | 14-day trial |
| Qdrant Cloud | Node-based (RAM-optimised) | $250–600/month | 1 GB free forever |
| Qdrant self-hosted | EC2/VM cost | $80–200/month (r6g.xlarge) | Open-source (Apache 2.0) |
| pgvector self-hosted | PostgreSQL VM cost | $60–150/month (existing DB) | Open-source (PostgreSQL licence) |
Pinecone Serverless is competitively priced for low QPS but scales more expensively than node-based models at hundreds of QPS. Qdrant self-hosted is the cheapest option for teams with DevOps capacity. pgvector is “free” if you are already running PostgreSQL — which most companies are.
### TCO Calculation: Do Not Forget Hidden Costs
Self-hosted solutions are cheaper on infrastructure but more expensive on people. Account for: upgrades and patching (~2h/month), monitoring and alerting setup, backup strategy, disaster recovery testing, on-call rotation. For teams of fewer than 5 engineers, a managed solution almost always offers better TCO.
## Decision Framework: When to Use What
### Pinecone: managed-first, fast start, no infrastructure burden
The best choice for: teams without dedicated infrastructure capacity, rapid prototyping, enterprises with SLA and support requirements. Pinecone Serverless is ideal for RAG applications with variable QPS — you pay per query, not for an idle server. Downside: vendor lock-in, no self-hosting, limited index customisation.
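A minimal sketch of the serverless workflow with the Pinecone Python SDK (v3+); the index name, region and toy vectors are illustrative:

```python
# Pinecone serverless sketch: create index, upsert, query.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="...")

pc.create_index(
    name="rag-docs",  # hypothetical index name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("rag-docs")
index.upsert(vectors=[("doc-1", [0.1] * 1536, {"source": "faq"})])
results = index.query(vector=[0.1] * 1536, top_k=10, include_metadata=True)
```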
### Weaviate: hybrid search, semantic search, multimodal data
Weaviate excels at hybrid search — combining BM25 keyword search with vector similarity in a single query. It natively supports a GraphQL API, modular vectorisers (direct integration with OpenAI, Cohere, Hugging Face) and generative search (RAG directly in the database). Ideal for e-commerce search, content discovery and knowledge management. Trade-off: higher memory requirements, Go runtime introduces GC pauses under extreme load.
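A sketch of a hybrid query with the Weaviate Python client (v4), assuming a collection named Article with a vectoriser module configured; alpha blends BM25 (towards 0) and vector similarity (towards 1):

```python
# Hybrid search sketch: one query, keyword and vector scores fused.
import weaviate

client = weaviate.connect_to_local()
articles = client.collections.get("Article")  # hypothetical collection

response = articles.query.hybrid(
    query="return policy for electronics",
    alpha=0.5,  # equal weight to BM25 and vector scores
    limit=10,
)
for obj in response.objects:
    print(obj.properties)

client.close()
```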
### Qdrant: maximum performance, fine-grained filtering, Rust efficiency
Qdrant is the choice for teams that need the lowest latency and highest throughput. Written in Rust with SIMD optimisations, it supports rich filtering via payloads with nested objects, geo-filtering and range queries. Since version 1.7, it supports sparse vectors for hybrid search. Ideal for recommendation engines, real-time personalisation, anomaly detection in production. The best performance-to-cost ratio in a self-hosted scenario.
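A sketch of filtered search with the Python qdrant-client; the collection, payload field and query vector are hypothetical:

```python
# Filtered vector search sketch: payload condition applied during traversal.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="events",
    query_vector=[0.1] * 1536,  # placeholder embedding
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="acme"))]
    ),
    limit=10,
)
```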
### pgvector: existing PostgreSQL stack, simplicity, ACID requirements
pgvector is ideal when: (a) you are already running PostgreSQL, (b) you have fewer than 5M vectors, (c) you need ACID transactions across vectors and relational data in a single query, (d) you do not want to add another database to the stack. For RAG pipelines with <1M documents, pgvector is the most pragmatic choice. Limitations: a 2,000-dimension cap on indexed columns (still sufficient for most embedding models), slower performance on large datasets, and no native hybrid search (you must combine tsvector manually, as sketched below).
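For illustration, one manual hybrid pattern: use tsvector as a full-text pre-filter and order the matching subset by vector distance. Table and column names are hypothetical, and production systems usually fuse two ranked lists (e.g. reciprocal rank fusion) instead:

```python
# Manual "hybrid" sketch: full-text pre-filter, then vector ordering.
import psycopg

query_vec = "[" + ",".join("0.1" for _ in range(1536)) + "]"  # placeholder embedding

with psycopg.connect("dbname=app") as conn:
    rows = conn.execute(
        "SELECT id, title FROM documents "
        "WHERE fts @@ websearch_to_tsquery('english', %s) "  # keyword pre-filter
        "ORDER BY embedding <=> %s::vector "                 # vector ranking
        "LIMIT 10",
        ("return policy", query_vec),
    ).fetchall()
```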
## Production Best Practices
### 1. Embedding Model = Index Design
Embedding dimensionality directly affects performance and memory. OpenAI text-embedding-3-small (1536 dimensions) needs ~6 KB per vector, text-embedding-3-large (3072 dimensions) ~12 KB. With Matryoshka embeddings, you can truncate to 512 or 256 dimensions with a recall loss of ~2–5% — dramatically reducing memory and increasing QPS. Always test the optimal dimensionality for your use case.
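For illustration, the OpenAI SDK exposes Matryoshka truncation directly via the dimensions parameter on the text-embedding-3 models; benchmark recall at the reduced size on your own data before committing:

```python
# Truncated embedding sketch using the OpenAI SDK's dimensions parameter.
from openai import OpenAI

client = OpenAI()

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="how do I reset my password?",
    dimensions=512,  # truncate from 1536: ~3x less memory per vector
)
vec = resp.data[0].embedding  # already normalised at the reduced dimension
```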
### 2. Metadata Filtering Strategy
If 80% of your queries include a metadata filter (tenant_id, document_type, date_range), filtering quality is more important than raw vector search speed. Qdrant and pgvector excel here — Qdrant thanks to payload indexes, pgvector thanks to PostgreSQL B-tree indexes. Pinecone metadata filtering works but with a limited set of operators. For multi-tenant RAG applications, test a filter-first strategy (filter first, then vector search on the subset).
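In Qdrant, that means creating a payload index on the filtered field so filters do not degrade into full scans; a minimal sketch with hypothetical names:

```python
# Payload index sketch: makes tenant_id filters cheap during vector search.
from qdrant_client import QdrantClient
from qdrant_client.models import PayloadSchemaType

client = QdrantClient(url="http://localhost:6333")

client.create_payload_index(
    collection_name="docs",
    field_name="tenant_id",
    field_schema=PayloadSchemaType.KEYWORD,
)
```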
### 3. Quantization to Reduce Costs
Scalar quantization (SQ8) reduces the memory footprint to ~25% of the original with ~1% recall loss. Product Quantization (PQ) goes further (~6% of the original) but with higher precision loss. Qdrant supports SQ and PQ natively; Weaviate has PQ+HNSW and BQ (binary quantization). pgvector is more limited: since 0.7.0 it offers half-precision (halfvec) and binary vectors, but no PQ, which remains its main disadvantage for large datasets.
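A sketch of enabling int8 scalar quantization in Qdrant at collection creation; the values shown are common starting points, not tuned recommendations:

```python
# Scalar quantization sketch: int8 vectors served from RAM, originals kept intact.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, ScalarQuantization, ScalarQuantizationConfig, ScalarType,
    VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs_sq8",  # hypothetical collection name
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # float32 -> int8: ~25% of original size
            quantile=0.99,         # clip outliers before quantising
            always_ram=True,       # serve quantised vectors from RAM
        )
    ),
)
```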
### 4. Reindex Strategy
HNSW indexes cannot be incrementally updated when parameters change. If you change the embedding model (and thus the dimensionality), you must completely reindex. Plan for this: Qdrant and Weaviate support collection aliasing (blue-green index deployment), while pgvector requires a manual swap, building a replacement index with CREATE INDEX CONCURRENTLY and dropping the old one (a new embedding dimensionality also means a new column). Pinecone handles reindexing transparently as part of the managed service.
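A sketch of the blue-green swap with Qdrant collection aliases: the application always queries the alias, and repointing it is atomic. All names are hypothetical:

```python
# Blue-green reindex sketch: alias "docs" moves from the old to the new collection.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    CreateAlias, CreateAliasOperation, DeleteAlias, DeleteAliasOperation,
)

client = QdrantClient(url="http://localhost:6333")

# After fully building and validating "docs_v2", repoint the alias in one call:
client.update_collection_aliases(
    change_aliases_operations=[
        DeleteAliasOperation(delete_alias=DeleteAlias(alias_name="docs")),
        CreateAliasOperation(
            create_alias=CreateAlias(collection_name="docs_v2", alias_name="docs")
        ),
    ]
)
```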
## Conclusion: Decision Matrix
- Need managed + fast start? → Pinecone Serverless.
- Hybrid search + semantics + GraphQL? → Weaviate.
- Maximum performance + self-hosted + fine-grained filtering? → Qdrant.
- Already running PostgreSQL + <5M vectors + ACID? → pgvector.
For most enterprise projects, we recommend starting with pgvector (zero operational overhead) and migrating to Qdrant or Pinecone once you exceed 5M vectors or your SLO requires sub-5ms latency. Do not optimise prematurely — the right embedding model has a greater impact on retrieval quality than the choice of database.
## Sources and References
- ANN Benchmarks: ann-benchmarks.com — independent ANN algorithm comparison
- Qdrant Benchmarks: qdrant.tech/benchmarks — vector DB comparison (Q1 2026)
- Tiger Data: pgvector vs Qdrant comparison (2025) — tigerdata.com
- Firecrawl: Best Vector Databases in 2025 — firecrawl.dev
- Pinecone documentation: Serverless Architecture — docs.pinecone.io
- Weaviate documentation: HNSW + PQ Configuration — weaviate.io/developers
- pgvector GitHub: Performance tuning guide — github.com/pgvector/pgvector
- Liveblocks: What’s the best vector database for building AI products (2025) — liveblocks.io