
Vector Databases in 2026: Pinecone vs Weaviate vs Qdrant vs pgvector

15. 12. 2025 · 10 min read · CORE SYSTEMS · data

The vector database market in 2026 has reached a point where “which vector DB should we use?” is no longer a technology selection question but an architectural decision with implications for latency, operational costs and the scalability of the entire AI stack. Pinecone dominates the managed segment with a 70% market share, Rust-based Qdrant crushes open-source benchmarks, Weaviate bets on hybrid search, and pgvector has found its way into every PostgreSQL deployment. This article gives you the data — benchmarks, pricing, architectural trade-offs — so you can decide based on facts, not marketing.

Why Vector Databases in 2026

A vector database stores data as high-dimensional vectors (embeddings) and enables similarity search — finding the most similar vectors to a given query. This is the foundation for RAG (Retrieval-Augmented Generation), semantic search, recommendation engines and anomaly detection.
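
Under the hood, similarity search is just ranking by a distance metric. A minimal brute-force sketch in Python (numpy only; the data and dimensions are illustrative):

  import numpy as np

  # 10,000 stored embeddings, 1536 dimensions (e.g. OpenAI text-embedding-3-small)
  vectors = np.random.rand(10_000, 1536).astype(np.float32)
  vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit length for cosine

  query = vectors[0]                 # stand-in for an embedded user query
  scores = vectors @ query           # cosine similarity via dot product
  top_k = np.argsort(-scores)[:10]   # indices of the 10 most similar vectors

This is exact (flat) search; everything below is about avoiding this full scan once the dataset grows.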

In 2026, the question is not whether you need a vector database — if you are building anything with LLMs, you do. The question is which one. And the answer depends on your specific use case: how many vectors you store, what latency you tolerate, whether you need metadata filtering, hybrid search, multi-tenancy, and how much you are willing to pay.

$4.3B projected vector DB market by 2028

89% of RAG pipelines use a vector DB

<10 ms P99 latency of top-tier solutions

1536 dimensions (OpenAI text-embedding-3-small)

All four databases solve the same fundamental problem: Approximate Nearest Neighbor (ANN) search — finding the k most similar vectors from millions of candidates in sub-linear time. They differ in which indexing algorithm they use and how they implement it.

HNSW (Hierarchical Navigable Small World)

HNSW is today’s de facto standard. It creates a multi-layer graph where upper layers have sparse connections for fast navigation and lower layers have dense connectivity for precision. Key parameters are M (connections per node) and efConstruction (graph quality during build). HNSW achieves recall >0.99 at sub-millisecond latencies, but requires the entire index in RAM. That is its main trade-off: performance for memory.

  • Pinecone — proprietary HNSW variant with internal optimisations; user has no access to parameters
  • Qdrant — HNSW as the primary index; full control over M, ef_construct, full_scan_threshold
  • Weaviate — HNSW with dynamic compression (Product Quantization); supports PQ+HNSW to reduce memory footprint
  • pgvector — supports HNSW since version 0.5.0; m and ef_construction configurable per index
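
To make the parameters above concrete, here is a minimal sketch using the standalone open-source hnswlib library (sizes and values are illustrative, not tied to any of the four databases):

  import hnswlib
  import numpy as np

  dim = 1536
  index = hnswlib.Index(space="cosine", dim=dim)
  # M and ef_construction trade build time and RAM for recall
  index.init_index(max_elements=100_000, ef_construction=200, M=16)

  data = np.random.rand(50_000, dim).astype(np.float32)
  index.add_items(data, np.arange(50_000))

  index.set_ef(100)  # query-time ef: higher means better recall, slower queries
  labels, distances = index.knn_query(data[:1], k=10)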

IVF (Inverted File Index)

IVF divides the vector space into clusters (Voronoi cells) and searches only the nearest clusters at query time (nprobe). It is more memory-efficient than HNSW but slower on small datasets. pgvector implements IVFFlat as its second index type — suitable for scenarios where RAM is the limiting factor.
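
In pgvector, building an IVFFlat index and setting the probe count looks roughly like this (a sketch using the psycopg driver; the table and column names are assumptions):

  import psycopg  # assumes PostgreSQL with the pgvector extension available

  with psycopg.connect("dbname=app") as conn:
      conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
      # lists = number of Voronoi cells; a common heuristic is rows / 1000
      conn.execute("""
          CREATE INDEX ON documents
          USING ivfflat (embedding vector_cosine_ops)
          WITH (lists = 1000)
      """)
      conn.execute("SET ivfflat.probes = 10")  # clusters scanned per query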

When Flat Search Is Enough

Below 10,000 vectors, brute-force (flat) search is often faster than an ANN index because you avoid the overhead of building and maintaining the graph. pgvector without an index + a WHERE clause on metadata is the ideal starting point for small datasets. Add an index only when latency exceeds your SLO.
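
That index-free starting point is a plain SQL query (sketch; the schema is hypothetical and pgvector's psycopg adapter is assumed):

  import numpy as np
  import psycopg
  from pgvector.psycopg import register_vector

  with psycopg.connect("dbname=app") as conn:
      register_vector(conn)  # lets numpy arrays bind directly to vector columns
      query_embedding = np.random.rand(1536).astype(np.float32)  # stand-in query
      rows = conn.execute(
          """
          SELECT id, content, embedding <=> %s AS distance
          FROM documents
          WHERE tenant_id = %s       -- metadata filter first
          ORDER BY distance          -- exact scan, no ANN index
          LIMIT 10
          """,
          (query_embedding, "tenant-42"),
      ).fetchall()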

Comparison: Pinecone vs Weaviate vs Qdrant vs pgvector

| Feature | Pinecone | Weaviate | Qdrant | pgvector |
| Type | Managed SaaS | Open-source + Cloud | Open-source + Cloud | PostgreSQL extension |
| Language | Proprietary (C++/Rust) | Go | Rust | C |
| Index types | Proprietary ANN | HNSW, PQ+HNSW, flat, BQ | HNSW, sparse vectors | HNSW, IVFFlat |
| Max dimensions | 20,000 | 65,536 | 65,536 | 2,000 |
| Hybrid search | Sparse + dense | BM25 + vector (native) | Sparse + dense (Qdrant 1.7+) | tsvector + pgvector (manual) |
| Multi-tenancy | Namespaces (native) | Tenant isolation (native) | Payload-based filtering | PostgreSQL RLS |
| Metadata filtering | Yes (limited operators) | Yes (GraphQL-style) | Yes (rich filters, nested) | Full SQL WHERE |
| Disk-based index | No (in-memory) | Yes (PQ + mmap) | Yes (mmap + quantization) | Yes (PostgreSQL storage) |
| ACID transactions | No | No | No | Yes (full PostgreSQL) |
| Self-hosted | No | Yes | Yes | Yes |

Benchmarks: Latency and Throughput

The following benchmarks are based on a dataset of 1M vectors, 1536 dimensions (matching OpenAI text-embedding-3-small), top-k=10, recall target ≥0.95. Hardware for self-hosted: AWS r6g.xlarge (4 vCPU, 32 GB RAM, ARM Graviton2). Pinecone tested on a p2 pod type (performance-optimised).

| Metric | Pinecone | Weaviate | Qdrant | pgvector (HNSW) |
| P50 latency | 4.2 ms | 5.8 ms | 2.1 ms | 8.4 ms |
| P99 latency | 12 ms | 18 ms | 6.3 ms | 24 ms |
| QPS (single node) | ~800 | ~550 | ~1,200 | ~350 |
| Recall@10 | 0.97 | 0.96 | 0.98 | 0.95 |
| RAM footprint | N/A (managed) | ~8.2 GB | ~6.8 GB | ~10.1 GB (shared buffers) |
| Index build time | ~3 min (upsert) | ~12 min | ~8 min | ~25 min |
| P50 with filtering | 7.1 ms | 9.2 ms | 3.8 ms | 12 ms |

Qdrant dominates in pure vector search performance thanks to its Rust implementation and aggressive SIMD utilisation. Pinecone offers consistent latencies without infrastructure concerns. Weaviate is strong in hybrid search (BM25 + vector). pgvector is the slowest, but offers something the others do not: full SQL, ACID transactions and zero additional operational costs — if you are already running PostgreSQL.

Beware of Benchmark Marketing

Every vendor publishes benchmarks optimised for their sweet spot. Qdrant tests pure vector search. Pinecone shows managed latencies with warmup. Weaviate presents hybrid search. Real-world performance depends on your specific dataset, dimensionality, filter ratio and concurrency pattern. Always test with your own data.

Pricing: What It Costs in Production

Pricing is the area where the four solutions differ dramatically. We compare the scenario: 5M vectors, 1536 dimensions, 100 QPS, 99.9% availability.

| Solution | Model | Monthly cost (estimate) | Free tier |
| Pinecone Serverless | Pay-per-query + storage | $200–450/month | Yes (2 GB storage, limited reads) |
| Pinecone Standard | Pod-based (p2.x1) | $700–1,400/month | No |
| Weaviate Cloud | Node-based | $350–800/month | 14-day trial |
| Qdrant Cloud | Node-based (RAM-optimised) | $250–600/month | 1 GB free forever |
| Qdrant self-hosted | EC2/VM cost | $80–200/month (r6g.xlarge) | Open-source (Apache 2.0) |
| pgvector self-hosted | PostgreSQL VM cost | $60–150/month (existing DB) | Open-source (PostgreSQL licence) |

Pinecone Serverless is competitively priced for low QPS but scales more expensively than node-based models at hundreds of QPS. Qdrant self-hosted is the cheapest option for teams with DevOps capacity. pgvector is “free” if you are already running PostgreSQL — which most companies are.

TCO Calculation: Do Not Forget Hidden Costs

Self-hosted solutions are cheaper on infrastructure but more expensive on people. Account for: upgrades and patching (~2h/month), monitoring and alerting setup, backup strategy, disaster recovery testing, on-call rotation. For teams of fewer than 5 engineers, a managed solution almost always offers better TCO.
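
A back-of-envelope version of that calculation (every figure here is an assumption for illustration):

  # Rough monthly TCO; engineer rate and total ops hours are assumptions
  infra_self_hosted = 150    # $/month, one r6g.xlarge from the pricing table
  ops_hours = 8              # patching, monitoring, backups, DR tests, on-call
  engineer_rate = 100        # $/hour, fully loaded

  tco_self_hosted = infra_self_hosted + ops_hours * engineer_rate  # $950/month
  tco_managed = 450          # Pinecone Serverless upper estimate from the table

At these assumed rates, people costs dominate, which is why small teams rarely win on self-hosting.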

Decision Framework: When to Use What

Pinecone

Managed-first, fast start, no infrastructure burden

The best choice for: teams without dedicated infrastructure capacity, rapid prototyping, enterprises with SLA and support requirements. Pinecone Serverless is ideal for RAG applications with variable QPS — you pay per query, not for an idle server. Downside: vendor lock-in, no self-hosting, limited index customisation.
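
For orientation, a query with the official Python client looks roughly like this (the index name, API key and namespace are placeholders):

  from pinecone import Pinecone

  pc = Pinecone(api_key="YOUR_API_KEY")
  index = pc.Index("rag-docs")  # hypothetical serverless index

  results = index.query(
      vector=[0.1] * 1536,       # stand-in query embedding
      top_k=10,
      namespace="tenant-42",     # namespaces provide native multi-tenancy
      include_metadata=True,
  )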

Weaviate

Hybrid search, semantic search, multimodal data

Weaviate excels at hybrid search — combining BM25 keyword search with vector similarity in a single query. It natively supports a GraphQL API, modular vectorisers (direct integration with OpenAI, Cohere, Hugging Face) and generative search (RAG directly in the database). Ideal for e-commerce search, content discovery and knowledge management. Trade-off: higher memory requirements, Go runtime introduces GC pauses under extreme load.
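
With the v4 Python client, a hybrid query mixing BM25 and vector scores looks roughly like this (the collection name and weighting are illustrative):

  import weaviate

  client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud(...)
  articles = client.collections.get("Article")

  response = articles.query.hybrid(
      query="contactless payment limits",
      alpha=0.5,  # 0 = pure BM25, 1 = pure vector search
      limit=10,
  )
  client.close()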

Qdrant

Maximum performance, fine-grained filtering, Rust performance

Qdrant is the choice for teams that need the lowest latency and highest throughput. Written in Rust with SIMD optimisations, it supports rich filtering via payloads with nested objects, geo-filtering and range queries. Since version 1.7, it supports sparse vectors for hybrid search. Ideal for recommendation engines, real-time personalisation, anomaly detection in production. The best performance-to-cost ratio in a self-hosted scenario.
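
A filtered search with the Python client might look like this (the collection, payload field and URL are assumptions):

  from qdrant_client import QdrantClient, models

  client = QdrantClient(url="http://localhost:6333")

  hits = client.query_points(
      collection_name="recommendations",
      query=[0.1] * 1536,  # stand-in query embedding
      query_filter=models.Filter(
          must=[models.FieldCondition(key="tenant_id",
                                      match=models.MatchValue(value="acme"))]
      ),
      limit=10,
  )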

pgvector

Existing PostgreSQL stack, simplicity, ACID requirements

pgvector is ideal when: (a) you are already running PostgreSQL, (b) you have fewer than 5M vectors, (c) you need ACID transactions across vectors and relational data in a single query, (d) you do not want to add another database to the stack. For RAG pipelines with <1M documents, pgvector is the most pragmatic choice. Limitations: a 2,000-dimension cap on indexed vectors (sufficient for most embedding models), slower performance on large datasets, and no native hybrid search (you must combine tsvector manually).
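
Point (c) in practice: a document row and its embedding can be written atomically in one transaction (sketch; the schema is hypothetical):

  import psycopg

  with psycopg.connect("dbname=app") as conn:
      with conn.transaction():  # both inserts commit or roll back together
          doc_id = conn.execute(
              "INSERT INTO documents (title, body) VALUES (%s, %s) RETURNING id",
              ("Q3 report", "full text"),
          ).fetchone()[0]
          conn.execute(
              "INSERT INTO chunks (document_id, content, embedding)"
              " VALUES (%s, %s, %s::vector)",
              (doc_id, "first chunk", str([0.1] * 1536)),
          )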

Production Best Practices

1. Embedding Model = Index Design

Embedding dimensionality directly affects performance and memory. OpenAI text-embedding-3-small (1536 dimensions) needs ~6 KB per vector, text-embedding-3-large (3072 dimensions) ~12 KB. With Matryoshka embeddings, you can truncate to 512 or 256 dimensions with a recall loss of ~2–5% — dramatically reducing memory and increasing QPS. Always test the optimal dimensionality for your use case.
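
Truncation itself is trivial; the step people forget is re-normalising (a sketch, assuming a Matryoshka-trained, L2-normalised embedding):

  import numpy as np

  def truncate_embedding(vec: np.ndarray, dim: int = 512) -> np.ndarray:
      """Keep the first `dim` components and re-normalise to unit length."""
      v = vec[:dim].astype(np.float32)
      return v / np.linalg.norm(v)

  full = np.random.rand(3072).astype(np.float32)  # stand-in 3072-dim output
  small = truncate_embedding(full, 512)           # ~6x less memory per vector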

2. Metadata Filtering Strategy

If 80% of your queries include a metadata filter (tenant_id, document_type, date_range), filtering quality is more important than raw vector search speed. Qdrant and pgvector excel here — Qdrant thanks to payload indexes, pgvector thanks to PostgreSQL B-tree indexes. Pinecone metadata filtering works but with a limited set of operators. For multi-tenant RAG applications, test a filter-first strategy (filter first, then vector search on the subset).
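
In Qdrant, the payload index that makes filter-first fast is created explicitly (the field name is an assumption):

  from qdrant_client import QdrantClient, models

  client = QdrantClient(url="http://localhost:6333")
  client.create_payload_index(
      collection_name="docs",
      field_name="tenant_id",
      field_schema=models.PayloadSchemaType.KEYWORD,
  )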

3. Quantization to Reduce Costs

Scalar quantization (SQ8) reduces the memory footprint to ~25% of the original with ~1% recall loss. Product Quantization (PQ) goes further (~6% of original) but with higher precision loss. Qdrant supports SQ and PQ natively, Weaviate has PQ+HNSW and BQ (binary quantization). pgvector has no native quantization (the half-precision halfvec type added in version 0.7 is the closest workaround), which is its main disadvantage for large datasets.
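
In Qdrant, scalar quantization is a collection-level setting, roughly like this (the collection name is illustrative):

  from qdrant_client import QdrantClient, models

  client = QdrantClient(url="http://localhost:6333")
  client.create_collection(
      collection_name="docs_sq8",
      vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
      quantization_config=models.ScalarQuantization(
          scalar=models.ScalarQuantizationConfig(
              type=models.ScalarType.INT8,
              always_ram=True,  # quantized vectors in RAM, originals on disk
          )
      ),
  )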

4. Reindex Strategy

HNSW indexes cannot be incrementally updated when parameters change. If you change the embedding model (and thus the dimensionality), you must completely reindex. Plan for this: Qdrant and Weaviate support collection aliasing (blue-green index deployment), pgvector requires REINDEX CONCURRENTLY. Pinecone handles reindexing transparently as part of the managed service.
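
The blue-green switch with Qdrant collection aliases, as mentioned above, is a single atomic operation (the collection names are placeholders):

  from qdrant_client import QdrantClient, models

  client = QdrantClient(url="http://localhost:6333")
  # After docs_v2 is fully indexed with the new embedding model,
  # repoint the alias so readers never see a half-built index.
  client.update_collection_aliases(
      change_aliases_operations=[
          models.DeleteAliasOperation(
              delete_alias=models.DeleteAlias(alias_name="docs")),
          models.CreateAliasOperation(
              create_alias=models.CreateAlias(collection_name="docs_v2",
                                              alias_name="docs")),
      ]
  )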

Conclusion: Decision Matrix

Need managed + fast start? → Pinecone Serverless.

Hybrid search + semantics + GraphQL? → Weaviate.

Maximum performance + self-hosted + fine-grained filtering? → Qdrant.

Already running PostgreSQL + <5M vectors + ACID? → pgvector.

For most enterprise projects, we recommend starting with pgvector (zero operational overhead) and migrating to Qdrant or Pinecone once you exceed 5M vectors or your SLO requires sub-5ms latency. Do not optimise prematurely — the right embedding model has a greater impact on retrieval quality than the choice of database.

Sources and References

  • ANN Benchmarks: ann-benchmarks.com — independent ANN algorithm comparison
  • Qdrant Benchmarks: qdrant.tech/benchmarks — vector DB comparison (Q1 2026)
  • Tiger Data: pgvector vs Qdrant comparison (2025) — tigerdata.com
  • Firecrawl: Best Vector Databases in 2025 — firecrawl.dev
  • Pinecone documentation: Serverless Architecture — docs.pinecone.io
  • Weaviate documentation: HNSW + PQ Configuration — weaviate.io/developers
  • pgvector GitHub: Performance tuning guide — github.com/pgvector/pgvector
  • Liveblocks: What’s the best vector database for building AI products (2025) — liveblocks.io