Vector Database Architecture
Vector databases are purpose-built for storing, indexing, and querying high-dimensional vectors. They form the backbone of production RAG systems by enabling fast similarity search across millions of embeddings.
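To make "similarity search" concrete, here is a minimal sketch of the exact (brute-force) search that a vector database approximates at scale. The function names `cosine_similarity` and `top_k` and the three-dimensional toy vectors are illustrative assumptions, not part of any library's API; real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data standing in for real embeddings
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
```

Every query in this sketch touches every stored vector, which is why it stops being practical at millions of embeddings and why the indexes discussed below trade a little accuracy for much less work per query.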
Vector Database Comparison
| Database | Type | Index | Max Vectors | Latency | Best For |
|---|---|---|---|---|---|
| Pinecone | Managed | Proprietary | 1B+ | <10ms | Enterprise, no-ops |
| Weaviate | Self-host/Managed | HNSW | 10B+ | <5ms | Multi-modal, GraphQL |
| ChromaDB | Embedded | HNSW | 10M | <5ms | Prototyping, small scale |
| Qdrant | Self-host/Managed | HNSW+Quantization | 10B+ | <5ms | High performance, filtering |
| Milvus | Self-host | IVF/HNSW | 10B+ | <10ms | Large-scale, distributed |
| pgvector | PostgreSQL ext | IVFFlat/HNSW | 100M | <20ms | SQL integration |
Pinecone
Fully managed vector database with zero operational overhead.
from pinecone import Pinecone, ServerlessSpec
# Initialize Pinecone
pc = Pinecone(api_key="YOUR_API_KEY")
# Create index
pc.create_index(
    name="production-rag",
    dimension=1536,  # OpenAI ada-002 dimension
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)
index = pc.Index("production-rag")
# Upsert vectors
# (embedding_vector is a placeholder: a list of 1536 floats from your embedding model)
index.upsert(vectors=[
    {
        "id": "doc_001",
        "values": embedding_vector,
        "metadata": {
            "source": "technical-docs",
            "chunk_index": 0,
            "content": "Document text here..."
        }
    }
])
# Query
# (query_embedding is a placeholder: the query text embedded with the same model)
results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    filter={"source": {"$eq": "technical-docs"}}
)
Weaviate
Open-source vector database with native multi-modal support.
import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import MetadataQuery
# Connect to Weaviate
client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()
# Create collection with vectorizer
# (text2vec_openai needs an OpenAI API key supplied in the connection headers)
client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="chunk_index", data_type=DataType.INT),
    ]
)
# Auto-vectorize on insert
docs = client.collections.get("Documents")
docs.data.insert({
    "content": "Document text here...",
    "source": "technical-docs",
    "chunk_index": 0
})
# Hybrid search (vector + keyword)
results = docs.query.hybrid(
    query="How to deploy LLMs",
    alpha=0.75,  # 0=pure keyword, 1=pure vector
    limit=10,
    return_metadata=MetadataQuery(score=True)
)
# Release the connection when finished
client.close()
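The `alpha` parameter can be pictured as a weight between two rankings. The sketch below is a simplified illustration of weighted score blending, assuming min-max normalization of each score set; it is not Weaviate's actual fusion algorithm, and `normalize`, `blend_scores`, and the sample scores are hypothetical.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend_scores(keyword_scores, vector_scores, alpha=0.75):
    """Combine two score sets: alpha=0 is pure keyword, alpha=1 is pure vector."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    doc_ids = set(kw) | set(vec)
    combined = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: "a" wins on keywords, "b" wins on vector similarity
keyword_scores = {"a": 10.0, "b": 0.0}
vector_scores = {"a": 0.1, "b": 0.9}
ranked = blend_scores(keyword_scores, vector_scores, alpha=0.75)
```

With `alpha=0.75` the vector ranking dominates, so `"b"` comes first; lowering `alpha` toward 0 flips the order in favor of the keyword match.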
ChromaDB
Lightweight, embedded vector database ideal for prototyping.
import chromadb
# Create persistent client
client = chromadb.PersistentClient(path="./chroma_db")
# Create collection
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)
# Add documents
collection.add(
    documents=["Document text here..."],
    metadatas=[{"source": "technical-docs"}],
    ids=["doc_001"]
)
# Query
results = collection.query(
    query_texts=["How to deploy LLMs"],
    n_results=10,
    where={"source": {"$eq": "technical-docs"}}
)
Qdrant
High-performance vector database with advanced filtering and quantization.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct,
    Filter, FieldCondition, MatchValue,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType
)
# Connect to Qdrant
client = QdrantClient(url="http://localhost:6333")
# Create collection with quantization
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=True
        )
    )
)
# Upsert points
# (embedding_vector is a placeholder: a list of 1536 floats from your embedding model)
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding_vector,
            payload={
                "content": "Document text here...",
                "source": "technical-docs"
            }
        )
    ]
)
# Search with filtering
# (query_embedding is a placeholder for the embedded query text;
# recent qdrant-client releases recommend query_points() over search())
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="technical-docs"))]
    ),
    limit=10
)
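The int8 scalar quantization configured above trades a small amount of precision for roughly a 4x reduction in vector memory (one byte per dimension instead of a four-byte float). The sketch below shows the general idea using simple per-vector min/max bounds; it is an illustrative assumption, not Qdrant's internal implementation, and `quantize_int8` and `dequantize_int8` are hypothetical helper names.

```python
def quantize_int8(vector):
    """Map floats onto 256 evenly spaced levels stored as signed 8-bit codes."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 when all values are equal
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale


def dequantize_int8(codes, lo, scale):
    """Approximately reconstruct the original floats from the int8 codes."""
    return [(c + 128) * scale + lo for c in codes]


# Hypothetical 5-dimensional vector standing in for a real embedding
original = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, lo, scale = quantize_int8(original)
restored = dequantize_int8(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

The reconstruction error per component is bounded by about half a quantization step (`scale / 2`), which is why quantized search usually loses little recall, especially when the top candidates are rescored against the original vectors.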
ANN Algorithm Comparison
| Algorithm | Build Time | Query Time | Memory | Accuracy |
|---|---|---|---|---|
| HNSW | O(n·log(n)) | O(log(n)) | High | ~99% |
| IVF-PQ | O(n·k) | O(n/(k·m)) | Low | ~95% |
| ScaNN | O(n·k) | O(n/(k·m)) | Low | ~97% |
| Flat (brute force) | O(1) | O(n) | High | 100% |
$$\text{HNSW search complexity: } O(\log n) \text{ per query}$$
$$\text{IVF-PQ search complexity: } O\left(\frac{n}{k \cdot m}\right) \text{ per query}$$
Where n = total vectors, k = number of lists/clusters, m = number of sub-quantizers.
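As a rough worked example, the expressions above can be evaluated for a concrete collection size. The values of n, k, and m below are arbitrary assumptions, and base-2 logarithms are chosen only for illustration since big-O notation does not fix a base. The results are unitless growth proxies, not latency predictions.

```python
import math


def hnsw_cost(n):
    """Growth proxy for HNSW search: log2(n)."""
    return math.log2(n)


def ivf_pq_cost(n, k, m):
    """Growth proxy for IVF-PQ search as given in the text: n / (k * m)."""
    return n / (k * m)


# Assumed example values: 1M vectors, 1,000 clusters, 8 sub-quantizers
n, k, m = 1_000_000, 1_000, 8
costs = {
    "flat": float(n),               # brute force scans every vector
    "hnsw": hnsw_cost(n),           # about 19.9
    "ivf_pq": ivf_pq_cost(n, k, m)  # 125.0
}
```

Under these assumptions both approximate indexes reduce the per-query work by several orders of magnitude compared with a flat scan, which is the main reason they are used once a collection grows past the prototyping stage.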
Production Deployment Patterns
| Pattern | Description | When to Use |
|---|---|---|
| Managed service | Pinecone, Weaviate Cloud | Low ops overhead |
| Self-hosted | Weaviate, Qdrant on K8s | Full control, cost optimization |
| Embedded | ChromaDB, FAISS | Prototyping, single-server |
| Hybrid | Vector DB + PostgreSQL | Metadata-heavy workloads |
Choosing the right vector database depends on scale, latency requirements, operational expertise, and budget constraints.