Vector Databases

Vector Database Architecture

Vector databases are purpose-built for storing, indexing, and querying high-dimensional vectors. They form the backbone of production RAG systems: approximate nearest-neighbor (ANN) indexes let them run similarity search across millions of embeddings without comparing the query against every stored vector.
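
To make "similarity search" concrete, here is a minimal sketch of the exact, brute-force version that ANN indexes approximate. The function names and the three-dimensional toy vectors are illustrative assumptions, not part of any database's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def brute_force_search(query, vectors, top_k=3):
    """Score every stored vector against the query: O(n) work per query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top_hits = brute_force_search([1.0, 0.0, 0.0], toy_vectors, top_k=2)
```

Every query here touches all stored vectors, which is the cost a vector database's index is designed to avoid at scale.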

Vector Database Comparison

Database	Deployment	Index Type	Approx. Max Vectors	Typical Query Latency	Best For
Pinecone	Managed	Proprietary	1B+	<10ms	Enterprise, no-ops
Weaviate	Self-host/Managed	HNSW	10B+	<5ms	Multi-modal, GraphQL
ChromaDB	Embedded	HNSW	10M	<5ms	Prototyping, small scale
Qdrant	Self-host/Managed	HNSW+Quantization	10B+	<5ms	High performance, filtering
Milvus	Self-host/Managed	IVF/HNSW	10B+	<10ms	Large-scale, distributed
pgvector	PostgreSQL ext	IVFFlat/HNSW	100M	<20ms	SQL integration

Pinecone

Fully managed vector database: Pinecone runs the infrastructure, so there are no servers or clusters to operate yourself.

from pinecone import Pinecone, ServerlessSpec

# Initialize Pinecone
pc = Pinecone(api_key="YOUR_API_KEY")

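# Create a serverless index (raises an error if the name is already taken)
pc.create_index(
    name="production-rag",
    dimension=1536,           # must match the embedding model; 1536 for OpenAI text-embedding-ada-002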
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-east-1"
    )
)

index = pc.Index("production-rag")

# Upsert vectors (embedding_vector is a list of 1536 floats from your embedding model)
index.upsert(vectors=[
    {
        "id": "doc_001",
        "values": embedding_vector,
        "metadata": {
            "source": "technical-docs",
            "chunk_index": 0,
            "content": "Document text here..."
        }
    }
])

# Query with a metadata filter (query_embedding is the embedded user question)
results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    filter={"source": {"$eq": "technical-docs"}}
)
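
The upsert above sends a single record. For a larger corpus, one common approach is to split records into batches so that each request stays within the service's size limits. The sketch below is a hypothetical helper, and the batch size of 100 is an assumption; check the current Pinecone limits before choosing a value.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-in records; in practice each item would be a dict with id, values, and metadata.
records = list(range(250))
batches = list(batched(records, batch_size=100))

# With a real index, each batch would be sent separately, for example:
#     for batch in batched(vectors, batch_size=100):
#         index.upsert(vectors=batch)
```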

Weaviate

Open-source vector database with native multi-modal support.

import weaviate
from weaviate.classes.config import Configure, Property, DataType

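# Connect to Weaviate; the text2vec_openai vectorizer also needs an OpenAI key,
# e.g. connect_to_local(headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_KEY"})
client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()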

# Create collection with vectorizer
client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
        Property(name="chunk_index", data_type=DataType.INT),
    ]
)

# Auto-vectorize on insert
docs = client.collections.get("Documents")
docs.data.insert({
    "content": "Document text here...",
    "source": "technical-docs",
    "chunk_index": 0
})

# Hybrid search (vector + keyword)
results = docs.query.hybrid(
    query="How to deploy LLMs",
    alpha=0.75,  # 0=pure keyword, 1=pure vector
    limit=10,
    return_metadata=weaviate.classes.query.MetadataQuery(score=True)
)

client.close()  # release the connection when finished
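
The alpha parameter controls how much weight the vector score gets relative to the keyword score. The following is a simplified sketch of that idea using min-max normalization and a weighted sum. It is an illustration only: the function names and scores are hypothetical, and Weaviate's own fusion algorithms may differ in detail.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def blend_scores(keyword_scores, vector_scores, alpha=0.75):
    """Combine scores: alpha=0 is pure keyword, alpha=1 is pure vector."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    doc_ids = set(kw) | set(vec)
    blended = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical raw scores: doc_b wins on keywords, doc_a wins on vector similarity.
keyword_scores = {"doc_a": 2.0, "doc_b": 8.0, "doc_c": 5.0}
vector_scores = {"doc_a": 0.75, "doc_b": 0.25, "doc_c": 0.5}

vector_heavy = blend_scores(keyword_scores, vector_scores, alpha=0.75)
keyword_only = blend_scores(keyword_scores, vector_scores, alpha=0.0)
```

With alpha=0.75 the semantically closest document ranks first; with alpha=0.0 the ranking follows keyword scores alone.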

ChromaDB

Lightweight, embedded vector database ideal for prototyping.

import chromadb

# Create persistent client
client = chromadb.PersistentClient(path="./chroma_db")

# Create the collection, or reuse it if it already exists on disk
collection = client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)

# Add documents (Chroma embeds them with its default embedding function)
collection.add(
    documents=["Document text here..."],
    metadatas=[{"source": "technical-docs"}],
    ids=["doc_001"]
)

# Query
results = collection.query(
    query_texts=["How to deploy LLMs"],
    n_results=10,
    where={"source": {"$eq": "technical-docs"}}
)
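
Chroma returns query results in a column-oriented dictionary: each key holds one inner list per query text. A small helper can turn that into one row per hit. The helper name and the sample values below are hypothetical; only the key layout ("ids", "documents", "metadatas", "distances") mirrors what the query call returns by default.

```python
def flatten_query_results(results, query_index=0):
    """Turn one query's column-oriented results into a list of row dicts."""
    rows = zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["metadatas"][query_index],
        results["distances"][query_index],
    )
    return [
        {"id": doc_id, "document": doc, "metadata": meta, "distance": dist}
        for doc_id, doc, meta, dist in rows
    ]

# Hand-written sample shaped like a single-query result (values are made up).
sample_results = {
    "ids": [["doc_001", "doc_002"]],
    "documents": [["Document text here...", "Another chunk..."]],
    "metadatas": [[{"source": "technical-docs"}, {"source": "technical-docs"}]],
    "distances": [[0.12, 0.34]],
}
hits = flatten_query_results(sample_results)
```

Smaller distances indicate closer matches, so the first row is the best hit.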

Qdrant

High-performance vector database with advanced filtering and quantization.

from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct,
    Filter, FieldCondition, MatchValue,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType
)

# Connect to Qdrant
client = QdrantClient(url="http://localhost:6333")

# Create collection with int8 scalar quantization
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,
        distance=Distance.COSINE
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=True   # keep quantized vectors in RAM for fast search
        )
    )
)

# Upsert points
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=embedding_vector,
            payload={
                "content": "Document text here...",
                "source": "technical-docs"
            }
        )
    ]
)

# Search with filtering (query_embedding is the embedded user question)
results = client.query_points(
    collection_name="documents",
    query=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="source", match=MatchValue(value="technical-docs"))]
    ),
    limit=10
).points
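
Scalar quantization stores each vector component as an 8-bit integer instead of a 32-bit float, cutting vector memory to roughly a quarter at the cost of a small rounding error. The sketch below shows the basic idea with a simple per-vector scale. It is an assumption-laden illustration, not Qdrant's implementation, which chooses its value ranges differently.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-127, 127] using a per-vector scale."""
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [code * scale for code in codes]

original = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)

# The rounding error per component is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
```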

ANN Algorithm Comparison

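Algorithm	Build Time	Query Time	Memory	Recall (typical, tuning-dependent)
HNSW	O(n·log(n))	O(log(n))	High	~99%
IVF-PQ	O(n·k)	O(k + p·n/k)	Low	~95%
ScaNN	O(n·k)	O(k + p·n/k)	Low	~97%
Flat (brute force)	O(n)	O(n)	High	100%

$$\text{HNSW Search Complexity}: O(\log n) \text{ per query (expected)}$$

$$\text{IVF-PQ Search Complexity}: O\left(k + \frac{p \cdot n}{k}\right) \text{ per query}$$

Where n = total vectors, k = number of lists/clusters, and p = number of lists probed per query (often called nprobe). The number of sub-quantizers m does not reduce how many candidates are scanned; it sets the compressed code size and the cost of each approximate distance, which takes m table lookups instead of a full comparison across all dimensions. These are rough averages that assume evenly sized clusters.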
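
The inverted-file (IVF) part of IVF-PQ can be sketched in a few lines: vectors are grouped under k centroids, and a query scans only the lists whose centroids are nearest. This toy version is a simplification under stated assumptions: centroids are sampled from the data rather than trained with k-means, no product quantization is applied, and all names are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, k, seed=0):
    """Sample k vectors as centroids and assign every vector to its nearest one."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    lists = [[] for _ in range(k)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(k), key=lambda c: squared_distance(vec, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe, top_k):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)),
                   key=lambda c: squared_distance(query, centroids[c]))
    candidates = [idx for c in order[:nprobe] for idx in lists[c]]
    candidates.sort(key=lambda idx: squared_distance(query, vectors[idx]))
    return candidates[:top_k], len(candidates)

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(1000)]
query = [rng.random() for _ in range(8)]
centroids, lists = build_ivf(data, k=20)

# Exact top-5 by brute force, for comparison.
exact = sorted(range(len(data)), key=lambda i: squared_distance(query, data[i]))[:5]
approx_ids, scanned = ivf_search(query, data, centroids, lists, nprobe=4, top_k=5)
full_ids, scanned_all = ivf_search(query, data, centroids, lists, nprobe=20, top_k=5)
```

Probing 4 of 20 lists scans only part of the data and may miss some true neighbors; probing all 20 lists scans everything and reproduces the exact result. That trade-off between work and recall is what nprobe tunes.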

Production Deployment Patterns

Pattern	Description	When to Use
Managed service	Pinecone, Weaviate Cloud	Low ops overhead
Self-hosted	Weaviate, Qdrant on K8s	Full control, cost optimization
Embedded	ChromaDB, FAISS	Prototyping, single-server
Hybrid	Vector DB + PostgreSQL	Metadata-heavy workloads

Choosing the right vector database depends on scale, latency requirements, operational expertise, and budget constraints.
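
Scale is often the easiest of these factors to estimate up front. As a rough sketch, raw vector storage is the number of vectors times the dimension times the bytes per component. The helper below is hypothetical and ignores index structures, metadata, and replication, all of which add to the real footprint.

```python
def raw_vector_storage_gb(num_vectors, dimension, bytes_per_value=4):
    """Raw vector storage in gigabytes (1 GB = 10**9 bytes), excluding index overhead."""
    return num_vectors * dimension * bytes_per_value / 1e9

# 10 million 1536-dimensional embeddings stored as float32 (4 bytes per component).
float32_gb = raw_vector_storage_gb(10_000_000, 1536)

# The same vectors with int8 scalar quantization (1 byte per component).
int8_gb = raw_vector_storage_gb(10_000_000, 1536, bytes_per_value=1)
```

Under these assumptions the float32 vectors need about 61 GB and the int8 version about 15 GB, which is the kind of difference that can decide between an embedded setup and a distributed one.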