NoSQL Deep Dive

NoSQL databases typically relax some ACID guarantees, or limit their scope, in exchange for horizontal scalability and flexible data models. Master the four categories of NoSQL databases and their optimal use cases.

Scalability — Horizontal scaling across commodity servers
Flexibility — Schema-less or schema-on-read data models
Performance — Optimized for specific access patterns

NoSQL is not "No SQL"—it's "Not Only SQL." Choose the right tool for the job.

The Four Categories of NoSQL

Document Databases (MongoDB)

Definition: Document Database

A document database stores data as documents (typically JSON/BSON) with nested structures. Each document can have a different schema, allowing flexible data models. Documents are retrieved by their unique key and can be queried using field values, array elements, and nested document fields.

MongoDB Internals

Component	Purpose
WiredTiger	Storage engine with document-level locking
B-tree indexes	Primary index structure
Replica Set	Primary-secondary replication for HA
Sharding	Horizontal scaling across clusters
Aggregation Pipeline	Server-side data processing

Data Modeling Patterns

MongoDB Embedded vs Referenced Design

Embedded (denormalized):

{
  "_id": "user123",
  "name": "Alice",
  "orders": [
    { "item": "laptop", "price": 999, "date": "2024-01-15" },
    { "item": "mouse", "price": 29, "date": "2024-01-16" }
  ]
}

Pros: Single query, atomic updates, no JOINs
Cons: Document size limit (16 MB), data duplication

Referenced (normalized):

{ "_id": "user123", "name": "Alice", "order_ids": ["o1", "o2"] }
{ "_id": "o1", "item": "laptop", "price": 999, "user_id": "user123" }

Pros: No duplication, unlimited related data
Cons: Requires multiple queries or $lookup aggregation

Choose embedded design when related data is always accessed together and the total size is bounded. Choose referenced design when related data is accessed independently or grows unbounded.
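The trade-off can be sketched in plain Python, with dictionaries standing in for collections. This is an illustration only, not MongoDB driver code: the `users` and `orders` dictionaries, the `o2` order document, and the `fetch_user_with_orders` helper are assumptions made for the example.

```python
# In-memory stand-ins for collections (hypothetical data, not a MongoDB API).
embedded_user = {
    "_id": "user123",
    "name": "Alice",
    "orders": [
        {"item": "laptop", "price": 999},
        {"item": "mouse", "price": 29},
    ],
}

users = {"user123": {"_id": "user123", "name": "Alice", "order_ids": ["o1", "o2"]}}
orders = {
    "o1": {"_id": "o1", "item": "laptop", "price": 999, "user_id": "user123"},
    "o2": {"_id": "o2", "item": "mouse", "price": 29, "user_id": "user123"},
}


def fetch_user_with_orders(user_id):
    """Referenced design: one lookup for the user, then one lookup per order id."""
    user = users[user_id]
    resolved = [orders[order_id] for order_id in user["order_ids"]]
    return {"_id": user["_id"], "name": user["name"], "orders": resolved}


# Embedded design: a single read already contains the orders.
embedded_total = sum(order["price"] for order in embedded_user["orders"])

# Referenced design: the application (or $lookup) has to join the two collections.
referenced_user = fetch_user_with_orders("user123")
referenced_total = sum(order["price"] for order in referenced_user["orders"])
```

Both designs produce the same total. The difference is how many reads it takes to get there and whether the order list can grow without limit.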

Key-Value Databases (Redis, DynamoDB)

Definition: Key-Value Store

A key-value store is the simplest NoSQL model: data is stored as key-value pairs, and access is primarily through key-based lookups. It provides O(1) average-case performance for reads and writes, making it ideal for caching, session storage, and high-throughput simple operations.
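As a rough illustration of the model, the sketch below implements a toy key-value store with per-key expiration on top of a Python dictionary. It is not how Redis or DynamoDB are implemented; the `TTLStore` class and its injectable `clock` parameter are hypothetical names chosen for this example.

```python
import time


class TTLStore:
    """Toy key-value store with optional per-key expiration, backed by a dict."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be shown without sleeping

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)  # hash lookup: O(1) on average
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value


now = [0.0]  # fake clock so the example does not need to sleep
store = TTLStore(clock=lambda: now[0])
store.set("session:abc", "Alice", ttl=30)
fresh = store.get("session:abc")    # still within the 30-second TTL
now[0] = 31.0
expired = store.get("session:abc")  # TTL has passed, so the default is returned
```

Expiration is what makes this shape a natural fit for caches and sessions: stale entries disappear without any explicit cleanup call from the application.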

Redis Data Structures

Structure	Use Case	Example
String	Cache, counters	`SET user:123 "Alice"`
Hash	Object storage	`HSET user:123 name "Alice" age 30`
List	Message queues	`LPUSH queue "msg1" "msg2"`
Set	Tags, unique items	`SADD tags "python" "system-design"`
Sorted Set	Leaderboards	`ZADD leaderboard 100 "player1"`
Stream	Event sourcing	`XADD events * type "click" page "/home"`

Redis Latency

$$L_{p99} < 1\,\text{ms} \quad \text{for simple operations}$$

Here,

$L_{p99}$ = 99th percentile latency
$< 1\,\text{ms}$ = Sub-millisecond response time
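To make the percentile concrete, here is a minimal sketch that computes it from a list of measured latencies using the nearest-rank method. The sample values are invented for illustration and are not Redis benchmark results; `percentile` is a hypothetical helper.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slower ones.
latencies_ms = [0.2] * 98 + [0.9, 5.0]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
worst = max(latencies_ms)
```

With these samples the 99th percentile stays under 1 ms even though the single worst request took 5 ms, which is why a p99 target says nothing about the slowest 1% of requests.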

DynamoDB Partitioning

Definition: DynamoDB Partitioning

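DynamoDB automatically partitions data across multiple servers using the partition key. It hashes each item's partition key and uses the hash value to select the partition that stores the item, so each partition owns a range of hash values rather than a contiguous range of keys. DynamoDB cannot balance load on its own: traffic is spread evenly only when you choose a partition key that produces uniform access patterns. Hot partitions (uneven key distribution) are the primary performance concern.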

A common DynamoDB anti-pattern is choosing a partition key with low cardinality (e.g., "status" with only a few values). This creates hot partitions where most traffic hits a single partition. Choose partition keys with high cardinality for uniform distribution.
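The effect of key cardinality can be simulated with a small sketch. It assumes a fixed count of four partitions and uses MD5 purely as a stand-in hash; DynamoDB's real hash function and partition management are internal, and the traffic mix below is invented for illustration.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real partition count itself


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a partition key to a partition by hashing it (MD5 is only a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(keys):
    """Count how many requests land on each partition."""
    return Counter(partition_for(key) for key in keys)


# Low cardinality: 10,000 requests keyed by one of three status values,
# with most of the traffic on "active".
status_keys = ["active"] * 9000 + ["pending"] * 800 + ["closed"] * 200
status_load = load_per_partition(status_keys)

# High cardinality: 10,000 requests keyed by distinct user ids.
user_keys = [f"user-{i}" for i in range(10_000)]
user_load = load_per_partition(user_keys)

# Share of all requests that hit the busiest partition in each scenario.
hottest_status_share = max(status_load.values()) / len(status_keys)
hottest_user_share = max(user_load.values()) / len(user_keys)
```

Every request with the same key lands on the same partition, so the status-keyed workload sends at least 90% of its traffic to one partition, while the user-keyed workload spreads across all four.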

Column-Family Databases (Cassandra)

Definition: Column-Family Store

A column-family store (wide-column store) organizes data into column families (similar to tables), rows, and columns. Unlike relational tables, each row can have a different set of columns. This model is optimized for write-heavy workloads and time-series data, with excellent write throughput and horizontal scalability.

Cassandra Data Model

Cassandra Query Patterns

Pattern	Description	Example
Partition lookup	Get all data for a partition key	`WHERE user_id = 'abc'`
Range within partition	Query by clustering key	`WHERE user_id = 'abc' AND timestamp > '2024-01-01'`
Time-series	Latest events for a user	`ORDER BY timestamp DESC LIMIT 10`

Cassandra does not support JOINs, and it offers only limited aggregation and restricted WHERE clauses: filters generally must specify the partition key and follow the clustering key order. You must model your tables to match your query patterns (query-first design). This is the opposite of relational modeling, where you normalize first and optimize queries later.
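The three query patterns in the table can be mimicked with a toy in-memory model: a dictionary keyed by partition key, where each partition keeps its rows sorted by clustering key. This is only a sketch of the access pattern, not of Cassandra's storage engine; the `EventsByUser` class and its method names are hypothetical.

```python
import bisect


class EventsByUser:
    """Toy table: partition key = user_id, clustering key = timestamp (ascending)."""

    def __init__(self):
        # user_id -> (sorted timestamps, events in the same order)
        self._partitions = {}

    def insert(self, user_id, timestamp, event):
        timestamps, events = self._partitions.setdefault(user_id, ([], []))
        idx = bisect.bisect_right(timestamps, timestamp)  # keep clustering order
        timestamps.insert(idx, timestamp)
        events.insert(idx, event)

    def by_partition(self, user_id):
        """WHERE user_id = ?"""
        timestamps, events = self._partitions.get(user_id, ([], []))
        return list(zip(timestamps, events))

    def after(self, user_id, timestamp):
        """WHERE user_id = ? AND timestamp > ?"""
        timestamps, events = self._partitions.get(user_id, ([], []))
        start = bisect.bisect_right(timestamps, timestamp)
        return list(zip(timestamps[start:], events[start:]))

    def latest(self, user_id, limit):
        """WHERE user_id = ? ORDER BY timestamp DESC LIMIT ?"""
        return self.by_partition(user_id)[::-1][:limit]


table = EventsByUser()
table.insert("abc", "2024-01-03", "logout")
table.insert("abc", "2023-12-31", "signup")
table.insert("abc", "2024-01-02", "login")
table.insert("xyz", "2024-01-05", "signup")

recent = table.after("abc", "2024-01-01")
newest = table.latest("abc", 2)
```

Note what is missing: there is no efficient way to ask for "all signup events across users", because rows are only reachable through their partition key. Supporting that query would mean adding a second table modeled around it, which is the essence of query-first design.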

Graph Databases (Neo4j)

Definition: Graph Database

A graph database stores data as nodes (entities) and edges (relationships), with properties on both. It excels at queries that traverse relationships, such as finding shortest paths, detecting cycles, or recommending connections. Graph databases use index-free adjacency, meaning each node directly references its neighbors.

Graph Query Patterns

Query Type	Description	Example
Path finding	Shortest path between nodes	Find 3-degree connections
Pattern matching	Find specific subgraphs	Users who bought X and Y
Centrality	Most connected nodes	Influencer detection
Community detection	Cluster related nodes	Social group identification

Graph Traversal Complexity

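$$T_{traversal} = O(k^d)$$

Here,

$T_{traversal}$ = Traversal time
$k$ = Average degree (connections per node)
$d$ = Traversal depth

Each additional hop multiplies the number of nodes visited by roughly $k$, so the cost depends on the size of the neighborhood explored rather than on the total size of the graph.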

Social Network Query

Find all users within 3 degrees of separation from the user named "Alice":

MATCH (alice:User {name: 'Alice'})-[:FRIEND*1..3]-(friend)
WHERE friend <> alice
RETURN DISTINCT friend.name

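In a relational database, this requires up to three self-joins on the friends table (or a recursive query), which becomes expensive at scale because every join has to go through the table or its index. In Neo4j, the same query is a traversal that follows direct neighbor references (index-free adjacency).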
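The traversal idea can be sketched as a breadth-first search over adjacency lists, where each node holds direct references to its neighbors. This is a simplified model rather than Neo4j's implementation; the `within_hops` function and the sample friendship graph are assumptions made for the example.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop distance} for nodes reachable in 1..max_hops hops from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):  # direct neighbor access, no global index
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # exclude the starting user from the result
    return distance


# Hypothetical undirected friendship graph stored as adjacency lists.
friends = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Alice", "Dave"],
    "Carol": ["Alice", "Dave"],
    "Dave": ["Bob", "Carol", "Erin"],
    "Erin": ["Dave", "Frank"],
    "Frank": ["Erin"],
}

reachable = within_hops(friends, "Alice", 3)
```

Here `reachable` contains Bob, Carol, Dave, and Erin with their hop distances, while Frank is left out because he is four hops away. The work done is proportional to the neighborhood visited, not to the number of users in the whole graph.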

NoSQL Comparison Matrix

Criteria	Document	Key-Value	Column-Family	Graph
Data model	JSON docs	Key→Value	Wide columns	Nodes + edges
Query flexibility	High	Low (key only)	Moderate	High (traversals)
Write throughput	Good	Excellent	Excellent	Moderate
Read throughput	Good	Excellent	Good	Depends on query
Horizontal scaling	Good	Excellent	Excellent	Hard
Consistency	Configurable	Configurable	Tunable	Strong
Best use case	Content management	Caching, sessions	Time-series, IoT	Social networks

Practice Exercises

Data Modeling: Design the MongoDB schema for a blogging platform with users, posts, comments, and tags. Decide which fields to embed vs reference. Justify your choices.
Key-Value Design: Using Redis, design a rate limiter that allows 100 requests per minute per user. What data structures would you use? How do you handle expiration?
Column-Family Design: Design the Cassandra table schema for a time-series IoT sensor data system. What is the partition key? What is the clustering key?
Graph Query: Given a social network graph, write the Cypher query to find "friends of friends who live in the same city and share at least 3 interests."

Key Takeaways:

Document databases (MongoDB) excel at flexible, nested data with schema-on-read
Key-value stores (Redis) provide O(1) lookups for caching and sessions
Column-family stores (Cassandra) optimize for write-heavy time-series workloads
Graph databases (Neo4j) use index-free adjacency for relationship queries
Choose based on your primary access pattern, not on popularity

What to Learn Next

-> SQL Deep Dive: PostgreSQL, MySQL, indexing strategies, and query optimization.

-> MongoDB Deep Dive: Advanced MongoDB features, aggregation pipeline, and sharding.

-> Redis Deep Dive: Redis data structures, persistence, clustering, and use cases.

-> Cassandra Deep Dive: Cassandra architecture, data modeling, and operational patterns.

-> Choosing the Right Database: Systematic framework for database selection.

-> DynamoDB Deep Dive: DynamoDB internals, partitioning, and global tables.
