
DynamoDB Deep Dive

Amazon DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond performance at any scale. Master its partitioning model, indexing strategies, global tables, and event-driven patterns with DynamoDB Streams.

Serverless — No servers to manage, auto-scaling built in
Predictable Performance — Single-digit millisecond at any scale
Global Distribution — Multi-region replication with global tables

DynamoDB can scale to millions of requests per second with no servers to provision, patch, or manage.

DynamoDB Architecture

Definition: Amazon DynamoDB

Amazon DynamoDB is a fully managed, serverless, key-value and document NoSQL database. It provides consistent, single-digit millisecond latency at any scale. DynamoDB automatically partitions data across servers and supports both eventually consistent and strongly consistent reads.

Data Model

Concept	Description
Table	Collection of items (analogous to a table in SQL)
Item	A group of attributes (analogous to a row)
Attribute	A key-value pair (analogous to a column)
Primary Key	Unique identifier for each item (partition key + optional sort key)

DynamoDB Data Model

// Item: user profile
{
  "PK": "USER#123",              // Partition key
  "SK": "PROFILE",               // Sort key
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "created_at": "2024-01-15",
  "plan": "premium"
}

// Item: order, stored in the same table (single-table design)
{
  "PK": "USER#123",              // Partition key (same user as above)
  "SK": "ORDER#2024-01-15#001",  // Sort key (entity prefix + date + sequence)
  "product_id": "PROD#456",
  "amount": 99.99,
  "status": "shipped"
}
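The composite keys in the items above could be built in application code along these lines. This is a minimal sketch: the helper names (`user_pk`, `order_sk`, `order_item`) are hypothetical, and only the `USER#` and `ORDER#` prefix convention comes from the example items. It builds plain Python dictionaries and does not call any AWS API.

```python
def user_pk(user_id: str) -> str:
    """Partition key shared by every item that belongs to one user."""
    return f"USER#{user_id}"


def order_sk(order_date: str, sequence: int) -> str:
    """Sort key for an order. ISO dates sort chronologically as strings."""
    return f"ORDER#{order_date}#{sequence:03d}"


def order_item(user_id: str, order_date: str, sequence: int,
               product_id: str, status: str) -> dict:
    """Assemble an order item that lives in the user's partition."""
    return {
        "PK": user_pk(user_id),
        "SK": order_sk(order_date, sequence),
        "product_id": product_id,
        "status": status,
    }


# Hypothetical sample orders for one user, created out of order.
orders = [
    order_item("123", "2024-02-01", 1, "PROD#789", "pending"),
    order_item("123", "2024-01-15", 1, "PROD#456", "shipped"),
]

# Sorting by SK mirrors how items are ordered within a partition.
orders_by_date = sorted(orders, key=lambda item: item["SK"])
```

Zero-padding the sequence number keeps string ordering consistent with numeric ordering, which matters because sort keys of string type are compared as strings.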

Partitioning

Definition: DynamoDB Partitioning

DynamoDB automatically partitions data based on a hash of the partition key. Each partition owns a slice of the hashed key space and serves the traffic for the items that land in it. A good partition key has high cardinality and distributes traffic evenly across partitions.

Partition Key Distribution

$$P_{partition} = \mathrm{hash}(PK) \bmod N_{partitions}$$

Here,

$P_{partition}$ = Partition assigned to the item
$PK$ = The partition key value
$N_{partitions}$ = Number of partitions

A common anti-pattern is choosing a partition key with low cardinality (e.g., "status" with only a few values). This creates hot partitions where most traffic hits a single partition. Choose partition keys with high cardinality for uniform distribution.
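The effect of key cardinality can be illustrated with a small simulation of the hash-and-modulo model above. This is a sketch under stated assumptions: MD5 and the modulo step are stand-ins chosen for illustration, not DynamoDB's internal hash function, and `partition_for` and `distribution` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def partition_for(pk: str, n_partitions: int) -> int:
    """Map a partition key to a partition index (illustrative hash only)."""
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


def distribution(keys, n_partitions: int) -> Counter:
    """Count how many items land on each partition."""
    return Counter(partition_for(k, n_partitions) for k in keys)


N_PARTITIONS = 8
N_ITEMS = 30_000
statuses = ["pending", "shipped", "delivered"]

# Low cardinality: only three distinct key values exist.
low = distribution((statuses[i % 3] for i in range(N_ITEMS)), N_PARTITIONS)

# High cardinality: every item has its own key value.
high = distribution((f"USER#{i}" for i in range(N_ITEMS)), N_PARTITIONS)

print("partitions used (status key):", len(low), "busiest:", max(low.values()))
print("partitions used (user key):  ", len(high), "busiest:", max(high.values()))
```

With only three distinct key values, at most three partitions can ever receive items, so each of them carries at least a third of the load no matter how many partitions exist.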

Single Table Design

Definition: Single Table Design

Single table design stores multiple entity types in one DynamoDB table using composite primary keys (PK + SK). This enables efficient access patterns across related entities without JOINs. The trade-off is a more complex data model but fewer tables to manage.

Entity	PK	SK	Attributes
User	`USER#123`	`PROFILE`	name, email
Order	`USER#123`	`ORDER#2024-01-15`	amount, status
Product	`PRODUCT#456`	`METADATA`	name, price
Review	`PRODUCT#456`	`REVIEW#USER#123`	rating, text
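The access patterns this layout supports can be sketched with an in-memory stand-in for the table. The `SingleTable` class below is hypothetical and only imitates a query that matches on PK and a prefix of SK; it is not the DynamoDB client API. The attribute values are made-up sample data for the rows in the table above.

```python
class SingleTable:
    """Toy model of one table holding several entity types."""

    def __init__(self):
        self._items = {}  # (PK, SK) -> item

    def put(self, item: dict) -> None:
        # The primary key is unique, so a second put overwrites the first.
        self._items[(item["PK"], item["SK"])] = item

    def query(self, pk: str, sk_prefix: str = "") -> list:
        """Return items in one partition whose SK starts with sk_prefix."""
        matches = [
            item for (p, s), item in self._items.items()
            if p == pk and s.startswith(sk_prefix)
        ]
        return sorted(matches, key=lambda item: item["SK"])


table = SingleTable()
table.put({"PK": "USER#123", "SK": "PROFILE", "name": "Alice Johnson"})
table.put({"PK": "USER#123", "SK": "ORDER#2024-01-15", "status": "shipped"})
table.put({"PK": "PRODUCT#456", "SK": "METADATA", "name": "Widget"})
table.put({"PK": "PRODUCT#456", "SK": "REVIEW#USER#123", "rating": 5})

# One request returns the user's profile and orders together, without a JOIN.
user_items = table.query("USER#123")

# A prefix on the sort key narrows the result to a single entity type.
reviews = table.query("PRODUCT#456", sk_prefix="REVIEW#")
```

The point of the design is visible in `user_items`: related entities share a partition key, so a single lookup by PK retrieves them all in sort-key order.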

Secondary Indexes

Definition: Global Secondary Index (GSI)

A Global Secondary Index is a separate index with its own partition key and optional sort key. It enables queries on non-key attributes. The trade-offs are that reads from a GSI are eventually consistent and that the index consumes additional storage and write throughput.

Definition: Local Secondary Index (LSI)

A Local Secondary Index shares the partition key with the base table but uses a different sort key. It supports strongly consistent reads (eventually consistent reads remain the default) but must be defined at table creation time.

Index Type	Partition Key	Sort Key	Consistency	Cost
GSI	Any attribute (usually different from base)	Optional	Eventually consistent only	Extra storage + its own throughput
LSI	Same as base	Different from base	Eventually or strongly consistent	Extra storage; shares the base table's throughput

Design your access patterns first, then choose the partition key and sort key to support them. GSI projections determine which attributes are copied to the index—project only what you need to minimize cost.
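How a projection limits what an index stores can be sketched by deriving an index from a list of items. This is an illustrative model, not how DynamoDB maintains indexes internally: `build_gsi` is a hypothetical helper, and the second order (for `USER#789`) is made-up sample data. The sketch assumes that table keys are always copied into the index and that items missing an index key attribute are left out of it.

```python
def build_gsi(items, index_pk, index_sk=None, include=()):
    """Derive an index keyed by index_pk, copying only projected attributes."""
    key_attrs = [index_pk] + ([index_sk] if index_sk else [])
    index = {}
    for item in items:
        # Items without the index key attributes do not appear in the index.
        if any(attr not in item for attr in key_attrs):
            continue
        projected = {"PK": item["PK"], "SK": item["SK"]}
        for attr in (*key_attrs, *include):
            if attr in item:
                projected[attr] = item[attr]
        index.setdefault(item[index_pk], []).append(projected)
    if index_sk:
        for rows in index.values():
            rows.sort(key=lambda row: row[index_sk])
    return index


table_items = [
    {"PK": "USER#123", "SK": "PROFILE", "name": "Alice Johnson"},
    {"PK": "USER#123", "SK": "ORDER#2024-01-15#001",
     "product_id": "PROD#456", "amount": 99.99, "status": "shipped"},
    {"PK": "USER#789", "SK": "ORDER#2024-01-10#001",
     "product_id": "PROD#456", "amount": 19.50, "status": "pending"},
]

# Access pattern: "all orders for a product", projecting only the amount.
orders_by_product = build_gsi(
    table_items, index_pk="product_id", index_sk="SK", include=("amount",)
)
```

Here `status` is deliberately not projected, so a query against this index could not return it without a second read from the base table. That is the cost trade-off a projection controls.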

DynamoDB Streams

Definition: DynamoDB Streams

DynamoDB Streams capture a time-ordered sequence of item-level modifications (create, update, delete) in a DynamoDB table. The stream data is available for 24 hours and can trigger AWS Lambda functions for event-driven processing.

Use Case	Pattern
Cross-region replication	Stream → Lambda → write to other region
Event-driven workflows	Stream → Lambda → trigger Step Functions
Materialized views	Stream → Lambda → update derived tables
Audit logging	Stream → Kinesis → S3 → Athena
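The Stream → Lambda patterns above can be sketched as a handler that inspects each stream record. The sketch assumes the stream is configured to include both old and new item images, and that items carry a string `status` attribute and `PK`/`SK` keys as in the earlier examples. The function names and the `print` call standing in for a notification are hypothetical.

```python
def shipped_order_keys(event: dict) -> list:
    """Return (PK, SK) pairs for items whose status just became 'shipped'."""
    shipped = []
    for record in event.get("Records", []):
        # Only updates can represent a status change on an existing item.
        if record.get("eventName") != "MODIFY":
            continue
        change = record["dynamodb"]
        old_status = change.get("OldImage", {}).get("status", {}).get("S")
        new_status = change.get("NewImage", {}).get("status", {}).get("S")
        if new_status == "shipped" and old_status != "shipped":
            keys = change["Keys"]
            shipped.append((keys["PK"]["S"], keys["SK"]["S"]))
    return shipped


def handler(event, context):
    """Lambda-style entry point; replace print with a real notification."""
    keys = shipped_order_keys(event)
    for pk, sk in keys:
        print(f"order shipped: {pk} {sk}")
    return {"notified": len(keys)}


# Hand-written sample event in the stream record shape, for local testing.
sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"PK": {"S": "USER#123"},
                         "SK": {"S": "ORDER#2024-01-15#001"}},
                "OldImage": {"status": {"S": "pending"}},
                "NewImage": {"status": {"S": "shipped"}},
            },
        },
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"PK": {"S": "USER#123"},
                         "SK": {"S": "ORDER#2024-02-01#001"}},
                "NewImage": {"status": {"S": "pending"}},
            },
        },
    ]
}
```

Comparing the old and new images keeps the handler from firing again when some other attribute of an already shipped order is updated.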

Global Tables

Definition: DynamoDB Global Tables

Global tables provide a fully managed, multi-region, multi-active replication solution. They enable fast, local reads and writes in any region with eventual consistency across regions. Global tables are ideal for applications that need low-latency access from multiple geographic locations.

Feature	Description
Multi-active	Read and write in any region
Eventual consistency	Replication across regions is async
Conflict resolution	Last-writer-wins (LWW)
Automatic	DynamoDB manages replication once replica regions are added
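The last-writer-wins rule in the table can be illustrated with two conflicting versions of the same item. DynamoDB applies conflict resolution internally; the sketch below only models the idea. The `updated_at` and `region` fields, the millisecond timestamps, and the tie-break on region name are assumptions made for this example.

```python
def resolve_lww(a: dict, b: dict) -> dict:
    """Pick the version with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda v: (v["updated_at"], v["region"]))


# Two regions update the same item at nearly the same time.
us_write = {"PK": "USER#123", "SK": "PROFILE", "plan": "premium",
            "updated_at": 1705312800120, "region": "us-east-1"}
eu_write = {"PK": "USER#123", "SK": "PROFILE", "plan": "basic",
            "updated_at": 1705312800450, "region": "eu-west-1"}

# Each region sees the writes in a different order but picks the same winner.
winner_in_us = resolve_lww(us_write, eu_write)
winner_in_eu = resolve_lww(eu_write, us_write)
```

Both replicas converge on the later write, and the earlier write is discarded without any error. Applications that cannot tolerate a lost update should avoid writing the same item from multiple regions concurrently.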

Capacity Modes

Mode	Description	Best For
On-demand	Pay per request, auto-scales	Unpredictable workloads
Provisioned	Reserve read/write capacity	Predictable workloads
Provisioned with auto scaling	Adjusts provisioned capacity within configured limits	Variable but patterned workloads

DynamoDB Capacity Calculation

$$RCU = \left\lceil \frac{ReadSize_{bytes}}{4\,\text{KB}} \right\rceil \times ConsistencyFactor$$

Here,

$RCU$ = Read capacity units consumed by one read per second
$ReadSize_{bytes}$ = Item size in bytes, rounded up to the next 4 KB boundary
$4\,\text{KB}$ = Data covered by one RCU for a strongly consistent read
$ConsistencyFactor$ = 1 for strongly consistent, 0.5 for eventually consistent
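A worked version of the read capacity formula, as a minimal sketch. It assumes 4 KB means 4096 bytes and rounds the item size up to the next 4 KB boundary before applying the consistency factor. The `reads_per_second` parameter is an addition for convenience and is not part of the formula above.

```python
import math

RCU_BLOCK_BYTES = 4096  # assumption: 4 KB = 4096 bytes


def read_capacity_units(item_size_bytes: int,
                        reads_per_second: int = 1,
                        strongly_consistent: bool = True) -> int:
    """Estimate the RCUs needed to sustain a given read rate."""
    if item_size_bytes <= 0 or reads_per_second <= 0:
        raise ValueError("item size and read rate must be positive")
    blocks_per_read = math.ceil(item_size_bytes / RCU_BLOCK_BYTES)
    factor = 1.0 if strongly_consistent else 0.5
    # Round up at the end: capacity is provisioned in whole units.
    return math.ceil(blocks_per_read * factor * reads_per_second)


# A 6 KB item spans two 4 KB blocks.
six_kb = 6 * 1024
strong = read_capacity_units(six_kb, reads_per_second=100)
eventual = read_capacity_units(six_kb, reads_per_second=100,
                               strongly_consistent=False)
print(strong, eventual)
```

The rounding step is why item size matters: under this model a 4.1 KB item costs as much to read as an 8 KB item, so trimming items below a 4 KB boundary can halve read cost.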

Practice Exercises

Table Design: Design a single-table DynamoDB schema for a ride-sharing app with users, drivers, rides, and payments. Identify all access patterns and choose appropriate PK/SK combinations.
Partition Key Analysis: You have a DynamoDB table with 100M items and the partition key is "country". Analyze the access pattern and identify potential hot partitions. Propose a solution.
Stream Processing: Design an event-driven workflow using DynamoDB Streams that sends a notification when an order status changes to "shipped".
Cost Estimation: Estimate the monthly cost for a DynamoDB table with 100GB of data, 10K read capacity units, and 5K write capacity units.

Key Takeaways:

DynamoDB is a serverless, fully managed NoSQL database with predictable performance
Single table design stores multiple entity types using composite primary keys
Choose partition keys with high cardinality for uniform distribution
GSIs enable queries on non-key attributes; LSIs share the partition key
DynamoDB Streams enable event-driven workflows with Lambda
Global tables provide multi-region, multi-active replication

What to Learn Next

-> Redis Deep Dive Redis data structures, persistence, clustering, and use cases.

-> Cassandra Deep Dive Cassandra architecture, data modeling, and operational patterns.

-> Spanner and CockroachDB Deep dive into specific NewSQL implementations.

-> NoSQL Deep Dive Document, key-value, column-family, and graph databases overview.

-> Data Partitioning Sharding strategies, consistent hashing, and partition keys.

-> Choosing the Right Database Systematic framework for database selection.
