DynamoDB Deep Dive
Amazon DynamoDB is a fully managed NoSQL database that delivers single-digit millisecond latency at any scale. Master its partitioning model, indexing strategies, global tables, and event-driven patterns with DynamoDB Streams.
- Serverless: No servers to manage, auto-scaling built in
- Predictable Performance: Single-digit millisecond latency at any scale
- Global Distribution: Multi-region replication with global tables
DynamoDB scales to millions of requests per second with no servers to provision, patch, or manage.
DynamoDB Architecture
Definition: Amazon DynamoDB
Amazon DynamoDB is a fully managed, serverless, key-value and document NoSQL database. It provides consistent, single-digit millisecond latency at any scale. DynamoDB automatically partitions data across servers and supports both eventually consistent and strongly consistent reads.
Data Model
| Concept | Description |
|---|---|
| Table | Collection of items (analogous to a table in SQL) |
| Item | A group of attributes (analogous to a row) |
| Attribute | A key-value pair (analogous to a column) |
| Primary Key | Unique identifier for each item (partition key + optional sort key) |
DynamoDB Data Model
// Table: Users
{
  "PK": "USER#123",              // Partition Key
  "SK": "PROFILE",               // Sort Key
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "created_at": "2024-01-15",
  "plan": "premium"
}
// Table: Orders (single table design)
{
  "PK": "USER#123",              // Partition Key
  "SK": "ORDER#2024-01-15#001",  // Sort Key
  "product_id": "PROD#456",
  "amount": 99.99,
  "status": "shipped"
}
Partitioning
Definition: DynamoDB Partitioning
DynamoDB automatically partitions data based on the partition key. It hashes each item's partition key value; each partition stores a contiguous range of the resulting hash values and handles a proportional share of traffic. A good partition key has high cardinality and distributes traffic evenly across partitions.
Partition Key Distribution
$$p = \operatorname{hash}(k) \bmod N$$
Here,
- $p$ = Partition assigned to the item
- $k$ = The partition key value
- $N$ = Number of partitions
A common anti-pattern is choosing a partition key with low cardinality (e.g., "status" with only a few values). This creates hot partitions where most traffic hits a single partition. Choose partition keys with high cardinality for uniform distribution.
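The effect of key cardinality can be illustrated with a small simulation. This is only a sketch: DynamoDB's actual hash function and partition count are internal, so MD5, the fixed count of 8 partitions, and the helper names `assign_partition` and `partition_load` are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key: str, num_partitions: int) -> int:
    """Map a key to a partition index as hash(key) mod N.

    MD5 is a stand-in here; DynamoDB's real hash function is internal.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_load(keys, num_partitions: int) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(assign_partition(k, num_partitions) for k in keys)


NUM_PARTITIONS = 8  # assumed value for the simulation

# Low cardinality: 10,000 requests spread over only three key values.
statuses = ["pending", "shipped", "delivered"]
low = partition_load((statuses[i % 3] for i in range(10_000)), NUM_PARTITIONS)

# High cardinality: 10,000 requests, each for a distinct user ID.
high = partition_load((f"USER#{i}" for i in range(10_000)), NUM_PARTITIONS)

print("status key :", sorted(low.items()))
print("user-id key:", sorted(high.items()))
```

With the status key, all traffic lands on at most three partitions no matter how many exist, while the user-ID key spreads requests across all of them.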
Single Table Design
Definition: Single Table Design
Single table design stores multiple entity types in one DynamoDB table using composite primary keys (PK + SK). This enables efficient access patterns across related entities without JOINs. The trade-off is a more complex data model but fewer tables to manage.
| Entity | PK | SK | Attributes |
|---|---|---|---|
| User | USER#123 | PROFILE | name, email |
| Order | USER#123 | ORDER#2024-01-15 | amount, status |
| Product | PRODUCT#456 | METADATA | name, price |
| Review | PRODUCT#456 | REVIEW#USER#123 | rating, text |
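To see why this layout avoids JOINs, the sketch below models the table in memory and mimics a Query that fixes the partition key and optionally filters on a sort key prefix. The `query` helper is hypothetical and only imitates the behavior of a DynamoDB Query with a `begins_with` condition; attribute values not shown in the table above are illustrative.

```python
# Items from the table above; values such as "Widget" are illustrative.
ITEMS = [
    {"PK": "USER#123", "SK": "PROFILE", "name": "Alice Johnson"},
    {"PK": "USER#123", "SK": "ORDER#2024-01-15", "amount": 99.99, "status": "shipped"},
    {"PK": "PRODUCT#456", "SK": "METADATA", "name": "Widget", "price": 19.99},
    {"PK": "PRODUCT#456", "SK": "REVIEW#USER#123", "rating": 5},
]


def query(items, pk, sk_prefix=""):
    """Return the items in one partition whose sort key starts with
    sk_prefix, ordered by sort key (ascending, as a Query does by default)."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


# One request returns the user's profile and orders together.
everything_for_user = query(ITEMS, "USER#123")

# A sort key prefix narrows the same partition to one entity type.
orders_only = query(ITEMS, "USER#123", "ORDER#")
reviews_for_product = query(ITEMS, "PRODUCT#456", "REVIEW#")

print([i["SK"] for i in everything_for_user])
print([i["SK"] for i in orders_only])
```

Because related entities share a partition key, a single lookup retrieves them together, and the sort key prefix does the work a WHERE clause on entity type would do in SQL.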
Secondary Indexes
Definition: Global Secondary Index (GSI)
A Global Secondary Index is a separate index with its own partition key and optional sort key. It enables queries on non-key attributes, at the cost of eventually consistent reads only, additional storage, and additional write throughput.
Definition: Local Secondary Index (LSI)
A Local Secondary Index shares the partition key with the base table but uses a different sort key. It provides strongly consistent reads but must be defined at table creation time.
| Index Type | Partition Key | Sort Key | Consistency | Cost |
|---|---|---|---|---|
| GSI | Different from base | Optional | Eventually consistent only | Extra storage + throughput |
| LSI | Same as base | Different | Strongly or eventually consistent | Extra storage; uses the base table's throughput |
Design your access patterns first, then choose the partition key and sort key to support them. GSI projections determine which attributes are copied to the index; project only what you need to minimize cost.
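As a rough illustration of how both index types and a narrow projection are declared, the sketch below builds a table definition as a plain dictionary in the shape that boto3's `create_table` accepts. The table name, the index names `ByCreatedAt` and `ByProduct`, and the choice of indexed attributes are assumptions, and `undeclared_key_attributes` is a hypothetical helper, not part of any AWS library.

```python
def build_table_params(table_name: str) -> dict:
    """Build keyword arguments in the shape boto3's create_table expects."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand, so no throughput settings
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "created_at", "AttributeType": "S"},
            {"AttributeName": "product_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [{
            "IndexName": "ByCreatedAt",
            "KeySchema": [
                {"AttributeName": "PK", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
        # GSI: its own partition key; copies only the attributes it needs.
        "GlobalSecondaryIndexes": [{
            "IndexName": "ByProduct",
            "KeySchema": [
                {"AttributeName": "product_id", "KeyType": "HASH"},
                {"AttributeName": "SK", "KeyType": "RANGE"},
            ],
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["amount", "status"],
            },
        }],
    }


def undeclared_key_attributes(params: dict) -> set:
    """Return key attributes missing from AttributeDefinitions."""
    declared = {a["AttributeName"] for a in params["AttributeDefinitions"]}
    used = {k["AttributeName"] for k in params["KeySchema"]}
    indexes = params.get("LocalSecondaryIndexes", []) + params.get("GlobalSecondaryIndexes", [])
    for index in indexes:
        used |= {k["AttributeName"] for k in index["KeySchema"]}
    return used - declared


params = build_table_params("AppTable")
print(undeclared_key_attributes(params))
```

With boto3 installed and AWS credentials configured, the dictionary could be passed as `boto3.client("dynamodb").create_table(**params)`. Note that the LSI must be part of this initial call, whereas a GSI can also be added later.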
DynamoDB Streams
Definition: DynamoDB Streams
DynamoDB Streams capture a time-ordered sequence of item-level modifications (create, update, delete) in a DynamoDB table. The stream data is available for 24 hours and can trigger AWS Lambda functions for event-driven processing.
| Use Case | Pattern |
|---|---|
| Cross-region replication | Stream → Lambda → write to other region |
| Event-driven workflows | Stream → Lambda → trigger Step Functions |
| Materialized views | Stream → Lambda → update derived tables |
| Audit logging | Stream → Kinesis → S3 → Athena |
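The materialized-view pattern can be sketched as a function that folds stream records into a derived count of items per status. It assumes the stream is configured to include both old and new item images, and it keeps the view in an in-memory `Counter` purely for illustration; the names `status_of` and `apply_stream_records` are hypothetical, and a real Lambda function would write to a derived table instead.

```python
from collections import Counter


def status_of(image):
    """Read the 'status' string from a DynamoDB-JSON item image, if present."""
    if not image or "status" not in image:
        return None
    return image["status"].get("S")


def apply_stream_records(event: dict, view: Counter) -> Counter:
    """Fold stream records into a count of items per status.

    An insert has no OldImage, a delete has no NewImage, and an update
    carries both, so comparing the two images covers all three cases.
    """
    for record in event.get("Records", []):
        data = record.get("dynamodb", {})
        old = status_of(data.get("OldImage"))
        new = status_of(data.get("NewImage"))
        if old == new:
            continue  # status unchanged, nothing to update
        if old is not None:
            view[old] -= 1
        if new is not None:
            view[new] += 1
    return view


# A hand-written event in the shape Lambda receives from a stream.
sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {
        "NewImage": {"PK": {"S": "USER#123"}, "status": {"S": "pending"}}}},
    {"eventName": "MODIFY", "dynamodb": {
        "OldImage": {"PK": {"S": "USER#123"}, "status": {"S": "pending"}},
        "NewImage": {"PK": {"S": "USER#123"}, "status": {"S": "shipped"}}}},
]}

view = apply_stream_records(sample_event, Counter())
print(dict(view))
```

Because records for a given item arrive in modification order, applying them one by one keeps the derived counts consistent with the base table.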
Global Tables
Definition: DynamoDB Global Tables
Global tables provide a fully managed, multi-region, multi-active replication solution. They enable fast, local reads and writes in any region with eventual consistency across regions. Global tables are ideal for applications that need low-latency access from multiple geographic locations.
| Feature | Description |
|---|---|
| Multi-active | Read and write in any region |
| Eventual consistency | Replication across regions is async |
| Conflict resolution | Last-writer-wins (LWW) |
| Automatic | DynamoDB manages replication once replica Regions are added |
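Last-writer-wins can be sketched in a few lines. This is a simplified model, not DynamoDB's implementation: the `Write` record, its millisecond timestamp, and the tie-break on region name are all assumptions made so that the example is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    region: str
    timestamp_ms: int
    value: dict


def resolve_lww(a: Write, b: Write) -> Write:
    """Keep the write with the later timestamp.

    Exact ties are broken by region name so that every replica picks
    the same winner (this tie-break rule is an assumption).
    """
    return max(a, b, key=lambda w: (w.timestamp_ms, w.region))


# Two regions update the same item a few milliseconds apart.
us = Write("us-east-1", 1_000, {"plan": "premium"})
eu = Write("eu-west-1", 1_005, {"plan": "basic"})

winner = resolve_lww(us, eu)
print(winner.region, winner.value)
```

The earlier write is discarded entirely rather than merged, which is why applications that cannot tolerate lost updates usually route writes for a given item to a single region.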
Capacity Modes
| Mode | Description | Best For |
|---|---|---|
| On-demand | Pay per request, auto-scales | Unpredictable workloads |
| Provisioned | Reserve read/write capacity | Predictable workloads |
| Provisioned with auto scaling | Adjusts provisioned capacity within configured limits | Variable but patterned workloads |
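DynamoDB Capacity Calculation
$$\text{RCU} = \left\lceil \frac{S}{4\,\text{KB}} \right\rceil \times C$$
Here,
- $\text{RCU}$ = Read Capacity Units needed
- $S$ = Item size in bytes
- $4\,\text{KB}$ = One RCU per 4 KB (4,096 bytes) for a strongly consistent read
- $C$ = 1 for strongly consistent, 0.5 for eventually consistent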
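The read capacity rule (one RCU per 4 KB block for a strongly consistent read, half that for an eventually consistent one) can be worked through in a short function. The `reads_per_second` parameter is an addition for sizing a whole workload rather than a single read, and the function name is hypothetical.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers up to 4 KB per read


def read_capacity_units(item_size_bytes: int,
                        reads_per_second: float = 1,
                        strongly_consistent: bool = True) -> float:
    """Estimate the RCUs needed to read items of a given size.

    Item size is rounded up to whole 4 KB blocks; eventually consistent
    reads cost half as much as strongly consistent ones.
    """
    blocks = max(1, math.ceil(item_size_bytes / RCU_BLOCK_BYTES))
    factor = 1.0 if strongly_consistent else 0.5
    return blocks * factor * reads_per_second


# A 6,000-byte item spans two 4 KB blocks.
print(read_capacity_units(6_000))                             # strongly consistent
print(read_capacity_units(6_000, strongly_consistent=False))  # eventually consistent

# 100 eventually consistent reads per second of a 3,000-byte item.
print(read_capacity_units(3_000, reads_per_second=100, strongly_consistent=False))
```

The rounding step matters in practice: an item of 4,097 bytes costs twice as much to read as one of 4,096 bytes, so keeping items just under a block boundary can noticeably reduce read costs.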
Practice Exercises
- Table Design: Design a single-table DynamoDB schema for a ride-sharing app with users, drivers, rides, and payments. Identify all access patterns and choose appropriate PK/SK combinations.
- Partition Key Analysis: You have a DynamoDB table with 100M items and the partition key is "country". Analyze the access pattern and identify potential hot partitions. Propose a solution.
- Stream Processing: Design an event-driven workflow using DynamoDB Streams that sends a notification when an order status changes to "shipped".
- Cost Estimation: Estimate the monthly cost for a DynamoDB table with 100GB of data, 10K read capacity units, and 5K write capacity units.
Key Takeaways:
- DynamoDB is a serverless, fully managed NoSQL database with predictable performance
- Single table design stores multiple entity types using composite primary keys
- Choose partition keys with high cardinality for uniform distribution
- GSIs enable queries on non-key attributes; LSIs share the partition key
- DynamoDB Streams enable event-driven workflows with Lambda
- Global tables provide multi-region, multi-active replication
What to Learn Next
-> Redis Deep Dive: Redis data structures, persistence, clustering, and use cases.
-> Cassandra Deep Dive: Cassandra architecture, data modeling, and operational patterns.
-> Spanner and CockroachDB: Deep dive into specific NewSQL implementations.
-> NoSQL Deep Dive: Document, key-value, column-family, and graph databases overview.
-> Data Partitioning: Sharding strategies, consistent hashing, and partition keys.
-> Choosing the Right Database: Systematic framework for database selection.