
Introduction to System Design


System Design Foundations


System design is the discipline of defining the architecture, components, modules, interfaces, and data flow of a system to satisfy specified requirements. This guide provides a rigorous foundation for reasoning about distributed systems at scale.

  • Architecture — The high-level structure of a system and its components
  • Trade-offs — Every design decision involves competing constraints
  • Scalability — Systems must handle growth in users, data, and traffic

The goal of system design is not perfection—it is making informed decisions under uncertainty.

What Is System Design?

System design is the process of defining the architecture, interfaces, data models, and operational characteristics of a software system to meet functional and non-functional requirements.

Definition: System Design

System design is the disciplined practice of specifying the structure, behavior, and interactions of a system's components. It encompasses architectural decisions (component decomposition, communication patterns), data modeling (schemas, storage strategies), and operational concerns (scalability, reliability, observability). The goal is to produce a blueprint that satisfies requirements while balancing competing constraints such as cost, complexity, and time-to-market.

Functional vs Non-Functional Requirements

Every system design begins with understanding requirements. These split into two categories:

Category | Examples
Functional | User authentication, search, payments, notifications
Non-Functional | Latency < 100 ms, 99.99% uptime, support for 1M concurrent users, GDPR compliance

Non-functional requirements (NFRs) are often called quality attributes or -ilities. They are the primary drivers of architectural decisions: two systems with the same functional requirements but different NFRs can end up with very different architectures.

The Core Principles

Several principles guide effective system design:

Principle 1: Know Your Constraints

Before designing anything, understand:

  • Scale: How many users? How much data? What growth rate?
  • Latency: What are the response time requirements?
  • Consistency: How strong are the consistency guarantees?
  • Budget: What are the cost constraints?

Principle 2: Design for Failure

Distributed systems fail: networks partition, disks fail, and processes crash. Design every component to tolerate these failures:

  • Redundancy at every layer
  • Graceful degradation under failure
  • Circuit breakers and bulkheads
  • Health checks and automatic recovery
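Of the techniques above, a circuit breaker is the easiest to show in a few lines. The sketch below is a minimal, single-threaded Python illustration that assumes a simple policy: trip after `failure_threshold` consecutive failures, reject calls while open, and let one trial call through after `reset_timeout` seconds (the half-open state). `CircuitBreaker`, `CircuitOpenError`, and their parameters are hypothetical names, not taken from any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Minimal, single-threaded circuit breaker (hypothetical sketch)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock                      # injectable for testing
        self.failures = 0                       # consecutive failures seen
        self.opened_at: Optional[float] = None  # time the breaker last tripped

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"                  # let a trial call through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            # Fail fast instead of adding load to a dependency that is failing.
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Trip after too many failures, or re-trip if a half-open trial fails.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, e.g. `breaker.call(lambda: fetch_profile(user_id))`, where `fetch_profile` stands for a hypothetical client function. A production breaker would also need thread safety and typically one instance per downstream dependency.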

Principle 3: Keep It Simple

Complexity is the enemy of reliability. Every additional component, every additional network hop, every additional layer of abstraction introduces failure modes and increases cognitive load.

YAGNI (You Aren't Gonna Need It) applies powerfully to system design: design for current requirements plus one step ahead. Over-engineering is often more dangerous than under-engineering, because it wastes resources and obscures the system's true complexity.

Principle 4: Make Trade-offs Explicit

Every design decision involves trade-offs. The best engineers can articulate why they chose one approach over another and what they sacrificed.

The Design Process

A systematic approach to system design follows these phases:

Phase 1: Requirements Gathering

Clarify functional and non-functional requirements. Ask questions:

  • What are the core use cases?
  • What is the expected scale (users, data, QPS)?
  • What are the latency requirements?
  • What consistency guarantees are needed?
  • What are the availability targets?

Phase 2: Back-of-the-Envelope Estimation

Quantify the problem before designing the solution:

Traffic Estimation

$$\text{QPS} = \frac{\text{DAU} \times \text{Actions per Day}}{86\,400}$$

Here,

  • QPS = Queries per second
  • DAU = Daily active users
  • Actions per Day = Requests each user makes per day
  • 86,400 = Seconds in a day

Estimating QPS for a Social Media Feed

Suppose we have 100M daily active users, each viewing 10 feeds per day and posting 2 updates per day.

Feed reads: QPS_read = (100M × 10) / 86400 ≈ 11,600 QPS

Feed writes: QPS_write = (100M × 2) / 86400 ≈ 2,300 QPS

Peak QPS (3x average): peak read ≈ 35,000 QPS; peak write ≈ 7,000 QPS
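The same arithmetic as a small Python helper makes the assumptions (100M DAU, 10 reads and 2 writes per user per day, a 3x peak factor) explicit and easy to vary. The function name `estimate_qps` and its default peak factor are illustrative choices.

```python
SECONDS_PER_DAY = 86_400


def estimate_qps(dau: int, actions_per_user_per_day: float,
                 peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for one kind of request."""
    average = dau * actions_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


for label, actions in [("read", 10), ("write", 2)]:
    avg, peak = estimate_qps(100_000_000, actions)
    print(f"{label}: avg ≈ {avg:,.0f} QPS, peak ≈ {peak:,.0f} QPS")
# read: avg ≈ 11,574 QPS, peak ≈ 34,722 QPS
# write: avg ≈ 2,315 QPS, peak ≈ 6,944 QPS
```

Rounded, these match the ≈11,600 / ≈2,300 average and ≈35,000 / ≈7,000 peak figures in the example.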

Phase 3: High-Level Design

Identify the major components and their interactions:

  • Client Layer: Web, mobile, API consumers
  • Application Layer: Business logic, orchestration
  • Data Layer: Databases, caches, search indices
  • Infrastructure Layer: Load balancers, message queues, CDN

Phase 4: Detailed Design

Deep-dive into each component:

  • Data models and schemas
  • API contracts
  • Database selection and partitioning
  • Caching strategies
  • Communication patterns (sync vs async)

Phase 5: Trade-off Analysis

Document the decisions made and alternatives considered. This is where senior engineers distinguish themselves—not by knowing the "right" answer, but by understanding why a choice was made.

System Design Taxonomy

Systems can be categorized along several dimensions:

Monolithic vs Distributed

[Figure: Monolithic: web server, business logic, and data access layers over a single database. Distributed: Auth, User, Feed, Search, Notify, and Media services, with a database per service.]

Synchronous vs Asynchronous

  • Synchronous: Request-response, tight coupling, simpler to reason about
  • Asynchronous: Event-driven, decoupled, better for scalability and resilience
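To make the contrast concrete, here is a minimal Python sketch that uses `asyncio.Queue` as an in-process stand-in for a message broker (an assumption; a real system would use a durable queue). The synchronous handler makes the caller wait for slow downstream work; the asynchronous handler only enqueues an event and returns, while a separate worker consumes events at its own pace. The function names and the 0.05 s delay are hypothetical.

```python
import asyncio

SLOW_WORK_SECONDS = 0.05  # hypothetical cost of the downstream step (e.g. sending an email)


async def send_notification(user_id: str, sent: list[str]) -> None:
    await asyncio.sleep(SLOW_WORK_SECONDS)  # simulate a slow downstream call
    sent.append(user_id)


async def handle_sync(user_id: str, sent: list[str]) -> str:
    # Synchronous style: the caller waits until the downstream work finishes.
    await send_notification(user_id, sent)
    return "ok"


async def handle_async(user_id: str, queue: asyncio.Queue) -> str:
    # Asynchronous style: publish an event and return immediately.
    await queue.put(user_id)
    return "accepted"


async def worker(queue: asyncio.Queue, sent: list[str]) -> None:
    # The consumer drains events independently of request handling.
    while True:
        user_id = await queue.get()
        try:
            await send_notification(user_id, sent)
        finally:
            queue.task_done()


async def main() -> list[str]:
    sent: list[str] = []
    queue: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(worker(queue, sent))

    print(await handle_sync("alice", sent))  # returns only after the slow work
    print(await handle_async("bob", queue))  # returns before "bob" is notified
    await queue.join()                       # wait until the worker catches up
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return sent


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The asynchronous handler's latency no longer depends on the downstream step, and a burst of requests simply lengthens the queue; the cost is that the caller only learns the work was accepted, not that it succeeded.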

Stateless vs Stateful

Definition: Stateless vs Stateful

A stateless system stores no client-specific state between requests. Each request contains all information needed to process it. A stateful system maintains session state, requiring sticky sessions or external state stores.
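A minimal sketch of the stateless style: the handler keeps nothing in process memory between requests; each request carries a session token, and session data lives in a shared external store. Here a small dict-backed class stands in for that store (an assumption; in practice it would be a shared cache or database), and `SessionStore`, `handle_request`, and the request shape are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Stand-in for an external, shared session store (hypothetical)."""
    data: dict[str, dict] = field(default_factory=dict)

    def get(self, token: str) -> dict:
        return self.data.get(token, {})

    def put(self, token: str, session: dict) -> None:
        self.data[token] = session


def handle_request(request: dict, store: SessionStore) -> dict:
    """Stateless handler: all state comes from the request or the shared store."""
    token = request.get("session_token") or str(uuid.uuid4())
    session = store.get(token)
    cart = session.get("cart", []) + [request["item"]]
    store.put(token, {"cart": cart})
    return {"session_token": token, "cart": cart}


store = SessionStore()
# These two requests could be served by different server instances: the
# handler holds no local state, so any instance sharing the store works.
r1 = handle_request({"item": "book"}, store)
r2 = handle_request({"session_token": r1["session_token"], "item": "pen"}, store)
print(r2["cart"])  # ['book', 'pen']
```

Because no instance owns a session, a load balancer can route each request anywhere without sticky sessions.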

Key Metrics

System design requires understanding and optimizing for specific metrics:

Little's Law (Concurrency)

$$L = \lambda \times W$$

Here,

  • L = Average number of concurrent requests in the system
  • λ = Average arrival rate (requests per second)
  • W = Average time a request spends in the system (seconds)

Applying Little's Law

If your service receives 1000 QPS and each request takes 200ms to process:

L = 1000 × 0.2 = 200 concurrent requests

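This is the average concurrency the service must sustain. If each worker or thread handles one request at a time, you need at least about 200 of them just to keep up on average, and more to absorb bursts without queuing.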
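The same calculation in Python, with one extra knob: because Little's Law gives an average, the sketch multiplies by a headroom factor when sizing a worker pool for bursts. The 1.5x default headroom and the function names are illustrative assumptions, not a standard rule.

```python
import math


def littles_law_concurrency(arrival_rate_per_s: float, avg_time_in_system_s: float) -> float:
    """L = λ × W: average number of requests in the system at once."""
    return arrival_rate_per_s * avg_time_in_system_s


def workers_needed(arrival_rate_per_s: float, avg_time_in_system_s: float,
                   headroom: float = 1.5) -> int:
    """Workers for a one-request-per-worker server, padded by a headroom factor."""
    concurrency = littles_law_concurrency(arrival_rate_per_s, avg_time_in_system_s)
    # round() strips floating-point noise (e.g. 300.00000000000006) before ceil().
    return math.ceil(round(concurrency * headroom, 9))


print(littles_law_concurrency(1000, 0.2))  # 200.0
print(workers_needed(1000, 0.2))           # 300
```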

Amdahl's Law

When you speed up only part of a system, Amdahl's Law gives the overall speedup you can expect:


$$S = \frac{1}{(1 - p) + \frac{p}{s}}$$

Here,

  • S = Theoretical overall speedup of the whole system
  • p = Fraction of execution time that can be parallelized (or otherwise sped up)
  • s = Speedup of the parallelizable portion

Amdahl's Law in Practice

If 75% of your system can be parallelized and you have 10x faster parallel execution:

S = 1 / ((1 - 0.75) + 0.75/10) = 1 / (0.25 + 0.075) = 3.08x

Even with infinite parallelism, the sequential 25% limits speedup to 4x. This is why understanding bottlenecks is critical.
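A small Python version of the formula reproduces both numbers above and shows how quickly the gains flatten as s grows. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of execution time is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


for s in (2, 10, 100, 1_000_000):
    print(f"s={s}: S={amdahl_speedup(0.75, s):.2f}x")
```

With p = 0.75 this prints speedups of 1.60x, 3.08x, 3.88x, and 4.00x: going from 10x to 100x faster parallel execution buys less than one additional unit of overall speedup, because the sequential 25% dominates.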

The CAP Theorem

One of the most fundamental results in distributed systems:

Definition: CAP Theorem

The CAP Theorem (Brewer, 2000; Gilbert & Lynch, 2002) states that a distributed data store can provide at most two of the following three guarantees:

  • Consistency (C): Every read receives the most recent write or an error
  • Availability (A): Every request to a non-failing node receives a (non-error) response
  • Partition Tolerance (P): The system continues to operate even when the network drops or delays messages between nodes

Since network partitions are inevitable in distributed systems, the real choice is between CP and AP systems.

The CAP theorem is often misunderstood. It does not say a system must permanently give up either C or A; the choice between them is forced only during a network partition. Most of the time, when the network is healthy, a system can provide both.
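The following toy model illustrates the choice a replicated store faces during a partition. It is a deliberately simplified Python sketch, not a real replication protocol: two replicas, a `partitioned` flag that blocks replication, and a `mode` that decides what happens when a replica cannot reach its peer. In `"CP"` mode the store refuses requests (giving up availability); in `"AP"` mode it accepts writes locally, so the other replica can serve stale reads. Real CP systems often use majority quorums so the majority side keeps serving; with only two replicas this toy refuses on both sides. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class Unavailable(Exception):
    """Raised when a CP store refuses a request it cannot keep consistent."""


@dataclass
class Replica:
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class TwoReplicaStore:
    mode: str                      # "CP" or "AP"
    a: Replica = field(default_factory=Replica)
    b: Replica = field(default_factory=Replica)
    partitioned: bool = False      # True = replicas cannot reach each other

    def write(self, replica: Replica, key: str, value: str) -> None:
        if self.partitioned and self.mode == "CP":
            # CP: replication is impossible, so reject instead of diverging.
            raise Unavailable("partition: write rejected")
        replica.data[key] = value
        if not self.partitioned:
            # Healthy network: copy the write to the peer before returning.
            peer = self.b if replica is self.a else self.a
            peer.data[key] = value
        # AP during a partition: the write stays local; reconciling the
        # replicas after the partition heals is omitted from this sketch.

    def read(self, replica: Replica, key: str) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            # CP: this replica may have missed writes, so refuse to answer.
            raise Unavailable("partition: read rejected")
        # Healthy network, or AP mode: answer locally, possibly with stale data.
        return replica.data.get(key)


cp = TwoReplicaStore(mode="CP")
cp.write(cp.a, "x", "1")
cp.partitioned = True
try:
    cp.write(cp.a, "x", "2")
except Unavailable as err:
    print("CP:", err)              # CP: partition: write rejected

ap = TwoReplicaStore(mode="AP")
ap.write(ap.a, "x", "1")
ap.partitioned = True
ap.write(ap.a, "x", "2")           # accepted on replica a only
print("AP:", ap.read(ap.b, "x"))   # AP: 1 (stale read from replica b)
```

With `partitioned = False`, both modes behave identically, which is the point made above: the trade-off only bites during a partition.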

Practice Exercises

  1. Conceptual: Explain the difference between scalability and performance. Can a system be performant but not scalable? Give an example.

  2. Estimation: A URL shortener handles 100M new URLs per month and has a 10:1 read-to-write ratio. Estimate the QPS for reads and writes. How much storage is needed for 5 years at 500 bytes per URL record?

  3. Design: Sketch a high-level architecture for a real-time notification system that must deliver messages to 50M users within 500ms. Identify the key components and their responsibilities.

  4. Trade-offs: Compare synchronous and asynchronous architectures for processing payment transactions. What are the trade-offs in terms of consistency, latency, and complexity?

Key Takeaways:

  • System design is the discipline of defining architecture, components, and data flow to satisfy requirements
  • Non-functional requirements (NFRs) are the primary drivers of architectural decisions
  • Design for failure, keep it simple, and make trade-offs explicit
  • Use back-of-the-envelope estimation to quantify the problem before designing the solution
  • Understand fundamental results like Little's Law, Amdahl's Law, and the CAP theorem

What to Learn Next

-> Scalability Fundamentals: Vertical vs horizontal scaling, load balancing, and capacity planning.

-> Networking Fundamentals: TCP/IP, HTTP, DNS, CDNs, and network latency.

-> API Design: REST, GraphQL, gRPC, versioning, and rate limiting.

-> Databases: SQL vs NoSQL, indexing, replication, and sharding.

-> CAP Theorem: Consistency models, availability, and partition tolerance.

-> Load Balancing: Distribution algorithms and L4 vs L7 load balancing.
