🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

SQL Deep Dive

Data SystemsRelational Databases🟢 Free Lesson

Advertisement

Data Systems

SQL Deep Dive

SQL databases remain the backbone of most systems. Master PostgreSQL and MySQL internals, indexing strategies, transaction isolation levels, and scaling patterns.

  • ACID — Strong consistency guarantees for critical data
  • Indexing — B-trees, hash indexes, and query optimization
  • Scaling — Read replicas, connection pooling, and sharding

SQL databases are not legacy—they are battle-tested foundations for reliable systems.

SQL Database Architecture

Understanding the internal architecture of SQL databases is essential for making informed design decisions.

DfRelational Database

A relational database organizes data into tables (relations) with rows (tuples) and columns (attributes). It enforces a predefined schema, supports ACID transactions, and uses SQL for querying. The relational model provides strong data integrity through constraints, foreign keys, and normalization.

PostgreSQL vs MySQL

FeaturePostgreSQLMySQL
ACID ComplianceFullInnoDB only
JSON SupportExcellent (JSONB)Good (JSON)
Full-Text SearchBuilt-inBuilt-in
ReplicationStreaming, logicalAsynchronous, semi-sync
ExtensionsRich ecosystemPlugin-based
PerformanceComplex queriesSimple queries
ConcurrencyMVCCMVCC (InnoDB)
Best ForComplex analytics, geospatialWeb apps, read-heavy

Storage Engine Internals

B-Tree Indexing

DfB-Tree

A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. It is the standard index structure for most SQL databases, optimized for disk I/O with high branching factor.

B-Tree Search Complexity

Tsearch=O(logBN)T_{search} = O(\log_B N)

Here,

  • TsearchT_{search}=Search time complexity
  • BB=Branching factor (typically 100-500)
  • NN=Number of records

B-Tree Height Calculation

For a table with 1 billion rows and branching factor of 500:

Height = log_500(1,000,000,000) ≈ 5 levels

With each node fitting in a 16KB page, the entire index tree for 1B rows is only 5 disk reads deep. This is why B-trees are so efficient for disk-based storage.

Buffer Pool and Page Management

QueryBuffer PoolPage 1Page 2...DiskData FilesWrite-Ahead Log (WAL)Durability guaranteeCheckpointFlush dirty pages to disk

Transaction Isolation Levels

DfTransaction Isolation

Transaction isolation defines how and when changes made by one transaction are visible to other transactions. Higher isolation prevents more anomalies but reduces concurrency. The four standard levels represent a trade-off between correctness and performance.

Isolation LevelPhenomena PreventedPerformance
Read UncommittedDirty readsHighest
Read CommittedDirty readsHigh
Repeatable ReadDirty, non-repeatable readsModerate
SerializableAll anomaliesLowest

Isolation-Performance Trade-off

Throughput1Isolation LevelThroughput \propto \frac{1}{Isolation\ Level}

Here,

  • ThroughputThroughput=Transactions per second
  • IsolationLevelIsolation Level=Strictness of isolation (higher = more restrictive)

The default isolation level in PostgreSQL is Read Committed, while MySQL's InnoDB uses Repeatable Read. This can cause subtle bugs when migrating between databases. Always explicitly set your isolation level.

Indexing Strategies

When to Create an Index

Access PatternIndex TypeExample
Equality lookupB-treeWHERE id = 123
Range queryB-treeWHERE date > '2024-01-01'
Full-text searchGIN/GiSTWHERE content ILIKE '%keyword%'
GeospatialGiSTWHERE ST_DWithin(loc, point, 1000)
Array containsGINWHERE tags @> ARRAY['sql']
JSON fieldGINWHERE metadata @> '{"key": "value"}'

Composite Indexes

DfComposite Index

A composite index (multi-column index) covers multiple columns in a single index. The order of columns matters—queries must use the leftmost columns to benefit from the index. This is known as the leftmost prefix rule.

Composite Index Design

Given: CREATE INDEX idx_user_date ON orders(user_id, order_date);

This index supports:

  • WHERE user_id = 100 (uses index)
  • WHERE user_id = 100 AND order_date > '2024-01-01' (uses index)
  • WHERE order_date > '2024-01-01' (does NOT use index—skips leftmost column)

Query Optimization

EXPLAIN ANALYZE

The most important tool for query optimization:

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
GROUP BY u.name
ORDER BY COUNT(o.id) DESC
LIMIT 10;

Common Performance Anti-Patterns

Anti-PatternProblemSolution
SELECT *Fetches unnecessary columnsSelect only needed columns
N+1 queriesRepeated queries in loopUse JOIN or batch loading
Missing indexesFull table scansAdd appropriate indexes
Over-indexingSlow writes, storage wasteRemove unused indexes
Implicit type conversionPrevents index usageMatch types explicitly

Always run EXPLAIN ANALYZE on slow queries. The output shows the actual execution plan, including which indexes are used, how many rows are scanned, and where the time is spent.

Scaling SQL Databases

Read Replicas

DfRead Replica

A read replica is a copy of the primary database that accepts read queries. Writes go to the primary, then replicate to replicas. This distributes read load across multiple servers while maintaining a single source of truth for writes.

Connection Pooling

Connection Pool Sizing

Poolsize=QPS×LatencyqueryCPUcoresPool_{size} = \frac{QPS \times Latency_{query}}{CPU_{cores}}

Here,

  • PoolsizePool_{size}=Recommended connection pool size
  • QPSQPS=Queries per second
  • LatencyqueryLatency_{query}=Average query latency in seconds
  • CPUcoresCPU_{cores}=CPU cores on the database server

Connection Pool Calculation

For a database handling 5000 QPS with 10ms average query latency and 16 CPU cores:

Pool_size = (5000 × 0.01) / 16 ≈ 3 connections

This seems low, but it's correct—each connection can handle many queries per second. Over-provisioning connections causes context switching overhead.

Practice Exercises

  1. Index Design: Given a users table with 100M rows and these query patterns, design the optimal indexes:

    • Lookup by email (unique)
    • Find users by city and age range
    • Search users by name (partial match)
    • Get recent users by creation date
  2. Query Optimization: Rewrite this query for better performance:

    SELECT * FROM orders 
    WHERE YEAR(created_at) = 2024 
    AND status IN ('pending', 'processing')
    ORDER BY created_at DESC;
    
  3. Scaling Decision: Your PostgreSQL database handles 10K QPS reads and 1K QPS writes. The read latency is 50ms P99. Design a scaling strategy that maintains <100ms P99 read latency while supporting 2x growth.

  4. Transaction Design: Design the transaction boundaries for a banking transfer operation. Consider isolation levels, deadlock prevention, and recovery from failures.

Key Takeaways:

  • PostgreSQL excels at complex queries; MySQL at simple reads
  • B-tree indexes provide O(log N) lookups with high branching factor
  • Transaction isolation levels trade consistency for concurrency
  • Composite indexes follow the leftmost prefix rule
  • Read replicas distribute read load; connection pooling manages connections
  • Always use EXPLAIN ANALYZE to diagnose slow queries

What to Learn Next

-> NoSQL Deep Dive Document, key-value, column-family, and graph databases.

-> PostgreSQL Deep Dive Advanced PostgreSQL features, extensions, and optimization.

-> Database Indexing B-trees, hash indexes, and indexing strategies.

-> Data Replication Sync vs async replication, leader election, and consistency.

-> Choosing the Right Database Systematic framework for database selection.

-> Data Partitioning Sharding strategies, consistent hashing, and partition keys.

Premium Content

SQL Deep Dive

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
💼Interview Prep
📜Certificates
🤝Community Access

Already a member? Log in

Need Expert System Design Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement