Data Systems
PostgreSQL Deep Dive
PostgreSQL is the most advanced open-source relational database. Master its advanced features: JSONB, partitioning, full-text search, CTEs, window functions, and extension ecosystem.
- Extensibility β Custom types, functions, and extensions
- Standards Compliance β Full SQL standard support
- Advanced Features β Window functions, CTEs, JSONB, geospatial
PostgreSQL is the database that grows with you.
PostgreSQL Architecture
DfPostgreSQL
PostgreSQL is an object-relational database management system with a focus on extensibility, standards compliance, and data integrity. It supports advanced SQL features (CTEs, window functions, lateral joins), multiple indexing methods (B-tree, Hash, GiST, SP-GiST, GIN, BRIN), and a rich extension ecosystem (PostGIS, TimescaleDB, pgvector).
Process Architecture
Advanced SQL Features
Window Functions
DfWindow Functions
Window functions perform calculations across a set of rows related to the current row, without collapsing them (unlike GROUP BY). They enable running totals, rankings, moving averages, and other analytics in a single query.
| Function | Purpose | Example |
|---|---|---|
| ROW_NUMBER() | Sequential numbering | Rank rows within a partition |
| RANK() | Ranking with gaps | Ties get same rank |
| DENSE_RANK() | Ranking without gaps | No gaps after ties |
| LAG/LEAD() | Access previous/next rows | Compare with adjacent rows |
| SUM/AVG() OVER | Running aggregates | Running total, moving average |
| NTILE() | Divide into buckets | Quartiles, deciles |
Window Functions for Analytics
-- Top 3 products per category by revenue
WITH ranked AS (
SELECT
category,
product_name,
revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM products
)
SELECT * FROM ranked WHERE rank <= 3;
-- Month-over-month revenue growth
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) as growth,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0 /
LAG(revenue) OVER (ORDER BY month), 2
) as growth_pct
FROM monthly_revenue;
JSONB
DfJSONB
JSONB (Binary JSON) is PostgreSQL's binary JSON data type. It stores JSON documents in a decomposed binary format that supports indexing, efficient access, and operators for containment, existence, and path queries. JSONB enables document-like flexibility within a relational database.
| Operator | Description | Example |
|---|---|---|
| -> | Get JSON field | data->'name' |
| ->> | Get JSON field as text | data->>'name' |
| #> | Get JSON path | data#>'{address,city}' |
| @> | Contains | data @> '{"status":"active"}' |
| ? | Key exists | data ? 'name' |
| jsonb_path_query | SQL/JSON path | jsonb_path_query(data, '$.orders[*].price') |
JSONB for Flexible Data
-- Create table with JSONB column
CREATE TABLE events (
id SERIAL PRIMARY KEY,
event_type TEXT,
payload JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Insert event with flexible payload
INSERT INTO events (event_type, payload) VALUES
('user_signup', '{"source": "web", "plan": "pro"}'),
('purchase', '{"item": "laptop", "price": 999}');
-- Query with JSON operators
SELECT * FROM events
WHERE payload @> '{"plan": "pro"}';
-- Index for fast JSON queries
CREATE INDEX idx_events_payload ON events USING GIN (payload);
Table Partitioning
DfTable Partitioning
Table partitioning splits a large table into smaller, more manageable pieces (partitions). PostgreSQL supports range, list, and hash partitioning. Queries that filter on the partition key automatically prune irrelevant partitions, improving performance.
| Partition Type | Strategy | Use Case |
|---|---|---|
| Range | Value ranges (e.g., dates) | Time-series data |
| List | Discrete values (e.g., regions) | Multi-tenant systems |
| Hash | Hash of partition key | Even distribution |
Range Partitioning for Time-Series
-- Create partitioned table
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INT,
value DOUBLE PRECISION
) PARTITION BY RANGE (time);
-- Create partitions
CREATE TABLE sensor_data_2024_01 PARTITION OF sensor_data
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_data_2024_02 PARTITION OF sensor_data
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Query automatically prunes partitions
EXPLAIN ANALYZE
SELECT * FROM sensor_data
WHERE time > '2024-01-15' AND time < '2024-01-20';
-- Only scans the January partition
Extension Ecosystem
| Extension | Purpose |
|---|---|
| PostGIS | Geospatial data and queries |
| TimescaleDB | Time-series data optimization |
| pgvector | Vector similarity search (AI/ML) |
| pg_trgm | Trigram similarity for fuzzy search |
| uuid-ossp | UUID generation |
| pg_stat_statements | Query performance analysis |
Practice Exercises
-
Window Functions: Write a SQL query to calculate the 7-day moving average of daily revenue, including day-over-day percentage change.
-
JSONB Design: Design a PostgreSQL schema for an e-commerce order system using JSONB for flexible product attributes. What indexes would you create?
-
Partitioning Strategy: Design the partitioning strategy for a logging table that receives 10M rows per day and needs to retain 1 year of data.
-
Performance Analysis: Given a slow query, write the steps you would take to diagnose and optimize it using EXPLAIN ANALYZE, pg_stat_statements, and index analysis.
Key Takeaways:
- PostgreSQL supports advanced SQL: window functions, CTEs, lateral joins
- JSONB enables document-like flexibility within a relational database
- Table partitioning (range, list, hash) improves query performance for large tables
- The extension ecosystem adds capabilities: PostGIS, TimescaleDB, pgvector
- Window functions perform analytics without collapsing rows
- Choose indexing strategy based on query patterns (B-tree, GIN, GiST)
What to Learn Next
-> SQL Deep Dive PostgreSQL, MySQL, indexing strategies, and query optimization.
-> MongoDB Deep Dive Advanced MongoDB features, aggregation pipeline, and sharding.
-> Spanner and CockroachDB Deep dive into specific NewSQL implementations.
-> Database Indexing B-trees, hash indexes, and indexing strategies.
-> Data Partitioning Sharding strategies, consistent hashing, and partition keys.
-> Data Replication Sync vs async replication, leader election, and consistency.