Interview Question: "When would you choose a BRIN index over a B-Tree? Explain covering indexes and index-only scans. How do partial indexes improve performance?" — Asked at Amazon, Pinterest, LinkedIn for Senior DBA roles
ℹ️
Difficulty: Advanced | Companies: Amazon, Pinterest, LinkedIn, Uber, Netflix | Time: 45-60 minutes
B-Tree Index Structure
B-Tree maintains sorted data with balanced tree properties:
Where is branching factor and is number of keys.
-- Create table for indexing experiments
CREATE TABLE product_sales (
sale_id SERIAL PRIMARY KEY,
product_id INT,
category VARCHAR(50),
sale_date DATE,
quantity INT,
unit_price DECIMAL(10,2),
total_amount DECIMAL(12,2),
region VARCHAR(20),
status VARCHAR(20)
);
-- Generate 10M rows
INSERT INTO product_sales (product_id, category, sale_date, quantity, unit_price, total_amount, region, status)
SELECT
(random() * 10000)::INT + 1,
(ARRAY['Electronics', 'Clothing', 'Food', 'Books', 'Home'])[floor(random()*5+1)],
CURRENT_DATE - (random() * 1000)::INT,
(random() * 100)::INT + 1,
(random() * 500)::DECIMAL(10,2),
(random() * 50000)::DECIMAL(12,2),
(ARRAY['North', 'South', 'East', 'West'])[floor(random()*4+1)],
(ARRAY['completed', 'pending', 'refunded'])[floor(random()*3+1)]
FROM generate_series(1, 10000000);
B-Tree Index Operations
-- Standard B-Tree index
CREATE INDEX idx_product_sales_product ON product_sales(product_id);
-- Composite B-Tree index
CREATE INDEX idx_product_sales_composite ON product_sales(category, sale_date, total_amount);
-- Check index structure
SELECT
indexname,
indexdef,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'product_sales';
-- Analyze index usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM product_sales
WHERE product_id = 1234;
-- Range scan performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM product_sales
WHERE category = 'Electronics'
AND sale_date BETWEEN '2024-01-01' AND '2024-12-31';
B-Tree Complexity:
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Search | ||
| Insert | ||
| Delete | ||
| Range Scan |
ℹ️
Branch Factor: PostgreSQL B-Tree has ~400 keys per page with 8KB pages. Height of 3 can store ~64M keys.
Hash Index
Hash indexes provide lookup for equality comparisons:
-- Hash index for equality lookups
CREATE INDEX idx_product_sales_hash_region ON product_sales USING hash(region);
-- Compare B-Tree vs Hash for equality
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM product_sales WHERE region = 'North';
-- Hash index only works with equality operators
-- This will NOT use hash index:
EXPLAIN SELECT * FROM product_sales WHERE region > 'North';
Hash vs B-Tree Comparison:
| Feature | Hash | B-Tree |
|---|---|---|
| Equality | ||
| Range | Not supported | |
| Order | Not maintained | Maintained |
| Memory | Higher | Lower |
| Write | Faster | Slower |
GiST Index for Complex Types
GiST (Generalized Search Tree) supports complex data types:
-- GiST index for range types
CREATE TABLE reservations (
reservation_id SERIAL PRIMARY KEY,
room_id INT,
during TSRANGE,
guest_name VARCHAR(100)
);
-- Create GiST index on range column
CREATE INDEX idx_reservations_during ON reservations USING gist(during);
-- Query for overlapping ranges
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM reservations
WHERE during @> '2024-06-15'::date;
-- Find overlapping reservations
EXPLAIN (ANALYZE, BUFFERS)
SELECT r1.*, r2.*
FROM reservations r1
JOIN reservations r2 ON r1.during && r2.during
WHERE r1.reservation_id < r2.reservation_id;
GiST Operator Classes:
| Type | Operators | Use Case |
|---|---|---|
| Range | @>, <@, &&, <<, >> | Temporal queries |
| Geometric | @>, <@, &&, ~, @ | Spatial queries |
| Full-text | @@, @> | Text search |
| Array | @>, <@ | Array containment |
BRIN Index for Large Tables
BRIN (Block Range Index) stores min/max per block range:
-- BRIN index for time-series data
CREATE TABLE sensor_data (
sensor_id INT,
reading_time TIMESTAMPTZ,
value DECIMAL(10,4)
);
-- Generate time-series data (correlated with insertion order)
INSERT INTO sensor_data
SELECT
(random() * 100)::INT + 1,
NOW() - (random() * 365 * 24 * 60 * 60)::INT * INTERVAL '1 second',
(random() * 100)::DECIMAL(10,4)
FROM generate_series(1, 10000000);
-- B-Tree index (large)
CREATE INDEX idx_sensor_btreen ON sensor_data(reading_time);
-- Size: ~200MB
-- BRIN index (small)
CREATE INDEX idx_sensor_brin ON sensor_data USING brin(reading_time)
WITH (pages_per_range = 32);
-- Size: ~100KB
-- Compare performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sensor_data
WHERE reading_time BETWEEN '2024-06-01' AND '2024-06-30';
BRIN vs B-Tree Size Comparison:
| Rows | B-Tree Size | BRIN Size | Ratio |
|---|---|---|---|
| 1M | 21MB | 24KB | 875x |
| 10M | 210MB | 240KB | 875x |
| 100M | 2.1GB | 2.4MB | 875x |
⚠️
BRIN Prerequisite: Data must be physically correlated with the indexed column. Random data makes BRIN ineffective.
Partial Index
Partial indexes reduce size by indexing only subset of rows:
-- Partial index for active orders only
CREATE INDEX idx_orders_active ON orders(order_date, total_amount)
WHERE status = 'pending';
-- Partial index with expression
CREATE INDEX idx_orders_high_value ON orders(total_amount)
WHERE total_amount > 10000;
-- Compare query plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'pending' AND order_date > '2024-01-01';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE status = 'shipped' AND order_date > '2024-01-01';
-- Check partial index size
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'orders'
AND indexname LIKE 'idx_orders_%';
Partial Index Selectivity:
Covering Index (INCLUDE)
Covering indexes include additional columns to enable index-only scans:
-- Covering index with INCLUDE
CREATE INDEX idx_orders_covering ON orders(customer_id, order_date)
INCLUDE (total_amount, status);
-- Query can be satisfied entirely from index
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 1234
AND order_date > '2024-01-01';
-- Compare with non-covering index
CREATE INDEX idx_orders_non_covering ON orders(customer_id, order_date);
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, order_date, total_amount, status
FROM orders
WHERE customer_id = 1234
AND order_date > '2024-01-01';
Index-Only Scan Benefits:
| Metric | Table Scan | Index Scan | Index-Only Scan |
|---|---|---|---|
| I/O | High | Medium | Low |
| CPU | Low | Medium | Medium |
| Memory | Low | Medium | High |
ℹ️
Covering Index Formula: For query with columns and , optimal covering index is:
Expression Index
-- Index on expression
CREATE INDEX idx_orders_upper_status ON orders(UPPER(status));
-- Index on function result
CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM order_date));
-- Check if expression index is used
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE UPPER(status) = 'PENDING';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024;
Multi-Column Index Ordering
The order of columns in composite index matters:
-- Optimal column ordering for selective predicates
CREATE INDEX idx_product_category_date ON product_sales(category, sale_date);
-- This uses the index efficiently
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM product_sales
WHERE category = 'Electronics' AND sale_date > '2024-01-01';
-- This only uses category prefix
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM product_sales
WHERE category = 'Electronics';
-- This cannot use the index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM product_sales
WHERE sale_date > '2024-01-01';
Index Maintenance
-- Monitor index bloat
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexname::regclass) -
(SELECT pg_relation_size(indexname::regclass) * 0.3)) AS bloat_estimate
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexname::regclass) DESC;
-- Rebuild bloated indexes
REINDEX INDEX idx_product_sales_composite;
-- Check index fragmentation
SELECT
indexname,
round(100 * pg_relation_size(indexname::regclass) /
nullif(pg_relation_size('product_sales'::regclass), 0), 2) AS index_ratio
FROM pg_indexes
WHERE tablename = 'product_sales';
Index Selection Algorithm
The optimizer evaluates indexes using:
Decision Matrix:
| Query Pattern | Recommended Index | Avoid |
|---|---|---|
| Equality on 1 column | B-Tree or Hash | GiST |
| Range on 1 column | B-Tree | Hash |
| Equality + Range | Composite B-Tree | Separate indexes |
| Full-text search | GiST with tsvector | B-Tree |
| Large time-series | BRIN | B-Tree |
| Low-selectivity filter | Partial index | Full index |
⚠️
Index Anti-Patterns:
- Over-indexing: Too many indexes slow down writes
- Wrong column order in composite indexes
- Not using partial indexes for filtered queries
- Ignoring index bloat and maintenance
Advanced: Index-Only Scan with Visibility Map
-- Check visibility map status
SELECT
relname,
pg_size_pretty(pg_relation_size(relname::regclass)) AS table_size,
pg_size_pretty(pg_total_relation_size(relname::regclass)) AS total_size,
CASE
WHEN pg_relation_size(relname::regclass) > 1073741824 THEN 'Consider BRIN'
WHEN pg_relation_size(relname::regclass) > 104857600 THEN 'Monitor bloat'
ELSE 'OK'
END AS recommendation
FROM pg_class
WHERE relkind = 'r'
AND relnamespace = 'public'::regnamespace
ORDER BY pg_relation_size(relname::regclass) DESC
LIMIT 10;
ℹ️
Pro Tip: Use pg_stat_user_indexes to find unused indexes. Drop them to improve write performance.