The Interview Question
"Given a table of user logins, write a query to find the second most active user by login count in each country."
SQL live coding tests whether you can think and write SQL under pressure β a critical skill for data scientists who spend significant time querying databases.
Why Companies Ask This
βΉοΈ
Meta and Amazon use SQL as the primary tool for data analysis. They need to verify you can write correct, efficient, and readable SQL β not just in theory, but in real-time with an interviewer watching.
Interviewers evaluate:
- SQL Proficiency β Can you write complex queries correctly?
- Problem Decomposition β Can you break a problem into logical steps?
- Optimization Thinking β Do you consider performance?
- Communication β Can you explain your approach as you code?
- Edge Case Awareness β Do you handle NULLs, ties, and edge cases?
Common SQL Problem Categories
1. Window Functions (Most Common)
2. Joins and Subqueries
3. Aggregation and Grouping
4. String and Date Manipulation
5. Complex Business Logic
Example Problem: Second Most Active User per Country
Step 1: Clarify the Problem
"Before I start, let me clarify: What should we do if there's a tie for second place? Should we return all tied users, or just one? Also, are we looking at total login count across all time, or a specific time period?"
Step 2: Think Through the Approach
-- Approach:
-- 1. Count logins per user per country
-- 2. Rank users within each country by login count
-- 3. Filter for rank = 2
-- Key decision: Use DENSE_RANK() to handle ties properly
-- DENSE_RANK: 1, 2, 2, 3 (no gaps for ties)
-- RANK: 1, 2, 2, 4 (gaps after ties)
-- ROW_NUMBER: 1, 2, 3, 4 (arbitrary tie-breaking)
Step 3: Write the Query
WITH user_login_counts AS (
SELECT
country,
user_id,
COUNT(*) AS login_count
FROM user_logins
GROUP BY country, user_id
),
ranked_users AS (
SELECT
country,
user_id,
login_count,
DENSE_RANK() OVER (
PARTITION BY country
ORDER BY login_count DESC
) AS rank_in_country
FROM user_login_counts
)
SELECT
country,
user_id,
login_count
FROM ranked_users
WHERE rank_in_country = 2
ORDER BY country, login_count DESC;
Step 4: Explain Your Thinking
"I used a CTE to first aggregate login counts per user per country, then applied DENSE_RANK to rank users within each country. I chose DENSE_RANK over RANK or ROW_NUMBER because:
- If two users tie for second, DENSE_RANK returns both (which is typically what the business wants)
- RANK would skip to rank 4 after a tie at rank 2
- ROW_NUMBER would arbitrarily pick one user"
Advanced SQL Patterns
Pattern 1: Running Totals and Moving Averages
-- Calculate 7-day moving average of daily active users
WITH daily_metrics AS (
SELECT
date,
COUNT(DISTINCT user_id) AS dau
FROM user_activity
GROUP BY date
)
SELECT
date,
dau,
AVG(dau) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d,
SUM(dau) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_users
FROM daily_metrics
ORDER BY date;
Pattern 2: Sessionization
-- Group consecutive events into sessions (30-min gap = new session)
WITH events_with_lag AS (
SELECT
user_id,
event_time,
LAG(event_time) OVER (
PARTITION BY user_id
ORDER BY event_time
) AS prev_event_time
FROM user_events
),
session_boundaries AS (
SELECT
user_id,
event_time,
CASE
WHEN prev_event_time IS NULL
OR event_time - prev_event_time > INTERVAL '30 minutes'
THEN 1
ELSE 0
END AS new_session
FROM events_with_lag
),
sessions AS (
SELECT
user_id,
event_time,
SUM(new_session) OVER (
PARTITION BY user_id
ORDER BY event_time
) AS session_id
FROM session_boundaries
)
SELECT
user_id,
session_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
COUNT(*) AS events_in_session,
EXTRACT(EPOCH FROM (
MAX(event_time) - MIN(event_time)
)) / 60 AS session_duration_minutes
FROM sessions
GROUP BY user_id, session_id;
Pattern 3: Cohort Analysis
-- Monthly retention cohort analysis
WITH user_first_activity AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(activity_date)) AS cohort_month
FROM user_activity
GROUP BY user_id
),
monthly_activity AS (
SELECT
user_id,
DATE_TRUNC('month', activity_date) AS activity_month
FROM user_activity
GROUP BY user_id, DATE_TRUNC('month', activity_date)
),
cohort_activity AS (
SELECT
ufa.cohort_month,
EXTRACT(MONTH FROM AGE(
ma.activity_month, ufa.cohort_month
)) AS months_since_signup,
COUNT(DISTINCT ma.user_id) AS active_users
FROM user_first_activity ufa
JOIN monthly_activity ma ON ufa.user_id = ma.user_id
GROUP BY ufa.cohort_month, months_since_signup
)
SELECT
cohort_month,
MAX(CASE WHEN months_since_signup = 0 THEN active_users END) AS month_0,
MAX(CASE WHEN months_since_signup = 1 THEN active_users END) AS month_1,
MAX(CASE WHEN months_since_signup = 2 THEN active_users END) AS month_2,
MAX(CASE WHEN months_since_signup = 3 THEN active_users END) AS month_3,
ROUND(
MAX(CASE WHEN months_since_signup = 1 THEN active_users END)::FLOAT /
MAX(CASE WHEN months_since_signup = 0 THEN active_users END),
3
) AS retention_rate_m1
FROM cohort_activity
GROUP BY cohort_month
ORDER BY cohort_month;
Pattern 4: Gap and Islands (Consecutive Days)
-- Find consecutive days where a user was active
WITH daily_active AS (
SELECT DISTINCT
user_id,
activity_date
FROM user_activity
),
dated_groups AS (
SELECT
user_id,
activity_date,
activity_date - INTERVAL '1 day' * (
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY activity_date
)
) AS group_date
FROM daily_active
)
SELECT
user_id,
MIN(activity_date) AS streak_start,
MAX(activity_date) AS streak_end,
COUNT(*) AS streak_length
FROM dated_groups
GROUP BY user_id, group_date
HAVING COUNT(*) >= 7 -- At least 7 consecutive days
ORDER BY streak_length DESC;
Performance Optimization Tips
π‘
Meta and Amazon care about query performance because their tables have billions of rows. A slow query isn't just inconvenient β it's expensive.
1. Filter Early
-- BAD: Aggregates all data, then filters
SELECT user_id, COUNT(*)
FROM events
GROUP BY user_id
HAVING COUNT(*) > 100;
-- GOOD: Filters partition first, then aggregates
SELECT user_id, COUNT(*)
FROM events
WHERE date >= '2026-01-01' -- Filter partition first
GROUP BY user_id
HAVING COUNT(*) > 100;
2. Use Approximate Functions for Large Data
-- For billion-row tables, exact counts are expensive
-- Use HyperLogLog for approximate distinct counts
SELECT
date,
approx_count_distinct(user_id) AS approx_dau
FROM events
GROUP BY date;
-- Use T-Digest for approximate percentiles
SELECT
date,
approx_percentile(duration_ms, 0.95) AS p95_duration
FROM events
GROUP BY date;
3. Avoid Cartesian Products
-- BAD: Implicit cross join
SELECT *
FROM orders, customers;
-- GOOD: Explicit join with condition
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id;
4. Use CTEs for Readability (But Watch Performance)
-- CTEs are readable but may not be optimized in all databases
-- In PostgreSQL, CTEs are optimization fences (materialized)
-- In some databases, CTEs are inlined
-- If performance matters, consider:
-- 1. Temporary tables for large intermediate results
-- 2. Inline subqueries for simple cases
-- 3. CTEs for readability when performance isn't critical
Common Mistakes in SQL Interviews
1. Not Handling NULLs
-- NULLs break aggregations and comparisons
-- Always consider: COUNT(*) vs COUNT(column), NULL comparisons
-- BAD: This returns NULL, not 0
SELECT user_id, COUNT(DISTINCT order_id)
FROM orders
GROUP BY user_id;
-- GOOD: COALESCE to handle NULLs
SELECT
user_id,
COALESCE(COUNT(DISTINCT order_id), 0) AS order_count
FROM orders
GROUP BY user_id;
2. Wrong Join Type
-- INNER JOIN excludes users with no orders
-- LEFT JOIN includes all users
-- BAD: Only returns users who have orders
SELECT u.user_id, COUNT(o.order_id)
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id;
-- GOOD: Returns all users, even those with no orders
SELECT
u.user_id,
COALESCE(COUNT(o.order_id), 0) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id;
3. Not Considering Data Types
-- Integer division truncates
SELECT 5 / 2; -- Returns 2, not 2.5
-- GOOD: Cast to float
SELECT 5.0 / 2; -- Returns 2.5
SELECT CAST(5 AS FLOAT) / 2; -- Returns 2.5
Interview Strategy
Communication Tips
- Read the problem aloud β Show you understand it
- Ask clarifying questions β Shows thoroughness
- Explain your approach before coding β Shows structured thinking
- Talk through your code as you write it β Shows communication skills
- Test your query mentally β Shows attention to detail
- Discuss optimizations β Shows performance awareness
Time Management
- Minutes 1-2: Understand the problem
- Minutes 3-4: Design your approach
- Minutes 5-12: Write the query
- Minutes 13-15: Review and optimize