SQL Aggregation

GROUP BY Clause

Collapse thousands of rows into meaningful summaries with grouping and aggregation.

Grouping — bucket rows by one or more columns
Aggregation — compute COUNT, SUM, AVG, MIN, MAX per group
Filtering Groups — use HAVING to filter after aggregation Turn detail into insight.

What Is GROUP BY?

DfGROUP BY

The GROUP BY clause divides a result set into groups based on the values in one or more columns. Aggregate functions (COUNT, SUM, AVG, MIN, MAX) are then applied to each group, producing one output row per group. It is the foundation of data aggregation in SQL.

SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;

Any column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause. Most databases enforce this rule strictly.

Syntax

SELECT
    column1,
    aggregate_function(column2)
FROM table_name
GROUP BY column1;

Element	Purpose
`GROUP BY`	Defines the grouping column(s)
`SELECT`	Columns to display — must be aggregated or in GROUP BY
Aggregate function	COUNT, SUM, AVG, MIN, MAX, etc.

Basic Example

-- Count employees and average salary per department
SELECT
    department,
    COUNT(*) AS employee_count,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

Single Column Grouping

-- Total salary expenditure per department
SELECT
    department,
    COUNT(*) AS headcount,
    SUM(salary) AS total_salary,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM employees
GROUP BY department;

Department	Headcount	Total Salary	Min
Engineering	45	$4,500,000 \|$ 60,000	$150,000
Marketing	20	$1,600,000 \|$ 45,000	$110,000
Sales	35	$2,450,000 \|$ 40,000	$120,000

Multiple Column Grouping

-- Breakdown by department and job title
SELECT
    department,
    job_title,
    COUNT(*) AS employee_count,
    AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, avg_salary DESC;

-- Monthly revenue by product
SELECT
    YEAR(order_date) AS order_year,
    MONTH(order_date) AS order_month,
    product_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date), product_id
ORDER BY order_year, order_month, total_revenue DESC;

WHERE vs HAVING

Clause	Filters	Timing	Can Use Aggregates
`WHERE`	Individual rows	Before GROUP BY	No
`HAVING`	Groups	After GROUP BY	Yes

Query Execution Order

-- WHERE filters rows first, then GROUP BY groups, then HAVING filters groups
SELECT
    department,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
WHERE status = 'Active'            -- Step 1: filter rows
GROUP BY department                 -- Step 2: group remaining rows
HAVING AVG(salary) > 75000         -- Step 3: filter groups
ORDER BY avg_salary DESC;          -- Step 4: sort

Examples

-- WHERE: filter before grouping (only active employees)
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE status = 'Active'
GROUP BY department;

-- HAVING: filter after grouping (departments with more than 5 people)
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

-- Combined: active employees, grouped, then filter high-paying departments
SELECT
    department,
    COUNT(*) AS emp_count,
    AVG(salary) AS avg_salary
FROM employees
WHERE status = 'Active'
GROUP BY department
HAVING AVG(salary) > 75000
ORDER BY avg_salary DESC;

You cannot use aggregate functions in a WHERE clause. WHERE COUNT(*) > 5 is invalid — use HAVING instead. Conversely, you cannot use column aliases in HAVING — repeat the full expression.

GROUP BY with NULLs

NULL values form their own group in GROUP BY.

-- NULLs are grouped together
SELECT
    department_id,
    COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
ORDER BY department_id;

-- Replace NULLs with a default value for cleaner output
SELECT
    COALESCE(department_id, 0) AS department_id,
    COUNT(*) AS employee_count
FROM employees
GROUP BY COALESCE(department_id, 0);

GROUP BY with ROLLUP and CUBE

-- ROLLUP: adds subtotals and a grand total
SELECT
    department,
    job_title,
    COUNT(*) AS employee_count
FROM employees
GROUP BY ROLLUP (department, job_title);

-- CUBE: all possible subtotals (not supported in MySQL)
SELECT
    department,
    job_title,
    COUNT(*) AS employee_count
FROM employees
GROUP BY CUBE (department, job_title);

Extension	Subtotals Generated
`GROUP BY`	Individual groups only
`ROLLUP`	Hierarchical subtotals + grand total
`CUBE`	All possible subtotal combinations + grand total

GROUP BY with JOINs

-- Aggregate order data joined with customer info
SELECT
    c.first_name,
    c.last_name,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) > 5
ORDER BY total_spent DESC;

Performance Considerations

Factor	Recommendation
Indexes	Create indexes on GROUP BY columns
Sorting	GROUP BY may require implicit sorting
Memory	Large groups can exceed memory limits
Statistics	Keep table statistics current
Execution plan	Use EXPLAIN to verify index usage

-- Index recommendations for common GROUP BY patterns
CREATE INDEX idx_employees_department ON employees(department);
CREATE INDEX idx_employees_dept_job ON employees(department, job_title);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_product_date ON orders(product_id, order_date);

Practice Exercises

Exercise 1: Count the number of employees in each department.

-- Solution
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
ORDER BY employee_count DESC;

Exercise 2: Find the average salary per department for departments with more than 10 employees.

-- Solution
SELECT
    department,
    COUNT(*) AS employee_count,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 10
ORDER BY avg_salary DESC;

Exercise 3: Generate a monthly sales report for 2024 with order count, total revenue, and average order value.

-- Solution
SELECT
    MONTH(order_date) AS month_number,
    MONTHNAME(order_date) AS month_name,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_revenue,
    ROUND(AVG(total_amount), 2) AS avg_order_value
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY MONTH(order_date), MONTHNAME(order_date)
ORDER BY month_number;

Key Takeaways

GROUP BY collapses rows into groups based on column values.
Aggregate functions (COUNT, SUM, AVG, MIN, MAX) operate on each group.
WHERE filters individual rows before grouping; HAVING filters groups after aggregation.
Any non-aggregated column in SELECT must appear in GROUP BY.
NULL values form their own group automatically.
Use ROLLUP or CUBE for subtotals and grand totals.
Always index GROUP BY columns for optimal performance.

GROUP BY Clause

GROUP BY Clause

What Is GROUP BY?

DfGROUP BY

Syntax

Basic Example

Single Column Grouping

Multiple Column Grouping

WHERE vs HAVING

Query Execution Order

Examples

GROUP BY with NULLs

GROUP BY with ROLLUP and CUBE

GROUP BY with JOINs

Performance Considerations

Practice Exercises

Key Takeaways

See Also

Premium Content

Need Expert SQL Help?