GROUP BY Clause
Collapse thousands of rows into meaningful summaries with grouping and aggregation.
- Grouping β bucket rows by one or more columns
- Aggregation β compute COUNT, SUM, AVG, MIN, MAX per group
- Filtering Groups β use HAVING to filter after aggregation Turn detail into insight.
What Is GROUP BY?
DfGROUP BY
The GROUP BY clause divides a result set into groups based on the values in one or more columns. Aggregate functions (COUNT, SUM, AVG, MIN, MAX) are then applied to each group, producing one output row per group. It is the foundation of data aggregation in SQL.
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
Any column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause. Most databases enforce this rule strictly.
Syntax
SELECT
column1,
aggregate_function(column2)
FROM table_name
GROUP BY column1;
| Element | Purpose |
|---|---|
GROUP BY | Defines the grouping column(s) |
SELECT | Columns to display β must be aggregated or in GROUP BY |
| Aggregate function | COUNT, SUM, AVG, MIN, MAX, etc. |
Basic Example
-- Count employees and average salary per department
SELECT
department,
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
Single Column Grouping
-- Total salary expenditure per department
SELECT
department,
COUNT(*) AS headcount,
SUM(salary) AS total_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees
GROUP BY department;
| Department | Headcount | Total Salary | Min | Max |
|---|---|---|---|---|
| Engineering | 45 | 60,000 | $150,000 | |
| Marketing | 20 | 45,000 | $110,000 | |
| Sales | 35 | 40,000 | $120,000 |
Multiple Column Grouping
-- Breakdown by department and job title
SELECT
department,
job_title,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, avg_salary DESC;
-- Monthly revenue by product
SELECT
YEAR(order_date) AS order_year,
MONTH(order_date) AS order_month,
product_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date), product_id
ORDER BY order_year, order_month, total_revenue DESC;
WHERE vs HAVING
| Clause | Filters | Timing | Can Use Aggregates |
|---|---|---|---|
WHERE | Individual rows | Before GROUP BY | No |
HAVING | Groups | After GROUP BY | Yes |
Query Execution Order
-- WHERE filters rows first, then GROUP BY groups, then HAVING filters groups
SELECT
department,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary
FROM employees
WHERE status = 'Active' -- Step 1: filter rows
GROUP BY department -- Step 2: group remaining rows
HAVING AVG(salary) > 75000 -- Step 3: filter groups
ORDER BY avg_salary DESC; -- Step 4: sort
Examples
-- WHERE: filter before grouping (only active employees)
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE status = 'Active'
GROUP BY department;
-- HAVING: filter after grouping (departments with more than 5 people)
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
-- Combined: active employees, grouped, then filter high-paying departments
SELECT
department,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary
FROM employees
WHERE status = 'Active'
GROUP BY department
HAVING AVG(salary) > 75000
ORDER BY avg_salary DESC;
You cannot use aggregate functions in a WHERE clause. WHERE COUNT(*) > 5 is invalid β use HAVING instead. Conversely, you cannot use column aliases in HAVING β repeat the full expression.
GROUP BY with NULLs
NULL values form their own group in GROUP BY.
-- NULLs are grouped together
SELECT
department_id,
COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
ORDER BY department_id;
-- Replace NULLs with a default value for cleaner output
SELECT
COALESCE(department_id, 0) AS department_id,
COUNT(*) AS employee_count
FROM employees
GROUP BY COALESCE(department_id, 0);
GROUP BY with ROLLUP and CUBE
-- ROLLUP: adds subtotals and a grand total
SELECT
department,
job_title,
COUNT(*) AS employee_count
FROM employees
GROUP BY ROLLUP (department, job_title);
-- CUBE: all possible subtotals (not supported in MySQL)
SELECT
department,
job_title,
COUNT(*) AS employee_count
FROM employees
GROUP BY CUBE (department, job_title);
| Extension | Subtotals Generated |
|---|---|
GROUP BY | Individual groups only |
ROLLUP | Hierarchical subtotals + grand total |
CUBE | All possible subtotal combinations + grand total |
GROUP BY with JOINs
-- Aggregate order data joined with customer info
SELECT
c.first_name,
c.last_name,
COUNT(o.order_id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING COUNT(o.order_id) > 5
ORDER BY total_spent DESC;
Performance Considerations
| Factor | Recommendation |
|---|---|
| Indexes | Create indexes on GROUP BY columns |
| Sorting | GROUP BY may require implicit sorting |
| Memory | Large groups can exceed memory limits |
| Statistics | Keep table statistics current |
| Execution plan | Use EXPLAIN to verify index usage |
-- Index recommendations for common GROUP BY patterns
CREATE INDEX idx_employees_department ON employees(department);
CREATE INDEX idx_employees_dept_job ON employees(department, job_title);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_product_date ON orders(product_id, order_date);
Practice Exercises
Exercise 1: Count the number of employees in each department.
-- Solution
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department
ORDER BY employee_count DESC;
Exercise 2: Find the average salary per department for departments with more than 10 employees.
-- Solution
SELECT
department,
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 10
ORDER BY avg_salary DESC;
Exercise 3: Generate a monthly sales report for 2024 with order count, total revenue, and average order value.
-- Solution
SELECT
MONTH(order_date) AS month_number,
MONTHNAME(order_date) AS month_name,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue,
ROUND(AVG(total_amount), 2) AS avg_order_value
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY MONTH(order_date), MONTHNAME(order_date)
ORDER BY month_number;
Key Takeaways
- GROUP BY collapses rows into groups based on column values.
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX) operate on each group.
- WHERE filters individual rows before grouping; HAVING filters groups after aggregation.
- Any non-aggregated column in SELECT must appear in GROUP BY.
- NULL values form their own group automatically.
- Use ROLLUP or CUBE for subtotals and grand totals.
- Always index GROUP BY columns for optimal performance.