SQL Fundamentals

SELECT DISTINCT

Eliminate duplicates and discover the unique values hiding in your data.

Single Column — find unique values in one field
Multi-Column — find unique combinations across columns
Count Unique — combine with COUNT for aggregate insights

DISTINCT reveals the true variety in your data.

What is DISTINCT?

DfDISTINCT

A keyword used in SELECT to eliminate duplicate rows from the result set. Only unique combinations of the specified columns are returned.

SELECT DISTINCT column1, column2
FROM table_name;

Component	Description	Example
`DISTINCT`	Keyword to remove duplicates	Required
`column1, column2`	Columns to check for uniqueness	`city, state`

Simple Example

Without DISTINCT:

SELECT city FROM customers;

city
New York
Los Angeles
Chicago
New York
New York

With DISTINCT:

SELECT DISTINCT city FROM customers;

city
New York
Los Angeles
Chicago

DISTINCT on Multiple Columns

DfMulti-Column DISTINCT

When you apply DISTINCT to multiple columns, it removes rows where all specified columns have identical values. Two rows with different values in any column are both kept.

-- Get unique combinations of city AND state
SELECT DISTINCT city, state FROM customers;

city	state
New York	NY
Los Angeles	CA
Chicago	IL
Houston	TX

-- Unique combinations across three columns
SELECT DISTINCT city, state, country
FROM customer_locations;

DISTINCT applies to the entire row combination, not each column independently. If you need unique values per column, you need separate queries or UNION.

DISTINCT with COUNT

-- Count unique values
SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers;

unique_cities
8

-- Count unique city-state combinations
SELECT COUNT(DISTINCT city, state) FROM customers;

-- Count unique values with condition
SELECT COUNT(DISTINCT category)
FROM products
WHERE price > 100;

DISTINCT vs GROUP BY

-- These produce the same result:
SELECT DISTINCT city FROM customers;
SELECT city FROM customers GROUP BY city;

Approach	Result	When to Use
`SELECT DISTINCT`	Unique values	Simple unique values only
`GROUP BY`	Unique values	When you need aggregation

-- Use DISTINCT for simple unique values
SELECT DISTINCT city FROM customers;

-- Use GROUP BY for aggregation
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;

-- Use GROUP BY with HAVING
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 5;

GROUP BY is generally slightly faster than DISTINCT because it can leverage sorting optimizations. Use DISTINCT for simple deduplication, and GROUP BY when you need aggregate functions.

DISTINCT with NULL

-- DISTINCT treats NULL as a unique value
SELECT DISTINCT email FROM customers;

email
alice@email.com
bob@email.com
NULL
carol@email.com

NULL is treated as a single unique value by DISTINCT. Multiple NULL rows are collapsed into one NULL in the result.

DISTINCT with ORDER BY

-- Combine DISTINCT with ORDER BY
SELECT DISTINCT city
FROM customers
ORDER BY city ASC;

city
Boston
Chicago
Houston
Los Angeles
New York

Common Use Cases

Finding Available Values

-- What cities do we have customers in?
SELECT DISTINCT city FROM customers ORDER BY city;

-- What product categories exist?
SELECT DISTINCT category FROM products;

-- What are the possible order statuses?
SELECT DISTINCT status FROM orders;

Checking Data Quality

-- Are there any duplicate emails?
SELECT email, COUNT(*) as count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

Building Reference Data

-- Create a lookup table from existing data
CREATE TABLE cities AS
SELECT DISTINCT city, state
FROM customers
ORDER BY city;

-- Get all unique department names
SELECT DISTINCT department
FROM employees
WHERE department IS NOT NULL
ORDER BY department;

Performance Considerations

Factor	Impact	Optimization
Table size	Larger = Slower	Add WHERE clause
Unique values	More unique = Slower	Consider GROUP BY
Index	With index = Faster	Create index on column
Column width	Wider = Slower	Use narrower columns

DISTINCT requires the database to sort or hash all values to find duplicates. On large tables without indexes, this can be very slow. Always add a WHERE clause to reduce the dataset first.

Common Mistakes

-- BAD: DISTINCT on all columns
SELECT DISTINCT * FROM customers;

-- GOOD: DISTINCT on specific column
SELECT DISTINCT city FROM customers;

-- BAD: Using DISTINCT when not needed
SELECT DISTINCT first_name, last_name, email
FROM customers;

-- GOOD: Remove DISTINCT if duplicates unlikely
SELECT first_name, last_name, email
FROM customers;

-- BAD: DISTINCT with COUNT(*)
SELECT DISTINCT COUNT(*) FROM customers;

-- GOOD: COUNT(DISTINCT column)
SELECT COUNT(DISTINCT city) FROM customers;

Practice Exercises

Exercise 1: Find all unique product categories from a products table.

SELECT DISTINCT category FROM products;

Exercise 2: Find all unique city-state combinations from a customers table.

SELECT DISTINCT city, state FROM customers;

Exercise 3: Count the number of unique customers who have placed orders.

SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

Exercise 4: Find all unique order statuses, sorted alphabetically.

SELECT DISTINCT status
FROM orders
WHERE status IS NOT NULL
ORDER BY status ASC;

Key Takeaways

DISTINCT removes duplicate rows from results
Works on single or multiple columns — checks the full combination
Use with COUNT to count unique values: COUNT(DISTINCT column)
NULL is treated as a unique value — multiple NULLs collapse to one
GROUP BY is similar but more powerful — use it when you need aggregation

SELECT DISTINCT

SELECT DISTINCT

What is DISTINCT?

DfDISTINCT

Simple Example

DISTINCT on Multiple Columns

DfMulti-Column DISTINCT

DISTINCT with COUNT

DISTINCT vs GROUP BY

DISTINCT with NULL

DISTINCT with ORDER BY

Common Use Cases

Finding Available Values

Checking Data Quality

Building Reference Data

Performance Considerations

Common Mistakes

Practice Exercises

Key Takeaways

Premium Content

Need Expert SQL Help?