SQL Fundamentals
SELECT DISTINCT
Eliminate duplicates and discover the unique values hiding in your data.
- Single Column β find unique values in one field
- Multi-Column β find unique combinations across columns
- Count Unique β combine with COUNT for aggregate insights
DISTINCT reveals the true variety in your data.
What is DISTINCT?
DfDISTINCT
A keyword used in SELECT to eliminate duplicate rows from the result set. Only unique combinations of the specified columns are returned.
SELECT DISTINCT column1, column2
FROM table_name;
| Component | Description | Example |
|---|---|---|
DISTINCT | Keyword to remove duplicates | Required |
column1, column2 | Columns to check for uniqueness | city, state |
Simple Example
Without DISTINCT:
SELECT city FROM customers;
| city |
|---|
| New York |
| Los Angeles |
| Chicago |
| New York |
| New York |
With DISTINCT:
SELECT DISTINCT city FROM customers;
| city |
|---|
| New York |
| Los Angeles |
| Chicago |
DISTINCT on Multiple Columns
DfMulti-Column DISTINCT
When you apply DISTINCT to multiple columns, it removes rows where all specified columns have identical values. Two rows with different values in any column are both kept.
-- Get unique combinations of city AND state
SELECT DISTINCT city, state FROM customers;
| city | state |
|---|---|
| New York | NY |
| Los Angeles | CA |
| Chicago | IL |
| Houston | TX |
-- Unique combinations across three columns
SELECT DISTINCT city, state, country
FROM customer_locations;
DISTINCT applies to the entire row combination, not each column independently. If you need unique values per column, you need separate queries or UNION.
DISTINCT with COUNT
-- Count unique values
SELECT COUNT(DISTINCT city) AS unique_cities
FROM customers;
| unique_cities |
|---|
| 8 |
-- Count unique city-state combinations
SELECT COUNT(DISTINCT city, state) FROM customers;
-- Count unique values with condition
SELECT COUNT(DISTINCT category)
FROM products
WHERE price > 100;
DISTINCT vs GROUP BY
-- These produce the same result:
SELECT DISTINCT city FROM customers;
SELECT city FROM customers GROUP BY city;
| Approach | Result | When to Use |
|---|---|---|
SELECT DISTINCT | Unique values | Simple unique values only |
GROUP BY | Unique values | When you need aggregation |
-- Use DISTINCT for simple unique values
SELECT DISTINCT city FROM customers;
-- Use GROUP BY for aggregation
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;
-- Use GROUP BY with HAVING
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 5;
GROUP BY is generally slightly faster than DISTINCT because it can leverage sorting optimizations. Use DISTINCT for simple deduplication, and GROUP BY when you need aggregate functions.
DISTINCT with NULL
-- DISTINCT treats NULL as a unique value
SELECT DISTINCT email FROM customers;
NULL is treated as a single unique value by DISTINCT. Multiple NULL rows are collapsed into one NULL in the result.
DISTINCT with ORDER BY
-- Combine DISTINCT with ORDER BY
SELECT DISTINCT city
FROM customers
ORDER BY city ASC;
| city |
|---|
| Boston |
| Chicago |
| Houston |
| Los Angeles |
| New York |
Common Use Cases
Finding Available Values
-- What cities do we have customers in?
SELECT DISTINCT city FROM customers ORDER BY city;
-- What product categories exist?
SELECT DISTINCT category FROM products;
-- What are the possible order statuses?
SELECT DISTINCT status FROM orders;
Checking Data Quality
-- Are there any duplicate emails?
SELECT email, COUNT(*) as count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;
Building Reference Data
-- Create a lookup table from existing data
CREATE TABLE cities AS
SELECT DISTINCT city, state
FROM customers
ORDER BY city;
-- Get all unique department names
SELECT DISTINCT department
FROM employees
WHERE department IS NOT NULL
ORDER BY department;
Performance Considerations
| Factor | Impact | Optimization |
|---|---|---|
| Table size | Larger = Slower | Add WHERE clause |
| Unique values | More unique = Slower | Consider GROUP BY |
| Index | With index = Faster | Create index on column |
| Column width | Wider = Slower | Use narrower columns |
DISTINCT requires the database to sort or hash all values to find duplicates. On large tables without indexes, this can be very slow. Always add a WHERE clause to reduce the dataset first.
Common Mistakes
-- BAD: DISTINCT on all columns
SELECT DISTINCT * FROM customers;
-- GOOD: DISTINCT on specific column
SELECT DISTINCT city FROM customers;
-- BAD: Using DISTINCT when not needed
SELECT DISTINCT first_name, last_name, email
FROM customers;
-- GOOD: Remove DISTINCT if duplicates unlikely
SELECT first_name, last_name, email
FROM customers;
-- BAD: DISTINCT with COUNT(*)
SELECT DISTINCT COUNT(*) FROM customers;
-- GOOD: COUNT(DISTINCT column)
SELECT COUNT(DISTINCT city) FROM customers;
Practice Exercises
Exercise 1: Find all unique product categories from a products table.
SELECT DISTINCT category FROM products;
Exercise 2: Find all unique city-state combinations from a customers table.
SELECT DISTINCT city, state FROM customers;
Exercise 3: Count the number of unique customers who have placed orders.
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
Exercise 4: Find all unique order statuses, sorted alphabetically.
SELECT DISTINCT status
FROM orders
WHERE status IS NOT NULL
ORDER BY status ASC;
Key Takeaways
- DISTINCT removes duplicate rows from results
- Works on single or multiple columns β checks the full combination
- Use with COUNT to count unique values:
COUNT(DISTINCT column) - NULL is treated as a unique value β multiple NULLs collapse to one
- GROUP BY is similar but more powerful β use it when you need aggregation