πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Database Normalization

SQL PerformanceNormalization🟒 Free Lesson

Advertisement

SQL Performance

Database Normalization

Design efficient databases by eliminating data redundancy through normalization.

  • Normal Forms β€” Structured approach from 1NF to BCNF for organizing data
  • Data Integrity β€” Prevent anomalies during insert, update, and delete operations
  • Schema Design β€” Build flexible databases that scale with your application Good database design is the foundation of efficient, maintainable data systems.

What is Normalization?

DfNormalization

Normalization is the process of organizing relational database tables to reduce data redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables and defining relationships between them, following a series of normal forms (1NF, 2NF, 3NF, BCNF).

Without normalization, databases suffer from insertion anomalies, update anomalies, and deletion anomalies β€” situations where adding, changing, or removing data causes unintended loss of information.

Unnormalizedorder_id | customer1 | Aliceproducts: A,B,CViolation:Repeating groupin single column1NForder | cust | prod1 | Alice | A1 | Alice | B1 | Alice | CAtomic valuesEach row unique2NForders tableorder_id | customerproducts tableproduct_id | name3NFordersorder_id, cust_idcustomerscust_id, nameproductsproduct_id, name1NF: Atomic values β†’ 2NF: No partial deps β†’ 3NF: No transitive deps
AnomalyDescriptionExample
InsertionCannot add data without unrelated dataNeed a customer to add an order
UpdateChanging one row requires updating multiple rowsCustomer address stored in every order
DeletionDeleting a row loses unrelated dataCanceling an order deletes customer info

Normal Forms Explained

First Normal Form (1NF)

DfFirst Normal Form (1NF)

A table is in 1NF if all columns contain atomic (indivisible) values and each row is unique. No repeating groups or arrays are allowed in a single column.

-- VIOLATION: Repeating group in a single column
CREATE TABLE orders_bad (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    products TEXT  -- "Widget A, Gadget B, Widget C" (not atomic!)
);

-- COMPLIANT: 1NF with atomic values
CREATE TABLE orders_1nf (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    product_name TEXT NOT NULL,
    quantity INTEGER NOT NULL
);

-- Insert data in 1NF format
INSERT INTO orders_1nf (order_id, customer_name, product_name, quantity)
VALUES
    (1, 'Alice Smith', 'Widget A', 3),
    (1, 'Alice Smith', 'Gadget B', 1),
    (2, 'Bob Jones', 'Widget C', 5);
1NF eliminates repeating groups by ensuring each column holds a single value. If you find comma-separated values in a column, it violates 1NF.

Second Normal Form (2NF)

DfSecond Normal Form (2NF)

A table is in 2NF if it is in 1NF and all non-key columns are fully dependent on the entire primary key. This eliminates partial dependencies where a column depends on only part of a composite key.

-- VIOLATION: Partial dependency (product_name depends only on product_id, not order_id)
CREATE TABLE order_items_bad (
    order_id INTEGER,
    product_id INTEGER,
    product_name TEXT,  -- Depends only on product_id
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

-- COMPLIANT: 2NF by splitting into separate tables
CREATE TABLE products_2nf (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    unit_price DECIMAL(10,2)
);

CREATE TABLE order_items_2nf (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products_2nf(product_id)
);

Third Normal Form (3NF)

DfThird Normal Form (3NF)

A table is in 3NF if it is in 2NF and no non-key column depends on another non-key column (no transitive dependencies). Every non-key attribute must depend only on the primary key.

-- VIOLATION: Transitive dependency (city -> state)
CREATE TABLE customers_bad (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    city TEXT NOT NULL,
    state TEXT NOT NULL  -- Depends on city, not customer_id!
);

-- COMPLIANT: 3NF by extracting city-state relationship
CREATE TABLE locations_3nf (
    city TEXT PRIMARY KEY,
    state TEXT NOT NULL
);

CREATE TABLE customers_3nf (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    city TEXT NOT NULL,
    FOREIGN KEY (city) REFERENCES locations_3nf(city)
);
Over-normalization can lead to excessive JOINs that hurt query performance. Sometimes a controlled denormalization is the right trade-off for read-heavy systems.

Boyce-Codd Normal Form (BCNF)

DfBoyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. A table is in BCNF if for every functional dependency X β†’ Y, X is a superkey. This handles edge cases where 3NF is insufficient due to overlapping candidate keys.

-- BCNF example: Instructor-Course-Student relationship
CREATE TABLE teaches (
    instructor_id INTEGER,
    course_id INTEGER,
    student_id INTEGER,
    semester TEXT NOT NULL,
    PRIMARY KEY (instructor_id, course_id, student_id)
);

-- If course_id determines instructor_id (each course has one instructor),
-- we decompose to satisfy BCNF:
CREATE TABLE course_instructor (
    course_id INTEGER PRIMARY KEY,
    instructor_id INTEGER NOT NULL
);

CREATE TABLE enrollment (
    course_id INTEGER,
    student_id INTEGER,
    semester TEXT NOT NULL,
    PRIMARY KEY (course_id, student_id),
    FOREIGN KEY (course_id) REFERENCES course_instructor(course_id)
);

Normalization Summary Table

Normal FormRequirementWhat It Eliminates
1NFAtomic values, unique rowsRepeating groups
2NF1NF + full functional dependencyPartial dependencies
3NF2NF + no transitive dependenciesTransitive dependencies
BCNFStricter 3NF for all FDsAnomalies from overlapping keys
4NFBCNF + no multi-valued dependenciesIndependent multi-valued facts
5NF4NF + no join dependenciesLossy decompositions

Practical Example: E-Commerce Database

-- Denormalized (bad) single table
CREATE TABLE order_data_denorm (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    customer_email TEXT,
    product_name TEXT,
    product_category TEXT,
    product_price DECIMAL(10,2),
    order_date DATE,
    ship_address TEXT
);

-- Normalized (good) schema
CREATE TABLE customers_norm (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);

CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    street TEXT NOT NULL,
    city TEXT NOT NULL,
    state TEXT NOT NULL,
    zip_code TEXT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers_norm(customer_id)
);

CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE products_norm (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category_id INTEGER NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (category_id) REFERENCES categories(category_id)
);

CREATE TABLE orders_norm (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    address_id INTEGER NOT NULL,
    order_date DATE NOT NULL DEFAULT CURRENT_DATE,
    FOREIGN KEY (customer_id) REFERENCES customers_norm(customer_id),
    FOREIGN KEY (address_id) REFERENCES addresses(address_id)
);

CREATE TABLE order_items_norm (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders_norm(order_id),
    FOREIGN KEY (product_id) REFERENCES products_norm(product_id)
);
A common rule of thumb: normalize your OLTP (transactional) systems for write performance and data integrity, but consider denormalizing your OLAP (analytical) systems for read performance.

Checking Normalization Violations

-- Find duplicate customer emails (potential data integrity issue)
SELECT email, COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Detect transitive dependency: city determines state
SELECT city, COUNT(DISTINCT state) AS unique_states
FROM customers
GROUP BY city
HAVING COUNT(DISTINCT state) > 1;

-- Find tables with redundant data patterns
SELECT
    table_name,
    column_name,
    COUNT(*) OVER (PARTITION BY column_name) AS duplicate_count
FROM information_schema.columns
WHERE table_schema = 'public';
In production databases, always test normalization changes thoroughly. Restructuring tables can break existing queries and application code.

When to Denormalize

Sometimes breaking normalization rules is the right choice:

ScenarioReasonExample
Read-heavy analyticsReduce JOINs for reportingMaterialized views
Caching frequently accessed dataAvoid repeated computationsDenormalized totals in orders table
Write-once dataSimplicity over normalizationLog tables, event data
Performance-critical pathsSub-millisecond latency requiredUser profile with embedded address
-- Controlled denormalization: add computed column for fast reads
ALTER TABLE orders
ADD COLUMN total_amount DECIMAL(12,2)
GENERATED ALWAYS AS (
    (SELECT SUM(quantity * unit_price) FROM order_items WHERE order_id = orders.order_id)
) STORED;

-- Use a materialized view for analytical queries
CREATE MATERIALIZED VIEW customer_order_summary AS
SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS lifetime_value,
    MAX(o.order_date) AS last_order_date
FROM customers_norm c
LEFT JOIN orders_norm o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Key Takeaways

  1. Normalization reduces redundancy β€” Each fact is stored in exactly one place
  2. Follow normal forms sequentially β€” 1NF β†’ 2NF β†’ 3NF β†’ BCNF for most applications
  3. 3NF is the practical sweet spot β€” Handles most real-world scenarios without over-complication
  4. Denormalize strategically β€” Use it as a performance optimization, not a default
  5. Document design decisions β€” Future developers need to understand your normalization choices

Quick Reference Card

TermDefinition
Functional DependencyColumn A determines column B (A β†’ B)
Partial DependencyNon-key column depends on part of a composite key
Transitive DependencyNon-key column depends on another non-key column
SuperkeySet of columns that uniquely identifies a row
Candidate KeyMinimal superkey (no redundant columns)
Surrogate KeyArtificial primary key (auto-increment, UUID)
Natural KeyBusiness-meaningful primary key (email, SSN)
⭐

Premium Content

Database Normalization

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert SQL Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement