SQL Aggregation

String Functions

Transform, extract, and manipulate text data with SQL string functions.

UPPER / LOWER — change case
SUBSTRING / LENGTH — extract and measure String functions turn raw text into useful information.

What Are String Functions?

DfString Functions

String functions operate on character data, allowing you to transform, extract, concatenate, and analyze text values in SQL queries.

-- Basic string function usage
SELECT UPPER(first_name) AS upper_name,
       LOWER(last_name) AS lower_name,
       LENGTH(first_name) AS name_length
FROM employees;

UPPER and LOWER

DfUPPER and LOWER

UPPER converts all characters to uppercase. LOWER converts all characters to lowercase. Both are useful for case-insensitive comparisons.

-- Standardize email formats
SELECT LOWER(email) AS standardized_email
FROM customers;

-- Case-insensitive search
SELECT * FROM employees
WHERE UPPER(department) = UPPER('engineering');

-- Display formatting
SELECT CONCAT(UPPER(LEFT(first_name, 1)), LOWER(SUBSTRING(first_name, 2))) AS formatted_name
FROM employees;

Always use LOWER or UPPER when comparing strings that might have inconsistent casing. This prevents case-sensitive mismatch issues.

LENGTH

DfLENGTH

LENGTH returns the number of characters in a string. Some databases use LEN instead.

-- Find long employee names
SELECT first_name, last_name,
       LENGTH(first_name) + LENGTH(last_name) AS total_length
FROM employees
WHERE LENGTH(first_name) + LENGTH(last_name) > 15;

-- Validate data quality
SELECT email,
       LENGTH(email) AS email_length
FROM customers
WHERE LENGTH(email) < 5;

SUBSTRING

DfSUBSTRING

SUBSTRING extracts a portion of a string. Syntax: SUBSTRING(string, start, length).

-- Extract area codes from phone numbers
SELECT phone_number,
       SUBSTRING(phone_number, 1, 3) AS area_code
FROM employees;

-- Get first initial
SELECT CONCAT(SUBSTRING(first_name, 1, 1), '.', SUBSTRING(last_name, 1, 1)) AS initials
FROM employees;

-- Extract domain from email
SELECT email,
       SUBSTRING(email, LOCATE('@', email) + 1) AS domain
FROM customers;

CONCAT

DfCONCAT

CONCAT joins two or more strings together. Some databases use the || operator instead.

-- Full name from parts
SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM employees;

-- Build formatted output
SELECT CONCAT(department, ' - ', job_title, ' (', employee_id, ')') AS employee_info
FROM employees;

-- CONCAT_WS for delimited strings
SELECT CONCAT_WS(', ', first_name, last_name, city) AS contact_info
FROM employees;

CONCAT_WS (concatenate with separator) is cleaner than CONCAT when you need a consistent delimiter between values.

TRIM, LTRIM, RTRIM

DfTRIM Functions

TRIM removes leading and trailing whitespace. LTRIM removes only leading spaces; RTRIM removes only trailing spaces.

-- Clean imported data
SELECT TRIM(first_name) AS cleaned_name,
       LENGTH(TRIM(first_name)) AS cleaned_length,
       LENGTH(first_name) AS original_length
FROM employees
WHERE first_name != TRIM(first_name);

-- Remove specific characters
SELECT TRIM(BOTH '0' FROM product_code) AS trimmed_code
FROM products;

REPLACE

DfREPLACE

REPLACE searches for a substring and replaces it with another string. Syntax: REPLACE(string, search, replacement).

-- Clean phone number formatting
SELECT phone_number,
       REPLACE(REPLACE(phone_number, '-', ''), ' ', '') AS clean_phone
FROM employees;

-- Redact sensitive data
SELECT CONCAT(LEFT(email, 2), '***@', SUBSTRING(email, LOCATE('@', email) + 1)) AS masked_email
FROM customers;

LEFT and RIGHT

DfLEFT and RIGHT

LEFT extracts the first N characters from a string. RIGHT extracts the last N characters.

-- Extract first 3 characters
SELECT LEFT(department, 3) AS dept_code
FROM employees;

-- Extract file extension
SELECT file_name,
       RIGHT(file_name, LENGTH(file_name) - LOCATE('.', REVERSE(file_name))) AS extension
FROM uploaded_files;

CHAR_LENGTH vs OCTET_LENGTH

-- Character length vs byte length
SELECT first_name,
       CHAR_LENGTH(first_name) AS char_count,
       OCTET_LENGTH(first_name) AS byte_count
FROM employees;

Use CHAR_LENGTH (or LENGTH in some databases) for character count. OCTET_LENGTH or DATALENGTH gives byte count, important for multi-byte encodings like UTF-8.