String Functions
Transform, extract, and manipulate text data with SQL string functions.
- UPPER / LOWER β change case
- SUBSTRING / LENGTH β extract and measure String functions turn raw text into useful information.
What Are String Functions?
DfString Functions
String functions operate on character data, allowing you to transform, extract, concatenate, and analyze text values in SQL queries.
-- Basic string function usage
SELECT UPPER(first_name) AS upper_name,
LOWER(last_name) AS lower_name,
LENGTH(first_name) AS name_length
FROM employees;
UPPER and LOWER
DfUPPER and LOWER
UPPER converts all characters to uppercase. LOWER converts all characters to lowercase. Both are useful for case-insensitive comparisons.
-- Standardize email formats
SELECT LOWER(email) AS standardized_email
FROM customers;
-- Case-insensitive search
SELECT * FROM employees
WHERE UPPER(department) = UPPER('engineering');
-- Display formatting
SELECT CONCAT(UPPER(LEFT(first_name, 1)), LOWER(SUBSTRING(first_name, 2))) AS formatted_name
FROM employees;
Always use LOWER or UPPER when comparing strings that might have inconsistent casing. This prevents case-sensitive mismatch issues.
LENGTH
DfLENGTH
LENGTH returns the number of characters in a string. Some databases use LEN instead.
-- Find long employee names
SELECT first_name, last_name,
LENGTH(first_name) + LENGTH(last_name) AS total_length
FROM employees
WHERE LENGTH(first_name) + LENGTH(last_name) > 15;
-- Validate data quality
SELECT email,
LENGTH(email) AS email_length
FROM customers
WHERE LENGTH(email) < 5;
SUBSTRING
DfSUBSTRING
SUBSTRING extracts a portion of a string. Syntax: SUBSTRING(string, start, length).
-- Extract area codes from phone numbers
SELECT phone_number,
SUBSTRING(phone_number, 1, 3) AS area_code
FROM employees;
-- Get first initial
SELECT CONCAT(SUBSTRING(first_name, 1, 1), '.', SUBSTRING(last_name, 1, 1)) AS initials
FROM employees;
-- Extract domain from email
SELECT email,
SUBSTRING(email, LOCATE('@', email) + 1) AS domain
FROM customers;
CONCAT
DfCONCAT
CONCAT joins two or more strings together. Some databases use the || operator instead.
-- Full name from parts
SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM employees;
-- Build formatted output
SELECT CONCAT(department, ' - ', job_title, ' (', employee_id, ')') AS employee_info
FROM employees;
-- CONCAT_WS for delimited strings
SELECT CONCAT_WS(', ', first_name, last_name, city) AS contact_info
FROM employees;
CONCAT_WS (concatenate with separator) is cleaner than CONCAT when you need a consistent delimiter between values.
TRIM, LTRIM, RTRIM
DfTRIM Functions
TRIM removes leading and trailing whitespace. LTRIM removes only leading spaces; RTRIM removes only trailing spaces.
-- Clean imported data
SELECT TRIM(first_name) AS cleaned_name,
LENGTH(TRIM(first_name)) AS cleaned_length,
LENGTH(first_name) AS original_length
FROM employees
WHERE first_name != TRIM(first_name);
-- Remove specific characters
SELECT TRIM(BOTH '0' FROM product_code) AS trimmed_code
FROM products;
REPLACE
DfREPLACE
REPLACE searches for a substring and replaces it with another string. Syntax: REPLACE(string, search, replacement).
-- Clean phone number formatting
SELECT phone_number,
REPLACE(REPLACE(phone_number, '-', ''), ' ', '') AS clean_phone
FROM employees;
-- Redact sensitive data
SELECT CONCAT(LEFT(email, 2), '***@', SUBSTRING(email, LOCATE('@', email) + 1)) AS masked_email
FROM customers;
LEFT and RIGHT
DfLEFT and RIGHT
LEFT extracts the first N characters from a string. RIGHT extracts the last N characters.
-- Extract first 3 characters
SELECT LEFT(department, 3) AS dept_code
FROM employees;
-- Extract file extension
SELECT file_name,
RIGHT(file_name, LENGTH(file_name) - LOCATE('.', REVERSE(file_name))) AS extension
FROM uploaded_files;
CHAR_LENGTH vs OCTET_LENGTH
-- Character length vs byte length
SELECT first_name,
CHAR_LENGTH(first_name) AS char_count,
OCTET_LENGTH(first_name) AS byte_count
FROM employees;
Use CHAR_LENGTH (or LENGTH in some databases) for character count. OCTET_LENGTH or DATALENGTH gives byte count, important for multi-byte encodings like UTF-8.
String Functions with Aggregates
-- Most common department names
SELECT department, COUNT(*) AS cnt
FROM employees
GROUP BY UPPER(TRIM(department))
ORDER BY cnt DESC;
-- String length statistics
SELECT department,
AVG(LENGTH(first_name)) AS avg_name_length,
MIN(LENGTH(first_name)) AS shortest_name,
MAX(LENGTH(first_name)) AS longest_name
FROM employees
GROUP BY department;
Pattern Matching with LIKE
-- Names starting with 'J'
SELECT * FROM employees
WHERE first_name LIKE 'J%';
-- Emails containing a specific domain
SELECT * FROM customers
WHERE email LIKE '%@gmail.com';
-- Phone numbers with specific pattern
SELECT * FROM employees
WHERE phone_number LIKE '___-___-____';
Real-World String Transformations
-- Generate username from name
SELECT CONCAT(LOWER(SUBSTRING(first_name, 1, 1)), LOWER(last_name)) AS username
FROM employees;
-- Parse full name into components
SELECT SUBSTRING_INDEX(full_name, ' ', 1) AS first_name,
SUBSTRING_INDEX(full_name, ' ', -1) AS last_name
FROM customers;
-- Format currency strings
SELECT CONCAT('$', FORMAT(salary, 2)) AS formatted_salary
FROM employees;
String functions are often used in SELECT clauses for display formatting. For WHERE clause filters, consider using indexes on the raw columns instead.
Common String Function Reference
Key Takeaways
- UPPER and LOWER ensure case-insensitive comparisons.
- SUBSTRING extracts specific portions of text.
- CONCAT joins strings; CONCAT_WS adds separators.
- TRIM removes unwanted whitespace from data.
- REPLACE cleans and transforms text patterns.
- LENGTH validates data quality and text size.
| Function | Syntax | Purpose |
|---|---|---|
| UPPER | UPPER(string) | Convert to uppercase |
| LOWER | LOWER(string) | Convert to lowercase |
| LENGTH | LENGTH(string) | Get character count |
| SUBSTRING | SUBSTRING(string, start, len) | Extract portion |
| CONCAT | CONCAT(str1, str2, ...) | Join strings |
| TRIM | TRIM(string) | Remove whitespace |
| REPLACE | REPLACE(string, search, repl) | Substitute text |
| LEFT | LEFT(string, n) | First N characters |
| RIGHT | RIGHT(string, n) | Last N characters |
| LOCATE | LOCATE(sub, string) | Find position |