π Regex & Text Processing
Meta & Apple Interview Deep Dive
π’ Metaπ’ Appleβ‘ Difficulty: Mediumβ±οΈ 30 min
π Interview Question
βΉοΈπ΄ Meta/Apple Interview Question
"Given a products table with product names and descriptions, write queries to: 1) Find products matching complex patterns using regex, 2) Extract specific parts from text, 3) Perform full-text search with ranking, 4) Clean and transform text data."
Companies: Meta, Apple | Difficulty: Medium | Time: 30 minutes
π Setup: Products Table
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(200),
description TEXT,
sku VARCHAR(50),
tags TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO products (product_name, description, sku, tags) VALUES
('iPhone 15 Pro', 'Latest Apple smartphone with A17 Pro chip, titanium design, and 48MP camera system', 'APL-IP15P-256', 'apple, smartphone, premium'),
('Samsung Galaxy S24', 'Android flagship with AI features, 200MP camera, and Snapdragon 8 Gen 3', 'SAM-GS24-128', 'samsung, smartphone, android'),
('MacBook Pro 14"', 'Professional laptop with M3 Pro chip, 18GB RAM, stunning Liquid Retina XDR display', 'APL-MBP14-512', 'apple, laptop, professional'),
('Dell XPS 15', 'Premium Windows laptop with Intel i7, 16GB RAM, OLED display option', 'DEL-XPS15-256', 'dell, laptop, windows'),
('Sony WH-1000XM5', 'Industry-leading noise canceling headphones with 30-hour battery', 'SNY-WH1000-BLK', 'sony, headphones, wireless'),
('AirPods Pro 2', 'Active noise cancellation, adaptive transparency, personalized spatial audio', 'APL-APP2-USB', 'apple, headphones, wireless'),
('iPad Air M2', 'Powerful tablet with M2 chip, 11-inch display, perfect for creativity', 'APL-IPADAM2-64', 'apple, tablet, creative'),
('Nintendo Switch OLED', 'Hybrid gaming console with vibrant 7-inch OLED screen', 'NIN-SW-OLED', 'nintendo, gaming, console'),
('PS5 DualSense Controller', 'Wireless controller with haptic feedback and adaptive triggers', 'SNY-DS-WHITE', 'sony, gaming, controller'),
('Logitech MX Master 3S', 'Ergonomic wireless mouse with MagSpeed scroll wheel and USB-C charging', 'LOG-MXM3S-BLK', 'logitech, mouse, ergonomic');
π’ Part 1: Regular Expressions in SQL
βΉοΈπ PostgreSQL Regex Operators
~Case-sensitive match~*Case-insensitive match!~Case-sensitive not match!~*Case-insensitive not matchSIMILAR TOSQL standard regex
Basic Pattern Matching
-- Find products with "Pro" in name (case-insensitive)
SELECT product_name
FROM products
WHERE product_name ~* 'pro';
-- Find products starting with specific brand
SELECT product_name, sku
FROM products
WHERE product_name ~* '^(apple|samsung|sony)';
-- Find products with numbers in SKU
SELECT product_name, sku
FROM products
WHERE sku ~ '\d{3}';
-- Find email-like patterns in descriptions
SELECT product_name, description
FROM products
WHERE description ~ '[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]+';
Complex Regex Patterns
-- Extract model numbers from SKU (APL-XXX-NNN pattern)
SELECT
product_name,
sku,
(regexp_match(sku, '^[A-Z]+-([A-Z0-9]+)-'))[1] AS model_code
FROM products;
-- Find products with specific price patterns
-- Assuming description contains price info
SELECT product_name, description
FROM products
WHERE description ~ '\$\d+(\.\d{2})?';
-- Extract words matching specific patterns
SELECT
product_name,
regexp_matches(product_name, '(Pro|Max|Ultra|Plus)', 'gi') AS suffixes
FROM products
WHERE product_name ~ '(Pro|Max|Ultra|Plus)';
Regex Replace
-- Clean product names
SELECT
product_name,
regexp_replace(product_name, '\s+', ' ', 'g') AS cleaned_name,
regexp_replace(product_name, '[^a-zA-Z0-9\s]', '', 'g') AS alphanumeric_only
FROM products;
-- Extract and transform
SELECT
product_name,
regexp_replace(
regexp_replace(product_name, '(\d+)"', '\1 inch'),
'"', ''
) AS formatted_name
FROM products
WHERE product_name ~ '\d+"';
-- Standardize SKU format
SELECT
product_name,
sku,
regexp_replace(sku, '-', '', 'g') AS sku_no_dashes,
LOWER(regexp_replace(sku, '^([A-Z]+)', '\1')) AS sku_lower_prefix
FROM products;
π Part 2: Full-Text Search
βΉοΈπ Full-Text Search
PostgreSQL provides powerful full-text search with:
- tsvector: Document representation
- tsquery: Search query representation
- Ranking: Relevance scoring
- Highlighting: Match highlighting
Basic Full-Text Search
-- Add tsvector column for full-text search
ALTER TABLE products ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english',
coalesce(product_name, '') || ' ' ||
coalesce(description, '') || ' ' ||
coalesce(tags, '')
)
) STORED;
-- Create GIN index for fast search
CREATE INDEX idx_products_search ON products USING gin(search_vector);
-- Simple search
SELECT product_name, description
FROM products
WHERE search_vector @@ to_tsquery('english', 'apple & phone');
-- Search with ranking
SELECT
product_name,
ts_rank(search_vector, query) AS rank,
ts_headline('english', description, query) AS highlighted
FROM products,
to_tsquery('english', 'wireless & headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
Advanced Search Queries
-- Boolean search with AND, OR, NOT
SELECT product_name
FROM products
WHERE search_vector @@ to_tsquery('english',
'(apple | samsung) & smartphone & !used'
);
-- Prefix matching
SELECT product_name
FROM products
WHERE search_vector @@ to_tsquery('english',
'phon:*' -- Matches phone, phones, phonetic, etc.
);
-- Phrase matching
SELECT product_name
FROM products
WHERE search_vector @@ plainto_tsquery('english',
'noise canceling headphones'
);
-- Weighted ranking
SELECT
product_name,
ts_rank_cd(search_vector, query) AS rank_cd,
ts_rank(search_vector, query) AS rank
FROM products,
to_tsquery('english', 'pro & laptop') AS query
WHERE search_vector @@ query
ORDER BY rank_cd DESC;
π Part 3: Text Processing Functions
String Manipulation
-- Extract patterns from text
SELECT
product_name,
-- Extract all numbers
(regexp_matches(product_name, '\d+', 'g'))[1] AS first_number,
-- Extract words after specific keyword
(regexp_match(description, 'with\s+([\w\s]+?)(?:,|\.|$)'))[1] AS feature_after_with
FROM products;
-- Split text into arrays
SELECT
product_name,
string_to_array(tags, ',') AS tag_array,
array_length(string_to_array(tags, ','), 1) AS tag_count
FROM products;
-- Array operations
SELECT
product_name,
tags,
unnest(string_to_array(tags, ',')) AS individual_tag
FROM products;
Text Cleaning
-- Remove special characters
SELECT
product_name,
regexp_replace(product_name, '[^a-zA-Z0-9\s]', '', 'g') AS cleaned,
initcap(lower(product_name)) AS title_case,
upper(left(product_name, 3)) AS abbreviation
FROM products;
-- Normalize whitespace
SELECT
product_name,
regexp_replace(product_name, '\s+', ' ', 'g') AS normalized,
trim(product_name) AS trimmed,
ltrim(rtrim(product_name)) AS fully_trimmed
FROM products;
-- Extract and format
SELECT
product_name,
CASE
WHEN product_name ~ '\d+"' THEN
regexp_replace(product_name, '(\d+)"', '\1 inches')
ELSE product_name
END AS formatted_name
FROM products;
π Part 4: Pattern-Based Analytics
-- Analyze product categories from tags
WITH tag_stats AS (
SELECT
product_name,
unnest(string_to_array(tags, ',')) AS tag
FROM products
)
SELECT
tag,
COUNT(*) AS product_count,
ARRAY_AGG(product_name) AS products
FROM tag_stats
GROUP BY tag
ORDER BY product_count DESC;
-- Extract brand patterns from SKU
SELECT
product_name,
sku,
CASE
WHEN sku ~ '^APL' THEN 'Apple'
WHEN sku ~ '^SAM' THEN 'Samsung'
WHEN sku ~ '^SONY|^SNY' THEN 'Sony'
WHEN sku ~ '^DEL' THEN 'Dell'
WHEN sku ~ '^LOG' THEN 'Logitech'
WHEN sku ~ '^NIN' THEN 'Nintendo'
ELSE 'Other'
END AS brand
FROM products;
π― Quiz Section
π Best Practices for Interviews
π‘β Text Processing Best Practices
1. Use Appropriate Indexes:
-- For full-text search
CREATE INDEX idx_search ON table USING gin(search_vector);
-- For trigram similarity
CREATE INDEX idx_trgm ON table USING gin(column gin_trgm_ops);
-- For specific patterns
CREATE INDEX idx_pattern ON table ((column ~ 'pattern'));
2. Handle NULLs:
-- Use COALESCE for safe concatenation
to_tsvector('english',
coalesce(col1, '') || ' ' || coalesce(col2, '')
)
3. Consider Language:
-- Use correct language for stemming
to_tsvector('english', text) -- English stemming
to_tsvector('simple', text) -- No stemming
4. Test Regex Performance:
-- Regex can be slow on large datasets
-- Consider trigram indexes for LIKE queries
-- Use materialized columns for complex extractions
5. Validate Patterns:
-- Test regex patterns thoroughly
SELECT 'test string' ~ 'your_pattern';
-- Use online regex testers to verify patterns
β οΈβ οΈ Common Pitfalls
- Regex performance: Complex patterns can be slow without indexes
- Case sensitivity: Remember ~ vs ~* difference
- NULL handling: Regex on NULL returns NULL
- Encoding issues: Ensure proper character encoding
- Over-indexing: GIN indexes can be large