πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Topic: Regex and Text Processing in SQL for FAANG Interviews

SQL AdvancedRegex and Text Processing⭐ Premium

Advertisement

πŸ” Regex & Text Processing

Meta & Apple Interview Deep Dive

🏒 Meta🏒 Apple⚑ Difficulty: Medium⏱️ 30 min

πŸ“‹ Interview Question

β„ΉοΈπŸ”΄ Meta/Apple Interview Question

"Given a products table with product names and descriptions, write queries to: 1) Find products matching complex patterns using regex, 2) Extract specific parts from text, 3) Perform full-text search with ranking, 4) Clean and transform text data."

Companies: Meta, Apple | Difficulty: Medium | Time: 30 minutes

πŸ“Š Setup: Products Table

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(200),
    description TEXT,
    sku VARCHAR(50),
    tags TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO products (product_name, description, sku, tags) VALUES
('iPhone 15 Pro', 'Latest Apple smartphone with A17 Pro chip, titanium design, and 48MP camera system', 'APL-IP15P-256', 'apple, smartphone, premium'),
('Samsung Galaxy S24', 'Android flagship with AI features, 200MP camera, and Snapdragon 8 Gen 3', 'SAM-GS24-128', 'samsung, smartphone, android'),
('MacBook Pro 14"', 'Professional laptop with M3 Pro chip, 18GB RAM, stunning Liquid Retina XDR display', 'APL-MBP14-512', 'apple, laptop, professional'),
('Dell XPS 15', 'Premium Windows laptop with Intel i7, 16GB RAM, OLED display option', 'DEL-XPS15-256', 'dell, laptop, windows'),
('Sony WH-1000XM5', 'Industry-leading noise canceling headphones with 30-hour battery', 'SNY-WH1000-BLK', 'sony, headphones, wireless'),
('AirPods Pro 2', 'Active noise cancellation, adaptive transparency, personalized spatial audio', 'APL-APP2-USB', 'apple, headphones, wireless'),
('iPad Air M2', 'Powerful tablet with M2 chip, 11-inch display, perfect for creativity', 'APL-IPADAM2-64', 'apple, tablet, creative'),
('Nintendo Switch OLED', 'Hybrid gaming console with vibrant 7-inch OLED screen', 'NIN-SW-OLED', 'nintendo, gaming, console'),
('PS5 DualSense Controller', 'Wireless controller with haptic feedback and adaptive triggers', 'SNY-DS-WHITE', 'sony, gaming, controller'),
('Logitech MX Master 3S', 'Ergonomic wireless mouse with MagSpeed scroll wheel and USB-C charging', 'LOG-MXM3S-BLK', 'logitech, mouse, ergonomic');

πŸ”’ Part 1: Regular Expressions in SQL

β„ΉοΈπŸ” PostgreSQL Regex Operators

  • ~ Case-sensitive match
  • ~* Case-insensitive match
  • !~ Case-sensitive not match
  • !~* Case-insensitive not match
  • SIMILAR TO SQL standard regex

Basic Pattern Matching

-- Find products with "Pro" in name (case-insensitive)
SELECT product_name
FROM products
WHERE product_name ~* 'pro';

-- Find products starting with specific brand
SELECT product_name, sku
FROM products
WHERE product_name ~* '^(apple|samsung|sony)';

-- Find products with numbers in SKU
SELECT product_name, sku
FROM products
WHERE sku ~ '\d{3}';

-- Find email-like patterns in descriptions
SELECT product_name, description
FROM products
WHERE description ~ '[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]+';

Complex Regex Patterns

-- Extract model numbers from SKU (APL-XXX-NNN pattern)
SELECT
    product_name,
    sku,
    (regexp_match(sku, '^[A-Z]+-([A-Z0-9]+)-'))[1] AS model_code
FROM products;

-- Find products with specific price patterns
-- Assuming description contains price info
SELECT product_name, description
FROM products
WHERE description ~ '\$\d+(\.\d{2})?';

-- Extract words matching specific patterns
SELECT
    product_name,
    regexp_matches(product_name, '(Pro|Max|Ultra|Plus)', 'gi') AS suffixes
FROM products
WHERE product_name ~ '(Pro|Max|Ultra|Plus)';

Regex Replace

-- Clean product names
SELECT
    product_name,
    regexp_replace(product_name, '\s+', ' ', 'g') AS cleaned_name,
    regexp_replace(product_name, '[^a-zA-Z0-9\s]', '', 'g') AS alphanumeric_only
FROM products;

-- Extract and transform
SELECT
    product_name,
    regexp_replace(
        regexp_replace(product_name, '(\d+)"', '\1 inch'),
        '"', ''
    ) AS formatted_name
FROM products
WHERE product_name ~ '\d+"';

-- Standardize SKU format
SELECT
    product_name,
    sku,
    regexp_replace(sku, '-', '', 'g') AS sku_no_dashes,
    LOWER(regexp_replace(sku, '^([A-Z]+)', '\1')) AS sku_lower_prefix
FROM products;

πŸ” Part 2: Full-Text Search

β„ΉοΈπŸ” Full-Text Search

PostgreSQL provides powerful full-text search with:

  • tsvector: Document representation
  • tsquery: Search query representation
  • Ranking: Relevance scoring
  • Highlighting: Match highlighting

Basic Full-Text Search

-- Add tsvector column for full-text search
ALTER TABLE products ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
    to_tsvector('english',
        coalesce(product_name, '') || ' ' ||
        coalesce(description, '') || ' ' ||
        coalesce(tags, '')
    )
) STORED;

-- Create GIN index for fast search
CREATE INDEX idx_products_search ON products USING gin(search_vector);

-- Simple search
SELECT product_name, description
FROM products
WHERE search_vector @@ to_tsquery('english', 'apple & phone');

-- Search with ranking
SELECT
    product_name,
    ts_rank(search_vector, query) AS rank,
    ts_headline('english', description, query) AS highlighted
FROM products,
    to_tsquery('english', 'wireless & headphones') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Advanced Search Queries

-- Boolean search with AND, OR, NOT
SELECT product_name
FROM products
WHERE search_vector @@ to_tsquery('english',
    '(apple | samsung) & smartphone & !used'
);

-- Prefix matching
SELECT product_name
FROM products
WHERE search_vector @@ to_tsquery('english',
    'phon:*'  -- Matches phone, phones, phonetic, etc.
);

-- Phrase matching
SELECT product_name
FROM products
WHERE search_vector @@ plainto_tsquery('english',
    'noise canceling headphones'
);

-- Weighted ranking
SELECT
    product_name,
    ts_rank_cd(search_vector, query) AS rank_cd,
    ts_rank(search_vector, query) AS rank
FROM products,
    to_tsquery('english', 'pro & laptop') AS query
WHERE search_vector @@ query
ORDER BY rank_cd DESC;

πŸ“ Part 3: Text Processing Functions

String Manipulation

-- Extract patterns from text
SELECT
    product_name,
    -- Extract all numbers
    (regexp_matches(product_name, '\d+', 'g'))[1] AS first_number,
    -- Extract words after specific keyword
    (regexp_match(description, 'with\s+([\w\s]+?)(?:,|\.|$)'))[1] AS feature_after_with
FROM products;

-- Split text into arrays
SELECT
    product_name,
    string_to_array(tags, ',') AS tag_array,
    array_length(string_to_array(tags, ','), 1) AS tag_count
FROM products;

-- Array operations
SELECT
    product_name,
    tags,
    unnest(string_to_array(tags, ',')) AS individual_tag
FROM products;

Text Cleaning

-- Remove special characters
SELECT
    product_name,
    regexp_replace(product_name, '[^a-zA-Z0-9\s]', '', 'g') AS cleaned,
    initcap(lower(product_name)) AS title_case,
    upper(left(product_name, 3)) AS abbreviation
FROM products;

-- Normalize whitespace
SELECT
    product_name,
    regexp_replace(product_name, '\s+', ' ', 'g') AS normalized,
    trim(product_name) AS trimmed,
    ltrim(rtrim(product_name)) AS fully_trimmed
FROM products;

-- Extract and format
SELECT
    product_name,
    CASE
        WHEN product_name ~ '\d+"' THEN
            regexp_replace(product_name, '(\d+)"', '\1 inches')
        ELSE product_name
    END AS formatted_name
FROM products;

πŸ“Š Part 4: Pattern-Based Analytics

-- Analyze product categories from tags
WITH tag_stats AS (
    SELECT
        product_name,
        unnest(string_to_array(tags, ',')) AS tag
    FROM products
)
SELECT
    tag,
    COUNT(*) AS product_count,
    ARRAY_AGG(product_name) AS products
FROM tag_stats
GROUP BY tag
ORDER BY product_count DESC;

-- Extract brand patterns from SKU
SELECT
    product_name,
    sku,
    CASE
        WHEN sku ~ '^APL' THEN 'Apple'
        WHEN sku ~ '^SAM' THEN 'Samsung'
        WHEN sku ~ '^SONY|^SNY' THEN 'Sony'
        WHEN sku ~ '^DEL' THEN 'Dell'
        WHEN sku ~ '^LOG' THEN 'Logitech'
        WHEN sku ~ '^NIN' THEN 'Nintendo'
        ELSE 'Other'
    END AS brand
FROM products;

🎯 Quiz Section

πŸ† Best Practices for Interviews

πŸ’‘βœ… Text Processing Best Practices

1. Use Appropriate Indexes:

-- For full-text search
CREATE INDEX idx_search ON table USING gin(search_vector);

-- For trigram similarity
CREATE INDEX idx_trgm ON table USING gin(column gin_trgm_ops);

-- For specific patterns
CREATE INDEX idx_pattern ON table ((column ~ 'pattern'));

2. Handle NULLs:

-- Use COALESCE for safe concatenation
to_tsvector('english',
    coalesce(col1, '') || ' ' || coalesce(col2, '')
)

3. Consider Language:

-- Use correct language for stemming
to_tsvector('english', text)  -- English stemming
to_tsvector('simple', text)   -- No stemming

4. Test Regex Performance:

-- Regex can be slow on large datasets
-- Consider trigram indexes for LIKE queries
-- Use materialized columns for complex extractions

5. Validate Patterns:

-- Test regex patterns thoroughly
SELECT 'test string' ~ 'your_pattern';
-- Use online regex testers to verify patterns

⚠️⚠️ Common Pitfalls

  1. Regex performance: Complex patterns can be slow without indexes
  2. Case sensitivity: Remember ~ vs ~* difference
  3. NULL handling: Regex on NULL returns NULL
  4. Encoding issues: Ensure proper character encoding
  5. Over-indexing: GIN indexes can be large

Advertisement