Snowflake Advanced UDFs and UDTFs
Advanced UDFs and UDTFs in Snowflake enable custom data transformations, complex calculations, and reusable business logic that extends Snowflake's built-in capabilities.
What are UDFs and UDTFs?
- UDFs return a single value per call (scalar)
- UDTFs return multiple rows (table functions)
- Supported in SQL, JavaScript, Python, Java, and Scala
Architecture Overview
The UDF/UDTF architecture includes two main types:
User-Defined Functions (UDFs)
- Scalar UDFs β Single value in/out
- Stored UDFs β Persisted result caching
- Languages β SQL, JavaScript, Python, Java, Scala
- Usage β SELECT, WHERE, GROUP BY
- Determinism β Deterministic or non-deterministic
User-Defined Table Functions (UDTFs)
- Table Functions β Multi-row output
- Generator Functions β Row generation
- Usage β LATERAL FLATTEN, TABLE()
- Returns β Result sets; can process arrays and objects
Comparison
| Type | Output | Usage | Performance |
|---|---|---|---|
| UDF | Single value | Expressions | Fast |
| UDTF | Multiple rows | TABLE() | Medium |
| Stored Procedure | DML/DDL | Side effects | Varies |
| External Function | Cloud APIs | Network latency | Variable |
Scalar UDFs
SQL Scalar UDF
CREATE OR REPLACE FUNCTION calculate_discount(price DECIMAL(10,2), discount_pct DECIMAL(5,2))
RETURNS DECIMAL(10,2)
LANGUAGE SQL
IMMUTABLE
AS
$$
SELECT price * (1 - discount_pct / 100)
$$;
-- Usage
SELECT
product_name,
price,
calculate_discount(price, 15) as discounted_price
FROM products;
JavaScript Scalar UDF
CREATE OR REPLACE FUNCTION parse_email(email VARCHAR)
RETURNS OBJECT
LANGUAGE JAVASCRIPT
IMMUTABLE
AS
$$
const parts = email.split('@');
return {
username: parts[0],
domain: parts[1],
isValid: parts.length === 2 && parts[1].includes('.')
};
$$;
-- Usage
SELECT
email,
parse_email(email):username::STRING as username,
parse_email(email):domain::STRING as domain
FROM users;
Python Scalar UDF
CREATE OR REPLACE FUNCTION sentiment_score(text VARCHAR)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('textblob')
HANDLER = 'get_sentiment'
AS
$$
from textblob import TextBlob
def get_sentiment(text):
blob = TextBlob(text)
return blob.sentiment.polarity
$$;
User-Defined Table Functions (UDTFs)
SQL UDTF
CREATE OR REPLACE FUNCTION generate_date_range(start_date DATE, end_date DATE)
RETURNS TABLE (date DATE, day_name VARCHAR, week_number INTEGER)
LANGUAGE SQL
AS
$$
WITH date_series AS (
SELECT DATEADD(day, SEQ4(), :start_date) as date
FROM TABLE(GENERATOR(ROWCOUNT => DATEDIFF('day', :start_date, :end_date) + 1))
)
SELECT
date,
DAYNAME(date) as day_name,
WEEKOFYEAR(date) as week_number
FROM date_series
$$;
-- Usage
SELECT * FROM TABLE(generate_date_range('2024-01-01', '2024-12-31'));
JavaScript UDTF
CREATE OR REPLACE FUNCTION split_string_to_rows(input VARCHAR, delimiter VARCHAR)
RETURNS TABLE (value VARCHAR, position INTEGER)
LANGUAGE JAVASCRIPT
AS
$$
const values = INPUT_DELIMITER.split(DELIMITER);
for (let i = 0; i < values.length; i++) {
yield {VALUE: values[i], POSITION: i + 1};
}
$$
INPUT_DELIMITER => (delimiter VARCHAR, delimiter VARCHAR);
-- Usage
SELECT * FROM TABLE(split_string_to_rows('a,b,c,d', ','));
Python UDTF for Time Series
CREATE OR REPLACE FUNCTION moving_average(values ARRAY, window_size INTEGER)
RETURNS TABLE (index INTEGER, value FLOAT, moving_avg FLOAT)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'compute_moving_avg'
AS
$$
class compute_moving_avg:
def __init__(self):
self.values = []
self.window_size = 0
def process(self, values, window_size):
self.values = values
self.window_size = window_size
for i in range(len(values)):
start = max(0, i - window_size + 1)
window_vals = values[start:i+1]
moving_avg = sum(window_vals) / len(window_vals)
yield (i, values[i], moving_avg)
$$;
Stored UDFs for Caching
CREATE OR REPLACE FUNCTION expensive_calculation(input VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
CALLED ON NULL INPUT
STABLE
AS
$$
SELECT MD5(input || CURRENT_DATE())
$$;
-- Stored UDF with caching
CREATE OR REPLACE FUNCTION cached_lookup(key VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
STABLE
AS
$$
SELECT value FROM lookup_table WHERE lookup_key = key
$$;
Use IMMUTABLE for functions with no side effects and consistent outputs. Use STABLE for functions that depend on session state. Use VOLATILE for functions with random or time-dependent outputs.
UDF Best Practices
| Aspect | Best Practice | Impact |
|---|---|---|
| Determinism | Mark as IMMUTABLE/STABLE | Enables caching |
| NULL handling | Use COALESCE or CALLED ON NULL | Prevents errors |
| Language choice | Match to complexity | Performance |
| Batch size | Process in batches | Memory efficiency |
| Error handling | Validate inputs | Reliability |
- UDFs return single values, UDTFs return result sets
- SQL, JavaScript, Python, Java, and Scala are supported
- IMMUTABLE/STABLE/VOLATILE affect caching behavior
- UDTFs are essential for row-generating transformations
- Stored UDFs cache results for repeated calls