πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Snowflake External Functions

🟒 Free Lesson

Advertisement

Snowflake External Functions

External functions in Snowflake allow you to call custom code running as serverless functions in cloud providers, extending Snowflake's capabilities with custom APIs and third-party services.

External Function ArchitectureSnowflakeSQL QueryAPI GatewayREST EndpointLambdaServerless CodeExternal APIThird-partyResultsBatch RowsMulti-Cloud SupportAWS Lambda, Azure Functions, GCPBatch ProcessingRow batching for efficiencyError HandlingNULL on failure

What are External Functions?

  • Execute code in a remote cloud service (AWS Lambda, Azure Functions, GCP Cloud Functions)
  • Return results directly to Snowflake queries
  • Process data in batches for efficiency

Architecture Overview

External functions in Snowflake follow this architecture:

  1. Snowflake Side β€” SQL Query Processing β†’ API Integration Layer β†’ CREATE EXTERNAL FUNCTION
  2. HTTP/S Connection β€” Snowflake sends batched row data to the cloud provider
  3. Cloud Provider Side β€” API Gateway / Load Balancer β†’ Lambda functions (1..N) β†’ Custom Business Logic (APIs, ML Models, Transforms)

Key Concepts

DfExternal Function

DfBatching

API Integration Setup

AWS API Gateway + Lambda

-- Create API integration
CREATE OR REPLACE API INTEGRATION aws_api_integration
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-api-role'
  API_ALLOWED_PREFIXES = ('https://abc123.execute-api.us-east-1.amazonaws.com/prod')
  ENABLED = TRUE;

-- Check integration status
DESC INTEGRATION aws_api_integration;

Azure API Management + Functions

CREATE OR REPLACE API INTEGRATION azure_api_integration
  API_PROVIDER = azure_api_management
  AZURE_TENANT_ID = 'your-tenant-id'
  AZURE_AD_APPLICATION_ID = 'your-app-id'
  API_ALLOWED_PREFIXES = ('https://myapim.azure-api.net/')
  ENABLED = TRUE;

GCP API Gateway + Cloud Functions

CREATE OR REPLACE API INTEGRATION gcp_api_integration
  API_PROVIDER = google_cloud
  GCP_SERVICE_ACCOUNT = 'snowflake@my-project.iam.gserviceaccount.com'
  API_ALLOWED_PREFIXES = ('https://my-gateway-abc123.uc.r.appspot.com/')
  ENABLED = TRUE;

Creating External Functions

Simple External Function

CREATE OR REPLACE EXTERNAL FUNCTION sentiment_analysis(text VARCHAR)
  RETURNS VARCHAR
  API_INTEGRATION = aws_api_integration
  MAX_BATCH_ROWS = 100
  AS 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/sentiment';

Complex External Function with Multiple Returns

CREATE OR REPLACE EXTERNAL FUNCTION translate_text(
  input_text VARCHAR,
  source_lang VARCHAR,
  target_lang VARCHAR
)
  RETURNS TABLE (translated_text VARCHAR, confidence DECIMAL(5,4))
  API_INTEGRATION = azure_api_integration
  MAX_BATCH_ROWS = 50
  AS 'https://myapim.azure-api.net/translate';

Using External Functions

-- Simple call
SELECT
  customer_feedback,
  sentiment_analysis(customer_feedback) as sentiment
FROM feedback_table;

-- Table function call
SELECT
  f.translated_text,
  f.confidence
FROM my_table,
LATERAL TABLE(translate_text(text, 'en', 'es')) f;

Lambda Function Example (Python)

import json
import boto3
from datetime import datetime

def lambda_handler(event, context):
    """
    Snowflake External Function Handler
    Receives batch of rows and returns results
    """
    # Parse Snowflake request
    rows = event['data']
    results = []
    
    for row in rows:
        row_number = row[0]
        input_data = row[1:]
        
        # Custom processing logic
        result = process_data(input_data)
        
        results.append({
            'row_number': row_number,
            'data': [result]
        })
    
    return {'data': results}

def process_data(data):
    # Custom business logic here
    # Example: sentiment analysis, translation, etc.
    return {"processed": True, "timestamp": datetime.utcnow().isoformat()}

Performance Considerations

External function performance depends on API response time, network latency, and batch size. Optimal batch sizes typically range from 100-500 rows. Monitor function duration to avoid timeouts (default 10 minutes).

Batching Strategy

Batch SizeLatencyThroughputUse Case
10-50LowModerateReal-time APIs
100-250MediumHighBatch processing
500-1000HighMaximumBulk transformations

Error Handling

-- Create function with error handling
CREATE OR REPLACE EXTERNAL FUNCTION safe_api_call(input VARCHAR)
  RETURNS VARCHAR
  API_INTEGRATION = aws_api_integration
  MAX_BATCH_ROWS = 100
  AS 'https://api.example.com/process';

-- Handle errors in queries
SELECT
  input_data,
  safe_api_call(input_data) as result
FROM my_table
WHERE safe_api_call(input_data) IS NOT NULL;

Monitoring and Logging

-- Check function invocations
SELECT *
FROM TABLE(INFORMATION_SCHEMA.EXTERNAL_FUNCTION_INVOCATION_HISTORY(
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP()),
  END_TIME => CURRENT_TIMESTAMP()
))
ORDER BY START_TIME DESC;
  • External functions extend Snowflake with custom cloud-native code
  • API Integrations configure secure connections to cloud services
  • Batch processing optimizes performance and reduces API calls
  • Support for AWS, Azure, and GCP serverless functions
  • Monitor invocation history for performance optimization
⭐

Premium Content

Snowflake External Functions

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Snowflake Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement