Cortex AI: LLM Integration, Cortex Functions & AI Features
Architecture Diagram 1: Cortex AI Architecture
Architecture Diagram 2: LLM Integration Patterns
Architecture Diagram 3: ML Model Deployment
Architecture Diagram 4: Cortex AI RAG Pipeline
Snowflake Cortex is Snowflake's AI/ML service providing built-in LLM inference, fine-tuning, and embeddings directly within Snowflake. It eliminates the need to move data to external AI platforms by executing ML workloads on Snowflake's managed infrastructure.
Snowflake Copilot is a natural language interface that translates English questions into SQL queries. It uses Cortex LLMs to understand intent, generate SQL, and explain results, enabling non-technical users to query data conversationally.
Embedding Model Cost
- Cost per embedding: ~$0.0001 per 1K tokens (Cortex Embed)
- Batch size: Up to 1000 texts per request for efficiency
- Dimension: 768-1536 depending on model choice
- Storage: ~4 KB per 768-dim embedding (float32)
- Indexing: Vector search with cosine similarity on Cortex Search
Start with smaller Cortex models (llama-3.1-8b) for cost efficiency; upgrade to larger models (llama-3.1-70b) for complex reasoning. Use batch inference for bulk processing. Combine Copilot with role-based access for governance.
- Cortex: Built-in LLM inference, fine-tuning, embeddings within Snowflake
- Copilot: Natural language -> SQL translation for non-technical users
- Cost model: Pay per token; smaller models cheaper, larger models more capable
- Security: Data never leaves Snowflake; model inference runs in Snowflake's VPC
- Use cases: Text analysis, sentiment classification, entity extraction, summarization
Detailed Explanation
What is Snowflake Cortex?
Fully managed AI service bringing ML and LLM capabilities directly to your data within Snowflake.
Cortex Function Categories
LLM Functions:
COMPLETEβ text generationTRANSLATEβ 100+ language translationSENTIMENTβ positive/negative/neutral scoringCLASSIFYβ zero-shot text classificationEXTRACTβ named entity recognition
ML Functions:
EMBEDβ text β vector embeddingsSIMILARITYβ cosine similarity between vectors
LLM Integration Patterns
| Pattern | Components | Use Case |
|---|---|---|
| Text Generation | COMPLETE | Summarization, content creation |
| RAG | EMBED + SIMILARITY + COMPLETE | Knowledge base, Q&A |
| Classification | CLASSIFY | Content moderation, routing |
| Entity Extraction | EXTRACT | Data enrichment |
Embedding and Vector Search
EMBEDconverts text β numerical vectors capturing semantic meaning- Store in Snowflake tables; search via
SIMILARITY - Enables semantic search (not just keyword matching)
- Useful for search, recommendations, duplicate detection
ML Model Deployment
- Train externally β register in Snowflake Model Registry
- Invoke via SQL
PREDICTor Snowpark ML APIs - Automatic compute scaling for inference workloads
Advanced AI Patterns
- Multi-step pipelines β combine multiple Cortex functions in sequence
- AI-powered enrichment β add sentiment, translations, classifications to existing data
- Conversational AI β COMPLETE functions with chat models for chatbots
Key Takeaway: Start with smaller models (Llama-2-7B) for cost efficiency; upgrade to larger (Llama-2-70B) for complex reasoning. Use RAG for grounded, factual responses.
Key Concepts Table
| Cortex Function | Purpose | Input | Output |
|---|---|---|---|
| COMPLETE | Text generation | Prompt, model | Generated text |
| TRANSLATE | Translation | Text, languages | Translated text |
| SENTIMENT | Sentiment analysis | Text | Sentiment score |
| CLASSIFY | Text classification | Text, categories | Category label |
| EMBED | Text embedding | Text | Vector embedding |
| SIMILARITY | Vector similarity | Two vectors | Similarity score |
| Model Type | Use Case | Performance | Cost |
|---|---|---|---|
| Llama-2-7B | General purpose | Fast | Low |
| Llama-2-70B | Complex reasoning | Medium | High |
| Mistral-7B | Fast inference | Very fast | Low |
| Embedding models | Vector search | Fast | Low |
| AI Pattern | Components | Complexity | Use Case |
|---|---|---|---|
| Text Generation | COMPLETE | Low | Summarization, content creation |
| RAG | EMBED + SIMILARITY + COMPLETE | High | Knowledge base, Q&A |
| Classification | CLASSIFY | Low | Content moderation, routing |
| Entity Extraction | EXTRACT | Medium | Data enrichment |
Code Examples
LLM Text Generation
-- Example 1: Basic text generation with COMPLETE function
-- Parameters:
-- 'llama2-70b-chat': Model identifier (smaller models = lower cost, faster)
-- prompt text: The input text for the model to process
-- JSON object: Optional configuration parameters
-- temperature: 0.0-1.0 (lower = more deterministic, higher = more creative)
-- max_tokens: Maximum number of tokens to generate (controls output length)
-- top_p: Nucleus sampling parameter (0.0-1.0, controls diversity)
-- stop: List of stop sequences to halt generation
SELECT SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat', -- Model name
'Summarize the following sales data: ' || sales_summary, -- Prompt text
{
'temperature': 0.7, -- Moderate creativity
'max_tokens': 500 -- Limit output to 500 tokens
}
) AS summary
FROM sales_data;
-- Example 2: Temperature comparison for same prompt
-- Lower temperature = more focused, predictable output
-- Higher temperature = more creative, varied output
SELECT
SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat',
'Write a tagline for our product',
{'temperature': 0.1, 'max_tokens': 50}
) AS deterministic_output,
SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat',
'Write a tagline for our product',
{'temperature': 0.9, 'max_tokens': 50}
) AS creative_output;
Text Translation and Language Operations
-- Example 3: Text translation
-- TRANSLATE parameters:
-- text: Source text to translate
-- source_lang: ISO 639-1 language code ('en', 'es', 'fr', 'de', 'ja', etc.)
-- target_lang: ISO 639-1 target language code
SELECT
product_name,
description,
SNOWFLAKE.CORTEX.TRANSLATE(
description, -- Input text
'en', -- Source language (English)
'es' -- Target language (Spanish)
) AS description_spanish,
SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'fr') AS description_french,
SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'de') AS description_german
FROM products;
-- Example 4: Batch translation with language detection
SELECT
feedback_id,
feedback_text,
SNOWFLAKE.CORTEX.DETECT_LANGUAGE(feedback_text) AS detected_language,
SNOWFLAKE.CORTEX.TRANSLATE(feedback_text, 'auto', 'en') AS english_translation
FROM customer_feedback
WHERE SNOWFLAKE.CORTEX.DETECT_LANGUAGE(feedback_text) != 'en';
Sentiment Analysis and Text Classification
-- Example 5: Sentiment analysis
-- SENTIMENT returns a float from -1.0 (negative) to 1.0 (positive)
-- Useful for analyzing customer feedback, reviews, support tickets
SELECT
feedback_id,
feedback_text,
SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment_score,
CASE
WHEN SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) > 0.5 THEN 'Positive'
WHEN SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) < -0.5 THEN 'Negative'
ELSE 'Neutral'
END AS sentiment_label
FROM customer_feedback;
-- Example 6: Text classification
-- CLASSIFY parameters:
-- text: Input text to classify
-- categories: Array of category labels (2-20 categories typical)
-- Returns the most likely category
SELECT
ticket_id,
ticket_text,
SNOWFLAKE.CORTEX.CLASSIFY(
ticket_text,
['technical_issue', 'billing', 'general_inquiry', 'complaint']
) AS ticket_category,
SNOWFLAKE.CORTEX.SENTIMENT(ticket_text) AS urgency_score
FROM support_tickets;
-- Example 7: Multi-category classification with confidence
-- For more nuanced classification, use COMPLETE with structured output
SELECT
ticket_id,
SNOWFLAKE.CORTEX.COMPLETE(
'llama2-7b-chat',
'Classify this support ticket into exactly one category: ' || ticket_text ||
'. Categories: technical, billing, general, complaint. Respond with just the category.',
{'temperature': 0.0}
) AS classified_category
FROM support_tickets;
Embeddings and Vector Search (RAG)
-- Example 8: Generate embeddings for documents
-- EMBED parameters:
-- model: Embedding model name (text-embedding-3-small or text-embedding-3-large)
-- text: Input text to embed (up to 8191 tokens)
-- Returns a FLOAT64 array (768-3072 dimensions depending on model)
-- Storage: ~4 KB per 768-dim embedding (float32), ~8 KB per 1536-dim embedding
CREATE TABLE document_embeddings AS
SELECT
document_id,
title,
content,
SNOWFLAKE.CORTEX.EMBED(
'text-embedding-3-small', -- Model: 1536 dimensions, fast
content -- Input text (chunk to ~512 tokens for best results)
) AS embedding
FROM documents;
-- Example 9: Similarity search for semantic matching
-- SIMILARITY parameters:
-- vector1: First embedding vector
-- vector2: Second embedding vector
-- Returns cosine similarity score (-1 to 1, higher = more similar)
SELECT
d.document_id,
d.title,
d.content,
SNOWFLAKE.CORTEX.SIMILARITY(
d.embedding, -- Document embedding
SNOWFLAKE.CORTEX.EMBED('text-embedding-3-small', 'search query text') -- Query embedding
) AS similarity_score
FROM document_embeddings d
ORDER BY similarity_score DESC
LIMIT 5;
-- Example 10: Full RAG (Retrieval-Augmented Generation) pipeline
-- Step 1: Embed the user query
WITH query_embedding AS (
SELECT SNOWFLAKE.CORTEX.EMBED(
'text-embedding-3-small',
'What are the key findings from Q4 sales?'
) AS embedding
),
-- Step 2: Retrieve relevant documents using vector similarity
relevant_docs AS (
SELECT
d.document_id,
d.title,
d.content,
SNOWFLAKE.CORTEX.SIMILARITY(
d.embedding,
qe.embedding
) AS similarity_score
FROM document_embeddings d, query_embedding qe
ORDER BY similarity_score DESC
LIMIT 3 -- Top 3 most relevant documents
)
-- Step 3: Generate response with retrieved context
SELECT SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
'You are a helpful assistant. Use ONLY the following context to answer the question. ' ||
'If the context does not contain the answer, say "I don''t have enough information." ' ||
'\n\nContext:\n' || STRING_AGG(content, '\n---\n') ||
'\n\nQuestion: What are the key findings from Q4 sales?',
{'temperature': 0.3, 'max_tokens': 1000}
)
FROM relevant_docs;
Batch Processing and Data Enrichment
-- Example 11: Batch sentiment analysis and classification
-- Process all feedback records in a single query for efficiency
CREATE TABLE feedback_analysis AS
SELECT
feedback_id,
feedback_text,
SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment,
SNOWFLAKE.CORTEX.CLASSIFY(
feedback_text,
['product_quality', 'shipping', 'customer_service', 'price']
) AS category
FROM customer_feedback;
-- Example 12: AI-powered data enrichment with SQL function
-- Create a reusable function that combines multiple AI operations
CREATE OR REPLACE FUNCTION analyze_feedback(feedback_text VARCHAR)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
SELECT OBJECT_CONSTRUCT(
'sentiment', SNOWFLAKE.CORTEX.SENTIMENT(feedback_text),
'category', SNOWFLAKE.CORTEX.CLASSIFY(
feedback_text,
['positive', 'negative', 'neutral']
),
'key_phrases', SNOWFLAKE.CORTEX.EXTRACT(
feedback_text,
['product', 'service', 'price', 'delivery']
)
)
$$;
-- Use the function in queries
SELECT
feedback_id,
feedback_text,
analyze_feedback(feedback_text):sentiment::FLOAT AS sentiment,
analyze_feedback(feedback_text):category::VARCHAR AS category,
analyze_feedback(feedback_text):key_phrases::VARIANT AS key_phrases
FROM customer_feedback;
-- Example 13: Entity extraction for structured data enrichment
-- EXTRACT parameters:
-- text: Input text
-- entities: Array of entity types to extract
SELECT
feedback_id,
feedback_text,
SNOWFLAKE.CORTEX.EXTRACT(
feedback_text,
['person', 'organization', 'location', 'date', 'product']
) AS extracted_entities
FROM customer_feedback;
-- Example 14: Batch summarize long documents
-- COMPLETE with structured prompts for consistent output
SELECT
document_id,
title,
SNOWFLAKE.CORTEX.COMPLETE(
'llama2-70b-chat',
'Summarize this document in exactly 3 bullet points:\n\n' || content,
{'temperature': 0.2, 'max_tokens': 200}
) AS summary
FROM long_documents
WHERE LENGTH(content) > 1000;
Python Integration
# Example 15: Cortex AI with Snowpark Python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, lit, call_builtin
session = Session.builder.configs({...}).create()
# Use Cortex functions via SQL
result = session.sql("""
SELECT
feedback_id,
SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) as sentiment,
SNOWFLAKE.CORTEX.CLASSIFY(
feedback_text,
['positive', 'negative', 'neutral']
) as category
FROM customer_feedback
""").collect()
# Batch processing with Snowpark DataFrame
df = session.table("documents")
# Add sentiment column using SQL expression
enriched_df = df.with_column(
"sentiment",
call_builtin("SNOWFLAKE.CORTEX.SENTIMENT", col("content"))
)
enriched_df.write.mode("overwrite").save_as_table("documents_analyzed")
Performance Metrics
| Metric | Target | Warning | Critical |
|---|---|---|---|
| LLM Response Time | < 2s | 2-5s | > 5s |
| Embedding Generation | < 100ms | 100-500ms | > 500ms |
| Similarity Search | < 50ms | 50-200ms | > 200ms |
| Classification Accuracy | > 85% | 70-85% | < 70% |
| Batch Processing Rate | > 1000 rows/s | 500-1000 rows/s | < 500 rows/s |
Best Practices
-
Choose appropriate models: Use smaller models (Llama-2-7B, Mistral-7B) for simple tasks and larger models (Llama-2-70B) for complex reasoning.
-
Optimize prompts: Craft clear, specific prompts to improve response quality. Include examples and constraints when needed.
-
Implement caching: Cache embeddings and LLM responses to reduce costs and improve latency for repeated queries.
-
Monitor model performance: Track accuracy, latency, and cost metrics. Set up alerts for performance degradation.
-
Use RAG for accuracy: Combine vector search with LLM generation to ground responses in factual data and reduce hallucinations.
-
Batch process when possible: Use batch inference for large datasets to optimize costs and throughput.
-
Implement fallback strategies: Design systems to handle LLM failures gracefully, with fallback to simpler models or rule-based systems.
-
Monitor costs: Cortex functions consume compute resources. Track usage and optimize expensive operations.
-
Secure sensitive data: Be careful with prompts containing sensitive information. Consider data masking before LLM processing.
-
Test thoroughly: Validate AI outputs for accuracy and bias. Implement human review for critical decisions.
Additional Theory: RAG Architecture Patterns
Retrieval-Augmented Generation (RAG) combines vector search with LLM generation to produce grounded, factual responses. The key insight is that LLMs have a knowledge cutoff and can hallucinate β RAG grounds responses in your actual data.
RAG pipeline stages:
- Ingestion: Chunk documents, generate embeddings, store in vector table
- Retrieval: Embed user query, find top-K similar documents via cosine similarity
- Generation: Pass retrieved context + query to LLM for response generation
Chunking strategies:
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed-size | 512 tokens | 50 tokens | General documents |
| Semantic | Variable | None | Well-structured content |
| Recursive | 1024 tokens | 100 tokens | Long documents |
| Sentence | 1-3 sentences | None | Q&A systems |
Embedding model selection:
text-embedding-3-small: 1536 dimensions, fast, cost-effective (default)text-embedding-3-large: 3072 dimensions, higher accuracy, slower
Additional Theory: LLM Temperature and Parameters
| Parameter | Range | Effect | Use Case |
|---|---|---|---|
| temperature | 0.0-1.0 | Lower = deterministic, Higher = creative | 0.1 for factual, 0.7 for creative |
| max_tokens | 1-4096 | Limits output length | Match to expected response size |
| top_p | 0.0-1.0 | Nucleus sampling threshold | 0.9 for balanced diversity |
| stop | Array | Stop sequences | ["\n\n"] to stop at paragraph |
See Also
- 11-Snowpark-Python - Snowpark ML model serving
- 06-External-Tables-Integration - Loading documents for embedding
- 14-Monitoring-Queries - Monitoring AI function costs and performance
- PySpark Iceberg - ML integration patterns
- Delta Lake on Databricks - Delta Lake ML integration
- Data Warehouse Concepts - Data warehouse design principles