Cortex AI: LLM Integration, Cortex Functions & AI Features

Architecture Diagram 1: Cortex AI Architecture

Architecture Diagram 2: LLM Integration Patterns

Architecture Diagram 3: ML Model Deployment

Architecture Diagram 4: Cortex AI RAG Pipeline

Snowflake Cortex is Snowflake's AI/ML service providing built-in LLM inference, fine-tuning, and embeddings directly within Snowflake. It eliminates the need to move data to external AI platforms by executing ML workloads on Snowflake's managed infrastructure.

Snowflake Copilot is a natural language interface that translates English questions into SQL queries. It uses Cortex LLMs to understand intent, generate SQL, and explain results, enabling non-technical users to query data conversationally.

Cortex LLM Cost Model

cost\_per\_query = tokens\_in \times rate\_in + tokens\_out \times rate\_out

Embedding Model Cost

Cost per embedding: ~$0.0001 per 1K tokens (Cortex Embed)
Batch size: Up to 1000 texts per request for efficiency
Dimension: 768-1536 depending on model choice
Storage: ~4 KB per 768-dim embedding (float32)
Indexing: Vector search with cosine similarity on Cortex Search

Start with smaller Cortex models (llama-3.1-8b) for cost efficiency; upgrade to larger models (llama-3.1-70b) for complex reasoning. Use batch inference for bulk processing. Combine Copilot with role-based access for governance.

Cortex: Built-in LLM inference, fine-tuning, embeddings within Snowflake
Copilot: Natural language -> SQL translation for non-technical users
Cost model: Pay per token; smaller models cheaper, larger models more capable
Security: Data never leaves Snowflake; model inference runs in Snowflake's VPC
Use cases: Text analysis, sentiment classification, entity extraction, summarization

Detailed Explanation

What is Snowflake Cortex?

Fully managed AI service bringing ML and LLM capabilities directly to your data within Snowflake.

Cortex Function Categories

LLM Functions:

COMPLETE — text generation
TRANSLATE — 100+ language translation
SENTIMENT — positive/negative/neutral scoring
CLASSIFY — zero-shot text classification
EXTRACT — named entity recognition

ML Functions:

EMBED — text → vector embeddings
SIMILARITY — cosine similarity between vectors

LLM Integration Patterns

Pattern	Components	Use Case
Text Generation	COMPLETE	Summarization, content creation
RAG	EMBED + SIMILARITY + COMPLETE	Knowledge base, Q&A
Classification	CLASSIFY	Content moderation, routing
Entity Extraction	EXTRACT	Data enrichment

Embedding and Vector Search

EMBED converts text → numerical vectors capturing semantic meaning
Store in Snowflake tables; search via SIMILARITY
Enables semantic search (not just keyword matching)
Useful for search, recommendations, duplicate detection

ML Model Deployment

Train externally → register in Snowflake Model Registry
Invoke via SQL PREDICT or Snowpark ML APIs
Automatic compute scaling for inference workloads

Advanced AI Patterns

Multi-step pipelines — combine multiple Cortex functions in sequence
AI-powered enrichment — add sentiment, translations, classifications to existing data
Conversational AI — COMPLETE functions with chat models for chatbots

Key Takeaway: Start with smaller models (Llama-2-7B) for cost efficiency; upgrade to larger (Llama-2-70B) for complex reasoning. Use RAG for grounded, factual responses.

Key Concepts Table

Cortex Function	Purpose	Input	Output
COMPLETE	Text generation	Prompt, model	Generated text
TRANSLATE	Translation	Text, languages	Translated text
SENTIMENT	Sentiment analysis	Text	Sentiment score
CLASSIFY	Text classification	Text, categories	Category label
EMBED	Text embedding	Text	Vector embedding
SIMILARITY	Vector similarity	Two vectors	Similarity score

Model Type	Use Case	Performance	Cost
Llama-2-7B	General purpose	Fast	Low
Llama-2-70B	Complex reasoning	Medium	High
Mistral-7B	Fast inference	Very fast	Low
Embedding models	Vector search	Fast	Low

AI Pattern	Components	Complexity	Use Case
Text Generation	COMPLETE	Low	Summarization, content creation
RAG	EMBED + SIMILARITY + COMPLETE	High	Knowledge base, Q&A
Classification	CLASSIFY	Low	Content moderation, routing
Entity Extraction	EXTRACT	Medium	Data enrichment

Code Examples

LLM Text Generation

-- Example 1: Basic text generation with COMPLETE function
-- Parameters:
--   'llama2-70b-chat': Model identifier (smaller models = lower cost, faster)
--   prompt text: The input text for the model to process
--   JSON object: Optional configuration parameters
--     temperature: 0.0-1.0 (lower = more deterministic, higher = more creative)
--     max_tokens: Maximum number of tokens to generate (controls output length)
--     top_p: Nucleus sampling parameter (0.0-1.0, controls diversity)
--     stop: List of stop sequences to halt generation
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama2-70b-chat',                                      -- Model name
    'Summarize the following sales data: ' || sales_summary, -- Prompt text
    {
        'temperature': 0.7,    -- Moderate creativity
        'max_tokens': 500      -- Limit output to 500 tokens
    }
) AS summary
FROM sales_data;

-- Example 2: Temperature comparison for same prompt
-- Lower temperature = more focused, predictable output
-- Higher temperature = more creative, varied output
SELECT 
    SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', 
        'Write a tagline for our product', 
        {'temperature': 0.1, 'max_tokens': 50}
    ) AS deterministic_output,
    SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', 
        'Write a tagline for our product', 
        {'temperature': 0.9, 'max_tokens': 50}
    ) AS creative_output;

Text Translation and Language Operations

-- Example 3: Text translation
-- TRANSLATE parameters:
--   text: Source text to translate
--   source_lang: ISO 639-1 language code ('en', 'es', 'fr', 'de', 'ja', etc.)
--   target_lang: ISO 639-1 target language code
SELECT 
    product_name,
    description,
    SNOWFLAKE.CORTEX.TRANSLATE(
        description,   -- Input text
        'en',          -- Source language (English)
        'es'           -- Target language (Spanish)
    ) AS description_spanish,
    SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'fr') AS description_french,
    SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'de') AS description_german
FROM products;

-- Example 4: Batch translation with language detection
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.DETECT_LANGUAGE(feedback_text) AS detected_language,
    SNOWFLAKE.CORTEX.TRANSLATE(feedback_text, 'auto', 'en') AS english_translation
FROM customer_feedback
WHERE SNOWFLAKE.CORTEX.DETECT_LANGUAGE(feedback_text) != 'en';

Sentiment Analysis and Text Classification

-- Example 5: Sentiment analysis
-- SENTIMENT returns a float from -1.0 (negative) to 1.0 (positive)
-- Useful for analyzing customer feedback, reviews, support tickets
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment_score,
    CASE 
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) > 0.5 THEN 'Positive'
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) < -0.5 THEN 'Negative'
        ELSE 'Neutral'
    END AS sentiment_label
FROM customer_feedback;

-- Example 6: Text classification
-- CLASSIFY parameters:
--   text: Input text to classify
--   categories: Array of category labels (2-20 categories typical)
-- Returns the most likely category
SELECT 
    ticket_id,
    ticket_text,
    SNOWFLAKE.CORTEX.CLASSIFY(
        ticket_text,
        ['technical_issue', 'billing', 'general_inquiry', 'complaint']
    ) AS ticket_category,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_text) AS urgency_score
FROM support_tickets;

-- Example 7: Multi-category classification with confidence
-- For more nuanced classification, use COMPLETE with structured output
SELECT 
    ticket_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-7b-chat',
        'Classify this support ticket into exactly one category: ' || ticket_text || 
        '. Categories: technical, billing, general, complaint. Respond with just the category.',
        {'temperature': 0.0}
    ) AS classified_category
FROM support_tickets;

Embeddings and Vector Search (RAG)

-- Example 8: Generate embeddings for documents
-- EMBED parameters:
--   model: Embedding model name (text-embedding-3-small or text-embedding-3-large)
--   text: Input text to embed (up to 8191 tokens)
-- Returns a FLOAT64 array (768-3072 dimensions depending on model)
-- Storage: ~4 KB per 768-dim embedding (float32), ~8 KB per 1536-dim embedding
CREATE TABLE document_embeddings AS
SELECT 
    document_id,
    title,
    content,
    SNOWFLAKE.CORTEX.EMBED(
        'text-embedding-3-small',   -- Model: 1536 dimensions, fast
        content                     -- Input text (chunk to ~512 tokens for best results)
    ) AS embedding
FROM documents;

-- Example 9: Similarity search for semantic matching
-- SIMILARITY parameters:
--   vector1: First embedding vector
--   vector2: Second embedding vector
-- Returns cosine similarity score (-1 to 1, higher = more similar)
SELECT 
    d.document_id,
    d.title,
    d.content,
    SNOWFLAKE.CORTEX.SIMILARITY(
        d.embedding,                                              -- Document embedding
        SNOWFLAKE.CORTEX.EMBED('text-embedding-3-small', 'search query text') -- Query embedding
    ) AS similarity_score
FROM document_embeddings d
ORDER BY similarity_score DESC
LIMIT 5;

-- Example 10: Full RAG (Retrieval-Augmented Generation) pipeline
-- Step 1: Embed the user query
WITH query_embedding AS (
    SELECT SNOWFLAKE.CORTEX.EMBED(
        'text-embedding-3-small',
        'What are the key findings from Q4 sales?'
    ) AS embedding
),
-- Step 2: Retrieve relevant documents using vector similarity
relevant_docs AS (
    SELECT 
        d.document_id,
        d.title,
        d.content,
        SNOWFLAKE.CORTEX.SIMILARITY(
            d.embedding,
            qe.embedding
        ) AS similarity_score
    FROM document_embeddings d, query_embedding qe
    ORDER BY similarity_score DESC
    LIMIT 3  -- Top 3 most relevant documents
)
-- Step 3: Generate response with retrieved context
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama2-70b-chat',
    'You are a helpful assistant. Use ONLY the following context to answer the question. ' ||
    'If the context does not contain the answer, say "I don''t have enough information." ' ||
    '\n\nContext:\n' || STRING_AGG(content, '\n---\n') || 
    '\n\nQuestion: What are the key findings from Q4 sales?',
    {'temperature': 0.3, 'max_tokens': 1000}
)
FROM relevant_docs;

Batch Processing and Data Enrichment

-- Example 11: Batch sentiment analysis and classification
-- Process all feedback records in a single query for efficiency
CREATE TABLE feedback_analysis AS
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment,
    SNOWFLAKE.CORTEX.CLASSIFY(
        feedback_text,
        ['product_quality', 'shipping', 'customer_service', 'price']
    ) AS category
FROM customer_feedback;

-- Example 12: AI-powered data enrichment with SQL function
-- Create a reusable function that combines multiple AI operations
CREATE OR REPLACE FUNCTION analyze_feedback(feedback_text VARCHAR)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
SELECT OBJECT_CONSTRUCT(
    'sentiment', SNOWFLAKE.CORTEX.SENTIMENT(feedback_text),
    'category', SNOWFLAKE.CORTEX.CLASSIFY(
        feedback_text,
        ['positive', 'negative', 'neutral']
    ),
    'key_phrases', SNOWFLAKE.CORTEX.EXTRACT(
        feedback_text,
        ['product', 'service', 'price', 'delivery']
    )
)
$$;

-- Use the function in queries
SELECT 
    feedback_id,
    feedback_text,
    analyze_feedback(feedback_text):sentiment::FLOAT AS sentiment,
    analyze_feedback(feedback_text):category::VARCHAR AS category,
    analyze_feedback(feedback_text):key_phrases::VARIANT AS key_phrases
FROM customer_feedback;

-- Example 13: Entity extraction for structured data enrichment
-- EXTRACT parameters:
--   text: Input text
--   entities: Array of entity types to extract
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.EXTRACT(
        feedback_text,
        ['person', 'organization', 'location', 'date', 'product']
    ) AS extracted_entities
FROM customer_feedback;

-- Example 14: Batch summarize long documents
-- COMPLETE with structured prompts for consistent output
SELECT 
    document_id,
    title,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-70b-chat',
        'Summarize this document in exactly 3 bullet points:\n\n' || content,
        {'temperature': 0.2, 'max_tokens': 200}
    ) AS summary
FROM long_documents
WHERE LENGTH(content) > 1000;

Python Integration

# Example 15: Cortex AI with Snowpark Python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, lit, call_builtin

session = Session.builder.configs({...}).create()

# Use Cortex functions via SQL
result = session.sql("""
    SELECT 
        feedback_id,
        SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) as sentiment,
        SNOWFLAKE.CORTEX.CLASSIFY(
            feedback_text, 
            ['positive', 'negative', 'neutral']
        ) as category
    FROM customer_feedback
""").collect()

# Batch processing with Snowpark DataFrame
df = session.table("documents")
# Add sentiment column using SQL expression
enriched_df = df.with_column(
    "sentiment",
    call_builtin("SNOWFLAKE.CORTEX.SENTIMENT", col("content"))
)
enriched_df.write.mode("overwrite").save_as_table("documents_analyzed")

Performance Metrics

Metric	Target	Warning	Critical
LLM Response Time	< 2s	2-5s	> 5s
Embedding Generation	< 100ms	100-500ms	> 500ms
Similarity Search	< 50ms	50-200ms	> 200ms
Classification Accuracy	> 85%	70-85%	< 70%
Batch Processing Rate	> 1000 rows/s	500-1000 rows/s	< 500 rows/s

Best Practices

Choose appropriate models: Use smaller models (Llama-2-7B, Mistral-7B) for simple tasks and larger models (Llama-2-70B) for complex reasoning.
Optimize prompts: Craft clear, specific prompts to improve response quality. Include examples and constraints when needed.
Implement caching: Cache embeddings and LLM responses to reduce costs and improve latency for repeated queries.
Monitor model performance: Track accuracy, latency, and cost metrics. Set up alerts for performance degradation.
Use RAG for accuracy: Combine vector search with LLM generation to ground responses in factual data and reduce hallucinations.
Batch process when possible: Use batch inference for large datasets to optimize costs and throughput.
Implement fallback strategies: Design systems to handle LLM failures gracefully, with fallback to simpler models or rule-based systems.
Monitor costs: Cortex functions consume compute resources. Track usage and optimize expensive operations.
Secure sensitive data: Be careful with prompts containing sensitive information. Consider data masking before LLM processing.
Test thoroughly: Validate AI outputs for accuracy and bias. Implement human review for critical decisions.

Additional Theory: RAG Architecture Patterns

Retrieval-Augmented Generation (RAG) combines vector search with LLM generation to produce grounded, factual responses. The key insight is that LLMs have a knowledge cutoff and can hallucinate — RAG grounds responses in your actual data.

RAG pipeline stages:

Ingestion: Chunk documents, generate embeddings, store in vector table
Retrieval: Embed user query, find top-K similar documents via cosine similarity
Generation: Pass retrieved context + query to LLM for response generation

Chunking strategies:

Strategy	Chunk Size	Overlap	Best For
Fixed-size	512 tokens	50 tokens	General documents
Semantic	Variable	None	Well-structured content
Recursive	1024 tokens	100 tokens	Long documents
Sentence	1-3 sentences	None	Q&A systems

Embedding model selection:

text-embedding-3-small: 1536 dimensions, fast, cost-effective (default)
text-embedding-3-large: 3072 dimensions, higher accuracy, slower

Additional Theory: LLM Temperature and Parameters

Parameter	Range	Effect	Use Case
temperature	0.0-1.0	Lower = deterministic, Higher = creative	0.1 for factual, 0.7 for creative
max_tokens	1-4096	Limits output length	Match to expected response size
top_p	0.0-1.0	Nucleus sampling threshold	0.9 for balanced diversity
stop	Array	Stop sequences	["\n\n"] to stop at paragraph

Cortex AI: LLM Integration, Cortex Functions & AI Features

Cortex AI: LLM Integration, Cortex Functions & AI Features

Architecture Diagram 1: Cortex AI Architecture

Architecture Diagram 2: LLM Integration Patterns

Architecture Diagram 3: ML Model Deployment

Architecture Diagram 4: Cortex AI RAG Pipeline

Embedding Model Cost

Detailed Explanation

What is Snowflake Cortex?

Cortex Function Categories

LLM Integration Patterns

Embedding and Vector Search

ML Model Deployment

Advanced AI Patterns

Key Concepts Table

Code Examples

LLM Text Generation

Text Translation and Language Operations

Sentiment Analysis and Text Classification

Embeddings and Vector Search (RAG)

Batch Processing and Data Enrichment

Python Integration

Performance Metrics

Best Practices

Additional Theory: RAG Architecture Patterns

Additional Theory: LLM Temperature and Parameters

See Also

Premium Content

Need Expert Snowflake Help?