πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Cortex AI: LLM Integration, Cortex Functions & AI Features

🟒 Free Lesson

Advertisement

Cortex AI: LLM Integration, Cortex Functions & AI Features

Architecture Diagram 1: Cortex AI Architecture

Architecture Diagram 2: LLM Integration Patterns

Architecture Diagram 3: ML Model Deployment

Architecture Diagram 4: Cortex AI RAG Pipeline

Cortex AI RAG (Retrieval-Augmented Generation) PipelineEmbed Documentstext-embedding-3-smallStore in vector tableUser QueryEmbed questionSIMILARITY searchRetrieve ContextTop-K chunksRelevant passagesGenerate AnswerLlama-2-70B + contextGrounded responseZero data movement | Single platform | Built-in governance | Real-time embeddings

Snowflake Cortex is Snowflake's AI/ML service providing built-in LLM inference, fine-tuning, and embeddings directly within Snowflake. It eliminates the need to move data to external AI platforms by executing ML workloads on Snowflake's managed infrastructure.

Snowflake Copilot is a natural language interface that translates English questions into SQL queries. It uses Cortex LLMs to understand intent, generate SQL, and explain results, enabling non-technical users to query data conversationally.

Cortex LLM Cost Model
cost_per_query=tokens_inΓ—rate_in+tokens_outΓ—rate_outcost\_per\_query = tokens\_in \times rate\_in + tokens\_out \times rate\_out

Embedding Model Cost

  • Cost per embedding: ~$0.0001 per 1K tokens (Cortex Embed)
  • Batch size: Up to 1000 texts per request for efficiency
  • Dimension: 768-1536 depending on model choice
  • Storage: ~4 KB per 768-dim embedding (float32)
  • Indexing: Vector search with cosine similarity on Cortex Search

Start with smaller Cortex models (llama-3.1-8b) for cost efficiency; upgrade to larger models (llama-3.1-70b) for complex reasoning. Use batch inference for bulk processing. Combine Copilot with role-based access for governance.

  • Cortex: Built-in LLM inference, fine-tuning, embeddings within Snowflake
  • Copilot: Natural language -> SQL translation for non-technical users
  • Cost model: Pay per token; smaller models cheaper, larger models more capable
  • Security: Data never leaves Snowflake; model inference runs in Snowflake's VPC
  • Use cases: Text analysis, sentiment classification, entity extraction, summarization


Detailed Explanation

What is Snowflake Cortex?

Fully managed AI service bringing ML and LLM capabilities directly to your data within Snowflake.


Cortex Function Categories

LLM Functions:

  • COMPLETE β€” text generation
  • TRANSLATE β€” 100+ language translation
  • SENTIMENT β€” positive/negative/neutral scoring
  • CLASSIFY β€” zero-shot text classification
  • EXTRACT β€” named entity recognition

ML Functions:

  • EMBED β€” text β†’ vector embeddings
  • SIMILARITY β€” cosine similarity between vectors

LLM Integration Patterns

PatternComponentsUse Case
Text GenerationCOMPLETESummarization, content creation
RAGEMBED + SIMILARITY + COMPLETEKnowledge base, Q&A
ClassificationCLASSIFYContent moderation, routing
Entity ExtractionEXTRACTData enrichment

Embedding and Vector Search

  • EMBED converts text β†’ numerical vectors capturing semantic meaning
  • Store in Snowflake tables; search via SIMILARITY
  • Enables semantic search (not just keyword matching)
  • Useful for search, recommendations, duplicate detection

ML Model Deployment

  • Train externally β†’ register in Snowflake Model Registry
  • Invoke via SQL PREDICT or Snowpark ML APIs
  • Automatic compute scaling for inference workloads

Advanced AI Patterns

  1. Multi-step pipelines β€” combine multiple Cortex functions in sequence
  2. AI-powered enrichment β€” add sentiment, translations, classifications to existing data
  3. Conversational AI β€” COMPLETE functions with chat models for chatbots

Key Takeaway: Start with smaller models (Llama-2-7B) for cost efficiency; upgrade to larger (Llama-2-70B) for complex reasoning. Use RAG for grounded, factual responses.

Key Concepts Table

Cortex FunctionPurposeInputOutput
COMPLETEText generationPrompt, modelGenerated text
TRANSLATETranslationText, languagesTranslated text
SENTIMENTSentiment analysisTextSentiment score
CLASSIFYText classificationText, categoriesCategory label
EMBEDText embeddingTextVector embedding
SIMILARITYVector similarityTwo vectorsSimilarity score
Model TypeUse CasePerformanceCost
Llama-2-7BGeneral purposeFastLow
Llama-2-70BComplex reasoningMediumHigh
Mistral-7BFast inferenceVery fastLow
Embedding modelsVector searchFastLow
AI PatternComponentsComplexityUse Case
Text GenerationCOMPLETELowSummarization, content creation
RAGEMBED + SIMILARITY + COMPLETEHighKnowledge base, Q&A
ClassificationCLASSIFYLowContent moderation, routing
Entity ExtractionEXTRACTMediumData enrichment

Code Examples

LLM Text Generation

-- Example 1: Basic text generation with COMPLETE function
-- Parameters:
--   'llama2-70b-chat': Model identifier (smaller models = lower cost, faster)
--   prompt text: The input text for the model to process
--   JSON object: Optional configuration parameters
--     temperature: 0.0-1.0 (lower = more deterministic, higher = more creative)
--     max_tokens: Maximum number of tokens to generate (controls output length)
--     top_p: Nucleus sampling parameter (0.0-1.0, controls diversity)
--     stop: List of stop sequences to halt generation
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama2-70b-chat',                                      -- Model name
    'Summarize the following sales data: ' || sales_summary, -- Prompt text
    {
        'temperature': 0.7,    -- Moderate creativity
        'max_tokens': 500      -- Limit output to 500 tokens
    }
) AS summary
FROM sales_data;

-- Example 2: Temperature comparison for same prompt
-- Lower temperature = more focused, predictable output
-- Higher temperature = more creative, varied output
SELECT 
    SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', 
        'Write a tagline for our product', 
        {'temperature': 0.1, 'max_tokens': 50}
    ) AS deterministic_output,
    SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', 
        'Write a tagline for our product', 
        {'temperature': 0.9, 'max_tokens': 50}
    ) AS creative_output;

Text Translation and Language Operations

-- Example 3: Text translation
-- TRANSLATE parameters:
--   text: Source text to translate
--   source_lang: ISO 639-1 language code ('en', 'es', 'fr', 'de', 'ja', etc.)
--   target_lang: ISO 639-1 target language code
SELECT 
    product_name,
    description,
    SNOWFLAKE.CORTEX.TRANSLATE(
        description,   -- Input text
        'en',          -- Source language (English)
        'es'           -- Target language (Spanish)
    ) AS description_spanish,
    SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'fr') AS description_french,
    SNOWFLAKE.CORTEX.TRANSLATE(description, 'en', 'de') AS description_german
FROM products;

-- Example 4: Batch translation with language detection
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.DETECT_LANGUAGE(feedback_text) AS detected_language,
    SNOWFLAKE.CORTEX.TRANSLATE(feedback_text, 'auto', 'en') AS english_translation
FROM customer_feedback
WHERE SNOWFLAKE.CORTEX.DETECT_LANGUAGE(feedback_text) != 'en';

Sentiment Analysis and Text Classification

-- Example 5: Sentiment analysis
-- SENTIMENT returns a float from -1.0 (negative) to 1.0 (positive)
-- Useful for analyzing customer feedback, reviews, support tickets
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment_score,
    CASE 
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) > 0.5 THEN 'Positive'
        WHEN SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) < -0.5 THEN 'Negative'
        ELSE 'Neutral'
    END AS sentiment_label
FROM customer_feedback;

-- Example 6: Text classification
-- CLASSIFY parameters:
--   text: Input text to classify
--   categories: Array of category labels (2-20 categories typical)
-- Returns the most likely category
SELECT 
    ticket_id,
    ticket_text,
    SNOWFLAKE.CORTEX.CLASSIFY(
        ticket_text,
        ['technical_issue', 'billing', 'general_inquiry', 'complaint']
    ) AS ticket_category,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_text) AS urgency_score
FROM support_tickets;

-- Example 7: Multi-category classification with confidence
-- For more nuanced classification, use COMPLETE with structured output
SELECT 
    ticket_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-7b-chat',
        'Classify this support ticket into exactly one category: ' || ticket_text || 
        '. Categories: technical, billing, general, complaint. Respond with just the category.',
        {'temperature': 0.0}
    ) AS classified_category
FROM support_tickets;

Embeddings and Vector Search (RAG)

-- Example 8: Generate embeddings for documents
-- EMBED parameters:
--   model: Embedding model name (text-embedding-3-small or text-embedding-3-large)
--   text: Input text to embed (up to 8191 tokens)
-- Returns a FLOAT64 array (768-3072 dimensions depending on model)
-- Storage: ~4 KB per 768-dim embedding (float32), ~8 KB per 1536-dim embedding
CREATE TABLE document_embeddings AS
SELECT 
    document_id,
    title,
    content,
    SNOWFLAKE.CORTEX.EMBED(
        'text-embedding-3-small',   -- Model: 1536 dimensions, fast
        content                     -- Input text (chunk to ~512 tokens for best results)
    ) AS embedding
FROM documents;

-- Example 9: Similarity search for semantic matching
-- SIMILARITY parameters:
--   vector1: First embedding vector
--   vector2: Second embedding vector
-- Returns cosine similarity score (-1 to 1, higher = more similar)
SELECT 
    d.document_id,
    d.title,
    d.content,
    SNOWFLAKE.CORTEX.SIMILARITY(
        d.embedding,                                              -- Document embedding
        SNOWFLAKE.CORTEX.EMBED('text-embedding-3-small', 'search query text') -- Query embedding
    ) AS similarity_score
FROM document_embeddings d
ORDER BY similarity_score DESC
LIMIT 5;

-- Example 10: Full RAG (Retrieval-Augmented Generation) pipeline
-- Step 1: Embed the user query
WITH query_embedding AS (
    SELECT SNOWFLAKE.CORTEX.EMBED(
        'text-embedding-3-small',
        'What are the key findings from Q4 sales?'
    ) AS embedding
),
-- Step 2: Retrieve relevant documents using vector similarity
relevant_docs AS (
    SELECT 
        d.document_id,
        d.title,
        d.content,
        SNOWFLAKE.CORTEX.SIMILARITY(
            d.embedding,
            qe.embedding
        ) AS similarity_score
    FROM document_embeddings d, query_embedding qe
    ORDER BY similarity_score DESC
    LIMIT 3  -- Top 3 most relevant documents
)
-- Step 3: Generate response with retrieved context
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'llama2-70b-chat',
    'You are a helpful assistant. Use ONLY the following context to answer the question. ' ||
    'If the context does not contain the answer, say "I don''t have enough information." ' ||
    '\n\nContext:\n' || STRING_AGG(content, '\n---\n') || 
    '\n\nQuestion: What are the key findings from Q4 sales?',
    {'temperature': 0.3, 'max_tokens': 1000}
)
FROM relevant_docs;

Batch Processing and Data Enrichment

-- Example 11: Batch sentiment analysis and classification
-- Process all feedback records in a single query for efficiency
CREATE TABLE feedback_analysis AS
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment,
    SNOWFLAKE.CORTEX.CLASSIFY(
        feedback_text,
        ['product_quality', 'shipping', 'customer_service', 'price']
    ) AS category
FROM customer_feedback;

-- Example 12: AI-powered data enrichment with SQL function
-- Create a reusable function that combines multiple AI operations
CREATE OR REPLACE FUNCTION analyze_feedback(feedback_text VARCHAR)
RETURNS VARIANT
LANGUAGE SQL
AS
$$
SELECT OBJECT_CONSTRUCT(
    'sentiment', SNOWFLAKE.CORTEX.SENTIMENT(feedback_text),
    'category', SNOWFLAKE.CORTEX.CLASSIFY(
        feedback_text,
        ['positive', 'negative', 'neutral']
    ),
    'key_phrases', SNOWFLAKE.CORTEX.EXTRACT(
        feedback_text,
        ['product', 'service', 'price', 'delivery']
    )
)
$$;

-- Use the function in queries
SELECT 
    feedback_id,
    feedback_text,
    analyze_feedback(feedback_text):sentiment::FLOAT AS sentiment,
    analyze_feedback(feedback_text):category::VARCHAR AS category,
    analyze_feedback(feedback_text):key_phrases::VARIANT AS key_phrases
FROM customer_feedback;

-- Example 13: Entity extraction for structured data enrichment
-- EXTRACT parameters:
--   text: Input text
--   entities: Array of entity types to extract
SELECT 
    feedback_id,
    feedback_text,
    SNOWFLAKE.CORTEX.EXTRACT(
        feedback_text,
        ['person', 'organization', 'location', 'date', 'product']
    ) AS extracted_entities
FROM customer_feedback;

-- Example 14: Batch summarize long documents
-- COMPLETE with structured prompts for consistent output
SELECT 
    document_id,
    title,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-70b-chat',
        'Summarize this document in exactly 3 bullet points:\n\n' || content,
        {'temperature': 0.2, 'max_tokens': 200}
    ) AS summary
FROM long_documents
WHERE LENGTH(content) > 1000;

Python Integration

# Example 15: Cortex AI with Snowpark Python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, lit, call_builtin

session = Session.builder.configs({...}).create()

# Use Cortex functions via SQL
result = session.sql("""
    SELECT 
        feedback_id,
        SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) as sentiment,
        SNOWFLAKE.CORTEX.CLASSIFY(
            feedback_text, 
            ['positive', 'negative', 'neutral']
        ) as category
    FROM customer_feedback
""").collect()

# Batch processing with Snowpark DataFrame
df = session.table("documents")
# Add sentiment column using SQL expression
enriched_df = df.with_column(
    "sentiment",
    call_builtin("SNOWFLAKE.CORTEX.SENTIMENT", col("content"))
)
enriched_df.write.mode("overwrite").save_as_table("documents_analyzed")

Performance Metrics

MetricTargetWarningCritical
LLM Response Time< 2s2-5s> 5s
Embedding Generation< 100ms100-500ms> 500ms
Similarity Search< 50ms50-200ms> 200ms
Classification Accuracy> 85%70-85%< 70%
Batch Processing Rate> 1000 rows/s500-1000 rows/s< 500 rows/s

Best Practices

  1. Choose appropriate models: Use smaller models (Llama-2-7B, Mistral-7B) for simple tasks and larger models (Llama-2-70B) for complex reasoning.

  2. Optimize prompts: Craft clear, specific prompts to improve response quality. Include examples and constraints when needed.

  3. Implement caching: Cache embeddings and LLM responses to reduce costs and improve latency for repeated queries.

  4. Monitor model performance: Track accuracy, latency, and cost metrics. Set up alerts for performance degradation.

  5. Use RAG for accuracy: Combine vector search with LLM generation to ground responses in factual data and reduce hallucinations.

  6. Batch process when possible: Use batch inference for large datasets to optimize costs and throughput.

  7. Implement fallback strategies: Design systems to handle LLM failures gracefully, with fallback to simpler models or rule-based systems.

  8. Monitor costs: Cortex functions consume compute resources. Track usage and optimize expensive operations.

  9. Secure sensitive data: Be careful with prompts containing sensitive information. Consider data masking before LLM processing.

  10. Test thoroughly: Validate AI outputs for accuracy and bias. Implement human review for critical decisions.


Additional Theory: RAG Architecture Patterns

Retrieval-Augmented Generation (RAG) combines vector search with LLM generation to produce grounded, factual responses. The key insight is that LLMs have a knowledge cutoff and can hallucinate β€” RAG grounds responses in your actual data.

RAG pipeline stages:

  1. Ingestion: Chunk documents, generate embeddings, store in vector table
  2. Retrieval: Embed user query, find top-K similar documents via cosine similarity
  3. Generation: Pass retrieved context + query to LLM for response generation

Chunking strategies:

StrategyChunk SizeOverlapBest For
Fixed-size512 tokens50 tokensGeneral documents
SemanticVariableNoneWell-structured content
Recursive1024 tokens100 tokensLong documents
Sentence1-3 sentencesNoneQ&A systems

Embedding model selection:

  • text-embedding-3-small: 1536 dimensions, fast, cost-effective (default)
  • text-embedding-3-large: 3072 dimensions, higher accuracy, slower

Additional Theory: LLM Temperature and Parameters

ParameterRangeEffectUse Case
temperature0.0-1.0Lower = deterministic, Higher = creative0.1 for factual, 0.7 for creative
max_tokens1-4096Limits output lengthMatch to expected response size
top_p0.0-1.0Nucleus sampling threshold0.9 for balanced diversity
stopArrayStop sequences["\n\n"] to stop at paragraph

See Also

⭐

Premium Content

Cortex AI: LLM Integration, Cortex Functions & AI Features

Unlock this lesson and 900+ advanced tutorials with a Premium plan.

🎯End-to-end Projects
πŸ’ΌInterview Prep
πŸ“œCertificates
🀝Community Access

Already a member? Log in

Need Expert Snowflake Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement