Cloud ML: AWS SageMaker and GCP Vertex AI

Cloud ML platforms provide managed infrastructure for training, tuning, deploying, and monitoring models at scale.

Training and inference are both billed per instance-hour. Spot instances save up to 70% on training costs.

Cloud ML Landscape

1. AWS SageMaker Workflow

SageMaker SDK Example

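import sagemaker
from sagemaker.sklearn.estimator import SKLearn

# Managed scikit-learn training job; train.py runs inside the framework container
estimator = SKLearn(
    entry_point="train.py",
    # Resolves the attached role when run inside SageMaker; pass an IAM role ARN elsewhere
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.0-1",
    # Passed to train.py as command-line arguments
    hyperparameters={"n_estimators": 100, "max_depth": 8},
    # Model artifacts are uploaded here when the job finishes
    output_path="s3://bucket/output"
)

# Each key becomes a named input channel for the training job
estimator.fit({"train": "s3://bucket/train/", "test": "s3://bucket/test/"})

# Deploy to a real-time endpoint (billed while it runs; delete it when finished)
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium"
)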
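The estimator above points at train.py, but the script itself is not shown. In script mode, SageMaker passes the hyperparameters as command-line flags and exposes the input channels and the model directory through environment variables such as SM_CHANNEL_TRAIN and SM_MODEL_DIR. The following is a minimal sketch of that argument handling only, assuming the hyperparameter and channel names used above; the local fallback paths are illustrative, and the model fitting is left as a comment.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags and paths SageMaker hands to a script-mode entry point."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as "--name value" pairs, named as in the estimator.
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=8)
    # Channel and model directories come from environment variables set in the
    # training container; the fallbacks are illustrative, for local dry runs.
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data/train"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "data/test"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    return parser.parse_args(argv)


# Simulate the flags the training job would pass for the estimator above.
args = parse_args(["--n_estimators", "100", "--max_depth", "8"])

# A full script would now fit a model on the files under args.train, score it
# on args.test, and serialize it into args.model_dir so that SageMaker can
# upload it to the estimator's output_path.
print(vars(args))
```

Keeping the parsing in a function that accepts an explicit argument list makes the script testable locally without starting a training job.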

2. GCP Vertex AI

Vertex AI Pipeline (Kubeflow)

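from kfp import dsl

# KFP SDK v2 syntax: dsl.ContainerOp belongs to the v1 SDK and was removed in v2,
# so each step is declared as a container component instead.
@dsl.container_component
def prepare(data_path: str):
    return dsl.ContainerSpec(
        image="gcr.io/project/prepare:latest",
        args=["--input", data_path]
    )

@dsl.container_component
def train(n_estimators: int):
    return dsl.ContainerSpec(
        image="gcr.io/project/train:latest",
        args=["--n-estimators", n_estimators]
    )

@dsl.container_component
def evaluate():
    return dsl.ContainerSpec(image="gcr.io/project/evaluate:latest")

@dsl.pipeline(name="training-pipeline")
def training_pipeline(
    data_path: str = "gs://bucket/data/",
    n_estimators: int = 200
):
    # .after() enforces ordering between steps that exchange no outputs
    prepare_task = prepare(data_path=data_path)
    train_task = train(n_estimators=n_estimators).after(prepare_task)
    evaluate_task = evaluate().after(train_task)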
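The .after() calls in the pipeline declare dependencies rather than running anything: the pipeline definition is a graph, and the backend derives the execution order from it. As a rough illustration that uses only the standard library (this is not how Kubeflow or Vertex AI schedule work internally), the same three-step chain can be expressed and ordered like this:

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it must wait for,
# mirroring the .after() calls in the pipeline definition.
dependencies = {
    "prepare": set(),
    "train": {"prepare"},
    "evaluate": {"train"},
}

# A topological sort yields an order in which every step follows its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Steps with no path between them would be free to run in parallel; in this pipeline the chain is strictly linear, so only one order is valid.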

3. Feature Store Comparison

Feature	SageMaker Feature Store	Vertex AI Feature Store
Storage	Online + Offline	Online + Offline
Query	Point lookup, batch	Point lookup, batch
Integration	SageMaker pipelines	Vertex AI pipelines
Refresh	Scheduled or on-demand	Streaming or batch
Pricing	Per GB stored + reads	Per GB stored + reads
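Both services in the table split storage into an online store for low-latency point lookups and an offline store for batch reads. The toy class below is an in-memory sketch of that split, not either vendor's API; the class name, method names, and the purchases_7d feature are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureStore:
    """In-memory stand-in: the online store keeps the latest row per entity,
    the offline store keeps every row for batch reads."""

    online: dict = field(default_factory=dict)
    offline: list = field(default_factory=list)

    def ingest(self, entity_id, event_time, features):
        row = {"entity_id": entity_id, "event_time": event_time, **features}
        self.offline.append(row)  # full history
        current = self.online.get(entity_id)
        if current is None or event_time >= current["event_time"]:
            self.online[entity_id] = row  # latest value only

    def get_online(self, entity_id):
        """Point lookup, as used at inference time."""
        return self.online.get(entity_id)

    def get_offline(self, as_of):
        """Batch read: the latest row per entity at or before `as_of`."""
        latest = {}
        for row in sorted(self.offline, key=lambda r: r["event_time"]):
            if row["event_time"] <= as_of:
                latest[row["entity_id"]] = row
        return list(latest.values())


store = ToyFeatureStore()
store.ingest("user-1", 1, {"purchases_7d": 2})
store.ingest("user-1", 5, {"purchases_7d": 4})
store.ingest("user-2", 3, {"purchases_7d": 1})

latest_user_1 = store.get_online("user-1")["purchases_7d"]
snapshot = [row["purchases_7d"] for row in store.get_offline(as_of=3)]
print(latest_user_1, snapshot)
```

The as_of argument is what makes the offline read suitable for building training sets: it returns feature values as they were at a past time, which avoids leaking later data into training.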

4. Managed Inference

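# SageMaker Real-time
import sagemaker
from sagemaker.model import Model
from sagemaker.predictor import Predictor

role = sagemaker.get_execution_role()
inference_image = "<inference-image-uri>"  # placeholder: URI of the serving container image

model = Model(
    image_uri=inference_image,
    model_data="s3://bucket/model.tar.gz",
    role=role,
    predictor_cls=Predictor  # without this, Model.deploy() returns None
)
predictor = model.deploy(instance_type="ml.g4dn.xlarge", initial_instance_count=1)

# Serverless inference (CPU only; no instance type to choose)
# `estimator` is the trained SKLearn estimator from the SDK example above
from sagemaker.serverless import ServerlessInferenceConfig

serverless_predictor = estimator.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,
        max_concurrency=50
    )
)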
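Choosing between the two deployment modes above is mostly a question of traffic: a real-time endpoint is billed for every hour it runs, while serverless inference is billed for the compute actually used. The sketch below estimates the monthly request volume at which the two cost the same. All prices and the latency figure are hypothetical placeholders, and the model ignores data-processing charges and cold starts.

```python
HOURS_PER_MONTH = 730


def monthly_cost_realtime(price_per_instance_hour, instances=1):
    """An always-on endpoint pays for every hour, whether or not it serves traffic."""
    return price_per_instance_hour * instances * HOURS_PER_MONTH


def monthly_cost_serverless(requests, seconds_per_request, price_per_second):
    """Serverless pays only for the compute time the requests consume."""
    return requests * seconds_per_request * price_per_second


def break_even_requests(price_per_instance_hour, seconds_per_request, price_per_second):
    """Monthly request count at which both options cost the same."""
    return monthly_cost_realtime(price_per_instance_hour) / (
        seconds_per_request * price_per_second
    )


# Hypothetical inputs, not published prices.
INSTANCE_PRICE = 0.20         # USD per instance-hour
SERVERLESS_PRICE = 0.00004    # USD per second at the configured memory size
LATENCY = 0.1                 # seconds of compute per request

realtime = monthly_cost_realtime(INSTANCE_PRICE)
threshold = break_even_requests(INSTANCE_PRICE, LATENCY, SERVERLESS_PRICE)
print(f"real-time endpoint: ${realtime:.2f}/month")
print(f"serverless is cheaper below about {threshold:,.0f} requests/month")
```

Below the threshold, serverless is the cheaper option under these assumptions; above it, a right-sized always-on instance wins.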

5. Cost Optimization
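The spot-instance saving quoted in this article (up to 70% on training) is simple arithmetic once the discount is known. The sketch below assumes a hypothetical on-demand rate of $1.00 per instance-hour and a 10-hour job; actual discounts vary by instance type and region, and interrupted jobs may need extra time to resume from checkpoints.

```python
def training_cost(hours, on_demand_rate, spot_discount=0.0):
    """Cost of a training job; spot_discount is the fraction saved versus on-demand."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hours * on_demand_rate * (1.0 - spot_discount)


# Hypothetical job: 10 instance-hours at $1.00 per hour.
on_demand = training_cost(hours=10, on_demand_rate=1.00)
spot = training_cost(hours=10, on_demand_rate=1.00, spot_discount=0.70)
savings = 1.0 - spot / on_demand

print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}, saved: {savings:.0%}")
```

On SageMaker, managed spot training is requested on the estimator with use_spot_instances=True together with max_run and max_wait; setting checkpoint_s3_uri lets an interrupted job resume rather than restart.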

6. Multi-Cloud Strategy

Avoid lock-in: Use abstracted interfaces (Kubeflow, MLflow)
Data gravity: Keep data where compute lives
Cost comparison: Profile workloads across providers
Compliance: Consider data residency requirements
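The "abstracted interfaces" point can also be applied inside your own code: if pipeline logic depends on a small interface rather than on a provider SDK, switching providers means writing one new adapter. The sketch below shows the shape of that pattern; TrainingBackend, both backend classes, and the returned job identifiers are hypothetical stubs that call no real cloud API.

```python
from typing import Protocol


class TrainingBackend(Protocol):
    """The only surface the rest of the codebase is allowed to depend on."""

    def submit(self, image: str, args: list[str]) -> str:
        """Start a training job and return an identifier for it."""
        ...


class SageMakerBackend:
    def submit(self, image: str, args: list[str]) -> str:
        # A real adapter would call the SageMaker SDK here.
        return f"sagemaker-job:{image}"


class VertexBackend:
    def submit(self, image: str, args: list[str]) -> str:
        # A real adapter would call the Vertex AI SDK here.
        return f"vertex-job:{image}"


def run_training(backend: TrainingBackend, image: str, args: list[str]) -> str:
    """Provider-agnostic entry point; the backend is chosen by configuration."""
    return backend.submit(image, args)


job_a = run_training(SageMakerBackend(), "train:latest", ["--n-estimators", "200"])
job_b = run_training(VertexBackend(), "train:latest", ["--n-estimators", "200"])
print(job_a, job_b)
```

The trade-off is that a thin interface exposes only what every provider has in common, so provider-specific features need an escape hatch or a wider contract.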

Key Takeaways

SageMaker: Most comprehensive; best for AWS-native shops
Vertex AI: Strong AutoML and BigQuery integration; Google Research models
Azure ML: Enterprise integration; OpenAI access; responsible AI tools
Cost: Spot training + right-sized inference = 50-70% savings
