Cost Optimization
Cloud costs can spiral without deliberate management. Cost optimization is about getting the most value from your infrastructure spend while avoiding waste.
- Right-Sizing: Match resources to actual needs
- Reserved Capacity: Commit for discounts
- FinOps: Cultural practice of financial accountability
Cost optimization is not about spending less; it's about spending wisely.
The Cloud Cost Problem
Cloud computing shifts capital expenditure to operational expenditure, but without governance, costs grow unchecked.
Definition: Cloud Waste
Cloud waste is spending on cloud resources that are underutilized, idle, or misconfigured. Industry surveys commonly estimate that 30-35% of cloud spend is wasted on average. Common sources are over-provisioned instances, idle resources, lack of autoscaling, and unused storage.
Cost Per Transaction
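$$\text{Cost per transaction} = \frac{C_{\text{total}}}{N_{\text{txn}}}$$
Here,
- $C_{\text{total}}$ = Monthly cloud spend across all services
- $N_{\text{txn}}$ = Total transactions in the same period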
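As a minimal sketch, the ratio of the two quantities defined above can be computed as follows. The function name and the sample figures are hypothetical and chosen only for illustration.

```python
def cost_per_transaction(monthly_spend: float, transactions: int) -> float:
    """Return infrastructure cost per transaction for one billing period."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return monthly_spend / transactions


# Hypothetical figures: $10,000 of spend and 2,000,000 transactions in a month.
unit_cost = cost_per_transaction(10_000, 2_000_000)
print(f"${unit_cost:.4f} per transaction")
```

Tracking this value month over month is usually more informative than the raw bill, since spend that grows in step with traffic leaves the unit cost flat.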
Cloud Waste Calculation
Monthly cloud spend: $10,000
- Idle resources: $2,000 (20%)
- Over-provisioned instances: $1,500 (15%)
- Unused storage: $500 (5%)
Total waste: $4,000/month ($48,000/year). Optimizing could save 40% of the bill.
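The breakdown above can be checked with a few lines of arithmetic. This sketch uses only the figures from the example; the variable names are illustrative.

```python
monthly_spend = 10_000  # USD, from the example above

# Waste categories and their monthly cost in USD (figures from the example).
waste = {
    "idle resources": 2_000,
    "over-provisioned instances": 1_500,
    "unused storage": 500,
}

total_waste = sum(waste.values())
waste_share = total_waste / monthly_spend
annual_waste = total_waste * 12

for name, amount in waste.items():
    print(f"{name}: ${amount:,} ({amount / monthly_spend:.0%})")
print(f"total: ${total_waste:,}/month (${annual_waste:,}/year), {waste_share:.0%} of spend")
```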
Right-Sizing
Matching instance types and sizes to actual workload requirements.
Definition: Right-Sizing
Right-sizing is the process of analyzing resource utilization and adjusting instance types, sizes, and configurations to match actual needs. It involves selecting the optimal instance type (CPU, memory, network) and scaling horizontally rather than vertically when possible.
Most cloud providers offer right-sizing recommendations based on historical utilization. AWS Compute Optimizer, Azure Advisor, and GCP Recommender analyze recent metrics (AWS Compute Optimizer, for example, looks back 14 days by default) to suggest better-fitting instance types.
Right-Sizing Strategy
| Step | Action |
|---|---|
| 1 | Collect utilization metrics (CPU, memory, network, disk) |
| 2 | Identify underutilized instances (< 30% average) |
| 3 | Right-size to smaller instance type |
| 4 | Switch to Graviton/ARM if supported |
| 5 | Use spot instances for fault-tolerant workloads |
| 6 | Set up autoscaling policies |
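Steps 1 and 2 of the table can be sketched roughly as below. The `InstanceMetrics` type, the instance names, and the sample values are hypothetical; in practice the samples would come from a monitoring system, and the 30% threshold is the one from the table.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InstanceMetrics:
    name: str
    cpu_samples: list[float]     # percent utilization, 0-100
    memory_samples: list[float]  # percent utilization, 0-100


def underutilized(instances, threshold=30.0):
    """Return names of instances whose average CPU and memory are both below threshold."""
    flagged = []
    for inst in instances:
        if mean(inst.cpu_samples) < threshold and mean(inst.memory_samples) < threshold:
            flagged.append(inst.name)
    return flagged


# Hypothetical fleet with three utilization samples per instance.
fleet = [
    InstanceMetrics("api-1", [12, 18, 9], [22, 25, 20]),
    InstanceMetrics("db-1", [55, 70, 62], [80, 78, 83]),
    InstanceMetrics("batch-1", [5, 8, 95], [15, 10, 40]),
]
print(underutilized(fleet))
```

Averages hide spikes: `batch-1` is mostly idle but peaks at 95% CPU. A real analysis would probably also look at peak or percentile utilization before downsizing.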
Pricing Models
Pricing Strategy
Definition: Reserved Instance Strategy
Combine pricing models based on workload characteristics:
- Steady-state (databases, always-on): Reserved instances (1-3 year)
- Variable (web servers, APIs): On-demand + autoscaling
- Fault-tolerant (batch jobs, CI): Spot instances (70-90% discount)
- Flexible: Savings Plans (commitment with flexibility)
Use a mix of pricing models. A typical optimized workload uses 40% reserved (base load), 30% on-demand (variable load), and 30% spot (fault-tolerant workloads). This can achieve roughly 50% overall savings versus pure on-demand, depending on the discounts actually obtained.
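To see how a 40/30/30 mix lands near 50% savings, here is a rough sketch. The discount rates are assumptions made for illustration (60% for reserved, 80% for spot); the source gives only the 70-90% spot range, and real discounts vary by provider, region, instance family, and commitment term.

```python
# Share of capacity under each pricing model (from the text) paired with an
# ASSUMED discount relative to on-demand pricing.
mix = {
    "reserved": (0.40, 0.60),   # assumed 60% discount
    "on_demand": (0.30, 0.00),  # baseline, no discount
    "spot": (0.30, 0.80),       # assumed 80% discount, within the 70-90% range
}


def blended_savings(mix):
    """Return overall savings versus running everything on-demand."""
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * discount for share, discount in mix.values())


print(f"{blended_savings(mix):.0%} savings vs pure on-demand")
```

With these assumed discounts the blended savings come to 48%. Changing the assumed rates shows how sensitive the result is to the reserved and spot discounts actually obtained.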
Storage Optimization
Storage Tiering
$$C_{\text{storage}} = \sum_{t} S_t \times P_t$$
Here,
- $t$ = Storage tier (hot, warm, cold, archive)
- $S_t$ = Amount of data in each tier
- $P_t$ = Price per GB for each tier
| Tier | Access Pattern | Cost/GB/month | Use Case |
|---|---|---|---|
| Hot | Frequent | $0.023 | Active data, databases |
| Warm | Infrequent | $0.0125 | Recent logs, analytics |
| Cold | Rare | $0.004 | Compliance, archives |
| Archive | Never | $0.001 | Long-term backup |
Storage Tiering Savings
10 TB of data with access pattern 2 TB hot, 3 TB warm, 5 TB cold (taking 1 TB = 1,000 GB):
- Without tiering (all hot): 10 TB × $0.023/GB = $230/month
- With tiering: 2 TB × $0.023 + 3 TB × $0.0125 + 5 TB × $0.004 = $46 + $37.50 + $20 = $103.50 ≈ $104/month
Savings: 55% ($1,512/year)
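The tiering arithmetic can be reproduced with a short sketch. It uses the per-GB prices from the table above and assumes decimal terabytes (1 TB = 1,000 GB); the function and variable names are illustrative.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes

# Price per GB per month for each tier, taken from the table above.
PRICE_PER_GB = {"hot": 0.023, "warm": 0.0125, "cold": 0.004, "archive": 0.001}


def monthly_storage_cost(tb_by_tier):
    """Sum size times price over all tiers; sizes are given in TB."""
    return sum(tb * GB_PER_TB * PRICE_PER_GB[tier] for tier, tb in tb_by_tier.items())


untiered = monthly_storage_cost({"hot": 10})
tiered = monthly_storage_cost({"hot": 2, "warm": 3, "cold": 5})
savings = untiered - tiered

print(f"all hot: ${untiered:.2f}/month")
print(f"tiered:  ${tiered:.2f}/month")
print(f"savings: ${savings:.2f}/month ({savings / untiered:.0%})")
```

Without rounding, the tiered cost is $103.50 per month, so the savings are $126.50 per month (about $1,518 per year). The worked example rounds the tiered cost up to $104, which gives $1,512.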
FinOps
A cultural practice for financial accountability in cloud spending.
Definition: FinOps
FinOps is an evolving cloud financial management discipline that enables organizations to get maximum business value by helping engineering, finance, and business teams collaborate on data-driven spending decisions. It combines tools, processes, and culture to optimize cloud costs.
FinOps Lifecycle
| Phase | Activities |
|---|---|
| Inform | Showback/chargeback, budgeting, forecasting |
| Optimize | Right-sizing, reserved instances, architecture |
| Operate | Continuous monitoring, anomaly detection, governance |
Implement showback reports that attribute cloud costs to teams and services. When teams see their own cloud bill, they tend to optimize it. AWS Cost Explorer, Azure Cost Management, and GCP Billing provide cost allocation by project, service, and tag.
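A showback report is essentially a group-by over billing line items. The sketch below assumes a simplified, hypothetical record shape (`service`, `cost`, `tags`). Real billing exports have provider-specific schemas, so the field names here are placeholders.

```python
from collections import defaultdict

# Hypothetical billing line items; not a real provider export format.
line_items = [
    {"service": "compute", "cost": 1200.0, "tags": {"team": "payments"}},
    {"service": "storage", "cost": 300.0, "tags": {"team": "payments"}},
    {"service": "compute", "cost": 800.0, "tags": {"team": "search"}},
    {"service": "database", "cost": 450.0, "tags": {}},
]


def showback(items, tag_key="team"):
    """Total cost per value of tag_key; items without the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


report = showback(line_items)
for owner, cost in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{owner}: ${cost:,.2f}")
```

Keeping an explicit `untagged` bucket is a deliberate choice: it makes gaps in tagging visible instead of silently dropping that spend from the report.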
Cost Monitoring
| Metric | Description | Target |
|---|---|---|
| Cost per transaction | Infrastructure cost per business unit | Decreasing trend |
| Utilization rate | CPU/memory usage vs provisioned | > 60% average |
| Idle cost | Spend on unused resources | < 5% of total |
| Savings coverage | % of spend covered by reservations | > 70% for steady-state |
Set up billing alerts at 50%, 75%, and 100% of budget. Use anomaly detection to identify unexpected spikes. Review costs weekly at the team level and monthly at the executive level.
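The threshold logic behind such alerts can be sketched as below. This is not a provider API. In practice the alerts would be configured in the provider's budgeting tool, and the function name and sample figures here are hypothetical.

```python
THRESHOLDS = (0.50, 0.75, 1.00)  # alert levels from the text


def crossed_thresholds(spend_to_date, budget, thresholds=THRESHOLDS):
    """Return the alert levels that current spend has reached or passed."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend_to_date / budget
    return [t for t in thresholds if used >= t]


# Hypothetical month: $8,000 spent against a $10,000 budget.
for level in crossed_thresholds(8_000, 10_000):
    print(f"alert: {level:.0%} of budget reached")
```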
Practice Exercises
- Analysis: A startup spends $50,000/month on AWS. Analyze their usage: 40% on-demand EC2, 20% RDS, 15% S3, 15% data transfer, 10% other. Identify optimization opportunities.
- Strategy: Design a cost optimization strategy for a SaaS company with 100 microservices. Include right-sizing, pricing models, and autoscaling policies.
- Storage: Design a storage tiering strategy for a healthcare app with 50TB of data: 5TB accessed daily, 15TB weekly, 30TB monthly. Calculate monthly costs with and without tiering.
- FinOps: Implement a showback system for a 50-person engineering team. How do you attribute cloud costs to services, teams, and features?
Key Takeaways:
- 30-35% of cloud spend is typically wasted; right-sizing is the first optimization
- Combine pricing models: reserved for base load, on-demand for variable, spot for fault-tolerant
- Storage tiering moves infrequently accessed data to cheaper storage classes
- FinOps creates cultural accountability for cloud spending
- Monitor cost per transaction, utilization rate, and idle cost
- Set up billing alerts and cost anomaly detection
- Cost optimization is an ongoing process, not a one-time project
What to Learn Next
- Containerization: Docker, Kubernetes, and resource management.
- Scalability Fundamentals: Vertical vs horizontal scaling and capacity planning.
- CDN: Edge caching and content distribution.
- Observability: Logging, metrics, tracing, and monitoring.
- Load Balancing: Distribution algorithms and health checks.
- CI/CD Pipelines: Continuous integration and deployment strategies.