The Cloud Cost Traps Nobody Warns You About

Cloud providers make moving data in cheap and moving data out expensive. Understanding the cost structure before you build will save you from architectural decisions that are expensive to undo.

The first AWS bill is usually a surprise. Not always in the bad direction — sometimes organizations discover the cloud is cheaper than expected for their initial workload. But within 12-18 months, as usage grows and architectural decisions accumulate, the bill starts looking like it was designed to be confusing. Because in many ways, it was.

I’ve helped organizations cut cloud spend by 30-50% without any change to workload performance or capability. The savings are almost always hidden in the same handful of patterns. Understanding them before you build is worth more than any optimization effort after the fact.

Egress Is the Hidden Tax

Cloud providers charge next to nothing for inbound data. They charge significantly for outbound data — the traffic leaving their network to the public internet or to another provider. This asymmetry is deliberate. Once your data is in, moving it out is expensive.

AWS charges roughly $0.09/GB for the first 10TB/month of data transfer out to the internet, with modest discounts at higher volume tiers. GCP is similar. For a service that transfers significant data to users — video platforms, large file delivery, data-heavy APIs — this line item can dwarf compute costs.

The fix at the architecture level: put a CDN in front of anything that serves repeated content to end users. CloudFront, Fastly, Cloudflare — the specific choice matters less than having one. CDN egress is dramatically cheaper than cloud provider egress, and CDN hits from cache have no origin transfer cost at all.

The fix at the audit level: if you’re already paying high egress bills, pull the Cost Explorer data broken out by data transfer and identify what’s generating the volume. Often it’s a single service with caching that was never configured, or internal traffic between regions that could be consolidated.
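To make the CDN math concrete, here is a minimal sketch of the cost comparison. The rates are illustrative assumptions (roughly the figures quoted above for origin egress, and a placeholder CDN rate), not current price quotes; the key structural point is that only cache misses generate origin transfer.

```python
# Rough egress cost comparison: origin-only vs. CDN in front.
# Rates are illustrative assumptions, not current quotes.

ORIGIN_EGRESS_PER_GB = 0.09   # assumed cloud egress rate, $/GB
CDN_EGRESS_PER_GB = 0.04      # assumed CDN delivery rate, $/GB

def monthly_egress_cost(gb_served: float, cache_hit_ratio: float = 0.0) -> dict:
    """Estimate monthly egress cost with and without a CDN.

    cache_hit_ratio is the fraction of bytes served from CDN cache;
    only cache misses generate origin transfer.
    """
    origin_only = gb_served * ORIGIN_EGRESS_PER_GB
    with_cdn = (
        gb_served * CDN_EGRESS_PER_GB                                # CDN -> users
        + gb_served * (1 - cache_hit_ratio) * ORIGIN_EGRESS_PER_GB   # origin -> CDN on misses
    )
    return {"origin_only": round(origin_only, 2), "with_cdn": round(with_cdn, 2)}

# 50 TB/month with a 90% cache hit ratio:
print(monthly_egress_cost(50_000, cache_hit_ratio=0.9))
```

Even at these placeholder rates, a high cache hit ratio roughly halves the bill; for cacheable static content, hit ratios above 90% are common.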

Reserved Instances and Savings Plans Are Worth the Commitment

On-demand compute pricing is designed for unpredictable workloads. If you’re running predictable compute — any server that needs to be up 24/7 — you’re paying a significant premium for flexibility you don’t need.

AWS Reserved Instances (1 or 3 year terms) typically reduce compute costs by 30-40% compared to on-demand. Savings Plans are more flexible and offer similar discounts. GCP Committed Use Discounts work similarly.

The reason organizations don’t use these: the commitment feels risky. What if we need different instances? What if we change architectures? In practice, the base compute tier of most applications is predictable over 12-month horizons. Reserve the baseline, run variable capacity on-demand. The hybrid model is easy to manage and delivers most of the savings with limited risk.

One important nuance: Reserved Instances are tied to instance type and region. Instance type flexibility (Convertible RIs) is available at lower discounts. If you’re not sure about instance types, Savings Plans offer more flexibility at slightly lower savings rates. Start with Savings Plans for simplicity.
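The "reserve the baseline, run the burst on-demand" arithmetic can be sketched in a few lines. The hourly rate and discount below are assumptions for illustration (the discount is in the 30-40% range cited above), not quotes for any specific instance type.

```python
# Sketch of the hybrid reserved/on-demand model. Rates and the
# discount are assumptions for illustration, not price quotes.

HOURS_PER_MONTH = 730
ON_DEMAND_RATE = 0.10      # assumed $/instance-hour
RESERVED_DISCOUNT = 0.35   # assumed discount for a 1-year commitment

def hybrid_monthly_cost(baseline: int, peak: int, peak_hours: int) -> float:
    """Baseline instances run 24/7 at reserved pricing; the burst above
    baseline runs on-demand only during peak hours."""
    reserved = baseline * HOURS_PER_MONTH * ON_DEMAND_RATE * (1 - RESERVED_DISCOUNT)
    burst = (peak - baseline) * peak_hours * ON_DEMAND_RATE
    return round(reserved + burst, 2)

def all_on_demand_cost(baseline: int, peak: int, peak_hours: int) -> float:
    """Same workload with no commitment at all."""
    return round(baseline * HOURS_PER_MONTH * ON_DEMAND_RATE
                 + (peak - baseline) * peak_hours * ON_DEMAND_RATE, 2)

# 10 always-on instances bursting to 16 for ~200 hours/month:
print(hybrid_monthly_cost(10, 16, 200))   # reserved baseline + on-demand burst
print(all_on_demand_cost(10, 16, 200))    # everything on-demand
```

The savings come almost entirely from the always-on baseline, which is exactly the part of the workload that is predictable enough to commit to.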

Idle Resources Are the Most Immediate Waste

In every cloud cost audit I’ve run, idle resources are the first finding. Resources that exist but aren’t serving any purpose. The common ones:

Unattached EBS volumes. When an EC2 instance is terminated, attached volumes aren’t automatically deleted (by default). They keep generating charges at $0.10/GB/month. A 500GB volume forgotten for 18 months costs $900 with zero benefit.

Idle load balancers. Application Load Balancers have an hourly base charge (about $0.0225/hour) plus LCU usage charges ($0.008/LCU-hour). An ALB with no registered targets is pure waste.

Old snapshots. EBS snapshots and RDS snapshots accumulate over time. Old snapshots from instances that no longer exist, test database snapshots from two years ago — these pile up. A simple lifecycle policy cleans them up automatically.

Unused Elastic IPs. AWS charges for Elastic IPs not associated with running instances. One IP is $0.005/hour — not dramatic, but the unused IP tends to be a signal that other resources associated with that deployment were also left running.

The fix: a quarterly cloud audit with a focus on identifying unused resources. Most cloud management platforms have resource utilization reports. The harder problem is organizational: who owns the cleanup? Whose budget does the waste come from? Without clear ownership, nobody has incentive to terminate resources they’re not sure about.
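The checks above are mechanical once you have an inventory. Here is a minimal sketch of the filter; in practice the `resources` list would come from your provider's APIs (boto3 describe calls or a cloud management platform export), and the field names here are assumptions chosen for illustration.

```python
# Minimal idle-resource filter over an exported inventory.
# Field names ("attached_to", "targets", etc.) are assumptions;
# map them to whatever your actual export produces.

EBS_GB_MONTH = 0.10  # assumed gp2 rate from the text, $/GB-month

def find_idle(resources: list[dict]) -> list[dict]:
    """Flag unattached volumes, targetless load balancers, and
    unassociated Elastic IPs; attach a rough monthly cost to volumes."""
    findings = []
    for r in resources:
        if r["type"] == "ebs_volume" and r.get("attached_to") is None:
            findings.append({**r, "monthly_cost": r["size_gb"] * EBS_GB_MONTH})
        elif r["type"] == "alb" and not r.get("targets"):
            findings.append(r)
        elif r["type"] == "elastic_ip" and r.get("associated_to") is None:
            findings.append(r)
    return findings

inventory = [
    {"type": "ebs_volume", "id": "vol-1", "size_gb": 500, "attached_to": None},
    {"type": "ebs_volume", "id": "vol-2", "size_gb": 100, "attached_to": "i-abc"},
    {"type": "alb", "id": "alb-1", "targets": []},
    {"type": "elastic_ip", "id": "eip-1", "associated_to": None},
]
print(find_idle(inventory))
```

The code is the easy part; the report it produces still needs an owner who is empowered to delete things.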

Data Storage Tiers Matter More Than You Think

S3 Standard is the default. It’s also the most expensive S3 tier. Objects that haven’t been accessed in 30+ days don’t belong in Standard.

S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns. The monitoring cost ($0.0025 per 1,000 objects per month) is trivial for most workloads, and the savings on infrequently accessed data are significant.

For objects you know you’ll rarely need — old logs, archive data, compliance records — S3 Glacier starts at a fraction of Standard pricing. The tradeoff is retrieval time (minutes to hours depending on the tier). If your use case is “we might need these someday but rarely do,” Glacier is appropriate.

The pattern I see: organizations put everything in S3 Standard at project start and never revisit storage tiers. Adding lifecycle policies to existing buckets takes an afternoon. The savings are immediate and ongoing.
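For reference, the "afternoon of work" looks roughly like this: a lifecycle configuration in the shape boto3's `put_bucket_lifecycle_configuration` expects. The prefix and day thresholds are illustrative assumptions; verify the field names against the current S3 API documentation before applying to a real bucket.

```python
# A lifecycle configuration of the kind described above, as the dict
# boto3's put_bucket_lifecycle_configuration expects. Prefix and day
# thresholds are assumptions for illustration.

lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-old-logs",
            "Filter": {"Prefix": "logs/"},   # assumed prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                {"Days": 90, "StorageClass": "GLACIER"},      # archive after 90 days
            ],
            "Expiration": {"Days": 730},     # delete after two years
        }
    ]
}

# Applied with something like:
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
print(lifecycle_config["Rules"][0]["ID"])
```

One rule per data category (logs, backups, user uploads) is usually enough; resist the urge to build an elaborate tiering scheme up front.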

RDS and Database Sizing

Databases are where organizations consistently overprovision. The initial sizing exercise is based on estimated peak load with a healthy buffer. That buffer persists indefinitely.

RDS instance costs are linear with instance size. A db.r5.4xlarge costs 8x a db.r5.large. Most small-to-mid applications run comfortably on a db.r5.large or smaller, but are provisioned on a db.r5.2xlarge or larger “to be safe.”

Pull CPU and memory utilization metrics from your RDS instances over the past 90 days. If P95 CPU utilization is under 30% and memory usage is consistently below 60%, you’re provisioned larger than you need. One size down is usually safe; two sizes down requires testing.
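That heuristic is easy to automate once you have the metric samples (from CloudWatch, in AWS's case). A sketch with synthetic data, using a simple nearest-rank percentile:

```python
# The rightsizing check above, applied to raw metric samples.
# The samples here are synthetic; in practice pull 90 days of
# CloudWatch datapoints per instance.

def p95(samples: list[float]) -> float:
    """95th percentile by nearest-rank."""
    ordered = sorted(samples)
    rank = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[rank]

def oversized(cpu_samples: list[float], mem_samples: list[float]) -> bool:
    """True if P95 CPU is under 30% and memory never reaches 60%."""
    return p95(cpu_samples) < 30.0 and max(mem_samples) < 60.0

cpu = [5, 8, 12, 10, 7, 25, 9, 11, 6, 14]        # percent, synthetic
mem = [40, 42, 45, 44, 41, 50, 43, 46, 39, 47]   # percent, synthetic
print(oversized(cpu, mem))  # this instance is a downsizing candidate
```

Using the maximum for memory is deliberately conservative: a database that ever approaches its memory ceiling will start evicting cache, which shows up as latency before it shows up in CPU.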

Aurora Serverless v2 is worth considering for workloads with variable load — it scales capacity up and down automatically, which can dramatically reduce cost for databases that have clear usage patterns (business hours load, off-peak low load).

Network Architecture That Creates Unnecessary Costs

Some network architecture decisions that seem reasonable create ongoing cost overhead:

Multi-region deployments for single-region applications. If your users are all in one region and your application has no DR requirement for cross-region failover, running in multiple regions doubles your base infrastructure cost. Separate availability zones within a single region provide sufficient resilience for most applications.

Cross-AZ traffic. Within a region, traffic between availability zones is charged ($0.01/GB in AWS). In a well-architected application, this is low. In applications with chatty microservices calling each other across AZs, it accumulates. This isn’t a reason to avoid multi-AZ deployments (the resilience is worth it), but chatty microservice architectures should be aware of the hidden cost.
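The cross-AZ line item is easy to estimate from request volume. A sketch using the rate quoted above, with the detail that AWS bills the traffic in each direction, so a request/response exchange between AZs is effectively charged twice:

```python
# Rough monthly cost of a chatty service pair split across AZs,
# at the $0.01/GB-per-direction rate quoted above.

CROSS_AZ_PER_GB = 0.01    # $/GB, each direction
HOURS_PER_MONTH = 730

def cross_az_monthly_cost(requests_per_sec: float, bytes_per_exchange: int) -> float:
    """bytes_per_exchange is the total request + response payload."""
    seconds = HOURS_PER_MONTH * 3600
    gb = requests_per_sec * seconds * bytes_per_exchange / 1e9
    return round(gb * CROSS_AZ_PER_GB * 2, 2)   # x2: billed in both directions

# 500 req/s exchanging ~20KB per call:
print(cross_az_monthly_cost(500, 20_000))
```

A few hundred dollars a month for one service pair is tolerable; multiply by dozens of chatty pairs and it becomes a line item worth designing around, for example with AZ-aware routing.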

NAT Gateway for egress. NAT Gateway charges per hour ($0.045/hour) plus per GB of data processed ($0.045/GB). For applications with high egress volume from private subnets, NAT Gateway costs can be significant. Alternatives include VPC endpoints (for AWS service traffic that shouldn’t go through NAT) and evaluating whether some workloads can run in public subnets.
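The NAT Gateway arithmetic, using the rates quoted above: the hourly charge is small, and the data processing charge dominates at volume. Gateway VPC endpoints for S3 and DynamoDB carry that traffic with no per-GB processing charge (interface endpoints for other services do have their own charges), so moving eligible traffic off the NAT path is often the single biggest win.

```python
# Rough NAT Gateway monthly cost at the rates quoted above.

NAT_HOURLY = 0.045      # $/hour
NAT_PER_GB = 0.045      # $/GB processed
HOURS_PER_MONTH = 730

def nat_monthly_cost(gb_processed: float) -> float:
    return round(NAT_HOURLY * HOURS_PER_MONTH + NAT_PER_GB * gb_processed, 2)

# 10 TB/month through NAT, of which 8 TB is S3 traffic that a
# gateway VPC endpoint could carry with no processing charge:
print(nat_monthly_cost(10_000))   # everything through NAT
print(nat_monthly_cost(2_000))    # after moving S3 traffic to an endpoint
```

Note the fixed hourly cost is about $33/month regardless of traffic; the rest scales with volume, which is why the endpoint change pays off quickly on data-heavy workloads.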

The Audit You Should Do Now

If you’re spending more than $5k/month on cloud services and haven’t done a formal cost audit in the past 6 months:

  1. Export Cost Explorer data for the past 90 days, broken out by service and tag
  2. Identify the top 10 cost drivers by dollar amount
  3. For each: is this expected? Is utilization appropriate? Are there cheaper alternatives?
  4. Identify all resources with no activity in the past 30 days
  5. Check Reserved Instance/Savings Plan coverage against on-demand usage
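Steps 1 and 2 reduce to a small grouping script once you have the export. A sketch over a CSV in an assumed two-column shape; the column names here are placeholders, so match them to whatever your actual Cost Explorer export produces.

```python
# Group an exported cost CSV by service and rank the top drivers.
# Column names ("service", "unblended_cost") are assumptions;
# adjust them to your actual export.

import csv
from collections import defaultdict
from io import StringIO

def top_cost_drivers(csv_text: str, n: int = 10) -> list[tuple[str, float]]:
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(StringIO(csv_text)):
        totals[row["service"]] += float(row["unblended_cost"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

sample = """service,unblended_cost
AmazonEC2,1200.50
AmazonRDS,800.00
AmazonEC2,300.00
AWSDataTransfer,950.25
"""
print(top_cost_drivers(sample, n=3))
```

Grouping by tag instead of (or in addition to) service answers the organizational question too: which team's workloads are driving the bill.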

This audit will surface savings in 90% of environments. The question is always whether someone is accountable for following through.

Our cloud infrastructure practice runs these audits as part of new engagements, and our cloud migration and cost optimization service handles the more intensive FinOps work for organizations where cloud spend has become a significant budget item. The savings we’ve found consistently run 30-40% of total cloud spend for organizations that haven’t done structured cost management.

The cloud billing dashboard shows you what you spent. It rarely shows you what you wasted.