Network cost: egress, cross-AZ, the surprise bill

Compute and storage are visible on every dashboard. The team can see how many cores are running, how many terabytes are stored, and roughly what each costs. Nobody is surprised by the EC2 line on the bill, because somebody is watching it. The line that surprises teams is the one labelled “data transfer”, and it surprises them because nobody is watching it. A pipeline that copies a hundred gigabytes between regions does not show up on a CPU graph. A service that fans out a thousand small responses across availability zones does not show up on a memory graph. Both show up on the invoice at the end of the month, often as the largest single item, and often only after a panic-stricken finance email.

This lesson is about that line: the mechanics of cloud network pricing, the patterns that produce surprise bills, and the architectural levers that bring the number under control. The focus is AWS because its pricing structure is the most documented and the most copied; GCP and Azure differ in detail but the shapes are the same.

Why network charges hide

Three things conspire to make network cost invisible until the bill arrives.

First, the unit is small. A gigabyte of egress at $0.09 is nine cents. Nine cents per gigabyte does not feel like a thing worth thinking about. The instinct is to compare it to the per-hour cost of an instance ($0.10 to several dollars) and conclude that bytes are noise. The error is that bytes accumulate at machine speeds. A service moving 1 Gbps continuously is moving 10.8 TB per day, which at $0.09 per GB is around $970 per day, or $30K per month, for one service.
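The accumulation is easy to check with a few lines of arithmetic. The following is a minimal sketch, not a pricing tool: the function name is made up for illustration, and it assumes a flat $0.09 per GB with no volume tiers.

```python
def egress_cost_per_day(gbps: float, usd_per_gb: float = 0.09) -> tuple[float, float]:
    """Return (TB moved per day, USD per day) for a sustained rate in gigabits per second."""
    gb_per_second = gbps / 8              # 8 bits per byte
    gb_per_day = gb_per_second * 86_400   # seconds per day
    return gb_per_day / 1000, gb_per_day * usd_per_gb

tb_per_day, usd_per_day = egress_cost_per_day(1.0)
print(f"{tb_per_day:.1f} TB/day, ${usd_per_day:,.0f}/day, ${usd_per_day * 30:,.0f}/month")
```

For a sustained 1 Gbps this reproduces the figures above: 10.8 TB per day, a little under $1K per day, and roughly $29K over a 30-day month.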

Second, the metering happens at infrastructure layers the application team does not look at. The application sees a successful HTTP response. The bill sees a charge for the bytes that flowed through a NAT gateway, a VPC peer, an internet gateway, or a cross-region link. The accounting is correct; it is just remote from the place where the cost is generated.

Third, the worst patterns look identical to the cheap patterns from the application’s point of view. A call to another service by hostname follows the same code path whether the target sits in the same availability zone (free or near-free) or in a different region (expensive). The network bill is where the geography matters, and geography is hidden from the code.

AWS egress: the headline number

The most prominent network charge is egress: traffic leaving AWS for the public internet. The AWS pricing page (https://aws.amazon.com/ec2/pricing/on-demand/, retrieved 2026-05-01) lists tiered rates that depend on region and volume. As of 2026 the typical bracket for North American and European regions is around $0.05 to $0.09 per GB, with discounts kicking in at the 10 TB and 50 TB monthly thresholds and bigger discounts negotiable above 500 TB.

The arithmetic is unforgiving once the volume is large. A team serving 1 TB per day of public traffic is paying roughly $80 per day, which is $2400 per month, which is $29K per year. A team serving 10 TB per day is paying $290K per year. A team serving 100 TB per month (which is not unusual for an application with an active mobile app or a public API) is paying around $8K per month for egress alone.
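Tiered pricing changes the totals slightly, because each slice of the monthly volume is charged at its own rate. The sketch below shows the mechanics only. The tier boundaries and rates are hypothetical values chosen to sit inside the bracket quoted above; they are not copied from the pricing page.

```python
# Hypothetical tiers: (upper bound of the tier in GB per month, USD per GB).
# Illustrative values only; the pricing page has the real numbers.
TIERS = [
    (10_000, 0.09),         # first 10 TB
    (50_000, 0.085),        # from 10 TB up to 50 TB
    (float("inf"), 0.07),   # everything above 50 TB
]

def monthly_egress_cost(gb: float, tiers=TIERS) -> float:
    """Charge each slice of the monthly volume at the rate of the tier it falls in."""
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        if gb <= lower:
            break
        cost += (min(gb, upper) - lower) * rate
        lower = upper
    return cost

print(f"${monthly_egress_cost(100_000):,.0f} for 100 TB in a month")
```

Under these assumed tiers, 100 TB in a month comes to $7,800, in line with the "around $8K" estimate.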

The horror stories that circulate in operations channels share a pattern: nobody knew the volume until the bill came. An open S3 bucket served as an unintentional public CDN and racked up tens of thousands of dollars in a weekend. A misconfigured cross-region replication backfilled multiple terabytes nightly. A cron job downloaded a hundred-gigabyte file every minute because someone copy-pasted a one-liner from an old runbook and removed the cache. Each of these is a configuration mistake; each costs more than the rest of the team’s compute combined for the month it ran undetected.

Cross-AZ: the silent multiplier

The second tier of charge is cross-availability-zone traffic within a region. The rate is around $0.01 per GB per direction in most AWS regions, sometimes $0.02. The number looks small. It is not.

Multi-AZ deployments are the standard pattern for resilience: put the application across three availability zones, put the database across three availability zones, run the load balancer across three. The cost of that pattern is that any traffic between the application and the database, between application instances, between application and cache, has a meaningful chance of crossing an AZ boundary and incurring the per-GB charge per direction.

A chatty service moving 1 Gbps cross-AZ is moving 10.8 TB per day. At $0.01 per GB per direction, that is around $108 per day per direction, or roughly $3.2K per direction per month, and about $6.5K per month once both directions are billed. A microservices architecture in which every internal call has a one-in-three chance of crossing an AZ produces costs of this shape almost everywhere. The bill aggregates them under a generic “regional data transfer” line that does not point at any one offender.
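The same estimate can be made from a monthly volume and a crossing probability. This is a rough sketch under stated assumptions: the function and its parameters are hypothetical, and it assumes every crossing gigabyte is billed twice, once on the sending side and once on the receiving side.

```python
def cross_az_monthly_cost(internal_gb_per_month: float,
                          crossing_fraction: float,
                          usd_per_gb_per_direction: float = 0.01) -> float:
    """Estimate the monthly cross-AZ charge for service-to-service traffic.

    Assumption: each crossing GB is charged in both directions
    (leaving the source AZ and entering the destination AZ).
    """
    crossing_gb = internal_gb_per_month * crossing_fraction
    return crossing_gb * usd_per_gb_per_direction * 2

# 300 TB of internal traffic per month, one call in three crossing an AZ boundary.
print(f"${cross_az_monthly_cost(300_000, 1 / 3):,.0f} per month")
```

With those inputs the estimate is about $2,000 per month for a single traffic flow, and none of it appears on any per-service graph.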

The architectural mitigation is locality. Pin chatty pairs of services to the same AZ where the resilience model allows it. Use AZ-aware load balancing so a service in AZ-a prefers a backend in AZ-a. Keep a cache replica in each AZ so that high-volume reads stay in-AZ, and let only the lower-volume writes and replication traffic cross AZ boundaries. The Discord case study (lesson 32) and the LinkedIn replication posts both touch on this; the pattern shows up in nearly every published case study at scale.
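AZ-aware selection can be as simple as a preference rule with a fallback. The sketch below is one possible shape, not a description of any particular load balancer; the `Backend` type and `pick_backend` function are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    address: str
    az: str
    healthy: bool = True

def pick_backend(backends: list[Backend], caller_az: str, rng=random) -> Backend:
    """Prefer a healthy backend in the caller's AZ; fall back to any healthy one."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends")
    local = [b for b in healthy if b.az == caller_az]
    # Cross the AZ boundary only when no local backend is available.
    return rng.choice(local or healthy)

pool = [Backend("10.0.1.5", "a"), Backend("10.0.2.5", "b"), Backend("10.0.3.5", "c")]
print(pick_backend(pool, caller_az="a").address)
```

The fallback is what keeps the resilience model intact: locality is a preference, so losing the local backends costs money rather than availability.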

NAT gateway: the per-hour and per-GB tax

The NAT gateway sits between private subnets and the internet, translating outbound traffic so private resources can reach external services without being publicly addressable. AWS prices it at around $0.045 per hour plus $0.045 per GB of data processed, per AZ.

The hourly charge alone is $32 per month per gateway. Multi-AZ deployments running three NAT gateways pay $96 per month before any traffic. Multiple environments (dev, staging, prod) multiply that. A medium-sized organisation can spend $1K per month on NAT-gateway baseline charges before processing a single byte.

The per-GB charge is what catches teams. Routing AWS-service traffic (S3 reads, DynamoDB queries, secrets fetches) through the NAT gateway means paying $0.045 per GB on top of any service charges. A pipeline that reads a terabyte of S3 data through a NAT gateway pays $45 in NAT charges for that read, on top of the S3 request and storage costs. Multiply by the number of pipeline runs, the number of environments, and the number of pipelines, and the line item becomes serious.
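Both components fit in one small function. This is a sketch using the approximate rates quoted above; the function name is hypothetical, and the 730-hour month is an averaging assumption.

```python
HOURS_PER_MONTH = 730  # average month; an approximation

def nat_monthly_cost(gateways: int, gb_processed: float,
                     usd_per_hour: float = 0.045,
                     usd_per_gb: float = 0.045) -> float:
    """Baseline hourly charge for every gateway plus the per-GB processing charge."""
    hourly = gateways * HOURS_PER_MONTH * usd_per_hour
    processing = gb_processed * usd_per_gb
    return hourly + processing

print(f"${nat_monthly_cost(3, 0):,.2f} baseline for three idle gateways")
print(f"${nat_monthly_cost(3, 30_000):,.2f} with 30 TB routed through them")
```

Three idle gateways come to just under $100 per month, matching the baseline above up to the hours-per-month convention. Adding 30 TB of NAT-routed reads puts the total at about $1,450, almost all of it the per-GB charge.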

VPC endpoints: the standard fix

VPC endpoints are the correct architectural answer to the NAT-routed AWS-service traffic. They provide a private path from the VPC to specific AWS services that does not traverse the NAT gateway and does not count as internet egress. Two flavours exist:

Gateway endpoints: free, available for S3 and DynamoDB. There is no excuse for not using them. A VPC routing S3 traffic without a gateway endpoint is paying NAT charges that disappear with a one-line route table change.

Interface endpoints: charged by the hour and per-GB processed but typically cheaper than NAT, available for most other AWS services (ECR, Secrets Manager, KMS, CloudWatch, and many more). Pricing as of 2026 is around $0.01 per hour per AZ per endpoint, plus around $0.01 per GB. The arithmetic favours interface endpoints whenever the NAT-routed volume is more than a few hundred GB per month.
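The break-even volume for an interface endpoint follows directly from those rates. The sketch below uses the approximate prices quoted in this lesson and a hypothetical function name. It counts only the NAT per-GB charge, on the assumption that the NAT gateway stays in place for other traffic, so its hourly charge is paid either way.

```python
HOURS_PER_MONTH = 730  # average month; an approximation

def interface_endpoint_break_even_gb(azs: int,
                                     endpoint_usd_per_hour: float = 0.01,
                                     endpoint_usd_per_gb: float = 0.01,
                                     nat_usd_per_gb: float = 0.045) -> float:
    """Monthly GB above which one interface endpoint is cheaper than NAT processing."""
    fixed = azs * HOURS_PER_MONTH * endpoint_usd_per_hour
    saving_per_gb = nat_usd_per_gb - endpoint_usd_per_gb
    if saving_per_gb <= 0:
        return float("inf")  # the endpoint never pays for itself
    return fixed / saving_per_gb

for azs in (1, 3):
    print(f"{azs} AZ(s): break-even near {interface_endpoint_break_even_gb(azs):,.0f} GB/month")
```

With these rates the crossover is a little over 200 GB per month for an endpoint in one AZ and a little over 600 GB per month for one deployed across three, which is where the "few hundred GB" rule of thumb comes from.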

The audit pattern is straightforward. List the AWS services your private resources call. Check whether each has an endpoint available. Compare the endpoint cost (hourly + per-GB) to the equivalent NAT cost. Switch the noisy ones. The work is mostly route-table and security-group plumbing; the savings show up on the next bill.
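For the gateway-endpoint case, the plumbing amounts to a single request that attaches the endpoint to the private route tables. The sketch below only builds the parameters for the EC2 CreateVpcEndpoint call. The helper name and the resource identifiers are hypothetical, and sending the request through boto3 is shown as a comment because it needs credentials and a real VPC.

```python
def gateway_endpoint_request(vpc_id: str, region: str, service: str,
                             route_table_ids: list[str]) -> dict:
    """Build keyword arguments for a gateway-type CreateVpcEndpoint request."""
    if service not in ("s3", "dynamodb"):
        raise ValueError("gateway endpoints exist only for S3 and DynamoDB")
    return {
        "VpcId": vpc_id,
        "VpcEndpointType": "Gateway",
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "RouteTableIds": list(route_table_ids),
    }

# Hypothetical identifiers, for illustration only.
request = gateway_endpoint_request("vpc-0example", "us-east-1", "s3", ["rtb-0example"])
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**request)
```

Interface endpoints take more parameters (subnets and security groups instead of route tables), which is where the security-group part of the work comes in.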

CDN: the counterintuitive saving

The instinct “put a CDN in front of public content” is partly performance (cached responses are closer to users) and partly cost. The cost piece is less obvious until the rates are compared.

Direct S3 egress to the public internet runs at the standard $0.05 to $0.09 per GB tier. Transfer from an S3 origin to CloudFront’s edge locations is not billed as internet egress (AWS does not charge for origin fetches from its own services), and CloudFront’s egress to end users runs at a separate tier with negotiated discounts that scale aggressively above 10 TB. For a high-volume public asset (images, videos, JavaScript bundles, downloadable files), CloudFront is typically 30 to 60 percent cheaper than direct S3 serving once the volume is meaningful, and the cache-hit rate further reduces origin load.

The pattern: any S3 bucket directly serving public traffic at scale is overpaying. Stick a CloudFront distribution in front, point the DNS at CloudFront, and the bill drops while latency improves. Cloudflare and Fastly have similar economics. The exact crossover volume depends on the negotiated rates, but the direction of the saving is consistent and the change is mechanical.

Egress as a primary metric

The thread running through the patterns above is that egress is a first-class operational metric, on the same level as CPU, memory, and request rate. A team treating it as a finance concern that looks at it monthly is a team that finds out about a problem long after it has become expensive.

The instrumentation that makes the difference:

Per-service egress dashboards. VPC flow logs, processed through a tool that bins traffic by source service, destination, and AZ pair, and presents a daily bill estimate. The first time a team sees this dashboard, the worst offender is usually obvious within a minute.
Anomaly alerts on the bill. Cost Explorer’s anomaly detection, or a third-party tool like Vantage or CloudHealth, alerts when daily spend deviates from the trend. A misconfigured replication or runaway cron triggers the alert hours after it starts, not weeks.
Egress budgets in the SLO framework. Lesson 60’s error-budget concept applies here: a service has a “monthly egress budget” the same way it has a “monthly downtime budget”. Going over is a signal worth investigating.
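The dashboard and the budget check can both be prototyped in a few lines before any tooling is bought. The sketch below assumes the flow-log records have already been parsed and mapped to service names, which raw flow logs do not provide. The record layout, function names, and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-parsed records: (service, source AZ, destination AZ, bytes).
RECORDS = [
    ("checkout", "a", "a", 40e9),
    ("checkout", "a", "b", 25e9),
    ("search",   "b", "c", 300e9),
    ("search",   "c", "c", 900e9),
]

def daily_cross_az_estimate(records, usd_per_gb_per_direction=0.01):
    """Sum cross-AZ gigabytes per (service, AZ pair) and price them in both directions."""
    totals = defaultdict(float)
    for service, src_az, dst_az, nbytes in records:
        if src_az != dst_az:  # same-AZ traffic is treated as free in this sketch
            totals[(service, src_az, dst_az)] += nbytes / 1e9
    return {key: gb * usd_per_gb_per_direction * 2 for key, gb in totals.items()}

def over_budget(month_to_date_usd, monthly_budget_usd, day, days_in_month=30):
    """Flag spend that is running ahead of a straight-line monthly egress budget."""
    return month_to_date_usd > monthly_budget_usd * day / days_in_month

estimate = daily_cross_az_estimate(RECORDS)
for key, usd in sorted(estimate.items(), key=lambda item: -item[1]):
    print(key, f"${usd:.2f}/day")
```

Sorting by estimated cost puts the worst offender at the top. The straight-line budget check is deliberately crude: it only says whether a service is spending faster than its monthly allowance permits.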

Diagram to create: a side-by-side cost map of two architectures. The left side shows a naive deployment: services in private subnets routing AWS-service traffic through a NAT gateway, multi-AZ chatter without locality awareness, and an S3 bucket serving public traffic directly. Each path is labelled with a per-GB rate. The right side shows the optimised version: gateway endpoints for S3 and DynamoDB, interface endpoints for the other AWS services, AZ-aware load balancing, and CloudFront in front of the public bucket. Same paths, much smaller per-GB labels.

A reasonable target architecture

For a team building or auditing a deployment in 2026, the cost-aware defaults are:

Gateway endpoints for S3 and DynamoDB on every VPC. Free; saves NAT charges; should be in the standard module.
Interface endpoints for any AWS service called at meaningful volume from private subnets. Audit annually as new services come into use.
AZ-aware service-to-service communication where the resilience model allows. Pin chatty pairs to the same AZ.
CloudFront (or equivalent CDN) in front of any S3 bucket serving public traffic above a few hundred GB per month.
VPC flow logs on, parsed into a per-service egress dashboard, reviewed monthly as a normal platform-team task.
Cost-anomaly alerts wired into the on-call channel from lesson 63, treating an unexpected egress spike as an incident worth paging on if the magnitude warrants it.

The compounding effect of these is large. A team that audits its NAT and AZ patterns and migrates to endpoints typically cuts the network bill by 40 to 70 percent without changing anything user-visible. The work is unglamorous and the savings recur every month for as long as the architecture stays. It is the kind of change that pays for the engineer’s salary several times over and never wins an architecture-review-board award.

What the next two lessons set up

This lesson covered the cost angle of network architecture. Lesson 69 takes the scaling angle: when the load grows ten times, which of these patterns survive and which become bottlenecks? Lesson 70 takes the latency angle, with caching as the main lever. The three together form the cost-performance triangle that Module 9 keeps coming back to: every architectural choice has implications on the bill, on the throughput ceiling, and on the response time, and the platform team’s job is to keep all three in view.

Citations and further reading

AWS, “EC2 On-Demand Pricing”, https://aws.amazon.com/ec2/pricing/on-demand/ (retrieved 2026-05-01). The data-transfer section is the canonical source for the egress and cross-AZ rates referenced in this lesson.
AWS, “VPC Pricing”, https://aws.amazon.com/vpc/pricing/ (retrieved 2026-05-01). NAT gateway and interface-endpoint rates.
AWS, “Amazon CloudFront Pricing”, https://aws.amazon.com/cloudfront/pricing/ (retrieved 2026-05-01). The CDN tiers and the per-region rates that drive the CloudFront-versus-S3 comparison.
Cloudflare, “AWS’s Egregious Egress” series on the Cloudflare blog, https://blog.cloudflare.com/aws-egregious-egress/ (retrieved 2026-05-01). A vendor-perspective but well-documented breakdown of how AWS’s egress pricing compares to alternatives, useful for the macro picture.
Google Cloud, “Network pricing”, https://cloud.google.com/vpc/network-pricing (retrieved 2026-05-01). For comparison; the structure parallels AWS.
Microsoft Azure, “Bandwidth pricing”, https://azure.microsoft.com/en-us/pricing/details/bandwidth/ (retrieved 2026-05-01). For comparison; again, similar structure.
Corey Quinn’s “Last Week in AWS” newsletter, https://www.lastweekinaws.com/ (retrieved 2026-05-01). Long-running commentary on AWS billing surprises, including many of the egress-bill stories that circulate in the community.