How I designed a CDN-backed AEM content delivery system for 50M+ page views

When you’re serving a global automotive brand’s owner manual to millions of drivers across dozens of markets, “it works on my machine” doesn’t cut it. This is the story of how I architected a CDN-backed content delivery system on top of AEM 6.5 that improved page load speed by 25% and grew organic traffic by 15%.

The starting point and what was wrong

The platform was using AEM publish instances as the primary content delivery mechanism. Every user request hit the Dispatcher first, and cache misses went straight to the publish cluster. During marketing campaigns — new model launches, service bulletins — traffic would spike 10x and the publish tier would buckle under the load.

The immediate instinct was “just add more publish instances.” I pushed back. Scaling horizontally buys time; it doesn’t fix the architecture. The real fix was to take the origin out of the request path wherever possible.

The architecture decision: hybrid static + SSR

I evaluated three approaches:

  • Full SSR on every request — simplest to implement, worst performance under load
  • Full static generation — fastest delivery, but breaks personalisation and dynamic content
  • Hybrid: static + edge SSR for dynamic paths — best of both, highest implementation complexity

We chose hybrid. AEM author activates content → replication agent pushes static HTML to S3 → CloudFront serves the static HTML from its edge caches, with S3 as the origin → Lambda@Edge handles dynamic personalisation (language, region, model year), with no round trip to the AEM origin for 90% of requests.

// Content delivery decision tree
Request arrives at CloudFront
  ├── Cache HIT (static S3)     → serve, TTFB ~180ms, no origin touch
  ├── Dynamic path (lang/region) → Lambda@Edge → inject personalisation
  └── Cache MISS                → Dispatcher → Publish → JCR fetch → cache + serve
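
To make the dynamic branch concrete, here is a minimal sketch of a Lambda@Edge viewer-request handler in Python. It is not the production code. It assumes personalisation by language can be done by rewriting the URI to a language-specific static object, and the language list, default language, and `/manual/` prefix are hypothetical. It also ignores `q` weights in `Accept-Language` and simply takes the first supported tag in listed order.

```python
# Hypothetical values for illustration; the real market list and paths will differ.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "ja"}
DEFAULT_LANGUAGE = "en"
DYNAMIC_PREFIX = "/manual/"


def pick_language(accept_language: str) -> str:
    """Return the first supported primary language tag, in listed order."""
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()  # drop any ";q=..." weight
        primary = tag.split("-")[0]               # "de-DE" -> "de"
        if primary in SUPPORTED_LANGUAGES:
            return primary
    return DEFAULT_LANGUAGE


def handler(event, context):
    """Viewer-request handler: route dynamic paths to a per-language object."""
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    # Static paths pass through untouched and are served from the edge cache.
    if not uri.startswith(DYNAMIC_PREFIX):
        return request

    # CloudFront lower-cases header names and wraps each value in a list.
    values = request.get("headers", {}).get("accept-language", [])
    accept = values[0]["value"] if values else ""

    # Rewrite to the language-specific object, e.g. /de/manual/brakes.html.
    request["uri"] = f"/{pick_language(accept)}{uri}"
    return request
```

Rewriting the URI rather than generating HTML in the function keeps the response cacheable per language, so the function only does a cheap string operation on each request.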

The tradeoffs we accepted

Nothing in system design is free. Hybrid static + SSR introduced cache invalidation complexity. When an author updates content in AEM, we need to invalidate the correct CloudFront paths. We solved this with a custom AEM replication agent that fires CloudFront invalidation API calls on activation events.
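
The real agent lives inside AEM, so the following Python is only a sketch of the shape of the invalidation call, not the agent itself. The content root and the JCR-to-URL mapping are hypothetical. `client` is assumed to be a boto3 CloudFront client (`boto3.client("cloudfront")`), whose `create_invalidation` call takes a `DistributionId` and an `InvalidationBatch`.

```python
import time

# Hypothetical JCR root; the real content tree will differ.
CONTENT_ROOT = "/content/owner-manual"


def to_public_path(jcr_path: str) -> str:
    """Map an activated JCR path to the public URL path served by CloudFront."""
    if not jcr_path.startswith(CONTENT_ROOT + "/"):
        raise ValueError(f"path outside content root: {jcr_path}")
    return jcr_path[len(CONTENT_ROOT):] + ".html"


def build_invalidation_batch(jcr_paths, caller_reference: str) -> dict:
    """Build the InvalidationBatch structure, de-duplicating paths."""
    items = sorted({to_public_path(p) for p in jcr_paths})
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CloudFront uses CallerReference to detect repeated requests.
        "CallerReference": caller_reference,
    }


def invalidate(client, distribution_id: str, jcr_paths):
    """Send one invalidation request covering all activated pages."""
    batch = build_invalidation_batch(jcr_paths, str(time.time_ns()))
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=batch,
    )
```

Batching all paths from one activation into a single request keeps the number of API calls low when an author publishes a whole section at once.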

We also accepted eventual consistency — between an author publishing content and CloudFront serving the new version, there’s a propagation window of 30–90 seconds depending on edge location. For an owner manual, this was acceptable. For breaking news, it wouldn’t be.

Results after 6 months in production

+25% page load speed improvement
+15% organic traffic growth
-60% origin hits on peak days
~180ms TTFB on cached paths

The system has now handled multiple major campaign spikes without any scaling interventions. The publish tier runs at consistently low CPU. AWS costs are predictable. And the content team can publish globally without worrying about infrastructure.

The lesson: don’t scale the problem, architect around it.
