Multi-region Event Platform

A globally distributed event processing system handling 10k+ events per second at a measured p99 latency of 12ms.

Problem

A large-scale event-driven platform needed to process 10k+ events/second across three AWS regions while maintaining sub-20ms p99 latency — even during regional degradation.

Approach

The solution uses CQRS with an asynchronous event mesh built on EventBridge. Each region runs an independent processing stack with cross-region replication for event durability.
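
As a rough illustration of the CQRS split described above, the sketch below separates a write side that emits events from a read side that projects them into a query model. Everything here is an assumption for illustration: the order domain, the class names, and the in-memory EventBus, which stands in for the EventBridge-based mesh and delivers synchronously rather than asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    """Immutable event passed from the write side to the read side."""
    event_id: str
    detail_type: str
    detail: Dict[str, int]


class EventBus:
    """Hypothetical in-memory stand-in for the event mesh (synchronous)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        # Fan the event out to every subscriber; producers never call
        # consumers directly.
        for handler in self._subscribers:
            handler(event)


class OrderCommands:
    """Write side: validates a command and emits an event. Serves no reads."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def place_order(self, order_id: str, amount_cents: int) -> None:
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        self._bus.publish(
            Event(
                event_id=f"evt-{order_id}",
                detail_type="OrderPlaced",
                detail={"amount_cents": amount_cents},
            )
        )


class OrderTotalsView:
    """Read side: a projection built only from events it has received."""

    def __init__(self) -> None:
        self.order_count = 0
        self.total_cents = 0

    def apply(self, event: Event) -> None:
        if event.detail_type == "OrderPlaced":
            self.order_count += 1
            self.total_cents += event.detail["amount_cents"]


bus = EventBus()
view = OrderTotalsView()
bus.subscribe(view.apply)

commands = OrderCommands(bus)
commands.place_order("a1", 1500)
commands.place_order("a2", 500)
print(view.order_count, view.total_cents)
```

The point of the split is that queries read only from the projection, so the read model can be scaled or rebuilt per region without touching the command path.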

Key decisions:

  • EventBridge for decoupled event routing — removes tight coupling between producers and consumers
  • Lambda for processing — scales to zero, no idle capacity costs
  • DynamoDB Global Tables for state — automatic multi-region replication

Outcome

  • 12ms p99 latency (against a 40ms target)
  • 99.98% availability across a 6-month window
  • Zero regional failovers required during that period
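
For readers who want to sanity-check figures like these, the sketch below shows how an availability percentage translates into a downtime budget and how a p99 can be computed from raw latency samples with the nearest-rank method. It is illustrative only: the 182.5-day length for a six-month window is an assumption, the function names are hypothetical, and the latency samples are synthetic rather than measurements from this platform.

```python
from typing import Sequence


def downtime_budget_minutes(availability: float, window_days: float) -> float:
    """Minutes of unavailability permitted by an availability ratio."""
    window_minutes = window_days * 24 * 60
    return (1.0 - availability) * window_minutes


def p99_nearest_rank(samples_ms: Sequence[float]) -> float:
    """99th percentile by the nearest-rank method (no interpolation)."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    ordered = sorted(samples_ms)
    # rank = ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Assumption: a six-month window is treated as 182.5 days.
budget = downtime_budget_minutes(0.9998, 182.5)
print(f"{budget:.2f} minutes of allowed downtime")

# Synthetic samples: 1ms, 2ms, ..., 100ms.
print(p99_nearest_rank(list(range(1, 101))))
```

Under the 182.5-day assumption, 99.98% availability corresponds to a budget of about 52.6 minutes of downtime across the whole window.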