Multi-region Event Platform
A globally distributed event processing system handling 10k+ events per second at 12ms p99 latency.
Problem
A large-scale event-driven platform needed to process 10k+ events/second across three AWS regions while maintaining sub-20ms p99 latency — even during regional degradation.
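Little's law (in-flight work = arrival rate × time each item spends in the system) gives a rough sense of the concurrency this workload implies. The sketch below treats the 20ms latency budget as an upper bound on per-event processing time and assumes one invocation per event. Both are illustrative assumptions, not measured figures from this system, and `required_concurrency` is a hypothetical helper.

```python
import math


def required_concurrency(events_per_second: float, duration_ms: float) -> int:
    """Little's law: concurrent executions = arrival rate x time per event."""
    in_flight = events_per_second * duration_ms / 1000.0
    # Round up: a fractional execution still occupies a whole concurrency slot.
    return math.ceil(in_flight)


# Assumption: one invocation per event, each running for the full 20ms budget.
print(required_concurrency(10_000, 20))  # 200
```

Under those assumptions, the platform needs on the order of 200 concurrent executions across all regions, and fewer if handlers finish well inside the budget.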
Approach
The solution uses CQRS with an asynchronous event mesh built on EventBridge. Each region runs an independent processing stack with cross-region replication for event durability.
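A minimal in-memory sketch of the CQRS split follows: the write side validates commands and publishes events, and a separate read model (projection) is updated from those events and answers queries. The order domain and all class names are hypothetical. The bus here is synchronous and in-process, whereas the described system routes events asynchronously through EventBridge.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    detail_type: str
    detail: dict


class EventBus:
    """In-memory stand-in for an event bus: publishers never reference subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, detail_type: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[detail_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers[event.detail_type]:
            handler(event)


class OrderCommands:
    """Write side: validates a command and emits an event; it never serves reads."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def place_order(self, order_id: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._bus.publish(Event("OrderPlaced", {"order_id": order_id, "amount": amount}))


class OrderTotalsView:
    """Read side: a projection built from events and queried directly."""

    def __init__(self, bus: EventBus) -> None:
        self.totals: dict[str, int] = {}
        bus.subscribe("OrderPlaced", self._on_order_placed)

    def _on_order_placed(self, event: Event) -> None:
        order_id = event.detail["order_id"]
        self.totals[order_id] = self.totals.get(order_id, 0) + event.detail["amount"]

    def total_for(self, order_id: str) -> int:
        return self.totals.get(order_id, 0)


bus = EventBus()
view = OrderTotalsView(bus)
commands = OrderCommands(bus)
commands.place_order("o-1", 30)
commands.place_order("o-1", 12)
print(view.total_for("o-1"))  # 42
```

Because the write side only knows the bus, new read models can be added by subscribing to the same events without touching the command handlers.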
Key decisions:
- EventBridge for event routing: producers publish events without knowing which consumers subscribe, so each side can change independently
- Lambda for processing — scales to zero, no idle capacity costs
- DynamoDB Global Tables for state — automatic multi-region replication
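The sketch below shows one way a processing function might consume an EventBridge event and update state safely. EventBridge delivers events to targets at least once, and Global Tables reconcile concurrent cross-region writes with last-writer-wins by default, so the handler skips duplicate event ids and ignores updates older than the stored version. The in-memory `STATE_TABLE` and `PROCESSED_EVENT_IDS` are hypothetical stand-ins for DynamoDB tables; a real handler would persist both, for example with a conditional write (`ConditionExpression`). The payload fields `entity_id`, `version`, and `value` are also assumptions.

```python
from typing import Any

# Hypothetical in-memory stand-ins for DynamoDB tables. They do not persist
# across Lambda invocations; a real handler would store this state in DynamoDB.
STATE_TABLE: dict[str, dict[str, Any]] = {}
PROCESSED_EVENT_IDS: set[str] = set()


def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Process one EventBridge event delivered to Lambda.

    Delivery is at-least-once, so duplicates are detected by the event id.
    """
    event_id = event["id"]
    if event_id in PROCESSED_EVENT_IDS:
        return {"status": "duplicate", "id": event_id}

    detail = event["detail"]
    key = detail["entity_id"]  # hypothetical payload field
    current = STATE_TABLE.get(key)
    # Apply only updates newer than the stored version, so a late or
    # out-of-order delivery cannot overwrite fresher state.
    if current is None or detail["version"] > current["version"]:
        STATE_TABLE[key] = {
            "version": detail["version"],
            "value": detail["value"],
            "region": event["region"],
        }
        outcome = "applied"
    else:
        outcome = "stale"

    PROCESSED_EVENT_IDS.add(event_id)
    return {"status": outcome, "id": event_id}


sample = {
    "id": "evt-1",
    "detail-type": "EntityUpdated",  # hypothetical detail type
    "source": "example.producer",  # hypothetical source
    "region": "us-east-1",
    "detail": {"entity_id": "a", "version": 2, "value": "x"},
}
print(handler(sample)["status"])  # applied
print(handler(sample)["status"])  # duplicate
```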
Outcome
- 12ms p99 latency, inside the sub-20ms target (down from 40ms)
- 99.98% availability across a 6-month window
- Zero regional failovers required during that period
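To put the availability figure in concrete terms, the short calculation below converts it into a downtime budget. It assumes the 6-month window is 182.5 days (half of a 365-day year); `allowed_downtime_minutes` is a hypothetical helper.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: float) -> float:
    """Unavailable time permitted by an availability percentage over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - availability_pct) / 100.0


# Assumption: a 6-month window approximated as 182.5 days.
print(round(allowed_downtime_minutes(99.98, 182.5), 1))  # 52.6
```

In other words, 99.98% over that window corresponds to roughly 52 to 53 minutes of total unavailability.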