One of the most meaningful projects I delivered recently was a complete notification system for an international client whose application already ran on AWS: a backend on EC2, RDS for data, and S3 for storage. The challenge was to design a system that could reliably send notifications across multiple channels while handling dozens of distinct trigger points and complex scheduling rules.
The Challenge: Orchestrating Chaos
The application had roughly 30 to 40 triggers tied to user actions, system workflows, and business logic. Between them, they called for:
- Real-time Delivery: Booking updates, payment confirmations, and account security alerts.
- Scheduled Follow-ups: Reminders, abandoned cart recovery, and inactivity nudges.
- Multi-Channel Support: Push notifications (Firebase), Email (Brevo), and WhatsApp (Brevo).
The goal was to design a single, unified system that could manage all of this cleanly, consistently, and reliably.
The Solution: Event-Driven Architecture
To meet these requirements, I designed an event-driven pipeline using Amazon EventBridge, EventBridge Scheduler, Lambda, and SQS. Here’s how data flows through the system:
1. Event Ingestion
The backend (Express.js) emits a standardized notification event whenever a trigger fires. It determines whether the notification needs instant processing or delayed delivery. This decoupling allows the core application to "fire and forget" without worrying about delivery mechanics.
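As a rough illustration of what a "standardized notification event" could look like, here is a minimal TypeScript sketch. The field names (`trigger`, `channels`, `sendAt`) and the `buildNotificationEvent` helper are hypothetical and not the client's actual schema; the only idea taken from the project is that one shape carries both instant and delayed notifications.

```typescript
// Hypothetical event shape; field names are illustrative only.
type Channel = "push" | "email" | "whatsapp";

interface NotificationEvent {
  trigger: string;               // e.g. "payment.confirmed"
  userId: string;
  channels: Channel[];
  data: Record<string, string>;  // dynamic values merged into templates later
  sendAt?: string;               // ISO 8601 UTC timestamp; absent means "send now"
}

// Builds one event in the shared shape, whatever the trigger is.
function buildNotificationEvent(
  trigger: string,
  userId: string,
  channels: Channel[],
  data: Record<string, string>,
  sendAt?: Date,
): NotificationEvent {
  if (channels.length === 0) {
    throw new Error(`trigger ${trigger} has no channels`);
  }
  const event: NotificationEvent = { trigger, userId, channels, data };
  if (sendAt !== undefined) {
    event.sendAt = sendAt.toISOString();
  }
  return event;
}
```

A route handler would call `buildNotificationEvent(...)` and hand the result to the routing layer, without knowing which provider eventually delivers it.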
2. Intelligent Routing
We used a split-routing strategy to handle timing:
- Immediate Notifications: Sent directly to EventBridge, which forwards them straight to our SQS queue.
- Scheduled Notifications: Sent to EventBridge Scheduler, which holds the event and releases it into the same SQS queue at the scheduled time.
This approach allowed us to treat both instant and delayed messages uniformly once they reached the processing layer.
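The split can be sketched as a pure function that decides which path an event takes and builds the matching request body. This is a sketch under assumptions: `NotificationDetail`, `RoutingConfig`, the `"app.notifications"` source name, and the rule that a past `sendAt` is sent immediately are all illustrative. The output fields mirror the EventBridge `PutEvents` entry and the EventBridge Scheduler `CreateSchedule` input, including the one-time `at(yyyy-mm-ddThh:mm:ss)` expression, but no AWS call is made here.

```typescript
// Hypothetical shapes; not the client's real schema.
interface NotificationDetail {
  id: string;       // unique per notification; reused as the schedule name
  trigger: string;  // e.g. "cart.abandoned"
  sendAt?: string;  // ISO 8601 timestamp; absent means "send now"
}

interface RoutingConfig {
  queueArn: string;          // SQS queue that both paths deliver to
  schedulerRoleArn: string;  // IAM role EventBridge Scheduler assumes
}

type Route =
  | { kind: "immediate"; entry: { Source: string; DetailType: string; Detail: string } }
  | {
      kind: "scheduled";
      schedule: {
        Name: string;
        ScheduleExpression: string;
        ScheduleExpressionTimezone: string;
        FlexibleTimeWindow: { Mode: "OFF" };
        ActionAfterCompletion: "DELETE";
        Target: { Arn: string; RoleArn: string; Input: string };
      };
    };

function routeNotification(detail: NotificationDetail, now: Date, cfg: RoutingConfig): Route {
  const payload = JSON.stringify(detail);
  const dueMs = detail.sendAt === undefined ? now.getTime() : new Date(detail.sendAt).getTime();
  if (Number.isNaN(dueMs)) {
    throw new Error(`invalid sendAt: ${detail.sendAt}`);
  }
  // Assumption: anything already due goes down the immediate path.
  if (dueMs <= now.getTime()) {
    return {
      kind: "immediate",
      entry: { Source: "app.notifications", DetailType: detail.trigger, Detail: payload },
    };
  }
  // One-time schedules use at(yyyy-mm-ddThh:mm:ss); normalize to UTC, drop millis.
  const at = new Date(dueMs).toISOString().slice(0, 19);
  return {
    kind: "scheduled",
    schedule: {
      Name: `notify-${detail.id}`,
      ScheduleExpression: `at(${at})`,
      ScheduleExpressionTimezone: "UTC",
      FlexibleTimeWindow: { Mode: "OFF" },
      ActionAfterCompletion: "DELETE", // remove the schedule once it has fired
      Target: { Arn: cfg.queueArn, RoleArn: cfg.schedulerRoleArn, Input: payload },
    },
  };
}
```

The caller would pass `entry` to a `PutEvents` call or `schedule` to a `CreateSchedule` call. One detail to watch: EventBridge delivers the payload wrapped in its event envelope (under `detail`) unless the rule target reshapes it, while Scheduler delivers `Input` as is, so the consumer has to accept both or the rule has to normalize them.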
3. The Processing Brain (AWS Lambda)
An SQS trigger invokes our core Lambda function, which performs three critical steps for every message:
- Context Enrichment: Fetches user preferences and merges dynamic data into payloads.
- Template Resolution: Looks up the correct template IDs for Brevo or Firebase.
- Channel Dispatch: Routes the payload to the appropriate provider (Firebase for Push, Brevo for Email/WhatsApp).
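The three steps above can be sketched as one pure planning function. This is not the production Lambda: the `UserPrefs` shape, the opt-out rule, and the template table are hypothetical, and the real function would be invoked from SQS records and call the Firebase and Brevo APIs. Only the channel-to-provider mapping (Firebase for push, Brevo for email and WhatsApp) comes from the project.

```typescript
type Channel = "push" | "email" | "whatsapp";
type Provider = "firebase" | "brevo";

// Hypothetical shapes for illustration.
interface IncomingEvent {
  trigger: string;
  userId: string;
  channels: Channel[];
  data: Record<string, string>;
}
interface UserPrefs {
  name: string;
  optedOut: Channel[];
}
interface DispatchRequest {
  provider: Provider;
  channel: Channel;
  templateId: string;
  params: Record<string, string>;
}
type TemplateTable = Record<string, Partial<Record<Channel, string>>>;

const PROVIDER_FOR: Record<Channel, Provider> = {
  push: "firebase",
  email: "brevo",
  whatsapp: "brevo",
};

function planDispatch(
  event: IncomingEvent,
  prefs: UserPrefs,
  templates: TemplateTable,
): DispatchRequest[] {
  // 1. Context enrichment: merge user data into the dynamic payload.
  const params: Record<string, string> = { ...event.data, name: prefs.name };

  const requests: DispatchRequest[] = [];
  for (const channel of event.channels) {
    if (prefs.optedOut.includes(channel)) continue; // respect user preferences

    // 2. Template resolution: one template ID per trigger and channel.
    const templateId = templates[event.trigger]?.[channel];
    if (templateId === undefined) {
      throw new Error(`no ${channel} template for trigger ${event.trigger}`);
    }

    // 3. Channel dispatch: pick the provider that owns this channel.
    requests.push({ provider: PROVIDER_FOR[channel], channel, templateId, params });
  }
  return requests;
}
```

Keeping the planning step free of network calls makes it easy to unit test each trigger's routing; the Lambda handler then only has to execute the returned requests and report failures.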
Resilience: Handling Failures Gracefully
In distributed systems, failures are inevitable. To ensure reliability, we implemented a retry mechanism. If a notification fails because of an external API outage (e.g., Brevo is down), the event is moved to a holding queue that works like a dead-letter queue (DLQ) and is then re-injected into SQS with exponential backoff. This keeps a temporary provider outage from silently dropping messages.
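To make the backoff concrete, here is a minimal sketch of how the retry delay could be computed. The base delay of 30 seconds and the limit of 5 attempts are assumed values, not the project's actual settings. The 900-second cap reflects the maximum `DelaySeconds` that SQS accepts on a message (15 minutes).

```typescript
// SQS rejects DelaySeconds above 900 (15 minutes), so the backoff is capped there.
const SQS_MAX_DELAY_SECONDS = 900;

// Delay before the next try, given how many attempts have already failed.
// baseSeconds = 30 is an assumed value for illustration.
function backoffDelaySeconds(failedAttempts: number, baseSeconds: number = 30): number {
  if (!Number.isInteger(failedAttempts) || failedAttempts < 1) {
    throw new Error("failedAttempts must be a positive integer");
  }
  const delay = baseSeconds * Math.pow(2, failedAttempts - 1); // 30, 60, 120, ...
  return Math.min(delay, SQS_MAX_DELAY_SECONDS);
}

type RetryDecision =
  | { action: "retry"; delaySeconds: number }
  | { action: "dead-letter" };

// Decide whether to re-inject the message or park it for manual inspection.
// maxAttempts = 5 is an assumed limit.
function planRetry(failedAttempts: number, maxAttempts: number = 5): RetryDecision {
  if (failedAttempts >= maxAttempts) {
    return { action: "dead-letter" };
  }
  return { action: "retry", delaySeconds: backoffDelaySeconds(failedAttempts) };
}
```

With these assumed values a message is retried after 30, 60, 120, and 240 seconds and then parked. Adding random jitter to each delay is a common refinement so that many messages failing at once do not all retry at the same moment.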
Impact
This solution transformed the client's communication strategy. It replaced scattered cron jobs and inline API calls with a centralized, scalable infrastructure. We improved user engagement through timely nudges, increased completion rates, and most importantly, gave the client the flexibility to add new notification rules without touching the core codebase.