Error Handling
All pipeline components share a common error handling module located in
src/lib/error-handling/.
Overview
When an error occurs during message processing, the error handling policy classifies it and takes one of three actions:
- Retry: delay the message via me-pubsub-delayer for later redelivery
- Escalate: publish to the sending-mg-errors topic for investigation
- Drop: log and acknowledge silently (for known non-actionable errors)
For a visual flow, see the error handling diagram.
Retry with Exponential Backoff
Transient errors trigger a retry via the Pub/Sub Delay Service. The message is republished with an incremented retry count and a delay from the following schedule:
| Retry # | Delay |
|---|---|
| 1 | 1 second |
| 2 | 1 second |
| 3 | 2 seconds |
| 4 | 3 seconds |
| 5 | 7 seconds |
| 6 | 30 seconds |
Configuration: RETRY_MESSAGE_DELAYS_SECONDS (default: [1, 1, 2, 3, 7, 30])
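As a rough illustration of how the schedule above maps a retry number to a delay, the lookup might be sketched as follows. The helper name delayForRetry is hypothetical and not taken from the module. The sketch also assumes retry numbers are 1-based, as in the table.

```typescript
// Default schedule from the table above. In the real service this value
// comes from RETRY_MESSAGE_DELAYS_SECONDS; it is hard-coded here for clarity.
const RETRY_MESSAGE_DELAYS_SECONDS: readonly number[] = [1, 1, 2, 3, 7, 30];

// Hypothetical helper (not part of the documented module).
// Returns the delay in seconds for a 1-based retry number, or undefined
// once the retry number runs past the end of the schedule.
function delayForRetry(
  retryNumber: number,
  delays: readonly number[] = RETRY_MESSAGE_DELAYS_SECONDS,
): number | undefined {
  if (!Number.isInteger(retryNumber) || retryNumber < 1) {
    throw new RangeError(`retryNumber must be a positive integer, got ${retryNumber}`);
  }
  return delays[retryNumber - 1];
}
```

With the default schedule, retry 5 resolves to 7 seconds, and retry 7 has no entry, which is the point where the schedule counts as exhausted.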
Service-Retryable Errors (Infinite Retry)
For infrastructure-level errors (HTTP 423, 429, 500, 502, 503, 504), the retry count is reset to 0 once the delay schedule is exhausted. These errors are therefore retried indefinitely, because they represent transient infrastructure problems that are expected to resolve.
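The reset behaviour could look roughly like the sketch below. The function name nextRetryCount and the exact point at which the reset happens are assumptions made for illustration. Only the status codes and the reset to 0 come from the description above.

```typescript
// Status codes treated as service-retryable, as listed above.
const SERVICE_RETRYABLE_STATUS_CODES: ReadonlySet<number> = new Set([
  423, 429, 500, 502, 503, 504,
]);

// Hypothetical helper (not part of the documented module).
// Computes the retry count for the republished message:
//  - while the schedule still has entries, the count is incremented;
//  - once the schedule is exhausted, service-retryable errors restart at 0;
//  - any other error yields undefined, meaning no further retry.
function nextRetryCount(
  currentRetryCount: number,
  scheduleLength: number,
  httpStatus?: number,
): number | undefined {
  if (currentRetryCount < scheduleLength) {
    return currentRetryCount + 1;
  }
  if (httpStatus !== undefined && SERVICE_RETRYABLE_STATUS_CODES.has(httpStatus)) {
    return 0; // assumption: the reset is applied at the moment of exhaustion
  }
  return undefined;
}
```

Under these assumptions, a message that has used all six delays and fails again with a 503 starts over at retry count 0, while the same message failing with a 400 is not retried.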
Message Expiry
Messages are considered expired when:
- The retry count exceeds the number of configured delays and the error is not service-retryable, OR
- The time since the original eventTime exceeds 36 hours (configurable via EXPIRED_MESSAGE_AFTER_HOURS)
Expired messages are logged and dropped. If the expiry error is not a permanent or internal
normal error, it is also published to sending-mg-errors.
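The two expiry conditions can be expressed as a single predicate. The following is a minimal sketch, assuming a hypothetical ExpiryInput shape and an isExpired helper. Neither name comes from the module, and the real policy may structure this check differently.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Hypothetical input shape, for illustration only.
interface ExpiryInput {
  retryCount: number;           // retries attempted so far
  configuredDelayCount: number; // length of RETRY_MESSAGE_DELAYS_SECONDS
  isServiceRetryable: boolean;  // true for HTTP 423, 429, 500, 502, 503, 504
  eventTime: Date;              // original eventTime of the message
  now: Date;                    // current time, passed in to keep the check testable
}

// Hypothetical helper: true when either expiry condition holds.
// expireAfterHours stands in for EXPIRED_MESSAGE_AFTER_HOURS (default 36).
function isExpired(input: ExpiryInput, expireAfterHours: number = 36): boolean {
  const retriesExhausted =
    input.retryCount > input.configuredDelayCount && !input.isServiceRetryable;
  const ageMs = input.now.getTime() - input.eventTime.getTime();
  const tooOld = ageMs > expireAfterHours * MS_PER_HOUR;
  return retriesExhausted || tooOld;
}
```

Passing now explicitly rather than reading the clock inside the function is a design choice made for this sketch, so that both conditions can be exercised with fixed timestamps.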
Error Classification
Retriable Errors
Errors that trigger the retry policy:
- HTTP status codes: 423 (Locked), 429 (Too Many Requests), 500 (Internal Server Error), 502 (Bad Gateway), 503 (Service Unavailable), 504 (Gateway Timeout)
- TopicPublishError: Pub/Sub publish failures
- ClientClosedError: gRPC client closed
Permanent Errors
Errors that are logged and dropped silently (no retry, no escalation):
- HTTP status codes: 404 (Not Found), 410 (Gone)
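To make the classification concrete, here is a minimal sketch that maps an error to one of the three actions from the overview. It is not the code in is-error-retriable.ts. The function names are hypothetical, errors are matched by their name property, and HTTP failures are assumed to carry a numeric status property. Treating every unclassified error as an escalation is also an assumption.

```typescript
type ErrorAction = "retry" | "escalate" | "drop";

const RETRIABLE_STATUS_CODES: ReadonlySet<number> = new Set([
  423, 429, 500, 502, 503, 504,
]);
const PERMANENT_STATUS_CODES: ReadonlySet<number> = new Set([404, 410]);
const RETRIABLE_ERROR_NAMES: ReadonlySet<string> = new Set([
  "TopicPublishError",
  "ClientClosedError",
]);

// Assumption: HTTP errors expose a numeric `status` property.
function httpStatusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Hypothetical classifier: retriable errors are retried, permanent errors
// are dropped, and anything else is escalated (assumed default).
function classifyError(error: unknown): ErrorAction {
  if (error instanceof Error && RETRIABLE_ERROR_NAMES.has(error.name)) {
    return "retry";
  }
  const status = httpStatusOf(error);
  if (status !== undefined) {
    if (RETRIABLE_STATUS_CODES.has(status)) return "retry";
    if (PERMANENT_STATUS_CODES.has(status)) return "drop";
  }
  return "escalate";
}
```

For example, an error carrying status 503 is classified as "retry", one carrying status 410 as "drop", and an error with no recognised status or name as "escalate".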
Code Location
- Policy: src/lib/error-handling/policy.ts
- Classification: src/lib/error-handling/is-error-retriable.ts
- Custom errors: src/lib/error-handling/errors.ts