Error Handling

Workers

inapp-audience-contact-deletion

  • Check the inapp-audience.contact-deletion Pub/Sub subscription.

  • Check the me-inapp-audience-contact-deletion horizontal pod autoscaler (HPA) in k9s. The maximum number of pods may be too low to keep up with the number of messages in the subscription. In that case, edit the HPA and increase the maxReplicas value.

  • Check the logs in Kibana for warnings and errors. There may be many errors, or the workers may be stuck and unable to make progress.
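
When deciding whether maxReplicas is too low, a rough back-of-the-envelope estimate can help. The sketch below is only an illustration: the per-pod throughput and the target drain time are hypothetical inputs that you would have to read off your own metrics, and it ignores new messages arriving while the backlog drains.

```python
import math


def replicas_needed(backlog: int, msgs_per_pod_per_min: int, drain_minutes: int) -> int:
    """Estimate how many pods are needed to drain `backlog` messages
    within `drain_minutes`, given an assumed per-pod throughput."""
    if msgs_per_pod_per_min <= 0 or drain_minutes <= 0:
        raise ValueError("throughput and drain time must be positive")
    if backlog < 0:
        raise ValueError("backlog must not be negative")
    # Capacity of a single pod over the whole drain window.
    per_pod_capacity = msgs_per_pod_per_min * drain_minutes
    return max(1, math.ceil(backlog / per_pod_capacity))


def should_raise_max_replicas(current_max: int, backlog: int,
                              msgs_per_pod_per_min: int, drain_minutes: int) -> bool:
    """True if the estimated pod count exceeds the HPA's current maxReplicas."""
    return replicas_needed(backlog, msgs_per_pod_per_min, drain_minutes) > current_max


if __name__ == "__main__":
    # Hypothetical numbers: 12,000 undelivered messages, 200 msgs/min per pod,
    # and a wish to drain the backlog within 10 minutes.
    print(replicas_needed(12_000, 200, 10))
    print(should_raise_max_replicas(4, 12_000, 200, 10))
```

If the estimate is above the current maxReplicas, raising the limit is a reasonable first step. If it is below, the logs are the more likely place to find the cause.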

Error Queues

All queues related to inapp have an error queue, where messages that can’t be processed end up. These queues should always be empty. They are monitored, and an alert is triggered if they are not.

When inspecting a message in the RabbitMQ Admin UI, you should see the cause of the error.
Any message can be retried by moving it back into the original queue, which has the same name as the error queue without the -error suffix. Use the Move Messages functionality in the RabbitMQ Admin UI.
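
The naming rule for the retry target can be written down as a small helper. This is a minimal sketch of the suffix convention only. The function name is made up for illustration, and it does not talk to RabbitMQ: the move itself is still done in the Admin UI.

```python
ERROR_SUFFIX = "-error"


def original_queue(error_queue: str) -> str:
    """Return the queue that messages from `error_queue` should be moved back to,
    i.e. the error queue name with the trailing "-error" removed."""
    if not error_queue.endswith(ERROR_SUFFIX) or error_queue == ERROR_SUFFIX:
        raise ValueError(f"not an error queue name: {error_queue!r}")
    return error_queue[: -len(ERROR_SUFFIX)]


if __name__ == "__main__":
    for name in ("batch-inapp-error", "tx-inapp-error", "inapp-remove-error"):
        print(name, "->", original_queue(name))
```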

batch-inapp-error

Failed writing contact-ids from a segment to an in-app audience.

tx-inapp-error

Failed writing contact-ids from an RTI program to an in-app audience. This is typically time-critical and should be handled as quickly as possible.

inapp-remove-error

Failed removing contact-ids from an in-app audience. The request comes from either an RTI program or an audience update. This is typically time-critical and should be handled as quickly as possible.

inapp-update-audience-error

The update of an audience failed with an unexpected error. When inspecting the message in the RabbitMQ Admin UI, you should see the cause of the error.

In-app audiences are updated on a daily basis, so this is not urgent.

inapp-init-audience-error

The initialization of an audience failed with an unexpected error. When inspecting the message in the RabbitMQ Admin UI, you should see the cause of the error.

inapp-cleanup-campaign-error

The cleanup of a campaign has failed in an unexpected way. The message contains the customerId and the campaignId along with the error message.

This is just maintenance and not urgent.
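
When several error queues hold messages at once, the urgency notes above suggest an order in which to look at them. The sketch below encodes only what this page states: two queues are time-critical and two are not urgent. Queues without a stated urgency are placed in between, which is an assumption made here, not a documented rule.

```python
# Urgency as described on this page.
TIME_CRITICAL = {"tx-inapp-error", "inapp-remove-error"}
NOT_URGENT = {"inapp-update-audience-error", "inapp-cleanup-campaign-error"}


def urgency_rank(queue: str) -> int:
    """Lower rank means handle sooner. Unlisted queues get the middle rank (assumption)."""
    if queue in TIME_CRITICAL:
        return 0
    if queue in NOT_URGENT:
        return 2
    return 1


def triage_order(non_empty_queues: list[str]) -> list[str]:
    """Sort non-empty error queues by urgency, then by name for a stable order."""
    return sorted(non_empty_queues, key=lambda q: (urgency_rank(q), q))


if __name__ == "__main__":
    print(triage_order([
        "inapp-cleanup-campaign-error",
        "batch-inapp-error",
        "tx-inapp-error",
    ]))
```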

init-audience-sxi

  • Scale up the me-inapp-init-audience-sxi deployment in k9s or a similar tool

  • Check logs for recurring errors

Known errors

decryption failed or bad record

An error containing "decryption failed or bad record" does not need manual handling, because the operation will be retried. If it came from the UI, the customer will see an error message and retry manually. Here is an example error message:

Application: me-inapp-web (gap-production)
Error Name: Gap Mobile Engage Application Errors Node.JS
Error Code: GENERIC_ERROR
Error Message: with "campaign_priorities" as (
          SELECT
            id AS cp_id,
            ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY rank DESC) AS priority
          FROM campaign
          WHERE customer_id = $1
        ) select "dbid", "id", "customer_id", "application_id", "application_code", "type", "name", "status", "source", "segment_id", "trigger_type", "trigger_id", "device_filter", "event_attributes_filter", "rank", "exit_on_click", "min_interval_between_clicks", "max_impressions", "min_interval_between_impressions", "personalized", "created_at", "updated_at", "launched_at", "scheduled_at", "started_at", "ended_at", "paused_at", "canceled_at", "cleaned_up_at", "default_language", "time_zone", "triggerable_by_push", "version", "update_from_segment", "on_event_actions", "sync_interval_in_hours", "business_area_id", steps IS NOT NULL AS is_multi_step,
        CASE WHEN (
          ended_at IS NULL OR
          NOW() <= (ended_at AT TIME ZONE (CASE WHEN time_zone IS NULL THEN 'Pacific/Pago_Pago' ELSE time_zone END))
        ) AND canceled_at IS NULL
        THEN false ELSE true END
        AS is_archived
      , "campaign_priorities".* from "campaign" inner join "campaign_priorities" on "campaign"."id" = "campaign_priorities"."cp_id" where "customer_id" = $2 order by "rank" desc - 40B2858DCB7D0000:error:0A000119:SSL routines:tls_get_more_records:decryption failed or bad record mac:../deps/openssl/openssl/ssl/record/methods/tls_common.c:869:
fastify-error-handler
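
When scanning alerts or logs, a known error like the one above can be recognized by its marker text. The following is a minimal sketch, assuming the error message is available as a plain string. The marker list contains only the one known error from this page, and the function name is hypothetical.

```python
# Marker substrings of known errors that are retried and need no manual action.
KNOWN_RETRIED_MARKERS = (
    "decryption failed or bad record",
)


def is_known_retried_error(error_message: str) -> bool:
    """Return True if the message contains one of the known marker substrings."""
    lowered = error_message.lower()
    return any(marker in lowered for marker in KNOWN_RETRIED_MARKERS)


if __name__ == "__main__":
    sample = (
        "error:0A000119:SSL routines:tls_get_more_records:"
        "decryption failed or bad record mac"
    )
    print(is_known_retried_error(sample))
    print(is_known_retried_error("connection refused"))
```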