Miscellaneous Alerts not Covered by Other Sections

PagerDuty AWS Dynamo Alerts

The alert description tells you whether it is an AWS Dynamo alert. If it is, follow the details given in the alert description.

[SDS] SDS error on ems-segment-diff

If this error occurs, examine the laas log of the segment-diff service.

Error log entries can be found by filtering for level: ERROR

Sometimes a temporary error occurs when getting a database connection:

exception:  java.sql.SQLTransientConnectionException: slick.dbs.default.db - Connection is not available, request timed out after 3001ms.
            ...

If the error occurs only once and does not persist, the alert can simply be resolved.

update of bq table push_token_states failed

Two things could have failed:

  1. the update of the states table `ems-mobile-engage.segmentation_states.push_token_states` in BigQuery, which is done by the scheduled query `push-token-states-updater`, run every day at 2:00 AM UTC

  2. the script that checks whether the new partition of the states table exists, which is server/processes/scripts/run-check-bq-push_token_states-update.js in push-notification-service, scheduled every day at 10 UTC

To find out whether the scheduled query has run correctly, check its run history in BigQuery. You can additionally check whether the new partition exists by issuing the following query (with the date set to today):

SELECT COUNT(*) AS num
FROM `ems-mobile-engage.segmentation_states.push_token_states`
WHERE DATE(_PARTITIONTIME) = CURRENT_DATE() -- today

Run this query for yesterday as well (DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)). Today's count should be roughly the same as yesterday's.
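The "roughly the same" comparison can be made explicit with a small helper. This is only a sketch: the function name counts_roughly_equal and the default tolerance of 10 percent are assumptions for illustration, not values defined by the service.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit code 0) if today's row count is within
# a given percentage of yesterday's row count.
# The default tolerance of 10 percent is an assumption; adjust as needed.
counts_roughly_equal() {
  local today=$1 yesterday=$2 tolerance_pct=${3:-10}

  # An empty partition for yesterday gives no baseline to compare against.
  if [ "$yesterday" -le 0 ]; then
    return 1
  fi

  # Absolute difference between the two counts.
  local diff=$(( today - yesterday ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi

  # Integer form of: |today - yesterday| / yesterday <= tolerance_pct / 100
  [ $(( diff * 100 )) -le $(( yesterday * tolerance_pct )) ]
}
```

Call it with the two num values returned by the queries, for example `counts_roughly_equal 1050 1000 && echo "partition looks plausible"`.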

Additionally, today's partition should contain some entries with an event time of yesterday. Check this with the following query (correct the dates if necessary):

SELECT COUNT(*) AS new_num
FROM `ems-mobile-engage.segmentation_states.push_token_states`
WHERE DATE(_PARTITIONTIME) = CURRENT_DATE() -- today
AND event_time > TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) -- yesterday

If new_num is also reasonable, the scheduled query ran correctly and you should check why the check script in Heroku Scheduler didn't run.

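  1. If the scheduled query did not run correctly (or the number of entries for today seems suspicious), you can remove today's partition by issuing the following command in the BigQuery CLI (with the correct date of today instead of 20200309). The single quotes keep the shell from expanding the $ in the partition decorator: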

bq rm 'ems-mobile-engage.segmentation_states.push_token_states$20200309'

Then rerun the scheduled query and save the result to the states table (with the correct date in the SQL file, if it is not for today, and in the destination_table parameter):

cat tools/gcloud/bigquery/push-token-states/pushtoken_states_updater.sql \
 | bq query --destination_table 'ems-mobile-engage.segmentation_states.push_token_states$20200309' --use_legacy_sql=false --replace=true
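To avoid mistyping the date in the partition decorator, the two commands above can be generated from a date. The following is a minimal sketch: the function names partition_ref and print_rerun_commands are hypothetical, and the commands are only printed for review, not executed.

```shell
#!/usr/bin/env bash
# Sketch: build the partition decorator (table$YYYYMMDD) for a given date.
TABLE='ems-mobile-engage.segmentation_states.push_token_states'

# partition_ref [YYYYMMDD]
# Prints the table name with its partition decorator.
# Without an argument, today's date in UTC is used.
partition_ref() {
  local day=${1:-$(date -u +%Y%m%d)}
  printf '%s$%s\n' "$TABLE" "$day"
}

# print_rerun_commands [YYYYMMDD]
# Prints (does not run) the removal and rerun commands. The partition
# reference is single-quoted so the shell does not expand the $ sign.
print_rerun_commands() {
  local ref
  ref=$(partition_ref "$1")
  printf "bq rm '%s'\n" "$ref"
  printf "cat tools/gcloud/bigquery/push-token-states/pushtoken_states_updater.sql | bq query --destination_table '%s' --use_legacy_sql=false --replace=true\n" "$ref"
}
```

Running `print_rerun_commands 20200309` prints both commands for that date so they can be checked before being pasted into a terminal.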

After the push_token_states table is corrected, you must rerun the scheduled queries for the contact_states table and for the kpi_summary table, as described below.