Failing jobs
Inbox has three cron jobs running different database maintenance work.
Partition creator
Runs Monday through Thursday at 13:00 (Vienna Summer Time/CEST). Attempts to create the weekly partitions for the week four weeks ahead for the inbox_messages and inbox_tags tables.
If this job fails, always investigate the cause on the database side. Some steps to keep in mind:
- If the partition was not created, the job can be triggered manually, or you can wait for the next scheduled run the following day.
- There is a default partition, so no data will ever be lost in any case.
If for any reason the job is not able to run, you can create the partitions manually by running the following SQL statements:
-- inbox_messages
BEGIN;
CREATE TABLE inbox_messages_2026_06
PARTITION OF inbox_messages
FOR VALUES FROM ('2026-02-02') TO ('2026-02-09');
COMMIT;
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block,
-- so run it separately after the COMMIT above.
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS
inbox_messages_2026_06_contact_id_run_id_unique
ON inbox_messages_2026_06
(contact_id, source_run_id);
-- inbox_tags
BEGIN;
CREATE TABLE inbox_tags_2026_06
PARTITION OF inbox_tags
FOR VALUES FROM ('2026-02-02') TO ('2026-02-09');
COMMIT;
Remember to update the partition names and the date boundaries accordingly.
Keep in mind that if we had already run out of weekly partitions, the database will have started writing data into the default partition. If possible, that data needs to be moved into the newly created partition as well.
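A hedged sketch of that move (this assumes the partition key is created_at and the default partition is named inbox_messages_default; verify both with \d+ inbox_messages before running). Note that Postgres refuses to create a partition whose range overlaps rows already sitting in the default partition, so the rows must be pulled out first:

-- Hedged sketch: move the week's rows out of the default partition,
-- create the missing partition, then re-insert via the parent so the
-- rows are routed into the new partition.
BEGIN;
CREATE TEMP TABLE overflow ON COMMIT DROP AS
SELECT * FROM inbox_messages_default
WHERE created_at >= '2026-02-02' AND created_at < '2026-02-09';

DELETE FROM inbox_messages_default
WHERE created_at >= '2026-02-02' AND created_at < '2026-02-09';

CREATE TABLE inbox_messages_2026_06
PARTITION OF inbox_messages
FOR VALUES FROM ('2026-02-02') TO ('2026-02-09');

INSERT INTO inbox_messages SELECT * FROM overflow;
COMMIT;

Creating the partition takes heavy locks on the parent table, so prefer a quiet period and check for long-running transactions first (see the section on long running transactions below).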
Partition deleter
Runs every Monday at 14:00 (Vienna Summer Time/CEST). Attempts to delete partitions that have expired (older than 6 months at that time) for the inbox_messages and inbox_tags tables.
If this job fails, re-trigger it manually (or delete the partitions by hand) and investigate what caused the failure.
If for any reason the job is not able to run, you can delete the partitions manually by running the following SQL statements:
-- inbox_messages
BEGIN;
DROP TABLE IF EXISTS inbox_messages_2024_01 CASCADE;
COMMIT;

-- inbox_tags
BEGIN;
DROP TABLE IF EXISTS inbox_tags_2024_01 CASCADE;
COMMIT;
Remember to update the partition names accordingly, should you decide to run these manually.
V3 maintenance
Runs every day at 10:00. Attempts to delete expired messages based on the expired_at date, as well as overflowing messages (more than 25 or more than 100 per contact).
If this job fails, investigate what caused the issue and create a JIRA ticket for fixing it. It does not need to be re-triggered manually.
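To gauge the backlog the next run will have to clear, a hedged sketch (this assumes expiry is tracked via an expired_at column on inbox_messages; verify against the actual schema):

-- Hedged sketch: count messages the maintenance job would delete
-- as expired on its next run.
SELECT count(*) AS expired_pending
FROM inbox_messages
WHERE expired_at < now();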
Tag Updater
If there are too many messages and the deployment is already autoscaled to the maximum number of workers, check whether there is some error we are not handling; otherwise, more workers can be added manually:
kubectl scale --replicas=<value> deployment/your-deployment-name -n your-namespace
Recalling a campaign in inbox service
In case of a customer-facing issue, a campaign can be recalled in the inbox service alone by adding a new row containing the multichannel_id and customer_id to the campaign_recall table in the inbox PG instance.
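As a hedged sketch (the column names multichannel_id and customer_id are taken from the description above; verify the actual schema of campaign_recall before running):

-- Hedged sketch: recall a campaign in the inbox service only.
-- Replace the placeholder values with the real IDs.
INSERT INTO campaign_recall (multichannel_id, customer_id)
VALUES ('<multichannel_id>', '<customer_id>');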
Handling long running transactions or queries on the inbox database
Long-running queries can cause issues on the database when we run critical operations, such as adding or dropping partitions. They can cause the database to lock up and become unresponsive.
Log into the affected database using your favourite tool, and analyse which query is doing this. You can use the following query to find all the queries that are currently running for longer than 5 minutes on the database:
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
  AND state != 'idle';
(If you see nothing, try removing the state clause, or ensure you are connected to the correct database.)
Determine which query is causing the issue and whether it should be running at all (e.g. maintenance jobs may legitimately take longer, but a regular SELECT query should never take more than a couple of seconds). While this should not happen, we have observed queries getting stuck in the database before. If you have determined that the query should indeed not be running, you can cancel it using the following command:
SELECT pg_cancel_backend(pid);
(Where pid is the process id of the query you want to cancel, determined in the previous query).
Should you find this command cannot cancel the query, you can try to terminate the query using the following command:
SELECT pg_terminate_backend(pid);
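If several stuck queries need terminating at once, a hedged sketch that combines the lookup above with termination (review the output of the earlier pg_stat_activity query first, since this terminates every matching backend indiscriminately):

-- Hedged sketch: terminate all non-idle backends that have been running
-- for longer than 5 minutes, excluding our own session.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
  AND state != 'idle'
  AND pid != pg_backend_pid();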
Should this also not work, the last option is to restart the database. See the next section on how to restart the databases safely.
Restarting the database
You can restart the primary database in the Google Cloud Console.
After the database is back up and running, it is important that you shovel all the messages that went into the inbox-errors-sub Pub/Sub subscription. The inbox-service cannot process these messages while the database is restarting, so they accumulate there. Do this right after the database restarts to avoid delaying our customers' messages unnecessarily.
Use your favourite tool to do so. Here are the commands on how to do it using me-cli:
me-cli pubsub subscriptions dump # will dump all messages from the subscription into a local file
me-cli pubsub publish -a -f /tmp/dump-inbox-errors-sub-1731848007-- # -a for auto select topic, -f for file, ensure you use the correct file from the dump
me-cli pubsub subscriptions purge # will delete the shovelled messages from the error-sub
(If prompted, select the inbox-errors-sub subscription)