Campaign History in BigQuery

Each In-app campaign change made by customers through UI, or by automatic workers running in the background, is stored in BigQuery with complete data snapshot of the campaign at the time of change, and the event timestamp.

This allows us to use BigQuery to go back in time, and see the exact state of customer’s campaigns at a particular date and time in the past, which can be especially useful for handling support tickets.

Data in BigQuery is stored in ems-mobile-engage project, and inapp_campaigns dataset.

Architecture

Campaign History BigQuery Architecture

Explanation

Campaign change events which are dispatched from Campaigns model are published to Pub/Sub for asynchrounous processing by a separate worker. Decision to process campaign change events asynchronously was done in order to keep the request processing time as low as possible.

For example, when a customer creates a new In-app campaign, we put it on top of the campaign list by setting its priority to 1. We also update priorities of all other campaigns by lowering their priority numbers by one. This means that we have to publish campaign change event to BigQuery for each campaign that customer has. But to do so, we would need to fetch all campaign fields from the DB for all customers' campaigns.

To avoid doing that, workers (those which modify campaign table, e.g. web worker) simply publish messages to Pub/Sub containing campaignId of a campaign which was touched in the DB. Only in case of campaign deletion, we publish the campaign content in the message.

Then a separate worker (campaign-big-query-publish) picks up those messages, fetches the complete campaign data from the DB for the given campaignId, and publishes the data to BigQuery (via Pub/Sub and Dataflow).

Drawback

It could happen that there were two or more update events for the same campaign in the very short time span (e.g. customer pressing Save button in UI for every small change). Since these events are processed asynchronously by the worker, when it fetches the campaign content from the DB, it could happen that it only sees the last change, and we miss the changes which happened in between.

This is the trade-off that we accept, and shouldn’t pose a significant problem when looking back in time during customer support handling.