Cache Handling in Device Event Service

Overview

The device event service (DES) needs to keep the following types of cached entities up to date:

Active In-app campaigns

The list of active campaigns is cached and needs to be updated if a campaign changes status.

Non-terminal campaign ids

The set of non-terminal campaign ids is cached and needs to be updated if a campaign changes status.

Message-content

Static content for a specific campaign is cached and needs to be updated if the content of a campaign changes.

Campaign details

Campaign details without the content are cached. push2inapp needs them to determine whether the content is personalized.

Applications

The applications used are also cached. Even though they rarely change, the cache still needs to be updated whenever an app is changed in any way.

First Level Cache (in-memory)

The first level cache is stored in-memory, i.e. each web pod has its own cache.

Second Level Cache (Redis)

Only active In-app campaigns, non-terminal campaign ids and applications are currently stored in the second level cache (Redis). If there is a cache miss in the first level cache, the second level cache is consulted before the database (PostgreSQL) is queried.
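The lookup order described above can be sketched as a read-through function. This is a minimal sketch, not the actual DES implementation: `Store`, `MapStore`, `lookup` and `loadFromDb` are hypothetical names, and the stores are synchronous in-memory stand-ins, whereas a real Redis client and database driver would be asynchronous.

```typescript
// Hypothetical stand-in for a cache layer. A real Redis client or
// database driver would expose asynchronous methods instead.
interface Store<V> {
  get(key: string): V | undefined;
  set(key: string, value: V): void;
}

class MapStore<V> implements Store<V> {
  private readonly data = new Map<string, V>();

  get(key: string): V | undefined {
    return this.data.get(key);
  }

  set(key: string, value: V): void {
    this.data.set(key, value);
  }
}

// Read-through lookup: first level (in-memory), then second level
// (Redis stand-in), and only then the database.
function lookup<V>(
  key: string,
  firstLevel: Store<V>,
  secondLevel: Store<V>,
  loadFromDb: (key: string) => V,
): V {
  const local = firstLevel.get(key);
  if (local !== undefined) {
    return local;
  }

  const shared = secondLevel.get(key);
  if (shared !== undefined) {
    firstLevel.set(key, shared); // warm the pod-local cache
    return shared;
  }

  const fresh = loadFromDb(key);
  secondLevel.set(key, fresh);
  firstLevel.set(key, fresh);
  return fresh;
}
```

With this shape, a second pod that misses in its own first level cache is still served from the shared second level without touching the database.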

Problem With Current Behaviour

Right now each instance of the device event service listens to entity events and invalidates the cache when an app or campaign is modified. When a large number of campaigns are modified at once, this leads to significant load on the DB and sometimes to the service being unavailable for a short time.

New Concept Approach 1

We would move the responsibility for managing the cache from every process to a single worker. One worker would manage the Redis cache, and the in-memory cache would use such a low expiry that it would not need any additional invalidation.

This would remove the need for the processes themselves to subscribe to the entity events and we could have a single worker consuming the events and making sure that the cache is up to date.

The worker could also do more than just invalidate the cache: it could pre-fill it with the updated value so that the web workers won’t have any cache misses.
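The two halves of this approach can be sketched as follows: a pod-local cache with a short time-to-live, and a worker-side function that overwrites the shared entry instead of deleting it. This is a sketch under assumptions: `TtlCache` and `prefillSharedCache` are hypothetical names, a `Map` stands in for Redis, and the clock is injectable only to make the expiry behaviour easy to check.

```typescript
// Pod-local cache whose entries expire after a short time-to-live, so it
// needs no explicit invalidation. The clock is injectable for testing.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Worker side: overwrite the shared entry with the freshly loaded value
// rather than deleting it, so web pods do not run into a cache miss.
function prefillSharedCache<V>(
  key: string,
  sharedCache: Map<string, V>, // stand-in for Redis
  loadFromDb: (key: string) => V,
): void {
  sharedCache.set(key, loadFromDb(key));
}
```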

Modified Entity Properties

To decide whether a specific cache needs to be updated, the entity event could include a list of the entity properties that were modified.
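As an illustration, the worker could map the modified properties to the caches that depend on them. The event shape, the property names and the mapping below are assumptions made for this sketch and would need to be aligned with the real entity events.

```typescript
type CacheName =
  | "activeCampaigns"
  | "nonTerminalCampaignIds"
  | "messageContent"
  | "campaignDetails";

// Hypothetical event shape: the list of modified properties is the
// addition proposed in this section.
interface CampaignEntityEvent {
  campaignId: string;
  modifiedProperties: string[];
}

// Hypothetical mapping from a campaign property to the caches that
// depend on it. Properties not listed here affect no cache.
const cachesByProperty = new Map<string, CacheName[]>([
  ["status", ["activeCampaigns", "nonTerminalCampaignIds"]],
  ["priority", ["activeCampaigns"]],
  ["content", ["messageContent"]],
]);

// Collects the caches that have to be updated for a given event.
function cachesToUpdate(event: CampaignEntityEvent): Set<CacheName> {
  const result = new Set<CacheName>();
  for (const property of event.modifiedProperties) {
    for (const cache of cachesByProperty.get(property) ?? []) {
      result.add(cache);
    }
  }
  return result;
}
```

An event whose modified properties do not appear in the mapping yields an empty set, so the worker can skip it without touching Redis or PostgreSQL.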

Batched Processing of Events

To make processing, and more specifically the queries against the DB, more efficient, we could batch events. A change of priority for all campaigns of an app would then no longer lead to a large number of queries.
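One possible way to batch is to collect the events of a short time window and group them by application before reloading anything. In the sketch below, the event shape (including `applicationId`), `groupByApplication`, `refreshBatch` and the `reloadApplication` callback are hypothetical.

```typescript
// Hypothetical event shape; it assumes the event carries the application id.
interface CampaignEvent {
  campaignId: string;
  applicationId: string;
}

// Groups a batch of events by application and deduplicates campaign ids.
function groupByApplication(events: CampaignEvent[]): Map<string, Set<string>> {
  const byApplication = new Map<string, Set<string>>();
  for (const event of events) {
    let campaignIds = byApplication.get(event.applicationId);
    if (campaignIds === undefined) {
      campaignIds = new Set<string>();
      byApplication.set(event.applicationId, campaignIds);
    }
    campaignIds.add(event.campaignId);
  }
  return byApplication;
}

// Calls the reload callback once per affected application instead of once
// per event, and returns the number of reloads that were triggered.
function refreshBatch(
  events: CampaignEvent[],
  reloadApplication: (applicationId: string, campaignIds: string[]) => void,
): number {
  const byApplication = groupByApplication(events);
  byApplication.forEach((campaignIds, applicationId) => {
    reloadApplication(applicationId, Array.from(campaignIds));
  });
  return byApplication.size;
}
```

In this sketch, a priority change that touches every campaign of one application results in a single reload for that application.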

Entity Events in PubSub

This could also be a good time to start publishing the events to a pubsub topic. This would be more in line with our current infrastructure.

New Concept Approach 2

Approach 1 relies on the time-to-live of the in-memory cache to keep it fresh and does not actively invalidate the in-memory caches. This has the following constraints:

  • If the time-to-live is too low, the load on Redis increases

  • If the time-to-live is too high, consumers might see campaign content which is already outdated

Therefore approach 2 actively invalidates the in-memory caches after the Redis cache has been refreshed:

Concept Cache Invalidation

Pub/Sub Subscriptions

Every web pod needs a subscription to have the same effect as the fanout exchange in RabbitMQ. Each pod has to create the subscription on startup and delete it on shutdown. The subscription name can be created in a similar way as for the RabbitMQ queues, e.g.

${prefix}-des-web-${uuid}

The subscription expiration period can be used to ensure that the subscription is cleaned up even if deleting it fails during shutdown of the pod.
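The subscription lifecycle of a pod could look roughly like the sketch below. `SubscriptionAdmin` is a hypothetical, synchronous stand-in for the Pub/Sub client and not a real library API, and `registerPod` is an illustrative name. The uuid is passed in so that the sketch stays free of imports. A pod would generate it once at startup.

```typescript
// Hypothetical stand-in for the Pub/Sub admin client. A real client
// would be asynchronous and have a different interface.
interface SubscriptionAdmin {
  create(name: string, expirationSeconds: number): void;
  delete(name: string): void;
}

// Builds the per-pod subscription name following the pattern above.
function subscriptionName(prefix: string, uuid: string): string {
  return `${prefix}-des-web-${uuid}`;
}

// Creates the subscription on startup and returns a shutdown hook that
// deletes it again.
function registerPod(
  admin: SubscriptionAdmin,
  prefix: string,
  uuid: string,
  expirationSeconds: number,
): { name: string; shutdown: () => void } {
  const name = subscriptionName(prefix, uuid);
  admin.create(name, expirationSeconds);
  return {
    name,
    shutdown: () => {
      try {
        admin.delete(name);
      } catch {
        // Deletion failed: the expiration period configured at creation
        // is expected to clean up the orphaned subscription.
      }
    },
  };
}
```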

Dedicated Events

In In-app there is a global priority across all campaigns. What actually matters is the priority scoped per application, since every request comes from a specific application. There are two cases:

  • A new campaign is created and gets priority 1, and the priority of all other campaigns is increased. The relative priority changes only for the application of the new campaign. But since the new campaign is in design, the priority of active campaigns is not affected.

  • When priorities are changed in the UI, multiple campaigns with different applications can be affected. The cache only has to be invalidated for the applications whose campaigns change priority, since the relative priority for the unaffected applications stays the same.

With dedicated events for priority updates that include the affected applications, DES can clear only the affected caches of active campaigns.
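A dedicated event and its handler could be sketched as follows. The event name, its `affectedApplicationIds` field and the per-application cache keyed by application id are assumptions made for this example.

```typescript
// Hypothetical dedicated event for a priority update. It lists only the
// applications whose relative campaign priorities changed.
interface PrioritiesUpdatedEvent {
  affectedApplicationIds: string[];
}

// Clears the cached active campaigns of the affected applications and
// leaves all other entries untouched. Returns the number of entries removed.
function handlePrioritiesUpdated<V>(
  event: PrioritiesUpdatedEvent,
  activeCampaignsByApplication: Map<string, V>,
): number {
  let removed = 0;
  for (const applicationId of event.affectedApplicationIds) {
    if (activeCampaignsByApplication.delete(applicationId)) {
      removed++;
    }
  }
  return removed;
}
```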

Enrich Events With Campaign Properties

Currently the CampaignEventHandler does a PostgreSQL query when the entity event is handled. The properties needed are:

  • status to ignore campaigns which are in design

  • applicationId to invalidate the caches just for that application id

By providing these fields in the event, these PostgreSQL queries can be avoided. Additional fields could be provided during the implementation to skip invalidation for caches which are not affected by the event.
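With both fields in the event, the handler can decide what to invalidate without a database round trip. The sketch below is not the existing CampaignEventHandler. The event shape, the function name and the status value `"design"` are assumptions.

```typescript
// Hypothetical enriched event carrying the two properties listed above.
interface EnrichedCampaignEvent {
  campaignId: string;
  status: string;
  applicationId: string;
}

// Assumed status value for campaigns that are still in design.
const DESIGN_STATUS = "design";

// Returns the application id whose caches should be invalidated, or
// undefined if the event can be ignored. No database query is needed.
function applicationToInvalidate(event: EnrichedCampaignEvent): string | undefined {
  if (event.status === DESIGN_STATUS) {
    return undefined; // campaigns in design do not affect the caches
  }
  return event.applicationId;
}
```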

Second Level For All Caches

Currently active campaigns and non-terminal campaign ids use a second level cache. When web pods need to serve the message-content for the same campaign, this results in two queries to PostgreSQL, since these are stored only in the first level in-memory cache. To further reduce the number of queries to PostgreSQL, the message-content and campaign details could also be stored in Redis.