Contact-Id based Audiences

Motivation

Currently, in-app audiences are based on contact-references. The rest of Emarsys does not deal with contact-references and uses contact-ids instead, which means we have to translate back and forth between contact-ids and contact-references.

This translation is done using a mapping table in DynamoDB, which creates a bottleneck since we can only do batch lookups on the primary index.

Needed Changes

Database Changes

The Audience Contact References Table Structure concept shows the current database structure of the audiences. The new structure would be essentially the same, except that it contains a contact_id column instead of a contact_reference column.

Client Registration

Since we’re not guaranteed to have the contact-id available for the client when the contact is set via the API, we need to make sure we get the contact-id as soon as possible. The contact-id is stored in the contact-token, which is created by the me-client-service and has an expiration time. If we let this expiration time depend on whether a contact-id is available, for example by issuing a shorter-lived token while the contact-id is missing, we could minimise the time spent without a contact-id.

Once the contact-token expires, it will be refreshed via the API, and the refresh will find the contact-id if the contact was successfully logged in.
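
The expiration rule described above could be sketched roughly as follows. This is only an illustration, not the me-client-service implementation: the function name contact_token_ttl and both lifetime values are hypothetical, since this document does not specify actual expiration times.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical lifetimes; the real values are not defined in this document.
TTL_WITH_CONTACT_ID = timedelta(hours=1)
TTL_WITHOUT_CONTACT_ID = timedelta(minutes=1)


def contact_token_ttl(contact_id: Optional[int]) -> timedelta:
    """Pick the contact-token lifetime based on contact-id availability.

    While the contact-id is unknown, the token expires quickly so that the
    next refresh gets a chance to pick up the contact-id.
    """
    if contact_id is None:
        return TTL_WITHOUT_CONTACT_ID
    return TTL_WITH_CONTACT_ID
```

With a rule like this, a client without a contact-id refreshes its token after the short lifetime, so the time spent without a contact-id is bounded by that lifetime plus however long the contact takes to become available.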

Adding/Removing contacts to/from an Audience

When adding/removing contacts to/from an audience, we no longer need to look up the contact-references and can simply use the contact-id directly. This allows us to simplify the workers and even remove the one worker that does the lookup.

The previous approach would not target a contact that was not logged in when the contact was added or when the audience was initialized from a segment. The new approach would include all contacts, even if they log in later.

Migration of existing audiences

Since we’re changing the database structure and we have long-running campaigns, we will need to migrate at least some tables.

Migrating all audiences would cause a huge increase in storage usage. One way to avoid this would be to keep both table structures for a certain time (e.g. 1-2 months) and then migrate all the tables remaining in the old structure.

We would need to create the master tables for all existing customers, and for all new campaigns the audiences would be created in the new structure.

In 2 months (2023-09-05) there would still be 73 audience-based campaigns active. The majority of them have fewer than 1000 contacts, with a handful of audiences having more than a million. This is of course subject to change in the future.

Querying Audience membership in DES

The query in DES would be very similar to the current one; the only difference is that the new contact-id column would be used.

While both table structures are present, both tables will need to be queried. After the migration, the part using campaign_contact_reference_master can be removed.

SELECT campaign_id, rti_program_runs, ac_program_runs
FROM audience_data.campaign_contact_reference_master
WHERE contact_reference = $contactReference
AND campaign_id = ANY(${campaignIds})
UNION ALL
SELECT campaign_id, rti_program_runs, ac_program_runs
FROM audience_data.campaign_contact_id_master
WHERE contact_id = $contactId
AND campaign_id = ANY(${campaignIds})

In DES we could look up the contact-id from Dynamo during the period when it is not available because the contact-token has not been refreshed yet. Based on logs, this is quite rare.
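
A minimal sketch of this fallback, assuming the contact-id from the contact-token is preferred and the mapping table is only consulted when it is missing. The names resolve_contact_id and lookup_in_dynamo are hypothetical; the lookup is passed in as a plain callable so that no particular DynamoDB client API is implied.

```python
from typing import Callable, Optional


def resolve_contact_id(
    token_contact_id: Optional[int],
    contact_reference: str,
    lookup_in_dynamo: Callable[[str], Optional[int]],
) -> Optional[int]:
    """Return the contact-id to use for the audience membership query.

    The contact-id carried in the contact-token wins. Only when the token
    has no contact-id yet do we fall back to the mapping lookup, which may
    itself return None if the contact is unknown.
    """
    if token_contact_id is not None:
        return token_contact_id
    # Rare path: the contact-token has not been refreshed yet.
    return lookup_in_dynamo(contact_reference)


# Usage with an in-memory stand-in for the mapping table.
mapping = {"ref-1": 7}
print(resolve_contact_id(None, "ref-1", mapping.get))
print(resolve_contact_id(3, "ref-1", mapping.get))
```

Keeping the lookup behind a callable means the common path never touches the mapping table, which matches the observation that the fallback is rarely needed.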