# Database
The service uses PostgreSQL to store snapshots of segments as well as the calculated diffs. The service never stores any sensitive recipient data; only the internal contact ids are stored locally.
This document only contains an overview of the table structure; for details, please refer to the migrations. Changes to the database structure are automatically applied on [deployment](Deployment).
## Structure
### Registered Segments
All segments which have been queried through the service are available in the `segments` table. Among other things, this table contains the current version of each segment and when the segment was last used.
### Segment Snapshots
In order to calculate the differences of a segment over time, snapshots are stored locally. Each snapshot is stored in its own table, which consists of a single column containing the contact ids.
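As a rough sketch of what such a table looks like (the actual DDL lives in the migrations; the table name pattern and column type here are assumptions):

```sql
-- Hypothetical per-snapshot table: one row per contact that was in the
-- segment when the snapshot was taken. The name pattern is an assumption.
CREATE TABLE segment_snapshot_42_v7 (
    contact_id bigint NOT NULL
);
```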
## Clean Up Segments
It is not straightforward to clean up segments (e.g. from an inactive account) since multiple tables are involved, as explained above. The maintenance job, which runs daily, makes this easier:

- Set `last_used_at` in the `segments` table to more than 32 days into the past: `update segments set last_used_at = now() - interval '35 days' where customer_id = CUSTOMER_ID;`
- Wait till maintenance is triggered by Google Cloud, or run it manually
## Autovacuum
## Reindex
If indexes become bloated, it is good to occasionally reindex them. There are several ways to achieve this; see the Resources section for more details.
### `segment_diffs` table index
#### Using the REINDEX command
```sql
REINDEX INDEX segment_diffs_segment_id_version_contact_id_key;
```
Reindexing took 9 minutes on production with ~120 million rows in the table.
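To get a quick sense of how large the index is before and after reindexing, the standard PostgreSQL size functions can be used (a sketch; the index name is taken from the command above):

```sql
-- Reports the on-disk size of the index in a human-readable form.
SELECT pg_size_pretty(
    pg_relation_size('segment_diffs_segment_id_version_contact_id_key')
);
```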
The `REINDEX` command locks the table for writing and obtains an exclusive lock on the index, which will block reads that attempt to use that index.
Therefore:

- `REINDEX` should not be run while SDS maintenance or consumers are operating
- `sync_worker`s in `ems-audience-exchange` should be stopped while `REINDEX` is running
- `REINDEX` will block SELECT queries which try to fetch added and removed contacts (segment diffs) from the `segment_diffs` table
If this is not acceptable, the approach with concurrent indexes should be used instead. See Resources for more details.
#### Using CREATE INDEX CONCURRENTLY
The following steps are needed:

- Create the new index concurrently, i.e. no reads/writes/updates are blocked by the index creation
- Create a constraint from the new index ⇒ now the table has two constraints on the same columns
- Drop the old constraint
- Rename the new constraint to the old name
##### 2019-01-24
Bloat of the index for the `segment_diffs` table is ~9 GB:
```
$ heroku pg:bloat HEROKU_POSTGRESQL_YELLOW --app ems-segment-diff
 type  | schemaname |                          object_name                           | bloat |   waste
-------+------------+----------------------------------------------------------------+-------+------------
 index | public     | segment_diffs::segment_diffs_segment_id_version_contact_id_key |   2.0 |    9174 MB
 table | public     | segment_diffs                                                  |   1.0 |     345 MB
```
374,534,455 rows in the table (`select count(*) from segment_diffs`).
Turn off consumers so that the concurrent index creation can be faster.
```sql
create unique index concurrently segment_diffs_segment_id_version_contact_id_key_20190123
    on segment_diffs (segment_id, version, contact_id);

alter table segment_diffs
    add constraint segment_diffs_segment_id_version_contact_id_key_20190123
    unique using index segment_diffs_segment_id_version_contact_id_key_20190123;

alter table segment_diffs
    drop constraint segment_diffs_segment_id_version_contact_id_key;

alter table segment_diffs
    rename constraint segment_diffs_segment_id_version_contact_id_key_20190123
    to segment_diffs_segment_id_version_contact_id_key;
```
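If `CREATE INDEX CONCURRENTLY` fails partway through, it can leave an invalid index behind, so it is worth verifying the new index before dropping the old constraint (a sketch using the standard `pg_index` catalog):

```sql
-- indisvalid must be true before the old constraint is dropped
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indexrelid = 'segment_diffs_segment_id_version_contact_id_key_20190123'::regclass;
```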
Bloat of the index afterwards:
```
 type  | schemaname |                          object_name                           | bloat |   waste
-------+------------+----------------------------------------------------------------+-------+------------
 index | public     | segment_diffs::segment_diffs_segment_id_version_contact_id_key |   1.3 |    1086 MB
```
## Issues
- [SDS: REINDEX "segment_diffs" table](https://emarsys.jira.com/browse/EDREI-34984)
- [SDS alerts](https://emarsys.jira.com/browse/SUITEDEV-16160)