Investigations

Investigations were done end of August 2024.

Query

SELECT
  campaign_id,
  rti_program_runs,
  ac_program_runs
FROM
  audience_data.campaign_contact_reference_master
WHERE
  contact_reference = $1
  AND campaign_id = ANY($2)

Partitions

Currently, ~6000 partitions partitioned by campaign_id

Would be ~400 partitions partitioned by application_code

Query Plan

query plan audience data

Postgres has to do a lot of index scans which needs more IO.

Examples

Let’s compare a query with one campaign and a query with five campaigns to see how the number is index scans is growing:

One Campaign

inapp=> explain (analyze)
select campaign_id,
       rti_program_runs,
       ac_program_runs
from audience_data.campaign_contact_reference_master
where contact_reference = 'cr:c3bdcb49eb841c89:eu-central-1' and campaign_id = any (array [2480015]);
                                                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using campaign_contact_reference_2480015_contact_reference_idx on campaign_contact_reference_2480015 campaign_contact_reference_master  (cost=0.42..8.44 rows=1 width=138) (actual time=0.039..0.041 rows=1 loops=1)
   Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
   Filter: (campaign_id = ANY ('{2480015}'::integer[]))
 Planning Time: 0.155 ms
 Execution Time: 0.108 ms----

Five Campaigns

inapp=> explain (analyze)
select campaign_id,
       rti_program_runs,
       ac_program_runs
from audience_data.campaign_contact_reference_master
where contact_reference = 'cr:c3bdcb49eb841c89:eu-central-1' and campaign_id = any (array [2480015, 2480016, 2463849, 2463897, 2465950]);
                                                                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.41..162.27 rows=5 width=113) (actual time=0.087..0.098 rows=1 loops=1)
   ->  Index Scan using campaign_contact_reference_2463849_contact_reference_idx on campaign_contact_reference_2463849 campaign_contact_reference_master_1  (cost=0.41..8.44 rows=1 width=125) (actual time=0.030..0.030 rows=0 loops=1)
         Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
         Filter: (campaign_id = ANY ('{2480015,2480016,2463849,2463897,2465950}'::integer[]))
   ->  Index Scan using campaign_contact_reference_2463897_contact_reference_idx on campaign_contact_reference_2463897 campaign_contact_reference_master_2  (cost=0.41..8.43 rows=1 width=125) (actual time=0.009..0.009 rows=0 loops=1)
         Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
         Filter: (campaign_id = ANY ('{2480015,2480016,2463849,2463897,2465950}'::integer[]))
   ->  Index Scan using campaign_contact_reference_2465950_contact_reference_idx on campaign_contact_reference_2465950 campaign_contact_reference_master_3  (cost=0.40..128.49 rows=1 width=34) (actual time=0.008..0.009 rows=0 loops=1)
         Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
         Filter: (campaign_id = ANY ('{2480015,2480016,2463849,2463897,2465950}'::integer[]))
   ->  Index Scan using campaign_contact_reference_2480015_contact_reference_idx on campaign_contact_reference_2480015 campaign_contact_reference_master_4  (cost=0.42..8.44 rows=1 width=138) (actual time=0.039..0.039 rows=1 loops=1)
         Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
         Filter: (campaign_id = ANY ('{2480015,2480016,2463849,2463897,2465950}'::integer[]))
   ->  Index Scan using campaign_contact_reference_2480016_contact_reference_idx on campaign_contact_reference_2480016 campaign_contact_reference_master_5  (cost=0.42..8.44 rows=1 width=140) (actual time=0.010..0.010 rows=0 loops=1)
         Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
         Filter: (campaign_id = ANY ('{2480015,2480016,2463849,2463897,2465950}'::integer[]))
 Planning Time: 0.627 ms
 Execution Time: 0.176 ms

Partitioning by Application

Table

create table application_contact_reference_master (
    application_id integer not null,
    campaign_id       integer      not null,
    contact_reference varchar(255) not null,
    rti_program_runs  program_run[],
    ac_program_runs program_run[]
) partition by list (application_id);
create table alex_audience_data_investigation.application_contact_reference_398
    partition of alex_audience_data_investigation.application_contact_reference_master for values in (398);

Index

create unique index on alex_audience_data_investigation.application_contact_reference_398 (contact_reference, campaign_id);

Copying Data

insert into alex_audience_data_investigation.application_contact_reference_398 (application_id, campaign_id, contact_reference, rti_program_runs, ac_program_runs)
select 398 as application_id, campaign_id, contact_reference, rti_program_runs, ac_program_runs
from audience_data.campaign_contact_reference_2480015;

Also, audience data for campaigns 2480016, 2463849, 2463897, 2465950 (the five campaigns from above).

Explain Analyze

inapp=> explain (analyze)
select campaign_id,
       rti_program_runs,
       ac_program_runs
from alex_audience_data_investigation.application_contact_reference_master
where application_id = 398 and contact_reference = 'cr:c3bdcb49eb841c89:eu-central-1' and campaign_id = any (array [2480015, 2480016, 2463849, 2463897, 2465950]);
                                                                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using application_contact_reference_contact_reference_campaign_id_idx on application_contact_reference_398 application_contact_reference_master  (cost=0.42..8.45 rows=1 width=130) (actual time=0.025..0.026 rows=1 loops=1)
   Index Cond: ((contact_reference)::text = 'cr:c3bdcb49eb841c89:eu-central-1'::text)
   Filter: ((application_id = 398) AND (campaign_id = ANY ('{2480015,2480016,2463849,2463897,2465950}'::integer[])))
 Planning Time: 0.130 ms
 Execution Time: 0.069 ms

The estimated total cost is significantly lower: 8 < 162 (query with five campaigns).

Pros and Cons

Pros:

  • less query cost, i.e. smaller Postgres instance is sufficient

  • fewer partitions, i.e. no problem anymore with max_locks_per_transaction

Cons:

  • autovacuum far more important (but has fewer tables to process)

Queries per Second

130,252,959 queries in the last 7 days ⇒ 215 queries/second. Peak in the last days is ~350 queries/second.

Combined with the mean of campaigns in the audience membership query of 20 the database does 7k index scans/second.

When partitioning by application it would be just 215 index scans/second.