Global Index, benchmark with pgbench

1. Overview

Following my previous blog, Global Index, a different approach, we posted our initial Global Unique Index POC to the Postgres community for open discussion about this approach. This blog explains how we benchmarked that initial Global Unique Index POC using pgbench.

2. Build global index

Before running the performance benchmarks, here are the steps to build PostgreSQL with the Global Unique Index patches from the initial POC.

Clone the repository from GitHub:

git clone https://github.com/postgres/postgres.git

Check out the particular commit the global index patches are based on, into a new branch:

git checkout 85d8b30724c0fd117a683cc72706f71b28463a05 -b global-index

Apply the global index patches from the POC:

git apply 0001-support-global-unique-index-with-non-partition-key.patch
git apply 0002-support-global-unique-index-create.patch
git apply 0003-support-global-unique-index-attach-and-detach.patch
git apply 0004-support-global-unique-index-insert-and-update.patch

Configure, compile, and install:

./configure --prefix=/media/disk1/globalindex --enable-tap-tests --enable-debug "CFLAGS=-g3 -O0 -fno-omit-frame-pointer" "CC=gcc -std=gnu99"
make clean && make && make install
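
One step left implicit above: pgbench and psql need a running server, so initialize a data directory and start the instance before continuing. A minimal sketch, assuming the install prefix above (the data directory path and log file name are illustrative):

export PATH=/media/disk1/globalindex/bin:$PATH
initdb -D /media/disk1/globalindex/data
pg_ctl -D /media/disk1/globalindex/data -l /tmp/gi.log start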

3. Test with pgbench

Our Global Unique Index is designed to address the requirement of accessing a partitioned table through a unique index on a non-partition column. In other words, users can build a cross-partition (global) unique index that does not include the partition key, and use it for select, insert, update, and delete.
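
To make the behavior concrete, here is a minimal sketch (table names are illustrative; it assumes a build with the POC patches applied and uses the GLOBAL syntax shown in the commands further below): inserting a duplicate value of the indexed non-partition column into a different partition should be rejected.

psql -d postgres <<'EOF'
CREATE TABLE t (a int, b int) PARTITION BY RANGE (b);
CREATE TABLE t_1 PARTITION OF t FOR VALUES FROM (1) TO (100);
CREATE TABLE t_2 PARTITION OF t FOR VALUES FROM (100) TO (200);
-- unique index on non-partition column a, enforced across all partitions
CREATE UNIQUE INDEX t_a_gi ON t USING btree (a) GLOBAL;
INSERT INTO t VALUES (1, 1);    -- routed to t_1
INSERT INTO t VALUES (1, 150);  -- routed to t_2; should fail the
                                -- cross-partition uniqueness check on a
EOF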

The goal is to test the performance using pgbench. However, only aid in pgbench_accounts is designed to be unique, and it also serves as the partition key in pgbench's built-in partitioning. To test uniqueness on a non-partition column with pgbench, we prepare the data in the five steps below:

1) initialize the data without partitioning;
2) recreate the four pgbench tables after the original tables have been renamed;
3) create partitions using bid as the partition key;
4) attach all partitions to the pgbench_accounts table;
5) import the data from the renamed tables correspondingly.
Here is a simple script, init-gi.sh, that performs the above 5 steps:

#!/bin/bash

## default scale, 100
SCALE=${1:-100}

## 1,000,000 tuples per partition
PART=$(( ${SCALE}/10 ))

## pgbench init
pgbench -i -s $SCALE postgres

## backup tables and data
psql -d postgres -c "alter table pgbench_accounts rename to pgbench_accounts_bak; alter table pgbench_branches rename to pgbench_branches_bak; alter table pgbench_history rename to pgbench_history_bak; alter table pgbench_tellers rename to pgbench_tellers_bak;"

## create partitions using bid as partition key so that aid can be used for global unique index test
psql -d postgres -c "CREATE TABLE public.pgbench_accounts (aid integer NOT NULL, bid integer, abalance integer, filler character(84)) PARTITION BY RANGE (bid);"
for ((i=0; i<$PART; i++))
do
psql -d postgres -c "CREATE TABLE public.pgbench_accounts_$i (aid integer NOT NULL, bid integer, abalance integer, filler character(84));"
psql -d postgres -c "ALTER TABLE ONLY public.pgbench_accounts ATTACH PARTITION public.pgbench_accounts_$i FOR VALUES FROM ($(( $i*10+1 ))) TO ($(( ($i+1)*10+1 )));"
done

## add a catch-all partition for any remaining bid values
## (after the loop, $i equals $PART, so the range starts right after the last partition)
psql -d postgres -c "CREATE TABLE public.pgbench_accounts_default(aid integer NOT NULL, bid integer, abalance integer, filler character(84));"
psql -d postgres -c "ALTER TABLE ONLY public.pgbench_accounts ATTACH PARTITION public.pgbench_accounts_default FOR VALUES FROM ($(( $i*10+1 ))) TO (MAXVALUE);"

## recreate other tables
psql -d postgres -c "CREATE TABLE public.pgbench_branches (bid integer NOT NULL, bbalance integer, filler character(88)); CREATE TABLE public.pgbench_history (tid integer, bid integer, aid integer, delta integer, mtime timestamp without time zone, filler character(22)); CREATE TABLE public.pgbench_tellers (tid integer NOT NULL, bid integer, tbalance integer, filler character(84));"

## import data from backup
psql -d postgres -c "insert into pgbench_accounts select * from pgbench_accounts_bak; insert into pgbench_branches select * from pgbench_branches_bak; insert into pgbench_history select * from pgbench_history_bak; insert into pgbench_tellers select * from pgbench_tellers_bak;"

The only input parameter, SCALE, is the scale factor pgbench uses to initialize the data. The internal variable PART controls how many partitions are created, and thus how many tuples land in each partition. Since pgbench generates 100,000 records per unit of scale, dividing the scale by 10 loads 1,000,000 records into each partition. The divisor can always be changed to suit the benchmark requirements; here we simply use 10, giving 1 million records per partition. The number of partitions is therefore scale/10 + 1 (including the catch-all partition).
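
To sanity-check the layout after running the script, a quick illustrative query counts the rows in each partition (at SCALE=100, each of the 10 range partitions should hold 1,000,000 rows and the catch-all partition should be empty):

psql -d postgres -c "SELECT tableoid::regclass AS partition, count(*) FROM pgbench_accounts GROUP BY 1 ORDER BY 1;"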

To compare query performance, we run pgbench in the three scenarios below.

1) Query data without any index: randomly query a record from many partitions using a non-partition column with no index.
pgbench -n -c8 -j8 -T60 postgres -b select-only

2) Query data with a regular unique index: randomly query a record from many partitions using a non-partition column covered by a unique index (an existing Postgres feature; to build such a unique index on a partitioned table, the partition key has to be included).

psql -d postgres -c "create unique index local_idx_pgbench_accounts_aid on pgbench_accounts (bid, aid);"
pgbench -n -c8 -j8 -T60 postgres -b select-only
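
For reference, this is why bid appears in the index definition above: on stock Postgres, a unique index on a partitioned table must include all partition key columns, so the same index on aid alone is rejected. A quick illustration:

psql -d postgres -c "create unique index on pgbench_accounts (aid);"
# fails on stock Postgres: the unique index lacks partition key column bid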

3) Query data with a global unique index: randomly query a record from many partitions using a non-partition column covered by a global unique index built on that column.

psql -d postgres -c "create unique index global_idx_pgbench_accounts_aid on pgbench_accounts using btree(aid) global;"
pgbench -n -c8 -j8 -T60 postgres -b select-only

With the setup above, you can exercise select-only, simple-update, and tpcb-like using pgbench's built-in scripts with different numbers of partitions. For example, to compare the performance of no index, a regular unique index, and a global unique index at scale factor 100 (10 one-million-row partitions plus the catch-all) for select-only and tpcb-like, we can run the commands below.

# no index
./init-gi.sh 100
pgbench -n -c8 -j8 -T5 postgres -b select-only | grep tps
pgbench -n -c8 -j8 -T5 postgres -b tpcb-like | grep tps

# regular unique index
./init-gi.sh 100
psql -d postgres -c "create unique index local_idx_pgbench_accounts_aid on pgbench_accounts (bid, aid);"
pgbench -n -c8 -j8 -T5 postgres -b select-only | grep tps
pgbench -n -c8 -j8 -T5 postgres -b tpcb-like | grep tps

# global unique index
./init-gi.sh 100
psql -d postgres -c "create unique index global_idx_pgbench_accounts_aid on pgbench_accounts using btree(aid) global;"
pgbench -n -c8 -j8 -T5 postgres -b select-only | grep tps
pgbench -n -c8 -j8 -T5 postgres -b tpcb-like | grep tps

4. Performance benchmark

Here are the performance benchmark charts from repeating the tests above with 1, 10, 100, and 1000 partitions.

As you can see, with this approach of keeping storage separate per partition while enforcing the uniqueness check across all of a partitioned table's indexes, performance is closely tied to the number of partitions. There are some areas we could potentially look into, such as 1) using caching to reduce disk I/O; 2) finding a way to avoid unnecessary partition checks; etc.
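
One way to see why the partition count matters (an illustrative check against the setup from section 3): EXPLAIN a point query on aid. Because aid is not the partition key, the planner cannot prune partitions, so the plan touches every partition, with either a sequential scan or a per-partition index scan depending on the scenario:

psql -d postgres -c "EXPLAIN SELECT abalance FROM pgbench_accounts WHERE aid = 12345;"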

5. Summary

In this blog, I explained how the performance benchmark was performed on the initial Global Unique Index POC for random selects and updates, and I hope it helps in analyzing the strengths and weaknesses of this POC.