1. Overview
Following my previous blog, Global Index, a different approach, we posted our initial Global Unique Index POC to the Postgres community for open discussion. This blog explains how the benchmark was performed on that initial POC using pgbench.
2. Build global index
Before running the performance benchmark tests, here are the steps to build the Global Unique Index from the initial POC.
clone the repository from github,
git clone https://github.com/postgres/postgres.git
check out the particular commit the global index patches apply to
git checkout -b global-index 85d8b30724c0fd117a683cc72706f71b28463a05
apply global index patches from POC
git apply 0001-support-global-unique-index-with-non-partition-key.patch
git apply 0002-support-global-unique-index-create.patch
git apply 0003-support-global-unique-index-attach-and-detach.patch
git apply 0004-support-global-unique-index-insert-and-update.patch
configure, compile and install
./configure --prefix=/media/disk1/globalindex --enable-tap-tests --enable-debug "CFLAGS=-g3 -O0 -fno-omit-frame-pointer" "CC=gcc -std=gnu99"
make clean && make && make install
3. Test with pgbench
Our Global Unique Index is designed to address the requirement of accessing a partitioned table through a non-partition column with a unique index. In other words, users can build a cross-partition (globally) unique index without the partition key, for select, insert, update and delete.
The goal is to test the performance using pgbench. However, only aid in pgbench_accounts is designed to be unique, and it is also used as the partition key. To test uniqueness with pgbench, we prepare the data with the following 5 steps:
1) initialize the data without partitions;
2) recreate the four pgbench tables after renaming the originals;
3) create partitions using bid as the partition key;
4) attach all partitions to the pgbench_accounts table;
5) import the data from the renamed tables correspondingly.
Here is a simple script to achieve the above 5 steps:
cat init-gi.sh
#!/bin/bash
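The body of init-gi.sh is truncated above. Grounded in the five steps it describes, a hedged reconstruction for pgbench_accounts could look like the sketch below (the other three pgbench tables are handled analogously; table names, column layout, and the SQL variant are assumptions, and CREATE TABLE ... PARTITION OF is used here to combine the create and attach steps). The script only prints the SQL so it can be reviewed before being piped to psql:

```shell
#!/bin/bash
# Hypothetical sketch of init-gi.sh, steps 2)-5) for pgbench_accounts only.
# Step 1) is the plain, non-partitioned initialization, run separately:
#   pgbench -i -s "$SCALE" postgres
SCALE=${SCALE:-100}   # pgbench scale factor (input parameter)
PART=10               # scales per partition => 10 * 100,000 = 1M rows each
NPARTS=$((SCALE / PART))

# step 2): rename the original table and recreate it as a partitioned table
SQL="ALTER TABLE pgbench_accounts RENAME TO pgbench_accounts_old;
CREATE TABLE pgbench_accounts (aid int, bid int, abalance int, filler char(84))
  PARTITION BY RANGE (bid);"

# steps 3) + 4): create partitions on bid and attach them in one statement
for ((i = 0; i < NPARTS; i++)); do
  lo=$((i * PART + 1)); hi=$(((i + 1) * PART + 1))
  SQL="$SQL
CREATE TABLE pgbench_accounts_$((i + 1)) PARTITION OF pgbench_accounts
  FOR VALUES FROM ($lo) TO ($hi);"
done
SQL="$SQL
CREATE TABLE pgbench_accounts_def PARTITION OF pgbench_accounts DEFAULT;"

# step 5): import the data back from the renamed table
SQL="$SQL
INSERT INTO pgbench_accounts SELECT * FROM pgbench_accounts_old;"

echo "$SQL"   # review first, then e.g.: echo "$SQL" | psql -d postgres
```

The ranges are half-open (FROM is inclusive, TO is exclusive), which is why each partition spans PART consecutive bid values.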
Here, the only input parameter SCALE is the scale for pgbench to initialize the data. The internal variable PART controls how many tuples go into each partition. Since pgbench generates 100,000 records per unit of scale, we divide the scale by 10 to load 1,000,000 records into each partition. PART can always be changed according to the benchmark requirement; here, we simply set it to 10 and load 1 million records into each partition. Therefore, the number of partitions is equal to scale/10 + 1 (a default partition is added).
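As a worked example of that arithmetic (the concrete scale value 100 is chosen here just for illustration):

```shell
# Partition math for an example scale of 100:
SCALE=100
ROWS=$((SCALE * 100000))          # pgbench creates 100,000 rows per scale
PART=10                           # scales per partition
ROWS_PER_PART=$((PART * 100000))  # 1,000,000 rows per partition
NPARTS=$((SCALE / PART + 1))      # +1 for the default partition
echo "$ROWS rows split across $NPARTS partitions of $ROWS_PER_PART rows"
```

So scale 100 yields 10,000,000 rows in 10 data partitions of 1 million rows each, plus the default partition.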
To compare query performance, we run the benchmark in the following 3 scenarios:
1) query data without any index: randomly query a record from many partitions using a non-partition column with no index.
pgbench -n -c8 -j8 -T60 -b select-only postgres
2) query data with a regular unique index: randomly query a record from many partitions using a non-partition column with a unique index (an existing Postgres feature; to build such a unique index, the partition key has to be included).
psql -d postgres -c "create unique index local_idx_pgbench_accounts_aid on pgbench_accounts (bid, aid);"
3) query data with a global unique index: randomly query a record from many partitions using a non-partition column with a global unique index built on that non-partition column.
psql -d postgres -c "create unique index global_idx_pgbench_accounts_aid on pgbench_accounts using btree(aid) global;"
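All three scenarios lean on the fact that pgbench's built-in select-only script probes pgbench_accounts by aid, which is exactly the non-partition column the indexes above cover. Its single statement can be reproduced as a custom script file (pgbench fills in the :scale variable automatically from pgbench_branches):

```shell
# Recreate the essence of pgbench's built-in select-only script as a custom
# script file, so the query being benchmarked is visible:
cat > select-aid.sql <<'EOF'
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF
echo "wrote select-aid.sql"
# running it with -f is equivalent to -b select-only here:
#   pgbench -n -c8 -j8 -T60 -f select-aid.sql postgres
```

Without any index on aid, each such lookup has to scan every partition; the regular and global unique indexes change how that probe is satisfied.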
With the setup above, you can exercise select-only, simple-update and tpcb-like using pgbench's built-in scripts with different numbers of partitions. For example, to compare the performance among no index, regular index, and global index with 100 partitions for select-only and tpcb-like, we can run the commands below:
# no index
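The original command listing is truncated above. A hedged reconstruction, reusing only the pgbench and psql invocations already shown in the three scenarios, might look like the driver below; it prints each command for review rather than executing it (drop the echo in run() to actually execute):

```shell
#!/bin/bash
# Hypothetical reconstruction of the comparison runs; prints commands only.
run() { echo "+ $*"; }   # replace 'echo "+ $*"' with "$@" to really run them

for bench in select-only tpcb-like; do
  # no index
  run pgbench -n -c8 -j8 -T60 -b "$bench" postgres

  # regular unique index on (bid, aid)
  run psql -d postgres -c "create unique index local_idx_pgbench_accounts_aid on pgbench_accounts (bid, aid);"
  run pgbench -n -c8 -j8 -T60 -b "$bench" postgres
  run psql -d postgres -c "drop index local_idx_pgbench_accounts_aid;"

  # global unique index on (aid)
  run psql -d postgres -c "create unique index global_idx_pgbench_accounts_aid on pgbench_accounts using btree(aid) global;"
  run pgbench -n -c8 -j8 -T60 -b "$bench" postgres
  run psql -d postgres -c "drop index global_idx_pgbench_accounts_aid;"
done
```

Dropping each index between runs keeps the three measurements independent of one another.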
4. Performance benchmark
Here are the performance benchmark charts after repeating the tests above for 1, 10, 100, and 1000 partitions.
As you can see, with this approach (keep storage separate, but enforce the uniqueness check across all of a partitioned table's indexes), the performance is related to the number of partitions. There are some areas we could potentially look into, such as 1) using a cache to reduce disk I/O, and 2) finding a way to avoid unnecessary partition checks.
5. Summary
In this blog, I explained how the performance benchmark was performed on the initial Global Unique Index POC for random select and update, and I hope it helps in analyzing the strengths and weaknesses of this POC.