1. Overview
A few years ago, there was a proposal about adding the global index support to PostgreSQL for partitioned table. Following that proposal, there were many discussions and also an initial version POC to demonstrate the possibility, the technical challenges and the potential benefits, etc. However, the global index feature is still not available in PostgreSQL official release yet. This blog is trying to share a different approach and hopefully it is can to be implemented to benefit some users.
2. Basic global index requirement
There are many reasons to have this global index to be available on Postgres partitioned table, such as the performance enhancement for read-only queries across multiple partitions, the uniqueness of using non-partition key as index across multiple partitions, etc. For example, if users find out a table grow too fast and they have to split this table into partitions based on key1 while the applications are using key2 to retrieve data. In this case, with the global index available, they potentially can avoid the unnecessary changes to the applications.
3. Different approaches
To address this global
index requirement, one POC in PostgreSQL community discussion was trying to store the global index in a single index relation. In this way, it definitely will have better performance as all the index tuples are stored in one index relation. However, one issue is that it will encounter the physical size limitation on one relation file. Another issue is that each detach needs to clean up this single global index relation and it is kind of violating one of the original partitioned table design ideas. As it was mentioned that one of the partitioned table design ideas is to cheaply add and remove partitions
.
Another approach is that we can consider to keep the global
index relation stored separately based on partition key and add the logic to allow globally access the separated global
index relation with uniqueness restriction on a non-partitioned key. In this approach, we keep the benefits of the original partitioned table design. One is the size limitation of one relation as we have global
index relation stored separately, the other one is that it is easy to maintain the detach performance. For attaching partition, it will depend on whether it is an empty table or table with data or even index.
The main idea of this approach is to use the existing feature by removing the restriction of must having partition key involved, and adds logic to handle the global uniqueness check on non-partition key and crossing partitions sort during index build.
4. Some initial tests
Based on the second approach with some basic changes, here are some simple test results to share.
4.1. Setup partitions
First, using pgbench to create 12 partitions and load data with scale 1000.
1 | $ pgbench -i -s 1000 --partitions=12 --partition-method=range -d postgres |
4.2. Setup global
index
Second, create the global
index without partition key (aid) restriction, but with global uniqueness check on non-partition key (bid).
1 | postgres=# create unique index gidx on pgbench_accounts using btree(bid) global; |
Here is how the schema looks like after a global
index has been created.
1 | postgres=# \d+ pgbench_accounts |
To distinguish the global
index relation from normal index relation, here we use g
to replace i
.
1 | postgres=# select oid, relname, relnamespace, reltype, reloftype, relam, relfilenode, relpages, reltuples, relhasindex, relkind from pg_class where relnamespace=2200 order by oid; |
4.3. Query with global
index
Now, let’s run a simple query using non-partition key (bid) to compare the performance
1 | postgres=# select * from pgbench_accounts where bid=75000007; |
As the explain shows above, the index scan has been used.
4.4. Query without index
Then, let’s drop the global index, and run the same query again,
1 | postgres=# drop index gidx; |
Here, we can see a big performance difference between with and without global index (2.243 ms vs. 8345.590 ms)
4.5. Query with index having partition-key restriction
Now, let’s build an index with partition key (aid),
1 | postgres=# create unique index lidx on pgbench_accounts using btree(aid, bid); |
With the same query, we still can see a big difference between global
index and original index with partition key restriction (2.243 ms vs. 3312.177 ms).
4.6. Uniqueness on non-partition key
Below examples are trying to demonstrate that without the uniqueness check on non-partition key offered by global
index, a duplicated bid record insertion can’t not be blocked.
1 | postgres=# insert into pgbench_accounts values(100000001, 75000007, 0, ''); |
However, with the uniqueness check on non-partition key offered by global
index, the insertion of a duplicated bid record can be detected and blocked out.
postgres=# insert into pgbench_accounts values(100000001, 75000007, 0, ‘’);
ERROR: duplicate key value violates unique constraint “pgbench_accounts_10_bid_idx”
DETAIL: Key (bid)=(75000007) already exists.
5. Summary
In this blog, I explained a different approach to achieve logical global
index features but keep the physical storage separately, which can potential keep the original partitioned table design idea on PostgreSQL, and demonstrated the benefit of query performance and the uniqueness check on non-partition key.