knowledge is power

0%

A quick test for postgres_fdw batch insertion

Featured image

1. Overview

In my previous blog, I briefly walked through how the bulk/batch insertion was done for postgres_fdw in PG14. In this blog, I am going to run some basic tests to compare the performance for before and after the batch insertion was introduced in postgres_fdw, so that we can have a general idea about whether this feature makes any difference.

2. PG Servers Setup

The key of the blog is to see if there is any difference for batch insertion. To make the testing simple, here is how I set up a simple environment.

As this bulk/batch insertion was introduced for PG14, so we need to switch to the stable PG14 branch, i.e. REL_14_STABLE. After checked out the source code, simply run the commands: configure, make and make install. Here are the commands used in this blog.

1
2
3
4
5
6
7
./configure --prefix=$HOME/sandbox/postgres/pgapp --enable-tap-tests --enable-debug CFLAGS="-g3 -O0"
make && make install
cd contrib/postgres_fdw/
make && make install
export PGDATA=$HOME/sandbox/postgres/pgdata
initdb -D $PGDATA
pg_ctl -D $PGDATA -l logfile start

In order to test Foreign Data Wrapper, we need to start another PG Server. To make it easy, I simply start a PG Server on a Foreign data cluster and change the default port to a different one, for example, 5433. Below are the commands used to setup Foreign Server.

1
2
export FPGDATA=$HOME/sandbox/postgres/pgdata2
initdb -D $FPGDATA

After the Foreign data cluster has been initialized, change the port to 5433, then start the Foreign PG Server.

1
2
vim $FPGDATA/postgresql.conf 
pg_ctl -D $FPGDATA -l logfile-f start

3. Foreign Tables Setup

Now, we can setup the basic Foreign Data Wrapper testing environment like below.

On the Local PG Server:

3.1. Create a Foreign Server using default batch settings
1
2
postgres=# create server fs foreign data wrapper postgres_fdw options (dbname 'postgres', host '127.0.0.1', port '5433');
CREATE SERVER
3.2. Create the user mapping
1
2
postgres=# create user mapping for david server fs options( user 'david');
CREATE USER MAPPING
3.3. Create Foreign Tables on Local PG Server
1
2
postgres=# create foreign table ft (id int, test text) server fs options(table_name 't');
CREATE FOREIGN TABLE

By default, the batch insertion size has been set to 1 as you can see from the source code.

1
2
3
4
5
6
7
8
9
10
/*
* Determine batch size for a given foreign table. The option specified for
* a table has precedence.
*/
static int
get_batch_size_option(Relation rel)
{
...
/* we use 1 by default, which means "no batching" */
int batch_size = 1;

Now, repeate the process to create another two Foreign Tables with different batch size correspondingly, i.e. 10 and 100.

1
2
3
4
5
postgres=# create foreign table ft_batch10 (id int, test text) server fs options(batch_size '10', table_name 't10');
CREATE FOREIGN TABLE

postgres=# create foreign table ft_batch100 (id int, test text) server fs options(batch_size '100', table_name 't100');
CREATE FOREIGN TABLE
3.4. Create Tables on Foreign Server

On the Foreign PG Server side, create corresponding tables like below. Notes, you need to make the table names match the ones used in Local PG Server.

1
2
3
4
5
6
7
8
9
psql -d postgres -p 5433
postgres=# create table t (id int, test text);
CREATE TABLE

postgres=# create table t10 (id int, test text);
CREATE TABLE

postgres=# create table t100 (id int, test text);
CREATE TABLE

4. Run the tests

Now, enable the timeing on Local PG Server, and simply run the commands below, and then record the timing for comparison.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
postgres=# \timing 
Timing is on.

insert into ft values(generate_series(1,1000),'hello, world');
select count(*) from ft;
delete from ft ;
insert into ft values(generate_series(1,1000000),'hello, world');
select count(*) from ft;
delete from ft ;
insert into ft values(generate_series(1,100000000),'hello, world');
select count(*) from ft;
delete from ft ;


insert into ft_batch10 values(generate_series(1,1000),'hello, world');
select count(*) from ft_batch10;
delete from ft_batch10 ;
insert into ft_batch10 values(generate_series(1,1000000),'hello, world');
select count(*) from ft_batch10;
delete from ft_batch10 ;
insert into ft_batch10 values(generate_series(1,100000000),'hello, world');
select count(*) from ft_batch10;
delete from ft_batch10 ;


insert into ft_batch100 values(generate_series(1,1000),'hello, world');
select count(*) from ft_batch100;
delete from ft_batch100 ;
insert into ft_batch100 values(generate_series(1,1000000),'hello, world');
select count(*) from ft_batch100;
delete from ft_batch100 ;
insert into ft_batch100 values(generate_series(1,100000000),'hello, world');
select count(*) from ft_batch100;
delete from ft_batch100 ;

5. Check the results

Here are results from about tests.

1
2
3
4
1                               10                          100                         
55.634 ms 9.996 ms 25.545 ms
32155.960 ms (00:32.156) 5927.090 ms (00:05.927) 4754.158 ms (00:04.754)
3237972.881 ms (53:57.973) 623204.920 ms (10:23.205) 332998.911 ms (05:32.999)

6. Summary

In this blog, we simply run some basic tests to see if there is any performance improvement in postgres_fdw for the batch/bulk insertion. The results are very impressive: for 1k insertion, batch size 10 and 100 are 5 and 2 times and faster relatively; for 1 million insertion, batch size 10 and 100 are 5.5 and 6.5 times faser; for 100 millions insertion, batch size 10 and 100 are 5 and 9.5 times better.