1. Overview
postgres_fdw has been existing in PostgreSQL for many years, and it was one of the most popular interfaces used in many extensions. such as the PostgreSQL foreign data wrappers wiki page, and PostgreSQL Extension Network/PGXN. However, I had not touched this postgres_fdw in deep until I got a task about combing multiple columns from multiple tables using this amazing Foreign Data Wrapper interface. After some hack, I was be able to combine multiple columns from multiple tables into one table based on postgres_fdw, but I was just curious about how many enhancements has been done recently. After a simple git pull, I found actually there are some new features was introduced in PG14, such as Asynchronous Execution, Bulk Inserts, Truncate as well as some network connection related enhancement. In this blog, I will take a detail look at one of the enhancements, i.e. Bulk insertion, to see how it was implemented.
2. A detail looks at bulk insertion
I haven’t performed any benchmark test for bulk inserts in postgres_fdw to compare before and after yet, but you can find a very nice blog here as your reference, if I have time I will definitely run some benchmark test on a production server. For now, let’s take a look to see how it was implemented.
The first commit was done early 2021, and here is the details information about this features.
1 | commit b663a4136331de6c7364226e3dbf7c88bfee7145 |
static TupleTableSlot *
postgresExecForeignInsert(EState *estate,
ResultRelInfo *resultRelInfo,
TupleTableSlot *slot,
TupleTableSlot *planSlot)
1 | vs. batch rows insertion |
static TupleTableSlot **
postgresExecForeignBatchInsert(EState estate,
ResultRelInfo *resultRelInfo,
TupleTableSlot *slots,
TupleTableSlot **planSlots,
int *numSlots)
1 |
|
static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
1 | The function basically allows to retrieve the number of row remote server will support in bulk insertion. Here, as the function has a note, if there is trigger defined for remote table with AFTER ROW or with RETURNING required, the bulk insertion is not allowed, otherwise the rule will be broken on remote server side. |
if (operation == CMD_INSERT && fmstate->num_slots != *numSlots)
1 | and another important change is `convert_prep_stmt_params` which has a loop to build the number of rows that server can accept. |
for (i = 0; i < numSlots; i++)
Then rest of the changes are mainly to support the bulk insertion features, such as parse the `batch_size` in option. Another new function, `rebuildInsertSql` in `deparse.c` is used in `execute_foreign_modify` to build the actual number of rows before sending the query to remote server.
#### 3. Limitation
As it was mentioned in the commit message, only insertion is supported for bulk operations, delete and update are not supported yet. Hopefully, these bulk operations and interfaces can be added soon. So that, there will be even more extensions using FDW interface in the future.
#### 4. Summary
In this blog, we discussed a key performance improvement in postgres_fdw, i.e. bulk insertion and take a detail look at the source code level. I hope it can help when someone want to add a new feature to postgres_fdw or any continue improvement on bulk insertion.
_**Reference:**_
1. [F.35. postgres_fdw](https://www.postgresql.org/docs/14/postgres-fdw.html)
2. [Faster Bulk Insertion to Foreign Tables - Introduction to PostgreSQL 14 Committed Features](http://postgres-road.blogspot.com/2021/03/faster-bulk-insertion-to-foreign-tables.html)
3. [New Features in PostgreSQL 14: Bulk Inserts for Foreign Data Wrappers](https://www.percona.com/blog/2021/05/27/new-features-in-postgresql-14-bulk-inserts-for-foreign-data-wrappers/)