1. Overview
When I was working on some backup and recovery features for a project based on Postgres, I noticed a file called backup_label. A quick Google search turns up some very nice blogs and books that discuss this topic, such as The Internals of PostgreSQL, one of my favourite books. In this blog, I am going to talk about it a little more, based on my experience.
2. What is backup_label?
The backup_label is a file created in the $PGDATA folder when an exclusive backup is triggered by pg_start_backup(), and it exists while the backup is in progress. The file is removed once pg_stop_backup() is executed. Exclusive backup is one of the backup methods introduced to Postgres early on and, as the name indicates, it does not support multiple backups running at the same time. Because of this limitation, a frontend backup tool, pg_basebackup, was added to Postgres later. The pg_basebackup client does allow multiple backups to run at the same time, so this kind of backup is called a non-exclusive backup. Both backup methods use the backup_label file, but in different ways.
In an exclusive base backup, the backup_label file is generated automatically on the source server. To see what this file looks like, you can run a command such as select pg_start_backup('first backup'); from a psql console. You should then find a backup_label file in the $PGDATA folder with content like the following:
START WAL LOCATION: 0/6000028 (file 000000010000000000000006)
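To make the START WAL LOCATION line shown above a bit more concrete, here is a minimal Python sketch that parses it and checks that the WAL file name matches the location. The helper names parse_start_wal_location and wal_file_name are my own, not part of Postgres, and the sketch assumes the default 16 MB WAL segment size and timeline 1 (the first eight hex digits of the file name).

```python
import re

# Matches the first line of backup_label as shown above.
START_LINE = re.compile(
    r"^START WAL LOCATION: ([0-9A-F]+)/([0-9A-F]+) \(file ([0-9A-F]{24})\)$"
)


def parse_start_wal_location(line):
    """Return (lsn, wal_file_name) parsed from a START WAL LOCATION line."""
    match = START_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a START WAL LOCATION line: {line!r}")
    high, low, wal_file = match.groups()
    # An LSN is written as two hex numbers: the high and low 32 bits.
    lsn = (int(high, 16) << 32) | int(low, 16)
    return lsn, wal_file


def wal_file_name(timeline, lsn, segment_size=16 * 1024 * 1024):
    """Build the name of the WAL segment file that contains the given LSN.

    Assumes the default 16 MB segment size unless told otherwise.
    """
    segments_per_xlog_id = 0x100000000 // segment_size
    segment_number = lsn // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


lsn, wal_file = parse_start_wal_location(
    "START WAL LOCATION: 0/6000028 (file 000000010000000000000006)"
)
print(hex(lsn), wal_file, wal_file_name(1, lsn) == wal_file)
```

In other words, the file name in parentheses is not extra information: under these assumptions it can be derived from the timeline and the start location.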
3. How does it work?
In exclusive backup mode, the Postgres source server generates this file when pg_start_backup() is executed and removes it after pg_stop_backup(). In non-exclusive backup mode, however, such as when using the pg_basebackup client to perform a base backup, the backup_label is only streamed to the client side and is not physically saved on the source Postgres server.
As you can see in the backup_label file above, it contains checkpoint information similar to what pg_controldata reports from the control file. If a backup is used for recovery with this backup_label file present, Postgres uses the checkpoint in backup_label to start the REDO process. The reason is that multiple checkpoints can happen during the backup, so the latest checkpoint recorded in the control file may be later than the one the backup started from. Once recovery has read the file and saved its information to the control file, backup_label is renamed to backup_label.old so that it is not used again. In simple words, with the backup_label file, the database has a consistent checkpoint to recover from, given a proper archive.
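The decision described above can be sketched in a few lines of Python. This is only a simplified model and not the actual Postgres code: the function choose_recovery_start is hypothetical, the control file value is passed in as a plain number instead of being read from pg_control, and the START WAL LOCATION line stands in for the checkpoint information in backup_label.

```python
import re
import tempfile
from pathlib import Path


def parse_lsn(text):
    """Convert an LSN written as 'high/low' in hex into an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_recovery_start(data_dir, control_file_lsn):
    """Return (source, lsn) for where REDO should start in this toy model.

    If backup_label exists, its start location wins; otherwise fall back
    to the value taken from the control file.
    """
    label = Path(data_dir) / "backup_label"
    if label.is_file():
        for line in label.read_text().splitlines():
            match = re.match(r"START WAL LOCATION: ([0-9A-F]+/[0-9A-F]+)", line)
            if match:
                return "backup_label", parse_lsn(match.group(1))
        raise ValueError("backup_label has no START WAL LOCATION line")
    return "pg_control", control_file_lsn


with tempfile.TemporaryDirectory() as data_dir:
    # Made-up later checkpoint that happened while the backup was running.
    later_checkpoint = parse_lsn("0/7000060")
    print(choose_recovery_start(data_dir, later_checkpoint))

    (Path(data_dir) / "backup_label").write_text(
        "START WAL LOCATION: 0/6000028 (file 000000010000000000000006)\n"
    )
    print(choose_recovery_start(data_dir, later_checkpoint))
```

Without the label file, this model would start from the later checkpoint and skip WAL that the copied data files still need, which is exactly the situation backup_label is there to prevent.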
4. Does it impact any frontend tool?
The answer is yes. Some frontend tools behave differently if a backup_label file is present. For example, if pg_ctl sees a backup_label file during a smart shutdown, it waits for the file to be removed and shows the end user a warning message like this:
WARNING: online backup mode is active
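If you want a similar check in your own scripts, a minimal Python sketch could look like the one below. It only imitates the idea of looking for backup_label in the data directory; the function names are hypothetical and this is not how pg_ctl itself is implemented.

```python
import tempfile
from pathlib import Path


def backup_in_progress(data_dir):
    """Return True if a backup_label file exists in the data directory."""
    return (Path(data_dir) / "backup_label").is_file()


def smart_shutdown_warning(data_dir):
    """Return the warning text if a backup looks active, otherwise None."""
    if backup_in_progress(data_dir):
        return "WARNING: online backup mode is active"
    return None


with tempfile.TemporaryDirectory() as data_dir:
    print(smart_shutdown_warning(data_dir))

    # Simulate an exclusive backup in progress by creating the label file.
    (Path(data_dir) / "backup_label").write_text(
        "START WAL LOCATION: 0/6000028 (file 000000010000000000000006)\n"
    )
    print(smart_shutdown_warning(data_dir))
```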
Another example is the frontend tool pg_rewind, which creates a backup_label file to force recovery to start from the last common checkpoint.
5. Summary
In this blog, I explained the backup_label file in Postgres. I believe end users won't pay attention to it most of the time, but if you do encounter issues related to backup_label, I hope this blog can give you some clues.