Wednesday, 23 August 2017

PostgreSQL Point-in-Time Recovery (PITR)

After a crash, Postgres replays the WAL to correct inconsistencies in the main data files, then opens for business as usual. It stands to reason that we can perform a backup in the same way, by having a copy of the main data files and the current WAL segments. Postgres even makes this really easy, by archiving copies of the WAL segments for you.
In this document we will discuss point in time recovery process. Point in time recovery can perform on same server or different server.
Here we use different server for recovery..

Pre-process recommendations

§  Set up your recovery host's environment and directory structure exactly the same as your main server. Otherwise you'll need to spend time changing any symlinks you've created on the primary for xlogs, tablespaces, or whatnot which is really just opportunity for error.
§  Follow all of the advice in the manual with respect to handling errors.
Here Main server (test_db01) and Recovery server (test_db02).
Archive location --  /u01/psqlwarch
PGDATA Location -- /var/lib/pgsql/data/

1 -- Setting up WAL archiving Main server (test_db01)

Modify the /var/lib/pgsql/data/postgresql.conf. Set the following parameters
archive_mode = on
# command to use to archive a logfile segment , here this archive move on recovery server
archive_command = '/var/lib/pgsql/data/archive_command.sh %p %f'              
#archive_command = 'rsync -a %p postgres@test_db02:/u01/psqlwarch/%f'
#archive_timeout = 3600
# Only present in Postgres 9.0 or later
#wal_level = archive
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
script for archive mode  (cat /var/lib/pgsql/data/archive_command.sh)
*********************************************************
cat /var/lib/pgsql/data/archive_command.sh
#!/bin/sh
# archive command to archive the WAL files to the archive directory
# It also writes the name to a file which can later be picked up by a backup command
# First argument is the full path to the WAL file and the second argument is the filename

#REPLICANTS="test_db01"
REPLICANTS2="test_db02"

FULLPATH=$1
FILENAME=$2
echo $FULLPATH $FILENAME >> /usr/tmp/debug
export ARCHIVE_DIR=/u01/psqlwarch
cp -i $FULLPATH ${ARCHIVE_DIR}/${FILENAME}

for REPLICA in $REPLICANTS2; do
        scp $FULLPATH $REPLICA:/u01/psqlwarch

done

echo ${FILENAME}>>${ARCHIVE_DIR}/wal_files.txt
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Restart postgres to apply the changes
(with out downtime)
 /usr/bin//pg_ctl restart -D /var/lib/pgsql/data  
(with downtime)
(stop postgres) /usr/bin/pg_ctl stop  -D /var/lib/pgsql/data
(start postgres)   /usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
Note -- Verify that the WALs are being shipped to their destination.
postgres=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/7000088
(1 row)

2-- Performing a backup run & DDL/DML Main server (test_db01)


2.1 -> ( By postgres user ) Here we create some tables for testing purpose
postgres=# \c testdb
psql (8.4.13)
You are now connected to database "testdb".
testdb=# \dt
            List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | company     | table | postgres
 public | department  | table | postgres
 public | n_company   | table | postgres
 public | new_company | table | postgres
(4 rows)
testdb=# create table new_department as select * from department;
SELECT
testdb=# \dt
             List of relations
 Schema |      Name      | Type  |  Owner
--------+----------------+-------+----------
 public | company        | table | postgres
 public | department     | table | postgres
 public | n_company      | table | postgres
 public | new_company    | table | postgres
 public | new_department | table | postgres
(5 rows)
testdb=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/16015D30
(1 row)

2.2 -> ( By postgres user )  start the hot backup 
echo "SELECT pg_start_backup('mybackup');" | psql -U postgres
-bash-4.1$ echo "SELECT pg_start_backup('mybackup');" | psql -U postgres
 pg_start_backup
-----------------
 0/17000020
(1 row)

2.3 -> ( By postgres user ) Create base backup  by tar or  rsync command.
-bash-4.1$ pwd
/var/lib/pgsql
-bash-4.1$ tar -cvf /u01/psqlwarch/mybackup.tar data

2.4 -> ( By postgres user ) With the backup completed, log back into the primary system and clear the backup flag
echo "SELECT pg_stop_backup();" | psql -U postgres
-bash-4.1$ echo "SELECT pg_stop_backup();" | psql -U postgres
 pg_stop_backup
----------------
 0/17000088
(1 row)

2.5-> ( By postgres user ) Copy base backup from Main server to recovery server
Cd /u01/psqlwarch/
Scp mybackup.tar postgres@test_db02:/var/lib/pgsql/

2.6-> ( By postgres user ) Here we Perform DDL/DML for testing purpose
postgres=# \c testdb
psql (8.4.13)
You are now connected to database "testdb".
testdb=# \dt
             List of relations
 Schema |      Name      | Type  |  Owner
--------+----------------+-------+----------
 public | company        | table | postgres
 public | department     | table | postgres
 public | n_company      | table | postgres
 public | new_company    | table | postgres
 public | new_department | table | postgres
(5 rows)
testdb=# drop table new_company ;
DROP TABLE
testdb=# drop table new_department;
DROP TABLE
testdb=#  \dt
           List of relations
 Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
 public | company    | table | postgres
 public | department | table | postgres
 public | n_company  | table | postgres
(3 rows)

testdb=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/18008EF8
(1 row)
testdb=# create table manoj_company as select * from company;
SELECT
testdb=#  create table man as select * from company;
SELECT
testdb=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/1901BDA8
(1 row)

Due to some issue (power/hardware failure) main server complete shutdown. Now we require , to perform point in time recovery on recovery server
for business continuity.

3 -- Recovering from backups Recovery server (test_db02)


Note -- > On this server we already stop postgres processes and removed/renamed data directory.

3.1-> ( By postgres user ) Restore backup dump which in created in step 2.3 & 2.5
-bash-4.1$ pwd
/var/lib/pgsql
-bash-4.1$ ls -lrt
total 72116
drwx------.  2 postgres postgres     4096 Mar 30 13:59 backups
-rw-------.  1 postgres postgres     1152 Apr 17 15:49 pgstartup.log
drwx------. 12 postgres postgres     4096 Jun 12 10:47 old_data
-rw-r--r--.  1 postgres postgres 73830400 Jun 15 17:56 mybackup.tar

-bash-4.1$ Cd /var/lib/pgsql/
-bash-4.1$ tar -xvf mybackup.tar
-bash-4.1$ chown -R postgres:postgres data

-bash-4.1$ ls -lrt
total 72120
drwx------.  2 postgres postgres     4096 Mar 30 13:59 backups
-rw-------.  1 postgres postgres     1152 Apr 17 15:49 pgstartup.log
drwx------. 12 postgres postgres     4096 Jun 12 10:47 old_data
drwx------. 12 postgres postgres     4096 Jun 15 17:50 data
-rw-r--r--.  1 postgres postgres 73830400 Jun 15 17:56 mybackup.tar
-bash-4.1$ cd data
-bash-4.1$ ls -lrt
total 120
-rw-------. 1 postgres postgres     4 Apr 17 15:49 PG_VERSION
drwx------. 2 postgres postgres  4096 Apr 17 15:49 pg_twophase
drwx------. 2 postgres postgres  4096 Apr 17 15:49 pg_tblspc
drwx------. 4 postgres postgres  4096 Apr 17 15:49 pg_multixact
drwx------. 2 postgres postgres  4096 Apr 17 15:49 pg_subtrans
-rw-------. 1 postgres postgres  1631 Apr 17 15:49 pg_ident.conf
drwx------. 2 postgres postgres  4096 Apr 17 15:49 pg_clog
drwx------. 6 postgres postgres  4096 Apr 21 13:35 base
-rw-------. 1 postgres postgres 16886 Apr 21 13:42 copy_postgresql.conf
-rw-------. 1 postgres postgres  3411 Apr 21 13:42 copy_pg_hba.conf
-rw-------. 1 postgres postgres  1631 Apr 21 13:43 copy_pg_ident.conf
-rw-------. 1 postgres postgres     0 Jun  9 16:37 logfile
-rw-------. 1 postgres postgres  3411 Jun 11 15:56 old_pg_hba.conf
-rw-------. 1 postgres postgres  3531 Jun 11 17:46 pg_hba.conf
-rwxr-xr-x. 1 postgres postgres   613 Jun 12 11:11 archive_command.sh
-rw-------. 1 postgres postgres 17126 Jun 12 11:21 postgresql.conf
drwx------. 2 postgres postgres  4096 Jun 15 16:13 pg_log
-rw-------. 1 postgres postgres    45 Jun 15 17:48 postmaster.pid
-rw-------. 1 postgres postgres    45 Jun 15 17:48 postmaster.opts
drwx------. 2 postgres postgres  4096 Jun 15 17:48 global
drwx------. 3 postgres postgres  4096 Jun 15 17:50 pg_xlog
-rw-------. 1 postgres postgres   147 Jun 15 17:50 backup_label
drwx------. 2 postgres postgres  4096 Jun 15 17:52 pg_stat_tmp

3.2-> ( By postgres user ) Cross check archives those shipped from main server till  crash.
create a recovery.conf file in /var/lib/pgsql/data/ with the following contents. This tells postgres to start up implement the log files.
bash-4.1$ cd /u01/psqlwarch/
-bash-4.1$ ls -lrt

cd /var/lib/pgsql/data/
vi recovery.conf
restore_command = 'cp /u01/psqlwarch/%f %p' 

3.3-> Start up postgres . It will start restoring data right away till point of  crash ( depend on available archive).
-bash-4.1$ /usr/bin/pg_ctl start  -D /var/lib/pgsql/data

3.4-> Crosscheck point in time recovery successful or not.   ( WE GOT ALL Tables & Data!)
-bash-4.1$ psql
psql (8.4.20)
Type "help" for help.
postgres=# \c testdb
psql (8.4.20)
You are now connected to database "testdb".
testdb=#  \dt
             List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | company       | table | postgres
 public | department    | table | postgres
 public | man           | table | postgres
 public | manoj_company | table | postgres
 public | n_company     | table | postgres
(5 rows)

Note-- Once you’re done, you should remove the archived WAL segments again. Once you’ve successfully recovered, you don’t need them any more.
Note : Postgres will want to rename the recovery.conf file to recovery.done once it’s finished.
*********************************************************************************


No comments:

Post a Comment