Sunday 13 August 2017

PostgreSQL Warm Standby

Like a good relational database, PostgreSQL maintains a set of transactional log files known as write-ahead-logs (WAL) in the pg_xlog directory.  These logs are written to for every change in the database files and are used to recovery from a crash condition.  If you crash, replay all the WAL files since the last backup and you will be back in business right at the point of failure.
I required high availability but not real time fail over so I opted to use the warm standby which is a pseudo-built in feature of postgres as of version 8.3 that provides easy failover that can be automated if necessary.
Well, if you have this capability, what about keeping a warm-standby system and feeding it all the WAL files.  If you teach the warm-standby how to continuously process the incoming write ahead logs from the live system you will have a system ready to go at a moments notice.  When you read about this setup in other places online the primary server is known as ‘master’ and the secondary the ‘slave’.
NOTE : Both your primary and your secondary need to be running the same major version of the postgreSQL database.
You can follow these steps for both a brand new postgres installation as well as an existing heavy traffic installation. Furthermore, no downtime is required to set this up.

Pre-process recommendations

§  Use pg_standby for your restore_command in the recovery.conf file on the standby. pg_standby is included in PostgreSQL 8.3, and you can copy the source from there to compile it for 8.2 yourself. It isn't compatible with 8.1.
§  Set up your standby host's environment and directory structure exactly the same as your primary. Otherwise you'll need to spend time changing any symlinks you've created on the primary for xlogs, tablespaces, or whatnot which is really just opportunity for error.
§  Pre-configure both the postgresql.conf and recovery.conf files for your standby. I usually keep all of my different config files for all of my different servers in a single, version-controlled directory that I can then check out and symlink to. Again, consistent environment & directory setups make symlinks your best friend.
§  Use ssh keys for simply, and safely, transferring files between hosts. Configure password postgres user primary to  secondary  by below link..  
§  Follow all of the advice in the manual with respect to handling errors.
Here Primary server(master) test_db01 and secondary server (slave) test_db02.
Archive location --  /u01/psqlwarch
PGDATA Location -- /var/lib/pgsql/data/

Primary Setup 

All we need to do on the primary server is set it up to push archive WAL files to our NFS mount or archive location on slave server .

1 -- (Primary server(master) test_db01) ( By postgres user ) Modify the /var/lib/pgsql/data/postgresql.conf. Set the following parameters
archive_mode = on
archive_command = '/var/lib/pgsql/data/archive_command.sh %p %f'               # command to use to archive a logfile segment
#archive_command = 'rsync -a %p postgres@test_db02:/u01/psqlwarch/%f'
#archive_timeout = 3600

script for archive mode  (cat /var/lib/pgsql/data/archive_command.sh)
*********************************************************
cat /var/lib/pgsql/data/archive_command.sh
#!/bin/sh
# archive command to archive the WAL files to the archive directory
# It also writes the name to a file which can later be picked up by a backup command
# First argument is the full path to the WAL file and the second argument is the filename

#REPLICANTS="test_db01"
REPLICANTS2="test_db02"

FULLPATH=$1
FILENAME=$2
echo $FULLPATH $FILENAME >> /usr/tmp/debug
export ARCHIVE_DIR=/u01/psqlwarch
cp -i $FULLPATH ${ARCHIVE_DIR}/${FILENAME}

for REPLICA in $REPLICANTS2; do
        scp $FULLPATH $REPLICA:/u01/psqlwarch

done

echo ${FILENAME}>>${ARCHIVE_DIR}/wal_files.txt

2 -- (Primary server(master) test_db01) ( By postgres user ) Restart postgres to apply the changes

(with out downtime)
 /usr/bin//pg_ctl restart -D /var/lib/pgsql/data  

(with downtime)

(stop postgres) /usr/bin/pg_ctl stop  -D /var/lib/pgsql/data
(start postgres)   /usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
Note -- Verify that the WALs are being shipped to their destination.
postgres=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/7000088
(1 row)

Standby Setup

1 -- (secondary server (slave) test_db02) ( By root user )Before we configure stuff we'll need to build /install pgstandby from source/RPM. It's not provided with the default posgres installation.
You can verify that it works by running which pgstandby. If we are not able to get pgstandby need to build /install  from source/RPM.Here I install by RPM.
-> List out installed postgres RPM
[root@test_db02 ~]# rpm -qa postgres*
postgresql-docs-8.4.13-1.el6_3.x86_64
postgresql-8.4.13-1.el6_3.x86_64
postgresql-odbc-08.04.0200-1.el6.x86_64
postgresql-libs-8.4.13-1.el6_3.x86_64
postgresql-server-8.4.13-1.el6_3.x86_64
-> Install crontrib RPM
[root@test_db02 ~]# yum install postgresql-contrib.x86_64
-> verify pgstandby
[root@test_db02 ~]# which pg_standby
/usr/bin/pg_standby

2 -- (secondary server (slave) test_db02) ( By postgres user ) (Optional ) If postgres running then we'll stop the standby server to prepare for a checkpoint backup from our primary system.

3 -- (Primary server(master) test_db01)  ( By postgres user )  start the hot backup 

 3.1-> echo "SELECT pg_start_backup('mybackup');" | psql -U postgres

-bash-4.1$ echo "SELECT pg_start_backup('mybackup');" | psql -U postgres
 pg_start_backup
-----------------
 0/A000020
(1 row)

3.2 -> Create base backup  by tar or  rsync command.
-bash-4.1$ pwd
/var/lib/pgsql
-bash-4.1$ tar -cvf /u01/psqlwarch/mybackup.tar data

3.3 -> Copy base backup from primary to secondary server
Cd /u01/psqlwarch/
Scp mybackup.tar postgres@test_db02:/var/lib/pgsql/

4 -- (secondary server (slave) test_db02) ( By postgres user )   Restore base backup
 4.1-> remove old data directory
Cd /var/lib/pgsql/
Rm - rf data
4.2-> Restore base backup
Cd /var/lib/pgsql/
tar -xvf mybackup.tar
chown -R postgres:postgres data

5 -- (Primary server(master) test_db01)  ( By postgres user ) With the backup completed, log back into the primary system and clear the backup flag
 echo "SELECT pg_stop_backup();" | psql -U postgres
-bash-4.1$ echo "SELECT pg_stop_backup();" | psql -U postgres
 pg_stop_backup
----------------
 0/A000088
(1 row)

6 -- (secondary server (slave) test_db02) ( By postgres user ) 
6.1-> turn off Archive,  Modify the /var/lib/pgsql/data/postgresql.conf. Set the following parameters
archive_mode = off

6.2-> create a recovery.conf file in /var/lib/pgsql/data/ with the following contents. This tells postgres to start up implement the log files.
Cd /var/lib/pgsql/data/

vi recovery.conf
restore_command = 'pg_standby -d -s 10 -t /tmp/pgsql.trigger.5432 /u01/psqlwarch %f %p %r 2>>standby.log'

6.3-> Start up postgres on the standby server. It will start restoring data right away..
-bash-4.1$ /usr/bin/pg_ctl start  -D /var/lib/pgsql/data

6.4-> Take a peek at the restore to make sure there aren't too many errors. It may complain about missing files. Ignore these warnings
Cd /var/lib/pgsql/data/
-bash-4.1$ tail -l standby.log
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000001000000000000000E
WAL file path           : /u01/psqlwarch/00000001000000000000000E
Restoring to            : pg_xlog/RECOVERYXLOG
Sleep interval          : 10 seconds
Max wait interval       : 0 forever
Command for restore     : cp "/u01/psqlwarch/00000001000000000000000E" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 00000001000000000000000C and later
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
running restore         : OK
removing "/u01/psqlwarch/00000001000000000000000A"
removing "/u01/psqlwarch/00000001000000000000000B"
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000001000000000000000F
WAL file path           : /u01/psqlwarch/00000001000000000000000F
Restoring to            : pg_xlog/RECOVERYXLOG
Sleep interval          : 10 seconds
Max wait interval       : 0 forever
Command for restore     : cp "/u01/psqlwarch/00000001000000000000000F" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 00000001000000000000000D and later
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...

Note  : -- Execute below command on Primary server(master) test_db01  and check new archive entry come in standby.log 
postgres=# select pg_switch_xlog();

6.5-> Verify that it's still in standby mode. It should complain with this message: 
-bash-4.1$ psql -l
psql: FATAL:  the database system is starting up


Doing the Failover
what do we do when the primary fails? Here's my process for flipping over to a standby.
1.     Log into the primary server and make sure that postgresql is all the way down.
2.     If the primary server is not totally shut down, take it down the rest of the way. kill -9 the primary postgres process if necessary. 
sudo kill -9 sudo cat /var/pgsql/data/postmaster.pid | head -n 1
3.     log into the warm standby
4.     switch to the postges user sudo -u postgres bash
5.     Verify that the server is still in standby mode. It should report 'psql: FATAL: the database system is starting up'.psql
6.     Assuming that it (warm standby) is not up, create the trigger file. This tells our standby server that it's time to become a primary node. It's very important that only one server is primary at a time so i hope you followed steps 1 and 2.
touch /tmp/pgsql.trigger.5432
7.     Start postgres into warm standby and Verify that the standby server is running psql
8.     Get your clients pointing to the new server (change DNS, change IPs in configuration files, etc.)
Note : Postgres will want to rename the recovery.conf file to recovery.done once it’s finished.
*************************************************************************

No comments:

Post a Comment