Streaming Replication (SR) provides the
capability to continuously ship and apply the WAL XLOG records
to some number of standby servers in order to keep them current.This feature
was added to PostgreSQL 9.0.
XLOG records generated in the primary are periodically shipped to the
standby via the network. XLOG records shipped are replayed as soon as possible
without waiting until XLOG file has been filled. The combination of Hot Standby
and SR would make the latest data inserted into the primary visible in the
standby almost immediately. So minimal data loss(almost only open transactions
will be lost if its async rep, 0 loss if it is sync rep)
NOTE : Both your
primary and secondary need to be running the same major version of the
postgreSQL database.
Pre-process
recommendations
§
Set up your standby host's
environment and directory structure exactly the same as your primary. Otherwise
you'll need to spend time changing any symlinks you've created on the primary
for xlogs, tablespaces, or whatnot which is really just opportunity for error.
§
Pre-configure both the
postgresql.conf and recovery.conf files for your standby. I usually keep all of
my different config files for all of my different servers in a single,
version-controlled directory that I can then check out and symlink to. Again,
consistent environment & directory setups make symlinks your best friend.
§
Use ssh keys for simply,
and safely, transferring files between hosts. Configure password postgres user primary to secondary
by below link..
§
Follow all of the advice
in the manual with respect to handling errors.
Here Primary server(master) test_db01 and
secondary server (slave) test_db02.
Archive location -- /u01/psqlwarch
PGDATA Location -- /var/lib/pgsql/9.4/data/
1 -- (Primary server(master) test_db01) ( By postgres user )
1.1-> Create an useModify the /var/lib/pgsql/9.4/data/postgresql.conf. Set the following
parameters
archive_mode = on
archive_command = ' /var/lib/pgsql/9.4/data/archive_command.sh
%p %f'
# command to use to archive a logfile
segment
#archive_command = 'rsync -a %p
postgres@test_db02:/u01/psqlwarch/%f'
#archive_timeout = 3600
wal_level = hot_standby
listen_addresses = '*'
max_wal_senders = 3
checkpoint_segments = 8
wal_keep_segments = 8
script for archive mode (cat /var/lib/pgsql/9.4/data/archive_command.sh)
*********************************************************
cat /var/lib/pgsql/9.4/data/archive_command.sh
#!/bin/sh
# archive command to archive the WAL files to the archive
directory
# It also writes the name to a file which can later be
picked up by a backup command
# First argument is the full path to the WAL file and the
second argument is the filename
#REPLICANTS="test_db01"
REPLICANTS2="test_db02"
FULLPATH=$1
FILENAME=$2
echo $FULLPATH $FILENAME >> /usr/tmp/debug
export ARCHIVE_DIR=/u01/psqlwarch
cp -i $FULLPATH ${ARCHIVE_DIR}/${FILENAME}
for REPLICA in $REPLICANTS2; do
scp $FULLPATH
$REPLICA:/u01/psqlwarch
done
echo ${FILENAME}>>${ARCHIVE_DIR}/wal_files.txt
1.2-> Create an user
named replication with REPLICATION privileges
-bash-4.1$ psql -U postgres -d
postgres -c "CREATE USER replication WITH replication ENCRYPTED PASSWORD
'changeme' LOGIN"
Note : Mention entry for replication user in pg_hba.conf
host replication replication (IP of slave/secondary server)/32 md5
2 -- (Primary server(master) test_db01) ( By postgres user ) Restart postgres to apply the changes
(with out
downtime)
/usr/pgsql-9.4/bin/pg_ctl restart -D /var/lib/pgsql/9.4/data
(with
downtime)
(stop postgres)
/usr/pgsql-9.4/bin/pg_ctl stop -mf -D /var/lib/pgsql/9.4/data
(start postgres) /usr/pgsql-9.4/bin/pg_ctl -D /var/lib/pgsql/9.4/data -l logfile start
Note -- Verify that the WALs are
being shipped to their destination.
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/7000088
(1 row)
3 -- Make a base backup by copying the primary server's data directory to the
standby server.
3.1-> (secondary server (slave) test_db02) ( By postgres user ) (Optional ) If postgres running then we'll stop the standby server to prepare standby system.
-bash-4.1$ /usr/pgsql-9.4/bin/pg_ctl stop -mf -D
/var/lib/pgsql/9.4/data
3.2-> (secondary server (slave) test_db02) ( By postgres user )
remove old data directory
Cd /var/lib/pgsql/9.4/
Rm - rf data
3.3 ->(Primary server(master) test_db01) ( By postgres user )
start the hot backup
echo "SELECT
pg_start_backup('mybackup');" | psql -U postgres
-bash-4.1$ echo "SELECT
pg_start_backup('mybackup');" | psql -U postgres
pg_start_backup
-----------------
0/A000020
(1 row)
3.3.1-> Create base backup by
tar or rsync command.
-bash-4.1$ pwd
/var/lib/pgsql/9.4/
-bash-4.1$ tar -cvf /u01/psqlwarch/hotbackup.tar data
3.3.2 -> Copy base backup from primary to secondary server
Cd /u01/psqlwarch/
scp hotbackup.tar postgres@test_db02:/var/lib/pgsql/9.4/
3.3.3 -> Restore base backup
Cd /var/lib/pgsql/9.4/
tar -xvf hotbackup.tar
chown -R postgres:postgres data
3.4 ->(Primary server(master) test_db01) ( By postgres user ) With the backup completed, log back into the primary system and clear
the backup flag
echo "SELECT pg_stop_backup();" |
psql -U postgres
-bash-4.1$ echo "SELECT
pg_stop_backup();" | psql -U postgres
pg_stop_backup
----------------
0/A000088
(1 row)
Note : here we can use below method to Make a base backup of the primary
server's data directory to the standby server. (In this case don't follow step
3.3 & 3.4)
-bash-4.1$
/usr/bin/pg_basebackup -h test_db01 -D /var/lib/pgsql/9.4/data -P -U
replication --xlog-method=stream
4 -- (secondary server (slave) test_db02) ( By postgres user )
4.1-> Modify the /var/lib/pgsql/9.4/data/postgresql.conf. Set the following
parameters
archive_mode = off
hot_standby = on
4.2-> create a recovery.conf file in /var/lib/pgsql/9.4/data/ with the following
contents. This tells postgres to start up implement the log files.
Cd /var/lib/pgsql/9.4/data/
vi recovery.conf
standby_mode =on
restore_command = 'cp -i
/u01/psqlwarch/%f %p'
primary_conninfo = 'host=test_db01 port=5432 user=replication
password=changeme'
trigger_file = '/tmp/trigger'
Note : -- we can create this file without
trigger_file value/option.
4.3-> Start up postgres on
the standby server. It will start restoring data right away..
-bash-4.1$/usr/pgsql-9.4/bin/pg_ctl
start -D /var/lib/pgsql/9.4/data
server starting
4.4-> Verify standby server
-bash-4.1$ ps -ef | grep postgres
postgres 4065
1 0 16:36 pts/0 00:00:00 /usr/pgsql-9.4/bin/postgres -D
/var/lib/pgsql/9.4/data
postgres 4066
4065 0 16:36 ? 00:00:00 postgres: logger process
postgres
4067 4065 0 16:36 ? 00:00:00 postgres: startup process recovering 000000010000000000000036
postgres 4069
4065 0 16:36 ? 00:00:00 postgres: checkpointer process
postgres 4070
4065 0 16:36 ? 00:00:00 postgres: writer process
postgres 4071
4065 0 16:36 ? 00:00:00 postgres: stats collector
process
postgres
4073 4065 0 16:36 ? 00:00:00 postgres: wal receiver process streaming 0/360002E8
Cd /var/lib/pgsql/9.4/data/pg_log
-bash-4.1$ tail -l Tuesday.log
< 2015-07-02 14:29:05.063 IST
>LOG: restored log file
"00000001000000000000002B" from archive
cp: cannot stat
`/u01/psqlwarch/00000001000000000000002C': No such file or directory
cp: cannot stat
`/u01/psqlwarch/00000001000000000000002C': No such file or directory
< 2015-07-02 14:29:45.175 IST
>LOG: restored log file
"00000001000000000000002C" from archive
< 2015-07-02 14:29:45.230 IST
>LOG: restored log file "00000001000000000000002D"
from archive
cp: cannot stat
`/u01/psqlwarch/00000001000000000000002E': No such file or directory
< 2015-07-02 14:29:45.241 IST
>LOG: unexpected pageaddr 0/29000000
in log segment 00000001000000000000002E, offset 0
Note : -- Execute below command on Primary server(master)
test_db01 and check new archive entry
come in standby log.
postgres=# select pg_switch_xlog();
(Primary
server(master) test_db01)
testdb=# create table ho as select * from company;
SELECT 5
testdb=# \dt
List of relations
Schema |
Name | Type |
Owner
--------+---------------+-------+----------
public | company | table | postgres
public | department | table | postgres
public | ho | table | postgres
public | hot
| table | postgres
public | hot1 | table | postgres
public | man | table | postgres
public | manoj_company | table | postgres
public | n_company | table | postgres
(8 rows)
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/30019428
(1 row)
testdb=# SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
1240:1240:
(1 row)
testdb=# select * from company;
id | name
| age |
address |
salary | join_date
----+-------+-----+----------------------------------------------------+--------+------------
1 | Paul | 32 | California | 20000 | 2001-07-13
2 | Allen | 25 | Texas | | 2007-12-13
3 | Teddy | 23 | Norway
| 20000 |
4 | Mark | 25 | Rich-Mond | 65000 | 2007-12-13
5 | David | 27 | Texas | 85000 | 2007-12-13
(5 rows)
testdb=# update company set
name='manoj' where id=1;
UPDATE 1
testdb=# select * from company;
id | name | age | address | salary | join_date
----+-------+-----+----------------------------------------------------+--------+------------
2 | Allen | 25 | Texas
| | 2007-12-13
3 | Teddy | 23 | Norway
| 20000 |
4 | Mark | 25 | Rich-Mond | 65000 | 2007-12-13
5 | David | 27 | Texas
| 85000 | 2007-12-13
1 | manoj | 32 | California |
20000 | 2001-07-13
testdb=# SELECT
pg_current_xlog_location();
pg_current_xlog_location
--------------------------
0/360013F8
-bash-4.1$ ps -ef | grep sender
postgres 4535
1862 0 16:36 ? 00:00:00 postgres: wal sender process
replication 192.168.90.23(12728) streaming 0/360013F8
postgres 4604
3337 0 16:46 pts/0 00:00:00 grep sender
(secondary
server (slave) test_db02)
postgres=# SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
1240:1240:
(1 row)
postgres=# SELECT
pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
postgres=# \c testdb
You are now connected to database
"testdb" as user "postgres".
testdb=# \dt
List of relations
Schema |
Name | Type | Owner
--------+---------------+-------+----------
public | company | table | postgres
public | department | table | postgres
public | ho | table | postgres
public | hot | table | postgres
public | hot1 | table | postgres
public | man | table | postgres
public | manoj_company | table | postgres
public | n_company | table | postgres
(8 rows)
testdb=# select * from company;
id | name
| age |
address |
salary | join_date
----+-------+-----+----------------------------------------------------+--------+------------
2 | Allen | 25 | Texas
| | 2007-12-13
3 | Teddy | 23 | Norway | 20000 |
4 | Mark | 25 | Rich-Mond | 65000 | 2007-12-13
5 | David | 27 | Texas
| 85000 | 2007-12-13
1 | manoj | 32 | California | 20000 | 2001-07-13
(5 rows)
testdb=# select
pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/360013F8
testdb=# select
pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
0/360013F8
-bash-4.1$ ps -ef | grep receiver
postgres 4073
4065 0 16:36 ? 00:00:00 postgres: wal receiver process streaming 0/360013F8
postgres 4106
2333 0 16:47 pts/0 00:00:00 grep receiver
Doing the Failover
(Promoting a Standby to
the Main server)
Option 1 -- ( If create recovery.conf file without trigger_file value/option ) use
pg_ctl promote
-bash-4.1$ /usr/pgsql-9.4/bin/pg_ctl
promote -D /var/lib/pgsql/9.4/data
Option 2 -- ( If create recovery.conf file with trigger_file value/option ) use
touch command
-bash-4.1$ touch /tmp/trigger
-bash-4.1$ ps -ef | grep postgres
postgres 2483 1 0
14:26 pts/0 00:00:00
/usr/pgsql-9.4/bin/postgres -D /var/lib/pgsql/9.4/data
postgres 2484 2483 0
14:26 ? 00:00:00 postgres: logger
process
postgres 2487 2483 0
14:26 ? 00:00:00 postgres:
checkpointer process
postgres 2488 2483 0
14:26 ? 00:00:00 postgres: writer
process
postgres 2489 2483 0
14:26 ? 00:00:00 postgres: stats
collector process
postgres 3854 2483 0
15:56 ? 00:00:00 postgres: wal
writer process
postgres 3855 2483 0
15:56 ? 00:00:00 postgres:
autovacuum launcher process
1.
Get your clients pointing to the new server
(change DNS, change IPs in configuration files, etc.)
Note
: Postgres will want to rename the recovery.conf file to recovery.done once it’s finished.
Functions to Control Streaming Replication
replay:
A.
pg_xlog_replay_pause():
Using this function user can pause recovery of Standby and would be able to take consistent backup of Standby Data Directory. Example is given below:
Using this function user can pause recovery of Standby and would be able to take consistent backup of Standby Data Directory. Example is given below:
postgres=# select pg_xlog_replay_pause();
pg_xlog_replay_pause
----------------------
(1 row)
b.
pg_is_xlog_replay_paused():
Using this function user would be able to check the Standby/Streaming Replication is paused or not. Example is given below:
Using this function user would be able to check the Standby/Streaming Replication is paused or not. Example is given below:
postgres=# select pg_is_xlog_replay_paused();
pg_is_xlog_replay_paused
--------------------------
t
(1 row)
c.
pg_xlog_replay_resume():
Using this function user would be able resume replication of standby/streaming replication, if its recovery/replay is paused. Example is given below:
Using this function user would be able resume replication of standby/streaming replication, if its recovery/replay is paused. Example is given below:
postgres=# select
pg_xlog_replay_resume();
pg_xlog_replay_resume
-----------------------
(1 row)
postgres=# select pg_is_xlog_replay_paused();
pg_is_xlog_replay_paused
--------------------------
f
(1 row)
Replication Monitoring:
a.View:
pg_stat_replication:
This view displays information on WAL sender processes. View contains one row for each WAL sender process shows, information on processid, user (oid), username, application name, host name (if available) and port number, time at which the server process began execution, and the current WAL sender state and transaction log location.
This view displays information on WAL sender processes. View contains one row for each WAL sender process shows, information on processid, user (oid), username, application name, host name (if available) and port number, time at which the server process began execution, and the current WAL sender state and transaction log location.
Note: pg_stat_replication view will give
information on Master. Executing Query against pg_stat_replication view on
Standby/streaming will return zero rows.
Example is given below:
Example is given below:
postgres=# select * from
pg_stat_replication;
pid | usesysid |
usename | application_name
| client_addr | client_hostname | client_port | backend_start | backend_xmin | state
| sent_l
ocation | write_location | flush_location | replay_location |
sync_priority | sync_state
------+----------+-------------+------------------+---------------+-----------------+-------------+----------------------------------+--------------+-----------+-------
--------+----------------+----------------+-----------------+---------------+------------
1793 | 24645 | replication | walreceiver | 192.168.90.23 | | 43403 | 2015-07-17 10:46:00.897899+05:30
| | streaming | 0/4D00
1358 | 0/4D001358 | 0/4D001358 | 0/4D001358 | 0 | async
(1 row)
b.
Function: pg_last_xact_replay_timestamp():
Above function gives time stamp of last transaction replayed during recovery, time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. Example is given below:
Above function gives time stamp of last transaction replayed during recovery, time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL. Example is given below:
postgres=# select * from
pg_last_xact_replay_timestamp();
pg_last_xact_replay_timestamp
----------------------------------
2015-07-16 17:50:53.849232+05:30
(1 row)
c. view:
pg_stat_database_conflicts
In 9.1, pg_stat_database_conflicts view has been added for monitoring and finding the cancelled queries due dropped tablespaces/ lock timeouts/old snapshots/pinned buffers/deadlocks. This view contains one row per database, which gives information on database OID, database name and the number of queries that have been canceled in this database due to dropped tablespaces, lock timeouts, old snapshots, pinned buffers and deadlocks. Example is given below:
In 9.1, pg_stat_database_conflicts view has been added for monitoring and finding the cancelled queries due dropped tablespaces/ lock timeouts/old snapshots/pinned buffers/deadlocks. This view contains one row per database, which gives information on database OID, database name and the number of queries that have been canceled in this database due to dropped tablespaces, lock timeouts, old snapshots, pinned buffers and deadlocks. Example is given below:
Note: User has to run queries against this view
on Standby, since conflicts occurs on Standby.
postgres=# select * from pg_stat_database_conflicts;
datid | datname
| confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin |
confl_deadlock
-------+-----------+------------------+------------+----------------+-----------------+----------------
13003 | postgres | 0 | 0 | 0 | 0 | 0
1 | template1 | 0 | 0 | 0 | 0 | 0
16420 | testdb | 0 | 0 | 0 | 0 | 0
12998 | template0 | 0 | 0 | 0 | 0 | 0
(4 rows)
************************************************************************
No comments:
Post a Comment