Thursday, 7 June 2018

Interview Q and A for PostgreSQL Part - 8

171. Using multiple schemas
Schemas can be created easily by using the following:
CREATE SCHEMA finance;
CREATE SCHEMA sales;
We can then create objects directly within those schemas by using "fully qualified" names, for example:
CREATE TABLE finance.month_end_snapshot (...);
The default schema into which an object is created is known as the current schema. We can find out which is our current schema by using the following query:
postgres=# select current_schema;
current_schema
----------------
public
(1 row)
When we access database objects, we use the user-settable parameter search_path to identify which schemas to search. The current schema is the first schema in the search_path; there is no separate parameter for the current schema.
So, if we want to have only a specific user look at certain sets of tables, we can modify his/her search_path. The parameter can be set for each user, so that the value will be set when he/she connects. The SQL for this would be something like the following:
ALTER ROLE fiona SET search_path = 'finance';
ALTER ROLE sally SET search_path = 'sales';
Note that the "public" schema is not mentioned on the search_path, and so would not be searched. All tables created by fiona would go into the finance schema by default, whereas all tables created by sally would go into the sales schema by default.
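As a quick sketch of the effect (assuming the roles and schemas above already exist; the table name ledger is made up for illustration), unqualified names now resolve against each user's own schema:

```sql
-- Connected as fiona, whose search_path was set to 'finance':
SHOW search_path;                    -- finance
CREATE TABLE ledger (id integer);    -- actually creates finance.ledger
SELECT count(*) FROM ledger;         -- resolves to finance.ledger
```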
The users for finance and sales would be able to see that the other schema existed, though we would be able to grant/revoke privileges such that they could neither create objects nor read data in the others' schema.
REVOKE ALL ON SCHEMA finance FROM public;
GRANT ALL ON SCHEMA finance TO fiona;
REVOKE ALL ON SCHEMA sales FROM public;
GRANT ALL ON SCHEMA sales TO sally;
An alternative technique is to allow each user to create objects in only their own schema, while granting usage rights on the other schemas. We would set up that arrangement like the following:
REVOKE ALL ON SCHEMA finance FROM public;
GRANT USAGE ON SCHEMA finance TO public;
GRANT CREATE ON SCHEMA finance TO fiona;
REVOKE ALL ON SCHEMA sales FROM public;
GRANT USAGE ON SCHEMA sales TO sally;
GRANT CREATE ON SCHEMA sales TO sally;
Note that you need to grant the privileges for usage on the schema, as well as specific rights on the objects in the schema. So, you will also need to issue specific grants for objects, such as:
GRANT SELECT ON month_end_snapshot TO public;
or set default privileges so that they are picked up when objects are created using:
ALTER DEFAULT PRIVILEGES FOR USER fiona IN SCHEMA finance GRANT SELECT ON TABLES TO PUBLIC;

172. Giving users their own private database
As a superuser, these actions would be as follows:
postgres=# create user fred;
CREATE ROLE
postgres=# create database fred owner = fred;
CREATE DATABASE
Users have the LOGIN privilege, so by default they can connect to any database. There is a command named ALTER DEFAULT PRIVILEGES, though it does not currently apply to databases, tablespaces, or languages, and it applies only to roles (that is, users) that already exist.
So, we need to revoke the privilege to connect to our new database from everybody except the designated user. There isn't a REVOKE ... FROM PUBLIC EXCEPT command, so we need to revoke everything and then re-grant what we need, all in one transaction, such as the following:
postgres=# BEGIN;
postgres=# REVOKE connect ON DATABASE fred FROM public;
postgres=# GRANT connect ON DATABASE fred TO fred;
postgres=# COMMIT;
postgres=# create user bob;

Then, try to connect as bob to the fred database
os $ psql -U bob fred
psql: FATAL: permission denied for database "fred"
DETAIL: User does not have CONNECT privilege.
which is exactly what we wanted.

If you didn't catch it before, PostgreSQL allows transactional DDL in most places, so the preceding REVOKE and GRANT either both take effect or neither does. So user fred never, at any point, loses the ability to connect to the database. Note that CREATE DATABASE cannot be performed as part of a transaction, though nothing serious happens as a result.
Note that superusers can still connect to the new database, and there is no way to prevent them from doing so. No other users can see tables created in the new database, nor can they find out the names of any of the objects. The new database can be seen to exist by other users, and they can also see the name of the user who owns the database.

173. Set up a Connection Pool (pgbouncer)
A Connection Pool is the term used for a collection of already connected sessions that can be used to reduce the overhead of connection and reconnection.
There are various ways that connection pools can be provided, depending upon the software stack in use. Probably the best option is to look at "server side" connection pool software, because that works for all connection types, not just within a single software stack.
Here we're going to look at pgbouncer, which is designed as a very lightweight connection pool. The name comes from the idea that the pool can be paused/resumed to allow the server to be restarted or bounced.
First of all, decide where you're going to store the pgbouncer parameter files, log files, and pid files.
pgbouncer can manage more than one database server's connections at the same time, though that probably isn't wise. If you keep the pgbouncer files associated with a single database server, then it should be easy to manage.
Carry out the following steps to configure pgbouncer:
1. Create a pgbouncer.ini file
;
; pgbouncer configuration example
;
[databases]
postgres = port=5432 dbname=postgres
[pgbouncer]
listen_port = 6543
listen_addr = 127.0.0.1
admin_users = postgres
;stats_users = monitoring userid
auth_type = trust
; put these files somewhere sensible
auth_file = users.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
; required for 9.0
ignore_startup_parameters = application_name
server_reset_query = DISCARD ALL;
; default values
pool_mode = session
default_pool_size = 20
log_pooler_errors = 0
2. Create a users.txt file. This must exist, and must contain at least the minimum users mentioned in admin_users and stats_users. For example:
"postgres" ""
pgbouncer also supports md5 authentication. To use that effectively, you need to copy the encrypted passwords from the database server.
You may wish to create the users.txt file by directly copying the details from the server. That can be done using the following psql script:
postgres=> \o users.txt
postgres=> \t
postgres=> SELECT '"'||rolname||'" "'||rolpassword||'"'
postgres-> FROM pg_authid;
postgres=> \q
3. Launch pgbouncer:
pgbouncer -d pgbouncer.ini
4. Test the connection; pgbouncer should accept the RELOAD command:
psql -p 6543 -U postgres pgbouncer -c "reload"
Also, check that pgbouncer's max_client_conn parameter does not exceed the max_connections parameter on PostgreSQL.

pgbouncer is a great piece of software. Its feature set is very carefully defined to ensure that it is simple, robust, and very fast. pgbouncer is not multi-threaded, so it runs in a single process, and thus on a single CPU. It is very efficient, though very large data transfers will tie up more time and reduce concurrency, so make those data dumps using a direct connection.
pgbouncer doesn't support SSL connections. If it did, then all of the encryption/decryption would need to take place in the single process, which would make that solution perform poorly. If you need secure communications, then you should use stunnel.
pgbouncer provides connection pooling. If you set:
pool_mode = transaction
then pgbouncer will also provide connection concentration. This allows hundreds or thousands of incoming connections to be managed while only a few server connections are made.
As new connections/transactions/statements arrive, the pool will increase in size up to the defined user maximums. Those connections will stay around for at most server_idle_timeout before the pool releases those connections.
pgbouncer also releases sessions every server_lifetime. This allows the server to free backends in rotation to avoid issues with very long-lived session connections.
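These pool-rotation settings all live in the [pgbouncer] section of pgbouncer.ini; the values below are illustrative only, not recommendations:

```ini
[pgbouncer]
pool_mode = transaction      ; enables connection concentration
default_pool_size = 20       ; server connections per user/database pair
server_idle_timeout = 60     ; seconds before an idle server connection is released
server_lifetime = 3600       ; seconds before a server connection is rotated out
```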

It's possible to connect to pgbouncer itself to issue commands. This can be done interactively as if you were entering psql, or it can be done using single commands or scripts.
To shut down the server, we can just type SHUTDOWN, or enter a single command as follows:
psql -p 6543 pgbouncer -c "SHUTDOWN"
You can also use the command RELOAD to make the server reload (that means reread) the parameter files.
If you are using pool_mode = transaction or pool_mode = statement, then you can use the PAUSE command. This allows the current transaction to complete before holding further work on that session. This allows you to perform DDL more easily or restart the server.
pgbouncer also allows you to use SUSPEND mode. This waits for all server-side buffers to flush.
PAUSE or SUSPEND should eventually be followed by RESUME when the work is done.
In addition to the pgbouncer control commands, there are also a selection of SHOW commands:
Show command      Result set
----------------  -------------------------------------------------------------
SHOW STATS        Traffic stats: total and average requests, query duration, bytes sent/received
SHOW SERVERS      One row per connection to a database server
SHOW CLIENTS      One row per connection from a client
SHOW POOLS        One row per pool of users
SHOW LISTS        Gives a good summary of resource totals
SHOW USERS        Lists users in users.txt
SHOW DATABASES    Lists databases in pgbouncer.ini
SHOW CONFIG       Lists configuration parameters
SHOW FDS          Shows file descriptors
SHOW SOCKETS      Shows socket information
SHOW VERSION      Shows the pgbouncer version

174. Planning a new database
Write a document that covers the following items:
- Database design: Plan your database design
  - Calculate the initial database sizing
- Transaction analysis: How will we access the database?
  - Look at the most frequent access paths
  - What are the requirements for response times?
- Hardware configuration
  - Initial performance thoughts: will all data fit into RAM?
- Localization plan
  - Decide server encoding, locale, and time zone
- Access and security plan
  - Identify client systems and specify required drivers
  - Create roles according to a plan for access control
  - Specify pg_hba.conf
- Maintenance plan: Who will keep it working? How?
- Availability plan: Consider the availability requirements
  - Consider settings that affect recovery time, such as checkpoint_timeout
  - Plan your backup mechanism, and test it
- High-availability plan
  - Decide what form of replication you'll need, if any

175. Changing parameters in your programs
PostgreSQL allows you to set some parameter settings for each session or for each transaction.
You can change the value of a setting during your session, such as the following:
SET work_mem = '16MB';
This value will then be used for every future transaction in the session. You can also change it only for the duration of the current transaction:
SET LOCAL work_mem = '16MB';
The setting will last until or if you issue the following:
RESET work_mem;
or
RESET ALL;

SET and RESET are SQL commands that can be issued from any interface. They apply only to PostgreSQL server parameters, by which we mean parameters that affect the server, but not necessarily the whole server. There may be other parameters, such as JDBC driver parameters, that cannot be set in this way.
When you change the value of a setting during your session, such as:
SET work_mem = '16MB';
Then this will show up in the catalog view pg_settings as follows:
postgres=# SELECT name, setting, reset_val, source FROM pg_settings WHERE source = 'session';
name | setting | reset_val | source
----------+---------+-----------+---------
work_mem | 16384 | 1024 | session
until you issue:
RESET work_mem;
after which the setting returns to the reset_val, and the source returns to default.
name | setting | reset_val | source
---------+---------+-----------+---------
work_mem | 1024 | 1024 | default
You can change the value of a setting during your transaction as well, as follows:
SET LOCAL work_mem = '16MB';
then this will show up in the catalog view pg_settings as follows:
postgres=# SELECT name, setting, reset_val, source FROM pg_settings WHERE source = 'session';
name | setting | reset_val | source
----------+---------+-----------+---------
work_mem | 1024 | 1024 | session
Huh? What happened to my parameter setting? SET LOCAL takes effect only for the transaction in which it was executed, which, in our case, was just the SET LOCAL command. We need to execute it inside a transaction block to be able to see the setting take hold as follows:
BEGIN;
SET LOCAL work_mem = '16MB';
Then this will show up in the catalog view pg_settings as follows:
postgres=# SELECT name, setting, reset_val, source FROM pg_settings WHERE source = 'session';
name | setting | reset_val | source
----------+---------+-----------+---------
work_mem | 16384 | 1024 | session
You should also note that the value of source is "session" rather than "transaction", as you might have been expecting.

176. Which parameters are at non-default settings?
Often, we need to check which parameters have been changed already or whether our changes have correctly taken effect.
postgres=# SELECT name, source, setting FROM pg_settings WHERE source != 'default' AND source != 'override' ORDER BY 2, 1;
           name            |        source        | setting
---------------------------+----------------------+----------
application_name | client | psql
log_timezone | command line | GB
TimeZone | command line | GB
timezone_abbreviations | command line | Default
archive_command | configuration file | (disabled)
archive_mode | configuration file | off
archive_timeout | configuration file | 5
bgwriter_delay | configuration file | 10
checkpoint_timeout | configuration file | 30
log_checkpoints | configuration file | on
log_destination | configuration file | stderr
log_filename | configuration file | log%Y
logging_collector | configuration file | on
log_line_prefix | configuration file | %t[%p]
log_min_messages | configuration file | log
max_prepared_transactions | configuration file | 5
max_standby_delay | configuration file | 90
port | configuration file | 5443
max_stack_depth | environment variable | 2048
work_mem | session | 204800
(29 rows)
('Override' is excluded just for display purposes.)
You can see from pg_settings which values have non-default values, and what the source of the current value is.
The SHOW command doesn't tell you whether a parameter is set at a non-default value. It just tells you the value, which isn't much help if you're trying to understand what is set and why.
If the source is a configuration file, then the two columns sourcefile and sourceline are also set. These can be useful in understanding where the configuration came from.
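For example, to list where each file-based setting was defined (sourcefile and sourceline are real pg_settings columns):

```sql
SELECT name, setting, sourcefile, sourceline
  FROM pg_settings
 WHERE source = 'configuration file'
 ORDER BY sourcefile, sourceline;
```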
The setting column of pg_settings shows the current value, though you can also look at boot_val and reset_val: boot_val shows the value assigned when the PostgreSQL database cluster was initialized (initdb), while reset_val shows the value that the parameter will return to if you issue the RESET command.

max_stack_depth is an exception because pg_settings says it is set by the environment variable, though it is actually set by ulimit -s on Linux/Unix systems. max_stack_depth only needs to be set directly on Windows.
The timezone settings are also picked up from the OS environment, so you shouldn't need to set those directly. pg_settings shows this as a "command-line" setting.
Another way of finding current settings is to access a PostgreSQL catalog view named pg_settings.
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_settings WHERE name = 'work_mem';
-[ RECORD 1 ]---------------------------------------------------------------
name       | work_mem
setting    | 1024
unit       | kB
category   | Resource Usage / Memory
short_desc | Sets the maximum memory to be used for query workspaces.
extra_desc | This much memory can be used by each internal sort operation and hash table before switching to temporary disk files.
context    | user
vartype    | integer
source     | default
min_val    | 64
max_val    | 2147483647
enumvals   |
boot_val   | 1024
reset_val  | 1024
sourcefile |
sourceline |
So, you can use the SHOW command to retrieve the value for a setting, or you can access the full detail via the pg_settings catalog view.

177. Setting parameters for particular groups of users
PostgreSQL supports a variety of ways of defining parameter settings for various user groups.
For all users in database saas:
ALTER DATABASE saas SET configuration_parameter = value1;
For a user named simon connected to any database, use the following:
ALTER ROLE simon SET configuration_parameter = value2;
or set a parameter for a user only when connected to a specific database, as follows:
ALTER ROLE simon IN DATABASE saas SET configuration_parameter = value3;
The user won't know that these settings have been applied specifically for them. These are default settings, and in most cases they can be overridden if the user requires non-default values.
You can set parameters for each of the following:
- Database
- User (named "roles" by PostgreSQL)
- Database/user combination
Each of the parameter defaults is overridden by the one below it.
In the preceding three SQL statements:
- if user hannu connects to database saas, then value1 will apply
- if user simon connects to a database other than saas, then value2 will apply
- if user simon connects to database saas, then value3 will apply
PostgreSQL implements this in exactly the same way as if the user had manually issued the equivalent SET statements immediately after connecting.

178. Basic server configuration checklist
PostgreSQL arrives configured for use on a shared system, though many people want to run dedicated database systems. The PostgreSQL project wishes to ensure that PostgreSQL will play nicely with other server software, so it does not assume it has access to the full server resources.

Before we start, we need to know two sets of information:
First, we need to know the size of the physical RAM that will be dedicated to PostgreSQL.
Second, we need to know something about the types of applications for which PostgreSQL will be used.
If your database is larger than 32MB, then you'll probably benefit from increasing shared_buffers. You can increase this to much larger values, though remember that on Linux systems this memory can be swapped out if not in use, so it's better to be conservative. A new value can be set in your postgresql.conf and incremented slowly to ensure you get benefits from each change.
If you increase shared_buffers, and you're running on a non-Windows server, you will almost certainly need to increase the value of the OS parameter SHMMAX (and on some platforms others as well).
On Linux/Mac OS/FreeBSD, you will need to either edit the /etc/sysctl.conf file or use sysctl -w with the following values:
- Linux: kernel.shmmax=value
- Mac OS: kern.sysv.shmmax=value
- FreeBSD: kern.ipc.shmmax=value
For example on Linux, add the following line to /etc/sysctl.conf:
kernel.shmmax=value
Don't worry too much about setting effective_cache_size. It is a much less important parameter than you might think, so there is no need to fuss over the value.
If you're doing heavy write activity, then you may want to set wal_buffers to a much higher value than the default.
If you're doing heavy write activity and/or large data loads, you then may want to set checkpoint_segments higher than the default.
If your database has many large queries, you may wish to set work_mem to a value higher than the default.
Make sure autovacuum is turned on, unless you have a very good reason to turn it off.
You can change most of them later, so you can take an iterative approach to improving things.
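Pulling the checklist together, a starting postgresql.conf fragment for a dedicated server might look like this (every value here is an illustrative guess to be tuned for your workload, not a recommendation):

```ini
shared_buffers = 1GB           # raise gradually from the default; watch SHMMAX
wal_buffers = 16MB             # helps heavy write activity
checkpoint_segments = 64       # heavy writes / large loads (parameter from this era)
work_mem = 64MB                # allocated per sort/hash operation, not per session
autovacuum = on                # leave enabled unless you have a very good reason
```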

179. Running server in power saving mode
If your PostgreSQL server is only used very sporadically, or has periods of total inactivity, then you may be able to benefit from some of the advice given here. That could be a laptop, or it could be a somewhat inactive virtual server.
PostgreSQL is a server-based database, so it mostly does nothing at all if there are no active clients. To minimize server activity, set the following parameters in the postgresql.conf file:
- autovacuum = off
- wal_writer_delay = 10000
- bgwriter_delay = 10000
These settings are not optimal for many uses and should only be used when it is known that the server will be quiet. They should be reset to previous values when the server becomes busy again.
There are a couple of processes that stay active continually, on the expectation that they will be needed should clients become active. These processes are as follows:
- Writer process (also known as the "background writer")
- WAL writer process
- Archiver process, which will be active if WAL archiving is enabled
- WAL receiver process, which will be active if streaming replication is in use
- Autovacuum process
The background writer process wakes up by default every 200 ms to do its work. The maximum setting is 10 s, which isn't very long, though the background writer can be disabled by setting bgwriter_lru_maxpages = 0.
The WAL writer process wakes up by default every 200ms. The maximum setting is also 10s. This cannot be disabled. If there is no write activity, then no work will be performed, other than the wakeup and check.
The Archiver process will wake up every 15s and check whether any new WAL files have been written. This will cause some activity against the filesystem directory. That time cannot be changed by a parameter.
The WAL receiver process will wake up every 100ms to check if new replication data has arrived. If no new data has arrived, it will sleep again. That time cannot be changed by a parameter.
Autovacuum will wake up every 60s by default. This can be changed by altering the setting of autovacuum_naptime. Autovacuum can be disabled completely by setting autovacuum = off.
So, if you are using Streaming Replication, then the server will wake up every 100ms.

180. Enabling access for network/remote users
PostgreSQL comes in a variety of distributions. In many of these, you will find that remote access is initially disabled as a security measure.
- Add/edit the following line in your postgresql.conf:
listen_addresses = '*'
- Add the following line as the first line of pg_hba.conf, to allow access to all databases for all users with an encrypted password:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
host all all 0.0.0.0/0 md5
The listen_addresses parameter specifies which IP addresses to listen on. This allows you to have more than one network card (NIC) per system. In most cases, we want to accept connections on all NICs, so we use "*", meaning "all IP addresses".
The pg_hba.conf contains a set of host-based authentication rules. Each rule is considered in sequence until one rule fires, or the attempt is specifically rejected with a reject method.
The preceding rule means a remote connection that specifies any user, or any database, on any IP address will be asked to authenticate using an md5 encrypted password.
Type = host means a remote connection.
Database = all means "for all databases". Other names match exactly. You can also specify a comma-separated list of databases, or use the @ symbol to include a file with a list of databases. You can also specify "sameuser", so that the rule matches when the database name is the same as the username.
User = all means "for all users." Other names match exactly, except when prefixed with a plus (+) symbol, in which case we mean a "group role" rather than a single user. You can also specify a comma-separated list of users or use the @ symbol to include a file with a list of users.
CIDR-ADDRESS consists of two parts: IP-address/subnet mask. The subnet mask is specified as the number of leading bits of the IP address that make up the mask. Thus, /0 means 0 bits of the IP address, so that all IP addresses will be matched. For example, 192.168.0.0/24 would mean match the first 24 bits, so any IP address of the form 192.168.0.x would match. You can also use "samenet" or "samehost".
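The same matching rule can be sketched in Python with the standard ipaddress module (the addresses are made up):

```python
import ipaddress

net = ipaddress.ip_network("192.168.0.0/24")

# /24 means the first 24 bits must match, so any 192.168.0.x is inside
print(ipaddress.ip_address("192.168.0.57") in net)   # True
print(ipaddress.ip_address("192.168.1.57") in net)   # False

# /0 masks zero bits, so every IPv4 address matches
print(ipaddress.ip_address("10.1.2.3") in ipaddress.ip_network("0.0.0.0/0"))  # True
```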
Don't use the setting "password", as this allows a password in plain text.

181. Changing your password securely
If you are using password authentication, then you may wish to change your password from time to time.
The most basic method is to use psql. The \password command will prompt you for a new password, and then again to confirm. Connect to psql, and type the following:
\password
Enter new password.
This causes psql to send an SQL statement to the PostgreSQL server containing an already encrypted password string. An example of the SQL statement sent is as follows:
ALTER USER postgres PASSWORD 'md53175bce1d3201d16594cebf9d7eb3f9d';
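That password string is just the MD5 of the password concatenated with the username, prefixed with "md5". A minimal sketch using Python's hashlib (the credentials and the helper name pg_md5_password are made up for illustration; this is the pre-SCRAM md5 scheme):

```python
import hashlib

def pg_md5_password(password: str, username: str) -> str:
    """Build a PostgreSQL md5-style password string: 'md5' + md5(password || username)."""
    digest = hashlib.md5((password + username).encode()).hexdigest()
    return "md5" + digest

# 'md5' prefix + 32 hex digits = 35 characters
print(pg_md5_password("secret", "postgres"))
```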

182. Explain password file in postgres? 
A password file contains the usual five fields that we need to connect, so that we can use file permissions to make the password more secure:
host:port:dbname:user:password
such as myhost:5432:postgres:sriggs:moresecure
The password file is located using an environment variable named PGPASSFILE. If PGPASSFILE is not set, then a default location is used:
- On *nix systems, ~/.pgpass.
- On Windows systems, %APPDATA%\postgresql\pgpass.conf, where %APPDATA% is the Application Data subdirectory in the path (for me, that would be C:\).

Don't forget: Set the file permissions on the file, so that security is maintained. The file permissions are not enforced on Windows, though the default location is secure. On *nix systems, you must issue the following:
chmod 0600 ~/.pgpass
If you forget to do this, the PostgreSQL client will ignore the .pgpass file silently. So don't forget!

Many people name the password file .pgpass, whether or not they are on Windows, so don't get confused if they do this.
The password file can contain multiple lines. Each line is matched against the requested host:port:dbname:user combination until we find a line that matches, and then we use that password.
Each item can be a literal value, or *, a wildcard that matches anything. There is no support for partial matching. With appropriate permissions, a user could potentially connect to any database. Using the wildcard in the dbname and port fields makes sense, though it is less useful in other fields.
Here are a few examples:
- myhost:5432:*:sriggs:moresecurepw
- myhost:5432:perf:hannu:okpw
- myhost:*:perf:gabriele:maggioresicurezza
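The first-match-wins lookup can be sketched as follows (a simplified model with a made-up helper name; the real client also handles escaping of ':' and '\' in fields, which this ignores):

```python
def pgpass_lookup(lines, host, port, dbname, user):
    """Return the password from the first matching .pgpass-style line, or None."""
    for line in lines:
        h, p, d, u, pw = line.split(":")
        fields = [(h, host), (p, str(port)), (d, dbname), (u, user)]
        # '*' is a full-field wildcard; there is no partial matching
        if all(pat == "*" or pat == val for pat, val in fields):
            return pw
    return None

entries = [
    "myhost:5432:*:sriggs:moresecurepw",
    "myhost:5432:perf:hannu:okpw",
    "myhost:*:perf:gabriele:maggioresicurezza",
]
print(pgpass_lookup(entries, "myhost", 5432, "postgres", "sriggs"))  # moresecurepw
print(pgpass_lookup(entries, "myhost", 6543, "perf", "gabriele"))    # maggioresicurezza
```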

183. Explain connection service file in postgres?
When the number of connection options gets too large, you may want to think about using a connection service file.
The connection service file allows you to give a single name to a set of connection parameters. This can be accessed centrally to avoid the need for individual users to know the host and port of the database, and is more resistant to future change.
First, create a file named pg_service.conf with the following contents:
[dbservice1]
host=postgres1
port=5432
dbname=postgres
Then either copy it into place at /etc/pg_service.conf, or put it in another agreed central location and set the environment variable PGSYSCONFDIR to that directory.
Now, you can then specify a connection string like the following:
service=dbservice1 user=sriggs
The service can also be set using an environment variable named PGSERVICE.
This applies to libpq connections only, so does not apply to JDBC.
The connection service file can also be used to specify the user, though that would mean that the username would be shared.
pg_service.conf and .pgpass can work together, or you can use just one or the other, as you choose. Note that the pg_service.conf file is shared, and so is not a suitable place for passwords.
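The file uses INI syntax, so its contents can be inspected with Python's standard configparser (section and values match the dbservice1 example above; whether libpq accepts a given key is a separate question, this only shows the file format):

```python
import configparser

# Contents matching the dbservice1 example above
service_file = """
[dbservice1]
host=postgres1
port=5432
dbname=postgres
"""

parser = configparser.ConfigParser()
parser.read_string(service_file)

# All values come back as strings, keyed by option name
params = dict(parser["dbservice1"])
print(params)
```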

184. Troubleshooting a failed connection
Bear in mind that 90% of problems are just misunderstandings, and you'll be on track again fairly quickly.
- Check whether the database name and username are accurate:
You may be requesting a service on one system when the database you require is on another system. Recheck your credentials. Check especially that you haven't mixed things up so that you are using the database name as the username and/or the username as the database name. If you receive "too many connections", then you may need to disconnect another session before you can connect, or wait for the administrator to re-enable the connections.
- Check for explicit rejections:
If you receive the following error message: pg_hba.conf rejects connection for host …
then your connection attempt has been explicitly rejected by the database administrator for that server. You will not be able to connect from the current client system using those credentials. There is little point attempting to contact the administrator, as you are violating an explicit security policy in what you are attempting to do.
- Check for implicit rejections:
If the error message you receive is: no pg_hba.conf entry for …
then there is no explicit rule that matches your credentials. This is likely an oversight on the part of the administrator, and is common in very complex networks. Please contact the administrator, and request a ruling on whether your connection should be allowed (hopefully) or explicitly rejected in the future.
- Check whether the connection works with psql:
If you're trying to connect to PostgreSQL from anything other than the psql command-line utility, switch to that now. If you can make psql connect successfully, yet cannot make your main connection work correctly, then the problem may be in the local interface you are using.
- Check whether the server is up:
If a server is shut down, then you cannot connect. The typical problem here is simply mixing up which server you are connecting to. You need to specify the hostname and port, so it's possible you are mixing up those details.
- Check whether the server is up and accepting new connections:
A server that is shutting down will not accept new connections, apart from superusers. Also, a standby server may not have the hot_standby parameter enabled, preventing you from connecting.
- Check whether the server is listening correctly.
- Check the port on which the server is actually listening:
Confirm that the incoming request is arriving on an interface listed in the listen_addresses parameter. Check whether it is set to * for remote connections, or to localhost for local connections only.
- Check whether the database name and username exist:
It's possible the database or user no longer exists.
- Check the connection request:
Check whether the connection request was successful, yet was somehow dropped after connection. You can confirm this by looking at the server log when the following parameters are enabled:
log_connections = on
log_disconnections = on
- Check for other disconnection reasons:
If you are connecting to a standby server, it is possible that you have been disconnected because of hot standby conflicts.


185. Explain crash recovery of postgres?
Crash recovery is the PostgreSQL subsystem that saves us if the server should crash, or fail as a part of a system crash.
If PostgreSQL crashes, there will be a message in the server log with severity level PANIC. PostgreSQL will immediately restart and attempt to recover using the transaction log, or Write-Ahead Log (WAL).
The WAL consists of a series of files written to the pg_xlog subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in WAL, hence the name "write-ahead" log. When a transaction commits, the default and safe behavior is to force the WAL records to disk. If PostgreSQL should crash, the WAL will be replayed, which returns the database to the point of the last committed transaction, and thus ensures the durability of any database changes.
Note that the database changes themselves aren't written to disk at transaction commit. Those changes are written to disk sometime later by the "background writer" on a well-tuned server.
Crash recovery replays the WAL, though from what point does it start to recover?
Recovery starts from points in the WAL known as "checkpoints". The duration of crash recovery depends upon the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since at that time we write all currently outstanding database changes to disk.

A checkpoint can become a performance bottleneck on busy database servers because of the number of writes required. There are a number of ways of tuning that, though please also understand the effect on crash recovery that those tuning options may cause.

Two parameters control the amount of WAL that can be written before the next checkpoint. The first is checkpoint_segments, which controls the number of 16 MB files that will be written before a checkpoint is triggered. The second is time-based, known as checkpoint_timeout, and is the number of seconds until the next checkpoint. A checkpoint is triggered whenever either of those two limits is reached. (Note that in PostgreSQL 9.5 and later, checkpoint_segments was replaced by max_wal_size.)
It's tempting to banish checkpoints as much as possible by setting the following parameters:
checkpoint_segments = 1000
checkpoint_timeout = 3600
though if you do, give some thought to how long crash recovery will take after such a long gap between checkpoints, and whether you want that.
Also, you should make sure that the pg_xlog directory is mounted on disks with enough disk space for at least 3 x 16 MB x checkpoint_segments. Put another way, you need at least 48 GB of disk space for checkpoint_segments = 1000. If wal_keep_segments > 0 then the server can also use up to 16 MB x (wal_keep_segments + checkpoint_segments).
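The disk-space rule of thumb above is simple arithmetic; a minimal sketch as a shell function (the parameter values are illustrative):

```shell
# Rule-of-thumb disk space needed for pg_xlog, in MB.
# Each WAL segment file is 16 MB by default.
pg_xlog_space_mb() {
  local checkpoint_segments=$1
  local wal_keep_segments=${2:-0}
  # at least 3 x 16 MB x checkpoint_segments
  local needed=$(( 3 * 16 * checkpoint_segments ))
  if [ "$wal_keep_segments" -gt 0 ]; then
    # the server can also use up to 16 MB x (wal_keep_segments + checkpoint_segments)
    local keep=$(( 16 * (wal_keep_segments + checkpoint_segments) ))
    if [ "$keep" -gt "$needed" ]; then needed=$keep; fi
  fi
  echo "$needed"
}

pg_xlog_space_mb 1000        # prints 48000 (MB), i.e. about 48 GB
pg_xlog_space_mb 1000 5000   # prints 96000 once wal_keep_segments dominates
```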
Recovery continues until the end of the transaction log. We are writing this continually, so there is no defined end point; it is literally the last correct record. Each WAL record is individually CRC checked, so we know whether a record is complete and valid before trying to process it. Each record contains a pointer to the previous record, so we can tell that the record forms a valid link in the chain of actions recorded in WAL. As a result of that, recovery always ends with some kind of error reading the next WAL record. That is normal.
Recovery performance can be very fast, though it does depend upon the actions being recovered.

186. Explain pg_resetxlog in postgres? 
It's possible for a problem to occur while replaying the transaction log, causing the database server to fail to start.
Some people's response to this is to use a utility named pg_resetxlog, which removes the current transaction log files and tidies up after that surgery has taken place.
pg_resetxlog destroys data changes and that means data loss. If you do decide to run that utility, make sure you take a backup of the pg_xlog directory first. My advice is to seek immediate assistance rather than do this. You don't know for certain that doing this will fix a problem, though once you've done it, you will have difficulty going backwards.
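If you do decide to go down this road anyway, the sequence looks roughly like the following; a minimal sketch, assuming the data directory is in $PGDATA and /backup is a safe location (both placeholders):

```shell
# The server must be stopped first
pg_ctl -D "$PGDATA" stop -m fast

# Take a copy of the transaction log directory before doing anything destructive
cp -a "$PGDATA/pg_xlog" /backup/pg_xlog.before-reset

# Last resort only: discard the current transaction log
pg_resetxlog "$PGDATA"
```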

187. What is planning of backups?
The key thing to understand is that you should plan your recovery, not your backup. The type of backup you take influences the type of recovery that is possible, so you must give some thought to what you are trying to achieve beforehand.
If you want to plan your recovery, then you need to consider the different types of failures that can occur. What type of recovery do you wish to perform?
You need to consider the following main aspects:
- Full or partial database?
- Everything, or just object definitions only?
- Point-in-time recovery
- Restore performance

188. What are the backup options in postgres?
The main backup options are:
- logical backup, using pg_dump
- physical backup, using a file system backup
The logical backup utility comes in two main flavors: pg_dump and pg_dumpall. pg_dump has a -F option to produce backups in various file formats. The file format is very important when it comes to restoring from backup, so you need to pay close attention to that.
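For illustration, here is how those flavors and the -F option are typically invoked; a minimal sketch, where mydb and the output filenames are placeholders:

```shell
# Plain-text SQL script (default format); restore by feeding it to psql
pg_dump -F p -f mydb.sql mydb

# Custom format: compressed, and allows selective restore with pg_restore
pg_dump -F c -f mydb.dump mydb

# Dump all databases in the cluster, plus global objects such as roles
pg_dumpall -f cluster.sql
```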

