Thursday, 7 June 2018

Interview Q and A for PostgreSQL Part - 6

126. What is the difference between warm standby, hot standby, and streaming replication?

PostgreSQL supports three basic flavours of replication: warm standby, hot standby, and streaming replication.
Warm Standby:
==========
It was introduced in PostgreSQL 8.3 (IIRC).

1. It is based on WAL log shipping, which means WAL archives generated on the Master are transferred and applied on the Standby side. The warm standby always waits for the WAL archive that the Master is currently writing, and keeps throwing messages like "cp: cannot stat <archive>: No such file or directory". So it is always one archive behind the Master, and data loss will be at most 16 MB (assuming a healthy warm standby :-) ).
2. In postgresql.conf, you need to change just three parameters on the Master: wal_level to archive, plus archive_mode and archive_command; nothing needs to change in postgresql.conf on the standby side.
On Master:
wal_level = archive  
archive_mode    = on  
archive_command = 'cp %p /path_to/archive/%f'  

3. In the recovery.conf file, set three parameters: standby_mode, restore_command and trigger_file (a minimal example follows this list).
4. You cannot connect to the Standby, so the database is not even open for read operations (read queries are not permitted).
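A minimal recovery.conf for a warm standby, reusing the archive path from the example above, might look like this (paths are placeholders):
standby_mode    = 'on'
restore_command = 'cp /path_to/archive/%f "%p"'
trigger_file    = '/path_to/trigger'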
Hot Standby:
========
It was introduced in PostgreSQL 9.0.
1. It is also based on WAL log shipping (same as warm standby). WAL archives are transferred and applied on the Standby, so it is one WAL behind and always waits for the archive that the Master is currently writing.
2. In postgresql.conf, you need to set wal_level to hot_standby, plus archive_mode and archive_command. Since you will likely want to use pg_basebackup, you should also set max_wal_senders to at least 2 or 3. On the standby, set hot_standby = on in its postgresql.conf.
On Master:
wal_level = hot_standby  
max_wal_senders = 5  
wal_keep_segments = 32  
archive_mode    = on  
archive_command = 'cp %p /path_to/archive/%f'  
On Slave:
hot_standby = on  
3. In the recovery.conf file, set three parameters: standby_mode, restore_command and trigger_file.
4. You can connect to the Standby for read queries (you must set hot_standby to on in the standby's postgresql.conf).
Streaming Replication:
==============
It was introduced in PostgreSQL 9.0.
1. XLOG records generated on the primary are continuously shipped to the standby over the network and replayed as soon as possible, without waiting for the XLOG file to be filled. The combination of Hot Standby and streaming replication makes the latest data inserted into the primary visible on the standby almost immediately. So data loss is minimal: with asynchronous replication roughly only open transactions can be lost, and with synchronous replication there is zero loss.
2. In postgresql.conf, this time five parameters are involved, including the streaming-related ones below:
On Master:
wal_level = hot_standby  
max_wal_senders = 5  
wal_keep_segments = 32  
archive_mode    = on  
archive_command = 'cp %p /path_to/archive/%f'  
On Slave:
hot_standby=on  
3. In the recovery.conf file, you need one extra parameter, primary_conninfo, in addition to the three used for hot/warm standby, so there are four parameters in total:
standby_mode          = 'on'  
primary_conninfo      = 'host=192.168.0.10 port=5432 user=postgres'  
trigger_file = '/path_to/trigger'  
restore_command = 'cp /path_to/archive/%f "%p"'  
4. You can connect to the Standby for read queries (you must set hot_standby to on in the standby's postgresql.conf).
Detailed explanation and related docs: http://bajis-postgres.blogspot.in/2013/12/step-by-step-guide-to-setup-steaming.html
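Once streaming replication is up, you can check its progress from the primary via the pg_stat_replication view (available from PostgreSQL 9.1; some column names differ slightly in newer releases), and confirm recovery mode on the standby:
-- on the primary
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;

-- on the standby
SELECT pg_is_in_recovery();   -- returns true while the server is a standby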

127. How do you drop a user whose objects live in different databases of a PG instance?

While dropping such a user/role, the DBA/admin will get error messages like the following:
postgres=# drop user test;  
ERROR:  role "test" cannot be dropped because some objects depend on it  
DETAIL:  2 objects in database test  
The error message gives the admin enough information: objects that depend on, or are owned by, this user exist in another database.
To drop such a user, there are two methods:
1. Reassign all the objects owned by the user to some other user and then drop the user.
This is very useful if an employee who has left the company wrote procedures or other objects that are still used by the application or by some process.
The commands are as follows:
REASSIGN OWNED BY old_role to new_role;   
DROP USER old_role;  

Note: REASSIGN OWNED needs to be executed in every database of the PG instance.

2. First drop all the objects owned by the user, and then drop the user.
This is useful if the admin does not want to keep the user's objects and wants to drop everything the user owns:
The commands are as follows:
DROP OWNED BY name [, ...] [ CASCADE | RESTRICT ];  
DROP USER username;  
Note: DROP OWNED BY needs to be executed in every database as well.

To make this easier, I have written the following function, which uses dblink to connect to each database, execute the REASSIGN OWNED command there, and finally drop the user:
CREATE OR REPLACE FUNCTION reassign_drop_user(text, text) RETURNS integer
AS
$$
DECLARE
    db record;
    cmd text;
    status text;
BEGIN
   -- requires the dblink contrib module to be installed in this database
   cmd := 'REASSIGN OWNED BY '||$1||' TO '||$2||';';
   FOR db IN SELECT datname FROM pg_database WHERE datname != 'template0'
   LOOP
      EXECUTE 'select dblink_connect('||''''||'dbname='||db.datname||''''||')' INTO status;
      RAISE NOTICE 'CONNECTION STATUS % :- %', db.datname, status;
      EXECUTE 'select dblink_exec('||''''||cmd||''''||')' INTO status;
      RAISE NOTICE 'COMMAND " % " STATUS :- %', cmd, status;
      EXECUTE 'select dblink_disconnect()' INTO status;
   END LOOP;
   EXECUTE 'DROP USER '||$1;
   RETURN 0;                         -- success
EXCEPTION WHEN others THEN
   RETURN 1;                         -- any failure
END;
$$ LANGUAGE plpgsql;
The return value is an integer so that I can verify whether the function succeeded (0) or failed (1).
Output:
select reassign_drop_user('fromuser','touser');  
NOTICE:  CONNECTION STATUS template1 :- OK  
NOTICE:  COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED  
NOTICE:  CONNECTION STATUS template_postgis :- OK  
NOTICE:  COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED  
NOTICE:  CONNECTION STATUS test :- OK  
NOTICE:  COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED  
NOTICE:  CONNECTION STATUS postgres :- OK  
NOTICE:  COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED  
reassign_drop_user   
--------------------                    
0  
(1 row)  

Similarly, to drop the objects along with the user, the following function can be used:
CREATE OR REPLACE FUNCTION drop_user_withobjects(text) RETURNS integer
AS
$$
DECLARE
    db record;
    cmd text;
    status text;
BEGIN
   -- requires the dblink contrib module to be installed in this database
   cmd := 'DROP OWNED BY '||$1||' CASCADE;';
   FOR db IN SELECT datname FROM pg_database WHERE datname != 'template0'
   LOOP
      EXECUTE 'select dblink_connect('||''''||'dbname='||db.datname||''''||')' INTO status;
      RAISE NOTICE 'CONNECTION STATUS % :- %', db.datname, status;
      EXECUTE 'select dblink_exec('||''''||cmd||''''||')' INTO status;
      RAISE NOTICE 'COMMAND " % " STATUS :- %', cmd, status;
      EXECUTE 'select dblink_disconnect()' INTO status;
   END LOOP;
   EXECUTE 'DROP USER '||$1;
   RETURN 0;                         -- success
EXCEPTION WHEN others THEN
   RETURN 1;                         -- any failure
END;
$$ LANGUAGE plpgsql;

128. Controlling automatic database maintenance
Exercising control requires some thinking about what you actually want:
• Which are the best times of day to do things? When are system resources more available?
• Which days are quiet, and which are not?
• Which tables are critical to the application, and which are not?
The first thing to do is to make sure autovacuum is switched on. You must have both the following parameters enabled in your postgresql.conf:
autovacuum = on
track_counts = on
PostgreSQL controls autovacuum with 35 individually tunable parameters.
The following parameters can be set in postgresql.conf:
autovacuum
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold
autovacuum_freeze_max_age
autovacuum_max_workers
autovacuum_naptime
autovacuum_vacuum_cost_delay
autovacuum_vacuum_cost_limit
autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
log_autovacuum_min_duration

Individual tables can be controlled by "storage parameters", which are set using the following:
ALTER TABLE mytable SET (storage_parameter = value);

Example -- ALTER TABLE big_table SET (autovacuum_enabled = off);
The storage parameters that relate to maintenance are as follows:
autovacuum_enabled
autovacuum_vacuum_cost_delay
autovacuum_vacuum_cost_limit
autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
autovacuum_freeze_min_age
autovacuum_freeze_max_age
autovacuum_freeze_table_age
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold
and "toast" tables can be controlled with the following parameters:
toast.autovacuum_enabled
toast.autovacuum_vacuum_cost_delay
toast.autovacuum_vacuum_cost_limit
toast.autovacuum_vacuum_scale_factor
toast.autovacuum_vacuum_threshold
toast.autovacuum_freeze_min_age
toast.autovacuum_freeze_max_age
toast.autovacuum_freeze_table_age
toast.autovacuum_analyze_scale_factor
toast.autovacuum_analyze_threshold

If autovacuum is on, then the autovacuum launcher will wake up every autovacuum_naptime seconds and decide whether to run VACUUM and/or ANALYZE commands. There will never be more than autovacuum_max_workers maintenance processes running at any one time.
Increasing autovacuum_vacuum_cost_delay will slow down each VACUUM to reduce the impact on user activity. Autovacuum will run an ANALYZE command when there have been at least autovacuum_analyze_threshold changes, and a fraction of the table defined by autovacuum_analyze_scale_factor has been inserted, updated, or deleted.
Autovacuum will run a VACUUM command when there have been at least autovacuum_vacuum_threshold changes, and a fraction of the table defined by autovacuum_vacuum_scale_factor has been updated or deleted.
If you set log_autovacuum_min_duration, then any autovacuum that runs for longer than this value will be logged to the server log.
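To see what the autovacuum-related settings actually are on a running server, you can query pg_settings:
SELECT name, setting
FROM pg_settings
WHERE name LIKE '%autovacuum%' OR name = 'track_counts';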

129. What is a TOAST table?

The TOAST table is the location where oversized column values get placed; the documentation refers to these as "supplementary storage tables". If there are no oversized values, then the TOAST table will occupy little space. Tables with very wide values often have large TOAST tables. TOAST (The Oversized-Attribute Storage Technique) is optimized for UPDATE: if you have a heavily updated table, the TOAST table is often untouched, so it may make sense to turn off autovacuuming of the TOAST table as follows:
ALTER TABLE pgbench_accounts
  SET (autovacuum_vacuum_cost_delay = 20,
       toast.autovacuum_enabled = off);

which will turn off autovacuuming of the "toast" table.
Note that autovacuuming of the toast table is performed completely separately from the main table, even though you can't ask for an explicit include/exclude of the toast table yourself when running VACUUM.

130. What is difference between VACUUM and VACUUM FULL?
VACUUM allows inserts, updates, and deletes while it runs, though it prevents actions such as ALTER TABLE and CREATE INDEX. Autovacuum can detect if a user requests a conflicting lock on the table while it runs, and will cancel itself if it is getting in the user's way.

Note that VACUUM does not shrink a table when it runs, unless there is a large run of space at the end of a table, and nobody is accessing the table when we try to shrink it. To properly shrink a table, you need VACUUM FULL. That locks up the whole table for a long time, and should be avoided, if possible. VACUUM FULL will literally rewrite every row of the table, and completely rebuild all indexes.
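For example, against a single table (pgbench_accounts is just an example name):
VACUUM VERBOSE pgbench_accounts;   -- online; marks dead space reusable, does not return it to the OS
VACUUM FULL pgbench_accounts;      -- rewrites the table and rebuilds its indexes; takes an exclusive lock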


131. We have a website that is busy mainly during the daytime, with some occasional night-time use. How can we configure autovacuum in this situation?
We decide to have two profiles: one for daytime, when we want less aggressive autovacuuming, and another for night, when we allow more aggressive vacuuming.
postgresql.conf also allows "include directives", which look like the following:
include 'autovacuum.conf'
These specify another file that will be read at that point, just as if those parameters had been included in the main file.
This can be used to maintain multiple sets of files for autovacuum configuration.
We add the following lines to postgresql.conf:
autovacuum = on
autovacuum_max_workers = 3
include 'autovacuum.conf'
and remove all other autovacuum parameters.
We then create one file named autovacuum.conf.day, containing the following parameters:
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_vacuum_cost_delay = 30
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
and another file named autovacuum.conf.night, that contains the following parameters:
autovacuum_analyze_scale_factor = 0.05
autovacuum_analyze_threshold = 50
autovacuum_vacuum_cost_delay = 10
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_threshold = 50
To swap profiles, we would simply do the following actions:
$ ln -sf autovacuum.conf.night autovacuum.conf
$ pg_ctl -D datadir reload                                         # server reload command
(customized depending upon your platform).
This then allows us to switch profiles twice per day without needing to edit the configuration files. You can also easily tell which is the active profile simply by looking at the full details of the linked file (using ls -l). The exact details of the schedule are up to you; night/day was just an example, which is unlikely to suit everybody.
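One simple way to automate the switch is a pair of cron entries; the times and paths below are only assumptions to adapt to your environment:
# crontab for the postgres user: day profile at 08:00, night profile at 22:00
0 8  * * * ln -sf /path_to/conf/autovacuum.conf.day   /path_to/conf/autovacuum.conf && pg_ctl -D /path_to/datadir reload
0 22 * * * ln -sf /path_to/conf/autovacuum.conf.night /path_to/conf/autovacuum.conf && pg_ctl -D /path_to/datadir reload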

132. What is freezing in Postgres?
PostgreSQL performs regular sweeps to clean out old transaction identifiers, which is known as "freezing". It does this to defer transaction wraparound.
There are two routes that a row can take in PostgreSQL: the row version dies and needs to be removed by VACUUM, or a row version gets old enough to need to be frozen, also performed by the VACUUM process.
Freezing takes place when a transaction identifier on a row becomes more than vacuum_freeze_min_age transactions older than the current next value. Normal VACUUMs perform a small amount of freezing as they go, and in most cases you won't notice it at all. Large transactions, for example, leave many rows with the same transaction identifier, and those can cause problems at freezing time.
VACUUM is normally optimized to look only at the chunks of a table that require cleaning. When a table reaches vacuum_freeze_table_age, that optimization is ignored and the whole table is scanned. While doing so, it is fairly likely to see rows that need freezing and have to be rewritten, and that is what causes the large increase in I/O.
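You can watch how old each table's oldest unfrozen transaction id is, and therefore how close it is to a full-table freeze scan, with a query like the following:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;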

133. How do you confirm the absence of page corruption in Postgres?
VACUUM is also an efficient way to confirm the absence of page corruption, so it is worth scanning the whole database, every block, from time to time. To do this, you can run the following script on each of your databases:
SET vacuum_freeze_table_age = 0;
VACUUM;
You can do this table-by-table as well.
If you've never had a corrupt block, then you may only need to scan every two to three months or so. If you start to get corrupt blocks, then you may want to increase the scan rate to confirm that everything is OK. Corrupt blocks are usually hardware-induced, though they show up as database errors. It's possible, though rare, that the corruption came from a PostgreSQL bug instead.
There's no easy way to fix page corruptions at present. There are ways to investigate and extract data from corrupt blocks, for example, using the contrib/pageinspect utility.

134. How does Postgres manage transaction IDs internally, and what is transaction wraparound?
PostgreSQL uses internal transaction identifiers that are four bytes long, so there are only about two billion (2^31) usable transaction ids before the counter wraps around and starts again from the beginning, allocating new identifiers in a circular manner. The reason for this is that moving to an eight-byte identifier has other negative effects and costs that we would rather not pay, so we keep the four-byte transaction identifier, which also has costs.
PostgreSQL is designed to continue using ids even after the system wraps. Properly maintained, everything will keep working forever and you'll never notice what happens on the inside. To allow that to happen we need to run regular VACUUMs.
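You can check how far each database is from wraparound by looking at the age of its datfrozenxid; when this approaches autovacuum_freeze_max_age, autovacuum will force freezing:
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;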

135. How do you avoid transaction wraparound?
First: have you seen the following message?
WARNING: database "postgres" must be vacuumed within XXX transactions.
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
You might also need to commit or roll back old prepared transactions.
Or even worse, the following message:
ERROR: database is not accepting commands to avoid wraparound data loss in database "template0"
HINT: Stop the postmaster and use a standalone backend to vacuum that database.
You might also need to commit or roll back old prepared transactions.
If not, then you don't need to do anything apart from normal planned maintenance. Those messages are reported to users, and they are also written to the server log.
If you've received the WARNING, then follow both hints. First, let's do the suggested VACUUM on the appropriate database.
Either run the following:
$ vacuumdb postgres
or use the following:
psql -c "VACUUM" postgres
or use your admin tool to initiate a VACUUM on the appropriate database.

If you received the aforementioned ERROR, and the database is no longer accepting commands, you're probably wondering what the phrase "use a standalone backend to vacuum that database" means.
A "standalone backend" means running the database server from just a single executable process. This is the equivalent of unix run-level 1, also known as single user mode. We restrict access to the database to just a single user.
The command to do this is the following, noting that --single must be the very first option on the command line:
$ postgres --single -D /full/path/to/datadir postgres
which then returns the following command line prompt:
PostgreSQL stand-alone backend 9.0
backend>
and you can then run the VACUUM from there, as follows:
PostgreSQL stand-alone backend 9.0
backend> VACUUM;
backend>
When you're finished, type <CTRL>-D (or whatever you have set EOF to be for your terminal window), once, or twice if you also used the -j option.
You should also check for old prepared transactions and remove them (see the next question).

136. What are prepared transactions? How do you remove old prepared transactions?
Prepared transactions are persistent across crashes, so you can't just do a fast restart to get rid of them. They have both an internal transaction identifier and an external "global identifier". Either of those can be used to locate locked resources, and decide how to resolve the transactions.
Removing a prepared transaction is also referred to as "resolving in-doubt transactions". The transaction is literally stuck between committing and aborting. The database or transaction manager crashed, leaving the transaction mid-way through the two-phase commit process.
Prepared transactions are part of the "two-phase commit" feature, also known as 2PC. A transaction commits in two stages rather than one, allowing multiple databases to have synchronized commits. Its typical use is to combine multiple "resource managers" using the XA protocol, usually provided by a Transaction Manager (TM), as used by the Java Transaction API (JTA) and others. If none of that meant anything to you, then you probably don't have any prepared transactions.
First, check the setting of max_prepared_transactions. If this is zero, then you don't have any.
SHOW max_prepared_transactions;
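If you want to create one yourself to see how this looks (max_prepared_transactions must be greater than zero, which requires a server restart), a minimal test is the following; t1 is just a hypothetical table:
BEGIN;
INSERT INTO t1 VALUES (1);      -- t1 is a hypothetical example table
PREPARE TRANSACTION 'prep1';    -- the transaction now survives disconnects and crashes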

If your setting is more than zero, then look to see if you have any. As an example, you may find something like the following:
postgres=# SELECT * FROM pg_prepared_xacts;
-[ RECORD 1 ]------------------------------
transaction | 121083
gid | prep1
prepared | 2010-03-28 15:47:57.637868+01
owner | postgres
database | postgres

where the gid ("global identifier") will usually have been generated automatically.

To resolve the transaction, we need to decide whether we want that change, or not. The best way is to check what happened externally to PostgreSQL. That should help you decide.
If you wish to commit the changes, then:
COMMIT PREPARED 'prep1';
or if you want to rollback the changes then:
ROLLBACK PREPARED 'prep1';

The pg_locks view shows the locks held by prepared transactions. You can get a full report of what is being locked using the following query:
postgres=# SELECT l.locktype, x.database, l.relation, l.page, l.tuple,l.classid, l.objid, l.objsubid,
l.mode, x.transaction, x.gid, x.prepared, x.owner  FROM pg_locks l JOIN pg_prepared_xacts x
ON l.virtualtransaction = '-1/' || x.transaction::text;

The table locks are listed as being held by a virtual transaction. A simpler query is the following:
postgres=# SELECT DISTINCT x.database, l.relation FROM pg_locks l JOIN pg_prepared_xacts x
ON l.virtualtransaction = '-1/' || x.transaction::text  WHERE l.locktype != 'transactionid';
database | relation
----------+----------
postgres | 16390
postgres | 16401
(2 rows)
This tells you which relations in which databases have been touched by the remaining prepared transactions. We can't tell the names because we'd need to connect to those databases to check.

You can then fully scan each of those tables, looking for changes like the following:
SELECT * FROM table WHERE xmin = 121083;
which will show you all the rows in that table inserted by transaction 121083, taken from the transaction column of pg_prepared_xacts.

137. Actions for heavy users of temporary tables
If you are a heavy user of temporary tables in your applications, then there are some additional actions you may need to perform.
There are four main things to check, which are as follows:
1. Make sure you run VACUUM on system tables, or enable autovacuum to do this for you.
2. Monitor running queries to see how many and how large temporary files are active.
3. Tune memory parameters. Think about increasing the temp_buffers parameter, though be careful not to overallocate memory by doing so.
4. Separate temp table I/O. In a query-intensive system, you may find that reads and writes to temporary files exceed those on permanent data tables and indexes. In this case, you should create new tablespace(s) on separate disks and ensure that the temp_tablespaces parameter is configured to use the additional tablespace(s), as sketched after this list.
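As a sketch of item 4, you could create a dedicated tablespace for temporary objects and point temp_tablespaces at it (the directory path is an assumption; it must already exist and be owned by the postgres user):
CREATE TABLESPACE temp01 LOCATION '/mnt/fastdisk/pg_temp';
-- then in postgresql.conf, or per session:
SET temp_tablespaces = 'temp01';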
In PostgreSQL 9.0, when we create a temporary table, we insert entries into the catalog tables pg_class and pg_attribute. These catalog tables and their indexes begin to grow and to bloat.
To control that growth, you can either VACUUM those tables manually, or set autovacuum = on in postgresql.conf.
If you VACUUM the system catalog tables manually, make sure you get all of the system tables. You can get the full list of tables to VACUUM using the following query:
postgres=# SELECT relname, pg_relation_size(oid)  FROM pg_class
WHERE relkind in ('i','r') and relnamespace = 11  ORDER BY 2 DESC;

Note -- You cannot run ALTER TABLE against system tables, so it is not possible to set specific autovacuum settings for any of these tables.

138. Explain HOT update and non-HOT update.
HOT updates take place when the UPDATE statement does not change any of the columns that are covered by an index on the table. If you change even one column that is indexed by just one index, then it will be a non-HOT update, and there will be a performance hit. Non-HOT updates cause indexes to bloat.
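You can check what fraction of a table's updates were HOT by looking at pg_stat_user_tables (pgbench_accounts is just an example table):
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';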
139. Identifying and fixing bloated tables and indexes
Non-primary-key indexes are also prone to some bloat from normal inserts, as is common in most relational databases, and non-HOT updates cause indexes to bloat. Autovacuum does not detect bloated indexes, nor does it do anything to rebuild them. The following view shows which tables need vacuuming, together with their dead-row counts and last vacuum/analyze times:
CREATE OR REPLACE VIEW av_needed AS
SELECT *,
       n_dead_tup > av_threshold AS "av_needed",
       CASE WHEN reltuples > 0
            THEN round(100.0 * n_dead_tup / (reltuples))
            ELSE 0
       END AS pct_dead
FROM
 (SELECT N.nspname, C.relname,
         pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins,
         pg_stat_get_tuples_updated(C.oid) AS n_tup_upd,
         pg_stat_get_tuples_deleted(C.oid) AS n_tup_del,
         pg_stat_get_tuples_hot_updated(C.oid)::real /
           NULLIF(pg_stat_get_tuples_updated(C.oid), 0) AS HOT_update_ratio,
         pg_stat_get_live_tuples(C.oid) AS n_live_tup,
         pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
         C.reltuples AS reltuples,
         round(current_setting('autovacuum_vacuum_threshold')::integer
               + current_setting('autovacuum_vacuum_scale_factor')::numeric
                 * C.reltuples) AS av_threshold,
         date_trunc('minute', greatest(pg_stat_get_last_vacuum_time(C.oid),
                                       pg_stat_get_last_autovacuum_time(C.oid))) AS last_vacuum,
         date_trunc('minute', greatest(pg_stat_get_last_analyze_time(C.oid),
                                       pg_stat_get_last_autoanalyze_time(C.oid))) AS last_analyze
  FROM pg_class C
  LEFT JOIN pg_index I ON C.oid = I.indrelid
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE C.relkind IN ('r', 't')
    AND N.nspname NOT IN ('pg_catalog', 'information_schema')
    AND N.nspname !~ '^pg_toast'
 ) AS av
ORDER BY av_needed DESC, n_dead_tup DESC;
-[ RECORD 1 ]----+------------------------
nspname | public
relname | pgbench_accounts
n_tup_ins | 100001
n_tup_upd | 117201
n_tup_del | 1
hot_update_ratio | 0.123454578032611
n_live_tup | 100000
n_dead_tup | 0
reltuples | 100000
av_threshold | 20050
last_vacuum | 2010-04-29 01:33:00+01
last_analyze | 2010-04-28 15:21:00+01
av_needed | f
pct_dead | 0

We can then use it to look at individual tables as follows:
postgres=# \x
postgres=# SELECT * FROM av_needed
           WHERE nspname = 'public' AND relname = 'pgbench_accounts';

non-HOT updates cause indexes to bloat. The following query is useful in investigating index size, and how that changes over time. It runs fairly quickly, and can be used to monitor whether your indexes are changing in size over time.
SELECT nspname, relname,
       round(100 * pg_relation_size(indexrelid) / pg_relation_size(indrelid)) / 100
         AS index_ratio,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       pg_size_pretty(pg_relation_size(indrelid)) AS table_size
FROM pg_index I
LEFT JOIN pg_class C ON (C.oid = I.indexrelid)
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
  AND C.relkind = 'i'
  AND pg_relation_size(indrelid) > 0;

Another route is to use the contrib/pgstattuple module, supplied with PostgreSQL. This provides overkill statistics about what's happening in your tables and indexes, which it derives by scanning the whole table or index, and literally counting everything.

Just use it carefully: if you have time to scan the table, you might as well have VACUUMed the whole table anyway.
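pgstattuple is a contrib module, so it has to be installed in the database first; on PostgreSQL 9.1 and later that is done with CREATE EXTENSION, and on older releases by running the SQL script shipped with contrib:
CREATE EXTENSION pgstattuple;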
Scan tables using pgstattuple() as follows:
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
and scan indexes using pgstatindex() as follows:
postgres=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0


140. Explain indexes in PostgreSQL.
Indexes are designed for performance, and in all databases, deleting index entries causes contention and loss of performance. PostgreSQL does not remove index entries for a row when that row is deleted, so an index can fill up with dead entries. PostgreSQL does attempt to remove dead entries when a block becomes full, though that doesn't stop small numbers of dead entries accumulating in many data blocks.

PostgreSQL supports a command known as CREATE INDEX CONCURRENTLY, which builds an index without taking a full table lock. CREATE INDEX CONCURRENTLY allows inserts, updates, and deletes while the index is being created. It cannot be executed inside another transaction, and only one can be created per table at any time. Note that you cannot add a primary key index to a table concurrently, in PostgreSQL 9.0 at least.
When we create an index using CREATE INDEX CONCURRENTLY, it is a two-stage process. The first phase builds the index and then marks it invalid. Inserts, updates, and deletes now begin maintaining the index, but we do a further pass over the table to see if we missed anything before we declare the index valid. User queries don't use the index until it is marked valid.
Once the index is built and the flag is valid, then if we set the flag to invalid, the index will still be maintained, just not used by queries. This allows us to turn the index off quickly, though with the option to turn it back on again if we realize we actually do need the index after all.
PostgreSQL also supports the ability to have two indexes, with different names, that have exactly the same definition, which is what makes the online rebuild sketched below possible.
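A sketch of such an online rebuild, with assumed index, table, and column names: build a duplicate index concurrently, drop the old one, and rename the new one.
CREATE INDEX CONCURRENTLY idx_accounts_balance_new ON pgbench_accounts (abalance);
DROP INDEX idx_accounts_balance;                     -- brief exclusive lock only
ALTER INDEX idx_accounts_balance_new RENAME TO idx_accounts_balance;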
PostgreSQL supports commands that will rebuild indexes for you. The client utility reindexdb allows you to execute the REINDEX command in a convenient way from the operating system:
$ reindexdb
This executes the SQL REINDEX command on every table in the default database. If you want to reindex all databases, then use the following:
$ reindexdb -a
REINDEX puts a full table lock (AccessExclusiveLock) on the table while it runs.

141. Planning maintenance
Build a regular cycle of activity around the following tasks:
·       Capacity planning
·       Backups, recovery testing, and emergency planning
·       Vacuum and index maintenance
·       Server log file analysis
·       Security and intrusion detection
·       Understanding usage patterns
·       Long term performance analysis

142. Why doesn't SQL use my index? OR: A query is not using an index

Indexes are not used by every query. Indexes are used only if the table is larger than a minimum size, and the query selects only a small percentage of the rows in the table. This is because the random disk access caused by an index scan can be slower than a straight read through the table, or sequential scan.
To determine if an index should be used, PostgreSQL must have statistics about the table. These statistics are collected using VACUUM ANALYZE, or simply ANALYZE. Using statistics, the optimizer knows how many rows are in the table, and can better determine if indexes should be used. Statistics are also valuable in determining optimal join order and join methods. Statistics collection should be performed periodically as the contents of the table change.
Indexes are normally not used for ORDER BY or to perform joins. A sequential scan followed by an explicit sort is usually faster than an index scan of a large table. However, LIMIT combined with ORDER BY often will use an index because only a small portion of the table is returned.
If you believe the optimizer is incorrect in choosing a sequential scan, use SET enable_seqscan TO 'off' and run the query again to see if an index scan is indeed faster.
When using wild-card operators such as LIKE or ~, indexes can only be used in certain circumstances:
  • The beginning of the search string must be anchored to the start of the string, i.e.
    • LIKE patterns must not start with % or _.
    • ~ (regular expression) patterns must start with ^.
  • The search string cannot start with a character class, e.g. [a-e].
  • Case-insensitive searches such as ILIKE and ~* do not utilize indexes. Instead, use expression indexes.
  • C locale must be used during initdb because sorting in a non-C locale often doesn't match the behavior of LIKE. You can create a special text_pattern_ops index that will work in such cases, but note it is only helpful for LIKE indexing.
It is also possible to use full text indexing for word searches.
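For example, a text_pattern_ops index supports anchored LIKE searches in a non-C locale, and combining it with an expression index covers case-insensitive prefix searches (table and column names here are assumptions):
-- case-sensitive prefix search:
CREATE INDEX idx_customers_name_pattern ON customers (name text_pattern_ops);
SELECT * FROM customers WHERE name LIKE 'Smith%';

-- case-insensitive prefix search via an expression index:
CREATE INDEX idx_customers_lower_name ON customers (lower(name) text_pattern_ops);
SELECT * FROM customers WHERE lower(name) LIKE 'smith%';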
The SlowQueryQuestions article contains some more tips and guidance.

143. How do I force a query to use an index?
set enable_seqscan to false
If you do:
set enable_seqscan to false;
you tell PostgreSQL that it is really very expensive to do sequential scans. It still does a seqscan (instead of failing) if it is the only way to do the query.
Once you enable seqscans again, it will use a sequential scan instead of the more costly (in this case) bitmap index scan as follows:
      set enable_seqscan to true;
Lower random_page_cost
For a softer nudge towards using indexes, set random_page_cost to a lower value, maybe even equal to seq_page_cost. This makes PostgreSQL prefer index scans on more occasions, but still does not produce entirely unreasonable plans, at least for cases where data is mostly cached in shared buffers or the system's disk cache.
Default values for these parameters are as follows:
random_page_cost = 4;
seq_page_cost = 1;
Try setting:
set random_page_cost = 2;
and see if it helps; if not, set it to 1.

144. Is the user connected?
Make sure that you are logged in as a superuser, then run the following:
SELECT datname FROM pg_stat_activity WHERE usename = 'bob';
If this query returns any rows, then the database user bob is connected to the database.
SELECT datname,usename,client_addr,client_port FROM pg_stat_activity ;
The client_addr and client_port help you look up the exact computer, and even the process on that computer that has connected to this database.

145. What SQL statements are running?
Make sure that you are logged in as a superuser or as the same database user you want to check.
Make sure that the parameter track_activities = on is set.
This can be done either in the postgresql.conf file or by the superuser using the following SQL statement:
SET track_activities = on
To see what all connected users are running right now, just run the following:
SELECT datname,usename,current_query FROM pg_stat_activity ;
To get information for only the active queries, excluding idle ones, run the following:
SELECT datname,usename,current_query FROM pg_stat_activity
WHERE current_query != '<IDLE>' ;
If you want, you can also make the queries being run show up in process titles, by setting the following:
update_process_title = on
Note, however, that ps or top output is not the best place for watching database queries.

146. What is difference between pg_cancel_backend(processid)  and pg_terminate_backend(processid)
When a backend executes the pg_terminate_backend(processid) function, it sends a SIGTERM signal to the backend given as an argument, after checking that the process identified by processid actually is a PostgreSQL backend. The backend receiving this signal stops whatever it is doing and terminates in a controlled way. The client using that backend loses its connection to the database.

The difference between these two is that pg_cancel_backend() just cancels the current query, whereas pg_terminate_backend() really kills the backend.
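Typical usage is to pick the backend's process id out of pg_stat_activity and pass it to one of these functions. The column names below are the 9.0/9.1 ones (procpid and current_query); from 9.2 onwards they are called pid and query:
-- cancel only the running query of user bob's backends
SELECT pg_cancel_backend(procpid)
FROM pg_stat_activity
WHERE usename = 'bob';

-- terminate (disconnect) user bob's backends entirely
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE usename = 'bob';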
