126. Difference between Warm Standby, Hot Standby, and Streaming Replication:
We all know about replication and its types in PostgreSQL. There are three basic
flavours, i.e. Warm Standby, Hot Standby, and Streaming Replication.
Warm Standby:
==========
It was introduced in PostgreSQL 8.3 (IIRC).
1. It is based on WAL log shipping, which means the WAL archives generated on the Master are transferred and applied on the Standby side. The Warm Standby always waits for the WAL archive the Master is currently writing, and keeps throwing messages like "cp: cannot stat <archive>: No such file or directory". So it is always one archive behind the Master, and data loss will be at most 16MB (assuming a healthy warm standby :-) ).
2. In the postgresql.conf file you need to change just three parameters on the
Master (wal_level, archive_mode, and archive_command); nothing needs to change
in the postgresql.conf file on the Standby side.
On Master:
wal_level = archive
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'
3. In the recovery.conf file, three parameters: standby_mode, restore_command, and
trigger_file (see the sketch after this list).
4. You cannot connect to the Standby, so the database is not even open for read
operations (read queries are not permitted).
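A minimal recovery.conf sketch for a Warm Standby, reusing the example paths shown
elsewhere in this post (the paths are illustrations, not fixed values):
standby_mode = 'on'
restore_command = 'cp /path_to/archive/%f "%p"'
trigger_file = '/path_to/trigger'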
Hot Standby:
========
It was introduced in PostgreSQL 9.0.
1. It is also based on WAL log shipping (same as Warm Standby). Of course, WALs are transferred and applied on the Standby, so it is one WAL behind and always waits for the WAL archive the Master is currently writing.
2. In the postgresql.conf file you need to set wal_level to hot_standby, plus archive_mode and archive_command. Since you'll likely want to use pg_basebackup, you should also set max_wal_senders to at least 2 or 3. And set hot_standby = on in the Standby's postgresql.conf file.
On Master:
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'
On Slave:
hot_standby = on
3. In the recovery.conf file, three parameters: standby_mode, restore_command, and
trigger_file.
4. You can connect to the Standby for read-only queries (you must set hot_standby
to on in the Standby's postgresql.conf file).
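To verify that the standby really is serving in recovery mode, a quick hedged
check from a psql session on the standby (pg_is_in_recovery() exists from 9.0):
-- Run on the standby; returns true while it is acting as a standby.
SELECT pg_is_in_recovery();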
Streaming Replication:
==============
It was introduced in PostgreSQL 9.0.
1. XLOG records generated on the primary are continuously shipped to the standby over the network, and are replayed as soon as possible, without waiting for the current XLOG file to be filled. The combination of Hot Standby and SR makes the latest data inserted into the primary visible on the standby almost immediately. So data loss is minimal (with async replication, almost only open transactions are lost; with sync replication, zero loss).
2. In the postgresql.conf file, this time five streaming-related parameters, as
shown below:
On Master:
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'
On Slave:
hot_standby=on
3. In the recovery.conf file, you need one extra parameter (primary_conninfo) in
addition to the three used for hot/warm standby, so there are four parameters in
total:
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.10 port=5432 user=postgres'
trigger_file = '/path_to/trigger'
restore_command = 'cp /path_to/archive/%f "%p"'
4. You can connect to the Standby for read-only queries (you must set hot_standby
to on in the Standby's postgresql.conf file).
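A hedged sketch for monitoring streaming progress, assuming the PostgreSQL
9.0-era function names (they were renamed to *_wal_*/lsn variants in PostgreSQL
10, and 9.1+ also offers the pg_stat_replication view):
-- On the primary: the current WAL write position.
SELECT pg_current_xlog_location();
-- On the standby: the last WAL position received and the last one replayed.
SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location();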
Detailed explanation and related docs are here: http://bajis-postgres.blogspot.in/2013/12/step-by-step-guide-to-setup-steaming.html
127. Drop a user whose objects are spread across different databases of a PG instance.
While dropping the user/role, the DBA/Admin will get the following
error message:
postgres=# drop user test;
ERROR: role "test" cannot be dropped because some objects depend on it
DETAIL: 2 objects in database test
The above error message gives the Admin enough information to see that some
objects depending on (or owned by) the user exist in another database.
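Before deciding how to proceed, you can check which databases still contain
objects that depend on the role; a hedged helper query (the role name 'test'
matches the error above):
-- List the databases that contain objects depending on / owned by the role.
SELECT DISTINCT d.datname
FROM pg_shdepend s
JOIN pg_database d ON d.oid = s.dbid
JOIN pg_roles    r ON r.oid = s.refobjid
WHERE r.rolname = 'test';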
For dropping such a user, there are two methods:
1. Reassign all the objects owned by the user to some other user, and then drop the user.
This is very useful if an employee who left the company has written some procedures/objects which are still used by an application/process.
The commands are the following:
REASSIGN OWNED BY old_role to new_role;
DROP USER old_role;
Note: the REASSIGN command needs to be executed in every database of the PG instance.
2. First drop all the objects owned by the user, and then drop the user.
This is useful if the admin does not want to keep the user's objects and wants to drop everything owned by the user.
The commands are the following:
DROP OWNED BY name [, ...] [ CASCADE | RESTRICT ];
DROP USER username;
Note: DROP OWNED BY needs to be executed in every database.
To make this easier, I have written the following function, which uses dblink to connect to the other databases and execute the commands needed to drop the user:
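The function relies on the dblink contrib module being installed in the database
you run it from; a hedged setup step (CREATE EXTENSION works on 9.1+; earlier
releases install the contrib module's SQL script instead):
CREATE EXTENSION dblink;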
CREATE OR REPLACE FUNCTION reassign_drop_user(text, text) RETURNS integer
AS
$$
DECLARE
  db record;
  cmd text;
  status text;
BEGIN
  cmd := 'REASSIGN OWNED BY '||$1||' TO '||$2||';';
  FOR db IN SELECT datname FROM pg_database WHERE datname != 'template0'
  LOOP
    EXECUTE 'select dblink_connect('||''''||'dbname='||db.datname||''''||')' INTO status;
    RAISE NOTICE 'CONNECTION STATUS % :- %', db.datname, status;
    EXECUTE 'select dblink_exec('||''''||cmd||''''||')' INTO status;
    RAISE NOTICE 'COMMAND " % " STATUS :- %', cmd, status;
    EXECUTE 'select dblink_disconnect()' INTO status;
  END LOOP;
  EXECUTE 'DROP USER '||$1;
  RETURN 0;
EXCEPTION WHEN others THEN
  RETURN 1;
END;
$$ LANGUAGE plpgsql;
I have kept the return value as an integer, so that I can verify whether
the function was successful (0) or unsuccessful (1).
Output:
select reassign_drop_user('fromuser','touser');
NOTICE: CONNECTION STATUS template1 :- OK
NOTICE: COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED
NOTICE: CONNECTION STATUS template_postgis :- OK
NOTICE: COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED
NOTICE: CONNECTION STATUS test :- OK
NOTICE: COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED
NOTICE: CONNECTION STATUS postgres :- OK
NOTICE: COMMAND " REASSIGN OWNED BY fromuser TO touser; " STATUS :- REASSIGN OWNED
reassign_drop_user
--------------------
0
(1 row)
Similarly, for dropping the objects along with the user, the following function can be used:
CREATE OR REPLACE FUNCTION drop_user_withobjects(text) RETURNS integer
AS
$$
DECLARE
  db record;
  cmd text;
  status text;
BEGIN
  cmd := 'DROP OWNED BY '||$1||' CASCADE;';
  FOR db IN SELECT datname FROM pg_database WHERE datname != 'template0'
  LOOP
    EXECUTE 'select dblink_connect('||''''||'dbname='||db.datname||''''||')' INTO status;
    RAISE NOTICE 'CONNECTION STATUS % :- %', db.datname, status;
    EXECUTE 'select dblink_exec('||''''||cmd||''''||')' INTO status;
    RAISE NOTICE 'COMMAND " % " STATUS :- %', cmd, status;
    EXECUTE 'select dblink_disconnect()' INTO status;
  END LOOP;
  EXECUTE 'DROP USER '||$1;
  RETURN 0;
EXCEPTION WHEN others THEN
  RETURN 1;
END;
$$ LANGUAGE plpgsql;
128. Controlling automatic database maintenance
Exercising control requires some thinking about what you actually want:
- Which are the best times of day to do things? When are system resources more available?
- Which days are quiet, and which are not?
- Which tables are critical to the application, and which are not?
The first thing to do is to make sure autovacuum is switched on.
You must have both the following parameters enabled in your postgresql.conf:
autovacuum = on
track_counts = on
PostgreSQL controls autovacuum with 35
individually tunable parameters.
The following parameters can be set in postgresql.conf:
- autovacuum
- autovacuum_analyze_scale_factor
- autovacuum_analyze_threshold
- autovacuum_freeze_max_age
- autovacuum_max_workers
- autovacuum_naptime
- autovacuum_vacuum_cost_delay
- autovacuum_vacuum_cost_limit
- autovacuum_vacuum_scale_factor
- autovacuum_vacuum_threshold
- log_autovacuum_min_duration
Individual tables can be controlled by "storage
parameters", which are set using
the following:
ALTER TABLE mytable SET (storage_parameter = value);
Example: ALTER TABLE big_table SET (autovacuum_enabled = off);
The storage parameters
that relate to maintenance are as follows:
- autovacuum_enabled
- autovacuum_vacuum_cost_delay
- autovacuum_vacuum_cost_limit
- autovacuum_vacuum_scale_factor
- autovacuum_vacuum_threshold
- autovacuum_freeze_min_age
- autovacuum_freeze_max_age
- autovacuum_freeze_table_age
- autovacuum_analyze_scale_factor
- autovacuum_analyze_threshold
and "toast" tables can be controlled with
the following parameters:
- toast.autovacuum_enabled
- toast.autovacuum_vacuum_cost_delay
- toast.autovacuum_vacuum_cost_limit
- toast.autovacuum_vacuum_scale_factor
- toast.autovacuum_vacuum_threshold
- toast.autovacuum_freeze_min_age
- toast.autovacuum_freeze_max_age
- toast.autovacuum_freeze_table_age
- toast.autovacuum_analyze_scale_factor
- toast.autovacuum_analyze_threshold
If autovacuum is on,
then autovacuum will wake up every autovacuum_naptime seconds, and decide
whether to run VACUUM and/or
ANALYZE commands.
There will never be more than autovacuum_max_workers maintenance processes
running at any one time.
Increasing
autovacuum_vacuum_cost_delay will slow down each VACUUM to reduce the impact on
user activity. Autovacuum will run an ANALYZE command when
at least autovacuum_analyze_threshold changes have occurred, plus a fraction of the
table defined by autovacuum_analyze_scale_factor has been inserted, updated,
or deleted.
Autovacuum will run a VACUUM command
when at least autovacuum_vacuum_threshold changes have occurred, plus a
fraction of the table defined by autovacuum_vacuum_scale_factor has been
updated or deleted.
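As a worked example of that trigger formula (the 100,000-row table and the
default settings of 50 and 0.2 are assumptions for illustration):
-- vacuum trigger = autovacuum_vacuum_threshold
--                + autovacuum_vacuum_scale_factor * reltuples
SELECT 50 + 0.2 * 100000 AS av_threshold;   -- 20050 dead rows needed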
If you set
log_autovacuum_min_duration, then any autovacuum that runs for longer than this
value will be logged to the server log.
129. What is a TOAST table?
The toast table is the location where oversize column values get
placed, which the documents refer to as "supplementary storage
tables". If there are no oversize values, then the toast table will occupy
little space. Tables with very wide values often have large toast tables. TOAST
(The Oversize Attribute Storage Technique) is optimized for UPDATE. If you have a
heavily updated table, the toast table is untouched, so it may make sense to
turn off autovacuuming of the toast table as follows:
ALTER TABLE pgbench_accounts
SET (autovacuum_vacuum_cost_delay = 20,
     toast.autovacuum_enabled = off);
which will turn off autovacuuming of the "toast" table.
Note that
autovacuuming of the toast table is performed completely separately from the
main table, even though you can't ask for an explicit include/exclude of the
toast table yourself when running VACUUM.
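To see which toast table backs a given table, and how big it is, a hedged query
(pgbench_accounts is just an example name):
SELECT relname,
       reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'pgbench_accounts'
  AND reltoastrelid <> 0;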
130. What is the difference between VACUUM and VACUUM FULL?
VACUUM allows inserts, updates, and deletes while it runs, though it
prevents actions such as ALTER TABLE and CREATE INDEX. Autovacuum can detect if a user requests a conflicting lock on
the table while it runs, and will cancel itself if it is getting in the user's
way.
Note that VACUUM does not shrink a
table when it runs, unless there is a large run of space at the end of a table,
and nobody is accessing the table when we try to shrink it. To properly shrink
a table, you need VACUUM FULL. That locks up the whole table for a long time, and should be
avoided, if possible. VACUUM FULL will literally rewrite every row of the table, and completely
rebuild all indexes.
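A minimal sketch of the difference in practice, assuming the pgbench_accounts
example table; only VACUUM FULL is expected to shrink the file on disk, at the
cost of an exclusive lock:
SELECT pg_size_pretty(pg_total_relation_size('pgbench_accounts'));
VACUUM pgbench_accounts;        -- marks dead space reusable; file rarely shrinks
VACUUM FULL pgbench_accounts;   -- rewrites the table; holds an exclusive lock
SELECT pg_size_pretty(pg_total_relation_size('pgbench_accounts'));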
131. We have a website that is busy mainly during the daytime, with some
occasional night-time use. How can we configure autovacuum in this situation?
We decide to have two profiles: one for the daytime, when we want less
aggressive autovacuuming, and another for the night, when we allow more
aggressive vacuuming.
postgresql.conf also allows "include directives", which look like the
following:
include 'autovacuum.conf'
These specify another
file that will be read at that point, just as if those parameters had been
included in the main file.
This can be used to maintain multiple sets of files for autovacuum
configuration.
We add the following lines to postgresql.conf:
autovacuum = on
autovacuum_max_workers = 3
include 'autovacuum.conf'
and remove all other autovacuum parameters.
We then create one file named autovacuum.conf.day,
containing the following parameters:
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_vacuum_cost_delay = 30
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
and another file named autovacuum.conf.night,
that contains the following parameters:
autovacuum_analyze_scale_factor = 0.05
autovacuum_analyze_threshold = 50
autovacuum_vacuum_cost_delay = 10
autovacuum_vacuum_cost_limit = -1
autovacuum_vacuum_scale_factor = 0.1
autovacuum_vacuum_threshold = 50
To swap profiles, we would simply do the following actions:
$ ln -sf autovacuum.conf.night autovacuum.conf
$ pg_ctl -D datadir reload   # server reload command (customize for your platform)
This then allows us to
switch profiles twice per day without needing to edit the configuration files.
You can also tell easily which is the active profile simply by looking at the
full details of the linked file (using ls -l). The exact details of the
schedule are up to you; night/day was just an example, which is unlikely to
suit everybody.
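To confirm which profile is actually in effect after a reload, a hedged check
from psql (pg_settings shows the live values and the file each one came from):
SELECT name, setting, sourcefile
FROM pg_settings
WHERE name LIKE 'autovacuum%';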
132. What is freezing in Postgres?
PostgreSQL performs regular sweeps to clean out old transaction
identifiers, which is known as "freezing". It does this to defer
transaction wraparound.
There are two routes that a row can take in
PostgreSQL: the row version dies and needs to be removed by VACUUM, or a row
version gets old enough to need to be frozen, also performed by the VACUUM
process.
Freezing takes place when a transaction identifier on a row
becomes more than vacuum_freeze_min_age
transactions older than the current next value. Normal VACUUMs will perform
a small amount of freezing as you go, and in most cases, you won't notice it at
all. For example, large transactions leave many rows with the same transaction
identifiers, so those might cause problems at freezing time.
VACUUM is normally
optimized to look only at chunks of a table that require cleaning. When a table
reaches vacuum_freeze_table_age, we ignore that optimization, and scan the
whole table. While it does so, it's fairly likely to see rows that need
freezing, which need to be re-written. So that is what causes the great
increase in I/O.
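A hedged way to see how close each table is to a full freeze scan, using the
age() of its relfrozenxid:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;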
133. How to confirm the absence of page corruption in Postgres?
VACUUM is also an
efficient way to confirm the absence of page corruptions, so it is worth
scanning the whole database from time-to-time, every block. To do this, you can
run the following script on each of your databases:
SET vacuum_freeze_table_age = 0;
VACUUM;
You can do this
table-by-table as well.
If you've never had a corrupt
block, then you may only need to scan maybe every two-to-three months. If you
start to get corrupt blocks, then you may want to increase the scan rate to
confirm everything is OK. Corrupt blocks are usually hardware-induced, though
they show up as database errors. It's possible but rare that the corruption was
instead from a PostgreSQL bug.
There's no easy way to
fix page corruptions at present. There are ways to investigate and extract data
from corrupt blocks, for example, using the contrib/pageinspect utility.
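A hedged pageinspect example for looking at one block of a table (CREATE
EXTENSION works on 9.1+; on 9.0 you install the contrib module's SQL script
instead; the table name is an example):
CREATE EXTENSION pageinspect;
-- Show the line pointers and tuple header fields of block 0:
SELECT lp, lp_flags, t_xmin, t_xmax
FROM heap_page_items(get_raw_page('pgbench_accounts', 0));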
134. How does Postgres manage transaction IDs internally, and what is transaction wraparound?
PostgreSQL uses internal transaction identifiers that are four
bytes long, so there are only about 2^31 usable transaction ids (about two billion).
When that limit is reached, PostgreSQL wraps around and starts again from the
beginning, allocating new identifiers in a circular manner. The reason for this
is that moving to an eight-byte identifier has various other negative effects
and costs that we would rather not pay, so we keep the four-byte transaction
identifier, which also has costs.
PostgreSQL is designed to continue using
ids even after the system wraps. Properly maintained, everything will keep
working forever and you'll never notice what happens on the inside. To allow
that to happen we need to run regular VACUUMs.
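A hedged check of how much of the transaction id space each database has
consumed since its last full freeze:
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;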
135. How to avoid transaction wraparound
First: have you seen the following message?
WARNING: database "postgres" must
be vacuumed within XXX transactions.
HINT: To avoid a database shutdown, execute
a database-wide VACUUM in that database.
You might also need to commit or roll back
old prepared transactions.
Or even worse, the following message:
ERROR: database is not accepting commands
to avoid wraparound data loss in database "template0"
HINT: Stop the postmaster and use a
standalone backend to vacuum that database.
You might also need to commit or roll back
old prepared transactions.
If not, then you don't need to do anything
apart from normal planned maintenance. Those messages are reported to users,
and they are also written to the server log.
If you've received the WARNING, then follow
both hints. First, let's do the suggested VACUUM on the appropriate database.
Either run the following:
$ vacuumdb postgres
or use the following:
psql -c "VACUUM"
postgres
or use your admin tool to initiate a VACUUM
on the appropriate database
If you received the aforementioned ERROR, and the database is no
longer accepting commands, you're probably wondering what the phrase "use a
standalone backend to vacuum that database" means.
A "standalone backend" means
running the database server from just a single executable process. This is the
equivalent of unix run-level 1, also known as single user mode. We restrict
access to the database to just a single user.
The command to do this is the following, noting that --single
must be the very first option on the command line:
$ postgres --single -D /full/path/to/datadir postgres
which then returns the following command line prompt:
PostgreSQL stand-alone backend
9.0
backend>
and you can then run the VACUUM from there, as follows:
PostgreSQL stand-alone backend
9.0
backend> VACUUM;
backend>
When you're finished, type <CTRL>-D (or whatever
you have set EOF to be for your terminal window), once or twice if you also used
the -j option.
You should also check for old prepared transactions and remove them (see the
next question).
136. What are prepared transactions? How to remove old prepared transactions
Prepared
transactions are persistent across crashes, so you can't just do a fast restart
to get rid of them. They have both an internal transaction identifier and an
external "global identifier". Either of those can be used to locate
locked resources, and decide how to resolve the transactions.
Removing
a prepared transaction is also referred to as "resolving in-doubt
transactions". The transaction is literally stuck between committing and
aborting. The database or transaction manager crashed, leaving the transaction
mid-way through the two-phase commit process.
Prepared
transactions are part of the "two-phase commit" feature, also known
as 2PC. A transaction
commits in two stages rather than one, allowing multiple databases to have
synchronized commits. Its typical use is to combine multiple "resource
managers" using the XA protocol, usually provided by a Transaction Manager (TM), as used by
the Java Transaction API (JTA) and others. If none of that meant anything to you, then you
probably don't have any prepared transactions.
First,
check the setting of max_prepared_transactions. If this is
zero, then you don't have any.
SHOW max_prepared_transactions;
If your setting is
more than zero, then look to see if you have any. As an example, you may find
something like the following:
postgres=# SELECT * FROM pg_prepared_xacts;
-[ RECORD 1 ]------------------------------
transaction | 121083
gid         | prep1
prepared    | 2010-03-28 15:47:57.637868+01
owner       | postgres
database    | postgres
where the gid ("global identifier") will usually have been automatically
generated.
To resolve the transaction, we need to
decide whether we want that change, or not. The best way is to check what
happened externally to PostgreSQL. That should help you decide.
If you wish to commit the changes, then:
COMMIT PREPARED 'prep1';
or if you want to roll back the changes, then:
ROLLBACK PREPARED 'prep1';
The pg_locks view shows which locks are held by prepared transactions. You can
get a full report of what is being locked using the following query:
postgres=# SELECT l.locktype, x.database, l.relation, l.page, l.tuple, l.classid,
           l.objid, l.objsubid, l.mode, x.transaction, x.gid, x.prepared, x.owner
           FROM pg_locks l
           JOIN pg_prepared_xacts x
             ON l.virtualtransaction = '-1/' || x.transaction::text;
The table locks are listed as being held by a virtual transaction.
A simpler query is the following:
postgres=# SELECT DISTINCT x.database, l.relation
           FROM pg_locks l
           JOIN pg_prepared_xacts x
             ON l.virtualtransaction = '-1/' || x.transaction::text
           WHERE l.locktype != 'transactionid';
database | relation
----------+----------
postgres | 16390
postgres | 16401
(2 rows)
This tells you which relations in which databases have been
touched by the remaining prepared transactions. We can't tell the names because
we'd need to connect to those databases to check.
You can then fully
scan each of those tables, looking for changes like the following:
SELECT * FROM table WHERE xmin = 121083;
which will show you all the rows in that
table inserted by transaction 121083, taken from the transaction column of pg_prepared_xacts.
137. Actions for heavy
users of temporary tables
If
you are a heavy user of temporary tables in your applications, then there are
some additional actions you may need to perform.
There are four main things to check, which are as follows:
1. Make sure you run VACUUM on system tables, or enable autovacuum
to do this for you.
2. Monitor running queries to see how many and how large temporary
files are active.
3. Tune memory parameters. Think about increasing the temp_buffers
parameter, though be careful not to overallocate memory by doing so.
4. Separate temp table I/O. In a query-intensive system, you may find that
reads/writes to temporary files exceed reads/writes on permanent data tables and
indexes. In this case, you should create new tablespace(s) on separate disks, and
ensure that the temp_tablespaces parameter is configured to use the additional
tablespace(s), as sketched below.
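A minimal sketch of item 4, assuming an example tablespace name and directory
(the directory must already exist and be owned by the postgres OS user):
CREATE TABLESPACE temp_space LOCATION '/fast_disk/pg_temp';
-- then, in postgresql.conf or per session:
SET temp_tablespaces = 'temp_space';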
In PostgreSQL 9.0, when we create a
temporary table, we insert entries into the catalog tables pg_class and
pg_attribute. These catalog tables and their indexes begin to grow and to bloat.
To control that growth, you can either
VACUUM those tables manually, or set autovacuum = on in postgresql.conf.
If you VACUUM the system catalog tables manually, make sure you
get all of the system tables. You can get the full list of tables to VACUUM
using the following query:
postgres=# SELECT relname, pg_relation_size(oid) FROM pg_class
WHERE relkind in ('i','r') and relnamespace = 11 ORDER BY 2 DESC;
Note -- You cannot run ALTER
TABLE against system
tables, so it is not possible to set specific autovacuum settings for any of
these tables.
138. Explain HOT update and non-HOT update.
HOT
updates take place when the UPDATE statement does not change any of the column
values that are indexed by any index. If you change even one column that is
indexed by just one index then it will be a non-HOT update, and there will be a
performance hit. Non-HOT updates cause indexes to bloat.
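A hedged way to see how many of a table's updates were HOT, using the standard
statistics views:
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;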
139. Identifying and
fixing bloated tables and indexes
Non-primary key
indexes are also prone to some bloat from normal inserts, as is common in most
relational databases. Non-HOT updates cause indexes to bloat. Autovacuum does
not detect bloated indexes, nor does it do anything to rebuild them. The
following view helps identify tables that need vacuuming:
CREATE OR REPLACE VIEW av_needed AS
SELECT *,
       n_dead_tup > av_threshold AS "av_needed",
       CASE WHEN reltuples > 0
            THEN round(100.0 * n_dead_tup / (reltuples))
            ELSE 0
       END AS pct_dead
FROM
 (SELECT N.nspname, C.relname,
         pg_stat_get_tuples_inserted(C.oid) AS n_tup_ins,
         pg_stat_get_tuples_updated(C.oid) AS n_tup_upd,
         pg_stat_get_tuples_deleted(C.oid) AS n_tup_del,
         pg_stat_get_tuples_hot_updated(C.oid)::real /
             pg_stat_get_tuples_updated(C.oid) AS HOT_update_ratio,
         pg_stat_get_live_tuples(C.oid) AS n_live_tup,
         pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
         C.reltuples AS reltuples,
         round(current_setting('autovacuum_vacuum_threshold')::integer
               + current_setting('autovacuum_vacuum_scale_factor')::numeric
               * C.reltuples) AS av_threshold,
         date_trunc('minute', greatest(pg_stat_get_last_vacuum_time(C.oid),
                    pg_stat_get_last_autovacuum_time(C.oid))) AS last_vacuum,
         date_trunc('minute', greatest(pg_stat_get_last_analyze_time(C.oid),
                    pg_stat_get_last_autoanalyze_time(C.oid))) AS last_analyze
  FROM pg_class C
  LEFT JOIN pg_index I ON C.oid = I.indrelid
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE C.relkind IN ('r', 't')
    AND N.nspname NOT IN ('pg_catalog', 'information_schema')
    AND N.nspname !~ '^pg_toast'
 ) AS av
ORDER BY av_needed DESC, n_dead_tup DESC;
-[ RECORD 1 ]----+------------------------
nspname | public
relname | pgbench_accounts
n_tup_ins | 100001
n_tup_upd | 117201
n_tup_del | 1
hot_update_ratio | 0.123454578032611
n_live_tup | 100000
n_dead_tup | 0
reltuples | 100000
av_threshold | 20050
last_vacuum | 2010-04-29 01:33:00+01
last_analyze | 2010-04-28 15:21:00+01
av_needed | f
pct_dead | 0
Which we can then use
to look at individual tables as follows:
postgres=# \x
postgres=# SELECT * FROM av_needed
WHERE nspname = 'public' AND relname = 'pgbench_accounts';
Non-HOT updates cause indexes to bloat. The following query is
useful in investigating index size, and how that changes over time. It runs
fairly quickly, and can be used to monitor whether your indexes are changing in
size over time.
SELECT nspname, relname,
       round(100 * pg_relation_size(indexrelid) /
             pg_relation_size(indrelid)) / 100 AS index_ratio,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       pg_size_pretty(pg_relation_size(indrelid)) AS table_size
FROM pg_index I
LEFT JOIN pg_class C ON (C.oid = I.indexrelid)
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
  AND C.relkind = 'i'
  AND pg_relation_size(indrelid) > 0;
Another route is to use the contrib/pgstattuple module, supplied
with PostgreSQL. This provides very detailed statistics about what's happening in
your tables and indexes, which it derives by scanning the whole table or index
and literally counting everything.
Just use it carefully: if you have time to scan the table, you may as
well have VACUUMed the whole table anyway.
Scan tables using pgstattuple() as follows:
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
and scan indexes using pgstatindex() as follows:
postgres=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0
140. Explain indexes in PostgreSQL.
Indexes are designed for performance, and
in all databases, deleting index entries causes contention and loss of
performance. PostgreSQL does not remove index entries for a row when that row
is deleted, so an index can fill with dead entries. PostgreSQL does attempt to
remove dead entries when a block becomes full, though that doesn't stop small
numbers of dead entries accumulating in many data blocks.
PostgreSQL
supports a command known as CREATE
INDEX CONCURRENTLY, that builds an index without
taking a full table lock. CREATE INDEX CONCURRENTLY allows inserts, updates,
and deletes while the index is being created. It cannot be executed inside
another transaction, and only one per table can be created at any time. Note that
you can't add a primary key index to a table concurrently, in PostgreSQL 9.0 at
least.
When we
create an index using CREATE INDEX CONCURRENTLY, it
is a two-stage process. The first phase builds the index, and then marks it
invalid. Inserts, updates, and deletes now begin maintaining the index, but we
do a further pass over the table to see if we missed anything before we declare
the index valid. User queries don't use the index until it is marked valid.
Once
the index is built and the flag is valid, then if we set the flag to invalid,
the index will still be maintained, just not used by queries. This allows us to
turn the index off quickly, though with the option to turn it back on again if
we realize we actually do need the index after all.
PostgreSQL
also supports the ability to have two indexes, with different names, that have
exactly the same definition.
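A hedged sketch of using those two features together to rebuild a bloated index
online (the index names are examples; a failed concurrent build leaves an invalid
index behind, which the last query finds):
CREATE INDEX CONCURRENTLY pgbench_accounts_bid_idx2
    ON pgbench_accounts (bid);
DROP INDEX pgbench_accounts_bid_idx;            -- the old, bloated index
ALTER INDEX pgbench_accounts_bid_idx2
    RENAME TO pgbench_accounts_bid_idx;
-- Find any index left invalid by a failed concurrent build:
SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;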
PostgreSQL supports commands that will rebuild indexes for you.
The client utility reindexdb allows
you to execute the REINDEX
command in a convenient way from the operating system:
$ reindexdb
This executes the SQL REINDEX command on every table
in the default database. If you want to reindex all databases, then use the
following:
$ reindexdb -a
REINDEX
puts a full table lock (AccessExclusiveLock) on the table while it runs.
141. Planning maintenance
Build a regular cycle of activity around
the following tasks:
· Capacity planning
· Backups, recovery testing, and emergency planning
· Vacuum and index maintenance
· Server log file analysis
· Security and intrusion detection
· Understanding usage patterns
· Long term performance analysis
142. Why doesn't SQL use my indexes? OR Why is my query not using an index?
Indexes are not
used by every query. Indexes are used only if the table is larger than a
minimum size, and the query selects only a small percentage of the rows in the
table. This is because the random disk access caused by an index scan can be
slower than a straight read through the table, or sequential scan.
To determine if
an index should be used, PostgreSQL must have statistics about the table. These
statistics are collected using VACUUM ANALYZE, or simply ANALYZE. Using
statistics, the optimizer knows how many rows are in the table, and can better
determine if indexes should be used. Statistics are also valuable in
determining optimal join order and join methods. Statistics collection should
be performed periodically as the contents of the table change.
Indexes are
normally not used for ORDER BY or to perform joins. A sequential scan followed
by an explicit sort is usually faster than an index scan of a large table.
However, LIMIT combined with ORDER BY often will use an index because only a
small portion of the table is returned.
If you believe
the optimizer is incorrect in choosing a sequential scan, use SET
enable_seqscan TO 'off' and run the query again to see if an index scan is indeed
faster.
When using
wild-card operators such as LIKE or ~, indexes can only be used in certain
circumstances:
- The beginning of the search string must be anchored to the
start of the string, i.e.
- LIKE patterns must not start with % or _.
- ~ (regular expression) patterns must start with ^.
- The search string can not start with a character class, e.g.
[a-e].
- Case-insensitive searches such as ILIKE and ~* do not utilize
indexes. Instead, use expression indexes.
- C locale must be used during initdb because sorting in a non-C
locale often doesn't match the behavior of LIKE. You can create a special
text_pattern_ops index that will work in such cases, but note it is only
helpful for LIKE indexing (see the sketch after this list).
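A hedged sketch of such an index (table and column names are examples):
CREATE INDEX customers_name_pattern_idx
    ON customers (name text_pattern_ops);
-- Usable by left-anchored patterns such as:
SELECT * FROM customers WHERE name LIKE 'Smi%';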
It is also
possible to use full text indexing for word searches.
The SlowQueryQuestions article
contains some more tips and guidance.
143. How do I force a query to use an index?
set enable_seqscan to false
If you do:
set enable_seqscan to false;
you tell PostgreSQL
that it is really very expensive to do sequential scans. It still does a
seqscan (instead of failing) if it is the only way to do the query.
Once you enable seqscans again, it will use a sequential
scan instead of the more costly (in this case) bitmap index scan as follows:
set enable_seqscan to true;
Lower random_page_cost
For a softer nudge towards using
indexes, set random_page_cost to a lower value, maybe even make it equal to seq_page_cost.
This makes PostgreSQL prefer index scans on more occasions, but still does not
produce entirely unreasonable plans, at least for cases where data is mostly
cached in shared buffers or the system's disk cache.
Default values for these parameters
are as follows:
random_page_cost = 4
seq_page_cost = 1
Try setting:
set random_page_cost = 2;
and see if it helps; if not, set it
to 1.
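A hedged illustration of comparing plans before and after nudging the planner
(pgbench_accounts and the predicate are examples):
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 1000;
SET enable_seqscan TO off;
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 1000;
RESET enable_seqscan;
-- or, the softer nudge:
SET random_page_cost = 2;
EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 1000;
RESET random_page_cost;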
144. Is the user connected?
Make sure that you are logged in as a superuser.
SELECT datname FROM pg_stat_activity WHERE usename = 'bob';
If this query returns any rows, then database user bob is connected to the
database.
SELECT datname, usename, client_addr, client_port FROM pg_stat_activity;
The client_addr and client_port help
you look up the exact computer, and even the process on that computer that has
connected to this database.
145. What SQL is running?
Make sure that you are logged in as
a superuser or as the same database user you want to check.
Make sure that the parameter track_activities = on is set.
This can be done either in the
postgresql.conf file or by the superuser using the following SQL statement:
SET track_activities = on;
To see what all connected users are running now, just run the following:
SELECT datname, usename, current_query FROM pg_stat_activity;
To get information for only the active queries, exclude the idle ones
by running the following:
SELECT datname, usename, current_query FROM pg_stat_activity
WHERE current_query != '<IDLE>';
If you want, you can also make the
queries being run show up in process titles, by setting the following:
update_process_title = on
Note that ps or top output is not the best place for watching database queries.
146. What is the difference between pg_cancel_backend(processid) and
pg_terminate_backend(processid)?
When a backend executes the pg_terminate_backend(processid)
function, it sends a SIGTERM signal to the backend given as an argument, after
checking that the process identified by the argument processid actually is a
PostgreSQL backend. The backend receiving this signal stops whatever it is
doing, and terminates in a controlled way. The client using that backend
loses its connection to the database.
The difference between these two is
that pg_cancel_backend() just cancels the current query, whereas
pg_terminate_backend() really kills the backend.
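A hedged usage sketch (the process id is an example; the columns are
procpid/current_query in 9.0-9.1 and pid/query from 9.2 onwards):
-- Find the backend to act on:
SELECT procpid, datname, usename, current_query FROM pg_stat_activity;
-- Cancel only the running query of backend 12345:
SELECT pg_cancel_backend(12345);
-- Terminate the whole backend (the client loses its connection):
SELECT pg_terminate_backend(12345);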