121. PostgreSQL 9.0 Architecture
PostgreSQL Architecture includes sevaral things memory,
process and storage file system
PostgreSQL 9.0 Architecture Overview
PostgreSQL instance consists of set of Process and Memory.
PostgreSQL uses a simple "process per-user" client/server
model. PostgreSQL types of process.
- The 'postmaster', supervisory daemon process, 'postmaster' is
attached to shmmem segment but refrains from accessing to it.
- Utility processes (bgwriter, walwriter, syslogger, archiver,
statscollector and autovacuum lancher) and
- User Backend process (postgres process itself, Server Process)
When a client request for connection to the database,
firstly request is hit to Postmaster daemon process after performing
Authentication and authorization it spawns one new backend server
process(postgres). From that point on, the frontend process and the backend
server communicate without intervention by the postmaster. Hence, the
postmaster is always running, waiting for connection requests, whereas frontend
and backend processes come and go. The libpq library allows a single frontend
to make multiple connections to backend processes.
However, each backend process is a single-threaded process
that can only execute one query at a time; so, the communication over any one
frontend-to-backend connection is single-threaded. Postmaster and postgres
servers run with the user ID of the PostgreSQL "superuser".
One Postgres process exists for every open database session.
Once authenticated with user connection, it directly connects (with who and for
what purpose) with shared memory.
Memory:
Shared Buffers
WAL Buffers
clog Buffers
Other Buffers
Following links will brief more.
Utility Process:
Mandatory process: These processes are not
having an option of Enable/Disable.
- BGWriter
- WAL Writer
Optional Process: These processes are
having an option of Enable/Disable.
- Stats-collector
- Autovacuum launcher
- Archiver
- Syslogger
- WAL Sender
- WAL Receiver
122.
How to change all objects ownership in a particular schema in PostgreSQL
No special code included in a script, I basically picked
the technique suggested and simplified the implementation method via script.
Actually, REASSIGN
OWNED BY command does most of the work smoothly, however, it changes
database-wide objects ownership regardless of any schema. Two eventualities,
where you may not use REASSIGN OWNED BY:
1. If the user by mistake creates all his objects with super-user(postgres), and later intend to change to other user, then REASSIGN OWNED BY will not work and it merely error out as:
1. If the user by mistake creates all his objects with super-user(postgres), and later intend to change to other user, then REASSIGN OWNED BY will not work and it merely error out as:
- postgres=# reassign owned by postgres to user1;
- ERROR: cannot reassign ownership of objects owned by role postgres because they are required by the database system
2. If user wish to change just
only one schema objects ownership.
Either cases of changing objects, from "postgres" user to other user or just changing only one schema objects, we need to loop through each object by collecting object details from pg_catalog's & information_schema and calling ALTER TABLE / FUNCTION / AGGREGATE / TYPE etc.
I liked the technique of tweaking pg_dump output using OS commands(sed/egrep), because it known that by nature the pg_dump writes ALTER .. OWNER TO of every object (TABLES / SEQUENCES / VIEWS / FUNCTIONS / AGGREGATES / TYPES) in its output. Grep'ing those statements from pg_dump stdout by replacing new USER/SCHEMANAME with sed and then passing back those statements to psql client will fix the things even if the object owned by Postgres user. I used same approach in script and allowed user to pass NEW USER NAME and SCHEMA NAME, so to replace it in ALTER...OWNER TO.. statement.
Script usage and output:
Either cases of changing objects, from "postgres" user to other user or just changing only one schema objects, we need to loop through each object by collecting object details from pg_catalog's & information_schema and calling ALTER TABLE / FUNCTION / AGGREGATE / TYPE etc.
I liked the technique of tweaking pg_dump output using OS commands(sed/egrep), because it known that by nature the pg_dump writes ALTER .. OWNER TO of every object (TABLES / SEQUENCES / VIEWS / FUNCTIONS / AGGREGATES / TYPES) in its output. Grep'ing those statements from pg_dump stdout by replacing new USER/SCHEMANAME with sed and then passing back those statements to psql client will fix the things even if the object owned by Postgres user. I used same approach in script and allowed user to pass NEW USER NAME and SCHEMA NAME, so to replace it in ALTER...OWNER TO.. statement.
Script usage and output:
sh change_owner.sh -n new_rolename -S schema_name
-bash-4.1$ sh change_owner.sh -n user1 -S public
Summary:
Tables/Sequences/Views : 16
Functions : 43
Aggregates : 1
Type : 2
You can download the script from here, and there's also README to
help you on the usage.
123.
Duplicate Rows in a primary key Table.
"ERROR: could not create
unique index
DETAIL: Table contains duplicated values."
This error is thrown out by Postgres when it encounters duplicate rows in a primary key table by failing any of these command REINDEX or CREATE UNIQUE INDEX.
Why duplicate rows exists in a table ?
Two thing to my mind.
Firstly, it might be delayed index creation or if you have shared sequences in a database, sharing on two different Primary key Tables might be the cause while restoring the data into table (pg_restore). Secondly, if any huge transaction is taking place on that table and at the backend someone has abruptly stopped the instance, which might also fail the index (primary key) to point to the right row.
How to fix it?
Well, as common practice, when we encounter some duplicate rows in a table (despite of any reason), we first filter the duplicate rows and delete them, and later by doing REINDEX should fix the issue.
Query for finding duplicate rows:
DETAIL: Table contains duplicated values."
This error is thrown out by Postgres when it encounters duplicate rows in a primary key table by failing any of these command REINDEX or CREATE UNIQUE INDEX.
Why duplicate rows exists in a table ?
Two thing to my mind.
Firstly, it might be delayed index creation or if you have shared sequences in a database, sharing on two different Primary key Tables might be the cause while restoring the data into table (pg_restore). Secondly, if any huge transaction is taking place on that table and at the backend someone has abruptly stopped the instance, which might also fail the index (primary key) to point to the right row.
How to fix it?
Well, as common practice, when we encounter some duplicate rows in a table (despite of any reason), we first filter the duplicate rows and delete them, and later by doing REINDEX should fix the issue.
Query for finding duplicate rows:
- select count(*),primary_column from table_name group by primary_column having count(*) > 1;
Even after deleting the duplicate rows REINDEX or CREATE
UNIQUE INDEX fails, it means your index is not cleaned properly. Above query
might not be giving 100% result oriented output what you are expecting, because
the query is going to pick the index which is already corrupted with duplicate
rows. See the explain plan below.
postgres=# explain select count(*),id from duplicate_test group by id having count(*) > 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.00..5042.90 rows=99904 width=4)
Filter: (count(*) > 1)
-> Index Scan using duplicate_test_pkey on duplicate_test (cost=0.00..3044.82 rows=99904 width=4)
(3 rows)
We need to catch CTID of duplicate rows from the main table
and delete with conditional statement as CTID + PRIMARY KEY VALUE.
I have played a bit with pg_catalogs to voilate Primary Key Table to reproduce the scenario with similar error. (Please don't it)
I have played a bit with pg_catalogs to voilate Primary Key Table to reproduce the scenario with similar error. (Please don't it)
postgres=# create unique index idup on duplicate_test(id);
ERROR: could not create unique index "idup"
DETAIL: Key (id)=(10) is duplicated.
My Table Definition & Data:
postgres=# \d duplicate_test
Table "public.duplicate_test"
Column | Type | Modifiers
--------+---------+-----------
id | integer | not null
name | text |
Indexes:
"duplicate_test_pkey" PRIMARY KEY, btree (id)
postgres=# select * from duplicate_test ;
id | name
----+---------
10 | Raghav ---Duplicate
20 | John H
30 | Micheal
10 | Raghav ---Duplicate
(4 rows)
Now, lets fix this....
Step 1. Create a new table from effected table by pulling only two column values CTID and PRIMARY KEY.
Step 1. Create a new table from effected table by pulling only two column values CTID and PRIMARY KEY.
postgres=# CREATE TABLE dupfinder AS SELECT ctid AS tid, id FROM duplicate_test;
SELECT 4
Step 2. Now, lets run the duplicate finder query
with CTID to get the exact duplicates.
postgres=# select * from dupfinder x where exists (select 1 from dupfinder y where x.id = y.id and x.tid != y.tid);
tid | id
-------+----
(0,1) | 10
(0,5) | 10
(2 rows)
Step 3. On above result, now you can delete one row
from main table(effected table) with CTID.
postgres=# delete from duplicate_test where ctid='(0,5)' and id=10;
DELETE 1
Step 4. Now, your REINDEX or CREATE UNIQUE INDEX
will be successful.
postgres=# create unique index idup on duplicate_test(id);
CREATE INDEX
postgres=# select * from duplicate_test ;
id | name
----+---------
10 | Raghav
20 | John H
30 | Micheal
(3 rows)
Step 5. Don't forget to do immediate VACUUM ANALYZE
on the table to update the system catalogs as well CTID movement.
124. PostgreSQL 9.0 Memory & Processes
Every PostgreSQL Instance startup, there will be a set of
utilty process (including mandatory and optional process) and memory. Two
mandatory process (BGWRITER and WAL Writer) and four optional process
(Autovacuum launcher, stats collector, syslogger, and Archiver). You can check
it out with the command 'ps -ef | grep postgres' given below in figure
Overview of the Process and memory.
Above figure shows the processes attached to the PostgreSQL
Shared memory.
BGWriter/Writer Process:
BGWRITER or WRITER process is a mandotary process.
All PostgreSQL server process reads data from disk and
moves them into Shared Buffer Pool. Shared Buffer pool uses ARC algorithm or
LRU(least-recently used) mechanism to select the page it evicts from the pool.
BGWRITER spends much of its time sleeping, but every time it wakes, it searches
through the shared buffer pool looking for modified pages. After each search, the
BGWRITER chooses some number of modified pages, writes them to disk, and evicts
those pages from the shared buffer pool. BGWRITER process can be controlled
with three parameters BGWRITER_DELAY,BGWRITER_LRU_PERCENT and
BGWRITER_LRU_MAXPAGES.
WAL Writer Process:
WAL writer process is a mandatory process.
WAL writer process writes and fsync WAL at convenient
Intervals. WAL buffers hold the changes made to the database in the transaction
logs, in order to guarantee transaction security. WAL buffers are written out
to the disk at every transaction commit, as WAL writer process is responsible
to write on to the disk. WAL_WRITER_DELAY parameter for invoking the WAL Writer
Process, however there are other parameters which also keeps the WAL Writer
busy. Follow below link.
Stats Collector Process:
Stats collecotr process is optional process, default is
ON.
Stats collector process will collect the information about
the server activity. It counts number of access to the tables and indexes in
both disk-block and individual row items. It also tracks the total number of
rows in each table, and information about VACUUM and ANALYZE actions for each
table. Collection of statistics adds some overhead to query execution, whether
to collect or not collect information. Some of the parameter in the
postgresql.conf file will control the collection activity of the stats
collector process. Following link will brief more about the stats collector
process and its related parameters.
Autovacuum Launcher Process:
Autovacuuming is an optional Process, default is ON.
For automating the execution of VACUUM and ANALYZE command,
Autovacuum Launcher is a daemon process consists of multiple processes called
autovacuum workers. Autovacuum launcher is a charge of starting autovacuum
worker processes for all databases. Launcher will distribute the work across
time, attempting to start one worker on each database for every interval, set
by the parameter autovacuum_naptime. One worker will be launched for each
database, set by the parameter autovacuum_max_workers. Each worker process will
check each table within its database and execute VACUUM or ANALYZE as needed.
Following will brief about the AUTOVACUUM LAUNCHER PROCESS parameters.
Syslogger Process / Logger Process:
Logging is an optional process, default is OFF.
As per the figure, it is clearly understood that all the
utility process + User backends + Postmaster Daemon attached to syslogger
process for logging the information about their activities. Every process
information is logged under $PGDATA/pg_log with the file .log.
Note: If the data directory is created with INITDB command,
then their wont be pg_log directory under it. Explicit creation is needed.
Debugging more on the process information will cause
overhead on the Server. Minimul tunning is always recommended, however,
increasing the debug level when required. Link below will brief on logging
parameters.
Archiver Process:
Achiver process is optional process, default is OFF.
Above Figure is made from my observation on the Archiving
process in PostgreSQL. Setting up the database in Archive mode means, to
capture the WAL data of each segment file once it is filled, and save that data
somewhere before the segment file is recycled for reuse.
Diagrammatical explination on Numbering tags.
1. On Database Archivelog mode, once the WAL data is filled
in the WAL Segment, that filled segment named file is created under
$PGDATA/pg_xlog/archive_status by the WAL Writer naming the file as
".ready". File naming will be "segment-filename.ready".
2. Archiver Process triggers on finding the files which are
in ".ready" state created by the WAL Writer process. Archiver process
picks the 'segment-file_number' of .ready file and copies the file from
$PGDATA/pg_xlog location to its concerned Archive destination given in
'archive_command' parameter(postgresql.conf).
3. On successful completion of copy from source to destination,
archiver process renames the "segment-filename.ready" to
"segment-filename.done". This completes the archiving process.
It is understood that, if any files named
"segement-filename.ready" found in $PGDATA/pg_xlog/archive_status are
the pending files still to be copied to Archive destination.
For more information on parameters and Archiving, see the
below link.
125. Database File Layout
All the data
needed for a database cluster is stored within the cluster's data directory,
commonly referred to as PGDATA
(after the name of the environment variable that can be used to define it). A
common location for PGDATA is /var/lib/pgsql/data. Multiple clusters,
managed by different server instances, can exist on the same machine.
The PGDATA directory contains several
subdirectories and control files. In addition to these required items, the
cluster configuration files postgresql.conf,
pg_hba.conf, and pg_ident.conf are traditionally stored in PGDATA (although in PostgreSQL 8.0 and later,
it is possible to keep them elsewhere).
Item
|
Description
|
PG_VERSION
|
A file containing the major version number of PostgreSQL
|
base
|
Subdirectory containing per-database subdirectories
|
global
|
Subdirectory containing cluster-wide tables, such as pg_database
|
pg_clog
|
Subdirectory containing transaction commit status data
|
pg_multixact
|
Subdirectory containing multitransaction status data
(used for shared row locks)
|
pg_notify
|
Subdirectory containing LISTEN/NOTIFY status data
|
pg_serial
|
Subdirectory containing information about committed
serializable transactions
|
pg_snapshots
|
Subdirectory containing exported snapshots
|
pg_stat_tmp
|
Subdirectory containing temporary files for the
statistics subsystem
|
pg_subtrans
|
Subdirectory containing subtransaction status data
|
pg_tblspc
|
Subdirectory containing symbolic links to tablespaces
|
pg_twophase
|
Subdirectory containing state files for prepared
transactions
|
pg_xlog
|
Subdirectory containing WAL (Write Ahead Log) files
|
postmaster.opts
|
A file recording the command-line options the server was
last started with
|
postmaster.pid
|
A lock file recording the current postmaster process ID
(PID), cluster data directory path, postmaster start timestamp, port number,
Unix-domain socket directory path (empty on Windows), first valid
listen_address (IP address or *,
or empty if not listening on TCP), and shared memory segment ID (this file is
not present after server shutdown)
|
For each
database in the cluster there is a subdirectory within PGDATA/base, named after the database's
OID in pg_database. This
subdirectory is the default location for the database's files; in particular,
its system catalogs are stored there.
Each table and
index is stored in a separate file. For ordinary relations, these files are
named after the table or index's filenode number, which can be found in pg_class.relfilenode.
But for temporary relations, the file name is of the form tBBB_FFF, where BBB is the backend ID of the backend
which created the file, and FFF
is the filenode number. In either case, in addition to the main file (a/k/a
main fork), each table and index has a free space map, which stores
information about free space available in the relation. The free space map is
stored in a file named with the filenode number plus the suffix _fsm. Tables also have a visibility map,
stored in a fork with the suffix _vm,
to track which pages are known to have no dead tuples. The visibility map is
described further. Unlogged tables and indexes have a third fork, known as the
initialization fork, which is stored in a fork with the suffix _init.
Caution
|
Note that
while a table's filenode often matches its OID, this is not necessarily the case; some operations, like
TRUNCATE, REINDEX, CLUSTER and some forms of ALTER TABLE, can change the filenode while preserving the
OID. Avoid assuming that filenode and table OID are the same. Also, for
certain system catalogs including pg_class
itself, pg_class.relfilenode contains zero. The actual
filenode number of these catalogs is stored in a lower-level data structure,
and can be obtained using the
pg_relation_filenode()
function. |
When a table or
index exceeds 1 GB, it is divided into gigabyte-sized segments. The
first segment's file name is the same as the filenode; subsequent segments are
named filenode.1, filenode.2, etc. This arrangement avoids problems on
platforms that have file size limitations. (Actually, 1 GB is just the default
segment size. The segment size can be adjusted using the configuration option --with-segsize when building PostgreSQL.) In principle,
free space map and visibility map forks could require multiple segments as
well, though this is unlikely to happen in practice.
A table that has
columns with potentially large entries will have an associated TOAST
table, which is used for out-of-line storage of field values that are too large
to keep in the table rows proper. pg_class.reltoastrelid links from a table to its TOAST
table, if any.
Tablespaces make
the scenario more complicated. Each user-defined tablespace has a symbolic link
inside the PGDATA/pg_tblspc
directory, which points to the physical tablespace directory (i.e., the
location specified in the tablespace's CREATE
TABLESPACE command). This symbolic link is named after the
tablespace's OID. Inside the physical tablespace directory there is a
subdirectory with a name that depends on the PostgreSQL
server version, such as PG_9.0_201008051.
(The reason for using this subdirectory is so that successive versions of the
database can use the same CREATE TABLESPACE
location value without conflicts.) Within the version-specific subdirectory,
there is a subdirectory for each database that has elements in the tablespace,
named after the database's OID. Tables and indexes are stored within that
directory, using the filenode naming scheme. The pg_default tablespace is not accessed through pg_tblspc, but corresponds to PGDATA/base. Similarly, the pg_global tablespace is not accessed
through pg_tblspc, but
corresponds to PGDATA/global.
The
pg_relation_filepath()
function shows
the entire path (relative to PGDATA)
of any relation. It is often useful as a substitute for remembering many of the
above rules. But keep in mind that this function just gives the name of the
first segment of the main fork of the relation — you may need to append a
segment number and/or _fsm or _vm to find all the files associated with
the relation.
Temporary files
(for operations such as sorting more data than can fit in memory) are created
within PGDATA/base/pgsql_tmp, or
within a pgsql_tmp subdirectory
of a tablespace directory if a tablespace other than pg_default is specified for them. The name
of a temporary file has the form pgsql_tmpPPP.NNN,
where PPP is the PID of
the owning backend and NNN
distinguishes different temporary files of that backend.
No comments:
Post a Comment