91. Why
is "SELECT count(*) FROM bigtable;" slow?
In 9.2 and above
it generally isn't thanks to Index-only scans.
For more
information on older versions, see Slow
Counting.
92. Why
is my query much slower when run as a prepared query?
In PostgreSQL
9.2 and above this issue is rare because PostgreSQL can decide to use a generic
or value-optimised plan on a per-execution basis. The planner might still make
the wrong choice, so the following remains somewhat relevant:
When PostgreSQL
has the full query with all parameters known by planning time, it can use
statistics in the table to find out if the values used in the query are very
common or very uncommon in a column. This lets it change the way it fetches the
data to be more efficient, as it knows to expect lots or very few results from
a certain part of the query. For example, it might choose an sequential scan
instead of doing an index scan if you search for 'active=y' and it knows that
99% of the records in the table have 'active=y', because in this case a
sequential scan will be much faster.
In a prepared
query, PostgreSQL doesn't have the value of all parameters when it's creating
the plan. It has to try to pick a "safe" plan that should work fairly
well no matter what value you supply as the parameter when you execute the
prepared query. Unfortunately, this plan might not be very appropriate if the
value you supply is vastly more common, or vastly less common, than is average
for some randomly selected values in the table.
If you suspect
this issue is affecting you, start by using the EXPLAIN
command to compare the slow and fast queries. Look at the output of
EXPLAIN SELECT query...
and compare it
to the result of PREPARE query... ;
EXPLAIN EXECUTE query...
to see if the plans are notably
different. EXPLAIN ANALYZE
may give you more information, such as row count estimates and counts.
Usually people
having this problem are trying to use prepared queries as a security measure to
prevent SQL injection, rather than as a performance tuning option for
expensive-to-plan queries frequently executed with a variety of different
parameters. Those people should consider using client-side prepared statements
if their client interface (eg PgJDBC) supports it. The PostgreSQL protocol
supports parameterised queries without server-side persistent prepared
statements and most client drivers support using that via their client side
prepared statement interfaces.
At present,
PostgreSQL does not offer a way to request re-planning of a prepared statement
using a particular set of parameter values; however, 9.2 and above may do so
automatically where statistics indicate that it is desirable.
93. Why is my query much slower when run
in a function than standalone?
Queries in
PL/PgSQL functions are prepared and cached, so they execute in much the same
way as if you'd
PREPARE
d Then
EXECUTE
d the query yourself.
If you're having
really severe issues with this that improving the table statistics or adjusting
your query don't help with, you can work around it by forcing PL/PgSQL to
re-prepare your query at every execution. To do this, use the
EXECUTE ... USING
statement in PL/PgSQL
to supply your query as a textual string. Alternately, the quote_literal
or quote_nullable functions may be used to escape parameters
substituted into query text.
94. Why
do my strings sort incorrectly?
First, make sure
you are using the locale you want to be using. Use
SHOW lc_collate
to show the
database-wide locale in effect. If you are using per-column collations, check
those. If everything is how you want it, then read on.
PostgreSQL uses
the C library's locale facilities for sorting strings. So if the sort order of
the strings is not what you expect, the issue is likely in the C library. You
can verify the C library's idea of sorting using the
sort
utility on a text file, e.g., LC_COLLATE=xx_YY.utf8 sort testfile.txt
If this results
in the same order that PostgreSQL gives you, then the problem is outside of
PostgreSQL.
PostgreSQL
deviates from the libc behavior in so far as it breaks ties by sorting strings
in byte order. This should rarely make a difference in practice, and is usually
not the source of the problem when users complain about the sort order, but it
could affect cases where, for example, combining and precombined Unicode
characters are mixed.
If the problem
is in the C library, you will have to take it up with your operating system
maintainers. Note, however, that while actual bugs in locale definitions of C
libraries have been known to exist, it is more likely that the C library is
correct, where "correct" means it follows some recognized international
or national standard. Possibly, you are expecting one of multiple equally valid
interpretations of a language's sorting rules.
Common complaint
patterns include:
- Spaces and special characters: The sorting algorithm normally
works in multiple passes. First, all the letters are compared, ignoring
spaces and punctuation. Then, spaces and punctuation are compared to break
ties. (This is a simplification of what actually happens.) It's not
possible to change this without changing the locale definitions themselves
(and even then it's difficult). You might want to restructure your data
slightly to avoid this problem. For example, if you are sorting a name
field, you could split the field into first and last name fields, avoiding
the space in between.
- Upper/lower case: Locales other than the C locale generally
sort upper and lower case letters together. So the order will be something
like a A b B c C ... instead of the A B C ... a b c ... that a sort based
on ASCII byte values will give. That is correct.
- German locale: sort order of รค as a or ae. Both of these are
valid (see http://de.wikipedia.org/wiki/Alphabetische_Sortierung),
but most C libraries only provide the first one. Fixing this would require
creating a custom locale. This is possible, but will take some work.
- It is not in ASCII/byte order. No, it's not, it's not supposed
to be. ASCII is an encoding, not a sort order. If you want this, you can
use the C locale, but then you use the ability to non-ASCII characters.
That said, if
you are on Mac OS X or a BSD-family operating system, and you are using UTF-8,
then give up. The locale definitions on those operating systems are broken.
95. What is Infinite Cache?
Infinite Cache
allows you to increase the amount of data maintained as in-memory cache by
distributing the cache across existing inexpensive commodity hardware farms.
Compression of cache data allows storing entire databases in a durable fashion
for lightning fast performance. By storing more or all data in a RAM based
cache there are many fewer disk operations which are much slower. Infinite
Cache is also completely transparent to applications and so requires no special
caching code by application developers to safeguard the durability of any write
operations.
96. How does Infinite Cache
compare to caching everything in the application server?
While
traditionally, the architecture of caching everything at the application tier
has been effective, maintaining the cache has been difficult to code and
burdensome. Additional development is necessary in the application to properly
synchronize the cache with the data in the database and it limits the overall
data flexibility by restricting how the data in the backend database is
changed.
97. I have a transactional system;
will Infinite Cache still help?
Yes. Infinite
Cache is most effective in a read-mostly environment, but the performance
benefits are still tangible as transactions are mixed into the workload. For
example, even in more extreme transactional cases such as a 50-50 read vs.
write mix, Infinite Cache still provides a 2X performance gain.
98. What is pgpool?
pgpool is a
connection pool server for PostgreSQL based databases. pgpool runs between
Postgres clients (front ends) and servers (back ends) and caches a collection
of established connections to the database so clients can connect quickly
improving performance. A Postgres client connects to pgpool as if it were a
standard Postgres Plus Advanced Server.
99. What is Migration Toolkit?
Migration
Toolkit comes with Advanced Server and provides migration from Oracle, Sybase,
and SQL Server databases to Postgres. It migrates Oracle schema, data,
triggers, stored procedures, packages, synonyms, functions, etc. For SQL Server
and Sybase it migrates schema and data.
100. When
I start the postmaster, I get a Bad System Call or core dumped message. Why?
It
could be a variety of problems, but first check to see that you have System V
extensions installed in your kernel. POSTGRESQL requires kernel support for
shared memory and semaphores.
101. When I try to start the postmaster,
I get IpcMemoryCreate errors. Why?
You
either do not have shared memory configured properly in your kernel or you need
to enlarge the shared memory available in the kernel. The exact amount you need
depends on your architecture and how many buffers and backend processes you
configure for the postmaster. For most systems, with default numbers of buffers
and processes, you need a minimum of ~1MB.
102. When I try to start the postmaster, I get IpcSemaphoreCreate errors.Why?
If
the error message is IpcSemaphoreCreate: semget failed (No space left on
device) then your kernel is not configured with enough semaphores. Postgres
needs one semaphore per potential backend process. A temporary solution is to
start the postmaster with a smaller limit on the number of backend processes.
Use -N with a parameter less than the default of 32. A more permanent solution
is to increase your kernel's SEMMNS and SEMMNI parameters.
If
the error message is something else, you might not have semaphore support configured
in
your
kernel at all.
103. How
do I prevent other hosts from accessing my POSTGRESQL database?
By
default, POSTGRESQL only allows connections from the local machine using Unix
domain
sockets.
Other machines will not be able to connect unless you add the -i flag to the
postmaster,
and
enable host-based authentication by modifying the file $PGDATA/pg_hba.conf
accordingly.
This
will allow TCP/IP connections.
104. Why can't I connect to my database from
another machine?
The
default configuration allows only Unix domain socket connections from the local
machine. To enable TCP/IP connections, make sure the postmaster has been
started with the -i option, and add an appropriate host entry to the filele
pgsql/data/pg_hba.conf.
105. Why can't I access the database as the root user?
You
should not create database users with user id 0 (root). They will be unable to
access the
database.
This is a security precaution because of the ability of users to dynamically
link object
modules
into the database engine.
106. All my servers crash under
concurrent table access. Why?
This
problem can be caused by a kernel that is not configured to support semaphores
107. What are the pg_sorttempNNN.NN files in my
database directory?
They
are temporary files generated by the query executor. For example, if a sort
needs to be done
to
satisfy an ORDER BY, and the sort requires more space than the backend's -S
parameter allows, then temporary files are created to hold the extra data.
The
temporary files should be deleted automatically, but might not if a backend
crashes during
a
sort. If you have no backends running at the time, it is safe to delete the pg_tempNNN.NN
files.
108. Why is the system confused about commas,
decimal points, and date
formats?
Check
your locale configuration. POSTGRESQL uses the locale setting of the user that
ran the
postmaster
process. There are postgres and psql SET commands to control the date format.
Set
those
accordingly for your operating environment.
109.
Why do I get the error
.FATAL: palloc failure: memory exhausted?.
It
is possible you have run out of virtual memory on your system, or your kernel
has a low limit
for
certain resources. Try this before starting the postmaster:
ulimit
-d 65536
limit
datasize 64m
Depending
on your shell, only one of these may succeed, but it will set your process data
segment
limit
much higher and perhaps allow the query to complete. This command applies to
the current
process,
and all subprocesses created after the command is run. If you are having a
problem with
the SQL client because the backend is returning
too much data, try it before starting the client.
110. Explain about concurreny with the
help of MVCC?
Multi
version concurrency control is used to manage concurrency. This feature is very
useful because changes made in the database will not be visible to other users until
the transaction is completed. This removes the need for read locks. ACID
principles are given a further boost by this feature and can be implemented in
general.
111. Explain
about string constants?
String constant contains a sequence
of characters bound by single quotes. This feature is used during insertion of
a character or passing character to database objects.
PostgreSQL allows the usage of
single quotes but embedded by a C style backslash. This feature is important in
parsing data.
112. Explain
about tokens?
Tokens are also known to contain
several special character symbols. It can be considered as keyword, constant,
identifier and quoted identifier. Keywords include pre-defined SQL meanings and
SQL commands. Variable names such as tables, columns, etc are represented by
identifiers.
113. Explain
about pgadmin?
Pgadmin forms a graphical front end
administration tool. This feature is available under free software released
under Artistic License. Pgadmin iii is the new database administration tool
released under artistic license.
114. Explain
about database administration tools?
There are various data administration tools they are
1) Psql
2) Pgadmin
3) Phppgadmin
Most of these tools are front end administration tools and web based interfaces. Out of these phppgadmin is the most popular one.
There are various data administration tools they are
1) Psql
2) Pgadmin
3) Phppgadmin
Most of these tools are front end administration tools and web based interfaces. Out of these phppgadmin is the most popular one.
115. Explain
about the command enable debug?
This command is used for enabling compilation of all libraries and applications. This process generally slows down the system and it also increases the binary file size. Debugging symbols are present which can assist developers in noticing bugs and problems associated with their script.
This command is used for enabling compilation of all libraries and applications. This process generally slows down the system and it also increases the binary file size. Debugging symbols are present which can assist developers in noticing bugs and problems associated with their script.
116. State
some of the advanced features of PostgreSQL?
These are the following features which are present in PostgreSQL they are
1) Object relational database
2) Extensibility and support for SQL
3) Database validation and flexible API
4) Procedural languages and MVCC
5) Client server and WAL.
These are the following features which are present in PostgreSQL they are
1) Object relational database
2) Extensibility and support for SQL
3) Database validation and flexible API
4) Procedural languages and MVCC
5) Client server and WAL.
117. Explain about Write Ahead logging?
This feature increases the reliability of the database by logging changes before any changes or updations to the data base. This provides log of database incase of a database crash. This helps to start the work from the point it was discontinued.
This feature increases the reliability of the database by logging changes before any changes or updations to the data base. This provides log of database incase of a database crash. This helps to start the work from the point it was discontinued.
118. Explain Multi version concurrency control (MVCC) ?
PostgreSQL implements MVCC (Multi-Version Concurrency Control), that allows users to read data at the same time as writers make changes. MVCC is used to avoid unnecessary locking of the database. This removes the time lag for the user to log into his database. This feature or time lag occurs when some one else is on the content. All the transactions are kept as a record.
PostgreSQL implements MVCC (Multi-Version Concurrency Control), that allows users to read data at the same time as writers make changes. MVCC is used to avoid unnecessary locking of the database. This removes the time lag for the user to log into his database. This feature or time lag occurs when some one else is on the content. All the transactions are kept as a record.
This is an important feature for concurrency in database applications, as it
can allow the following:
Better performance because of fewer
locks
Greatly reduced deadlocking
Simplified application design and
management
MVCC is a
core part of PostgreSQL and cannot be turned off, nor would you really want it
to be. The internals of MVCC have some
implications for the DBA that need to be understood. The price for these
benefits is that SQL UPDATE command can cause tables and indexes to grow in
size because they leave behind dead row versions. DELETEs and aborted INSERTs
take up space that must be reclaimed by garbage collection. VACUUM is the
mechanism by which we reclaim space, though there is also another internals
feature named HOT, which does much of this work automatically for us.
119. How can
I limit what resources a query or user uses (Prioritizing users, queries, or
databases) ?
PostgreSQL has
no facilities to limit what resources a particular user, query, or database
consumes, or correspondingly to set priorities such that one
user/query/database gets more resources than others. It's necessary to use
operating system facilities to achieve what limited prioritization is possible.
There are three
main resources that PostgreSQL users, queries, and databases will contend for:
- Memory
- CPU
- Disk I/O
Of these, disk
I/O is commonly a bottleneck for database applications, but that's not always
the case. Some schema designs and queries are particularly CPU heavy. Others
really benefit from having lots of memory to work with, typically for sorting.
Before
struggling too much with prioritizing your queries/users/databases, it's
worthwhile to optimize your queries and tune your database. You may find that
you can get perfectly acceptable performance without playing with priorities or
taking extreme measures, using techniques such as:
- Improving your queries
- Tune autovacuum to reduce bloat
- Generally polishing your cluster's
performance
- Avoiding use of VACUUM
FULL. That can lead to bloated indexes that eat lots of memory and
take forever to scan, wasting disk I/O bandwidth. See the wiki page on VACUUM
FULL for more information.
People often
complain of pegged (100%) CPU and assume that's the cause of database
slowdowns. That's not necessarily the case - a system may show an apparent 100%
CPU use, but in fact be mainly limited by I/O bandwidth.
Prioritizing CPU
For adjusting
the CPU priority of PostgreSQL processes, you can use "renice" (on
UNIX systems), but it's a bit clumsy to do since you need to "renice"
the backend of interest, not the client program connected to that backend. You
can get the backend process id using the SQL query "SELECT pg_backend_pid()"
or by looking at the pg_stat_activity view.
One significant
limitation of "renice", or any approach based on the setpriority()
call, is that on most UNIX-like platforms one must be root to lower the
numerical priority value (i.e. schedule the process to run more urgently) of a
process.
Increasing the
priority of important backends, via a root user's call to "renice",
instead of lowering the priority of unimportant ones, may be more effective.
Prioritizing I/O
I/O is harder.
Some operating systems offer I/O priorities for processes, like Linux's ionice,
and you'd think you could use these in a similar way to how you use 'nice'.
Unfortunately, that won't work particularly well, because a lot of the work
PostgreSQL does - especially disk writes - are done via a separate background
writer process working from memory shared by all backends. Similarly, the
write-ahead logs are managed by their own process via shared memory. Because of
those two, it's very hard to effectively give one user priority over another
for writes. ionice should be moderately effective for reads, though.
As with
"nice", effective control on a per-connection level will require the
addition of appropriate helper functions, and user co-operation is required to
achieve per-user priorities.
Better
separation of I/O workloads will require cluster
separation, which has its own costs and is only effective on the
per-database level.
Prioritizing memory
PostgreSQL does
have some tunable parameters for memory use
that are per-client, particularly work_mem
and maintenance_work_mem.
These may be set within a given connection to allow that backend to use more
than the usual amount of memory for things like sorts and index creation. You
can set these to conservative, low values in
postgresql.conf
then use the SET
command to
assign higher values to them for a particular backend, eg SET work_mem = '100MB';
.
You can set
different values for
work_mem
and maintenance_work_mem
using per-user GUC variables. For example: ALTER USER myuser SET work_mem = '50MB';
You cannot
affect the shared memory allocation done with settings like shared_buffers this
way, that value is fixed at database startup time and can't be changed without
restarting it.
There's no easy
way in most operating systems to prioritize memory allocations, so that for
example the OS would prefer to swap one backend's memory out instead of
another's.
120. How
can admin tasks login to the database without a password?
Most access to
the database, including the psql program, goes through the libpq library. This
library includes a feature where if you specify a file called .pgpass
(or the file referenced by PGPASSFILE) you can put the password needed to
connect as a user in there. This allows automating routine administration tasks
through mechanisms like cron.
The format of
the .pgpass file is the following:
hostname:port:database:username:password
the character
'*' can match any value in any of the fields (except password)
n.b.: if the
environment variable PGPASSWORD is set, then the ~/.pgpass file is not
read
Example
PGPASSFILE value for a path with spaces on Windows 7 64-bit:
set
PGPASSFILE=C:\Program Files\someapp\pgpass.conf
Note that the environment variable value must
not use " (double quotes).