Thursday, 7 June 2018

Interview Q and A for PostgreSQL Part - 7

147. Moving objects between schemas
To move one table from its current schema to a new schema, use the following:
ALTER TABLE cust SET SCHEMA anotherschema;
If you want to move all objects, you might consider just renaming the schema itself using the following query:
ALTER SCHEMA existingschema RENAME TO anotherschema;
This only works if another schema with that name does not already exist.
Otherwise, you'll need to run ALTER TABLE for each table you would like to move.
Views, sequences, functions, aggregates, and domains can also be moved with their respective ALTER ... SET SCHEMA commands, as shown below.
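For example, assuming hypothetical object names:
ALTER VIEW cust_summary SET SCHEMA anotherschema;
ALTER SEQUENCE cust_id_seq SET SCHEMA anotherschema;
ALTER FUNCTION get_cust(integer) SET SCHEMA anotherschema;
ALTER DOMAIN cust_code SET SCHEMA anotherschema;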
When you move a table to a new schema, all of the indexes, triggers, and rules defined on that table will also be moved to the new schema. If you've used a SERIAL data type, and an implicit sequence has been created, then that also moves to the new schema. Schemas are purely an administrative concept, and do not affect the location of the table's data files. Tablespaces don't work this way, as we will see later.
Databases, users/roles, languages, and conversions don't exist in a schema. Schemas exist in a particular database. Schemas don't exist within schemas; they are not arranged in a tree or hierarchy.
Text search objects exist in a specific schema, though there are no commands to move them to a new schema. Similarly, operators, operator classes, and operator families exist in a schema, though there are no commands to move them to new schemas. Casts don't exist in a schema, though the data types and functions they reference do.

148. Explain Tablespace in postgres?
Tablespaces allow us to store PostgreSQL data across different devices. A tablespace is just a directory where we store PostgreSQL data files. We use symbolic links from the data directory to the tablespace. A tablespace can contain objects from multiple databases, so it's possible to be in a position where there are no objects visible in the current database.
We might want to do that for performance or administrative ease, or because the database has run out of disk space.
Creating a tablespace is simple, as follows:
CREATE TABLESPACE new_tablespace LOCATION '/usr/local/pgsql/new_tablespace';
And, the command to remove the tablespace is also simple, which is:
DROP TABLESPACE new_tablespace;
A tablespace can only be dropped when it is empty. Tablespaces can contain both permanent and temporary objects. Permanent data objects are tables, indexes, and TOAST objects.
TOAST objects are created implicitly, and always live in the same tablespace as their main table; you also cannot manipulate their privileges or ownership.
Indexes can exist in separate tablespaces as a performance option, though that requires explicit specification in the CREATE INDEX statement, as shown below. The default is to create indexes in the same tablespace as the table to which they belong.
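For example (the table, column, and index names here are illustrative):
CREATE INDEX cust_name_idx ON cust (name) TABLESPACE new_tablespace;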
Temporary objects may also exist in a tablespace. These are either temporary tables that users have explicitly created, or data files created implicitly when large queries overflow their work_mem setting. These files are created according to the setting of the temp_tablespaces parameter. Users can change their setting of temp_tablespaces away from the default value specified in postgresql.conf, as shown below.
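For example, a session can redirect its own temporary objects like this, assuming the tablespace already exists:
SET temp_tablespaces = 'new_tablespace';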
You can give each tablespace a specific owner, using the following query:
ALTER TABLESPACE new_tablespace OWNER TO eliza;
You may also wish to set default_tablespace for a user, so that tables are automatically created there, by issuing the following query:
ALTER USER eliza SET default_tablespace = 'new_tablespace';
As each tablespace has different I/O characteristics, we may wish to alter the planner cost parameters for each tablespace. These can be set with the following command:
ALTER TABLESPACE new_tablespace SET (seq_page_cost = 0.05, random_page_cost = 0.1);

Moving objects between tablespaces
If you alter the tablespace of a table, its TOAST objects will move too:
ALTER TABLE mytable SET TABLESPACE new_tablespace;

ALTER INDEX mytable_val_idx SET TABLESPACE new_tablespace;

Moving tablespaces means bulk-copying data. Copying happens sequentially, block by block, and that performs well, but there's no way to avoid the fact that the bigger the table, the longer it will take. The table is fully locked (AccessExclusiveLock) while the copy takes place, so this can cause an effective outage for your application.
If you want new objects to be created in the new tablespace by default, set default_tablespace for your session or for a whole database:
SET default_tablespace = 'new_tablespace';
ALTER DATABASE mydb SET default_tablespace = 'new_tablespace';
Note: take care that you do not run the next command by mistake, though:
ALTER DATABASE mydb SET TABLESPACE new_tablespace;
as this literally moves all objects that do not have an explicitly defined tablespace into new_tablespace. For a large database, this will take a very long time, and your database will be completely locked while it runs.

149. How do I make sure that the user X cannot access the table Y?
The current user must either be a superuser, the owner of the table, or a user holding the GRANT OPTION for the table. Also, you can't revoke rights from a user who is a superuser.
To revoke all rights on table mysecrettable from user userwhoshouldnotseeit, one must run the following SQL command:
REVOKE ALL ON mysecrettable FROM userwhoshouldnotseeit;
However, if rights on the table have also been granted to all users through the role PUBLIC, the following must also be run:
REVOKE ALL ON mysecrettable FROM PUBLIC;
To make sure that a user can no longer access a table, the rights to that table must be revoked from both PUBLIC and that specific user.

150. What default rights does the PUBLIC role have on newly created tables?
By default, the special role PUBLIC is granted no privileges (no SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, or TRIGGER) on newly created tables; until rights are granted explicitly, only the table's owner can use it. PUBLIC does receive default privileges on some other object types, such as EXECUTE on functions and CONNECT and TEMPORARY on databases. Revoking from PUBLIC, as in the previous answer, is still a worthwhile precaution in case such rights were granted explicitly; you can verify the actual privileges as shown below.
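To check the privileges actually granted on a table, use the \dp meta-command in psql; an empty "Access privileges" column means only the owner's implicit rights apply:
postgres=# \dp mysecrettable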
151. Granting user access to a table
Access to the schema is also needed.
In order to access any table, the user first needs access to the schema containing the table:
GRANT USAGE ON SCHEMA someschema TO someuser;
Granting access to a table through a group role
CREATE GROUP webreaders;
GRANT SELECT ON pages TO webreaders;
GRANT INSERT ON viewlog TO webreaders;
GRANT webreaders TO tim, bob;
Now, both tim and bob have the SELECT privilege on table pages, and INSERT on table viewlog. You can also add privileges to the group role after assigning it to users. So after:
GRANT INSERT, UPDATE, DELETE ON comments TO webreaders;
Both bob and tim have all these privileges on table comments.
Granting access to all objects in a schema
Version 9.0 added a capability to GRANT or REVOKE privileges on all objects of a certain kind in a specific schema:
GRANT SELECT ON ALL TABLES IN SCHEMA staging TO bob;
You still need to grant the privileges on the schema itself in a separate GRANT statement, as shown below.
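Putting the two together, a minimal sketch:
GRANT USAGE ON SCHEMA staging TO bob;
GRANT SELECT ON ALL TABLES IN SCHEMA staging TO bob;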

152. Temporarily preventing a user from connecting
To modify other users, you must either be a superuser or have the CREATEROLE privilege (in the latter case only non-superuser roles can be altered).
To temporarily prevent the user from logging in, run the following:
pguser=# alter user bob nologin;
Preventing the user from connecting can also be achieved by setting the connection limit for that user to 0:
pguser=# alter user bob connection limit 0;
To let the user connect again, run the following:
pguser=# alter user bob login;
This sets a flag in the system catalog telling PostgreSQL not to let the user log in. It does not kick out already-connected users.

153. Limiting number of concurrent connections by a user
Preventing the user from connecting can be achieved by setting the connection limit for that user to 0:
pguser=# alter user bob connection limit 0;
To allow bob 10 concurrent connections, run the following:
pguser=# alter user bob connection limit 10;
To allow an unlimited number of connections by this user, run the following:
pguser=# alter user bob connection limit -1;

postgres=# ALTER ROLE fred CONNECTION LIMIT 1;

Note that even if you set connection limit to zero for superusers, they will still be able to connect.

The connection limit is applied at session connection time. Raising the limit never affects already-connected users. Lowering the limit doesn't affect them either, until they disconnect and try to reconnect.

To check a user's current connection limit, run the following:
postgres=> SELECT rolconnlimit FROM pg_roles WHERE rolname = 'fred';

To see how many connections a user currently has open, run the following:
postgres=> SELECT count(*) FROM pg_stat_activity WHERE usename = 'fred';

Users can't raise or lower their own connection limit, in case you were worried they might be able to override this somehow.

154. Forcing NOLOGIN users to disconnect
In order to make sure that all users whose login privilege has been revoked are disconnected right away, run the following SQL statement as a superuser:
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity a
JOIN pg_roles r ON a.usename = r.rolname AND NOT rolcanlogin;
On older versions of PostgreSQL, where the pg_terminate_backend() function does not exist, you can get the same effect from the shell by running the following as user postgres on the database server:
postgres@hvost:~$ psql -t -c "\
select 'kill ' || procpid from pg_stat_activity a \
join pg_roles r on a.usename = r.rolname and not rolcanlogin;"\
| bash
This incantation constructs proper kill commands from a query, and then feeds them to the shell for execution.

155. Checking that all users have a secure password
PostgreSQL has no built-in facilities to make sure that you are using strong passwords.
The best you can do is to make sure that all users' passwords are encrypted, and that your pg_hba.conf file does not allow logins with a plain password. That is, always use md5 as the login method for users.
To see which users have unencrypted passwords, use the following query:
test2=# select usename,passwd from pg_shadow where passwd not like 'md5%' or length(passwd) <> 35;
usename | passwd
----------+--------------
tim | weakpassword
asterisk | md5chicken
To see users with encrypted passwords, use the following:
test2=# select usename,passwd from pg_shadow where passwd like 'md5%' and length(passwd) = 35;
usename | passwd
----------+-------------------------------------
bob2 | md518cf038878cd04fa207e7f5602013a36

156. How to audit DDL changes?
One can collect DDL (Data Definition Language) statements from the database logs in order to audit changes to the database structure. Edit your postgresql.conf file, and set the following:
log_statement = 'ddl'
Setting it to 'mod' or 'all' also works for this. Don't forget to reload the configuration:
/etc/init.d/postgresql reload
Was the change committed?
It is possible to have some statements recorded in the log file, but not visible in the database structure. Most DDL commands in PostgreSQL can be rolled back, so what is in the log is just a list of commands executed by PostgreSQL, not what was actually committed. The log file is not transactional, and also keeps commands that were rolled back.
Who made the change?
To be able to know the database user who made the DDL changes, you have to make sure that this info is logged as well. In order to do so, you may have to change the log_line_prefix parameter to include the %u format string.
A recommended minimal log_line_prefix format string for auditing DDL is '%t %u %d', which tells PostgreSQL to log the timestamp, database user, and database name at the start of every log line.
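A minimal postgresql.conf sketch for DDL auditing would therefore be:
log_statement = 'ddl'
log_line_prefix = '%t %u %d '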

157. Auditing data changes
1. Collecting data changes from the server log:
Set log_statement = 'mod' or 'all' in postgresql.conf, then collect all INSERT, UPDATE, DELETE, and TRUNCATE commands from the log.
2. Collecting changes using triggers; a minimal sketch follows.
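As an illustrative sketch only (the audit_log table, the audit_trigger() function, and the cust table are assumptions, not a standard API), a trigger-based audit could look like this:

CREATE TABLE audit_log (
    changed_at timestamptz DEFAULT now(),
    changed_by text DEFAULT current_user,
    table_name text,
    operation  text,
    row_data   text
);

CREATE OR REPLACE FUNCTION audit_trigger() RETURNS trigger AS $$
BEGIN
    -- record the old row for deletes, and the new row otherwise
    IF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, OLD::text);
        RETURN OLD;
    ELSE
        INSERT INTO audit_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, NEW::text);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER cust_audit_after_trig
AFTER INSERT OR UPDATE OR DELETE ON cust
FOR EACH ROW EXECUTE PROCEDURE audit_trigger();

Unlike log-based auditing, rows written by the trigger are transactional: if the change is rolled back, so is its audit record.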

158. Choosing good names for database objects
The standard names for indexes in PostgreSQL are:
{tablename}_{columnname(s)}_{suffix}
where the suffix is one of the following:
- pkey for a primary key constraint
- key for a unique constraint
- excl for an exclusion constraint
- idx for any other kind of index
The standard suffix for sequences is:
- seq for all sequences
Tables can have multiple triggers fired on each event. Triggers are executed in alphabetical order, so trigger names should include some kind of action name to differentiate them and to allow the order to be specified. It might seem a good idea to put INSERT, UPDATE, or DELETE in the trigger name, though that can get confusing if you have triggers that work on both UPDATE and DELETE, and it may end up in a mess.
A useful naming convention for triggers is:
{tablename}_{actionname}_{after|before}_trig
For example: cust_audit_after_trig or cust_check_before_trig.

159. Handling objects with quoted names
PostgreSQL object names can contain spaces and mixed-case characters, if we enclose the names in double quotes. PostgreSQL folds all unquoted names to lowercase when they are used within an SQL statement, which means that:
SELECT * FROM mycust;
is exactly the same as:
SELECT * FROM MYCUST;
If you want to access a table that was created with quoted names, then you must use quoted names, such as the following:
postgres=# SELECT count(*) FROM "MyCust";
PostgreSQL provides a handy function named quote_ident(). This function puts double quotes around a value if PostgreSQL would require that for an object name, for example:
postgres=# select quote_ident('MyCust');
quote_ident
-------------
"MyCust"
postgres=# select quote_ident('mycust');
quote_ident
-------------
mycust

160. Why is the database version important?
PostgreSQL has internal version numbers for the data file format, the database catalog layout, and the crash recovery format. Each of these is checked as the server runs, to ensure that the data isn't corrupted. PostgreSQL doesn't change these internal formats within a single release; they only change across releases.
From a user perspective, each release differs in terms of the way the server behaves. If you know your application well, then it should be possible to assess the differences just by reading the release notes for each version. In many cases, a retest of the application is the safest thing to do.
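For reference, you can check which version a server is running with either of the following:
postgres=# SELECT version();
postgres=# SHOW server_version;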

161. Identifying and removing duplicates
Identify the duplicates using a query, such as the following:
SELECT * FROM cust WHERE customerid IN (SELECT customerid FROM cust GROUP BY customerid HAVING count(*) > 1);

Use a query block like the following to delete all the exactly duplicate rows, leaving just one row from each set of duplicates:
BEGIN;
LOCK TABLE new_cust IN ROW EXCLUSIVE MODE;
DELETE FROM new_cust
WHERE ctid NOT IN
  (SELECT min(ctid) FROM new_cust
   WHERE customerid IN (4) --specify exact duplicate ids
   GROUP BY customerid);
COMMIT;
Then follow that with:
VACUUM new_cust;
to clean up the table after the deletions.

If we're looking for duplicates over more than one column (or even all columns), then we have to use SQL of the following form:
SELECT * FROM mytable WHERE (col1, col2, … ,colN) IN (SELECT col1, col2, … ,colN FROM mytable GROUP BY col1, col2, … ,colN HAVING count(*) > 1);
where (col1, col2, …, colN) is the list of columns in the key.

The DELETE query that we showed only works with PostgreSQL, because it uses the ctid value, which is the internal identifier of each row in the table. If you wanted to run that query against more than one column, you would use something like the following:

DELETE FROM mytable
WHERE ctid NOT IN
  (SELECT min(ctid) FROM mytable
   -- need WHERE clause to filter only exact duplicates
   GROUP BY col1, col2, …, colN);
Uniqueness is always enforced by an index.
Unique constraints can be marked as "deferrable". However, there are a number of restrictions that make this feature not very usable in PostgreSQL 9.0. The restrictions are as follows (a short sketch follows the list):
- You must define the constraint as DEFERRABLE in the CREATE TABLE statement. You cannot define this on a CREATE TABLE AS SELECT, nor can it be added later with an ALTER TABLE command.
- You cannot mix deferrable unique constraints with foreign keys. You will get an error message if you try to add a foreign key that refers to a unique constraint that is deferrable.
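A minimal sketch of a deferrable unique constraint (the items table and its data are illustrative):
CREATE TABLE items (
    id integer UNIQUE DEFERRABLE INITIALLY IMMEDIATE
);
BEGIN;
SET CONSTRAINTS ALL DEFERRED;
UPDATE items SET id = id + 1; -- transient duplicates are allowed until COMMIT
COMMIT;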

162. Loading data from a spreadsheet
If your spreadsheet data is neatly laid out in a single worksheet, save it using CSV as the file type.
We can then load it into an existing PostgreSQL table, using the psql \COPY command:
postgres=# \COPY sample FROM sample.csv CSV HEADER
postgres=# SELECT * FROM sample;
key | value
-----+-------
1 | c
2 | d
Or, from the command line, this would be as follows:
psql -c '\COPY sample FROM sample.csv CSV HEADER'
Note that the file can include a full file path if the data is in a different directory. The psql \COPY command transfers data from the client system where you run the command through to the database server, so the file is on the client.
If you are submitting SQL through another type of connection, then you would use the following SQL statement:
COPY sample FROM '/mydatafiledirectory/sample.csv' CSV HEADER;
Note that the preceding SQL statement runs on the database server, and can only be executed by a superuser, so you would need to transfer the data to the server yourself before loading. The COPY statement shown uses an absolute path to identify the data file, which is required.
The COPY (or \COPY) command does not create the table for you; that must be done beforehand. Note also that the HEADER option does nothing but ignore the first line of the input file, so the names of the columns from the .csv file don't need to match the Postgres table. If you say HEADER and the file does not have a header line, then all it does is ignore the first data row. Unfortunately, there's no way for PostgreSQL to tell whether the first line of the file is truly headers or not. Be careful.
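For this example, the sample table must already exist; judging from the output shown above, it might have been created with something like the following (the column types are an assumption):
CREATE TABLE sample (
    key integer,
    value text
);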

163. How to load data from flat files?
Use pgloader to load data into your database, driving it with a configuration script.
Typical example: cookbook_pgloader.conf
[pgsql]
host = 192.168.0.5
base = pgloader
user = dim
log_file = /tmp/pgloader.log
log_min_messages = DEBUG
client_min_messages = WARNING
lc_messages = C
client_encoding = 'utf-8'
copy_every = 10000
null = ""
empty_string = "\ "
max_parallel_sections = 4
[load]
table = load_example
filename = simple/simple.data
format = csv
datestyle = dmy
field_sep = |
trailing_sep = True
columns = a:1, b:3, c:2
reject_log = /tmp/simple.rej.log
reject_data = /tmp/simple.rej
section_threads = 4
We can then run the load using the script, as follows:
pgloader --summary --vacuum --config cookbook_pgloader.conf

164. Explain pgloader. What is the difference between COPY and pgloader?
pgloader copes gracefully with errors. COPY loads all rows in a single transaction, so a single error is enough to abort the entire load. pgloader breaks an input file into reasonably sized chunks and loads them piece by piece. If some rows cause errors, then pgloader will iteratively check them so that it can skip just those bad rows.
pgloader is written in Python, and connects to PostgreSQL through the standard Python client interface. Yes, pgloader is less efficient than loading data files using a COPY command, but running a COPY has many more restrictions: the file has to be in the right place on the server already, has to be in the right format, and must be unlikely to throw errors on load. pgloader has additional overhead, but it also has the ability to load data using multiple parallel threads, so it can actually be faster. pgloader's ability to call out to reformatting functions written in Python is often essential; straight COPY is just too simple.
pgloader also allows loading from fixed-width files, which COPY cannot.
If you need to completely reload the table from fresh, then specify --truncate on the command line of pgloader.

165. How to reload data in parallel by pg_restore ?
If you are reloading data that has been unloaded from PostgreSQL, then you may want to use the pg_restore utility instead. The pg_restore utility has an option to reload data in parallel, -j number_of_threads, though this is only possible if the dump was produced using the custom pg_dump format.
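For example, assuming a database named mydb (the names are illustrative):
pg_dump -Fc mydb > mydb.dump
pg_restore -j 4 -d mydb mydb.dump
Here -Fc produces the custom format, and -j 4 restores using four parallel jobs; the target database must already exist.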

166. Default port number for PostgreSQL
5432

167. What is the difference between pg_ctl stop fast and immediate?
You can issue a database server stop using fast mode as follows:
pg_ctl -D datadir -m fast stop
You must use -m fast if you wish to shut down as soon as possible. Normal shutdown means "wait for all users to finish before we exit". That can take a very long time, though all the while new connections are refused.
When you do a fast stop, all users have their transactions aborted and all connections are disconnected. It's not very polite to users, but it still treats the server and its data with care, which is good.
PostgreSQL is similar to other database systems in that it performs a shutdown checkpoint before it closes. This means that the startup that follows will be quick and clean. The more work the checkpoint has to do, the longer it will take to shut down.
The basic command to perform an emergency stop of the server is the following:
pg_ctl -D datadir stop -m immediate
We must use an immediate stop mode.
When you do an immediate stop, all users have their transactions aborted, and all connections are disconnected. There is no clean shutdown, nor politeness of any kind.
An immediate mode stop is similar to a database crash. Some cached files will need to be rebuilt, and the database itself needs to undergo crash recovery when it comes back up.
Note that for DBAs with Oracle experience, immediate mode is the same thing as a shutdown abort. PostgreSQL immediate mode stop is not the same thing as shutdown immediate on Oracle.

168. How to restart the server quickly?
The basic command to restart the server is the following:
pg_ctl -D datadir restart -m fast
The actual shutdown will happen much faster if we issue a normal checkpoint first, as the shutdown checkpoint will have much less work to do. So, flush all dirty shared_buffers to disk with the following command issued by a database superuser:
psql -c "CHECKPOINT"

169. Reloading the server configuration files
Some PostgreSQL configuration parameters can only be changed by "reloading" the whole configuration file. On all platforms, there is a specific command to reload the server, which is as follows:
UBUNTU/DEBIAN: pg_ctlcluster 9.0 main reload
RED HAT/FEDORA: service postgresql reload, or pg_ctl -D /var/lib/pgsql/data reload
SOLARIS: pg_ctl -D /var/lib/pgsql/data reload
MAC OS: pg_ctl -D /var/lib/pgsql/data reload
FREEBSD: pg_ctl -D /var/lib/pgsql/data reload
You can also reload the configuration files while still connected to PostgreSQL. This can be done from the command line as follows, if you are a superuser:
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
This is also often executed from an admin tool, such as pgAdmin3.
To reload the configuration files, we send the SIGHUP signal to the postmaster, which then passes that on to all connected backends. That's why some people call reloading the server "sigh-up-ing".
If you look at the catalog table pg_settings, you'll see that there is a column named context. Each setting has a time and a place where it can be changed. Some parameters can only be reset by a server reload, and so the value of context for those parameters will be a "sighup". Here are a few of the ones you'll want to change sometimes during server operation (there are others!):
postgres=# SELECT name, setting, unit ,(source = 'default') as is_default FROM pg_settings
WHERE context = 'sighup' AND (name like '%delay' or name like '%timeout') AND setting != '0';

As reloading the configuration file is achieved by sending the SIGHUP signal, we can also reload the configuration file for just a single backend using the kill command. First, find out the pid of the backend using pg_stat_activity.
Then, from the OS prompt, issue the following:
kill -SIGHUP pid
Or we can do both at once, as in the following command:
kill -SIGHUP `psql -t -c "select procpid from pg_stat_activity limit 1"`
though that is only useful with a sensible WHERE clause.

170. Preventing new connections
In certain emergencies, you may need to lock down the server completely, or just prevent specific users from accessing the database.
Connections can be prevented in a number of ways as follows:
- Pause/Resume the session pool.
- Stop the server, though this is not recommended.
- Restrict connections to zero for a specific database by setting the connection limit to zero:
ALTER DATABASE foo_db CONNECTION LIMIT 0;
This will limit normal users from connecting to that database, though it will still allow superuser connections.
- Restrict connections to zero for a specific user by setting the connection limit to zero:
ALTER USER foo CONNECTION LIMIT 0;
This will prevent new connections by that user, though it will still allow connection if the user is a superuser, so luckily you cannot shut yourself out accidentally.
- Change the host-based authentication (HBA) file to refuse all incoming connections, and then reload the server:
  - Create a new file named pg_hba_lockdown.conf, and add the following two lines to it. This puts rules in place that will completely lock down the server, including superusers. Have no doubt that this is a serious and drastic action.
# TYPE DATABASE USER CIDR-ADDRESS METHOD
local all all reject
host all all 0.0.0.0/0 reject
If you still want superuser access, then try something like the following:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
local all postgres ident
local all all reject
host all all 0.0.0.0/0 reject
which will prevent connection to the database by any user except the postgres operating system user connecting locally to the postgres database. Be careful not to confuse the second and third columns: the second column is the database and the third column is the username. It's worth keeping the header line just for that reason. The "ident" method should be replaced by other authentication methods if a more complex configuration is in use.
  - Copy the existing pg_hba.conf to pg_hba_access.conf, so that it can be restored later, if required.
  - Copy pg_hba_lockdown.conf to pg_hba.conf.
  - Reload the server.
