1. How to check the distribution policy of a test table sales?
Answer: Describing the table sales shows the distribution details.
psql> \d sales
Table "public.sales"
 Column | Type    | Modifiers
--------+---------+-----------
 id     | integer |
 date   | date    |
Distributed by: (id)
2. How many user schemas are there in the database?
Answer: Use "\dn" at the psql prompt.
3. When were my tables last analyzed in the Greenplum database?
Answer: In 4.x, check pg_stat_operations for all actions (actionname) performed on any object. For example, for a sales table:
gpdb=# select objname, actionname, statime from pg_stat_operations where objname like 'sales';
 objname | actionname |            statime
---------+------------+-------------------------------
 sales   | CREATE     | 2010-10-01 12:18:41.996244-07
 sales   | ANALYZE    | 2010-10-06 14:38:21.933281-07
 sales   | VACUUM     | 2010-10-06 14:38:33.881245-07
4. How to check the size of a table?
Answer: Table level (replace schema.tablename with the table you want):
psql> select pg_size_pretty(pg_relation_size('schema.tablename'));
Table and indexes (replace schema.tablename with the table you want):
psql> select pg_size_pretty(pg_total_relation_size('schema.tablename'));
5. How to check the schema size?
Answer: Schema level (replace SCHEMANAME with your schema name):
psql> select schemaname, round(sum(pg_total_relation_size(schemaname||'.'||tablename))/1024/1024) "Size_MB" from pg_tables where schemaname='SCHEMANAME' group by 1;
6. How to check the database size?
Answer: To see the size of a specific database:
psql> select pg_size_pretty(pg_database_size('DATABASE_NAME'));
To see all database sizes:
psql> select datname, pg_size_pretty(pg_database_size(datname)) from pg_database;
7. How to check partitioned table size including indexes and partitions?
Answer: The following SQL gives the employee_daily table size, including partitions:
select schemaname, tablename, round(sum(pg_total_relation_size(schemaname || '.' || partitiontablename))/1024/1024) "MB" from pg_partitions where tablename='employee_daily' group by 1,2;
 schemaname | tablename      | MB
------------+----------------+-----
 public     | employee_daily | 254
8. How do I get help on syntax to alter a table?
Answer: In a psql session, type \h alter table to display the syntax:
gpdb=# \h alter table
9. How to connect in utility mode?
Answer: From the master host:
PGOPTIONS='-c gp_session_role=utility' psql -p <port> -h <hostname>
Where: port is the segment/master database port, and hostname is the segment/master hostname.
10. Where/how to find db logs?
Answer:
Master: The master gpdb logfile is located in the $MASTER_DATA_DIRECTORY/pg_log/ directory; the file name depends on the database "log_filename" parameter.
1. $MASTER_DATA_DIRECTORY/pg_log/gpdb-yyyy-mm-dd_000000.csv --> log file format with a default installation.
2. ~gpadmin/gpAdminLogs/ --> gpstart, gpstop, gpstate and other utility logs.
Segments:
1. Primary segments: run the SQL below to see the logfile location:
select dbid, hostname, datadir||'/pg_log' from gp_configuration where content not in (-1) and isprimary is true;
2. Mirror segments: run the SQL below to see the logfile location:
select dbid, hostname, datadir||'/pg_log' from gp_configuration where content not in (-1) and isprimary is false;
11. How to see the list of available functions in Greenplum DB?
Answer: \df schemaname.functionname (the schema name and function name support wildcard characters)
test=# \df pub*.*test*
List of functions
 Schema | Name    | Result data type | Argument data types
--------+---------+------------------+---------------------
 public | bugtest | integer          |
 public | test    | boolean          | integer
 public | test    | void             |
(3 rows)
12. How to check whether the Greenplum server is up and running?
Answer: gpstate is the utility to check gpdb status. Use gpstate -Q to show a quick status; refer to gpstate --help for more options.
Sample output:
[gpadmin@stinger2]/export/home/gpadmin>gpstate -Q
gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
gpadmin-[INFO]:-Obtaining GPDB array type, [Brief], please wait...
gpadmin-[INFO]:-Quick Greenplum database status from Master instance only
gpadmin-[INFO]:----------------------------------------------------------
gpadmin-[INFO]:-GPDB fault action value = readonly
gpadmin-[INFO]:-Valid count in status view = 4
gpadmin-[INFO]:-Invalid count in status view = 0
gpadmin-[INFO]:----------------------------------------------------------
13. How to create a database?
Answer: There are two ways to create a gpdb database: a psql session or the Greenplum createdb utility.
1. Using a psql session:
gpdb=# \h create database
Command: CREATE DATABASE
Description: create a new database
Syntax:
CREATE DATABASE name
  [ [ WITH ] [ OWNER [=] dbowner ]
  [ TEMPLATE [=] template ]
  [ ENCODING [=] encoding ]
  [ TABLESPACE [=] tablespace ]
  [ CONNECTION LIMIT [=] connlimit ] ]
2. Using the createdb utility ($GPHOME/bin/createdb --help):
Usage:
createdb [OPTION]... [DBNAME] [DESCRIPTION]
Options:
  -D, --tablespace=TABLESPACE  default tablespace for the database
  -e, --echo                   show the commands being sent to the server
  -E, --encoding=ENCODING      encoding for the database
  -O, --owner=OWNER            database user to own the new database
  -T, --template=TEMPLATE      template database to copy
  --help                       show this help, then exit
  --version                    output version information, then exit
14. How do I get a list of databases in a Greenplum cluster?
Answer:
gpdb=# \l (lowercase letter "l")
List of databases
   Name    |  Owner  | Encoding
-----------+---------+----------
 gpdb      | gpadmin | UTF8
 gpperfmon | gpadmin | UTF8
 postgres  | gpadmin | UTF8
 template0 | gpadmin | UTF8
 template1 | gpadmin | UTF8
Check the SQL below for more details on the databases:
gpdb=# select * from pg_database;
15. How to delete/drop an existing database in Greenplum?
Answer:
gpdb=# \h DROP DATABASE
Command: DROP DATABASE
Description: remove a database
Syntax: DROP DATABASE [ IF EXISTS ] name
Also check the dropdb utility ($GPHOME/bin/dropdb --help):
dropdb removes a PostgreSQL database.
Usage: dropdb [OPTION]... DBNAME
16. Where can I get help on postgres psql commands?
Answer: In a psql session:
"\?" - for all psql session help.
"\h <SQL command>" - for any SQL syntax help.
17. gpstart failed; what should I do?
Answer: Check the gpstart logfile in ~gpadmin/gpAdminLogs/gpstart_yyyymmdd.log. Then take a look at the pg startup log file for more details in $MASTER_DATA_DIRECTORY/pg_log/startup.log.
18. Why do we need gpstop -m and gpstart -m?
Answer: The gpstart -m command allows you to start the master only, and none of the data segments. It is used primarily by support to get system-level information and configuration; an end user would not normally use it.
19. What is the procedure to get rid of mirror segments?
Answer: There are no utilities available to remove mirrors from Greenplum. You need to make sure all primary segments are good; then you can remove the mirror configuration from gp_configuration in 3.x.
20. How to run gpcheckcat?
Answer: The gpcheckcat tool is used to check catalog inconsistencies between master and segments. It can be found in the $GPHOME/bin/lib directory.
Usage:
gpcheckcat <option> [dbname]
  -?
  -B parallel : number of worker threads
  -g dir      : generate SQL to rectify catalog corruption, put it in dir
  -h host     : DB host name
  -p port     : DB port number
  -P passwd   : DB password
  -o          : check OID consistency
  -U uname    : DB user name
  -v          : verbose
Example:
gpcheckcat gpdb > gpcheckcat_gpdb_logfile.log
21. What is gpdetective and how do I run it in Greenplum?
Answer: The gpdetective utility collects information from a running Greenplum Database system and creates a bzip2-compressed tar output file. This output file helps with the diagnosis of Greenplum Database errors or system failures. For more details, check the help:
gpdetective --help
22. How to delete a standby?
Answer: To remove the currently configured standby master host from your Greenplum Database system, run the following command on the master only:
# gpinitstandby -r
23. How to re-sync a standby?
Answer: Use this option if you already have a standby master configured and just want to resynchronize the data between the primary and backup master host. The Greenplum system catalog tables will not be updated.
# gpinitstandby -n (resynchronize)
24. How to recover an invalid segment?
Answer: Use the gprecoverseg tool, which will recognize which segments need recovery and will initialize recovery.
3.3.x:
o Without the "-F" option - Files are compared first, differences are found, and only the differing files are synched (this first stage can take a long time if there are many files in the data directory).
o With the "-F" option - The entire data directory is resynched.
4.0.x:
o Without the "-F" option - The change-tracking log is sent and applied to the mirror.
o With the "-F" option - The entire data directory is resynched.
25. How to add mirrors to the array?
Answer: The gpaddmirrors utility configures mirror segment instances for an existing Greenplum Database system that was initially configured with primary segment instances only. For more details, check the help:
# gpaddmirrors --help
26. How to see primary-to-mirror mapping?
Answer: The following query lists the configuration by content ID from the database catalog, from which you can figure out the primary and mirror for each content:
gpdb=# select * from gp_configuration order by content;
Note: starting from GPDB 4.x, the gp_segment_configuration table is used instead:
gpdb=# select * from gp_segment_configuration order by dbid;
27. How to start/stop the db in admin mode?
Answer:
Admin mode: gpstart with the -R option stands for admin (restricted) mode, where only superusers can connect to the database when it has been opened with this option.
Utility mode: Utility mode allows you to connect to individual segments only, when started using gpstart -m. For example, to connect to the master instance only:
PGOPTIONS='-c gp_session_role=utility' psql
28. How to run gpcheckperf IO/netperf?
Answer: Create a directory where you have free space, common to all hosts.
For a network I/O test for each NIC card:
gpcheckperf -f seg_host_file_nic-1 -r N -d /data/gpcheckperf > seg_host_file_nic_1.out
gpcheckperf -f seg_host_file_nic-2 -r N -d /data/gpcheckperf > seg_host_file_nic_2.out
For disk I/O:
gpcheckperf -f seg_host_file_nic-1 -r ds -D -d /data/gpdb_p1 -d /data/gpdb_p2 -d /data/gpdb_m1 -d /data/gpdb_m2
29. How to update postgresql.conf and reload it?
Answer: In GP 4.0, check the gpconfig utility to change postgresql.conf parameters.
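For example (the parameter names below are illustrative), gpconfig can show or set a parameter across all instances, and gpstop -u reloads the configuration files without restarting the database:
gpconfig -s max_connections                       (show the current value on master and segments)
gpconfig -c log_min_duration_statement -v 5000    (set a value across the cluster)
gpstop -u                                         (reload postgresql.conf and pg_hba.conf)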
30. How to manage pg_hba.conf?
Answer: The pg_hba.conf file of the master instance controls client access and authentication to your Greenplum system.
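As an illustrative sketch (the subnet is made up), an entry follows the standard postgres pg_hba.conf format, and gpstop -u reloads it:
# TYPE  DATABASE  USER     ADDRESS       METHOD
host    all       gpadmin  10.1.2.0/24   md5
gpstop -u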
31. How to add a new user to the database?
Answer: Use the createuser utility to create users; see createuser --help for more details. You can also use SQL commands at the psql prompt to create users, for example: CREATE USER or CREATE ROLE <ROLE_NAME> ....
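A minimal sketch (the role name and password are examples):
gpdb=# CREATE ROLE jsmith WITH LOGIN PASSWORD 'changeme';
or, from the shell:
createuser jsmith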
32. How to create a password-free trusted environment between all the segment hosts?
Answer: Use gpssh-exkeys:
gpssh-exkeys -h hostname1 -h hostname2 .. -h hostnameN
33. How to check the db version and the version at initdb?
Answer: To check the version:
psql> select version();
or
postgres --gp-version
To check the gp version at install:
psql> select * from gp_version_at_initdb;
34. How to see the value of a GUC?
Answer: Connect to the GPDB database using psql, then query the catalog or use SHOW.
Example:
gpdb# select name, setting from pg_settings where name='GUC';
or
gpdb# show <GUC_NAME>;
35. What is the location of pg_hba/logfile/master_data_directory?
Answer: cd $MASTER_DATA_DIRECTORY - the master data directory, which holds pg_hba.conf, postgresql.conf, and other GPDB internal directories.
cd $MASTER_DATA_DIRECTORY/pg_log - the master database logfiles location.
36. How to find errors/fatals in log files?
Answer: grep for ERROR, FATAL, and SIGSEGV in the pg_log directory.
37. What is vacuum and when should I run this?
Answer: VACUUM reclaims storage occupied by deleted tuples. In normal GPDB operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present on disk until a VACUUM is done. Therefore it is necessary to run VACUUM periodically, especially on frequently updated tables.
38. What is the difference between vacuum and vacuum full?
Answer: Unless you need to return space to the OS so that other tables or other parts of the system can use that space, you should use VACUUM instead of VACUUM FULL. VACUUM FULL is only needed when you have a table that is mostly dead rows, that is, when the vast majority of its contents have been deleted. Even then, there is no point in using VACUUM FULL unless you urgently need that disk space back for other things or you expect that the table will never again grow to its past size. Do not use it for table optimization or periodic maintenance, as it is counterproductive.
39. What is analyze and how frequently should I run this?
Answer: ANALYZE collects statistics about the contents of tables in the database and stores the results in the system table pg_statistic. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It is a good idea to run ANALYZE periodically, or just after making major changes to the contents of a table. Accurate statistics help the query planner choose the most appropriate query plan, thereby improving the speed of query processing. A common strategy is to run VACUUM and ANALYZE once a day during a low-usage time of day.
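For example, to vacuum and analyze a single table in one step (the table name is illustrative), or to analyze only specific columns:
gpdb=# VACUUM ANALYZE sales;
gpdb=# ANALYZE sales (id);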
40. What are resource queues?
Answer: Resource queues are used for Greenplum database workload management. All users/queries can be prioritized using resource queues.
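A minimal sketch (the queue name, limits, and role are examples; syntax as in GPDB 4.x):
gpdb=# CREATE RESOURCE QUEUE adhoc WITH (ACTIVE_STATEMENTS=3, PRIORITY=LOW);
gpdb=# ALTER ROLE jsmith RESOURCE QUEUE adhoc;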
41. What is gp_toolkit?
Answer: gp_toolkit is a database schema with many tables, views, and functions to better manage Greenplum Database while the DB is up. All database users can access gp_toolkit to view and query the system log files and other system metrics.
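For example, two commonly used gp_toolkit views (names as in GPDB 4.x):
gpdb=# select * from gp_toolkit.gp_bloat_diag;        -- tables with significant bloat
gpdb=# select * from gp_toolkit.gp_resqueue_status;   -- resource queue activity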
42. How to generate DDL for a table?
Answer: Use the pg_dump utility to generate DDL.
Example: pg_dump -t njonna.accounts -s -f ddl_accounts.sql
Where:
-f ddl_accounts.sql is the output file.
-t njonna.accounts is the table name, with schema njonna.
-s dumps only the schema, no data.
43. What are the tools available in Greenplum to take backups and restores?
Answer:
For non-parallel backups: use the postgres utilities (pg_dump and pg_dumpall for backup, pg_restore for restore). Another useful command for getting data out of the database is COPY <TABLE> TO <file>.
For parallel backups: gp_dump and gpcrondump for backups, and gp_restore for the restore process.
44. How do I clone my production database to a PreProd/QA environment?
Answer: If Prod and QA are on the same GPDB cluster, use CREATE DATABASE <Clone_DBname> TEMPLATE <Source_DB>. If Prod and QA are on different clusters, use the backup and restore utilities.
45. What is the difference between pg_dump and gp_dump?
Answer:
pg_dump - non-parallel backup utility; you need a big file system, since the backup is created on the master node only.
gp_dump - parallel backup utility; the backup is created on the master and segment file systems.
46. What is gpcrondump?
Answer: A wrapper utility for gp_dump, which can be called directly or from a crontab entry.
Example: gpcrondump -x <database_name>
47. What are the backup options available at OS level?
Answer:
Solaris: zfs snapshots at the file system level.
All OS: gpcrondump / gp_dump.
48. My SQL query is running very slow; it was running fine yesterday. What should I do?
Answer:
A. Check that your connection to the Greenplum cluster is still good if you are using a remote client. You can do this by running the SQL locally on the GP cluster.
B. Check that the system tables and user tables involved are not bloated or skewed. Read the jetpack or Greenplum toolkit documentation for how to do this.
C. Check with your DBA that the Greenplum interconnect is still performing correctly. This can be done by checking for dropped packets on the interconnect ("netstat -i") and by running gpcheckperf. It is also possible that a segment is experiencing hardware problems, which can be found in the output of dmesg or in /var/log/messages* (Linux) and /var/adm/messages* (Solaris).
49. How to turn on timing and check how much time a query takes to execute?
Answer:
A. You can turn on timing per session before you run your SQL with the \timing command.
B. You can run EXPLAIN ANALYZE against your SQL statement to get the timing.
50. How to trace child processes on a segment server?
Answer: When a session starts on the master and segments, all the child processes on the segments are identified by the master session_id connection string (con + sess_id).
For example:
gpdb=# select * from pg_stat_activity;
 datid | datname | procpid | sess_id | .. ..
-------+---------+---------+---------+
 16986 | gpdb    | 18162   | 76134   | .. ..
On all segments, the child processes for session 76134:
[gpadmin@stinger2]/export/home/gpadmin/gp40>gpssh -f host_file /usr/ucb/ps -auxww |grep con76134
[stinger2] gpadmin 18162 1.7 6.0386000124480 ? S 09:57:55 0:04 postgres: port 4000, gpadmin gpdb [local] con76134 [local] cmd3 CREATE DATABASE.......................................
[stinger2] gpadmin 18625 0.3 2.726056455932 ? S 10:01:56 0:01 postgres: port 40000, gpadmin gpdb 10.5.202.12(18864) con76134 seg0 cmd4 MPPEXEC UTILITY...............................
[stinger2] gpadmin 18669 0.0 0.1 3624 752 pts/2 S 10:02:36 0:00 grep con76134
[stinger3] gpadmin 22289 0.8 9.4531860196404 ? S 09:36:20 0:05 postgres: port 40000, gpadmin gpdb 10.5.202.12(18866) con76134 seg1 cmd4 MPPEXEC UTILITY...............................
51. What kind of locks should we focus on in an MPP system when the system is slow/hung?
Answer: Locks that are held for a very long time, with multiple other queries also waiting on that lock.
52. How do I monitor user activity history in a Greenplum database?
Answer: Use the Greenplum performance monitor (gpperfmon), which has a GUI to monitor and query performance history.
53. What is Greenplum performance monitor and how to install it?
Answer: It's a monitoring tool that collects statistics on system and query performance and builds historical data.
54. When the client connects, does it connect to the master or a segment node?
Answer: Master.
55. Can you explain the process of data migration from Oracle to Greenplum?
Answer: There are many ways. The simplest steps are: unload the data into csv files, create tables in the Greenplum database corresponding to the Oracle ones, create an external table, start gpfdist pointing at the external table location, and load the data into Greenplum. You can also use the gpload utility; gpload creates the external table at runtime.
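A minimal sketch of the external-table step (the host, port, path, and columns are illustrative):
gpfdist -d /data/oracle_export -p 8081 &
gpdb=# CREATE EXTERNAL TABLE ext_customers (id int, name text)
       LOCATION ('gpfdist://etlhost:8081/customers.csv')
       FORMAT 'CSV' (DELIMITER ',');
gpdb=# INSERT INTO customers SELECT * FROM ext_customers;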
56. Which command would you use to back up a database?
Answer: gp_dump, gpcrondump, pg_dump, pg_dumpall, copy.
57. When you restore from a backup taken with gp_dump, can you import a single table?
Answer: No - unless during the gp_dump you backed up that one table only.
58. Can you open and view a dump file?
Answer: Yes.
59. Which option would you use to export the DDL of a database or table?
Answer: -s (-s | --schema-only: dump only the object definitions (schema), not data).
60. When a user submits a query, where does it run: on the master or the segment nodes?
Answer: Segment nodes.
61. If you configure your system with master and segment nodes, where does the data reside?
Answer: Segment nodes.
62. How would you go about query tuning? Or: what would you do when users complain that a particular query is running slow?
Answer:
1) Look at the query plan.
2) Look at the stats of the table(s) in the query.
3) Look at the table distribution keys and the joins in the query.
4) Look at the network performance.
5) Look at the resource queues.
6) Look at the interconnect performance.
7) Look at the join order of the tables in the query.
8) Look at the query itself, i.e. whether it can be written in a more efficient way.
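The first two steps above can be started with (the table name is illustrative):
gpdb=# EXPLAIN ANALYZE select * from sales where id = 100;
gpdb=# select * from pg_stats where tablename = 'sales';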
63. What would you do to gather statistics in the database, as well as reclaim space?
Answer: VACUUM FULL, CTAS.
A VACUUM FULL will reclaim all expired row space, but it is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table. A VACUUM FULL is not recommended in Greenplum Database.
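A sketch of the CTAS approach (the table and key names are illustrative):
gpdb=# CREATE TABLE sales_new AS SELECT * FROM sales DISTRIBUTED BY (id);
gpdb=# DROP TABLE sales;
gpdb=# ALTER TABLE sales_new RENAME TO sales;
gpdb=# ANALYZE sales;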
64. How would you implement compression, and can you explain the possible compression types?
Answer: There are two types of in-database compression available in Greenplum Database for append-only tables:
1. Table-level compression, applied to an entire table.
2. Column-level compression, applied to a specific column. You can apply different column-level compression algorithms to different columns.
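A sketch of both forms (the table and column definitions are illustrative; column-level ENCODING requires an append-only, column-oriented table in GPDB 4.1+):
-- table-level compression
CREATE TABLE sales_ao (id int, note text)
WITH (appendonly=true, compresstype=zlib, compresslevel=5);
-- column-level compression
CREATE TABLE sales_co (id int ENCODING (compresstype=zlib), note text ENCODING (compresstype=quicklz))
WITH (appendonly=true, orientation=column);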
65. What are the major differences between Oracle and Greenplum?
Answer: Oracle is a relational database; Greenplum is MPP in nature, with a shared-nothing architecture. There are many other differences in terms of functionality and behavior.
66. What is good and bad about Greenplum compared to Oracle?
Answer: Greenplum is built on top of PostgreSQL. It is a shared-nothing, MPP architecture, best for data warehousing environments and good for big data analytics. Oracle is an all-purpose database.
67. How would you troubleshoot an issue/error/problem when there is no one available to help you and you are all by yourself?
Answer: Look at the log files, and query the gpperfmon and toolkit schemas for various data and statistics.
68. Can you write stored procedures in Greenplum/Postgres?
Answer: No.
69. Can you create stored functions and use them in Greenplum?
Answer: Yes.
70. Can you do partitioning in Greenplum tables?
Answer: Yes.
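For example, a range-partitioned table (the names and dates are illustrative):
CREATE TABLE sales_p (id int, date date)
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2016-01-01') INCLUSIVE
  END (date '2017-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month') );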
71. Which parameters can you use to manage workload in a Greenplum database?
Answer: Workload management is done by creating resource queues and assigning various limits to them.
72. Tell me some of the aspects/implementations/configurations you have done in Greenplum?
Answer: Configurations like port changes, Active Directory authentication, pg_hba.conf changes, postgresql.conf changes, gpfdist, etc.
73. How to check if my session queries are running or waiting on locks?
Answer: Check the "waiting" column in pg_stat_activity and the "granted" column in pg_locks for any object-level locks.
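A sketch of such a check, joining the two views (column names as in GPDB 4.x, where pg_stat_activity has procpid):
gpdb=# select a.procpid, a.waiting, l.relation::regclass, l.mode, l.granted
       from pg_stat_activity a join pg_locks l on a.procpid = l.pid
       where l.granted = false;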
74. What is the maximum number of columns that can be present in a "select" statement?
Answer: 1664.
75. What is the maximum number of URLs that can be placed in the LOCATION part of an external table creation?
Answer: Example:
CREATE EXTERNAL TABLE ext3 ( a int )
LOCATION ('gpfdist://mdw:8081/a1','gpfdist://mdw:8081/a2',.....,'gpfdist://mdw:8081/a999') FORMAT 'TEXT' (DELIMITER '|');
There is no upper limit on the number of URLs that can be added to the LOCATION clause; what restricts it is the size of the LOCATION clause, which is 32712 bytes (32KB). If you exceed it, you may hit the error:
ERROR: row is too big: size 42872, maximum size 32712
So if you are planning to add many URLs in the LOCATION clause, it is recommended to move the files into a separate folder, start the gpfdist file server program in the background:
gpfdist -p 8081 -d <file_location> -l /home/gpadmin/log &
and write the create external command (like the one below) to read all the files in that location:
CREATE EXTERNAL TABLE ext3 ( a int )
LOCATION ('gpfdist://mdw:8081/*') FORMAT 'TEXT' (DELIMITER '|');
Another limit is that the FORMAT clause can take only 8191 bytes; if you exceed it, you may end up with the error "ERROR: format options must be less than 8191 bytes in size".
76. What is the max text length of a SQL statement?
Answer: There is no hard limit, but Pivotal Greenplum controls the length of the text via the max_stack_depth parameter, and the OS controls it via the Unix stack size limit. So if you have a bigger statement, increase the stack size at the OS level first, then alter max_stack_depth in Greenplum and restart the database after each change, since the Greenplum processes need to be recycled to take advantage of the new stack size.
77. How to determine the disk/network speed between servers?
Answer: GPDB has a utility called "gpcheckperf" that can help determine the speed of disk / network / memory bandwidth.
78. How do I understand which process/query is consuming most of the segment memory?
Answer: Execute the mem_watcher script, available in the "Pivotal Support Package", to track the memory consumption of queries on the segments. Review the output once an OOM is observed to identify possible queries or processes.
79. How do I determine if a table in my query needs an analyze?
Answer: Check the "gp_stat_check" utility.
80. What is an orphan process?
Answer: Orphan processes are postgres/greenplum processes that exist on some segments with no information about their gang members on any other segments.
81. Will these orphan processes cause a performance issue?
Answer: Yes, they can, if the process is holding locks on a relation on that segment and subsequent queries are waiting for those locks to be released.
82. What is data skew?
Answer: Data skew happens when data is not equally distributed across all active segment instances.
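A quick way to check skew for a given table (the table name is illustrative) is to count rows per segment; a heavily uneven count indicates skew:
gpdb=# select gp_segment_id, count(*) from sales group by 1 order by 2 desc;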
83. What is the data directory?
Answer: The data directory is the file system location on disk where database data is stored. The master data directory contains the global system catalog only; no user data is stored on the master. The data directory on each segment instance has the user data for that segment plus a local copy of the system catalog. The data directory contains several subdirectories, control files, and configuration files as well.
84. What is a distribution key?
Answer: In a Greenplum table that uses hash distribution, one or more columns are used as the distribution key, meaning those columns are used to divide the data among all of the segments. The distribution key should be the primary key of the table or a unique column or set of columns.
85. What is a distribution policy?
Answer: The distribution policy determines how to divide the rows of a table among the Greenplum segments. Greenplum Database provides two types of distribution policy: hash distribution and random distribution.
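Both policies are declared at table creation time (the table and column names are illustrative):
CREATE TABLE t1 (id int, note text) DISTRIBUTED BY (id);    -- hash distribution
CREATE TABLE t2 (id int, note text) DISTRIBUTED RANDOMLY;   -- random distribution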
86. What is a Greenplum array?
Answer: A Greenplum array, aka a Greenplum system or simply a Greenplum database, is a set of individual segment hosts designed to work together in parallel, plus a master server to distribute queries across the segments.
87. What are heap tables?
Answer: Whenever you create a table without specifying a storage structure, the default is heap storage. In a heap structure, the table is an unordered collection of data that allows multiple copies or versions of a row. Heap tables have row-level versioning information and allow updates and deletes.
88. What is the master host/node?
Answer: The master is a dedicated processing unit with its own CPU, disks, memory, and network interface. It is responsible for accepting and distributing queries. It is the entry point into the Greenplum system and does not store data.
89. What is a mirror?
Answer: A mirror segment is a standby segment, typically used in high-availability environments, that is activated should its corresponding primary segment no longer be available. Mirrors are optional.
90. What is a motion node?
Answer: A motion node is a portion of a query execution plan that indicates data movement between the various database instances of Greenplum Database (segments and the master). Some operations, such as joins, require segments to send and receive tuples to one another in order to satisfy the operation. A motion node can also indicate data movement from the segments back up to the master.
91. What is a primary segment?
Answer: A primary segment stores a distinct portion of data and is responsible for handling queries.
92. What is processing skew?
Answer: If you distribute your data on a good column where the data is laid out well, but some segments end up working harder than others, you have processing skew.
93. What is pg_catalog?
Answer: pg_catalog contains the system catalog tables, built-in data types, functions, and operators. It is always part of the schema search path, even if it is not explicitly named in the search path.
94. What is pg_toast?
Answer: pg_toast stores large objects such as records that exceed the page size. This schema is used internally by the Greenplum Database system.
95. What is pg_bitmapindex?
Answer: pg_bitmapindex stores bitmap index objects such as lists of values. This schema is used internally by the Greenplum Database system.
96. What is pg_aoseg?
Answer: pg_aoseg stores append-only table objects. This schema is used internally by the Greenplum Database system.
97. What is the query dispatcher (QD)?
Answer: The query dispatcher (QD) is a process that is initiated when users connect to the master and issue SQL commands. This process represents a user session and is responsible for sending the query plan to the segments and coordinating the results it gets back. The query dispatcher process spawns one or more query executor processes to assist in the execution of SQL commands.
98. What is a query executor (QE)?
Answer: A query executor (QE) process is associated with a query dispatcher (QD) process and operates on its behalf. Query executor processes run on the segment instances and execute their slice of the query plan on a segment.
99. What is a query plan (QP)?
Answer: A query plan is the set of operations that Greenplum Database will perform to produce the answer to a given query. Each node or step in the plan represents a database operation such as a table scan, join, aggregation, or sort. Plans are read and executed from bottom to top. Greenplum Database supports an additional plan node type called a motion node.
100. What is random distribution?
Answer: With random distribution, table rows are sent to the segments as they come in, cycling across the segments in a round-robin fashion. Rows with columns having the same values will not necessarily be located on the same segment. Although a random distribution ensures even data distribution, there are performance advantages to choosing a hash distribution policy whenever possible.
101. What is a resource queue?
Answer: A resource queue has attributes that limit the size and/or total number of queries that can be executed by the users (or roles) in that queue.
102. What is redistribute motion?
Answer: To perform a local join on a segment instance, matching rows must be located together on the same segment instance. In the case where the data was not distributed on the join key, a dynamic redistribution of the needed rows from one of the tables to another segment instance is performed.
103. What is a segment?
Answer: A segment represents a portion of data in a Greenplum database. User-defined tables and their indexes are distributed across the available segment instances in the Greenplum Database system. Each segment instance contains a distinct portion of the user data. A primary segment instance and its mirror both store the same segment of data.
104. What is a segment instance?
Answer: A Postgres database server process, simply called a segment.
105. What is a segment host/node?
Answer: A node, or segment host, in a Greenplum system is a dedicated processing unit with its own CPU, disks, memory, and network interface. The segment host runs database server processes. It supports 1-n segments, depending on the number of processors/cores.
106. What is a slice?
Answer: In order to achieve maximum parallelism during query execution, Greenplum divides the work of the query plan into slices. A slice is a portion of the plan that can be worked on independently at the segment level. A query plan is sliced wherever a motion node occurs in the plan, with one slice on each side of the motion. Plans that do not require data movement (such as catalog lookups on the master) are known as single-slice plans.
107. What is shared-nothing architecture?
Answer: In contrast to a shared-everything architecture, in which resources are used by many different applications, a shared-nothing architecture grants usage of resources to the same application all the time.
108. What is the system catalog?
Answer: The global system catalog resides on the master and denotes the set of system tables that contain metadata about the Greenplum Database system itself.
109. What is the scatter/gather technique?
Answer: Scattering denotes the process of getting all data from all source systems to all segment instances in the Greenplum system. Each segment instance in the system gathers the data it is responsible for.
110. What is gpdbrestore and how does it work?
Answer: The Greenplum Database parallel restore utility gpdbrestore takes the timestamp key generated by gpcrondump, validates the backup set, and restores the database objects and data into a distributed database in parallel. Parallel restore operations require a complete backup set created by gpcrondump: a full backup and any required incremental backups. The gpdbrestore utility provides flexibility and verification options for use with the automated backup files produced by gpcrondump or with backup files moved from the Greenplum array to an alternate location.
gpdbrestore -T faa.otp_r -s tutorial -u /tmp -a