1. A customer is hitting bug 4462367 with an error message about a low open file descriptor limit. How do I work around this until the fix is released with the Oracle Clusterware Bundle for 10.2.0.3 or 10.2.0.4?
The fix for "low open
file descriptor" problem is to increase the ulimit for Oracle Clusterware.
Please be
careful when
you make this type of change and make a backup copy of the init.crsd before you
start!
To do this, you can modify
the init.crsd as follows, while you wait for the patch:
1. Stop Oracle Clusterware on
the node (crsctl stop crs)
2. Make a backup copy of /etc/init.d/init.crsd
3. Modify the file, changing:
# Allow the daemon to drop a
diagnostic core file/
ulimit -c unlimited
ulimit -n unlimited
to
# Allow the daemon to drop a
diagnostic core file/
ulimit -c unlimited
ulimit -n 65536
4. Restart Oracle Clusterware on the node (crsctl start crs)
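As an optional sanity check on Linux (a sketch; it assumes a kernel that exposes /proc/<pid>/limits, where <pid> is the PID of the crsd.bin process found with the first command):
ps -ef | grep crsd.bin
cat /proc/<pid>/limits | grep "Max open files"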
2. What
methods does QoS Management support for classifying applications and workloads?
QoS Management uses database entry points to “tag” the application or workload with user-specified names. Database sessions are evaluated against classifiers, which are sets of Boolean expressions made up of Service Name, Program, User, Module, and Action.
3. What is
the overhead of using QoS Management?
The QoS Management Server is
a set of Java MBeans that run in a single J2EE container running on one node in
the cluster. Metrics are retrieved from each database once every five seconds.
Workload classification and tagging only occurs at connect time or when a
client changes session parameters.
Therefore the overhead is
minimal and is fully accounted for in the management of objectives.
4. Does
QoS Management negatively affect an application’s availability?
No, the QoS Management server
is not in the transaction path and only adjusts resources through already existing
database and cluster infrastructure. In fact, it can improve availability by
distributing workloads within the cluster and preventing node evictions caused by memory stress with its automatic Memory Guard feature.
5. What
happens should the QoS Management Server fail?
The QoS Management Server is
a managed Clusterware singleton resource that is restarted or failed over to another
node in the cluster should it hang or crash. Even if a failure occurs, there is
no disruption to the databases and their workloads running in the cluster. Once
the restart completes, QoS Management will continue managing in the exact state
it was when the failure occurred.
6. What is
Memory Guard and how does it work?
Memory Guard is an exclusive
QoS Management feature that uses metrics from Cluster Health Monitor to evaluate
the stress of each server in the cluster once a minute. Should it detect a node
has over-committed memory, it will prevent new database requests from being
sent to that node until the current load is relieved. It does this by turning off the services to that node transactionally, at which point existing work will begin to drain off. Once the stress is no longer detected, services will
automatically be started and new connections will resume.
7. How
does QoS Management enable the Private Database Cloud?
The Private Database Cloud
fundamentally depends upon shared resources. Whether deploying a database service
or a separate database, both depend upon being able to deliver performance with
competing workloads. QoS Management provides both the monitoring and management
of these shared resources, thus complementing the flexible deployment of
databases as a service to also maintain a consistent level of performance and
availability.
8. Which
versions of Oracle databases does QoS Management support?
QoS Management is supported
on Oracle RAC EE and RAC One EE databases from 11g Release 2 (11.2.0.2) forward
deployed on Oracle Exadata Database Machine. It is also supported in
Measure-Only Mode with Memory Guard support on Oracle RAC EE and RAC One EE
databases from 11g Release 2 (11.2.0.3) forward. Please consult the Oracle
Database License Guide for details.
9. Is this
a product to be used by an IT administrator or DBA?
The primary user of QoS
Management is expected to be the IT or systems administrator who will have QoS administrative privileges on the RAC cluster. As QoS Management actively manages all of the databases in a cluster, it is not designed for use by the DBA unless that individual also has cluster administration responsibility. DBA-level experience is not required to be a QoS Management administrator.
10. How to
use SCAN and node listeners with different ports?
Oracle SCAN was designed to be the Single Client Access entry point to a database cluster and the various Oracle databases in this cluster. However, most examples assume a simple configuration regarding the ports and number of listeners in the cluster. Basically, the assumption is that 1 SCAN listener, running on 1-3 nodes in the cluster, will work with 1 node listener, running on all of the nodes in the cluster. In addition, most examples assume that both listeners actually use the same port (default 1521).
Quite a few customers, nevertheless, want to use dedicated listeners per database, either on the same or a different port. There is no general requirement to do this with Oracle RAC 11g Release 2, as the overall idea is that any client will use the SCAN as its initial entry point and will then be connected to the respective instance and service, using the node listener on the node where this service is most suitably served.
This assumes that the respective database
that the instance belongs to and that the service is assigned to uses the
correct entries for the LOCAL_LISTENER and REMOTE_LISTENER instance parameters.
The defaults for the case described would be: LOCAL_LISTENER points to the node
listener on the respective node and the REMOTE_LISTENER points to the SCAN. Example:
remote_listener:
cluster1:1521
local_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
(HOST=192.168.0.61)(PORT=1521))))
Any Oracle 11g Rel. 2 database that is
created using the DBCA will use these defaults. In this context, some fundamentals
about listeners in general and the listener architecture in Oracle RAC 11g
Release 2 need to be understood in order to follow the examples below:
_ With Oracle RAC 11g
Release 2 using SCAN is the default.
_ SCAN is a combination of
an Oracle managed VIP and a listener.
_ The SCAN listener
represents a standard Oracle listener used in a certain way.
_ As with other listeners,
there is no direct communication between the node and the SCAN listeners.
_ The listeners are only
aware of the instances and services served, since the instances (PMON) register
themselves and the services they host with the listeners.
_ The instances use the
LOCAL and REMOTE Listener parameters to know which listeners to register with.
_ It is recommended to run any node listener out of the Oracle Grid Infrastructure home, although the home that a listener uses can be specified.
_ Listeners used for a
client connection to Oracle RAC should be managed by Oracle Clusterware and should
be listening on an Oracle managed VIP.
Given these fundamentals, there does not seem to be a compelling use case for multiple listeners or dedicated listeners per database with 11g Rel. 2 RAC, even if they were used in previous versions. The most reasonable use case seems to be manageability, in that some customers prefer to stop a listener to prevent new client connections to an assigned database, as opposed to stopping the respective services on the database, which mainly has the same effect (note that the standard database service - the one that is named after the database name - must not be used to connect clients to an Oracle RAC database anyway, although it is used in this example for simplicity.)
If the motivation to
have this setup is to assign certain listeners as an entry point to certain
clients, note that this would defeat the purpose of SCAN and therefore SCAN
cannot be used anymore. SCAN only supports one address in the TNS connect
descriptor and allows only 1 port assigned to it. This port does not have to be the same as the one that is used for the node listeners (which would be the default), but it should only be one port (Bug 10633024 - SRVCTL ALLOWS
SPECIFYING MORE THAN ONE PORT FOR SCAN (- P PORT1,PORT2,PORT3) - has been filed
for Oracle RAC 11.2.0.2, as this version allows setting more than one port
using SRVCTL). Consequently, a typical client TNSNAMES entry for the client to
connect to any database in the cluster would look like the following:
testscan1521 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL =
TCP)(HOST = cluster1)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = ORCL)
))
In this TNSNAMES entry
"cluster1" is the SCAN name, typically registered in the DNS as
mentioned. This entry will connect any client using "testscan1521" to
any database in the cluster assuming that node listeners are available and the
database is configured accordingly using the following configuration:
remote_listener:
cluster1:1521
local_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)
(HOST=192.168.0.61)(PORT=1521))))
If the motivation to have dedicated listeners for the database is that clients should get different connection strings to connect to the database (e.g. different host entries or ports), SCAN cannot be used and the node listeners need to be addressed directly, as was the case with previous versions of Oracle RAC. In this case, the SCAN is basically not used for client connections. Oracle does not recommend this configuration, but this entry will explain its configuration later on.
Change the port of the
SCAN listeners only
Note 1: In the following, only 1 SCAN listener is used for simplicity.
_ Get the name of the scan
listener: srvctl status scan_listener returns: LISTENER_SCAN1
_ Get the port of the scan
listener: lsnrctl status LISTENER_SCAN1 returns: 1521
_ Change the port of the SCAN listener to the new port 1541: srvctl modify scan_listener -p 1541
_ Restart the SCAN
listener: srvctl stop scan_listener followed by srvctl start scan_listener
_ Double-check using lsnrctl
status LISTENER_SCAN1 -
this should show port 1541
Note 2: Your SCAN listener does not serve any database instance at this point in time, as the database has not been informed about the change in port for the SCAN or its remote listener. In order to have the database instances register with the SCAN listener using the new port, you must alter the REMOTE_LISTENER entry accordingly:
_ alter
system set remote_listener='cluster1:1541' scope=BOTH SID='*';
_ alter
system register;
_ Double-check using lsnrctl
status LISTENER_SCAN1 that
the instances have registered.
With this change the following
configuration has been established:
_ The SCAN listener port
has been changed to port 1541 (was: 1521)
_ The node listeners -
here named LISTENER - still use port 1521
_ In order for clients to
be able to connect, change their TNSNAMES.ora accordingly:
testscan1541 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL =
TCP)(HOST = cluster1)(PORT = 1541))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = ORCL)
))
Add additional node
listeners to the system using different ports
So far, only one node listener (listener
name LISTENER) on the respective node VIP (here: 192.168.0.61) on port 1521 has
been used. The idea of having dedicated listeners per database would mean that
additional node listeners need to be created, using the same IP, but preferably
different ports. In order to achieve this configuration, perform the following
steps (the Grid Infrastructure software owner should have enough privileges to
perform these steps, hence the user is not explicitly mentioned):
_ Add an additional node
listener using port 2011 for example: srvctl add listener -l LISTENER2011
-p 2011
_ Start the new node
listener: srvctl start listener -l LISTENER2011
_ Double-check using: srvctl
status listener -l LISTENER2011
_ Double-check using: lsnrctl
status LISTENER2011
Note 1: The srvctl command
"add listener" does allow specifying an Oracle Home that the newly
added listener will be running from and yet have this listener be managed by
Oracle Clusterware. This entry does not elaborate on these advanced
configurations.
Note 2: Your new node listener
does not serve any database instance at this point in time, as the database has
not been informed that it should connect to the newly created listener. In
order to have the database instances register with this listener, you must alter
the LOCAL_LISTENER entry for each instance accordingly:
_ alter
system set local_listener='(DESCRIPTION= (ADDRESS_LIST= (ADDRESS=
(PROTOCOL=TCP)(HOST=192.168.0.61)(PORT=2011))))'
scope=BOTH SID='ORCL1';
_ alter
system register;
_ Double-check using lsnrctl
status LISTENER2011 that
the instance has registered.
Note 3: It is crucial to use spaces between the various segments of the command as shown above (for example).
Reason: the database agent in Oracle Clusterware currently determines whether
the local_listener or remote_listener have been manually set by a string
comparison operation. If the string looks like it is not manually altered, the
agent will overwrite these parameters with the default values that it
determines on instance startup. In order to prevent a reset of these parameters
at instance startup and thereby make this setting persistent across instance
starts, slight modifications in the string used for this parameter are
required.
ER 11772838 has been filed to allow for a
more convenient mechanism.
Note 4: As the LOCAL_LISTENER
parameter is a per instance parameter, perform this change on all nodes that
the database is running on accordingly.
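For example (a hypothetical sketch, assuming the second node's VIP is 192.168.0.62 and the instance on that node is named ORCL2):
alter system set local_listener='(DESCRIPTION= (ADDRESS_LIST= (ADDRESS= (PROTOCOL=TCP)(HOST=192.168.0.62)(PORT=2011))))' scope=BOTH SID='ORCL2';
alter system register;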
Note 5: This example so far
assumed that only one database (ORCL) is used in the system, with the SCAN name
"cluster1" and now using "LISTENER2011", listening on port
2011, as the new node listener. Before the new node listener was created, the
listener with the name "LISTENER" used to be the default node
listener.
This listener, listening on port 1521, has
not been removed yet and can therefore now be used as a dedicated listener for
additional databases added to the system for example. In order to ensure that
those databases will use this listener, the LOCAL_LISTENER instance parameter
should point to this listener as follows:
local_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.61)(PORT=1521))))
Note 6: The clients'
TNSNAMES.ora files do not need to be modified in this case, as the SCAN remains
as the primary entry point for clients to connect to databases in the cluster. This
is the beauty of SCAN. With this change the following configuration has
been established:
_ The SCAN listener port
remains on port 1541 (was: 1521)
_ The node listener used
by database ORCL is now called LISTENER2011, listening on port 2011
_ In order for clients to
be able to connect to this database, no change to their TNSNAMES.ora is
required.
They still use:
testscan1541 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL =
TCP)(HOST = cluster1)(PORT = 1541))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = ORCL)
))
_ Even if more databases are added to the cluster, using the default node listener "LISTENER", still listening on port 1521 in this example, the client TNSNAMES.ora would not change. Again, this is the beauty of SCAN.
Use the node listeners
as the primary entry point directly
Continuing the previous example, the
following configuration is assumed for the next steps:
_ The SCAN listener port
remains on port 1541 - SCAN name is "cluster1"
_ The node listener used
by database ORCL is now called LISTENER2011, listening on port 2011
_ The node listener used
by database FOOBAR is called LISTENER, listening on port 1521
In order for clients to connect to the databases ORCL and FOOBAR without using SCAN, a TNSNAMES.ora entry for each database must be used. The pre-Oracle 11g Rel. 2 RAC paradigm must be followed in this case. Hence, a typical TNSNAMES.ora entry for the example used here would look like the following:
ORCL =
(DESCRIPTION =
(ADDRESS = (PROTOCOL =
TCP)(HOST = node1)(PORT = 2011))
(ADDRESS = (PROTOCOL =
TCP)(HOST = node2)(PORT = 2011))
(ADDRESS = (PROTOCOL =
TCP)(HOST = node...)(PORT = 2011))
(ADDRESS = (PROTOCOL =
TCP)(HOST = nodeN)(PORT = 2011))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = ORCL)
))
FOOBAR =
(DESCRIPTION =
(ADDRESS = (PROTOCOL =
TCP)(HOST = node1)(PORT = 1521))
(ADDRESS = (PROTOCOL =
TCP)(HOST = node2)(PORT = 1521))
(ADDRESS = (PROTOCOL =
TCP)(HOST = node...)(PORT = 1521))
(ADDRESS = (PROTOCOL =
TCP)(HOST = nodeN)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = FOOBAR)
))
Each database (ORCL and FOOBAR) on the
other hand must be adjusted to register with the local and remote listener(s)
logically "assigned" to the respective database. This means for ORCL's
first instance:
local_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)
(PORT=2011))))
remote_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS
= (PROTOCOL = TCP)(HOST =node2)(PORT = 2011))(ADDRESS = (PROTOCOL = TCP)(HOST =
node...)(PORT = 2011))(ADDRESS = (PROTOCOL = TCP)(HOST = nodeN)(PORT = 2011))))
For FOOBAR's first instance this
means:
local_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)
(PORT=1521))))
remote_listener:(DESCRIPTION=(ADDRESS_LIST=(ADDRESS
= (PROTOCOL = TCP)(HOST =
node2)(PORT =
1521))(ADDRESS = (PROTOCOL = TCP)(HOST = node...)(PORT = 1521))
(ADDRESS = (PROTOCOL =
TCP)(HOST = nodeN)(PORT = 1521))))
Note 1: Unlike when using
SCAN, you can use a server side TNSNAMES.ora to resolve the local and remote listener
parameters as it used to be recommended for pre-Oracle RAC 11g Release 2
databases. With Oracle RAC 11g Rel. 2, the use of SCAN would make this
unnecessary.
Note 2: Avoiding the necessity to set parameters for each database and to change those every time the cluster and the databases change with respect to the number of nodes is the reason you should use SCAN.
11. How to change the SCAN configuration
after the Oracle Grid Infrastructure 11g Release 2
installation is complete?
Use SRVCTL to modify the SCAN. In order to make the cluster aware of the modified SCAN configuration, delete the entry in the hosts-file or make sure that the new DNS entry reflects the change (depending on where you have set up your SCAN name resolution in the first place) and then issue "srvctl modify scan -n <scan_name>" as the root user on one node in the cluster.
The scan_name provided can be the
existing fully qualified name (or a new name), but should be resolved through
DNS, having 3 IPs associated with it. The remaining reconfiguration is then
performed automatically. A successful reconfiguration will result in 3 SCAN VIPs and 3 SCAN_LISTENERS in the cluster, enabling load balancing of connections to databases running in the cluster. Each SCAN_LISTENER listens on
one of the SCAN VIP addresses.
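The result can be double-checked with SRVCTL, for example (a sketch; run on any node in the cluster):
srvctl config scan
srvctl config scan_listener
srvctl status scan_listener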
Most changes to the SCAN configuration
can be performed using 'srvctl modify scan'. This includes name changes
(changes to the SCAN name) and IP address changes (assuming that the new IP
addresses are taken from the same subnet as the old ones). Removing and
adding-back the SCAN configuration should not be required. However, the SCAN
listeners may need to be restarted using 'srvctl stop / start scan' to reflect
an IP address change, if the IP addresses were changed.
Also note that updating the SCAN name might require changing the remote_listener settings for the various Oracle RAC databases in the cluster, since the default configuration is to have the remote_listener parameter for an Oracle RAC database point to the SCAN name. If the SCAN name changes, the parameter needs to be updated manually for each database.
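For example (a hypothetical sketch, assuming the new SCAN name is newcluster-scan.example.com and the SCAN port is 1541):
alter system set remote_listener='newcluster-scan.example.com:1541' scope=BOTH SID='*';
alter system register;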
12. Why am I only using 1 out of 3 SCAN
IP addresses?
The SCAN name must be set up to round-robin across 3 IP addresses. This requires SCAN name resolution via either DNS or the new Oracle Grid Naming Service (GNS). Using the hosts-file (Linux: /etc/hosts), you will only get 1 SCAN IP, and you cannot work around this other than by using the aforementioned DNS or GNS based name resolution.
Trying to work around this restriction by
setting up a hosts-file entry like the following one will not work as expected
and should therefore be avoided, since it is a non-conformant use of the
hosts-file:
# SCAN addr
192.21.101.74 rac16-cluster.example.com
rac16-cluster
192.21.101.75 rac16-cluster.example.com
rac16-cluster
192.21.101.76 rac16-cluster.example.com
rac16-cluster
Even with such a hosts-file entry, you
will only get 1 SCAN VIP and 1 SCAN Listener.
If you have set up a DNS based SCAN name resolution and you still notice that the client only uses one IP address (out of the three IP addresses that are resolved via SCAN), make sure that the SCAN addresses are returned by the DNS in a round-robin manner. You can check the SCAN configuration in DNS using “nslookup”. If your DNS is set up to provide round-robin access to the IPs resolved by the SCAN entry, then run the “nslookup” command at least twice to see the round-robin algorithm work. The result should be that each time, “nslookup” returns the set of 3 IPs in a different order.
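For example (a sketch, reusing the hypothetical SCAN name from the example above; repeat the command and compare the order of the returned addresses):
nslookup rac16-cluster.example.com
nslookup rac16-cluster.example.com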
13. How to
install Oracle Grid Infrastructure using SCAN without using DNS?
Oracle Universal Installer (OUI) enforces
providing a SCAN resolution during the Oracle Grid Infrastructure installation,
since the SCAN concept is an essential part of the creation of Oracle RAC 11g Release 2 databases in the cluster. All Oracle Database 11g Release 2 tools
used to create a database (e.g. the Database Configuration Assistant (DBCA), or
the Network Configuration Assistant (NetCA)) would assume its presence. Hence,
OUI will not let you continue with the installation until you have provided a
suitable SCAN resolution.
However, in order to overcome the installation requirement without setting up a DNS-based SCAN resolution, you can use a hosts-file based workaround. In this case, you would use a typical hosts-file entry to resolve the SCAN to only 1 IP address and one IP address only. It is not possible to simulate the round-robin resolution that the DNS server does using a local hosts-file: the hosts-file look-up the OS performs will only return the first IP address that matches the name, nor will you be able to resolve the SCAN to more than one IP address using a single entry (one line in the hosts-file). Thus, you will create only 1 SCAN for the cluster. (Note that you will have to change the hosts-file on all nodes in the cluster for this purpose.)
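A hypothetical single-line hosts-file entry for this workaround, reusing the SCAN name and one of the addresses from the example above, might look like this:
# SCAN addr (install-time workaround only - one address, one line)
192.21.101.74 rac16-cluster.example.com rac16-cluster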
This workaround might also be used when
performing an upgrade from former (pre-Oracle Database 11g Release 2) releases.
However, it is strongly recommended to enable the SCAN configuration as
described under “Option 1” or “Option 2” above shortly after the upgrade or the
initial installation. In order to make the cluster aware of the modified SCAN
configuration, delete the entry in the hosts-file and then issue "srvctl modify scan -n <scan_name>" as the root user on one node in the cluster. The
scan_name provided can be the existing fully qualified name (or a new name),
but should be resolved through DNS, having 3 IPs associated with it, as
discussed. The remaining reconfiguration
is then performed automatically.
14. How
can I add more SCAN VIPs or listeners not using DNS?
You can only create the 3 SCAN VIPs and 3
SCAN Listeners across the cluster, if you have a DNS alias either at installation
time or later. You need to resolve the SCAN Name to those formerly mentioned 3
IP addresses at the moment of creation or when modifying the SCAN. This is how
they get created - the IPs that are resolved by the SCAN DNS entry are read and
the respective VIPs get created.
If you have no DNS at hand at all, especially not for the servers in your cluster, you will not get 3 SCAN VIPs in your cluster; you will have only 1 VIP, which can be considered a single point of failure.
This means that you have 2 choices: you can either live with this configuration and the respective consequences, OR you can fall back to using the node VIPs of the cluster to connect your clients, neither of which is recommended, as mentioned in My Oracle Support note with Doc ID 887522.1.
15. Is it recommended that we put the
OCR/Voting Disks in Oracle ASM and, if so, is it preferable to
create a separate disk group for them?
With Oracle Grid Infrastructure 11g
Release 2, it is recommended to put the OCR and Voting Disks in Oracle ASM,
using the same disk group you use for your database data. For the OCR it is
also recommended to put another OCR location into a different disk group
(typically, the Fast Recovery Area disk group) to provide additional protection
against logical corruption, if available.
Using the same disk group for the Oracle Clusterware files (OCR and Voting Disks) simplifies storage management (you do not have to create special devices to store those files) and centralizes it (all Oracle related files are stored and managed in Oracle ASM), using the same characteristics for the data stored.
If the Voting Disks are stored in an Oracle ASM disk group, the number of Voting Disks that will be created in this disk group and for the cluster is determined by the redundancy level of the respective disk group. The Voting Disks for a particular cluster can only reside in one disk group.
In case "external redundancy"
has been chosen for the disk group that holds the database data, it is assumed that
an external mechanism (e.g. RAID) is used to protect the database data against
disk failures. The same mechanism can therefore be used to protect the Oracle
Clusterware files, including the Voting Disk (only one Voting Disk is created).
Under certain circumstances, one may want
to create a dedicated disk group for the Oracle Clusterware files (OCR and
Voting Disks), separated from the existing database data containing disk
groups. This should not be required, but can be configured. Potential scenarios
include, but are not limited to:
A 1:1 relationship between disk groups
and databases is preferred and disk groups are generally not shared amongst
databases.
The backup and recovery for individual
databases (more than one in the cluster) is based on a snapshot restore
mechanism (BCVs). This approach is most likely used in conjunction with a 1:1
disk group to database relationship as mentioned before.
Certain frequent, system-specific maintenance tasks in uncommon cases require unmounting specific disk groups containing database data. This scenario can most likely be avoided using a different approach for those maintenance tasks.
For some reason, a higher protection level is required for the Oracle Clusterware files than the one provided by the "external redundancy" disk groups, and therefore for the database data.
16. How to efficiently recover from a
loss of an Oracle ASM disk group containing the Oracle Clusterware files?
If an Oracle ASM disk group containing
Oracle database data and the Oracle Clusterware files is lost completely, the
system needs to be restored starting with the restore of the Oracle Clusterware
files affected.
Note: Oracle recommends having two disk groups as a standard deployment scenario: the database data containing disk group (commonly referred to as the DATA disk group) and the backup data containing disk group (commonly referred to as the FRA disk group). In this configuration, the Oracle Voting File(s) and the first Oracle Cluster Registry (OCR) location should share the same disk group as the Oracle Database data, here the DATA disk group. A second OCR location should be placed into the second disk group, here FRA, using "ocrconfig -add +FRA" as root, while the cluster is running.
A complete failure of the FRA disk group
would be without effect for the overall cluster operation in this case.
A complete failure of the DATA disk group
instead will require a restore of the Oracle Voting Files and the Oracle
database data that were formerly stored in this disk group.
The most
efficient restore procedure in this case is outlined as follows:
Start the cluster in exclusive mode on
one node using "crsctl start crs -excl" (root
access required).
Ensure that the cluster is running
properly using "crsctl check crs" and that the
FRA disk group is mounted.
The FRA disk group contains the copy of
the OCR that contains a backup of the Voting Disk data required to restore the
Voting Disk(s).
IF the Cluster Ready Service Daemon
(CRSD) is not running AND an "ocrcheck" fails, you will need
to mark the FRA disk group as the only surviving OCR location using "ocrconfig
-overwrite", followed by a "crsctl stop crs" to
stop the cluster. You will then need to restart the cluster on one node in
exclusive mode again using "crsctl start crs -excl" (root
access required), since the Voting Disks still need to be restored.
Use "crsctl query css votedisk"
to retrieve the list of voting files currently defined.
Use "crsctl replace votedisk +FRA"
assuming the best practices configuration to restore the Voting Files into the
FRA disk group, since the DATA disk group has not been restored yet. The Voting
Files can be replaced later, if required.
Stop the cluster using "crsctl
stop crs".
Start the cluster in normal mode using
"crsctl start crs" - on all nodes in the cluster, as
desired and ensure proper cluster operation using "crsctl check crs".
Re-create the DATA disk group using the appropriate method foreseen in your restore procedure. IF this procedure does not restore the OCR in the DATA disk group (most likely), re-add the second OCR location (the first location is now in the FRA disk group) using "ocrconfig -delete +DATA", followed by "ocrconfig -add +DATA" (note: the DATA disk group must be mounted on all nodes in the cluster at this time).
The re-creation of the data in an Oracle
ASM disk group is typically performed by re-creating the DATA disk group and
restoring the database data as required and documented.
Note: In case your Backup and Recovery
scenario is based on BCV copies of the Oracle ASM disk groups, the same
procedure as described above applies, except for the last step:
To restore the DATA disk group, use the
BCV copy and mount the disk group once re-created. With the restore of the DATA disk group, the former Oracle Clusterware files are restored as well. This has no effect on the Voting Disks: remaining, former Voting Disk data in the freshly restored DATA disk group is automatically discarded. The OCR location being restored with the DATA disk group is automatically synced with the OCR location present in the FRA disk group, at the latest at the next cluster restart or when a new OCR writer is chosen.
17. How do
I explain the following phrase in the "Oracle® Clusterware Administration
and Deployment Guide 11g Release 2 (11.2)" to a customer?
Page
2-27:"If Oracle ASM fails, then OCR is not accessible on the node on which
Oracle ASM failed, but the cluster remains operational. The entire cluster only
fails if the Oracle ASM instance on the OCR master node fails, if the majority
of the OCR locations are in Oracle ASM, and if there is an OCR read or write
access, then the crsd stops and the node becomes inoperative."
This was a documentation bug and has been
fixed.
Here is the updated write up (posted in
the online version):
If an Oracle ASM instance fails on any
node, then OCR becomes unavailable on that particular node. If the crsd process
running on the node affected by the Oracle ASM instance failure is the OCR
writer, the majority of the OCR locations are stored in Oracle ASM, and you
attempt I/O on OCR during the time the Oracle ASM instance is down on this
node, then crsd stops and becomes inoperable.
Cluster management is now affected on this particular node. Under no
circumstances will the failure of one Oracle ASM instance on one node affect
the whole cluster.
18. If the root.sh script fails on a
node during the install of the Grid Infrastructure with Oracle
Database 11g Release 2, can I re-run it?
Yes; however, you should first fix the problem that caused it to fail, and only then run:
GRID_HOME/crs/install/rootcrs.pl -delete -force
Then rerun root.sh.
19. Is the GNS recommended for most
Oracle RAC installations?
The Grid Naming Service (GNS) is a part
of the Grid Plug and Play feature of Oracle RAC 11g Release 2. It provides name
resolution for the cluster. If you have a larger cluster (greater than 4-6
nodes) or a requirement to have a dynamic cluster (you expect to add or remove
nodes in the cluster), then you should implement GNS. If you are implementing a small cluster (4 nodes or less), you do not need to add GNS. Note: Selecting GNS during the install assumes that you have a DHCP server running on the public subnet where Oracle Clusterware can obtain IP addresses for the Node VIPs and the SCAN VIPs.
20. If a
current customer has an Enterprise License Agreement (ELA), are they entitled
to use Oracle
RAC One
Node?
Yes, assuming the existing ELA/ULA
includes Oracle RAC. The license guide states that all Oracle RAC option
licenses (not SE RAC) include all the features of Oracle RAC One Node.
Customers with existing RAC licenses or Oracle RAC ELA's can use those licenses
as Oracle RAC One Node. This amounts to "burning" a Oracle RAC
license for Oracle RAC One Node, which is expensive long term. Obviously if the
ELA/ULA does not include Oracle RAC, then they are not entitled to use Oracle
RAC One Node.
21. Does
Rac One Node make sense in a stretch cluster environment?
Yes. However, remember that most stretch cluster implementations also implement separate storage arrays at both locations. So write latency is still an issue that must be considered, since ASM
is still writing blocks to both sites. Anything beyond a metro area
configuration is likely to introduce too much latency for the application to
meet performance SLAs.
22. How
does RAC One Node compare with virtualization solutions like VMware?
RAC One Node offers greater benefits and
performance than VMware in the following ways:
- Server Consolidation: VMware
offers physical server consolidation but imposes a 10%+ processing overhead to
enable this consolidation and have the hypervisor control access to the system's
resources. RAC One Node enables both physical server consolidation as well as
database consolidation without the additional overhead of a hypervisor-based
solution like VMware.
- High Availability: VMware
offers the ability to fail over a failed virtual machine – everything running
in that vm must be restarted and connections re-established in the event of a
virtual machine failure.
VMware cannot detect a failed process
within the vm – just a failed virtual machine. RAC One Node offers a
finer-grained, more intelligent and less disruptive high availability model.
RAC One Node can monitor the health of the database within a physical or
virtual server. If it fails, RAC One Node will either restart it or migrate the
database instance to another server. Oftentimes, database issues or problems
will manifest themselves before the whole server or virtual machine is
affected. RAC One Node will discover these problems much sooner than a VMware solution and take action to correct them.
Also, RAC One Node allows database and OS
patches or upgrades to be made without taking a complete database outage. RAC
One Node can migrate the database instance to another server, patches or
upgrades can be installed on the original server and then RAC One Node will
migrate the instance back. VMware offers a facility, Vmotion, that will do a
memory-to-memory transfer from one virtual machine to another. This DOES NOT
allow for any OS or other patches or upgrades to occur in a non-disruptive
fashion (an outage must be taken). It does allow for the hardware to be dusted and
vacuumed, however.
- Scalability: VMware allows
you to “scale” on a single physical server by instantiating additional virtual
machines – up to an 8-core limit per vm. RAC One Node allows online scaling by
migrating a RAC One Node implementation from one server to another, more
powerful server without taking a database outage. Additionally, RAC One Node allows
further scaling by allowing the RAC One Node to be online upgraded to a full
Real Application Clusters implementation by adding additional database
instances to the cluster thereby gaining almost unlimited scalability.
- Operational Flexibility and Standardization: VMware only works on x86-based servers. RAC One Node will be available for all of the platforms that Oracle Real Application Clusters supports, including Linux, Windows, Solaris, AIX, and HP-UX.
23. Can I
use Oracle RAC One Node for Standard Edition Oracle RAC?
No, Oracle RAC One Node is only part of
Oracle Database 11g Release 2 Enterprise Edition. It is not licensed or
supported for use with any other editions.
24. What
is RAC One Node Omotion?
Omotion is a utility that is distributed
as part of Oracle RAC One Node. The Omotion utility allows you to move the
Oracle RAC One Node instance from one node to another in the cluster. There are several reasons you may want to move the instance: for example, the node is overloaded and you need to balance the workload by moving the instance, or you need to do some operating system maintenance on the node but want to avoid an outage for application users by moving the instance to another node in the cluster.
25. What
is Cluster Health Monitor (IPD/OS)?
This tool (formerly known as
Instantaneous Problem Detection tool) is designed to detect and analyze operating
system (OS) and cluster resource related degradation and failures in order to bring more explanatory power to many issues that occur in clusters where Oracle Clusterware and Oracle RAC are running, such as node evictions.
It tracks the OS resource consumption at
each node, process, and device level continuously. It collects and analyzes the
cluster-wide data. In real time mode, when thresholds are hit, an alert is
shown to the operator.
For root cause analysis, historical data
can be replayed to understand what was happening at the time of failure.
26. What
OS does Cluster Health Monitor (IPD/OS) support?
Cluster Health Monitor (IPD/OS) is a
standalone tool that should be installed on all clusters where you are using
Oracle Real Application Clusters (RAC). It is independent of the Oracle
Database or Oracle Clusterware version used.
Cluster Health Monitor (IPD/OS) is
currently supported on Linux (requires Linux Kernel version greater than or
equal to 2.6.9) and Windows (requires at least Windows Server 2003 with service
pack 2).
It supports both 32-bit and 64-bit installations. The client installation requires the 32-bit Java SDK.
27. What is
Oracle’s goal in developing QoS Management?
QoS Management is a full Oracle stack development effort to provide effective runtime management of datacenter SLAs by ensuring that, when there are sufficient resources to meet all objectives, they are properly allocated, and that, should demand or failures exceed capacity, the most business-critical SLAs are preserved at the cost of less critical ones.
28. What
type of applications does Oracle QoS Management manage?
QoS Management is currently able to
manage OLTP open workload types for database applications where clients or
middle tiers connect to the Oracle database through OCI or JDBC. Open workloads
are those whose demand is unaffected by increases in response time and are
typical of Internet-facing applications.
29. What
does QoS Management manage?
In datacenters where applications share
databases or databases share servers, performance is made up of the sum of the
time spent using and waiting to use resources. Since an application’s use of resources is controlled during development, test, and tuning, it cannot be managed at runtime; however, the wait for resources can. QoS Management manages resource wait times.
30. What
types of resources does QoS Management manage?
Currently QoS Management manages CPU
resources both within a database and between databases running on shared or
dedicated servers. It also monitors wait times for I/O, Global Cache, and Other
database waits.
31. What
type of user interfaces does QoS Management support?
QoS Management is integrated into
Enterprise Manager Database Control 11g Release 2 and Enterprise Manager 12c
Cloud Control and is accessible from the cluster administration page.
32. What
QoS Management functionality is in Oracle Enterprise Manager?
Enterprise Manager supports the full range of QoS Management functionality organized by task. A Policy Editor wizard presents a simple workflow that specifies the server pools to manage, defines performance classes that map to the database applications and their associated SLAs or objectives, and specifies performance policies that contain performance objectives and relative rankings for each performance class as well as baseline server pool resource allocations. An easy-to-monitor dashboard presents the entire cluster performance status at a glance as well as recommended actions should resources need to be re-allocated due to performance issues. Finally, a set of comprehensive graphs tracks the performance and metrics of each performance class.
33. What
types of performance objectives can be set?
QoS Management currently supports
response time objectives. Response time objectives up to one second for
database client requests are supported. Additional performance objectives are
planned for future releases.
34. Does
QoS Management require any specific database deployment?
Oracle databases must be created as RAC
or RAC One Node Policy-Managed databases. This means the databases are deployed
in one or more server pools and applications and clients connect using
CRS-managed database services. Each managed database must also have Resource
Manager enabled and be enabled for QoS Management. It is also recommended that
connection pools that support Fast Application Notification (FAN) events be
used for maximum functionality and performance management.
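As a sketch (all names below - database orcl, service sales, server pool srvpool1 - are hypothetical), a policy-managed database service can be created and started with SRVCTL as follows:
srvctl add service -d orcl -s sales -g srvpool1 -c UNIFORM
srvctl start service -d orcl -s sales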
35. How is
Oracle RAC One Node licensed and priced?
Oracle RAC One Node is an option to the
Oracle Database Enterprise Edition and licensed based upon the number of CPUs
in the server on which it is installed. Current list price is $10,000 per CPU
(Check price list). Unlike the Oracle RAC feature, Oracle RAC One Node is not
available with the Oracle Standard Edition. Oracle RAC One Node licensing also
includes the 10-day rule, allowing a database to relocate to another node for
up to 10 days per year, without incurring additional licensing fees. This is
most often used in the case of failover, or for planned maintenance and
upgrading. Only one node in the cluster can be used for the 10-day rule.
36. Is
Oracle RAC One Node supported with 3rd party clusterware and/or 3rd party CFS?
No. Oracle RAC One Node is only supported
with version 11.2 (and above) of Oracle grid infrastructure.
37. How do
I check whether OCFS is properly configured?
You can use the component command 'cfs'
to check this. Provide the OCFS file system you want to check through the -f
argument. Note that the sharedness check for the file system is supported for OCFS version 1.0.14 or higher.
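For example (a sketch with a hypothetical mount point; -n all runs the check on all cluster nodes):
cluvfy comp cfs -f /u02/oradata -n all -verbose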
38. Is
there a way to verify that the Oracle Clusterware is working properly before
proceeding with RAC install?
Yes. You can use the post-check command for cluster services setup (-post clusvc) to verify CRS status. A more appropriate test would be to use the pre-check command for database installation (-pre dbinst). This will check whether the current state of the system is suitable for a RAC install.
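As a sketch, using the option names given above (exact stage names can vary by CVU version; node1,node2 is a hypothetical node list):
cluvfy stage -post clusvc -n node1,node2 -verbose
cluvfy stage -pre dbinst -n node1,node2 -verbose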
39. What
about discovery? Does CVU discover installed components?
At present, CVU discovery is limited to the following components. CVU discovers available network interfaces if you do not
specify any interface or IP address in its command line. For storage related
verification, CVU discovers all the supported storage types if you do not
specify a particular storage. CVU discovers CRS HOME if one is available.
40. How does RAC One Node compare with traditional cold fail over solutions like HP Serviceguard, IBM HACMP, Sun Cluster, and Symantec/Veritas Cluster Server?
RAC One Node is a better high
availability solution than traditional cold fail over solutions.
RAC One Node operates in a cluster but
only a single instance of the database is running on one node in the cluster.
If that database instance has a problem, RAC One Node detects that and can
attempt to restart the instance on that node. If the whole node fails, RAC One
Node will detect that and will bring up that database instance on another node
in the cluster. Unlike traditional cold failover solutions, Oracle Clusterware
will send out notifications (FAN events) to clients to speed reconnection after
failover. 3rd-party solutions may simply wait for potentially lengthy timeouts
to expire.
RAC One Node goes beyond the traditional
cold fail over functionality by offering administrators the ability to proactively
migrate instances from one node in the cluster to another. For example, let's
say you wanted to do an upgrade of the operating system on the node that the
RAC One Node database is running on. The administrator would activate
"OMotion," a new Oracle facility that would migrate the instance to
another node in the cluster. Once the instance and all of the connections have
migrated, the server can be shut down, upgraded and restarted. OMotion can then
be invoked again to migrate the instance and the connections
back to the now-upgraded node. This
non-disruptive rolling upgrade and patching capability of RAC One Node exceeds
the current functionality of the traditional cold fail over solutions.
Also, RAC One Node provides a load
balancing capability that is attractive to DBAs and Sys Admins. For example, if
you have two different database instances running on a RAC One Node Server and
it becomes apparent that the load against these two instances is impacting
performance, the DBA can invoke OMotion and migrate one of the instances to
another less-used node in the cluster. RAC One Node offers this load balancing
capability, something that the traditional cold fail over solutions do not.
Lastly, many 3rd-party solutions do not support ASM storage. This can slow down failover and prevent consolidation of storage across multiple databases, increasing the management burden on the DBA.
The following table summarizes the
differences between RAC One Node and 3rd-party fail over solutions:
Feature: Out of the box experience
- RAC One Node: provides everything necessary to implement database failover.
- EE plus 3rd-party clusterware: 3rd-party fail over solutions require a separate install and a separate management infrastructure.
Feature: Single vendor
- RAC One Node: 100% supported by Oracle.
- EE plus 3rd-party clusterware: EE is supported by Oracle, but the customer must rely on the 3rd party to support their clusterware.
Feature: Fast failover
- RAC One Node: supports FAN events to send notifications to clients after failovers and to speed re-connection.
- EE plus 3rd-party clusterware: 3rd-party fail over solutions rely on timeouts for clients to detect failover and initiate a reconnection. It could take several minutes for a client to detect there had been a failover.
Feature: Rolling DB patching, OS, Clusterware, and ASM patching and upgrades
- RAC One Node: can migrate a database from one server to another to enable online rolling patching. Most connections should migrate with no disruption.
- EE plus 3rd-party clusterware: 3rd-party solutions must be failed over from one node to another, which means all connections will be dropped and must reconnect. Some transactions will be dropped and must reconnect. Reconnection could take several minutes.
Feature: Workload management
- RAC One Node: can migrate a database from one server to another while online to enable load balancing of databases across servers in the cluster. Most connections should migrate with no disruption.
- EE plus 3rd-party clusterware: 3rd-party solutions must be failed over from one node to another, which means all connections will be dropped and must reconnect. Some transactions will be dropped and must reconnect. Reconnection could take several minutes.
Feature: Online scale out
- RAC One Node: online upgrade to multi-node RAC.
- EE plus 3rd-party clusterware: complete reinstall including Oracle Grid Infrastructure is required.
Feature: Standardized tools and processes
- RAC One Node: RAC and RAC One Node use the same tools, management interfaces, and processes.
- EE plus 3rd-party clusterware: EE and RAC use different tools, management interfaces, and processes. 3rd-party clusterware requires additional interfaces.
Feature: Storage virtualization
- RAC One Node: supports use of ASM to virtualize and consolidate storage. Because it’s shared across nodes, it eliminates the lengthy failover of volumes and file systems.
- EE plus 3rd-party clusterware: traditional 3rd-party solutions rely on local file systems and volumes that must be failed over. Large volumes can take a long time to fail over. Dedicated storage is also more difficult to manage.
41. How
does RAC One Node compare with a single instance Oracle Database protected with
Oracle
Clusterware?
The following summarizes the differences between RAC One Node and an EE database protected with Oracle Clusterware:
Feature: Out of the box experience
- RAC One Node: a complete solution that provides everything necessary to implement a database protected from failures by a failover solution.
- EE plus Oracle Clusterware: using Oracle Clusterware to protect an EE database is possible by customizing some sample scripts we provide to work with EE. This requires custom script development by the customer, and they need to set up the environment and install the scripts manually.
Feature: Supportability
- RAC One Node: 100% supported.
- EE plus Oracle Clusterware: while EE is 100% supported, the scripts customized by the customer are not supported by Oracle.
Feature: DB Control support
- RAC One Node: fully supports failover of DB Control in a transparent manner.
- EE plus Oracle Clusterware: DB Control must be reconfigured after a failover (unless the customer scripts are modified to support DB Control failover).
Feature: Rolling DB patching, OS, Clusterware, and ASM patching and upgrades
- RAC One Node: can online migrate a database from one server to another to enable online rolling patching. Most connections should migrate with no disruption.
- EE plus Oracle Clusterware: EE must be failed over from one node to another, which means all connections will be dropped and must reconnect. Some transactions will be dropped and must reconnect. Reconnection could take several minutes.
Feature: Workload management
- RAC One Node: can online migrate a database from one server to another to enable load balancing of databases across servers in the cluster. Most connections should migrate with no disruption.
- EE plus Oracle Clusterware: EE must be failed over from one node to another, which means all connections will be dropped and must reconnect. Some transactions will be dropped and must reconnect. Reconnection could take several minutes.
Feature: Online scale out
- RAC One Node: online upgrade to multi-node RAC.
- EE plus Oracle Clusterware: take a DB outage and re-link to upgrade to multi-node RAC, then re-start the DB.
Feature: Standardized tools and processes
- RAC One Node: RAC and RAC One Node use the same tools, management interfaces, and processes.
- EE plus Oracle Clusterware: EE and RAC use different tools, management interfaces, and processes.
42. What
is Oracle Real Application Clusters One Node (RAC One Node)?
Oracle RAC One Node is an option
available with Oracle Database 11g Release 2. Oracle RAC One Node is a single
instance of Oracle RAC running on one node in a cluster. This option adds to
the flexibility that Oracle offers for reducing costs via consolidation. It
allows customers to more easily consolidate their less mission critical, single
instance databases into a single cluster, with most of the high availability
benefits provided by Oracle Real Application Clusters (automatic
restart/failover, rolling
patches, rolling OS and clusterware
upgrades), and many of the benefits of server virtualization solutions like VMware.
RAC One Node offers better high availability functionality than traditional cold failover cluster solutions because of a new Oracle technology, Omotion, which is able to intelligently relocate database instances and connections to other cluster nodes for high availability and system load balancing.
43. If I
add or remove nodes from the cluster, how do I inform RAC One Node?
You must re-run raconeinit to update the
candidate server list for each RAC One Node Database.
44. Is RAC
One Node supported with database versions prior to 11.2?
No. RAC One Node requires at least
version 11.2 of Oracle Grid Infrastructure, and the RAC One Node database must
be at least 11.2. Earlier versions of the rdbms can coexist with 11.2 RAC One
Node databases.
45. How do
I get Oracle Real Application Clusters One Node (Oracle RAC One Node)?
Oracle RAC One Node is only available
with Oracle Database 11g Release 2. Oracle Grid Infrastructure for 11g Release
2 must be installed as a prerequisite. Download and apply Patch 9004119 to your
Oracle RAC 11g Release 2 home in order to obtain the code associated with RAC
One Node. (This patch was released after 11.2.0.1 and is only available for Linux.) Support for other platforms will be added with 11.2.0.2.
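A minimal sketch of applying the patch with OPatch (the zip file name is hypothetical; always follow the steps and downtime requirements in the patch README):
unzip p9004119_112010_Linux.zip
cd 9004119
$ORACLE_HOME/OPatch/opatch apply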
46. Does
Enterprise Manager Support RAC One Node?
Yes, you can use Enterprise Manager DB Console to manage RAC One Node databases. Note that in 11.2.0.1, when you run raconeinit, the instance name is changed. You should therefore configure EM DB Console after running raconeinit; in addition, after every instance relocation (Omotion) or failover, EM DB Console will need to be reconfigured to see the new instance on the new node. This can be done using emca and is the same as adding any new database to the configuration. In the future (11.2.0.2), EM will support RAC One Node databases out of the box, so EM will be able to detect when the instance is migrated or failed over to another node.
47. How does RAC One Node compare with database DR products like Data Guard or GoldenGate?
The products are entirely complementary. RAC One Node is designed to protect a single database. It can be used for rolling database patches, OS upgrades/patches, and grid infrastructure (ASM/Clusterware) rolling upgrades and patches. This is less disruptive than switching to a database replica. Switching to a replica for patching, or for
upgrading the OS or grid infrastructure requires that you choose to run
Active/Active (and deal with potential conflicts) or Active/Passive (and wait
for work on the active primary database to drain before allowing work on the
replica). You need to make sure replication supports all data types you are
using. You need to make sure the replica can keep up with your load. You need
to figure out how to re-point your clients to the replica (not an issue with
RAC One Node because it's the same database, and we use VIPs). And lastly, RAC
One Node allows a spare node to be used 10 days per year without licensing. Our
recommendation is to use RAC or RAC One Node to protect from local failures and
to support rolling maintenance activities. Use Data Guard or replication
technology for DR, data protection, and for rolling database upgrades. Both are
required as part of a comprehensive HA solution.
48. How do
I install the command line tools for RAC One Node?
The command line tools are installed when
you install the RAC One Node patch 9004119 on top of 11.2.0.1.
49. Are we
certifying applications specifically for RAC One Node?
No. If the 3rd party application is
certified for Oracle Database 11g Release 2 Enterprise Edition, it is certified
for RAC One Node.
50. How do I check the Oracle
Clusterware stack and other sub-components of it?
Cluvfy provides commands to check a
particular sub-component of the CRS stack as well as the whole CRS stack. You
can use the 'comp ocr' command to check the integrity of OCR. Similarly,
you can use 'comp crs' and 'comp clumgr' commands to check
integrity of the CRS and cluster manager sub-components. To check the entire CRS stack, run the stage command 'cluvfy stage -post crsinst'.
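For example (a sketch; -n all runs the checks on all cluster nodes and -verbose prints per-node details):
cluvfy comp ocr -n all -verbose
cluvfy comp crs -n all -verbose
cluvfy comp clumgr -n all -verbose
cluvfy stage -post crsinst -n all -verbose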