341. interfaces to provide High Availability and/or Load
Balancing for my interconnect with Oracle Clusterware?
Windows - available solutions: Teaming
On Windows, teaming solutions to ensure NIC availability are usually part of the network card driver. They therefore depend on the network card used. Please contact the respective hardware vendor for more information.
OS independent solution: Redundant Interconnect Usage
Redundant Interconnect Usage enables load balancing and high availability across multiple (up to four) private networks (also known as interconnects).
Oracle RAC 11g Release 2, Patch Set One (11.2.0.2), enables Redundant Interconnect Usage as a feature for all platforms except Windows.
On systems that use Solaris Cluster, Redundant Interconnect Usage will use clprivnet.
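As an illustration only (the interface names and subnets below are placeholders), additional private interfaces for Redundant Interconnect Usage are typically registered with the oifcfg utility:
$ oifcfg getif
$ oifcfg setif -global eth2/192.168.10.0:cluster_interconnect
$ oifcfg setif -global eth3/192.168.11.0:cluster_interconnect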
342. Is there a need to renice LMS processes in Oracle RAC 10g Release 2?
LMS
processes should be running in RT by default since 10.2, so there's NO need to
renice them, or otherwise mess with them.
Check with ps -efl:
0 S spommere 31191 1 0 75 0 - 270857 - 10:01 ? 00:00:00 ora_lmon_appsu01
0 S spommere 31193 1 5 75 0 - 271403 - 10:01 ? 00:00:07 ora_lmd0_appsu01
0 S spommere 31195 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms0_appsu01
0 S spommere 31199 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms1_appsu01
The 7th column is the priority: 75 or 76 means Time Share, 58 means Real Time.
You can also use chrt to check:
LMS (Real Time):
$ chrt -p 31199
pid 31199's current scheduling policy: SCHED_RR
pid 31199's current scheduling priority: 1
LMD (Time Share)
$ chrt -p 31193
pid 31193's current scheduling policy: SCHED_OTHER
pid 31193's current scheduling priority: 0
343. How do I check for network problems on my interconnect?
1. Confirm that full duplex is set correctly for all interconnect links on all interfaces on both ends. Do not rely on auto-negotiation.
2. ifconfig -a will give you an indication of collisions/errors/overruns and dropped packets.
3. netstat -s will give you a listing of receive packet discards, fragmentation and reassembly errors for IP and UDP.
4. Set the UDP buffers correctly.
5. Check your cabling.
Note: If you are seeing issues with RAC, remember that RAC uses UDP as the protocol. Oracle Clusterware uses TCP/IP.
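For example, on Linux the kernel limits that govern the UDP socket buffers can be inspected and raised with sysctl; the values shown here are illustrative only, so use the settings documented for your release and platform:
$ sysctl net.core.rmem_max net.core.wmem_max
# sysctl -w net.core.rmem_max=4194304
# sysctl -w net.core.wmem_max=1048576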
344. How to use VLANs in Oracle RAC?
It is
Oracle's standing recommendation to separate the various types of communication
in an Oracle RAC cluster as much as possible. This general recommendation is
the basis for the following separation of communication:
Each node
in an Oracle RAC cluster must have at least one public network.
Each node
in an Oracle RAC cluster must have at least one private network, also referred
to as "interconnect".
Each node in an Oracle RAC cluster must have at least one additional network interface if the shared storage is accessed using a network-based connection.
In addition, Oracle RAC and Oracle Clusterware deployment best practices recommend that the interconnect be deployed on a stand-alone, physically separate, dedicated switch, since this represents the easiest to configure, most secure, and most stable configuration. Many customers, however, have consolidated or prefer to consolidate these stand-alone switches into larger managed switches.
Depending on the level of consolidation that is performed at the switch level, the switch may thereby become a single point of failure. Hardware redundancy within an enterprise switch may mitigate some of the risks, but there are limitations as far as maintenance operations are concerned. Maintaining switch redundancy is therefore highly recommended. Another consequence of this consolidation is a merging of IP networks on a single shared switch, segmented by VLANs at various levels, which include, but are not limited to:
Sharing
the same switch (and network channel) for private and public communication
Sharing
the same switch (and network channel) for the private communication of more
than one cluster.
Sharing
the same switch (and network channel) for private communication and shared
storage access.
While an
increasingly powerful network infrastructure makes it more and more interesting
for customers to consolidate network communication on fewer physical networks,
it needs to be remembered that the latency and bandwidth requirements as well
as availability requirements of the Oracle RAC / Oracle Clusterware interconnect
IP network are more in-line with high performance computing. In a more abstract
way, one should not look at the interconnect as a network, but rather as a
backplane to connect the memory of the cluster nodes.
While observing the bandwidth requirements, Oracle generally recommends maintaining a 1:1 relation (VLAN to non-routable subnet) wherever VLANs are used, if the usage of VLANs cannot be avoided. In this context, it needs to be noted that bandwidth and latency are not the only concerns. Security, ease of management, and unintended but possible side effects of using a shared resource, such as multicast flooding or spanning tree re-convergence, also need to be considered. In detail:
When sharing the same switch (and network channel) for private and public communication and deploying the interconnect on a VLAN in this environment, there should be a 1:1 mapping of the VLAN to a non-routable subnet, and the VLAN should not span multiple VLANs (tagged) or multiple switches.
When sharing the same switch (and network channel) for the private communication of more than one cluster, one VLAN per cluster is recommended for the purpose of cleaner management and security. Further consolidation, such as using only one VLAN for all clusters, is supported, but not recommended.
It is
supported to use the same, consolidated network infrastructure (within the same
security domain) for various clusters without the use of VLANs, while separated
channels are recommended.
Sharing the
same switch (and network channel) for private communication and shared storage access
is
supported, if the underlying network infrastructure recognizes and prioritizes
network based communication to the storage.
345. Are there any issues for the interconnect when sharing the same switch as the public network by using VLAN to separate the network?
RAC and Clusterware deployment best practices recommend that the interconnect be deployed on a stand-alone, physically separate, dedicated switch. Many customers have consolidated these stand-alone switches into larger managed switches. A consequence of this consolidation is a merging of IP networks on a single shared switch, segmented by VLANs. There are caveats
associated
with such deployments. RAC cache fusion exercises the IP network more
rigorously than non-RAC Oracle databases. The latency and bandwidth
requirements as well as availability requirements of the RAC/Clusterware
interconnect IP network are more in-line with high performance computing.
Deploying the RAC/Clusterware interconnect on a shared switch, segmented VLAN
may expose the interconnect links to congestion and instability in the larger
IP network topology. If deploying the interconnect on a VLAN, there should be a
1:1 mapping of VLAN to non-routable subnet and the VLAN should not span
multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include spanning tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and a lack of fine-grained monitoring of the VLAN/port.
346. Are jumbo
frames supported for the RAC interconnect?
Yes. For details see Note 341788.1 Cluster Interconnect and Jumbo Frames.
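As a quick illustration (the interface name and target address are placeholders), a 9000-byte jumbo-frame MTU can be verified end to end on Linux by sending a non-fragmentable packet of the maximum payload size:
$ ifconfig eth1 | grep -i mtu
$ ping -M do -s 8972 192.168.10.2
(8972 bytes of payload plus 28 bytes of IP/ICMP headers equals the 9000-byte MTU.)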
347. We are using Transparent Data Encryption (TDE). We create a wallet on node 1 and copy it to nodes 2 & 3. We open the wallet and are able to select encrypted data on all three nodes. Now we want to REKEY the MASTER KEY. What do we have to do?
After a re-key on node 1, run 'alter system set wallet close' on all other nodes, copy the wallet with the new master key to all other nodes, and run 'alter system set wallet open identified by "password"' on all other nodes to load the (obfuscated) master key into each node's SGA.
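A minimal sketch of that sequence, assuming a password-protected software wallet whose file is copied with operating system tools (the password is a placeholder):
On node 1 (re-key): SQL> alter system set encryption key identified by "wallet_password";
On nodes 2 and 3: SQL> alter system set wallet close;
Copy the updated wallet file from node 1 to the wallet location on nodes 2 and 3, then on each of them: SQL> alter system set wallet open identified by "wallet_password";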
348. Why does the NOAC attribute need to be set on NFS-mounted RAC Binaries?
The noac attribute is required because the installer determines sharedness by creating a file and checking for that file's existence on the remote node. If the noac attribute is not enabled, this test will incorrectly fail, which confuses the installer and OPatch. Other minor things, such as an spfile kept in the default $ORACLE_HOME/dbs, will also be affected.
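Purely as an illustration (server, export path, mount point, and the remaining options are placeholders; follow the mount options documented for your platform and NFS filer), an /etc/fstab entry for NFS-mounted Oracle binaries could look like:
nfsserver:/export/orabin /u01/app/oracle nfs rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,noac 0 0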
349. How do I
use DBCA in silent mode to set up RAC and ASM?
If I already
have an ASM instance/diskgroup then the following creates a RAC database on
that diskgroup:
su oracle -c "$ORACLE_HOME/bin/dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName $SID -sid $SID -sysPassword $PASSWORD -systemPassword $PASSWORD -sysmanPassword $PASSWORD -dbsnmpPassword $PASSWORD -emConfiguration LOCAL -storageType ASM -diskGroupName $ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -nodeinfo $NODE1,$NODE2 -characterset WE8ISO8859P1 -obfuscatedPasswords false -sampleSchema false -oratabLocation /etc/oratab"
The following will create an ASM instance and one disk group:
su oracle -c "$ORA_ASM_HOME/bin/dbca -silent -configureASM -gdbName NO -sid NO -emConfiguration NONE -diskList $ASM_DISKS -diskGroupName $ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -nodeinfo $NODE1,$NODE2 -obfuscatedPasswords false -oratabLocation /etc/oratab -asmSysPassword $PASSWORD -redundancy $ASMREDUNDANCY"
where
ASM_DISKS = '/dev/sda1,/dev/sdb1' and ASMREDUNDANCY='NORMAL'
350. How does
OCR mirror work? What happens if my OCR is lost/corrupt?
OCR is the Oracle Cluster Registry; it holds all the cluster-related information such as instances and services. The OCR file format is binary and, starting with 10.2, it is possible to mirror it. The location of the file(s) is kept in /etc/oracle/ocr.loc in the ocrconfig_loc and ocrmirrorconfig_loc variables. Obviously, if you only have one copy of the OCR and it is lost or corrupt, then you must restore a recent backup; see the ocrconfig utility for details, specifically the -showbackup and -restore flags. Until a valid backup is restored, the Oracle Clusterware will not start up due to the corrupt/missing OCR file.
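For example (the backup path shown is only illustrative; ocrconfig -showbackup prints the actual file names on your system):
# ocrcheck
# ocrconfig -showbackup
# ocrconfig -restore /u01/app/crs/cdata/mycluster/backup00.ocr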
The interesting discussion is what happens if you have the OCR mirrored and one of the copies gets corrupt? You would expect that everything will continue to work seamlessly. The real answer depends on when the corruption takes place.
-- If the corruption happens while the Oracle Clusterware stack is up and running, then the corruption will be tolerated and the Oracle Clusterware will continue to function without interruption, despite the corrupt copy. The DBA is advised to repair the hardware/software problem that prevents OCR from accessing the device as soon as possible; alternatively, the DBA can replace the failed device with another healthy device using the ocrconfig utility with the -replace flag.
-- If, however, the corruption happens while the Oracle Clusterware stack is down, then it will not be possible to start it up until the failed device becomes online again or some administrative action using the ocrconfig utility with the -overwrite flag is taken. When the Clusterware attempts to start you will see messages similar to:
total id sets (1), 1st set (1669906634,1958222370), 2nd set (0,0) my votes (1), total votes (2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0 (/dev/raw/raw1) doesn't have enough votes (1,2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in accessing physical storage [26]
This is because the software cannot determine which OCR copy is the valid one. In the above example one of the OCR mirrors was lost while the Oracle Clusterware was down. There are 3 ways to fix this failure:
a) Fix whatever problem (hardware/software) prevents OCR from accessing the device.
b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command overrides the vote check built into OCR when it starts up. Basically, if the OCR device is configured with a mirror, OCR assigns each device one vote. The rule is to have more than 50% of the total votes (a quorum) in order to safely make sure the available devices contain the latest data. In 2-way mirroring, the total vote count is 2, so 2 votes are required to achieve the quorum. In the example above there are not enough votes to start if only one device, with one vote, is available. (In the earlier case, where the device went down while OCR was running, OCR assigned 2 votes to the surviving device, and that is why the surviving device, now with two votes, can start after the cluster is down.) See the warning below.
c) This method is not recommended to be performed by customers. It is possible to manually modify ocr.loc to delete the failed device and restart the cluster. OCR will not do the vote check if the mirror is not configured. See the warning below.
EXTREME CAUTION should be exercised if choosing option b or c above, since data loss can occur if the wrong file is manipulated. Please contact Oracle Support for assistance before proceeding.
351. If I use
Services with Oracle RAC, do I still need to set up Load Balancing ?
Yes. Services allow a granular definition of workload, and the DBA can dynamically define which instances provide the service. Connection Load Balancing (provided by Oracle Net Services) still needs to be set up to allow the user connections to be balanced across all instances providing a service. With Oracle RAC 10g Release 2 or higher, set CLB_GOAL on the service to define the type of load balancing you want: SHORT for short-lived connections (e.g. a connection pool) or LONG (the default) for applications that have connections active for long periods (e.g. an Oracle Forms application).
352. What is
CLB_GOAL and how should I set it?
CLB_GOAL is the connection load balancing goal for a service. There are 2 options: CLB_GOAL_SHORT and CLB_GOAL_LONG (the default). Long is for applications that have long-lived connections. This is typical for connection pools and SQL*Forms sessions. Long is the default connection load balancing goal.
Short is for
applications that have short-lived connections. The GOAL for a service can be
set with EM or DBMS_SERVICE.
Note: You must still
configure load balancing with Oracle Net Services
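A minimal sketch (the service name is a placeholder), assuming the service already exists, of setting both the service goal and the connection load balancing goal with DBMS_SERVICE:
SQL> exec DBMS_SERVICE.MODIFY_SERVICE(service_name => 'oltp', goal => DBMS_SERVICE.GOAL_SERVICE_TIME, clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT);
In an Oracle RAC environment, making the same change with srvctl (or Enterprise Manager) is preferred so that the clusterware resource stays in sync.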
353. How can a customer mask the change in their clustered database configuration from their client or application? (i.e. so I do not have to change the connection string when I add a node to the RAC database)
The
combination of Server Side load balancing and Services allows you to easily
mask cluster database configuration changes. As long as all instances register
with all listeners (use the LOCAL_LISTENER and REMOTE_LISTENER parameters),
server side load balancing will allow clients to connect to the service on
currently available instances at connect time.
The load
balancing advisory (setting a goal on the service) will give advice as to how
many connections to send to each instance currently providing a service. When a
service is enabled on an instance, as long as the instance registers with the
listeners, the clients can start getting connections to the service and the
load balancing advisory will include that instance in its advice.
With Oracle
RAC 11g Release 2, the Single Client Access Name (SCAN) provides a single name
to be put in the client connection string (as the address). Clients using SCAN
never have to change even if the cluster configuration changes such as adding
nodes.
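As an illustration only (the SCAN name, port, and service name are placeholders), a SCAN-based connect string can be written in EZConnect form or as a tnsnames.ora entry:
sqlplus scott@//mycluster-scan.example.com:1521/oltp
OLTP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mycluster-scan.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = oltp)))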
354. After executing DBMS_SERVICE.START_SERVICE, the service resource remains in OFFLINE status when confirming it with crs_stat. Is that expected behavior?
YES, this is expected behaviour. Unfortunately, DBMS_SERVICE.START_SERVICE does not update the clusterware until 11g Release 2. You should use 'srvctl start service -d dbname'; then you should see it come online.
Note: With Oracle RAC 11g Release 2, the cluster resource for a service contains the values for all the attributes of a service. Oracle Clusterware will update the database with its values when it starts a service. In order to preserve modifications across restarts, all service modifications should be made with srvctl (or Oracle Enterprise Manager).
355. Is it possible to use srvctl start database with a user account other than oracle (that is, other than the owner of the Oracle software)?
YES. When you create a RAC database as a user different from the home/software owner (oracle), the database creation assistant will set the correct permissions/ACLs on the CRS resources that control the database, instances, etc., assuming that this user is a member of the dba group of the home (find it using oracle_home/bin/osdbagrp) and of the CRS home owner's primary group (usually oinstall), and that there is group write permission on the oracle_home.
356. I am using shared servers with the following set in init.ora: dispatchers=(protocol=TCP)(listener=listeners_nl01)(con=500)(serv=oltp). I stopped my service with 'srvctl stop service' but it is still registered with the listener and accepting connections. Is this expected?
YES. This is
by design of dispatchers which are part of Oracle Net Services. If you specify
the service attribute of the dispatchers init.ora parameter, the service
specified cannot be managed by the dba.
357. Why am I seeing the following warnings in my listener.log for my RAC 10g environment?
WARNING: Subscription for node down event still pending
This message indicates that the listener was not able to subscribe to the ONS events which it uses to do the connection load balancing. This is most likely due to starting the listener using lsnrctl from the database home. When you start the listener using lsnrctl, make sure you have set the environment variable ORACLE_CONFIG_HOME to the Oracle Clusterware home; also set it in racgwrap in $ORACLE_HOME/bin for the database home.
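For example (the Clusterware home path and listener name are placeholders):
$ export ORACLE_CONFIG_HOME=/u01/app/crs
$ lsnrctl start listener_node1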
358. Will FAN
work with SQLPlus?
Yes, with Oracle RAC 11g you can specify the -F (FAILOVER) option. This enables SQL*Plus to interact with the OCI failover mode in a Real Application Clusters (RAC) environment. In this mode a service or instance failure is transparently handled, with transaction status messages if applicable.
359. Can I use TAF and FAN/FCF?
With Oracle Database 10g Release 1, NO. With Oracle Database 10g Release 2, the answer is YES for OCI and ODP.NET, and it is recommended. For JDBC, you should not use TAF and FCF together, even with the thick JDBC driver.
360. What are
the changes in memory requirements from moving from single instance to RAC?
If you are
keeping the workload requirements per instance the same, then about 10% more
buffer cache and 15% more shared pool is needed. The additional memory
requirement is due to data structures for coherency management. The values are
heuristic and are mostly upper bounds. Actual resource usage can be monitored
by querying current and maximum columns for the gcs
resource/locks
and ges resource/locks entries in V$RESOURCE_LIMIT.
But in general, please take into consideration that memory requirements per instance are reduced when the same user population is distributed over multiple nodes. In this case:
Assuming the same user population, N = number of nodes, and M = buffer cache for a single system, then each instance needs roughly
(M / N) + ((M / N) * 0.10) [ + extra memory to compensate for failed-over users ]
Thus, for example, with M = 2G, N = 2, and no extra memory for failed-over users:
(2G / 2) + ((2G / 2) * 0.10) = 1G + 100M
361. What is Runtime Connection Load Balancing?
Runtime
connection load balancing enables the connection pool to route incoming work
requests to the available database connection that will provide it with the
best service. This will provide the best service times globally, and routing
responds fast to changing conditions in the system. Oracle has implemented
runtime connection load balancing with ODP.NET and JDBC connection pools.
Runtime Connection Load Balancing is tightly integrated with the automatic workload balancing features introduced with Oracle Database 10g, i.e. Services, Automatic Workload Repository, and the Load Balancing Advisory.
To enable and use
run-time connection load balancing, the connection goal must be set to SHORT
and either of the following service-level goals must be set:
· SERVICE_TIME—The Load Balancing
Advisory attempts to direct work requests to instances according to their
response time. Load Balancing Advisory data is based on the elapsed time for
work done by connections using the service, as well as available bandwidth to
the service. This goal is best suited for workloads that require varying
lengths of time to complete, for example, an internet shopping system.
· THROUGHPUT—The Load Balancing
Advisory measures the percentage of the total response time that the CPU
consumes for the service. This measures the efficiency of an instance, rather
than the response time. This goal is best suited for workloads where each work
request completes in a similar amount of time, for example, a trading system.
Client-side load balancing balances the connection requests across the listeners by setting the ‘LOAD_BALANCE=ON’ directive in the connect descriptor. When you set this parameter to ON,
Oracle Database randomly selects an address in the address list, and connects
to that node's listener. This balances client connections across the available
SCAN listeners in the cluster. When clients connect using SCAN, Oracle Net
automatically load balances client connection requests across the three IP
addresses you defined for the SCAN, unless you are using EZConnect.
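As an illustration only (host names, port, and service name are placeholders), a client-side load balancing entry in tnsnames.ora might look like this:
SALES =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = sales)))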
362. How do I enable the load balancing advisory?
The load balancing advisory requires the use of services and Oracle Net connection load balancing. To enable it on the server, set a goal (SERVICE_TIME or THROUGHPUT) and set CLB_GOAL=SHORT on your service.
For the client, you must be using a connection pool.
For JDBC,
enable the datasource parameter FastConnectionFailoverEnabled.
For ODP.NET
enable the datasource parameter Load Balancing=true.
To enable the load
balancing advisory, use the ‘-B’ option when creating or modifying the service
using the ‘srvctl’ command.
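A sketch using Oracle RAC 11g Release 2 srvctl syntax (database and service names are placeholders; earlier releases use slightly different options):
$ srvctl modify service -d orcl -s oltp -B SERVICE_TIME -j SHORT
$ srvctl config service -d orcl -s oltp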
363. How do I measure the bandwidth utilization of my NIC or my interconnect?
A more reliable, interactive way on Linux is to use the iptraf utility or the prebuilt RPMs from Red Hat or Novell (SuSE); another option on Linux is netperf. On other Unix platforms: "snoop -S -tr -s 64 -d hme0" (Solaris), and AIX's topas can show that as well. Try to look for the peak (not average) usage and see if that is acceptably fast.
Remember that NIC bandwidth is measured in Mbps or Gbps (which is BITS per second) and the output from the above utilities can sometimes come in BYTES per second, so for comparison do the proper conversion (divide the bps value by 8 to get bytes/sec, or multiply the bytes value by 8 to get the bps value).
One simple/quick and not very recommended way is to look at the output of "ifconfig eth0" and compare the values of "RX bytes" and "TX bytes" over time; this will show the _average_ usage per period of time.
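A rough sketch of that approach (the interface name and the 10-second sampling interval are arbitrary):
$ ifconfig eth0 | grep bytes ; sleep 10 ; ifconfig eth0 | grep bytes
Subtract the first RX/TX byte counts from the second and divide by the interval (10 seconds here) to get the average bytes per second.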
Additionally, you can't expect a network device to run at full capacity with 100% efficiency, due to concurrency, collisions and retransmits, which happen more frequently as the utilization gets higher. If you are reaching high levels, consider a faster interconnect or NIC bonding (multiple NICs all servicing the same IP address).
Finally, the above measures bandwidth utilization (how much), not latency (how fast) of the interconnect; you may still be suffering from a high-latency connection (slow link) even though there is plenty of bandwidth to spare. Most experts agree that low latency is by far more important than high bandwidth with respect to the specifications of the private interconnect in RAC. Latency is best measured by the actual user of the network link (RAC in this case); review Statspack for stats on latency. Also, in 10gR2 Grid Control you can view Global Cache Block Access Latency, and you can drill down to the Cluster Cache Coherency page to see the cluster cache coherency metrics for the entire cluster database.
Keep in mind that RAC uses the private interconnect like it was never used before, to synchronize memory regions (SGAs) of multiple nodes (remember, since 9i, entire data blocks are shipped across the interconnect). If the network is utilized at 50% bandwidth, this means that 50% of the time it is busy and not available to potential users. In this case delays (due to collisions and concurrency) will increase the latency even though the bandwidth might look "reasonable"; the bandwidth figure hides the real issue.
364. Does the database block size or tablespace block size affect how the data is passed across the interconnect?
Oracle ships
database block buffers, i.e. blocks in a tablespace configured for 16K will
result in a 16K data buffer shipped, blocks residing in a tablespace with base
block size (8K) will be shipped as base blocks and so on; the data buffers are
broken down to packets of MTU sizes.
365. Does RAC
work with NTP (Network Time Protocol)?
YES! NTP and Oracle RAC are compatible; as a matter of fact, it is recommended to set up NTP in an Oracle RAC cluster for Oracle9i Database, Oracle Database 10g, and Oracle Database 11g Release 1.
With Oracle Database 11g Release 2, Oracle Clusterware includes the Cluster Time Synchronization Service (CTSS). On startup, Oracle Clusterware checks for an NTP configuration; if one is found, CTSS goes into observer mode. This means it will monitor the clock synchronization and report in the Oracle Clusterware alert log if it finds a problem. If it does not find an NTP configuration, CTSS will be active. In active mode, CTSS synchronizes all the system clocks to the first node in the cluster.
The Oracle Clusterware requires the use of the "-x" flag to the ntpd daemon to prevent the clock from going backwards (Enterprise Linux: see /etc/sysconfig/ntpd; Solaris: set "slewalways yes" in /etc/inet/ntp.conf).
"Node
Time Requirements
Before
starting the installation, ensure that each member node of the cluster is set
as closely as possible to the same date and time. Oracle strongly recommends using
the Network Time Protocol feature of most operating systems for this purpose,
with all nodes using the same reference Network Time Protocol server."
Each machine has a different clock frequency and, as a result, a slightly different time drift. NTP computes this time drift about every 15 minutes and stores the information in a "drift" file; it then adjusts the system clock based on this known drift, as well as comparing it to a given time server that the sysadmins set up. This is the recommended approach.
Keep the
following points in mind:
Minor changes in time (in the seconds range) are harmless for Oracle RAC and the Oracle Clusterware. If you intend to make large time changes, it is best to shut down the instances and the entire Oracle Clusterware stack on that node to avoid a false eviction, especially if you are using the Oracle RAC 10g low-brownout patches, which allow really low misscount settings.
The backup/recovery aspects of large time changes are documented in Note 77370.1; basically, you can't use RECOVER DATABASE UNTIL TIME to reach a second recovery point. It is possible to work around this with RECOVER DATABASE UNTIL CANCEL or UNTIL CHANGE. If you are doing complete recovery (most of the time), then this is not an issue, since the Oracle recovery code uses SCNs (System Change Numbers) to advance in the redo/archive logs. SCNs never go back in time (unless a reset-logs operation is performed); there is always an association of an SCN with a human-readable timestamp (which may change forward or backwards), hence the issue with recovery until a point in time vs. until SCN/cancel.
If DBMS_SCHEDULER is in use, it will be affected by time changes, since it uses the actual clock rather than the SCN.
On platforms with OPROCD, get the fix for bug <> "OPROCD REBOOTS NODE WHEN TIME IS SET BACK BY XNTPD".
If NTP is not configured correctly (using the -x flag) and diagwait is not set to 13 (Note 559365.1), 10.2/11.1 RAC systems can be rebooted due to OPROCD during a leap second event; see Note 759143.1.
Daylight
saving time adjustments do not affect the system clock, only the displayed
time, hence have no impact on the Oracle software.
Apart from these issues, the Oracle RDBMS server is immune to time changes, i.e. they will not affect transaction/read consistency operations.
366. How do I determine whether or not a one-off patch is "rolling upgradeable"?
After you have downloaded a patch, you can go
into the directory where you unpacked the patch:
>pwd
/ora/install/4933522
Then use the following OPatch command:
> opatch query -is_rolling
...
Query ...
Please enter the patch location:
/ora/install/4933522
---------- Query starts ------------------
Patch ID: 4933522
....
Rolling Patch: True.
---------- Query ends -------------------
367. I have 2 clusters named "crs" (the default), how do I get Grid Control to recognize them as targets?
There are 2 options:
a) If the Grid Control agent install (which is a separate install) has already been done and has picked up the name of the cluster as it was configured in CRS, one can go to the EM console as is and, for the second cluster, manually delete and rediscover the target. When you rediscover the target, give it whatever display name you like.
b) Prior to performing the Grid Control agent install, just set the CLUSTER_NAME environment variable and run the install. This variable needs to be set only for that install session. There is no need to set it every time the agent starts.
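For example (the cluster name is a placeholder, and the installer is started from the agent shiphome in whatever way is appropriate for your version):
$ export CLUSTER_NAME=prod_cluster1
$ ./runInstaller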
368. When I look at the ALL_SERVICES view in my database I see services I did not create, what are they for?
You will always see a default database service that has the same name as your database. This service is available on all instances in the cluster. You will also see two services used by the database: SYS$BACKGROUND (for background processes) and SYS$USERS (users who connect via BEQ or without using a service_name). You may also see services that end with XDB, which are created for the XML DB feature; you will not be able to manage these services.
369. Can you
have multiple RAC $ORACLE_HOME's on Linux?
No, there
should be only one Oracle Cluster Manager (ORACM) running on each node. All RAC
databases should run out of the $ORACLE_HOME that ORACM is installed in.
370. Is the hangcheck timer still needed with Oracle RAC 10g and
11g?
YES! The hangcheck-timer module monitors the Linux kernel for extended operating system hangs that could affect the reliability of a RAC node (I/O fencing) and cause database corruption. To verify that the hangcheck-timer module is running on every node:
as root user:
/sbin/lsmod | grep hangcheck
If the hangcheck-timer module is not listed, enter the following command as the root user:
9i: /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1
10g & 11g: /sbin/insmod hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
To ensure
the module is loaded every time the system reboots, verify that the local
system startup file (/etc/rc.d/rc.local) contains the command above.
371. The customer did not load the hangcheck-timer before installing RAC. Can the customer just load the hangcheck-timer now?
YES. The hangcheck-timer is a kernel module that is shipped with the Linux kernel; all you have to do is load it as follows:
9i: /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1
10g & 11g: /sbin/insmod hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
No need to
reboot the nodes.
372. When I try to log in to +ASM2 on node2 with asmcmd (after setting ORACLE_HOME and ORACLE_SID correctly) I get: ORA-01031: insufficient privileges (DBD ERROR: OCI SessionBegin). When I try to log in to +ASM2 using sqlplus (connect / as sysdba) I get the same ORA-01031: insufficient privileges. When I try to log in to +ASM2 using sqlplus (connect sys/passwd as sysdba) I get connected successfully.
This sounds like the ORA_DBA group on Node2 is empty, or else does not have the correct username in it. Double-check which user account you are using to log on to Node2 (a 'set' command will show you the USERNAME and USERDOMAIN values) and then make sure that this account is part of ORA_DBA.
The other
issue to check is that SQLNET.AUTHENTICATION_SERVICES=(NTS) is set in the SQLNET.ORA
373. How to
move the OCR location ?
For Oracle RAC 10g Release 1:
- Stop the CRS stack on all nodes using "init.crs stop"
- Edit /var/opt/oracle/ocr.loc on all nodes and set ocrconfig_loc=new OCR device
- Restore from one of the automatic physical backups using ocrconfig -restore.
- Run ocrcheck to verify.
- Reboot to restart the CRS stack.
For Oracle RAC 10g Release 2 or later, please use the ocrconfig command to replace the OCR with the new location:
# ocrconfig -replace ocr /dev/newocr
# ocrconfig -replace ocrmirror /dev/newocrmirror
Manual
editing of ocr.loc or equivalent is not recommended, and will not work.
374. Is it supported to rerun root.sh from the Oracle Clusterware installation?
Rerunning
root.sh after the initial successful install of the Oracle Clusterware is
expressly discouraged and unsupported. We strongly recommend not doing it.
In cases where root.sh fails to execute on an initial install (or for a new node joining an existing cluster), it is OK to re-run root.sh after the cause of the failure is corrected (permissions, paths, etc.). In this case, please run rootdelete.sh to undo the local effects of root.sh before rerunning root.sh.
375. When the customer runs the command 'onsctl start', they receive the message "Unable to open libhasgen10.so". Any idea why?
Most likely
you are trying to start ONS from ORACLE_HOME instead of Oracle Clusterware (or
Grid Infrastructure in 11.2) home. Please try to start it from the Oracle
Clusterware home.
376. Voting
Files stored in ASM - How many disks per disk group do I need?
If Voting Files are stored in ASM, the ASM disk group that hosts the Voting Files will place the appropriate number of Voting Files in accordance with the redundancy level. Once Voting Files are managed in ASM, a manual addition, deletion, or replacement of Voting Files will fail, since users are not allowed to manually manage Voting Files in ASM.
If the
redundancy level of the disk group is set to "external", 1 Voting
File is used.
If the
redundancy level of the disk group is set to "normal", 3 Voting Files
are used.
If the redundancy
level of the disk group is set to "high", 5 Voting Files are used.
Note that Oracle Clusterware records which disks within the disk group hold the Voting Files. Oracle Clusterware does not rely on ASM to access the Voting Files.
In addition,
note that there can be only one Voting File per failure group. In the above
list of rules, it is assumed that each disk that is supposed to hold a Voting
File resides in its own, dedicated failure group.
In other
words, a disk group that is supposed to hold the above mentioned number of
Voting Files needs to have the respective number of failure groups with at
least one disk. (1 / 3 / 5 failure groups with at least one disk)
Consequently, a normal redundancy ASM
disk group, which is supposed to hold Voting Files, requires 3 disks in
separate failure groups, while a normal redundancy ASM disk group that is not
used to store Voting Files requires only 2 disks in separate failure groups.
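For example, the Voting Files currently in use, and the disks that hold them, can be listed at any time with:
$ crsctl query css votedisk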
377. OCR stored in ASM - What happens, if
my ASM instance fails on a node?
If an ASM instance fails on any node, the
OCR becomes unavailable on this particular node, but the node remains
operational.
If the (RAC) databases use ASM, too, they
cannot access their data on this node anymore during the time the ASM instance
is down. If a RAC database is used, access to the same data can be established
from another node.
If the CRSD process running on the node
affected by the ASM instance failure is the OCR writer, AND the majority of the
OCR locations is stored in ASM, AND an IO is attempted on the OCR during the
time the ASM instance is down on this node, THEN CRSD stops and becomes
inoperable. Hence cluster management is affected on this particular node.
Under no circumstances will the failure of one ASM instance on one node affect the whole cluster.
378. Can I change the public hostname in my Oracle Database 10g Cluster using Oracle Clusterware? OR
Can I change a node’s hostname?
Hostname changes are not supported in Oracle Clusterware (CRS), unless you want to perform a deletenode followed by a new addnode operation. The hostname is used to store, among other things, the flag files, and the Oracle Clusterware stack will not start if the hostname is changed.
379. Which processes access the OCR ?
Oracle Cluster Registry (OCR) is used to store the cluster configuration information, among other things. OCR needs to be accessible from all nodes in the cluster. If OCR became inaccessible, the CSS daemon would soon fail and take down the node. PMON never needs to write to OCR. To confirm whether OCR is accessible, try ocrcheck from your ORACLE_HOME and ORA_CRS_HOME.
380. How do I restore OCR from a backup? On Windows, can I use ocopy? OR
If you lost OCR, how do you restore it?
The only recommended way to restore an OCR from a backup is "ocrconfig -restore ". The ocopy command will not be able to perform the restore action for OCR.
381. How
to Restore a Lost Voting Disk
As long as you can confirm via the CSS
daemon logfile that it thinks the voting disk is bad, you can restore the
voting disk from backup while the cluster is online. This is the backup that
you took with dd (by the manual's request) after the most recent addnode,
deletenode, or install operation. If by accident you restore a voting disk that
the CSS daemon thinks is NOT bad, then the entire cluster will probably go
down.
crsctl add css votedisk - adds a new voting disk
crsctl delete css votedisk - removes a voting disk
Note: the cluster has to be down. You can
also restore the backup via dd when the cluster is down.
382. Why
is the home for Oracle Clusterware not recommended to be subdirectory of the
Oracle base directory?
If anyone other than root has write
permissions to the parent directories of the Oracle Clusterware home / Oracle
Grid Infrastructure for a Cluster home, then they can give themselves root
escalations. This is a security issue.
Consequently, it is strongly recommended to place the Oracle Grid Infrastructure / Oracle Clusterware home outside of the Oracle Base. The Oracle Universal Installer will flag deviating settings during the Oracle Grid Infrastructure 11g Release 2 (and later) installation.
The Oracle Clusterware home itself is a
mix of root and non-root permissions, as appropriate to the security requirements.
Please, follow the installation guides regarding OS users and groups and how to
structure the Oracle software installations on a given system.
383. What
are the IP requirements for the private interconnect?
The install guide states that the private IP address must satisfy the following requirements:
1. Must be separate from the public
network
2. Must be accessible on the same network
interface on each node
3. Must have a unique address on each
node
4. Must be specified in the /etc/hosts
file on each node
The best practices recommendation is to use the TCP/IP standard for non-routable networks.
Reserved address ranges for private
(non-routed) use (see TCP/IP RFC 1918):
* 10.0.0.0 -> 10.255.255.255
* 172.16.0.0 -> 172.31.255.255
* 192.168.0.0 -> 192.168.255.255
Cluvfy will give you an error if you do
not have your private interconnect in the ranges above.
You should not ignore this error. If you are using an IP address in the range used for the public network for the private network interfaces, you are pretty much messing up the IP addressing, and possibly the routing tables, for the rest of the corporation. IP addresses are a scarce commodity; use them wisely. If you use public-range addresses on a non-routable network, there is nothing to prevent someone else from using them in the normal corporate network, and then, when those RAC nodes find out that there is another path to that address range (through RIP), they just might start sending traffic to those other IP addresses instead of the interconnect. This is just a bad idea.
384. Does Oracle Clusterware have to be the same or a higher release than all instances running on the cluster?
Yes - Oracle Clusterware must be the same or a
higher release with regards to the RDBMS or ASM Homes.
385. How
much I/O activity should the voting disk have?
Approximately 2 read + 1 write per second per
node.
386. I made a mistake when I created the VIP during the install of Oracle Clusterware, can I change the VIP?
Yes. The details of how to do this are described in MetaLink Note 276434.1.
387. I have a 2-node RAC running. I notice that it is always node2 that is evicted when I test the private network failure scenario by disconnecting the private network cable. It doesn't matter whether it is node1's or node2's private network cable that is disconnected, it is always node2 that is evicted. What happens in a 3-node RAC cluster if node1's cable is disconnected?
The node with the lower node number will survive (the first node to join the cluster). In the case of 3 nodes, 2 nodes will survive and the one whose cable you pulled will go away. With 4 nodes, the sub-cluster with the lower node number will survive.
388. Can I
configure a firewall (iptables) on the cluster interconnect?
Disable all firewalls on the cluster
interconnect. See note: 554781.1 for details
389. Can I
use ASM to mirror Oracle data in an extended RACenvironment?
This
support is for 10gR2 onwards and has the following limitations:
1. As in any extended RAC environment, the additional latency induced by distance will affect I/O and cache fusion performance. This effect will vary by distance, and the customer is responsible for ensuring that the impact attained in their environment is acceptable for their application.
2. OCR must be mirrored across both sites
using Oracle provided mechanisms.
3. Voting Disk redundancy must exist across both sites, and at a 3rd site to act as an arbiter. This third site may be reached via a WAN.
4. Storage at each site must be set up as separate failure groups and use ASM mirroring, to ensure at least one copy of the data at each site.
5. The customer must have a separate and dedicated test cluster, also in an extended configuration, set up using the same software and hardware components (it can have fewer or smaller nodes).
6. Customer must be aware that in 10gR2
ASM does not provide partial resilvering. Should a loss of connectivity between
the sites occur, one of the failure groups will be marked invalid. When the site
rejoins the cluster, the failure groups will need to be manually dropped and
added.
390. What
is a stage?
CVU supports the notion of Stage verification.
It identifies all the important stages in RAC deployment and provides each
stage with its own entry and exit criteria. The entry criteria for a stage
define a specific set of verification tasks to be performed before initiating
that stage. This pre-check saves the user from entering into a stage unless its
pre-requisite conditions are met. The exit criteria for a stage define another
specific set of verification tasks to be performed after completion of the
stage. The post-check ensures that the activities for that stage have been completed
successfully. It identifies any stage-specific problem before it propagates to subsequent stages, where it would become difficult to find its root cause. An example of a stage is "pre-check of database installation", which checks whether the system meets the criteria for a RAC install.
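For example (node names are placeholders), the pre-check for the database installation stage can be run as:
$ cluvfy stage -pre dbinst -n node1,node2 -verbose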
391. What
is a component?
CVU supports the notion of Component verification. The verifications in this category are not associated with any specific stage. The user can verify the correctness of a specific cluster component. A component can range from a basic one, like free disk space, to a complex one like the CRS stack. The integrity check for the CRS stack will transparently span verification of multiple sub-components associated with the CRS stack. This encapsulation of a set of tasks within a specific component verification should be of great help to the user.
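For example, the CRS stack integrity component check can be run across all nodes with:
$ cluvfy comp crs -n all -verbose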
392. What
is nodelist?
A nodelist is a comma separated list of
hostnames without domain. Cluvfy will run the requested verification on all
nodes in the nodelist provided. Cluvfy will ignore any domain while processing
the nodelist. If duplicate entities after removing the domain exist, cluvfy
will eliminate the duplicate names while processing. Wherever supported, you
can use '-n all' to check on all the cluster nodes. Check "Do I have to
type the nodelist every time for the CVU commands? Is there any shortcut?"
for more information on nodelist and shortcuts.
393. What
is a configuration file?
CVU supports a configuration file called cvu_config under the CV_HOME/cv/admin folder. This file supports property-value style preferences in a persistent way. The file might vary depending upon the platform. Here is a brief description of some of those properties:
CV_ORACLE_RELEASE: This property can take a
value of the Oracle release that should be assumed when -r option is not
specified in the command line. The valid values that can be set are 10gR1,
10gR2, 11gR1 or 11gR2. If this property is not set then the default is 11gR2.
CV_NODE_ALL: This property stores a comma separated
list of nodes to be used for all the nodes in the cluster. This value will be
used for "-n all"
argument on the command line.
CV_RAW_CHECK_ENABLED: If this property is set
to TRUE, then CVU will perform scsi disk discovery and sharedness checks. For
Linux platforms, CVU requires the cvuqdisk rpm installed on all nodes if this
property is set.
CV_ASSUME_DISTID: This property is used in cases where CVU cannot detect or support a particular platform or distribution. It is not recommended to change this property, as this might render CVU non-functional.
CV_XCHK_FOR_SSH_ENABLED: If this property is set
to TRUE, CVU will also check whether X-Windows is configured with SSH for user
equivalence.
ORACLE_SRVM_REMOTESHELL: This property stores
alternative remote shell command location.
ORACLE_SRVM_REMOTECOPY: This property stores
alternative remote copy command location.
CV_ASSUME_CL_VERSION: By default, the command line parser uses the CRS active version for the display of command line syntax usage and for syntax validation; use this property to pass a version other than the CRS active version for command line syntax display and validation.
CV_TRACELOC: Use this property to choose the location in which CVU generates the trace files; set it to the absolute path of the desired trace directory.
394. Do I
have to be root to use CVU?
No. CVU is intended for database and system administrators. CVU assumes that the current user is the oracle user.
395. What about discovery? Does CVU discover installed components?
At present, CVU's discovery is limited to
the following components. CVU discovers available network interfaces if you do
not specify any interface in its command line. For storage related
verification, CVU discovers all the supported storage types if you do not
specify a particular storage. CVU discovers CRS HOME if one is available. CVU
also discovers the statically configured nodelist for the cluster if an Oracle
supported vendor clusterware or Oracle Clusterware is available.
396. What
about locale? Does CVU support other languages?
Yes. CVU complies with Oracle's NLS guidelines and supports locales.
397. What
version of Oracle Clusterware or RAC is supported by CVU?
On Linux x86 and x86_64: The current CVU
release supports Oracle Clusterware, Oracle RAC 10g, and Oracle RAC 11g. In
other words, "the current version" of CVU can check 10g as well as
11g releases of Oracle Clusterware or RAC. However, it can not check or verify
pre-10g products.
On Solaris SPARC64, AIX, and HPUX (PA-RISC and IA64): CVU has limited backward compatibility with previous Oracle Clusterware releases, back to Oracle Database 10g Release 1. It works on the operating system versions supported by 11gR2, that is, Solaris 9, Solaris 10, AIX 5.3, AIX 6.1, HPUX 11.23 and HPUX 11.31 only.
398. What
are the requirements for CVU?
CVU requires:
1.An area with at least 200MB on Linux
x86, 285MB on Linux x86_64, 300MB on Solaris SPARC64 and Solaris x64, 158MB on
AIX, 160MB on HPUX IA64 and 160MB HPUX PARISC of free space for containing
software bits on the invocation node.
2.A work directory with at least 5MB on
all the nodes. CVU will attempt to copy the necessary bits as required to this
location. Make sure, the location exists on all nodes and it has write
permission for CVU user. This directory is set through the CV_DESTLOC
environment variable. If this variable is not set, CVU will use the common
temporary location such as "/tmp" for Linux and "C:\Temp"
for Windows as the work dir.
3. An optional package, 'cvuqdisk', is required on all the nodes for Linux distributions. This assists CVU in finding SCSI disks and helps CVU to perform storage checks on disks. Please refer to "What is the 'cvuqdisk' rpm?" for details. Note that this package should be installed only on Red Hat Linux 4 (or higher), Enterprise Linux 4 (or higher), SuSE 9 (or higher), or other Linux flavors of comparable versions.
399. How
do I install CVU from OTN?
Here is how one can install CVU from a
zip file(cvupack_<platform>.zip) downloaded from OTN:
1. Create a CV home( say
/home/username/mycvhome ) directory. It should have at least 35M of free disk
space.
2. cd /home/username/mycvhome
3. copy the cvupack_<platform>.zip
file to /home/username/mycvhome
4. Unzip the file: > unzip cvupack_<platform>.zip
5. (Optional) Set the environmental
variable CV_DESTLOC. This should point to a writable area on *all* nodes. When
invoked, the tool will attempt to copy the necessary bits as required to this
location. Make sure the location exists on all nodes and it has write
permission for CVU user. It is strongly recommended that you should set this
variable. If this variable has not been set, CVU will use "/tmp" as
the default.
> setenv CV_DESTLOC /tmp/cvu_temp
To verify, run cluvfy from <CV
Home>/bin directory (typically /home/username/mycvhome/bin/cluvfy). This
should show the usage.
For Linux platforms, an optional rpm
package 'cvuqdisk' is required on all the nodes. Please refer to How do I
install 'cvuqdisk' package?
400. What
is 'cvuqdisk' rpm? Why should I install this rpm?
cvuqdisk is applicable on Linux platforms
only. CVU requires root privilege to gather information about the scsi disks
during discovery. A small binary uses the setuid mechanism to query disk information
as root. Note that this process is
purely a read-only process with no adverse impact on the system. To make this secure, the binary is packaged in the cvuqdisk rpm and needs root privilege to be installed on a machine.
When this package is installed on all the
nodes, CVU performs discovery and shared storage accessibility checks for scsi
disks. Otherwise, it complains about the missing package 'cvuqdisk'. You can
disable the scsi device check feature by setting the CV_RAW_CHECK_ENABLED to FALSE in
$CV_HOME/cv/admin/cvu_config file. CVU will not complain about the missing
rpm if this variable is set to false.
401. How
do I install 'cvuqdisk' package?
Here are the steps to install cvuqdisk
package.
1. Become root user
2. Copy the rpm ( cvuqdisk-1.0.7-1.rpm or
the latest version ) to a local directory. You can find the rpm in
<CV-HOME>/rpm directory where <CVHOME> is the directory in which
you have installed CVU from OTN.
3. Set the CVUQDISK_GRP environment variable to the group that should own this binary. Typically it is the "dba" group.
export CVUQDISK_GRP=dba
4. Erase any existing package
rpm -e cvuqdisk
5. Install the rpm
rpm -iv cvuqdisk-1.0.7-1.rpm
6. Verify the package
rpm -qa | grep cvuqdisk
402. How
do I know about cluvfy commands? The usage text of cluvfy does not show
individual commands.
Cluvfy has context sensitive help built
into it. Cluvfy shows the most appropriate usage text based on the cluvfy
command line arguments. If you type 'cluvfy' on the command prompt, cluvfy displays the
high level generic usage text, which talks about valid stage and component
syntax. If you type 'cluvfy comp -list', cluvfy will show valid components with brief
description on each of them. If you type 'cluvfy comp -help', cluvfy will show
detail syntax for each of the valid components. Similarly, 'cluvfy stage -list' and 'cluvfy stage -help' will list valid stages
and their syntax respectively.
If you type an invalid command, cluvfy
will show the appropriate usage for thatparticular command. For example, if you
type 'cluvfy stage -pre
dbinst',
cluvfy will show the syntax for pre-check of dbinst stage.
403. What
are the default values for the command line arguments?
Here are the default values and behavior
for different stage and component commands:
For component nodecon: If no -i argument is provided, then cluvfy runs in discovery mode.
For component nodereach: If no -srcnode is provided, then the local node (the node of invocation) will be used as the source node.
For components ssa:
If no -n argument is provided, then the
local node will be used.
If no -s argument is provided, then
cluvfy runs in the storage discovery mode.
If no -t argument is provided, then the
device is assumed to be used for oracle data files.
For components clu:
If no -n argument is provided, then all
the nodes in the cluster will be used for verification.
For components cfs, space, clu, clumgr,
ocr, crs, nodeapp, asm, gpnp, gns, ohasd, clocksync :
If no -n argument is provided, then the
local node will be used.
For components sys :
If no -n argument is provided, then the
local node will be used.
If no -r argument is provided, then 11gR2
will be used.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but -fixupdir
argument is not provided, fixup files will be generated in CVU's work
directory.
For components admprv:
If no -n argument is provided, then the
local node will be used.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For component peer:
If no -r argument is provided, then 11gR2
will be used.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
For component software:
If no -n argument is provided, then the
local node will be used.
If no -d argument is provided, then crs
home will be discovered and the files for crs will be verified.
For component acfs:
If no -n argument is provided, then the
local node will be used.
For stage -post hwos:
If no -s argument is provided, then
cluvfy runs in the discovery mode.
For stage -pre crsinst:
If no -r argument is provided, then 11gR2
will be used.
If no -c argument is provided, then
cluvfy will skip OCR related checks.
If no -q argument is provided, then
cluvfy will skip voting disk related checks.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre dbinst:
If no -r argument is provided, then 11gR2
will be used.
If no -osdba argument is provided, then
'dba' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre dbcfg:
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre acfscfg:
If no -asmdev argument is provided,
default discovery string will be used to discover ASM devices.
For stage -pre hacfg:
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre nodeadd:
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
NOTE: For each verification command that supports the optional -r option to specify the supported Oracle release, the default release is assumed to be 11gR2 if the -r option is not specified. To perform verifications for any previous release, '-r 10gR1', '-r 10gR2' or '-r 11gR1' must be specified. If the verifications are to be performed for a specific release earlier than 11gR2, use of the -r option can be avoided by setting the intended release value (10gR1, 10gR2 or 11gR1) for the CV_ORACLE_RELEASE property in CVU's configuration file (located under the <CVU installation root dir>/cv/admin directory).
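For example, assuming two nodes named node1 and node2 (the names are hypothetical), the 10gR2 prerequisites can be verified either by passing -r on the command line or by setting the property once in the configuration file:
$ cluvfy comp sys -n node1,node2 -p crs -r 10gR2 -verbose
Alternatively, add a line such as CV_ORACLE_RELEASE=10gR2 to <CVU installation root dir>/cv/admin/cvu_config (the exact property format is an assumption) and omit -r:
$ cluvfy comp sys -n node1,node2 -p crs -verbose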
404. Do I
have to type the nodelist every time for the CVU commands? Is there any
shortcut?
You do not have to type the nodelist every time for the CVU commands. Typing the nodelist for a large cluster is painful and error-prone. Here are a few shortcuts.
To provide all the nodes of the cluster,
type '-n all'. Cluvfy will attempt to get the nodelist in the following order:
1. If vendor clusterware is available, it will pick all the configured nodes from the vendor clusterware using the lsnodes utility.
2. If CRS is installed, it will pick all the configured nodes from Oracle Clusterware using the olsnodes utility.
3. It will look for the CV_NODE_ALL
property in the cvu_config file under $CV_HOME/cv/admin.
4. If none of the above is available, it will look for the CV_NODE_ALL environment variable.
5. Otherwise, it will complain.
To provide a partial list (some of the nodes of the cluster), you can set an environment variable and use it in the CVU command. For example:
export MYNODES=node1,node3,node5
cluvfy comp nodecon -n $MYNODES
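To run a check against every configured node instead, the '-n all' shortcut avoids typing any names at all (a sketch, assuming Oracle Clusterware is already installed so the nodelist can be discovered via olsnodes):
$ cluvfy comp nodecon -n all -verbose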
405. How do I get detailed output of a check?
Cluvfy supports a verbose feature. By default, cluvfy runs in non-verbose mode and reports only the summary of a test. To get detailed output of a check, use the '-verbose' flag on the command line. This produces detailed output for the individual checks and, where applicable, shows per-node results in a tabular fashion.
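For example (node names hypothetical), the following runs the post hardware/OS stage check first in summary mode and then in verbose mode:
$ cluvfy stage -post hwos -n node1,node2
$ cluvfy stage -post hwos -n node1,node2 -verbose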
406. How
do I check network or node connectivity related issues?
Use component verification commands like 'nodereach' or 'nodecon' for this purpose. For the detailed syntax of these commands, type 'cluvfy comp -help' at the command prompt.
If the 'cluvfy comp nodecon' command is invoked without -i, cluvfy will attempt to discover all the available interfaces and the corresponding IP addresses and subnets. Then cluvfy will try to verify the node connectivity per subnet. You can run this command in verbose mode to find out the mappings between the interfaces, IP addresses and subnets.
Cluvfy will suggest interfaces for VIP and private interconnect if suitable interfaces are available. You can check the connectivity among the nodes by specifying the interface name(s) through the -i argument.
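As a sketch (node and interface names are hypothetical), the first command checks basic reachability from node1, and the second checks connectivity over a specific interface:
$ cluvfy comp nodereach -n node1,node2 -srcnode node1 -verbose
$ cluvfy comp nodecon -n node1,node2 -i eth1 -verbose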
407. Can
CVU fix something in the system?
Yes, CVU supports fixing up several system parameters that do not meet the requirements. Wherever applicable, the '-fixup' argument can be specified on the command line to request generation of fixup scripts. When this argument is specified, CVU auto-generates fixup scripts containing appropriate values for those parameters that need to be fixed and whose fix is supported by CVU. Instructions for executing the generated fixup script with root privileges are provided at the end of the cluvfy command execution results.
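For example, a pre-Clusterware-installation check that also requests fixup scripts could look like this (node names and the fixup directory are hypothetical):
$ cluvfy stage -pre crsinst -n node1,node2 -fixup -fixupdir /tmp/cvu_fixup -verbose
If any fixable check fails, cluvfy prints the location of the generated script, which is then executed as root on the affected nodes.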
408. How
do I check whether OCFS or OCFS2 is properly configured?
OCFS or OCFS2 is applicable on Linux platforms only. You can use the component command 'cluvfy comp cfs' to check this. Provide the OCFS or OCFS2 file system you want to check through the -f argument. Note that the sharedness check for the OCFS file system is supported for OCFS version 1.0.14 or higher.
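A minimal sketch, assuming /ocfs2/oradata is the OCFS2 mount point to be verified (the path and node names are hypothetical):
$ cluvfy comp cfs -n node1,node2 -f /ocfs2/oradata -verbose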
409. Can I check if the storage is shared among the nodes?
Yes, you can use the 'comp ssa' command to check the sharedness of the storage.
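For example, assuming /dev/sdb1 is a candidate shared device (the path and node names are hypothetical), the following verifies that it is shared across both nodes; omitting -s makes cluvfy run in storage discovery mode instead:
$ cluvfy comp ssa -n node1,node2 -s /dev/sdb1 -verbose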
410. How
do I check user accounts and administrative permissions related issues?
Use the admprv component verification command. Refer to the usage text for detailed instructions and the types of supported operations. To check whether the privileges are sufficient for user equivalence, use the '-o user_equiv' argument. You can force CVU to check user equivalence using SSH only with the '-sshonly' flag. Similarly, '-o crs_inst' will verify whether the user has the correct permissions for installing Oracle Clusterware. '-o db_inst' will check for permissions required for installing RAC, and '-o db_config' will check for permissions required for creating a RAC database or modifying a RAC database configuration.
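The following sketches (node names hypothetical) show two common admprv checks, user equivalence over SSH and Clusterware installation privileges:
$ cluvfy comp admprv -n node1,node2 -o user_equiv -sshonly -verbose
$ cluvfy comp admprv -n node1,node2 -o crs_inst -verbose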
411. How
do I check if SSH is configured properly on my cluster?
You can use CVU's admprv component verification
command 'comp admprv
-n <nodelist> -o user_equiv -sshonly -verbose' to verify this. To
check whether X-Windows is configured to work with SSH for user equivalence as
per Oracle's requirement, set the following property "CV_XCHK_FOR_SSH_ENABLED=TRUE" in the $CV_HOME/cv/admin/cvu_config file.
412. How
do I check minimal system requirements on the nodes?
The component verification command sys is meant for that. Note that CVU can check the minimum system requirements for Oracle Clusterware versions 10gR1, 10gR2, 11gR1 and 11gR2. Use the '-p crs' argument to check requirements for Oracle Clusterware and the -r argument for the desired version.
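For example (node names hypothetical), checking the system prerequisites for an 11gR2 Clusterware installation and for a RAC database installation ('-p database' is assumed here to be the database counterpart of '-p crs'):
$ cluvfy comp sys -n node1,node2 -p crs -r 11gR2 -verbose
$ cluvfy comp sys -n node1,node2 -p database -r 11gR2 -verbose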
413. Is
there a way to compare nodes?
You can use the peer comparison feature
of cluvfy for this purpose. The command 'comp peer' will list the values of
different nodes for several pre-selected properties. You can use the peer command with the -refnode argument to compare those properties of other nodes against the reference node.
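A sketch (node names hypothetical) that compares node2 and node3 against node1 as the baseline:
$ cluvfy comp peer -refnode node1 -n node2,node3 -r 11gR2 -verbose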
414. Why does the peer comparison with -refnode say 'matched' when the group or user does not exist?
Peer comparison with the -refnode feature acts like a baseline comparison. It compares the system properties of other nodes against those of the reference node. If a value does not match (is not equal to the reference node's value), it is flagged as a deviation from the reference node. If a group or user does not exist on the reference node as well as on the other node, it is reported as 'matched', since there is no deviation from the reference node. Similarly, a node with higher total memory than the reference node is reported as 'mismatched', for the same reason.
415. At what point is cluvfy usable? Can I use cluvfy before installing Oracle Clusterware?
You can run cluvfy at any time, even before CRS installation. In fact, cluvfy is designed to assist the user as soon as the hardware and OS are up. If you invoke a command which requires CRS or RAC on the local node, cluvfy will report an error if those required products are not yet installed.
416. How
do I turn on tracing?
Set the environment variable SRVM_TRACE to true. For example, in tcsh "setenv SRVM_TRACE true" will turn on tracing.
In recent releases tracing is turned on by default. Set the environment variable SRVM_TRACE to false if you do not want tracing. For example, in bash "export SRVM_TRACE=false" or "export SRVM_TRACE=FALSE" will switch off tracing.
It may also help to run cluvfy with the -verbose flag and capture the session output, for example:
$ script run.log
$ export SRVM_TRACE=TRUE
$ cluvfy <cluvfy command> -verbose
$ exit
417. Where
can I find the CVU trace files?
CVU log files can be found under the $CV_HOME/cv/log directory. The log files are automatically rotated, and the latest log file has the name cvutrace.log.0. It is a good idea to clean up unwanted log files, or archive them, to reclaim disk space. If you want the trace files to be generated in a location other than $CV_HOME/cv/log, set the environment variable CV_TRACELOC to the desired location. In recent releases, CVU trace files are generated by default; setting SRVM_TRACE=false before invoking cluvfy disables trace generation for that invocation.
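For example, assuming /u01/cvutrace is an existing, writable directory (the path is hypothetical):
$ export CV_TRACELOC=/u01/cvutrace
$ cluvfy comp nodecon -n all -verbose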
418. Why does cluvfy report "unknown" on a particular node?
Cluvfy reports unknown when it cannot conclude for sure whether a check passed or failed. A common cause of this type of reporting is a non-existent location set for the CV_DESTLOC variable. Please make sure the directory pointed to by this variable exists on all nodes and is writable by the user.
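As a sketch, assuming the configured CV_DESTLOC value is /tmp/cvu_destloc and the nodes are node1 and node2 (all of these names are hypothetical), create the directory on every node before rerunning the check:
$ for n in node1 node2; do ssh $n mkdir -p /tmp/cvu_destloc; done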
419. Why
does CVU complain "WARNING: Could not find a suitable set of interfaces
for VIPs"?
CVU checks for the following criteria
before considering a set of interfaces for VIP:
-- the interfaces should have the same
name across nodes
-- they should belong to the same subnet
-- they should have the same netmask
-- they should be on a public (and routable) network.
Oftentimes, the interfaces planned for the VIPs are configured on 10.*, 172.16.*-172.31.* or 192.168.* networks, which are not routable. Hence CVU does not consider them suitable for VIPs. If none of the available interfaces satisfies these criteria, CVU complains "WARNING: Could not find a suitable set of interfaces for VIPs.". It is worth noting that such addresses will actually work if the network is in fact public, but CVU assumes they are private and reports accordingly.
420. What
OS versions and distributions are supported?
CVU is supported on all the OS versions
and distributions on which Oracle Clusterware 10g and 11g are supported.