341. interfaces to provide High Availability and/or Load
Balancing for my interconnect with Oracle Clusterware?
Windows - available solutions: Teaming
On Windows, teaming solutions to ensure NIC availability are usually part of the network card driver. They therefore depend on the network card used. Please contact the respective hardware vendor for more information.
OS independent solution: Redundant Interconnect Usage
Redundant Interconnect Usage enables load balancing and high availability across multiple (up to four) private networks (also known as interconnects).
Oracle RAC 11g Release 2, Patch Set One (11.2.0.2), enables Redundant Interconnect Usage as a feature for all platforms except Windows.
On systems that use Solaris Cluster, Redundant Interconnect Usage will use clprivnet.
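As an illustration only (the interface names and subnets below are placeholders), additional private interfaces for Redundant Interconnect Usage are typically registered with the oifcfg utility:
$ oifcfg getif
$ oifcfg setif -global eth2/192.168.10.0:cluster_interconnect
$ oifcfg setif -global eth3/192.168.11.0:cluster_interconnect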
342. Is there a need to renice LMS processes in Oracle RAC 10g Release 2?
LMS
processes should be running in RT by default since 10.2, so there's NO need to
renice them, or otherwise mess with them.
Check with ps -efl:
0 S spommere 31191 1 0 75 0 - 270857 - 10:01 ? 00:00:00 ora_lmon_appsu01
0 S spommere 31193 1 5 75 0 - 271403 - 10:01 ? 00:00:07 ora_lmd0_appsu01
0 S spommere 31195 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms0_appsu01
0 S spommere 31199 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms1_appsu01
The 7th column is the priority: 75 or 76 means Time Share, 58 means Real Time.
You can also use chrt to check:
LMS (Real Time):
$ chrt -p 31199
pid 31199's current scheduling policy: SCHED_RR
pid 31199's current scheduling priority: 1
LMD (Time Share)
$ chrt -p 31193
pid 31193's current scheduling policy: SCHED_OTHER
pid 31193's current scheduling priority: 0
343. How do I check for network problems on my interconnect?
1. Confirm that full duplex is set correctly for all interconnect links on all interfaces on both ends. Do not rely on auto-negotiation.
2. ifconfig -a will give you an indication of collisions/errors/overruns and dropped packets.
3. netstat -s will give you a listing of receive packet discards, fragmentation and reassembly errors for IP and UDP.
4. Set the UDP buffers correctly.
5. Check your cabling.
Note: If you are seeing issues with RAC, remember that RAC uses UDP as the protocol. Oracle Clusterware uses TCP/IP.
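For example, on Linux the kernel limits that govern the UDP socket buffers can be inspected and raised with sysctl; the values shown here are illustrative only, so use the settings documented for your release and platform:
$ sysctl net.core.rmem_max net.core.wmem_max
# sysctl -w net.core.rmem_max=4194304
# sysctl -w net.core.wmem_max=1048576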
344. How to use VLANs in Oracle RAC?
It is
Oracle's standing recommendation to separate the various types of communication
in an Oracle RAC cluster as much as possible. This general recommendation is
the basis for the following separation of communication:
Each node
in an Oracle RAC cluster must have at least one public network.
Each node
in an Oracle RAC cluster must have at least one private network, also referred
to as "interconnect".
Each node in an Oracle RAC cluster must have at least one additional network interface if the shared storage is accessed using a network-based connection.
In addition, Oracle RAC and Oracle Clusterware deployment best practices recommend that the interconnect be deployed on a stand-alone, physically separate, dedicated switch, since this represents the easiest to configure, most secure, and most stable configuration. Many customers, however, have consolidated or prefer to consolidate these stand-alone switches into larger managed switches.
Depending on the level of consolidation that is performed at the switch level, the switch may thereby become a single point of failure. Hardware redundancy within an enterprise switch may mitigate some of the risks, but there are limitations as far as maintenance operations are concerned. Maintaining switch redundancy is therefore highly recommended. Another consequence of this consolidation is a merging of IP networks on a single shared switch, segmented by VLANs at various levels, which include, but are not limited to:
Sharing
the same switch (and network channel) for private and public communication
Sharing
the same switch (and network channel) for the private communication of more
than one cluster.
Sharing
the same switch (and network channel) for private communication and shared
storage access.
While an
increasingly powerful network infrastructure makes it more and more interesting
for customers to consolidate network communication on fewer physical networks,
it needs to be remembered that the latency and bandwidth requirements as well
as availability requirements of the Oracle RAC / Oracle Clusterware interconnect
IP network are more in-line with high performance computing. In a more abstract
way, one should not look at the interconnect as a network, but rather as a
backplane to connect the memory of the cluster nodes.
While observing the bandwidth requirements, Oracle generally recommends maintaining a 1:1 relation (VLAN to non-routable subnet) wherever VLANs are used, if the usage of VLANs cannot be avoided. In this context, it needs to be noted that bandwidth and latency are not the only concerns. Security, ease of management, and unintended but possible side effects of using a shared resource, such as multicast flooding or spanning tree re-convergence, also need to be considered. In detail:
When sharing the same switch (and network channel) for private and public communication and deploying the interconnect on a VLAN in this environment, there should be a 1:1 mapping of the VLAN to a non-routable subnet, and the VLAN should not span multiple VLANs (tagged) or multiple switches.
When sharing the same switch (and network channel) for the private communication of more than one cluster, one VLAN per cluster is recommended for the purpose of cleaner management and security. Further consolidation, such as using only one VLAN for all clusters, is supported, but not recommended.
It is
supported to use the same, consolidated network infrastructure (within the same
security domain) for various clusters without the use of VLANs, while separated
channels are recommended.
Sharing the
same switch (and network channel) for private communication and shared storage access
is
supported, if the underlying network infrastructure recognizes and prioritizes
network based communication to the storage.
345. Are there any issues for the interconnect when sharing the same switch as the public network by using VLAN to separate the network?
RAC and Clusterware deployment best practices recommend that the interconnect be deployed on a stand-alone, physically separate, dedicated switch. Many customers have consolidated these stand-alone switches into larger managed switches. A consequence of this consolidation is a merging of IP networks on a single shared switch, segmented by VLANs. There are caveats
associated
with such deployments. RAC cache fusion exercises the IP network more
rigorously than non-RAC Oracle databases. The latency and bandwidth
requirements as well as availability requirements of the RAC/Clusterware
interconnect IP network are more in-line with high performance computing.
Deploying the RAC/Clusterware interconnect on a shared switch, segmented VLAN
may expose the interconnect links to congestion and instability in the larger
IP network topology. If deploying the interconnect on a VLAN, there should be a
1:1 mapping of VLAN to non-routable subnet and the VLAN should not span
multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include spanning tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and a lack of fine-grained monitoring of the VLAN/port.
346. Are jumbo
frames supported for the RAC interconnect?
Yes. For details see Note 341788.1 Cluster Interconnect and Jumbo Frames.
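As a quick illustration (the interface name and target address are placeholders), a 9000-byte jumbo-frame MTU can be verified end to end on Linux by sending a non-fragmentable packet of the maximum payload size:
$ ifconfig eth1 | grep -i mtu
$ ping -M do -s 8972 192.168.10.2
(8972 bytes of payload plus 28 bytes of IP/ICMP headers equals the 9000-byte MTU.)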
347. We are using Transparent Data Encryption (TDE). We create a wallet on node 1 and copy it to nodes 2 & 3. We open the wallet and are able to select encrypted data on all three nodes. Now we want to REKEY the MASTER KEY. What do we have to do?
After a re-key on node 1, run 'alter system set wallet close' on all other nodes, copy the wallet with the new master key to all other nodes, and run 'alter system set wallet open identified by "password"' on all other nodes to load the (obfuscated) master key into each node's SGA.
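A minimal sketch of that sequence, assuming a password-protected software wallet whose file is copied with operating system tools (the password is a placeholder):
On node 1 (re-key): SQL> alter system set encryption key identified by "wallet_password";
On nodes 2 and 3: SQL> alter system set wallet close;
Copy the updated wallet file from node 1 to the wallet location on nodes 2 and 3, then on each of them: SQL> alter system set wallet open identified by "wallet_password";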
348. Why does the NOAC attribute need to be set on NFS-mounted RAC Binaries?
The noac attribute is required because the installer determines sharedness by creating a file and checking for that file's existence on the remote node. If the noac attribute is not enabled, this test will incorrectly fail, which confuses the installer and OPatch. Other minor things, such as an spfile kept in the default $ORACLE_HOME/dbs, will also be affected.
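Purely as an illustration (server, export path, mount point, and the remaining options are placeholders; follow the mount options documented for your platform and NFS filer), an /etc/fstab entry for NFS-mounted Oracle binaries could look like:
nfsserver:/export/orabin /u01/app/oracle nfs rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,noac 0 0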
349. How do I
use DBCA in silent mode to set up RAC and ASM?
If I already
have an ASM instance/diskgroup then the following creates a RAC database on
that diskgroup:
su oracle -c "$ORACLE_HOME/bin/dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName $SID -sid $SID -sysPassword $PASSWORD -systemPassword $PASSWORD -sysmanPassword $PASSWORD -dbsnmpPassword $PASSWORD -emConfiguration LOCAL -storageType ASM -diskGroupName $ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -nodeinfo $NODE1,$NODE2 -characterset WE8ISO8859P1 -obfuscatedPasswords false -sampleSchema false -oratabLocation /etc/oratab"
The following will create an ASM instance and one disk group:
su oracle -c "$ORA_ASM_HOME/bin/dbca -silent -configureASM -gdbName NO -sid NO -emConfiguration NONE -diskList $ASM_DISKS -diskGroupName $ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -nodeinfo $NODE1,$NODE2 -obfuscatedPasswords false -oratabLocation /etc/oratab -asmSysPassword $PASSWORD -redundancy $ASMREDUNDANCY"
where
ASM_DISKS = '/dev/sda1,/dev/sdb1' and ASMREDUNDANCY='NORMAL'
350. How does
OCR mirror work? What happens if my OCR is lost/corrupt?
OCR is the Oracle Cluster Registry; it holds all the cluster-related information such as instances and services. The OCR file format is binary and, starting with 10.2, it is possible to mirror it. The location of the file(s) is kept in /etc/oracle/ocr.loc in the ocrconfig_loc and ocrmirrorconfig_loc variables. Obviously, if you only have one copy of the OCR and it is lost or corrupt, then you must restore a recent backup; see the ocrconfig utility for details, specifically the -showbackup and -restore flags. Until a valid backup is restored, the Oracle Clusterware will not start up due to the corrupt/missing OCR file.
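For example (the backup path shown is only illustrative; ocrconfig -showbackup prints the actual file names on your system):
# ocrcheck
# ocrconfig -showbackup
# ocrconfig -restore /u01/app/crs/cdata/mycluster/backup00.ocr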
The interesting discussion is what happens if you have the OCR mirrored and one of the copies gets corrupt? You would expect that everything will continue to work seamlessly. The real answer depends on when the corruption takes place.
-- If the corruption happens while the Oracle Clusterware stack is up and running, then the corruption will be tolerated and the Oracle Clusterware will continue to function without interruption, despite the corrupt copy. The DBA is advised to repair the hardware/software problem that prevents OCR from accessing the device as soon as possible; alternatively, the DBA can replace the failed device with another healthy device using the ocrconfig utility with the -replace flag.
-- If, however, the corruption happens while the Oracle Clusterware stack is down, then it will not be possible to start it up until the failed device becomes online again or some administrative action using the ocrconfig utility with the -overwrite flag is taken. When the Clusterware attempts to start you will see messages similar to:
total id sets (1), 1st set (1669906634,1958222370), 2nd set (0,0) my votes (1), total votes (2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0 (/dev/raw/raw1) doesn't have enough votes (1,2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in accessing physical storage [26]
This is because the software cannot determine which OCR copy is the valid one. In the above example one of the OCR mirrors was lost while the Oracle Clusterware was down. There are 3 ways to fix this failure:
a) Fix whatever problem (hardware/software) prevents OCR from accessing the device.
b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command overrides the vote check built into OCR when it starts up. Basically, if the OCR device is configured with a mirror, OCR assigns each device one vote. The rule is to have more than 50% of the total votes (a quorum) in order to safely make sure the available devices contain the latest data. In 2-way mirroring, the total vote count is 2, so 2 votes are required to achieve the quorum. In the example above there are not enough votes to start if only one device, with one vote, is available. (In the earlier case, where the device went down while OCR was running, OCR assigned 2 votes to the surviving device, and that is why the surviving device, now with two votes, can start after the cluster is down.) See the warning below.
c) This method is not recommended to be performed by customers. It is possible to manually modify ocr.loc to delete the failed device and restart the cluster. OCR will not do the vote check if the mirror is not configured. See the warning below.
EXTREME CAUTION should be exercised if choosing option b or c above, since data loss can occur if the wrong file is manipulated. Please contact Oracle Support for assistance before proceeding.
351. If I use
Services with Oracle RAC, do I still need to set up Load Balancing ?
Yes. Services allow a granular definition of workload, and the DBA can dynamically define which instances provide the service. Connection Load Balancing (provided by Oracle Net Services) still needs to be set up to allow the user connections to be balanced across all instances providing a service. With Oracle RAC 10g Release 2 or higher, set CLB_GOAL on the service to define the type of load balancing you want: SHORT for short-lived connections (e.g. a connection pool) or LONG (the default) for applications that have connections active for long periods (e.g. an Oracle Forms application).
352. What is
CLB_GOAL and how should I set it?
CLB_GOAL is the connection load balancing goal for a service. There are 2 options: CLB_GOAL_SHORT and CLB_GOAL_LONG (the default). Long is for applications that have long-lived connections. This is typical for connection pools and SQL*Forms sessions. Long is the default connection load balancing goal.
Short is for
applications that have short-lived connections. The GOAL for a service can be
set with EM or DBMS_SERVICE.
Note: You must still
configure load balancing with Oracle Net Services
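A minimal sketch (the service name is a placeholder), assuming the service already exists, of setting both the service goal and the connection load balancing goal with DBMS_SERVICE:
SQL> exec DBMS_SERVICE.MODIFY_SERVICE(service_name => 'oltp', goal => DBMS_SERVICE.GOAL_SERVICE_TIME, clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT);
In an Oracle RAC environment, making the same change with srvctl (or Enterprise Manager) is preferred so that the clusterware resource stays in sync.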
353. How can a customer mask the change in their clustered database configuration from their client or application? (i.e. so I do not have to change the connection string when I add a node to the RAC database)
The
combination of Server Side load balancing and Services allows you to easily
mask cluster database configuration changes. As long as all instances register
with all listeners (use the LOCAL_LISTENER and REMOTE_LISTENER parameters),
server side load balancing will allow clients to connect to the service on
currently available instances at connect time.
The load
balancing advisory (setting a goal on the service) will give advice as to how
many connections to send to each instance currently providing a service. When a
service is enabled on an instance, as long as the instance registers with the
listeners, the clients can start getting connections to the service and the
load balancing advisory will include that instance in its advice.
With Oracle
RAC 11g Release 2, the Single Client Access Name (SCAN) provides a single name
to be put in the client connection string (as the address). Clients using SCAN
never have to change even if the cluster configuration changes such as adding
nodes.
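As an illustration only (the SCAN name, port, and service name are placeholders), a SCAN-based connect string can be written in EZConnect form or as a tnsnames.ora entry:
sqlplus scott@//mycluster-scan.example.com:1521/oltp
OLTP =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mycluster-scan.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = oltp)))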
354. After executing DBMS_SERVICE.START_SERVICE, the service resource remains in OFFLINE status when confirming it with crs_stat. Is that expected behavior?
YES, this is expected behaviour. Unfortunately, DBMS_SERVICE.START_SERVICE does not update the clusterware until 11g Release 2. You should use 'srvctl start service -d dbname'; then you should see it come online.
Note: With Oracle RAC 11g Release 2, the cluster resource for a service contains the values for all the attributes of a service. Oracle Clusterware will update the database with its values when it starts a service. In order to preserve modifications across restarts, all service modifications should be made with srvctl (or Oracle Enterprise Manager).
355. Is it possible to use srvctl start database with a user account other than oracle (that is, other than the owner of the Oracle software)?
YES. When you create a RAC database as a user different from the home/software owner (oracle), the database creation assistant will set the correct permissions/ACLs on the CRS resources that control the database, instances, etc., assuming that this user is a member of the dba group of the home (find it using oracle_home/bin/osdbagrp) and of the CRS home owner's primary group (usually oinstall), and that there is group write permission on the oracle_home.
356. I am using shared servers with the following set in init.ora: dispatchers=(protocol=TCP)(listener=listeners_nl01)(con=500)(serv=oltp). I stopped my service with 'srvctl stop service' but it is still registered with the listener and accepting connections. Is this expected?
YES. This is
by design of dispatchers which are part of Oracle Net Services. If you specify
the service attribute of the dispatchers init.ora parameter, the service
specified cannot be managed by the dba.
357. Why am I seeing the following warnings in my listener.log for my RAC 10g environment?
WARNING: Subscription for node down event still pending
This message indicates that the listener was not able to subscribe to the ONS events which it uses to do the connection load balancing. This is most likely due to starting the listener using lsnrctl from the database home. When you start the listener using lsnrctl, make sure you have set the environment variable ORACLE_CONFIG_HOME to the Oracle Clusterware home; also set it in racgwrap in $ORACLE_HOME/bin for the database home.
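For example (the Clusterware home path and listener name are placeholders):
$ export ORACLE_CONFIG_HOME=/u01/app/crs
$ lsnrctl start listener_node1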
358. Will FAN
work with SQLPlus?
Yes, with Oracle RAC 11g you can specify the -F (FAILOVER) option. This enables SQL*Plus to interact with the OCI failover mode in a Real Application Clusters (RAC) environment. In this mode a service or instance failure is transparently handled, with transaction status messages if applicable.
359. Can I use TAF and FAN/FCF?
With Oracle Database 10g Release 1, NO. With Oracle Database 10g Release 2, the answer is YES for OCI and ODP.NET, and it is recommended. For JDBC, you should not use TAF and FCF together, even with the thick JDBC driver.
360. What are
the changes in memory requirements from moving from single instance to RAC?
If you are
keeping the workload requirements per instance the same, then about 10% more
buffer cache and 15% more shared pool is needed. The additional memory
requirement is due to data structures for coherency management. The values are
heuristic and are mostly upper bounds. Actual resource usage can be monitored
by querying current and maximum columns for the gcs
resource/locks
and ges resource/locks entries in V$RESOURCE_LIMIT.
But in general, please take into consideration that memory requirements per instance are reduced when the same user population is distributed over multiple nodes. In this case:
Assuming the same user population, N = number of nodes, and M = buffer cache for a single system, then each instance needs roughly
(M / N) + ((M / N) * 0.10) [ + extra memory to compensate for failed-over users ]
Thus, for example, with M = 2G, N = 2, and no extra memory for failed-over users:
(2G / 2) + ((2G / 2) * 0.10) = 1G + 100M
361. What is Runtime Connection Load Balancing?
Runtime
connection load balancing enables the connection pool to route incoming work
requests to the available database connection that will provide it with the
best service. This will provide the best service times globally, and routing
responds fast to changing conditions in the system. Oracle has implemented
runtime connection load balancing with ODP.NET and JDBC connection pools.
Runtime Connection Load Balancing is tightly integrated with the automatic workload balancing features introduced with Oracle Database 10g, i.e. Services, Automatic Workload Repository, and the Load Balancing Advisory.
To enable and use
run-time connection load balancing, the connection goal must be set to SHORT
and either of the following service-level goals must be set:
· SERVICE_TIME—The Load Balancing
Advisory attempts to direct work requests to instances according to their
response time. Load Balancing Advisory data is based on the elapsed time for
work done by connections using the service, as well as available bandwidth to
the service. This goal is best suited for workloads that require varying
lengths of time to complete, for example, an internet shopping system.
· THROUGHPUT—The Load Balancing
Advisory measures the percentage of the total response time that the CPU
consumes for the service. This measures the efficiency of an instance, rather
than the response time. This goal is best suited for workloads where each work
request completes in a similar amount of time, for example, a trading system.
Client-side load balancing balances the connection requests across the listeners by setting the ‘LOAD_BALANCE=ON’ directive in the connect descriptor. When you set this parameter to ON,
Oracle Database randomly selects an address in the address list, and connects
to that node's listener. This balances client connections across the available
SCAN listeners in the cluster. When clients connect using SCAN, Oracle Net
automatically load balances client connection requests across the three IP
addresses you defined for the SCAN, unless you are using EZConnect.
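As an illustration only (host names, port, and service name are placeholders), a client-side load balancing entry in tnsnames.ora might look like this:
SALES =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = sales)))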
362. How do I enable the load balancing advisory?
The load balancing advisory requires the use of services and Oracle Net connection load balancing. To enable it on the server, set a goal (SERVICE_TIME or THROUGHPUT) and set CLB_GOAL=SHORT on your service.
For the client, you must be using a connection pool.
For JDBC,
enable the datasource parameter FastConnectionFailoverEnabled.
For ODP.NET
enable the datasource parameter Load Balancing=true.
To enable the load
balancing advisory, use the ‘-B’ option when creating or modifying the service
using the ‘srvctl’ command.
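A sketch using Oracle RAC 11g Release 2 srvctl syntax (database and service names are placeholders; earlier releases use slightly different options):
$ srvctl modify service -d orcl -s oltp -B SERVICE_TIME -j SHORT
$ srvctl config service -d orcl -s oltp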
363. How do I measure the bandwidth utilization of my NIC or my interconnect?
A more reliable, interactive way on Linux is to use the iptraf utility or the prebuilt RPMs from Red Hat or Novell (SuSE); another option on Linux is netperf. On other Unix platforms: "snoop -S -tr -s 64 -d hme0" (Solaris), and AIX's topas can show that as well. Try to look for the peak (not average) usage and see if that is acceptably fast.
Remember that NIC bandwidth is measured in Mbps or Gbps (which is BITS per second) and the output from the above utilities can sometimes come in BYTES per second, so for comparison do the proper conversion (divide the bps value by 8 to get bytes/sec, or multiply the bytes value by 8 to get the bps value).
One simple/quick and not very recommended way is to look at the output of "ifconfig eth0" and compare the values of "RX bytes" and "TX bytes" over time; this will show the _average_ usage per period of time.
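A rough sketch of that approach (the interface name and the 10-second sampling interval are arbitrary):
$ ifconfig eth0 | grep bytes ; sleep 10 ; ifconfig eth0 | grep bytes
Subtract the first RX/TX byte counts from the second and divide by the interval (10 seconds here) to get the average bytes per second.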
Additionally, you can't expect a network device to run at full capacity with 100% efficiency, due to concurrency, collisions and retransmits, which happen more frequently as the utilization gets higher. If you are reaching high levels, consider a faster interconnect or NIC bonding (multiple NICs all servicing the same IP address).
Finally, the above measures bandwidth utilization (how much), not latency (how fast) of the interconnect; you may still be suffering from a high-latency connection (slow link) even though there is plenty of bandwidth to spare. Most experts agree that low latency is by far more important than high bandwidth with respect to the specifications of the private interconnect in RAC. Latency is best measured by the actual user of the network link (RAC in this case); review Statspack for stats on latency. Also, in 10gR2 Grid Control you can view Global Cache Block Access Latency, and you can drill down to the Cluster Cache Coherency page to see the cluster cache coherency metrics for the entire cluster database.
Keep in mind that RAC uses the private interconnect like it was never used before, to synchronize memory regions (SGAs) of multiple nodes (remember, since 9i, entire data blocks are shipped across the interconnect). If the network is utilized at 50% bandwidth, this means that 50% of the time it is busy and not available to potential users. In this case delays (due to collisions and concurrency) will increase the latency even though the bandwidth might look "reasonable"; the bandwidth figure hides the real issue.
364. Does the database block size or tablespace block size affect how the data is passed across the interconnect?
Oracle ships
database block buffers, i.e. blocks in a tablespace configured for 16K will
result in a 16K data buffer shipped, blocks residing in a tablespace with base
block size (8K) will be shipped as base blocks and so on; the data buffers are
broken down to packets of MTU sizes.
365. Does RAC
work with NTP (Network Time Protocol)?
YES! NTP and Oracle RAC are compatible; as a matter of fact, it is recommended to set up NTP in an Oracle RAC cluster for Oracle9i Database, Oracle Database 10g, and Oracle Database 11g Release 1.
With Oracle Database 11g Release 2, Oracle Clusterware includes the Cluster Time Synchronization Service (CTSS). On startup, Oracle Clusterware checks for an NTP configuration; if one is found, CTSS goes into observer mode. This means it will monitor the clock synchronization and report in the Oracle Clusterware alert log if it finds a problem. If it does not find an NTP configuration, CTSS will be active. In active mode, CTSS synchronizes all the system clocks to the first node in the cluster.
The Oracle Clusterware requires the use of the "-x" flag to the ntpd daemon to prevent the clock from going backwards (Enterprise Linux: see /etc/sysconfig/ntpd; Solaris: set "slewalways yes" in /etc/inet/ntp.conf).
"Node
Time Requirements
Before
starting the installation, ensure that each member node of the cluster is set
as closely as possible to the same date and time. Oracle strongly recommends using
the Network Time Protocol feature of most operating systems for this purpose,
with all nodes using the same reference Network Time Protocol server."
Each machine has a different clock frequency and, as a result, a slightly different time drift. NTP computes this time drift about every 15 minutes and stores the information in a "drift" file; it then adjusts the system clock based on this known drift, as well as comparing it to a given time server that the sysadmins set up. This is the recommended approach.
Keep the
following points in mind:
Minor changes in time (in the seconds range) are harmless for Oracle RAC and the Oracle Clusterware. If you intend to make large time changes, it is best to shut down the instances and the entire Oracle Clusterware stack on that node to avoid a false eviction, especially if you are using the Oracle RAC 10g low-brownout patches, which allow really low misscount settings.
The backup/recovery aspects of large time changes are documented in Note 77370.1; basically, you can't use RECOVER DATABASE UNTIL TIME to reach a second recovery point. It is possible to work around this with RECOVER DATABASE UNTIL CANCEL or UNTIL CHANGE. If you are doing complete recovery (most of the time), then this is not an issue, since the Oracle recovery code uses SCNs (System Change Numbers) to advance in the redo/archive logs. SCNs never go back in time (unless a reset-logs operation is performed); there is always an association of an SCN with a human-readable timestamp (which may change forward or backwards), hence the issue with recovery until a point in time vs. until SCN/cancel.
If DBMS_SCHEDULER is in use, it will be affected by time changes, since it uses the actual clock rather than the SCN.
On platforms with OPROCD, get the fix for bug <> "OPROCD REBOOTS NODE WHEN TIME IS SET BACK BY XNTPD".
If NTP is not configured correctly (using the -x flag) and diagwait is not set to 13 (Note 559365.1), 10.2/11.1 RAC systems can be rebooted due to OPROCD during a leap second event; see Note 759143.1.
Daylight
saving time adjustments do not affect the system clock, only the displayed
time, hence have no impact on the Oracle software.
Apart from these issues, the Oracle RDBMS server is immune to time changes, i.e. they will not affect transaction/read consistency operations.
366. How do I determine whether or not a one-off patch is "rolling upgradeable"?
After you have downloaded a patch, you can go
into the directory where you unpacked the patch:
>pwd
/ora/install/4933522
Then use the following OPatch command:
> opatch query -is_rolling
...
Query ...
Please enter the patch location:
/ora/install/4933522
---------- Query starts ------------------
Patch ID: 4933522
....
Rolling Patch: True.
---------- Query ends -------------------
367. I have 2 clusters named "crs" (the default), how do I get Grid Control to recognize them as targets?
There are 2 options:
a) If the Grid Control agent install (which is a separate install) has already been done and has picked up the name of the cluster as it was configured in CRS, one can go to the EM console as is and, for the second cluster, manually delete and rediscover the target. When you rediscover the target, give it whatever display name you like.
b) Prior to performing the Grid Control agent install, just set the CLUSTER_NAME environment variable and run the install. This variable needs to be set only for that install session. There is no need to set it every time the agent starts.
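For example (the cluster name is a placeholder, and the installer is started from the agent shiphome in whatever way is appropriate for your version):
$ export CLUSTER_NAME=prod_cluster1
$ ./runInstaller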
368. When I look at the ALL_SERVICES view in my database I see services I did not create, what are they for?
You will always see a default database service that has the same name as your database. This service is available on all instances in the cluster. You will also see two services used by the database: SYS$BACKGROUND (for background processes) and SYS$USERS (users who connect via BEQ or without using a service_name). You may also see services that end with XDB, which are created for the XML DB feature; you will not be able to manage these services.
369. Can you
have multiple RAC $ORACLE_HOME's on Linux?
No, there
should be only one Oracle Cluster Manager (ORACM) running on each node. All RAC
databases should run out of the $ORACLE_HOME that ORACM is installed in.
370. Is the hangcheck timer still needed with Oracle RAC 10g and
11g?
YES! The hangcheck-timer module monitors the Linux kernel for extended operating system hangs that could affect the reliability of a RAC node (I/O fencing) and cause database corruption. To verify that the hangcheck-timer module is running on every node:
as root user:
/sbin/lsmod | grep hangcheck
If the hangcheck-timer module is not listed, enter the following command as the root user:
9i: /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1
10g & 11g: /sbin/insmod hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
To ensure
the module is loaded every time the system reboots, verify that the local
system startup file (/etc/rc.d/rc.local) contains the command above.
371. The customer did not load the hangcheck-timer before installing RAC. Can the customer just load the hangcheck-timer now?
YES. The hangcheck-timer is a kernel module that is shipped with the Linux kernel; all you have to do is load it as follows:
9i: /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180 hangcheck_reboot=1
10g & 11g: /sbin/insmod hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
No need to
reboot the nodes.
372. When I try to log in to +ASM2 on node2 with asmcmd (after setting ORACLE_HOME and ORACLE_SID correctly) I get: ORA-01031: insufficient privileges (DBD ERROR: OCI SessionBegin). When I try to log in to +ASM2 using sqlplus (connect / as sysdba) I get the same ORA-01031: insufficient privileges. When I try to log in to +ASM2 using sqlplus (connect sys/passwd as sysdba) I get connected successfully.
This sounds like the ORA_DBA group on Node2 is empty, or else does not have the correct username in it. Double-check which user account you are using to log on to Node2 (a 'set' command will show you the USERNAME and USERDOMAIN values) and then make sure that this account is part of ORA_DBA.
The other
issue to check is that SQLNET.AUTHENTICATION_SERVICES=(NTS) is set in the SQLNET.ORA
373. How to
move the OCR location ?
For Oracle RAC 10g Release 1:
- Stop the CRS stack on all nodes using "init.crs stop"
- Edit /var/opt/oracle/ocr.loc on all nodes and set ocrconfig_loc=new OCR device
- Restore from one of the automatic physical backups using ocrconfig -restore.
- Run ocrcheck to verify.
- Reboot to restart the CRS stack.
For Oracle RAC 10g Release 2 or later, please use the ocrconfig command to replace the OCR with the new location:
# ocrconfig -replace ocr /dev/newocr
# ocrconfig -replace ocrmirror /dev/newocrmirror
Manual
editing of ocr.loc or equivalent is not recommended, and will not work.
374. Is it supported to rerun root.sh from the Oracle Clusterware installation?
Rerunning
root.sh after the initial successful install of the Oracle Clusterware is
expressly discouraged and unsupported. We strongly recommend not doing it.
In cases where root.sh fails to execute on an initial install (or for a new node joining an existing cluster), it is OK to re-run root.sh after the cause of the failure is corrected (permissions, paths, etc.). In this case, please run rootdelete.sh to undo the local effects of root.sh before rerunning root.sh.
375. When the customer runs the command 'onsctl start', they receive the message "Unable to open libhasgen10.so". Any idea why?
Most likely
you are trying to start ONS from ORACLE_HOME instead of Oracle Clusterware (or
Grid Infrastructure in 11.2) home. Please try to start it from the Oracle
Clusterware home.
376. Voting
Files stored in ASM - How many disks per disk group do I need?
If Voting Files are stored in ASM, the ASM disk group that hosts the Voting Files will place the appropriate number of Voting Files in accordance with the redundancy level. Once Voting Files are managed in ASM, a manual addition, deletion, or replacement of Voting Files will fail, since users are not allowed to manually manage Voting Files in ASM.
If the
redundancy level of the disk group is set to "external", 1 Voting
File is used.
If the
redundancy level of the disk group is set to "normal", 3 Voting Files
are used.
If the redundancy
level of the disk group is set to "high", 5 Voting Files are used.
Note that Oracle Clusterware records which disks within the disk group hold the Voting Files. Oracle Clusterware does not rely on ASM to access the Voting Files.
In addition,
note that there can be only one Voting File per failure group. In the above
list of rules, it is assumed that each disk that is supposed to hold a Voting
File resides in its own, dedicated failure group.
In other
words, a disk group that is supposed to hold the above mentioned number of
Voting Files needs to have the respective number of failure groups with at
least one disk. (1 / 3 / 5 failure groups with at least one disk)
Consequently, a normal redundancy ASM
disk group, which is supposed to hold Voting Files, requires 3 disks in
separate failure groups, while a normal redundancy ASM disk group that is not
used to store Voting Files requires only 2 disks in separate failure groups.
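For example, the Voting Files currently in use, and the disks that hold them, can be listed at any time with:
$ crsctl query css votedisk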
377. OCR stored in ASM - What happens, if
my ASM instance fails on a node?
If an ASM instance fails on any node, the
OCR becomes unavailable on this particular node, but the node remains
operational.
If the (RAC) databases use ASM, too, they
cannot access their data on this node anymore during the time the ASM instance
is down. If a RAC database is used, access to the same data can be established
from another node.
If the CRSD process running on the node
affected by the ASM instance failure is the OCR writer, AND the majority of the
OCR locations is stored in ASM, AND an IO is attempted on the OCR during the
time the ASM instance is down on this node, THEN CRSD stops and becomes
inoperable. Hence cluster management is affected on this particular node.
Under no circumstances will the failure of one ASM instance on one node affect the whole cluster.
378. Can I change the public hostname in my Oracle Database 10g Cluster using Oracle Clusterware? OR
Can I change a node’s hostname?
Hostname changes are not supported in Oracle Clusterware (CRS), unless you want to perform a deletenode followed by a new addnode operation. The hostname is used to store, among other things, the flag files, and the Oracle Clusterware stack will not start if the hostname is changed.
379. Which processes access the OCR ?
Oracle Cluster Registry (OCR) is used to store the cluster configuration information, among other things. OCR needs to be accessible from all nodes in the cluster. If OCR became inaccessible, the CSS daemon would soon fail and take down the node. PMON never needs to write to OCR. To confirm whether OCR is accessible, try ocrcheck from your ORACLE_HOME and ORA_CRS_HOME.
380. How do I restore OCR from a backup? On Windows, can I use ocopy? OR
If you lost OCR, how do you restore it?
The only recommended way to restore an OCR from a backup is "ocrconfig -restore ". The ocopy command will not be able to perform the restore action for OCR.
381. How
to Restore a Lost Voting Disk
As long as you can confirm via the CSS
daemon logfile that it thinks the voting disk is bad, you can restore the
voting disk from backup while the cluster is online. This is the backup that
you took with dd (by the manual's request) after the most recent addnode,
deletenode, or install operation. If by accident you restore a voting disk that
the CSS daemon thinks is NOT bad, then the entire cluster will probably go
down.
crsctl add css votedisk - adds a new voting disk
crsctl delete css votedisk - removes a voting disk
Note: the cluster has to be down. You can
also restore the backup via dd when the cluster is down.
382. Why
is the home for Oracle Clusterware not recommended to be subdirectory of the
Oracle base directory?
If anyone other than root has write
permissions to the parent directories of the Oracle Clusterware home / Oracle
Grid Infrastructure for a Cluster home, then they can give themselves root
escalations. This is a security issue.
Consequently, it is strongly recommended to place the Oracle Grid Infrastructure / Oracle Clusterware home outside of the Oracle Base. The Oracle Universal Installer will flag deviating settings during the Oracle Grid Infrastructure 11g Release 2 (and later) installation.
The Oracle Clusterware home itself is a
mix of root and non-root permissions, as appropriate to the security requirements.
Please, follow the installation guides regarding OS users and groups and how to
structure the Oracle software installations on a given system.
383. What
are the IP requirements for the private interconnect?
The install guide states that the private IP address must satisfy the following requirements:
1. Must be separate from the public
network
2. Must be accessible on the same network
interface on each node
3. Must have a unique address on each
node
4. Must be specified in the /etc/hosts
file on each node
The best practices recommendation is to use the TCP/IP standard for non-routable networks.
Reserved address ranges for private
(non-routed) use (see TCP/IP RFC 1918):
* 10.0.0.0 -> 10.255.255.255
* 172.16.0.0 -> 172.31.255.255
* 192.168.0.0 -> 192.168.255.255
Cluvfy will give you an error if you do
not have your private interconnect in the ranges above.
You should not ignore this error. If you are using an IP address in the range used for the public network for the private network interfaces, you are pretty much messing up the IP addressing, and possibly the routing tables, for the rest of the corporation. IP addresses are a scarce commodity; use them wisely. If you use public-range addresses on a non-routable network, there is nothing to prevent someone else from using them in the normal corporate network, and then, when those RAC nodes find out that there is another path to that address range (through RIP), they just might start sending traffic to those other IP addresses instead of the interconnect. This is just a bad idea.
384. Does Oracle Clusterware have to be the same or a higher release than all instances running on the cluster?
Yes - Oracle Clusterware must be the same or a
higher release with regards to the RDBMS or ASM Homes.
385. How
much I/O activity should the voting disk have?
Approximately 2 read + 1 write per second per
node.
386. I made a mistake when I created the VIP during the install of Oracle Clusterware, can I change the VIP?
Yes. The details of how to do this are described in MetaLink Note 276434.1.
387. I have a 2-node RAC running. I notice that it is always node2 that is evicted when I test the private network failure scenario by disconnecting the private network cable. It doesn't matter whether it is node1's or node2's private network cable that is disconnected, it is always node2 that is evicted. What happens in a 3-node RAC cluster if node1's cable is disconnected?
The node with the lower node number will survive (the first node to join the cluster). In the case of 3 nodes, 2 nodes will survive and the one whose cable you pulled will go away. With 4 nodes, the sub-cluster with the lower node number will survive.
388. Can I
configure a firewall (iptables) on the cluster interconnect?
Disable all firewalls on the cluster
interconnect. See note: 554781.1 for details
389. Can I
use ASM to mirror Oracle data in an extended RACenvironment?
This
support is for 10gR2 onwards and has the following limitations:
1. As in any extended RAC environment, the additional latency induced by distance will affect I/O and cache fusion performance. This effect will vary by distance, and the customer is responsible for ensuring that the impact attained in their environment is acceptable for their application.
2. OCR must be mirrored across both sites
using Oracle provided mechanisms.
3. Voting Disk redundancy must exist across both sites, and at a 3rd site to act as an arbiter. This third site may be reached via a WAN.
4. Storage at each site must be set up as separate failure groups and use ASM mirroring, to ensure at least one copy of the data at each site.
5. The customer must have a separate and dedicated test cluster, also in an extended configuration, set up using the same software and hardware components (it can have fewer or smaller nodes).
6. Customer must be aware that in 10gR2
ASM does not provide partial resilvering. Should a loss of connectivity between
the sites occur, one of the failure groups will be marked invalid. When the site
rejoins the cluster, the failure groups will need to be manually dropped and
added.
390. What
is a stage?
CVU supports the notion of Stage verification.
It identifies all the important stages in RAC deployment and provides each
stage with its own entry and exit criteria. The entry criteria for a stage
define a specific set of verification tasks to be performed before initiating
that stage. This pre-check saves the user from entering into a stage unless its
pre-requisite conditions are met. The exit criteria for a stage define another
specific set of verification tasks to be performed after completion of the
stage. The post-check ensures that the activities for that stage have been completed
successfully. It identifies any stage-specific problem before it propagates to subsequent stages, where it would become difficult to find its root cause. An example of a stage is "pre-check of database installation", which checks whether the system meets the criteria for a RAC install.
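For example (node names are placeholders), the pre-check for the database installation stage can be run as:
$ cluvfy stage -pre dbinst -n node1,node2 -verbose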
391. What
is a component?
CVU supports the notion of Component verification. The verifications in this category are not associated with any specific stage. The user can verify the correctness of a specific cluster component. A component can range from a basic one, like free disk space, to a complex one like the CRS stack. The integrity check for the CRS stack will transparently span verification of multiple sub-components associated with the CRS stack. This encapsulation of a set of tasks within a specific component verification should be of great help to the user.
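For example, the CRS stack integrity component check can be run across all nodes with:
$ cluvfy comp crs -n all -verbose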
392. What
is nodelist?
A nodelist is a comma separated list of
hostnames without domain. Cluvfy will run the requested verification on all
nodes in the nodelist provided. Cluvfy will ignore any domain while processing
the nodelist. If duplicate entities after removing the domain exist, cluvfy
will eliminate the duplicate names while processing. Wherever supported, you
can use '-n all' to check on all the cluster nodes. Check "Do I have to
type the nodelist every time for the CVU commands? Is there any shortcut?"
for more information on nodelist and shortcuts.
393. What
is a configuration file?
CVU supports a configuration file called cvu_config under the CV_HOME/cv/admin folder. This file supports property-value style preferences in a persistent way. The file might vary depending upon the platform. Here is a brief description of some of those properties:
CV_ORACLE_RELEASE: This property can take a
value of the Oracle release that should be assumed when -r option is not
specified in the command line. The valid values that can be set are 10gR1,
10gR2, 11gR1 or 11gR2. If this property is not set then the default is 11gR2.
CV_NODE_ALL: This property stores a comma separated
list of nodes to be used for all the nodes in the cluster. This value will be
used for "-n all"
argument on the command line.
CV_RAW_CHECK_ENABLED: If this property is set
to TRUE, then CVU will perform scsi disk discovery and sharedness checks. For
Linux platforms, CVU requires the cvuqdisk rpm installed on all nodes if this
property is set.
CV_ASSUME_DISTID: This property is used in cases where CVU cannot detect or support a particular platform or distribution. It is not recommended to change this property, as this might render CVU non-functional.
CV_XCHK_FOR_SSH_ENABLED: If this property is set
to TRUE, CVU will also check whether X-Windows is configured with SSH for user
equivalence.
ORACLE_SRVM_REMOTESHELL: This property stores
alternative remote shell command location.
ORACLE_SRVM_REMOTECOPY: This property stores
alternative remote copy command location.
CV_ASSUME_CL_VERSION: By default, the command line parser uses the CRS active version for the display of command line syntax usage and for syntax validation; use this property to pass a version other than the CRS active version for command line syntax display and validation.
CV_TRACELOC: Use this property to choose the location in which CVU generates the trace files; set it to the absolute path of the desired trace directory.
394. Do I
have to be root to use CVU?
No. CVU is intended for database and system administrators. CVU assumes that the current user is the oracle user.
395. What about discovery? Does CVU discover installed components?
At present, CVU's discovery is limited to
the following components. CVU discovers available network interfaces if you do
not specify any interface in its command line. For storage related
verification, CVU discovers all the supported storage types if you do not
specify a particular storage. CVU discovers CRS HOME if one is available. CVU
also discovers the statically configured nodelist for the cluster if an Oracle
supported vendor clusterware or Oracle Clusterware is available.
396. What
about locale? Does CVU support other languages?
Yes. CVU complies with Oracle's NLS guidelines and supports locales.
397. What
version of Oracle Clusterware or RAC is supported by CVU?
On Linux x86 and x86_64: The current CVU
release supports Oracle Clusterware, Oracle RAC 10g, and Oracle RAC 11g. In
other words, "the current version" of CVU can check 10g as well as
11g releases of Oracle Clusterware or RAC. However, it can not check or verify
pre-10g products.
On Solaris SPARC64, AIX, and HPUX (PA-RISC and IA64): CVU has limited backward compatibility with previous Oracle Clusterware releases, back to Oracle Database 10g Release 1. It works on the operating system versions supported by 11gR2, that is, Solaris 9, Solaris 10, AIX 5.3, AIX 6.1, HPUX 11.23 and HPUX 11.31 only.
398. What
are the requirements for CVU?
CVU requires:
1.An area with at least 200MB on Linux
x86, 285MB on Linux x86_64, 300MB on Solaris SPARC64 and Solaris x64, 158MB on
AIX, 160MB on HPUX IA64 and 160MB HPUX PARISC of free space for containing
software bits on the invocation node.
2.A work directory with at least 5MB on
all the nodes. CVU will attempt to copy the necessary bits as required to this
location. Make sure, the location exists on all nodes and it has write
permission for CVU user. This directory is set through the CV_DESTLOC
environment variable. If this variable is not set, CVU will use the common
temporary location such as "/tmp" for Linux and "C:\Temp"
for Windows as the work dir.
3. An optional package, 'cvuqdisk', is required on all the nodes for Linux distributions. This assists CVU in finding SCSI disks and helps CVU to perform storage checks on disks. Please refer to "What is the 'cvuqdisk' rpm?" for details. Note that this package should be installed only on Red Hat Linux 4 (or higher), Enterprise Linux 4 (or higher), SuSE 9 (or higher), or other Linux flavors of comparable versions.
399. How
do I install CVU from OTN?
Here is how one can install CVU from a
zip file(cvupack_<platform>.zip) downloaded from OTN:
1. Create a CV home( say
/home/username/mycvhome ) directory. It should have at least 35M of free disk
space.
2. cd /home/username/mycvhome
3. copy the cvupack_<platform>.zip
file to /home/username/mycvhome
4. Unzip the file: > unzip cvupack_<platform>.zip
5. (Optional) Set the environmental
variable CV_DESTLOC. This should point to a writable area on *all* nodes. When
invoked, the tool will attempt to copy the necessary bits as required to this
location. Make sure the location exists on all nodes and it has write
permission for CVU user. It is strongly recommended that you should set this
variable. If this variable has not been set, CVU will use "/tmp" as
the default.
> setenv CV_DESTLOC /tmp/cvu_temp
To verify, run cluvfy from <CV
Home>/bin directory (typically /home/username/mycvhome/bin/cluvfy). This
should show the usage.
For Linux platforms, an optional rpm
package 'cvuqdisk' is required on all the nodes. Please refer to How do I
install 'cvuqdisk' package?
400. What
is 'cvuqdisk' rpm? Why should I install this rpm?
cvuqdisk is applicable on Linux platforms
only. CVU requires root privilege to gather information about the scsi disks
during discovery. A small binary uses the setuid mechanism to query disk information
as root. Note that this process is
purely a read-only process with no adverse impact on the system. To make this secure, the binary is packaged in the cvuqdisk rpm and needs root privilege to be installed on a machine.
When this package is installed on all the
nodes, CVU performs discovery and shared storage accessibility checks for scsi
disks. Otherwise, it complains about the missing package 'cvuqdisk'. You can
disable the scsi device check feature by setting the CV_RAW_CHECK_ENABLED to FALSE in
$CV_HOME/cv/admin/cvu_config file. CVU will not complain about the missing
rpm if this variable is set to false.
401. How
do I install 'cvuqdisk' package?
Here are the steps to install cvuqdisk
package.
1. Become root user
2. Copy the rpm ( cvuqdisk-1.0.7-1.rpm or
the latest version ) to a local directory. You can find the rpm in
<CV-HOME>/rpm directory where <CVHOME> is the directory in which
you have installed CVU from OTN.
3. Set the CVUQDISK_GRP environment variable to the group that should own this binary. Typically it is the "dba" group.
export CVUQDISK_GRP=dba
4. Erase any existing package
rpm -e cvuqdisk
5. Install the rpm
rpm -iv cvuqdisk-1.0.7-1.rpm
6. Verify the package
rpm -qa | grep cvuqdisk
402. How
do I know about cluvfy commands? The usage text of cluvfy does not show
individual commands.
Cluvfy has context sensitive help built
into it. Cluvfy shows the most appropriate usage text based on the cluvfy
command line arguments. If you type 'cluvfy' on the command prompt, cluvfy displays the
high level generic usage text, which talks about valid stage and component
syntax. If you type 'cluvfy comp -list', cluvfy will show valid components with brief
description on each of them. If you type 'cluvfy comp -help', cluvfy will show
detail syntax for each of the valid components. Similarly, 'cluvfy stage -list' and 'cluvfy stage -help' will list valid stages
and their syntax respectively.
If you type an invalid command, cluvfy
will show the appropriate usage for thatparticular command. For example, if you
type 'cluvfy stage -pre
dbinst',
cluvfy will show the syntax for pre-check of dbinst stage.
403. What
are the default values for the command line arguments?
Here are the default values and behavior
for different stage and component commands:
For component nodecon: If no -i argument is provided, then cluvfy runs in discovery mode.
For component nodereach: If no -srcnode is provided, then the local node (the node of invocation) will be used as the source node.
For components ssa:
If no -n argument is provided, then the
local node will be used.
If no -s argument is provided, then
cluvfy runs in the storage discovery mode.
If no -t argument is provided, then the
device is assumed to be used for oracle data files.
For components clu:
If no -n argument is provided, then all
the nodes in the cluster will be used for verification.
For components cfs, space, clu, clumgr,
ocr, crs, nodeapp, asm, gpnp, gns, ohasd, clocksync :
If no -n argument is provided, then the
local node will be used.
For components sys :
If no -n argument is provided, then the
local node will be used.
If no -r argument is provided, then 11gR2
will be used.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but -fixupdir
argument is not provided, fixup files will be generated in CVU's work
directory.
For components admprv:
If no -n argument is provided, then the
local node will be used.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For component peer:
If no -r argument is provided, then 11gR2
will be used.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
For component software:
If no -n argument is provided, then the
local node will be used.
If no -d argument is provided, then crs
home will be discovered and the files for crs will be verified.
For component acfs:
If no -n argument is provided, then the
local node will be used.
For stage -post hwos:
If no -s argument is provided, then
cluvfy runs in the discovery mode.
For stage -pre crsinst:
If no -r argument is provided, then 11gR2
will be used.
If no -c argument is provided, then
cluvfy will skip OCR related checks.
If no -q argument is provided, then
cluvfy will skip voting disk related checks.
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre dbinst:
If no -r argument is provided, then 11gR2
will be used.
If no -osdba argument is provided, then
'dba' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre dbcfg:
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre acfscfg:
If no -asmdev argument is provided,
default discovery string will be used to discover ASM devices.
For stage -pre hacfg:
If no -osdba argument is provided, then
'dba' will be used.
If no -orainv argument is provided, then
'oinstall' will be used.
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
For stage -pre nodeadd:
If -fixup argument is provided, but
-fixupdir argument is not provided, fixup files will be generated in CVU's work
directory.
NOTE: For each verification command that supports the optional -r option to specify the supported Oracle release, the default release is assumed to be 11gR2 if the -r option is not specified. To perform verifications for any previous release, '-r 10gR1', '-r 10gR2' or '-r 11gR1' must be specified. If the verifications are to be performed for a specific release earlier than 11gR2, use of the -r option can be avoided by setting the intended release value (10gR1, 10gR2 or 11gR1) for the CV_ORACLE_RELEASE property in CVU's configuration file (located under the <CVU installation root dir>/cv/admin directory).
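For example, assuming two nodes named node1 and node2 (the names are hypothetical), the 10gR2 prerequisites can be verified either by passing -r on the command line or by setting the property once in the configuration file:
$ cluvfy comp sys -n node1,node2 -p crs -r 10gR2 -verbose
Alternatively, add a line such as CV_ORACLE_RELEASE=10gR2 to <CVU installation root dir>/cv/admin/cvu_config (the exact property format is an assumption) and omit -r:
$ cluvfy comp sys -n node1,node2 -p crs -verbose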
404. Do I
have to type the nodelist every time for the CVU commands? Is there any
shortcut?
You do not have to type the nodelist every time for the CVU commands. Typing the nodelist for a large cluster is painful and error-prone. Here are a few shortcuts.
To provide all the nodes of the cluster,
type '-n all'. Cluvfy will attempt to get the nodelist in the following order:
1. If vendor clusterware is available, it will pick all the configured nodes from the vendor clusterware using the lsnodes utility.
2. If CRS is installed, it will pick all the configured nodes from Oracle Clusterware using the olsnodes utility.
3. It will look for the CV_NODE_ALL
property in the cvu_config file under $CV_HOME/cv/admin.
4. If none of the above is available, it will look for the CV_NODE_ALL environment variable.
5. Otherwise, it will complain.
To provide a partial list (some of the nodes of the cluster), you can set an environment variable and use it in the CVU command. For example:
export MYNODES=node1,node3,node5
cluvfy comp nodecon -n $MYNODES
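To run a check against every configured node instead, the '-n all' shortcut avoids typing any names at all (a sketch, assuming Oracle Clusterware is already installed so the nodelist can be discovered via olsnodes):
$ cluvfy comp nodecon -n all -verbose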
405. How do I get detailed output of a check?
Cluvfy supports a verbose feature. By default, cluvfy runs in non-verbose mode and reports only the summary of a test. To get detailed output of a check, use the '-verbose' flag on the command line. This produces detailed output for the individual checks and, where applicable, shows per-node results in a tabular fashion.
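For example (node names hypothetical), the following runs the post hardware/OS stage check first in summary mode and then in verbose mode:
$ cluvfy stage -post hwos -n node1,node2
$ cluvfy stage -post hwos -n node1,node2 -verbose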
406. How
do I check network or node connectivity related issues?
Use component verification commands like 'nodereach' or 'nodecon' for this purpose. For the detailed syntax of these commands, type 'cluvfy comp -help' at the command prompt.
If the 'cluvfy comp nodecon' command is invoked without -i, cluvfy will attempt to discover all the available interfaces and the corresponding IP addresses and subnets. Then cluvfy will try to verify the node connectivity per subnet. You can run this command in verbose mode to find out the mappings between the interfaces, IP addresses and subnets.
Cluvfy will suggest interfaces for VIP and private interconnect if suitable interfaces are available. You can check the connectivity among the nodes by specifying the interface name(s) through the -i argument.
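As a sketch (node and interface names are hypothetical), the first command checks basic reachability from node1, and the second checks connectivity over a specific interface:
$ cluvfy comp nodereach -n node1,node2 -srcnode node1 -verbose
$ cluvfy comp nodecon -n node1,node2 -i eth1 -verbose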
407. Can
CVU fix something in the system?
Yes, CVU supports fixing up several system parameters that do not meet the requirements. Wherever applicable, the '-fixup' argument can be specified on the command line to request generation of fixup scripts. When this argument is specified, CVU auto-generates fixup scripts containing appropriate values for those parameters that need to be fixed and whose fix is supported by CVU. Instructions for executing the generated fixup script with root privileges are provided at the end of the cluvfy command execution results.
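For example, a pre-Clusterware-installation check that also requests fixup scripts could look like this (node names and the fixup directory are hypothetical):
$ cluvfy stage -pre crsinst -n node1,node2 -fixup -fixupdir /tmp/cvu_fixup -verbose
If any fixable check fails, cluvfy prints the location of the generated script, which is then executed as root on the affected nodes.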
408. How
do I check whether OCFS or OCFS2 is properly configured?
OCFS or OCFS2 is applicable on Linux platforms only. You can use the component command 'cluvfy comp cfs' to check this. Provide the OCFS or OCFS2 file system you want to check through the -f argument. Note that the sharedness check for the OCFS file system is supported for OCFS version 1.0.14 or higher.
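A minimal sketch, assuming /ocfs2/oradata is the OCFS2 mount point to be verified (the path and node names are hypothetical):
$ cluvfy comp cfs -n node1,node2 -f /ocfs2/oradata -verbose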
409. Can I check if the storage is shared among the nodes?
Yes, you can use the 'comp ssa' command to check the sharedness of the storage.
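For example, assuming /dev/sdb1 is a candidate shared device (the path and node names are hypothetical), the following verifies that it is shared across both nodes; omitting -s makes cluvfy run in storage discovery mode instead:
$ cluvfy comp ssa -n node1,node2 -s /dev/sdb1 -verbose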
410. How
do I check user accounts and administrative permissions related issues?
Use the admprv component verification command. Refer to the usage text for detailed instructions and the types of supported operations. To check whether the privileges are sufficient for user equivalence, use the '-o user_equiv' argument. You can force CVU to check user equivalence using SSH only with the '-sshonly' flag. Similarly, '-o crs_inst' will verify whether the user has the correct permissions for installing Oracle Clusterware. '-o db_inst' will check for permissions required for installing RAC, and '-o db_config' will check for permissions required for creating a RAC database or modifying a RAC database configuration.
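The following sketches (node names hypothetical) show two common admprv checks, user equivalence over SSH and Clusterware installation privileges:
$ cluvfy comp admprv -n node1,node2 -o user_equiv -sshonly -verbose
$ cluvfy comp admprv -n node1,node2 -o crs_inst -verbose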
411. How
do I check if SSH is configured properly on my cluster?
You can use CVU's admprv component verification
command 'comp admprv
-n <nodelist> -o user_equiv -sshonly -verbose' to verify this. To
check whether X-Windows is configured to work with SSH for user equivalence as
per Oracle's requirement, set the following property "CV_XCHK_FOR_SSH_ENABLED=TRUE" in the $CV_HOME/cv/admin/cvu_config file.
412. How
do I check minimal system requirements on the nodes?
The component verification command sys is meant for that. Note that CVU can check the minimum system requirements for Oracle Clusterware versions 10gR1, 10gR2, 11gR1 and 11gR2. Use the '-p crs' argument to check requirements for Oracle Clusterware and the -r argument for the desired version.
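For example (node names hypothetical), checking the system prerequisites for an 11gR2 Clusterware installation and for a RAC database installation ('-p database' is assumed here to be the database counterpart of '-p crs'):
$ cluvfy comp sys -n node1,node2 -p crs -r 11gR2 -verbose
$ cluvfy comp sys -n node1,node2 -p database -r 11gR2 -verbose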
413. Is
there a way to compare nodes?
You can use the peer comparison feature
of cluvfy for this purpose. The command 'comp peer' will list the values of
different nodes for several pre-selected properties. You can use the peer command with the -refnode argument to compare those properties of other nodes against the reference node.
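A sketch (node names hypothetical) that compares node2 and node3 against node1 as the baseline:
$ cluvfy comp peer -refnode node1 -n node2,node3 -r 11gR2 -verbose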
414. Why does the peer comparison with -refnode say 'matched' when the group or user does not exist?
Peer comparison with the -refnode feature acts like a baseline comparison. It compares the system properties of other nodes against those of the reference node. If a value does not match (is not equal to the reference node's value), it is flagged as a deviation from the reference node. If a group or user does not exist on the reference node as well as on the other node, it is reported as 'matched', since there is no deviation from the reference node. Similarly, a node with higher total memory than the reference node is reported as 'mismatched', for the same reason.
415. At what point is cluvfy usable? Can I use cluvfy before installing Oracle Clusterware?
You can run cluvfy at any time, even before CRS installation. In fact, cluvfy is designed to assist the user as soon as the hardware and OS are up. If you invoke a command which requires CRS or RAC on the local node, cluvfy will report an error if those required products are not yet installed.
416. How
do I turn on tracing?
Set the environment variable SRVM_TRACE to true. For example, in tcsh "setenv SRVM_TRACE true" will turn on tracing.
In recent releases tracing is turned on by default. Set the environment variable SRVM_TRACE to false if you do not want tracing. For example, in bash "export SRVM_TRACE=false" or "export SRVM_TRACE=FALSE" will switch off tracing.
It may also help to run cluvfy with the -verbose flag and capture the session output, for example:
$ script run.log
$ export SRVM_TRACE=TRUE
$ cluvfy <cluvfy command> -verbose
$ exit
417. Where
can I find the CVU trace files?
CVU log files can be found under the $CV_HOME/cv/log directory. The log files are automatically rotated, and the latest log file has the name cvutrace.log.0. It is a good idea to clean up unwanted log files, or archive them, to reclaim disk space. If you want the trace files to be generated in a location other than $CV_HOME/cv/log, set the environment variable CV_TRACELOC to the desired location. In recent releases, CVU trace files are generated by default; setting SRVM_TRACE=false before invoking cluvfy disables trace generation for that invocation.
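For example, assuming /u01/cvutrace is an existing, writable directory (the path is hypothetical):
$ export CV_TRACELOC=/u01/cvutrace
$ cluvfy comp nodecon -n all -verbose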
418. Why does cluvfy report "unknown" on a particular node?
Cluvfy reports unknown when it cannot conclude for sure whether a check passed or failed. A common cause of this type of reporting is a non-existent location set for the CV_DESTLOC variable. Please make sure the directory pointed to by this variable exists on all nodes and is writable by the user.
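As a sketch, assuming the configured CV_DESTLOC value is /tmp/cvu_destloc and the nodes are node1 and node2 (all of these names are hypothetical), create the directory on every node before rerunning the check:
$ for n in node1 node2; do ssh $n mkdir -p /tmp/cvu_destloc; done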
419. Why
does CVU complain "WARNING: Could not find a suitable set of interfaces
for VIPs"?
CVU checks for the following criteria
before considering a set of interfaces for VIP:
-- the interfaces should have the same
name across nodes
-- they should belong to the same subnet
-- they should have the same netmask
-- they should be on a public (and routable) network.
Oftentimes, the interfaces planned for the VIPs are configured on 10.*, 172.16.*-172.31.* or 192.168.* networks, which are not routable. Hence CVU does not consider them suitable for VIPs. If none of the available interfaces satisfies these criteria, CVU complains "WARNING: Could not find a suitable set of interfaces for VIPs.". It is worth noting that such addresses will actually work if the network is in fact public, but CVU assumes they are private and reports accordingly.
420. What
OS versions and distributions are supported?
CVU is supported on all the OS versions
and distributions on which Oracle Clusterware 10g and 11g are supported.