51. When I run 10.2 CLUVFY on a system where RAC 10g Release 1 is running, I get the following output:
Package existence check failed for "SUNWscucm:3.1".
Package existence check failed for "SUNWudlmr:3.1".
Package existence check failed for "SUNWudlm:3.1".
Package existence check failed for "ORCLudlm:Dev_Release_06/11/04,_64bit_3.3.4.8_reentrant".
Package existence check failed for "SUNWscr:3.1".
Package existence check failed for "SUNWscu:3.1".
Checking this Solaris system I don't see those packages installed. Can I continue my install?
Note that cluvfy checks all possible prerequisites and tells you whether your system passes the check or not. You can then cross-reference with the install guide to see if the checks that failed are required for your type of installation. In the above case, if you are not planning on using Sun Cluster, then you can continue the install. The checks that failed are for the packages required by Sun Cluster and are not needed on your cluster. As long as everything else checks out successfully, you can continue.
52. Why is validateUserEquiv failing during install (or cluvfy run)?
SSH must be set up as per the pre-installation tasks. It is also necessary to have file permissions set as described below for features such as public key authentication to work. If your permissions are not correct, public key authentication will fail and will fall back to password authentication with no helpful message as to why. The following server configuration files and/or directories must be owned by the account owner or by root, and GROUP and WORLD WRITE permission must be disabled:
$HOME
$HOME/.rhosts
$HOME/.shosts
$HOME/.ssh
$HOME/.ssh/authorized_keys
$HOME/.ssh/authorized_keys2 # OpenSSH-specific for the ssh2 protocol.
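A minimal sketch of tightening these permissions as the Oracle software owner (assuming a default OpenSSH layout):
chmod go-w $HOME
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys $HOME/.ssh/authorized_keys2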
SSH (from OUI) will also fail if you have not connected to each machine in your cluster as per the note in the installation guide:
The first time you use SSH to connect to a node from a particular system, you may see a message similar to the following:
The authenticity of host 'node1 (140.87.152.153)' can't be established. RSA key fingerprint is 7z:ez:e7:f6:f4:f2:4f:8f:9z:79:85:62:20:90:92:z9. Are you sure you want to continue connecting (yes/no)?
Enter "yes" at the prompt to continue. You should not see this message again when you connect from this system to that node. Answering yes causes an entry to be added to a "known_hosts" file in the .ssh directory, which is why subsequent connection requests do not ask again. This is known to work on Solaris and Linux but may work on other platforms as well.
53. Can I use ASM to mirror Oracle data in an extended RAC environment?
This is supported from 10gR2 onwards, with the following limitations:
1. As in any extended RAC environment, the additional latency induced by distance will affect I/O and cache fusion performance. This effect will vary by distance, and the customer is responsible for ensuring that the impact in their environment is acceptable for their application.
2. The OCR must be mirrored across both sites using Oracle-provided mechanisms.
3. Voting disk redundancy must exist across both sites, and at a third site to act as an arbiter. This third site may be connected via a WAN.
4. Storage at each site must be set up as separate failure groups and use ASM mirroring, to ensure at least one copy of the data at each site.
5. The customer must have a separate and dedicated test cluster, also in an extended configuration, set up using the same software and hardware components (it can have fewer or smaller nodes).
6. The customer must be aware that in 10gR2 ASM does not provide partial resilvering. Should a loss of connectivity between the sites occur, one of the failure groups will be marked invalid. When the site rejoins the cluster, the failure groups will need to be manually dropped and added.
54. How can I register the listener with Oracle Clusterware in RAC 10g Release 2?
NetCA is the only tool that configures the listener, and you should always use it. It will register the listener with Oracle Clusterware. There are no other supported alternatives.
55. Can I use ASM as a mechanism to mirror the data in an Extended RAC cluster?
Yes, but it cannot replicate everything that needs replication. ASM works well to replicate any object you can put in ASM, but you cannot put the OCR or voting disk in ASM.
In 10gR1 they can either be mirrored using a different mechanism (which could then be used instead of ASM), or the OCR needs to be restored from backup and the voting disk recreated. In the future we are looking at providing Oracle redundancy for both.
56. How
should voting disks be implemented in an extended cluster environment? Can I
use standard NFS for the third site voting disk?
Standard NFS is only supported for the tie-breaking voting disk in an extended
cluster environment. See platform and mount option restrictions at:
http://www.oracle.com/technology/products/database/clustering/pdf/thirdvoteonnfs.pdf
Otherwise, just as with database files, we only support voting files on certified NAS devices, with the appropriate mount options. Please refer to Metalink Note 359515.1 for a full description of the required mount options.
For a complete list of supported NAS vendors refer to OTN at:
http://www.oracle.com/technology/deploy/availability/htdocs/vendors_nfs.html
57. What
are the network requirements for an extended RAC cluster?
Interconnect, SAN, and IP Networking need
to be kept on separate channels, each with required redundancy. Redundant
connections must not share the same Dark Fiber (if used), switch, path, or even
building entrances. Keep in mind that cables can be cut.
The SAN and interconnect connections need to be on dedicated point-to-point connections; no WAN or shared connections are allowed. Traditional cables are limited to about 10 km if you are to avoid using repeaters. Dark Fiber networks allow the communication to occur without repeaters. Since latency is limited, Dark Fiber networks allow for a greater separation between the nodes. The disadvantage of Dark Fiber networks is that they can cost hundreds of thousands of dollars, so generally they are only an option if they already exist between the two sites.
If direct connections are used (for short distances), this is generally done by just stringing long cables from a switch. If DWDM or CWDM is used, then these are directly connected via a dedicated switch on either side.
Note of caution: do not run the RAC interconnect over a WAN. This is the same as running it over the public network, which is not supported, and other uses of the network (e.g. large FTPs) can cause performance degradation or even node evictions.
For SAN networks make sure you are using
SAN buffer credits if the distance is over 10km.
If Oracle Clusterware is being used, we
also require that a single subnet be setup for the public connections so we can
fail over VIPs from one side to another.
58. Can a customer use SE RAC to implement an "Extended RAC Cluster"?
Yes. Effective with 11g Release 1, the former restriction that all nodes be co-located in one room when using SE RAC has been lifted. Customers can now use SE RAC clusters in extended environments. However, other SE RAC restrictions still apply (e.g. compulsory usage of ASM; no third-party clusterware or volume manager may be installed).
59. What is the maximum distance between nodes in an extended RAC environment?
The high impact of latency creates practical limitations as to where this architecture can be deployed. While there is no fixed distance limitation, the additional latency on the I/O round trip and on one-way cache fusion traffic will have an effect on performance as distance increases. For example, tests at 100 km showed a 3-4 ms impact on I/O and a 1 ms impact on cache fusion; thus the greater the distance, the greater the impact on performance. This architecture fits best where the two datacenters are relatively close (<~25 km) and the impact is negligible. Most customers implement under this distance, with only a handful above, and the farthest known example is at 100 km. Customers considering larger distances than commonly implemented may want to estimate or measure the performance hit on their application before implementing. Do ensure a proper setup of SAN buffer credits to limit the impact of distance at the I/O layer.
60. How is the voting disk used by
Oracle Clusterware?
The voting disk is accessed exclusively
by CSS (one of the Oracle Clusterware daemons). This is totally different from
a database file. The database looks at the database files and interacts with
the CSS daemon (at a significantly higher level conceptually than any notion of
"voting disk").
"Non-synchronized access" (i.e.
database corruption) is prevented by ensuring that the remote node is down before
reassigning its locks. The voting disk, network, and the control file are used
to determine when a remote node is down, in different, parallel, indepdendent
ways that allow each to provide additional protection compared to the other.
The algorithms used for each of these three things are quite different.
As far as voting disks are concerned, a
node must be able to access strictly more than half of the voting disks at any
time. So if you want to be able to tolerate a failure of n voting disks, you
must have at least 2n+1 configured. (n=1 means 3 voting disks). You
can configure up to 32 voting disks, providing protection against 15
simultaneous disk failures, however it's unlikely that any customer would have
enough disk systems with statistically independent failure characteristics that
such a configuration is meaningful. At any rate, configuring multiple voting
disks increases the system's tolerance of disk failures (i.e. increases
reliability).
Configuring a smaller number of voting
disks on some kind of RAID system can allow a customer to use some other means
of reliability than the CSS's multiple voting disk mechanisms. However there seem
to be quite a few RAID systems that decide that 30-60 second (or 45 minutes in
the case of veritas) IO latencies are acceptable. However we have to wait for
at least the longest IO latency before we can declare a node dead and allow the
database to reassign database blocks. So while using an independent RAID system
for the voting disk may appear appealing, sometimes there are failover latency
consequenecs
61. Does Oracle Clusterware support application VIPs?
Yes, with Oracle Database 10g Release 2, Oracle Clusterware supports an "application" VIP. This is to support putting applications under the control of Oracle Clusterware using the new high availability API, and to allow the user to use the same URL or connection string regardless of which node in the cluster the application is running on. The application VIP is a new resource defined to Oracle Clusterware and is a functional VIP. It is defined as a dependent resource to the application. There can be many VIPs defined, typically one per user application under the control of Oracle Clusterware. You must first create a profile (crs_profile), then register it with Oracle Clusterware (crs_register). The usrvip script must run as root.
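As a rough sketch (the resource name, interface, address, and netmask below are hypothetical; check the crs_profile documentation for your release):
cd $CRS_HOME/bin
./crs_profile -create myapp.vip -t application -a $CRS_HOME/bin/usrvip -o oi=eth0,ov=138.2.238.100,on=255.255.255.0
./crs_register myapp.vip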
62. How do I put my application under the control of Oracle Clusterware to achieve higher availability?
First, write a control agent. It must accept three different parameters: start (the control agent should start the application), check (the control agent should check the application), and stop (the control agent should stop the application). Second, create a profile for your application using crs_profile. Third, register your application as a resource with Oracle Clusterware (crs_register). A minimal sketch of such a control agent follows.
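The following is only a sketch (the application path and commands are hypothetical; adapt them to your application):
#!/bin/sh
# Oracle Clusterware invokes this script with start, check, or stop.
case "$1" in
start) /opt/myapp/bin/myapp start ;;
check) /opt/myapp/bin/myapp status || exit 1 ;;   # non-zero exit signals an unhealthy resource
stop)  /opt/myapp/bin/myapp stop ;;
esac
exit 0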
63. Can I set up failover of the VIP to another card in the same machine, or what do I do if I have different network interfaces on different nodes in my cluster (i.e. eth0 on nodes 1 and 2, and eth1 on nodes 3 and 4)?
With srvctl, you can modify the nodeapps for the VIP to list the NICs it can use. The VIP will then try to start on the eth0 interface and, if that fails, try the eth1 interface:
./srvctl modify nodeapps -n <node> -A <vip_address>/<netmask>/eth0\|eth1
Note how the interfaces are a list separated by the '|' symbol, and how you need to escape it with a '\' character or the Unix shell will interpret the character as a pipe. So on a node called ukdh364, with a VIP address of ukdh364vip and a netmask of (say) 255.255.255.0, we have:
./srvctl modify nodeapps -n ukdh364 -A ukdh364vip/255.255.255.0/eth0\|eth1
To check which interfaces are configured as public or private, use oifcfg getif. Example output:
eth0 138.2.238.0 global public
eth1 138.2.240.0 global public
eth2 138.2.236.0 global cluster_interconnect
An ifconfig on your machine will show the hardware names of the installed interface cards.
64. How do I identify the voting file location?
Run the following command from <CRS_HOME>/bin:
crsctl query css votedisk
65. Is it supported to allow 3rd party clusterware to manage Oracle resources (instances, listeners, etc.) and turn off Oracle Clusterware management of these?
In 10g we do not support using 3rd party clusterware for failover and restart of Oracle resources. Oracle Clusterware resources should not be disabled.
66. What is the High Availability API?
An application programming interface that allows processes to be put under the high availability infrastructure that is part of the Oracle Clusterware distributed with Oracle Database 10g. A user-written script defines how Oracle Clusterware should start, stop, and relocate the process when the cluster node status changes. This extends the high availability services of the cluster to any application running in the cluster. Oracle Database 10g Real Application Clusters (RAC) databases and associated Oracle processes (e.g. the listener) are automatically managed by the clusterware.
67. Is it a requirement to have the public interface linked to ETH0, or does it only need to be on an ETH lower than the private interface (e.g. public on ETH1, private on ETH2)?
There is no requirement for interface name ordering. You could have the public on ETH2 and the private on ETH0. Just make sure you choose the correct public interface in VIPCA and in the installer's interconnect classification screen.
68. Does Oracle Clusterware have to be the same or higher release than all instances running on the cluster?
Yes. Oracle Clusterware must be the same or a higher release with respect to the RDBMS or ASM homes. See Note 337737.1.
69. Can I use Oracle Clusterware to monitor my EM Agent?
There is nothing special about the commands, but you do need to follow the startup/shutdown sequence to avoid any discontinuity of monitoring. The agent starts a watchdog that monitors the health of the actual monitoring process. This is done automatically at agent start. Therefore you could use Oracle Clusterware, but you should not need to.
70. My customer has noticed tons of log files generated under $CRS_HOME/log/<hostname>/client. Is there any automated way, set up through Oracle Clusterware, to prevent/minimize/remove those aggressively generated files?
Check Note 5187351.8. You can either apply the patchset if it is available for your platform, or have a cron job that removes these files until the patch is available.
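Such a cron job might look like this (a sketch; the CRS home path and the seven-day retention are assumptions to adjust for your environment):
0 2 * * * find /u01/app/crs/log/`hostname`/client -name '*.log' -mtime +7 -exec rm {} \;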
71. I am trying to move my voting disks from one diskgroup to another and getting the error "crsctl replace votedisk – not permitted between ASM Disk Groups." Why?
You need to review the ASM and crsctl logs to see why the command is failing. To put your voting disks in ASM, you must have the diskgroup set up properly. There must be enough failure groups to support the redundancy of the voting disks as set by the redundancy on the disk group: for normal redundancy, 3 failure groups are required; for high redundancy, 5 failure groups. Note: by default each disk in a diskgroup is put in its own failure group. The compatible.asm attribute of the diskgroup must be set to 11.2, and you must be using the 11.2 version of Oracle Clusterware and ASM. A sketch of the steps follows.
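A rough sketch of the commands involved (the diskgroup name is hypothetical; run the ALTER as SYSASM):
$ sqlplus / as sysasm
SQL> ALTER DISKGROUP vote SET ATTRIBUTE 'compatible.asm' = '11.2';
SQL> exit
$ crsctl replace votedisk +VOTE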
72. Does the hostname have to match the
public name or can it be anything else?
When there is no vendor clusterware, only
Oracle Clusterware, then the public node name must match the host name. When
vendor clusterware is present, it determines the public node names, and the
installer doesn't present an opportunity to change them. So, when you have a
choice, always choose the hostname.
73. Can I use Oracle Clusterware to provide cold failover of my single-instance Oracle databases?
Oracle does not provide the necessary wrappers to fail over single-instance databases using Oracle Clusterware. However, since it is possible for customers to use Oracle Clusterware to wrap arbitrary applications, it is possible for them to wrap single-instance databases this way. A sample can be found in the demos that are distributed with Oracle Database 11g.
74. Why
does Oracle Clusterware use an additional 'heartbeat' via the voting disk, when
other cluster software products do not?
Oracle uses this implementation because Oracle clusters always have access to a shared disk environment. This is different from classical clustering, which assumes shared-nothing architectures, and it changes the decision of which strategies are optimal compared to other environments. Oracle also supports a wide variety of storage types, instead of limiting itself to a specific storage type (like SCSI), allowing the customer quite a lot of flexibility in configuration.
75. Why does Oracle still use the voting disks when other cluster software is present?
Voting disks are still used when 3rd party vendor clusterware is present, because vendor clusterware is not able to monitor/detect all failures that matter to Oracle Clusterware and the database. For example, one known case is when the vendor clusterware is set to have its heartbeat go over a different network than the RAC traffic. Continuing to use the voting disks allows CSS to resolve situations which would otherwise end up in cluster hangs.
76. In the course of failure testing in an extended RAC environment, we find entries in the cssd logfile which indicate actions like 'diskShortTimeout set to (value)' and 'diskLongTimeout set to (value)'. Can anyone please explain the meaning of these two timeouts in addition to disktimeout?
Having a short and a long disktimeout, rather than just one disktimeout, is due to the patch for bug 4748797 (included in 10.2.0.2). The long disktimeout is 200 seconds by default unless set differently via 'crsctl set css disktimeout', and applies to time outside a reconfiguration. The short disktimeout is in effect during a reconfiguration and is misscount minus 3 seconds. The point is that we can tolerate a long disktimeout when all nodes are just running fine, but have to revert to a short disktimeout if there's a reconfiguration.
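To inspect or change these settings, something like the following can be used (a sketch; choose values carefully and verify the syntax for your release):
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css disktimeout
$CRS_HOME/bin/crsctl set css disktimeout 200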
77. During Oracle Clusterware
installation, I am asked to define a private node name, and then on the next
screen asked to define which interfaces should be used as private and public
interfaces.
What information is required to answer these
questions?
The private names on the first screen determine which private interconnect will be used by CSS. Provide exactly one name that maps to a private IP address, or just the IP address itself. If a logical name is used, then the IP address it maps to can be changed subsequently, but if an IP address is specified, CSS will always use that IP address. CSS cannot use multiple private interconnects for its communication, hence only one name or IP address can be specified.
The private interconnect enforcement page
determines which private interconnect will be used by the RAC instances.
It's equivalent to setting the
CLUSTER_INTERCONNECTS init.ora parameter, but is more convenient because it is
a cluster-wide setting that does not have to be adjusted every time you add
nodes or instances.
RAC will use all of the interconnects listed as private in this screen, and they all have to be up, just as their IP addresses have to be when specified in the init.ora parameter. RAC does not fail over between cluster interconnects; if one is down, then the instances using them won't start.
78. I am trying to install Oracle Clusterware (10.2) and when I run the OUI, at the Specify Cluster Configuration screen, the Add, Edit and Remove buttons are grayed out, and nothing comes up in the cluster nodes either. Why?
Check for 3rd party vendor clusterware (such as Sun Cluster or Veritas Cluster) that was not completely removed, e.g. look for the /opt/ORCLcluster directory; it should be removed.
79. What happens if I lose my voting
disk(s)?
If you lose half or more of all of your voting disks, then nodes get evicted from the cluster, or nodes kick themselves out of the cluster. It doesn't threaten database corruption. Alternatively, you can use external redundancy, which means you are providing redundancy at the storage level using RAID.
For this reason, when using Oracle for the redundancy of your voting disks, Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: for best availability, the 3 voting files should be on physically separate disks. It is recommended to use an odd number, as 4 disks will not be any more highly available than 3 disks: half of 3 is 1.5, rounded up to 2, and half of 4 is 2, so once we lose 2 disks the cluster will fail whether we have 4 voting disks or 3.
Restoring corrupted voting disks is easy
since there isn't any significant persistent data stored in the voting disk.
80. How should I test the failure of the public network (i.e. Oracle VIP failover) in my Oracle RAC environment?
Prior to 10.2.0.3, it was possible to test VIP failover by simply running ifconfig <interface_name> down. The intended behaviour was that the VIP would fail over to another node. In 10.2.0.3 this is still the behaviour on Linux; however, on other operating systems the VIP will NOT fail over, instead the interface will be plumbed again. To test VIP failover on platforms other than Linux, the switch can be turned off or the physical cable pulled; this is the best way to test. NOTE: if you have other databases that share the same IPs, then they will be affected. Your tests should simulate production failures, which are generally switch errors or interface errors.
81. What is the voting disk used for?
A voting disk is a backup communications mechanism that allows CSS daemons to negotiate which sub-cluster will survive. These voting disks keep a status of who is currently alive and count votes in case of a cluster reconfiguration. It works as follows:
a) It ensures that you cannot join the cluster if you cannot access the voting disk(s).
b) A node leaves the cluster if it cannot communicate with the voting disk (to ensure we do not have aberrant nodes).
c) Should multiple sub-clusters form, it will only allow one to continue. It prefers the greater number of nodes, and secondly the node with the lowest incarnation number.
d) It is kept redundant by Oracle in 10g Release 2 (you need access to a majority of the existing voting disks). At most one sub-cluster will continue and a split brain will be avoided.
82. I am installing Oracle Clusterware with a 3rd party vendor clusterware, however on the "Specify Cluster Configuration" page the Oracle Clusterware installer doesn't show the existing nodes. Why?
This shows that Oracle Clusterware does not detect that the 3rd party clusterware is installed. Make sure you have followed the installation instructions provided by the vendor for integrating with Oracle RAC. Make sure LD_LIBRARY_PATH is not set.
For example, with Sun Cluster, make sure the libskgxn* files have been copied to the /opt/ORCLcluster directory. Check that lsnodes returns the correct list of nodes in the Sun Cluster.
83. Can I run the fixup script generated by the 11.2 OUI or CVU on a running system?
It depends on what problems were listed to be fixed. The fixup scripts can change system parameters, so you should not change system parameters while applications are running. However, if an earlier version of Oracle Database is already running on the system, there should not be any need to change the system parameters.
84. What should the permissions be set to for the voting disk and OCR when doing an Oracle RAC install?
The Oracle Real Application Clusters install guide is correct. It describes the PRE-INSTALL ownership/permission requirements for the OCR and voting disk. This step is needed to make sure that the Oracle Clusterware install succeeds. Please don't use those values to determine what the ownership/permissions should be POST INSTALL. The root script will change the ownership/permissions of the OCR and voting disk as part of the install. The POST INSTALL permissions will end up being: OCR - root:oinstall - 640; Voting Disk - oracle:oinstall - 644.
85. Oracle
Clusterware fails to start after a reboot due to permissions on raw devices
reverting to default values. How do I fix this?
After a successful installation of Oracle Clusterware, a simple reboot occurs and Oracle Clusterware fails to start. This is because the permissions on the raw devices for the OCR and voting disks, e.g. /dev/raw/raw{x}, revert to their default values (root:disk) and are inaccessible to Oracle. This change of behavior started with the 2.6 kernel: in RHEL4, OEL4, RHEL5, OEL5, SLES9 and SLES10. In RHEL3 the raw devices maintained their permissions across reboots, so this symptom was not seen.
The way to fix this on RHEL4, OEL4 and SLES9 is to create /etc/udev/permissions.d/40-udev.permissions (you must choose a number that's lower than 50). You can do this by copying /etc/udev/permissions.d/50-udev.permissions and removing the lines that are not needed (50-udev.permissions gets replaced with upgrades, so you do not want to edit it directly; also, a typo in 50-udev.permissions can render the system unusable). Example permissions file:
# raw devices
raw/raw[1-2]:root:oinstall:0640
raw/raw[3-5]:oracle:oinstall:0660
Note that this applies to all raw device files; here just the voting and OCR devices were specified.
On RHEL5, OEL5 and SLES10 a different file is used, /etc/udev/rules.d/99-raw.rules; notice that now the number must be higher than 50. Also, the syntax of the rules is different from that of the permissions file. Here's an example:
KERNEL=="raw[1-2]*", GROUP="oinstall", MODE="640"
KERNEL=="raw[3-5]*", OWNER="oracle", GROUP="oinstall", MODE="660"
86. Can the Network Interface Card (NIC)
device names be different on the nodes in a cluster, for both public and
private?
All public NICs must have the same name
on all nodes in the cluster. Similarly, all private NICs must also have the
same names on all nodes. Do not mix NICs with different interface types
(infiniband, ethernet, hyperfabric, etc.) for the same subnet/network.
87. What
are the Best Practices for using a clustered file system with Oracle RAC?
Can I use
a cluster file system for OCR, Voting Disk, Binaries as well as database files?
Oracle Best
Practice for using Cluster File Systems (CFS) with Oracle RAC
* Oracle Clusterware binaries should not be placed on a CFS, as this reduces cluster functionality while the CFS is recovering, and also limits the ability to perform rolling upgrades of Oracle Clusterware.
* Oracle Clusterware voting disks and the Oracle Cluster Registry (OCR) should not be placed on a CFS, as the I/O freeze during CFS reconfiguration can lead to node eviction, or cause cluster management activities to fail (i.e. start, stop, or check of a resource).
* Oracle Database 10g binaries are supported on a CFS for Oracle RAC 10g and for Oracle Database. The system should be configured to support multiple ORACLE_HOMEs in order to maintain the ability to perform a rolling patch application.
* Oracle Database 10g database files (e.g. datafiles, trace files, and archive log files) are supported on a CFS.
Check Certify for
certified cluster file systems.
Rolling Upgrades with
Cluster File Systems in General
It is not recommended to use a cluster
file system (CFS) for the Oracle Clusterware binaries. Oracle Clusterware
supports in-place rolling upgrades. Using a shared Oracle Clusterware home
results in a global outage during patch application and upgrades. A workaround
is available to clone the Oracle Clusterware home for each upgrade. This is not
common practice.
If a patch is marked for rolling upgrade, then it can be applied to an Oracle RAC database in a rolling fashion. Oracle supports rolling upgrades for Oracle Database Automatic Storage Management (ASM) after you have upgraded to Oracle Database 11g. When using a CFS for the database and ASM Oracle homes, the CFS should be configured to use context dependent links (CDSLs) or equivalent, and these should be configured to work in conjunction with rolling upgrades and downgrades. This includes updating the database and ASM homes in the OCR to point to the current home.
88. Do I need to have user equivalence
(ssh, etc...) set up after GRID/RAC is already installed?
Yes. Many assistants and scripts depend
on user equivalence being set up.
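For example, a quick sanity check of user equivalence from one node (node names here are hypothetical):
for node in node1 node2; do ssh $node date; done
If each node returns the date without prompting for a password or passphrase, user equivalence is working.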
89. Is Sun QFS supported with Oracle RAC? What about Sun GFS?
Check Certify for the latest details.
Sun Cluster - Sun StorEdge QFS (9.2.0.5 and higher, 10g and 10gR2): no restrictions on placement of files on QFS. Sun StorEdge QFS is supported for Oracle binary executables, database data files, archive logs, the Oracle Cluster Registry (OCR), and the Oracle Cluster Ready Services voting disk; the recovery area can also be placed on QFS.
Solaris Volume Manager for Sun Cluster can be used for host-based mirroring. It supports up to 8 nodes.
90. With GNS, do ALL public addresses have to be DHCP managed (public IP, public VIP, public SCAN VIP)?
No. The choice to use DHCP for the public IPs is made outside of Oracle. Oracle Clusterware and Oracle RAC will work with both static and DHCP-assigned IPs for the hostnames. When using GNS, Oracle Clusterware will use DHCP for all VIPs in the cluster, which means node VIPs and SCAN VIPs.
91. How is the Oracle Cluster Registry (OCR) stored when I use ASM?
The OCR is stored similarly to how Oracle Database files are stored. The extents are spread across all the disks in the diskgroup, and the redundancy (which is at the extent level) is based on the redundancy of the disk group. You can only have one OCR in a diskgroup. Best practice for ASM is to have 2 diskgroups; best practice for the OCR in ASM is to have a copy of the OCR in each diskgroup.
92. When
does the Oracle node VIP fail over to another node and subsequently return to
its home node?
The handling of the VIP with respect to a
failover to another node and subsequent return to its home node is handled
differently depending on the Oracle Clusterware version. In general, one can
distinguish between Oracle Clusterware 10g & 11g Release 1 and Oracle
Clusterware 11g Release 2 behavior.
For Oracle Clusterware 10g & 11g
Release 1 the VIP will fail over to another node either after a network or a node
failure. However, the VIP will automatically return to its home node only after
a node failure and a subsequent restart of the node. Since the network is not
constantly monitored in this Oracle Clusterware version, there is no way that
Oracle Clusterware can detect the recovery of the network and initiate an automatic
return of the node VIP to its home node.
Exception: With Oracle Patch Set
10.2.0.3 a new behavior was introduced that allowed the node VIP to return to
its home node after the network recovered. The required network check was part
of the database instance check. However, this new check introduced quite some
side effects and hence, was disabled with subsequent bundle patches and the
Oracle Patch Set 10.2.0.4
Starting
with 10.2.0.4 and for Oracle Clusterware 11g Release 1 the default behavior is
to avoid an automatic return of the node VIP to its home node after the network
recovered. This
behavior can be activated, if required, using the "ORA_RACG_VIP_FAILBACK"
parameter. This parameter should only be used after reviewing support note
805969.1 (VIP does not relocate back to the original node starting from 10.2.0.4
and 11.1 even after the public network problem is resolved.)
With Oracle Clusterware 11g
Release 2 the
default behavior is to automatically initiate a return of the node VIP to its
home node as soon as the network recovered after a failure. It needs to be
noted that this behavior is not based on the parameter mentioned above and therefore
does not induce the same side effects.
Instead, a new network resource is used
in Oracle Clusterware 11g Release 2, which monitors the network constantly,
even after the network failed and the resource became "OFFLINE". This
feature is called "OFFLINE resource monitoring" and is per default
enabled for the network resource.
93. How do I protect the OCR and voting disk in case of media failure?
In Oracle Database 10g Release 1, the OCR and voting device are not mirrored within Oracle; hence both must be mirrored via a storage vendor method, like RAID 1.
Starting with Oracle Database 10g Release 2, Oracle Clusterware will multiplex the OCR and voting disk (two copies of the OCR and three voting disks).
94. How do
I use multiple network interfaces to provide High Availability and/or Load
Balancing for my interconnect with Oracle Clusterware?
This needs to be done externally to Oracle Clusterware, usually by some OS-provided NIC bonding, which gives Oracle Clusterware a single IP address for the interconnect but provides failover (high availability) and/or load balancing across multiple NIC cards. These solutions are provided externally to Oracle at a much lower level than Oracle Clusterware, hence Oracle supports using them; the solutions are OS dependent, and therefore the best source of information is your OS vendor. However, there are several articles on Metalink on how to do this. For example, for Sun Solaris search for IPMP (IP network MultiPathing).
Note: customers should pay close attention to the bonding setup/configuration/features and ensure their objectives are met, since some solutions provide only failover, some only load balancing, and still others claim to provide both. It's always important to test your setup to ensure it does what it was designed to do.
When bonding with network interfaces that connect to separate switches (for redundancy), you must test whether the NICs are configured for active/active mode. The most reliable configuration for this architecture is to configure the NICs for active/passive, as in the sketch below.
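As a sketch, a RHEL-style active-backup bonding configuration might look like the following (device names, addresses, and file paths are illustrative; consult your OS vendor's documentation for the supported procedure):
# /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=active-backup miimon=100
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
# /etc/sysconfig/network-scripts/ifcfg-eth1 (and similarly for the second NIC)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none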
95. Is server-side load balancing supported/recommended/proven technology in Oracle E-Business Suite?
Yes, customers are using it successfully today. It is recommended to set up both client- and server-side load balancing. Note that for the pieces coming from the 8.0.6 home (Forms and CCM), connections are directed to a RAC instance based on the sequence in which it is listed in the TNS entry description list and may not get load balanced optimally. For Oracle RAC 10.2 or higher, do not set PREFER_LEAST_LOADED_NODE = OFF in your listener.ora; instead, set the CLB_GOAL on the service.
96. What is the maximum number of nodes under OCFS on Linux?
Oracle 9i RAC on Linux, using OCFS for datafiles, can scale to a maximum of 32 nodes. According to the OCFS2 User's Guide, OCFS2 can support up to 255 nodes.
97. Can I use OCFS with SE Oracle RAC?
It is not supported to use OCFS with Standard Edition Oracle RAC. All database files must use ASM (redo logs, recovery area, datafiles, control files, etc.). You cannot place binaries on OCFS under the SE Oracle RAC terms. We recommend that the binaries and trace files (non-ASM-supported files) be replicated on all nodes. This is done automatically by the installer.
98. Can I use TAF with e-Business in a RAC environment?
TAF itself does not work with the e-Business Suite due to Forms/TAF limitations, but you can configure the TNS failover clause. On instance failure, when the user logs back into the system, their session will be directed to a surviving instance, and the user will be taken to the navigator tab. Their committed work will be available; any uncommitted work must be restarted.
We also recommend you configure the Forms error URL to identify a fallback middle-tier server for Forms processes, if no router is available to accomplish switching across servers.
99. How do I configure the concurrent manager in a RAC environment?
Large clients commonly put the concurrent manager on a separate server now (in the middle tier) to reduce the load on the database server. The concurrent manager programs can be tied to a specific middle tier (e.g., you can have CMs running on more than one middle-tier box). It is advisable to use specialized CMs. CM middle tiers are set up to point to the appropriate database instance based on the product module being used.
100. Should
functional partitioning be used with Oracle Applications?
We do not recommend functional
partitioning unless throughput on your server architecture demands it. Cache
fusion has been optimized to scale well with non-partitioned workload.
If your processing requirements are extreme and your testing proves you must partition your workload in order to reduce internode communications, you can use Profile Options to designate that sessions for certain application Responsibilities are created on a specific middle-tier server. That middle-tier server would then be configured to connect to a specific database instance.
To determine the correct partitioning for
your installation you would need to consider several factors like number of
concurrent users, batch users, modules used, workload characteristics etc.
101. Can I use Automatic Undo Management
with Oracle Applications?
Yes. In a RAC environment we highly
recommend it.
102. What is the optimal migration path to be used while migrating the E-Business Suite to Oracle RAC?
The following is the recommended and most optimal path to migrate your E-Business Suite to an Oracle RAC environment:
1. Migrate the existing application to new hardware (if applicable).
2. Use a clustered file system (ASM recommended) for all database files, or migrate all database files to raw devices. (Use dd for Unix or ocopy for NT.)
3. Install/upgrade to the latest available e-Business Suite.
4. Ensure the database version is supported with Oracle RAC.
5. In step 4, install the Oracle RAC option and use the installer to perform the install for all the nodes.
6. Clone the Oracle Application code tree.
103. How do I gather all relevant Oracle
and OS log/trace files in an Oracle RAC cluster to provide to
Support?
Use RAC-DDT (RAC Diagnostic Data Tool); the User Guide is in Note 301138.1. Quote from the User Guide:
RACDDT is a data collection tool designed and configured specifically for gathering diagnostic data related to Oracle's Real Application Cluster (RAC) technology. RACDDT is a set of scripts and configuration files that is run on one or more nodes of an Oracle RAC cluster. The main script is written in Perl, while a number of proxy scripts are written using Korn shell. RACDDT will run on all supported Unix and Linux platforms, but is not supported on any Windows platforms.
Newer versions of RDA (Remote Diagnostic
Agent) have the RAC-DDT functionality, so going forward RDA is the tool of
choice.
104. My customer wants to understand what type of disk caching they can use with their Windows RAC cluster; the install guide tells them to disable disk caching?
If the write cache identified is local to the node, then that is bad for RAC. If the cache is visible to all nodes as a 'single cache', typically in the storage array, and is also 'battery backed', then that is OK.
105. My
customer has a failsafe cluster installed, what are the benefits of moving
their system to RAC?
Fail Safe development is continuing. Most work on the product will be around accommodating changes in the supported resources (new releases of the RDBMS, AS, etc.) and the underlying Microsoft Cluster Services and Windows operating system.
A Fail Safe protected instance is an active/passive instance and, as such, does not benefit much at all from adding more nodes to a cluster. Microsoft has a limit on the number of nodes in an MSCS cluster (typically 8 nodes, but it does vary). RAC is active/active, so you get the dual benefits of increased scalability and availability every time you add a node to a cluster. We have a limit of 100 nodes in a RAC cluster (we don't use MSCS). Your customer should really consider more than 2 nodes (because of the aggregate computing power available on node failure). If the choice is two 4-CPU nodes or four 2-CPU nodes, I would go for the 2-CPU nodes. Customers are using both Windows Itanium RAC and Windows x64 RAC; Windows x64 seems more popular.
Keep in mind, though, that for Fail Safe,
if the server is 64-Bit, regardless of flavor, Fail Safe Manager must be
installed on a 32-Bit client, which will complicate things just a bit. There is
no such restriction for RAC, as all management for RAC can be done via Grid
Control or Database Control. For EE RAC you can implement an 'extended cluster'
where there is a distance between the nodes in the cluster (usually less than
20 KM).
106. Do I need HACMP/GPFS to store my OCR/voting file on a shared device?
The prerequisites doc for AIX clearly says: "If you are not using HACMP, you must use a GPFS file system to store the Oracle CRS files."
==> This is a documentation bug and will be fixed with 10.1.0.3.
Note also that on AIX it is important to use reserve_lock=no / reserve_policy=no_reserve per shared, concurrent device in order to allow AIX to access the devices from more than one node simultaneously. Check the current setting using: "/usr/sbin/lsattr -El hdiskn | grep reserve".
Depending on the type of storage used, the command should return "no_reserve" or a similar value for all disks meant to be used for Oracle RAC. If required, use the /dev/rhdisk devices (character special) for the CRS and voting disks, and change the attribute with the following command:
chdev -l hdiskn -a reserve_lock=no
(For ESS, EMC, HDS, CLARiiON, and MPIO-capable devices you have to do chdev -l hdiskn -a reserve_policy=no_reserve.)
107. Is
VIO supported with RAC on IBM AIX?
VIO is supported on IBM AIX.
108. Is HACMP needed for RAC on AIX 5.2 using the GPFS file system?
The newest version of GPFS can be used without HACMP; if it is available for AIX 5.2, then you do not need HACMP.
109. Can I run Oracle RAC 10g on my IBM Mainframe Sysplex environment (z/OS)?
YES! There is no separate documentation for RAC on z/OS. What you would call "clusterware" is built into the OS, and the native file systems are global. IBM z/OS documentation explains how to set up a Sysplex Cluster; once the customer has done that, it is trivial to set up a RAC database. The few steps involved are covered in Chapter 14 of the Oracle for z/OS System Admin Guide, which you can read here. There is also an Install Guide for Oracle on z/OS (here), but I don't think there are any RAC-specific steps in the installation. By the way, RAC on z/OS does not use Oracle's clusterware (CSS/CRS/OCR).
110. Can I
use Oracle Clusterware for failover of the SAP Enqueue and VIP services when
running SAP in a RAC environment?
Oracle has created sapctl to do this, and it is available for certain platforms. SAPCTL will be available for download on the SAP Services Marketplace for AIX and Linux. For Solaris, it will not be available in 2007; use Veritas or Sun Cluster.
111. Does
the Oracle Cluster File System (OCFS) support network access through NFS or
Windows Network Shares?
No, in the current release the Oracle
Cluster File System (OCFS) is not supported for use by network access approaches
like NFS or Windows Network Shares.
112. Why
should I use RAC One Node instead of Oracle Fail Safe on Windows?
Oracle RAC One Node provides better high
availability than Oracle Fail Safe. RAC One Node's ability to online relocate a
database offers protection from both unplanned failures and maintenance
outages. Fail Safe only protects from failures and cannot online relocate a
database. RAC One Node supports online maintenance operations such as online
database patches, online OS patches and upgrades, online database relocation
for load balancing, online server migrations, and online upgrade to full RAC.
In an environment where it is difficult to get windows of downtime for maintenance, this is a big advantage. Also, whereas Fail Safe is only available on Windows, RAC One Node is available on all platforms. A customer with a mixed-platform environment would benefit from having a standard HA solution across all their platforms.
113. Can I configure HP's Auto Port Aggregation for NIC bonding after the install (i.e. not present beforehand)?
You are able to add NIC bonding after the installation, although this is more complicated than the other way round. There are several notes on WebIV regarding this; see Note 276434.1, "Modifying the VIP of a Cluster Node". Regarding the private interconnect, use oifcfg delif / setif to modify it, as sketched after the reference below.
Configure Redundant Network Cards /
Switches for Oracle Database 10g Release 1 Real Application Cluster on Linux
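A sketch of changing the interconnect registration with oifcfg (the interface name and subnet below are hypothetical):
oifcfg getif
oifcfg delif -global eth1
oifcfg setif -global bond0/192.168.10.0:cluster_interconnect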
114. What do I do when I get an ORA-01031 error logging into the ASM instance?
This sounds like the ORA_DBA group on Node2 is empty, or else does not have the correct username in it. Double-check what user account you are using to log on to Node2 (a 'set' command will show you the USERNAME and USERDOMAIN values) and then make sure that this account is part of ORA_DBA.
The other thing to check is that SQLNET.AUTHENTICATION_SERVICES=(NTS) is set in the SQLNET.ORA.
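For example, from a Windows command prompt (standard Windows tools; output varies by environment):
C:\> set USERNAME
C:\> set USERDOMAIN
C:\> net localgroup ORA_DBA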
115. The OracleCRService does not start with my Windows Oracle RAC implementation; what do I do?
If OracleCRService doesn't start, that's quite a different issue than, say, OracleCSService not starting, because, due to dependencies, this is the last of the three Oracle Clusterware services that we expect to start.
This could be caused by a few different things. It could be caused by a change to auto-negotiate instead of 100/full on the interconnect; once set back to 100/full on all NICs, as well as on the network switch associated with the interconnect, the problem is resolved. This could also be: an inability to access the shared disk housing your OCR; a permissions issue; or Bug 4537790, which introduced OPMD to begin with, which for reference's sake was logged against 9.2.0.8 and is still relevant in 10.2.0.3 times. For OPMD, see Metalink Note 358156.1.
116. How do I verify that Host Bus Adapter node-local caching has been disabled for the disks I will be using in my RAC cluster?
Disabling write caching is a standard practice when the volume managers/file systems are shared. Go to My Computer -> Manage -> Storage -> Disk Management -> Disk -> Properties -> Policies and uncheck "Enable Write Caching on Disk". This will disable the write caching.
3rd party HBAs may have their own management tools to modify these settings. Just remember that centralized, shared cache is generally OK; it's the node-local cache that you need to turn off. How exactly you do this will vary from HBA vendor to HBA vendor.
117. Can I run my Oracle 9i RAC and Oracle RAC 10g on the same Windows cluster?
Yes, but the Oracle 9i RAC database must have the 9i Cluster Manager, and you must run Oracle Clusterware for the Oracle Database 10g. 9i Cluster Manager can coexist with Oracle Clusterware 10g. Be sure to use the same 'cluster name' in the appropriate OUI field for both 9i and 10g when you install both together in the same cluster.
The OracleCMService9i service will remain intact during the Oracle Clusterware 10g install; as an Oracle 9i RAC database requires the 9i OracleCMService9i, it should be left running. The information for the 9i database will get migrated to the OCR during the Oracle Clusterware installation. Then, for future database management, you would use the 9i srvctl to manage the 9i database, and the 10g srvctl to manage any new 10g databases. Both srvctl commands will use the OCR. The same applies for Oracle RAC 11g.
118. When
using MS VSS on Windows with Oracle RAC, do I need to run the VSS on each node
where I have an Oracle RAC instance?
There is no need to run an Oracle VSS writer instance on each Oracle RAC node (even though it is installed and enabled by default on all nodes), and the documentation in the Windows Platform Doc for the Oracle VSS writer is applicable to Oracle RAC as well.
The ability of the clustered file system to create a Windows shadow copy is a MUST for backing up an Oracle RAC database using the Oracle VSS writer. The only other requirement is that all the archived logs generated by the database must be accessible on the node where the backup is initiated using the Oracle VSS writer.
VSS coordinates the storage snapshot of database files: the VSS writer places the database in hot backup mode so that the VSS provider can initiate the snapshot, so RMAN is not backing up anything in this case. When a VSS restore of a database is issued, the writer automatically invokes RMAN to perform the needed recovery actions after the snapshot is restored by the provider; that is the real value-add of the writer.
119. How do I configure raw devices in order to install Oracle Clusterware 10g on RHEL5 or OEL5?
The raw devices OS support scripts like /etc/sysconfig/rawdevices are not shipped on RHEL5 or OEL5; this is because raw devices are being deprecated on Linux. This means that in order to install Oracle Clusterware 10g you have to manually bind the raw devices to the block devices for the OCR and voting disks so that the 10g installer will proceed without error.
Refer to Note 465001.1 for exact details
on how to do the above.
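A rough sketch of the manual binding (block device names here are hypothetical; Note 465001.1 has the exact procedure, including persisting the bindings across reboots):
raw /dev/raw/raw1 /dev/sdb1
raw /dev/raw/raw2 /dev/sdc1
chown root:oinstall /dev/raw/raw1
chmod 640 /dev/raw/raw1
chown oracle:oinstall /dev/raw/raw2
chmod 660 /dev/raw/raw2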
Oracle Clusterware 11g doesn't require
this configuration since the installer can handle block devices directly.
120. How do I reorder or rename logical network interface (NIC) names in Linux?
Although this is rarely needed, since most hardware will detect the cards in the correct order on all nodes, if you still need to change or control the ordering, this can be done with udev rules; see your distribution's documentation for help on writing udev rules.
121. Can I configure IPMP in Active/Active to increase the bandwidth of my interconnect?
For IPMP active/active configurations, please follow the Sun doc instructions: http://docs.sun.com/app/docs/doc/816-4554/6maoq027i?a=view
IPMP active/active is known to load balance on transmit but serialize on a single interface for receive, so you are likely not to get the throughput you might have expected. Unless you experience explicit bandwidth limitations that require active/active, it is a recommended best practice to configure for maximum availability, as described in WebIV Note 283107.1.
Please note too that debugging active/active interfaces at the network layer is cumbersome and time consuming. In an active/active configuration, if the switch-side link fails, you are likely to lose both interconnect connections, whereas with active/standby you would fail over.
122. Does Sun Solaris have a multipathing solution?
Sun Solaris includes an inherent multipathing tool, MPxIO, which is part of Solaris. You need to have the SAN Foundation Kit installed (newest version). Please be aware that the machines should be installed following the EIS standard. This is a quality assurance standard introduced by Sun that mainly ensures you always have the newest patches.
MPxIO is free of charge and comes with Solaris 8, 9, and 10. By the way, if you have a Sun LVM, it would use this feature indirectly. Sun has therefore confirmed that MPxIO will work with raw devices.
123. Are Red Hat GFS and GULM certified for DLM?
Both are part of Red Hat RHEL 5. Oracle Database 10g Release 2 on Linux x86 and Linux x86-64 is certified on OEL5 and RHEL5 as per Certify. GFS is not certified yet; certification is in progress by Red Hat. OCFS2 is certified and is the preferred choice for Oracle. ASM is the recommended storage for the database. Since GFS is part of the RHEL5 distribution and Oracle fully supports RHEL under the Unbreakable Linux Program, Oracle will support GFS as part of RHEL5 for customers buying Unbreakable Linux Support. This only applies to RHEL5 and not to RHEL4, where GFS is distributed for an additional fee.
124. In Solaris 10, do we need Sun Cluster to provide redundancy for the interconnect and multiple switches?
Link Aggregation (GLDv3) is bundled in the OS as of Solaris 10. IPMP is available for Solaris 10 and Solaris 9. Neither requires Sun Cluster to be installed. For interconnect and switch redundancy, as a best practice, avoid VLAN trunking across the switches. We can configure stand-alone redundant switches that do not require the VLAN to be trunked between them, nor the need for an inter-switch link (ISL). If the interconnect VLAN is trunked with other VLANs between the redundant switches, ensure that the interconnect VLAN is pruned from the trunk to avoid unnecessary traffic propagation through the corporate network. For ease of configuration (e.g. fewer IP address requirements), use IPMP with link-mode failure detection in a primary/standby configuration. This will give you a single failover IP which you will define in the cluster_interconnects init.ora parameter. Remove any interfaces for the interconnect from the OCR using `oifcfg delif`, and test this rigorously. For now, as Link Aggregation (GLDv3) cannot span multiple switches from a single host, you will need to configure the switch redundancy and the host NICs with IPMP.
When configuring IPMP for the interconnect with multiple switches available, configure IPMP as active/standby and *not* active/active. This is to avoid potential latencies in switch failure detection/failover which may impact the availability of the RDBMS. Note that IPMP spreads/load balances outbound packets on the bonded interfaces, but inbound packets are received on a single interface; in an active/active configuration this makes send/receive problems difficult to diagnose. Both Link Aggregation (GLDv3) and IPMP are core OS packages (SUNWcsu and SUNWcsr respectively) and do not require Sun Clusterware.
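A sketch of an active/standby IPMP setup on Solaris (interface names, group name, and address are hypothetical):
# /etc/hostname.ce0 (primary)
192.168.10.1 netmask + broadcast + group interconnect_grp up
# /etc/hostname.ce1 (standby)
group interconnect_grp standby up
The resulting failover address would then be listed in the cluster_interconnects init.ora parameter, as described above.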
125. Is
OCFS2 certified with Oracle RAC 10g?
Yes. See Certify to find out which
platforms are currently certified.
126. How do I configure my RAC Cluster to use RDS over Infiniband?
The configuration takes place below Oracle; you need to talk to your Infiniband vendor. Check Certify for what is currently available, as this will change as vendors adopt the technology. The database must be at least 10.2.0.3. If you want to switch a database running with IP over IB to RDS, you will need to relink Oracle:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ipc_rds ioracle
You can check your interconnect through
the alert log at startup. Check for the string “cluster interconnect IPC
version:Oracle RDS/IP (generic)” in the alert.log file. See Note: 751343.1 for
more details.
127. Can different releases of Oracle RAC be installed and run on the same physical Linux cluster?
Yes. However, Oracle Clusterware (CRS) will not support an Oracle 9i RAC database, so you will have to leave the current configuration in place. You can install Oracle Clusterware and Oracle RAC 10g or 11g into the same cluster. On Windows and Linux, you must run the 9i Cluster Manager for the 9i database and Oracle Clusterware for the 10g database. When you install Oracle Clusterware, your 9i srvconfig file will be converted to the OCR. Oracle 9i RAC, Oracle RAC 10g, and Oracle RAC 11g will use the OCR. Do not restart the 9i gsd after you have installed Oracle Clusterware. Remember to check Certify for details of what vendor clusterware can be run with Oracle Clusterware. Oracle Clusterware must be the highest level (down to the patchset): e.g. Oracle Clusterware 11g Release 2 will support Oracle RAC 10g and Oracle RAC 11g databases, while Oracle Clusterware 10g can only support Oracle RAC 10g databases.
128. Is
3rd Party Clusterware supported on Linux such as Veritas or Redhat?
No, Oracle RAC 10g and Oracle RAC 11g do
not support 3rd Party clusterware on Linux. This means that if a cluster file
system requires a 3rd party clusterware, the cluster file system is not
supported.
129. Can
the Oracle Database Configuration Assistant (DBCA) be used to create a database
with Veritas DBE / AC 3.5?
DBCA can be used to create databases on
raw devices in 9i RAC Release 1 and 9i Release 2. Standard database creation
scripts using SQL commands will work with file system and raw. DBCA cannot be
used to create databases on file systems on Oracle 9i Release 1. The user can
choose to set up a database on raw devices, and have DBCA output a script. The
script can then be modified to use cluster file systems instead.
With Oracle 9i RAC Release 2 (Oracle
9.2), DBCA can be used to create databases on a cluster filesystem. If the
ORACLE_HOME is stored on the cluster filesystem, the tool will work directly.
If ORACLE_HOME is on local drives on each system, and the customer wishes to
place database files onto a cluster file system, they must invoke DBCA as
follows: dbca -datafileDestination /oradata where /oradata is on the CFS
filesystem. See 9iR2 README and bug 2300874 for more info.
130. Is Oracle Database on VMware supported? Is Oracle RAC on VMware supported?
Oracle Database support on VMware is outlined in Metalink Note 249212.1. Effectively, for most customers, this means they are not willing to run production Oracle databases on VMware. Regarding Oracle RAC: the explicit mention not to run RAC on VMware was removed in 11.2.0.2 (November 2010).