Manoj Simar - The Technology Expert: Interview Q and A for Oracle RAC Part

421. What is runfixup.sh script in Oracle Clusterware 11g release 2 installation

With Oracle Clusterware 11g release 2, Oracle Universal Installer (OUI) detects when the minimum requirements for an installation are not met, and creates shell scripts, called fixup scripts, to finish incomplete system configuration steps. If OUI detects an incomplete task, then it generates fixup scripts (runfixup.sh). You can run the fixup script after you click the Fix and Check Again Button. The Fixup script does the following:

If necessary sets kernel parameters to values required for successful installation, including:

--Shared memory parameters.

--Open file descriptor and UDP send/receive parameters.

Sets permissions on the Oracle Inventory (central inventory) directory.Reconfigures primary and secondary group memberships for the installation owner, if necessary, for the Oracle Inventory directory and the operating system privileges groups.

--Sets shell limits if necessary to required values

422. What is OLR? Which of the following statements regarding the Oracle Local Registry (OLR) is true?

1.Each cluster node has a local registry for node-specific resources.

2.The OLR should be manually created after installing Grid Infrastructure on each node in the cluster.

3.One of its functions is to facilitate Clusterware startup in situations where the ASM stores the OCR and voting disks.

4.You can check the status of the OLR using ocrcheck.

What is the purpose of OLR?

Oracle Local repository contains information that allows the cluster processes to be started up with the OCR being in the ASM storage ssytem. Since the ASM file system is unavailable until the Grid processes are started up a local copy of the contents of the OCR is required which is stored in the OLR.

423. How to check the node connectivity in Oracle Grid Infrastructure?

$ cluvfy comp nodecon -n all –verbose

424. Can I stop all nodes in one command? Meaning that stopping whole cluster ?

In 10g its not possible, where in 11g it is possible

[root@pic1]# crsctl start cluster -all
[root@pic2]# crsctl stop cluster –all

425. Managing or Modifying SCAN in Oracle RAC?

To add a SCAN VIP resource:

$ srvctl add scan -n cluster01-scan

To remove Clusterware resources from SCAN VIPs:

$ srvctl remove scan [-f]

To add a SCAN listener resource:

$ srvctl add scan_listener

$ srvctl add scan_listener -p 1521

To remove Clusterware resources from all SCAN listeners:

$ srvctl remove scan_listener [-f]

426. How to change Cluster interconnect in RAC?

On a single node in the cluster, add the new global interface specification:

$ oifcfg setif -global eth2/192.0.2.0:cluster_interconnect

Verify the changes with oifcfg getif and then stop Clusterware on all nodes by running the following command as root on each node:

# oifcfg getif

# crsctl stop crs

Assign the network address to the new network adapters on all nodes using ifconfig:

#ifconfig eth2 192.0.2.15 netmask 255.255.255.0 \ broadcast 192.0.2.255

Remove the former adapter/subnet specification and restart Clusterware:

$ oifcfgdelif -global eth1/192.168.1.0

# crsctl start crs

427. How to find the cluster network settings?

To determine the list of interfaces available to the cluster:

$ oifcfg iflist –p -n

To determine the public and private interfaces that have been configured:

$ oifcfg getif

eth0 192.0.2.0 global public

eth1 192.168.1.0 global cluster_interconnect

To determine the Virtual IP (VIP) host name, VIP address, VIP subnet mask, and VIP interface name:

$ srvctl config nodeapps -a

VIP exists.:host01

VIP exists.: /192.0.2.247/192.0.2.247/255.255.255.0/eth0

428. How to check the cluster (all nodes) status?

To check the viability of Cluster Synchronization Services (CSS) across nodes:

$ crsctl check cluster

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

429. How to check the cluster (one node) status?

$ crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

430. How to control Oracle Clusterware?

To start or stop Oracle Clusterware on a specific node:

# crsctl stop crs

# crsctl start crs

To enable or disable Oracle Clusterware on a specific node:

# crsctl enable crs

# crsctl disable crs

431. What is node listener?

In 11gr2 the listeners will run from Grid Infrastructure software home

 The node listener is a process that helps establish network connections from ASM clients to the ASM instance.

 Runs by default from the Grid $ORACLE_HOME/bin directory

 Listens on port 1521 by default

 Is the same as a database instance listener

 Is capable of listening for all database instances on the same machine in addition to the ASM instance

 Can run concurrently with separate database listeners or be replaced by a separate database listener

 Is named tnslsnr on the Linux platform

432. What is GNS?

Grid Naming service is alternative service to DNS , which will act as a sub domain in your DNS but managed by Oracle, with GNS the connection is routed to the cluster IP and manages internally.

433. How do I define a service for a Policy-Managed Database?

When you define services for a policy-managed database, you define the service to a server pool where the database is running. You can define the service as either UNIFORM (running on all instances in the server pool) or SINGLETON (running on only one instance in the server pool). For SINGLETON services, Oracle RAC chooses on which instance in the server pool the service is active. If that instance fails, then the service fails over to another instance in the server pool. A service can only run in one server pool.

Services for administrator-managed databases continue to be defined by the PREFERRED and AVAILABLE definitions.

434. How do I convert from a Policy-Managed Database to Administrator-Managed Database?

You cannot directly convert a policy-managed database to an administrator-managed database. Instead, you can remove the policy-managed configuration using the 'srvctl remove database' and 'srvctl remove service' commands, and then create a new administrator-managed database with the 'srvctl add database' command.

435. What is Grid Plug and Play (GPnP)?

Grid Plug and Play (GPnP) eliminates per-node configuration data and the need for explicit add and delete node steps. This allows a system administrator to take a template system image and run it on a new node with no further configuration. This removes many manual operations, reduces the opportunity for errors, and encourages configurations that can be changed easily. Removal of the per-node configuration makes the nodes easier to replace, because they do not need to contain individually-managed state.

Grid Plug and Play reduces the cost of installing, configuring, and managing database nodes by making their per-node state disposable. It allows nodes to be easily replaced with regenerated state.

To add a node, simply connect the server to the cluster and allow the cluster to configure the node.

To make it happen, Oracle uses the profile located in $GI_HOME/gpnp/profiles/peer/profile.xml which contains the cluster resources, for example disk locations of ASM. etc.

So this profile will be read local or from the remote machine when plugged into cluster and dynamically added to cluster.

436. What is a Server Pool?

Server Pools are a new management entity introduced in Oracle Clusterware 11g to give IT administrators the ability to better manage their applications and datacenters along actual workload lines. Server Pools are a logical container, where like hardware and work can be organized and given importance and availability semantics. This allows administrators as well as QoS Management to actively grow and shrink these groups to meet the hour-to-hour, day-to-day application demands with optimum utilization of available resources. The use of Server Pools does not require any application code changes, re-compiling or re-linking. Server Pools also allow older non-QoS Management supported databases and middleware to co-exist in a single cluster

without interfering with the management of newer supported versions.

Server pools enable the cluster administrator to create a policy which defines how Oracle Clusterware allocates resources. An Oracle RAC policy-managed database runs in a server pool. Oracle Clusterware attempts to keep the required number of servers in the server pool and, therefore, the required number of instances of the Oracle RAC database. A server can be in only one server pool at any time. However, a database can run in multiple server pools. Cluster-managed services run in a server pool where they are defined as either UNIFORM (active on all instances in the server pool) or SINGLETON (active on only one instance in the server pool).
You should create redo log groups only if you are using administrator-managed databases. For policy-managed databases, increase the cardinality and when the instance starts, if you are using Oracle Managed Files and Oracle ASM, then Oracle automatically allocates the thread, redo, and undo.
If you remove an instance from your Oracle RAC database, then you should disable the instance’s thread of redo so that Oracle does not have to check the thread during database recovery.
For policy-managed databases, Oracle automatically allocates the undo tablespace when the instance starts if you have OMF enabled.

Read here

437. How does the database register with the Listener?

When a listener starts after the Oracle instance starts, and the listener is listed for service registration, registration does not occur until the next time the Oracle Database process monitor (PMON) discovery routine starts. By default, PMON discovery occurs every 60 seconds.

To override the 60-second delay, use the SQL ‘ALTER SYSTEM REGISTER’ statement. This statement forces the PMON process to register the service immediately.

If you run this statement while the listener is up and the instance is already registered, or while the listener is down, then the statement has no effect.

438. Can I configure both failure notifications with Universal Connection Pool (UCP)?

Connection failure notification is redundant with Fast Connection Failover (FCF) as implemented by the UCP. You should not configure both within the same application.

439. Should I configure Transparent Application Failure (TAF) in my service definition if using Fast Connection Failure (FCF)?

Do not configure Transparent Application Failover (TAF) with Fast Connection Failover (FCF) for JDBC clients as TAF processing will interfere with FAN ONS processing.

440. Can I use Fast Connection Failover (FCF) and Transparent Application Failover (TAF) together?

No. Only one of them should be used at a time.

441. What is the status of Fast Connection Failover (FCF) with Universal Connection Pool (UCP)?

FCF is now deprecated along with the Implicit Connection Caching in favor of using the Universal Connection Pool (UCP) for JDBC.

442. Do I still need to backup my Oracle Cluster Registry (OCR) and Voting Disks?

You no longer have to back up the voting disk. The voting disk data is automatically backed up in OCR as part of any configuration change and is automatically restored to any voting disk added. If all voting disks are corrupted, however, you can restore.

Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle Database always retains the last three backup copies of OCR. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and at the end of each week. You cannot customize the backup frequencies or the number of files that Oracle Database retains.

443. How is DBMS_JOB functionality affected by RAC?

DBMS jobs can be set to run either on database (i.e. any active instance), or a specific instance.

444. What is PARELLEL_FORCE_LOCAL?

By default, the parallel server processes selected to execute a SQL statement can operate on any or all Oracle RAC nodes in the cluster. By setting PARALLEL_FORCE_LOCAL to TRUE, the parallel server processes are restricted to just one node, the node where the query coordinator resides (the node on which the SQL statement was executed). However, in 11.2.0.1 when this parameter is set to TRUE the parallel degree calculations are not being adjusted correctly to only consider the CPU_COUNT for a single node. The parallel degree will be calculated based on the RAC-wide CPU_COUNT and not the single node CPU_COUNT. Due to this bug 9671271 it is not recommended that you set PARALLEL_FORCE_LOCAL to TRUE in 11.2.0.1, instead you should setup a RAC service to limit where parallel statements can execute.

445. Why does my user appear across all nodes when querying GV$SESSION when my service does not span all nodes?

The problem is you are querying GV$SESSION as the ABC user and this results in the "strange" behaviour. If you select gv$session, 2 parallel servers are spawned to query the v$session on each node. This happens as the same user. Hence when you query gv$session as ABC you are seeing 3 (one real and 2 parallel slaves querying v$session on each instance). The reason you are seeing 1 on one node and 3 on the other is the order in which the parallel processes query the v$session. Take the sys (or any other) user to query the session of ABC and you will not see this problem.

446. What is the Service Management Policy?

When you use automatic services in an administrator-managed database, during planned database startup, services may start on the first instances to start rather than their preferred instances. Prior to Oracle RAC 11 g release 2 (11.2), all services worked as though they were defined with a manual management policy.

447. What is the Oracle Database Quality of Service Management?

Oracle Database QoS Management is an automated, policy-based product that monitors the workload requests for an entire system. Oracle Database QoS Management manages the resources that are shared across applications and adjusts the system configuration to keep the applications running at the performance levels needed by your business. Oracle Database QoS Management responds gracefully to changes in system configuration and demand, thus avoiding additional oscillations in the performance levels of your applications. If you use Oracle Database Quality of Service Management (Oracle Database QoS Management), then you cannot have SINGLETON services in a server pool, unless the maximum size of that server pool is one.

448. What are the different types of failover mechanisms available?

· JDBC-THIN driver supports Fast Connection Failover (FCF)

· JDBC-OCI driver supports Transparent Application Failover (TAF)

· JDBC-THIN 11gR2 supports Single Client Access Name (SCAN)

449. Is FCF provided by Oracle JDBC 9i drivers ?

No. FCF is built on the pooling feature known as 'Implicit Connection Caching' and this is available only with JDBC 10g or higher versions. Please also note that in version 11gR2 the FCF is now deprecated along with the Implicit Connection Caching in favor of using the Universal Connection Pool (UCP)

450. We have been told by Oracle that FCF is not available when using XA JDBC drivers. What HA options are available to us should we wish to use XA?

In 10g JDBC, the driver-embedded connection pool (ICC) cannot pool XA connections i.e., XAConnection objects. FCF only works for cached connections from OracleDataSource.

As a workaround, you can use Universal Connection Pool (UCP) the new connection pool, which supports pooling XA connections, on top of 11g JDBC. And this works against both 11g and 10g database.

451. Are there any special database setup configurations that we need to take into consideration when using XA and HA together?

In general, when doing XA with a RAC database, you need to use affinity to control on which node you're connected to. In short, you want to ensure that given a global transaction, all the work goes to a single database instance.

To add to the above information, it should be noted that starting in RDBMS 11gR1, transaction branches can go to different instances even though there could be a potential performance cost (for row locking synchronization, etc.). Before 11gR1, all transaction branches had to the same instance.

You can refer to this old but still interesting whitepaper:

http://www.oracle.com/technetwork/database/enterprise-edition/bestpracticesforxaandrac-128676.pdf

452. What is Connection Affinity?

Connection affinity is a performance feature that allows a connection pool to select connections that are directed at a specific Oracle RAC instance. The pool uses run-time connection load balancing (if configured) to select an Oracle RAC instance to create the first connection and then subsequent connections are created with an affinity to the same instance.

453. Can a vendor clusterware be used to start or stop the Oracle Clusterware stack?

The Oracle Clusterware commands to start and stop, enable and disable the stack are documented commands. Customers are free to use them any way they need to operate a coherent cluster environment. Best practice is to not change the default method of starting Oracle Clusterware

454. Can a vendor clusterware be used to control an Oracle-managed resource?

Oracle-managed resources include the Oracle RAC instance, Oracle ASM instance, TNS listener and Oracle virtual IP address (VIP). (For a complete list of Oracle-managed resources, use: "crsctl stat res |grep NAME=ora").

As of Oracle Database 10g Release 2, the Oracle Clusterware APIs are documented. Customers are free to use these programmatic interfaces in their custom or non-Oracle software to operate and maintain coherent cluster environment. To start and stop oracle resources named with 'ora.', you must use srvctl.

Oracle does not support 3rd party applications that check oracle resources and take corrective actions on those resources.

Best practice is to leave Oracle resources controlled by Oracle Clusterware.

For any other resource, either Oracle or vendor clusterware (not both) can manage the resource directly.

455. Can Oracle Clusterware be used to model a volume or disk group that is already controlled by a non-Oracle clusterware framework?

Yes, but Oracle Clusterware should only monitor the resource and leave the management of the resource to the framework that is currently controlling the resource.

As of Oracle Database 10g Release 2, the new Oracle Clusterware APIs can be invoked to register and start/stop new resources for HA monitoring. Modifying the dependency of Oracle resources (i.e. those names begin with 'ora.') can be done only after verifying with Oracle Support. Use srvctl to manage Oracle defined resources (ie. with names that start with 'ora.') If an Oracle-managed resource is modelled to depend on a new, 3rd party resource (such as a disk group or cluster filesystem), those scripts will cause the Oracle-managed resources to fail as well. In addition, creating a resource for an entity that is managed by a different HA framework (I.E. Vendor clusterware) also leads to the dueling HA framework problems. Extreme care must be taken to implement this type of relationship.

In Oracle Database 10g Release 1, the Oracle Clusterware APIs are not public, use srvctl only to manage Oracle resources. Best practice is to leave Oracle resources unmodified.

456. Can Oracle Clusterware be configured to not auto-start some Oracle resources at node boot time?

By default, Oracle Clusterware is configured to start the VIP, listener, instance, ASM, database services and other resources during system boot. Starting with 10.1.0.4, it is possible to modify some to have their profile parameter AUTO_START set to the value 2. This means that after node reboot, or when Oracle Clusterware is started, resources with AUTO_START=2 will need to be started

manually via srvctl. Please work with Oracle Support if you need to modify this attribute.

In Oracle Database 10g (10.1.0.4 onwards), such a manual configuration of the AUTO_START parameter is not recommended, except under advice by Oracle Support. This was designed to assist in problem troubleshooting and system maintenance.

In Oracle Database 10g Release 2, when changing resource profiles through srvctl, the command tool will automatically modify the profile attributes of other dependent resources given the current pre-built dependencies. The command to accomplish this is:

srvctl modify database -d <dbname> -y manual

Best practice for implementing Oracle Clusterware and Real Application Clusters is to have Oracle Clusterware start the defined Oracle resources during system boot.

With Oracle RAC 10g Release 2 and Oracle RAC 11g, if you are using the manual option on the database resource, the vendor clusterware must use srvctl to start/stop the database.

457. Is it possible to disable Oracle Clusterware and still run RAC?

No. Disabling Oracle Clusterware in a RAC environment is not supported by Oracle.

458. Can we use Oracle Clusterware without nodeapps?

No. Nodeapps (Node Applications) are a required part of the Oracle Clusterware stack.

459. Can I remove Oracle resources from the cluster? How can I hide unused resources when listing

all resources in the cluster?

One must not remove or delete any Oracle resources from the cluster. Oracle resources are typically preconfigured during the installation of Oracle Clusterware / Grid Infrastructure or added in the course of a default installation or configuration process of other Oracle products. With Oracle Clusterware 11g Releasae Oracle has taken the approach to pre-configure some resources, but activate them (or have them activated) only once required. As long as the components represented by these resources are not used, the Oracle Clusterware proxy resources are set to offline or are disabled, which means, they are not running and should therefore not cause any issues.

If one wants to list only those resources that are actively used in the cluster, the following command can be used:

"crsctl stat res -w 'ENABLED != 0'" (the -t option can be used to get a "tabular view") to display all resources deployed in the cluster but the ones that are disabled.

Note: If ASM is not used for the cluster at all, disable the ASM proxy resource in Oracle Clusterware in order to not list it using this command. To disable the ASM proxy resource in Oracle Clusterware, the "srvctl disable asm [-n (node_name)]" command can be used. If ASM is used to store the Voting Disks and / or OCRs, DB files, or an ACFS file system, ASM must be enabled!

460. Can one use third-party clusterware to manage network interfaces used by Oracle Clusterware for redundancy or IP failover?

Yes. With Oracle Clusterware 11g Release 2 (11.2.0.2), the Redundant Interconnect Usage feature has been introduced, which eliminates the need for such functionality and solutions (or even OS-specific bonding solutions). This feature is always enabled, if supported on the respective OS and with the vendor solution used. If not supported, it will be disabled automatically. Otherwise, the feature cannot be disabled and must be used in conjunction with any vendor provided network management solution, if such a solution is really required. Oracle recommends to use the Redundant Interconnect Usage feature as the only solution with Oracle RAC 11g Release 2 - 11.2.0.2 or higher.

461. What is split brain?

When database nodes in a cluster are unable to communicate with each other, they may continue to process and modify the data blocks independently. If the same block is modified by more than one instance, synchronization/locking of the data blocks does not take place and blocks may be overwritten by others in the cluster. This state is called split brain.

462. What is the difference between Crash recovery and Instance recovery?

When an instance crashes in a single node database on startup a crash recovery takes place. In a RAC enviornment the same recovery for an instance is performed by the surviving nodes called Instance recovery.

463. What is the interconnect used for?

It is a private network which is used to ship data blocks from one instance to another for cache fusion. The physical data blocks as well as data dictionary blocks are shared across this interconnect.

464. How do you determine what protocol is being used for Interconnect traffic?

One of the ways is to look at the database alert log for the time period when the database was started up.

465. What methods are available to keep the time synchronized on all nodes in the cluster?

Either the Network Time Protocol(NTP) can be configured or in 11gr2, Cluster Time Synchronization Service (CTSS) can be used.

466. What files components in RAC must reside on shared storage?

Spfiles, ControlFiles, Datafiles and Redolog files should be created on shared storage.

467. If your OCR is corrupted what options do have to resolve this?

You can use either the logical or the physical OCR backup copy to restore the Repository.

468. How do you find out what object has its blocks being shipped across the instance the most?

You can use the dba_hist_seg_stats.

469. What is OCLUMON used for in a cluster environment?

The Cluster Health Monitor (CHM) stores operating system metrics in the CHM repository for all nodes in a RAC cluster. It stores information on CPU, memory, process, network and other OS data, This information can later be retrieved and used to troubleshoot and identify any cluster related issues. It is a default component of the 11gr2 grid install. The data is stored in the master repository and replicated to a standby repository on a different node.

470. What would be the possible performance impact in a cluster if a less powerful node (e.g. slower CPU’s) is added to the cluster?

All processing will show down to the CPU speed of the slowest server.

471. What is the future of the Oracle Grid?

The Grid software is becoming more and more capable of not just supporting HA for Oracle Databases but also other applications including Oracle’s applications. With 12c there are more features and functionality built-in and it is easier to deploy these pre-built solutions, available for common Oracle applications.

472. What components of the Grid should I back up?

The backups should include OLR, OCR and ASM Metadata.

Manoj Simar - The Technology Expert

About Me

Monday, 2 July 2018

Interview Q and A for Oracle RAC Part - 7