Manoj Simar - The Technology Expert: Interview Q and A for Oracle Backup and Recovery Part

66. What is memory corruption?

Memory in Oracle is classified in the following manner:

- the duration of the instance [SGA],

- the process [PGA],

- the session [UGA],

- the duration of a call [CGA]

When any of the structures in the memory gets corrupted or altered, it become inaccessible, rendering the data unusable. The structures might be related a data block in the memory, a heap structure, etc. Such memory corruptions are well handled by the Oracle, safe guarding the underlying data.

67. What is a PGA corruption?

Program Global Area (PGA) is a memory region containing data and control information for a single process (server or background). One PGA is allocated for each server process; the PGA is exclusive to that server process and is read and written only by Oracle code acting on behalf of that process.

When any memory which is allocated for PGA gets corrupted, we call it a PGA corruption, where in access to that portion of memory (PGA) which is corrupt can lead to the termination of the session which is accessing it. Since the corruption is only to the PGA which is specific to a single process, it does not brings down the instance.

When we get such errors only on a specific session, we can try to close the session and invoke a new session and check if the error occurs, as closing the session will relinquish the memory allocated for that particular session.

68. What is a SGA corruption?

System Global Area is made up of the database buffer cache, shared pool and redo log buffer. Basically this is any data and program caches that are shared among database users.

Since this portion of memory is shared it is very critical that it is consistent. In case there is a portion of this SGA which is corrupted, then when the process accesses this corrupted portion of memory, the instance will get terminated.

Many of the cases, bringing the instance up will solve the problem, unless it is a hardware problem, an Oracle bug or a third party application problem, which continues to get the SGA corrupted.

69. What is heap corruption?

Heap Corruption errors occur when addresses or pointers get corrupted in the memory, thereby rendering the corresponding heap inaccessible. When Oracle tries to read that part of the heap and encounters incorrect information errors will be signaled.

70. What are the symptoms of heap corruption?

Heap corruption can occur in any heap, SGA, PGA, UGA or CGA.

Generally corruption errors in the PGA and UGA might be triggered when that particular heap is accessed or when the session is terminated.

When corruption is in SGA, we can expect a possible termination of the instance.

Corruption errors occurs, in the form of:

- ORA-00600 [17XXX]

- Core dump in any function related to manipulation of memory structures.

Also the following memory leak errors can occur:

- ORA-00600 [711] : Freeing memory and stack discipline violated

- ORA-00600 [723] : PGA memory leak

- ORA-00600 [729] : UGA memory leak

- ORA-00600 [733] : Memory requested size to big for this port

- ORA-00600 [736] : Problems with number of elements in segmented loop macro

71. What are the causes of heap corruption?

The cause of heap corruption is difficult to identify. The reason being, that the process which is corrupting the memory is not the process which is reporting the corruption.

The various possible causes of heap corruption are as follows:

OS/Hardware:

There are chances for problems on the OS/Hardware which can corrupt memory pointers or even zero it out, and make it inaccessible. In such cases, its better to check the system logs to check for possible problems. Also performing OS/Hardware diagnostics might help.

Oracle Bug:

Oracle can also internally cause memory corruption if it is a bug with the Oracle code which manipulates the memory structures. For example, Dereferencing pointers, overwriting memory chunks, etc. In such cases, Oracle support will help you to identify if it might be a possible bug or not.

Other applications:

It is possible for other third party application to cause memory corruption, if such applications mistakenly overwrites memory chunks, or dereferences pointers.

However, the above reason will help only if the error is reproducible.

72. How are memory corruption errors reported?

Generally, memory corruption errors gets reported when the session is getting terminated. When the session is getting terminated, the heap allocated for that session will get dissolved and all the chunks will be freed. During this process if Oracle identifies any corrupted chunks, it will report in the alert log file. Also a trace file will be generated.

In case of corruption in the SGA, it might result in a possible termination of the instance. Before crashing the instance the error message will be reported in the alert log file and corresponding trace file will also be generated.

73. How can we identify which heap is corrupted?

At the beginning of the trace file, after the trace header dump, you will find such message. In this case the heap

corruption error is ORA-00600 [17147].

********** Internal heap ERROR 17147 addr=ffffffff7b55dfa8 *********

***** Dump of memory around addr ffffffff7b55dfa8:

After this you will find a hex dump around the chunk which was corrupted. After the hex dump, you will find formatted heap dump which will contain the name of the heap dump which contains the corrupted chunk.

******************************************************

HEAP DUMP heap name="pga heap" desc=106528de0

extent sz=0x20c0 alt=184 het=32767 rec=0 flg=2 opc=3

parent=0 owner=0 nex=0 xsz=0xfff0

In our case the corrupted heap is "PGA HEAP". Similarly we can identify the corresponding heap which is corrupted.

74. What is a bad magic number?

Magic number in the chunk basically is used to make sure the chunk header is consistent. If you get an ORA-600 [17xxx] error that possibly indicates that the chunk is corrupted with a BAD MAGIC NUMBER.

In the formatted heap dump, you might find a similar pattern which indicates the corrupted chunk.

Chunk ffffffff7b55bf68 sz= 8256 BAD MAGIC NUMBER IN NEXT CHUNK (2059)

75. What is the impact of heap corruption?

In most of the cases heap corruption errors are short lived, except situation when they are Oracle bugs or OS/Hardware problems. In such cases the memory chunks might get corrupted due to some memory overwrites,

however, oracle while flushing out the corrupted chunks and bringing in new chunks will automatically resolve such corruptions and Oracle will stop reporting of such corruptions in the alert log file.

Since this affects only the chunks in the memory, there is no data corruption due to this error. The data in the database is safe.

76. Is it possible to find the offending SQL if any?

The trace file should contain this information. If the error occurs while running particular SQL, then in the trace file, we can check out the section "Current SQL statement for this session". Below this section header, you should find the offending SQL if any. You can run this SQL to check if the error occurs.

77. Are there any initial diagnosis that can be done?

Possible things to check,

a. Operations that was performed when the error occured.

b. Is the error consistently occuring?

c. Any specific statements, if run generates this error.

d. Check if Third Party Applications involved.

e. Check the system logs to find if there were any OS/Hardware error messages.

78. Are there any possible workarounds that can be tried?

Incase of heap corruptions, we can try out the possible workaround:

a. If possible we can try to bounce the database. This will clear out all the allocated memory chunks and when the database comes up, it will allocate fresh chunks of memory. This can possibly resolve the memory corruption, unless the error reproduces after bouncing the database.

b. Flush the shared pool.

We can try to flush the shared pool, which will flush out all the chunks and will bring in fresh chunks.

SQL> alter system flush shared_pool;

This can possibly resolve the memory corruption, unless the error reproduces after flushing the shared pool.

79. What is Full Backup ?

A full backup is an operating system backup of all data files, on-line redo log files and control file that constitute ORACLE database and the parameter.

80. Can Full Backup be performed when the database is open ?

No.

81. What is Partial Backup ?

A Partial Backup is any operating system backup short of a full backup, taken while the database is open or shut down.

82. What are the different methods of backing up oracle database ?

- Logical Backups

- Cold Backups

- Hot Backups (Archive log)

83. What is a logical backup ?

Logical backup involves reading a set of databse records and writing them into a file. Export utility is used for taking backup and Import utility is used to recover from backup.

84. What is cold backup ? What are the elements of it ?

Cold backup is taking backup of all physical files after normal shutdown of database.

We need to take.

- All Data files. - All Control files. - All on-line redo log files. - The init.ora file (Optional)

85. What is hot backup and how it can be taken ?

Taking backup of archive log files when database is open. For this the ARCHIVELOG

mode should be enabled. The following files need to be backed up.

All data files. All Archive log, redo log files. All control files.

86. Explain the difference between a hot backup and a cold backup and the

benefits associated with each.

A hot backup is basically taking a backup of the database while it is still up and

running and it must be in archive log mode. A cold backup is taking a backup of the

database while it is shut down and does not require being in archive log mode. The

benefit of taking a hot backup is that the database is still available for use while the

backup is occurring and you can recover the database to any point in time. The

benefit of taking a cold backup is that it is typically easier to administer the backup

and recovery process. In addition, since you are taking cold backups the database

does not require being in archive log mode and thus there will be a slight performance

gain as the database is not cutting archive logs to disk.

87. You have just had to restore from backup and do not have any control files. How would you go about bringing up this database?

I would create a text based backup control file, stipulating where on disk all the data files where and then issue the recover command with the using backup control file clause.

88. What command would you use to create a backup control file?

Alter database backup control file to trace.

89. What is cold backup and hot backup (in case of Oracle)?

Cold Backup: It is copying the three sets of files (database files, redo logs, and control file)

when the instance is shut down. This is a straight file copy, usually from the disk directly to

tape. You must shut down the instance to guarantee a consistent copy. If a cold backup is

performed, the only option available in the event of data file loss is restoring all the files from

the latest backup. All work performed on the database since the last backup is lost.

Hot Backup: Some sites (such as worldwide airline reservations systems) cannot shut down the

database while making a backup copy of the files. The cold backup is not an available option.

So different means of backing up database must be used — the hot backup. Issue a SQL

command to indicate to Oracle, on a tablespace-by-tablespace basis, that the files of the

tablespace are to backed up. The users can continue to make full use of the files, including

making changes to the data. Once the user has indicated that he/she wants to back up the

tablespace files, he/she can use the operating system to copy those files to the desired backup

destination. The database must be running in ARCHIVELOG mode for the hot backup option. If

a data loss failure does occur, the lost database files can be restored using the hot backup and

the online and offline redo logs created since the backup was done. The database is restored to

the most consistent state without any loss of committed transactions.

90. What is the difference between using recovery catalog and control file?
When new incarnation happens, the old backup information in control file will be lost. It will be preserved in recovery catalog.
In recovery catalog we can store scripts.
Recovery catalog is central and can have information of many databases.

91. How do you know that how much RMAN task has been completed?
By querying v$rman_status or v$session_longops

92. From where list & report commands will get input?
Both the commands command quering v$ and recovery catalog views. V$BACKUP_FILES or many of the recovery catalog views such asRC_DATAFILE_COPY or RC_ARCHIVED_LOG.

93. Command to delete archive logs older than 7days?
RMAN> delete archivelog all completed before sysdate-7;

94. How many times does oracle ask before dropping a catalog?
The default is two times one for the actual command, the other for confirmation.

95. How to view the current defaults for the database.
RMAN> show all;

96. What is the use of crosscheck command in RMAN?
Crosscheck will be useful to check whether the catalog information is intact with OS level information. This command only updates repository records with the status of the backups.
e.g. If user removes archived logs from disk with an operating system command, the repository still indicates that the logs are on disk, when in fact they are not.

97. What are the differences between crosscheck and validate commands?
Validate command is to examine a backup set and report whether it can be restored. RMAN scans all of the backup pieces in the specified backup sets and looks at the checksum to verify that the contents are intact so that backup can be successfully restored if necessary.
Crosscheck command is to verify the status of backups and copies recorded in the RMAN repository against media such as disk or tape. The crosscheck command only processes files created on the same device type as the channel running crosscheck.

98. Which one is good, differential (incremental) backup or cumulative (incremental) backup?
A differential backup, which backs up all blocks changed after the most recent incremental backup at level 1 or 0
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;
A cumulative backup, which backs up all blocks changed after the most recent incremental backup at level 0
RMAN> BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE;
Cumulative backups are preferable to differential backups when recovery time is more important than disk space, because during recovery each differential backup must be applied in succession. Use cumulative incremental backups instead of differential, if enough disk space is available to store cumulative incremental backups.
This is command for taking Level 0 backup.
RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;

99. What is the difference between backup set and backup piece?
Backup set is logical and backup piece is physical.

100. RMAN command to backup for creating standby database
RMAN> duplicate target database

Manoj Simar - The Technology Expert

About Me

Sunday, 1 July 2018

Interview Q and A for Oracle Backup and Recovery Part - 3

No comments:

Post a Comment