51. What parameters can be used to configure Oracle GoldenGate to extract data from archived redo log files only?
Use the TRANLOGOPTIONS ARCHIVEDLOGONLY option in the Extract parameter file.
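A minimal sketch of an Extract parameter file using this option (the group name, credentials, trail path, and table spec below are examples, not part of the original answer):
EXTRACT earch
USERID ggadmin, PASSWORD ggadmin
-- Read only from archived redo logs, never from the online logs
TRANLOGOPTIONS ARCHIVEDLOGONLY
-- Optional (check your release): point Extract at an alternative archive destination
TRANLOGOPTIONS ALTARCHIVELOGDEST /u01/app/oracle/arch
EXTTRAIL ./dirdat/ea
TABLE scott.*;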
52. I have a one-way replication setup. The system administration team wants to apply an OS patch to both the OGG source host and the target servers. Provide the sequence of steps that you will carry out before and after applying this patch.
Procedure:
- Check that the Extract has processed all the records in the data source (online redo/archive logs):
GGSCI> send extract <name>, logend
(The above command should print YES.)
- Verify that the Extract, pump, and Replicat have zero lag:
GGSCI> send extract <name>, getlag
GGSCI> send extract <pump_name>, getlag
GGSCI> send replicat <name>, getlag
(The above commands should print "At EOF, no more records to process.")
- Stop all application and database activity.
- Make sure that the primary Extract is reading the end of the redo log and that there is no lag at all for the processes.
- Now proceed with stopping the processes:
Source:
- Stop the primary Extract
- Stop the pump Extract
- Stop the Manager process
- Make sure all the processes are down.
Target:
- Stop the Replicat process
- Stop the Manager process
- Make sure that all the processes are down.
- Proceed with the maintenance.
- After the maintenance, proceed with starting up the processes:
Source:
- Start the Manager process
- Start the primary Extract
- Start the pump Extract (or simply start all the Extract processes with GGSCI> start extract *)
- Make sure that all the processes are up.
Target:
- Start the Manager process
- Start the Replicat process
- Make sure that all the processes are up.
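The same sequence, sketched as GGSCI commands (ext1, pmp1, and rep1 are placeholder group names; adjust to your environment):
-- Source, before the patch
GGSCI> send extract ext1, logend
GGSCI> send extract ext1, getlag
GGSCI> send extract pmp1, getlag
GGSCI> stop extract ext1
GGSCI> stop extract pmp1
GGSCI> stop manager
GGSCI> info all
-- Target, before the patch
GGSCI> send replicat rep1, getlag
GGSCI> stop replicat rep1
GGSCI> stop manager
GGSCI> info all
-- Source and target, after the patch: start Manager first, then the processes
GGSCI> start manager
GGSCI> start extract *
GGSCI> start replicat rep1
GGSCI> info all
(STOP MANAGER asks for confirmation; answer y when prompted.)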
53. What are the basic resources required to configure an Oracle GoldenGate high availability solution with Oracle Clusterware?
There are 3 basic resources required:
- Virtual IP
- Shared storage
- Action script
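As an illustration only (the VIP address, resource names, paths, and attribute values below are hypothetical, and the exact syntax depends on your Clusterware release), registering the three resources might look roughly like this:
# Application VIP for GoldenGate (run as root from the Grid home)
$GRID_HOME/bin/appvipcfg create -network=1 -ip=192.168.10.50 -vipname=ggatevip -user=oracle
# GoldenGate resource tied to the VIP and an action script kept on shared storage
$GRID_HOME/bin/crsctl add resource ggateapp -type cluster_resource \
  -attr "ACTION_SCRIPT=/mnt/acfs/gg/gg_action.scr, CHECK_INTERVAL=30, \
START_DEPENDENCIES='hard(ggatevip)', STOP_DEPENDENCIES='hard(ggatevip)'"
# Start the resource; the action script starts MGR from the shared GoldenGate home
$GRID_HOME/bin/crsctl start resource ggateapp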
54. Does Oracle GoldenGate replicate DDL statements as well as DML for Oracle Database?
Oracle GoldenGate replicates both DML and DDL operations for Oracle Database and Teradata.
55. Does GoldenGate support SQL Server 2012 extraction? In the past only apply was supported.
Yes, starting with the 12c release, GoldenGate captures from SQL Server 2012 in addition to providing delivery capabilities.
56. Which RDBMS does GoldenGate 12c support?
GoldenGate supports all major RDBMS. For a full list of supported databases and platforms, refer to the Oracle GoldenGate certification matrix.
57. How can GoldenGate be better than database triggers for pushing data into custom tables?
Triggers can cause high CPU overhead, in some cases almost double compared to reading from redo or transaction logs. In addition, they are intrusive to the application and create management overhead as the application changes. Oracle GoldenGate's log-based change data capture is not only low-impact in terms of CPU utilization, but also non-intrusive to the application, with low maintenance requirements.
58. Does GoldenGate 12c support compressed objects for replication? Also, does it support BLOB/CLOB columns?
Yes, GoldenGate 12c and GoldenGate 11gR2 both support compressed objects. GoldenGate has supported BLOB/CLOB columns since version 10.
59. Is Oracle Database 11.2.0.4 mandatory to use GoldenGate 12c Integrated Delivery, or do earlier versions work?
Yes, it is mandatory. To use GoldenGate 12c's Integrated Delivery, Oracle Database 11.2.0.4 or above is required for the target environment.
60. We have had an Oracle Streams implementation for more than 5 years. We would like to migrate to GoldenGate; however, older versions of GoldenGate did not support filtering individual transactions. Is this supported in GoldenGate 12c?
Yes, it is supported in GoldenGate 12c.
61. How to Resync a Single Table with Minimum Impact to Other Tables' Replication?
1. Stop your Replicat on the target side (most likely it has already abended) and go to step 2.
GGSCI> STOP REPLICAT <NAME>
2. Comment out the MAP statement of that table in your Replicat parameter file.
3. Restart your Replicat, so all other tables are still being replicated.
4. Use the LAG command to check the lag of the Replicat and wait until the Replicat catches up.
GGSCI> LAG REPLICAT <name>
How do you tell it has caught up? When the LAG result shows "At EOF".
5. Stop both the target Replicat and the source Extract.
6. In the source Extract parameter file, add:
FETCHOPTIONS FETCHPKUPDATECOLS
N.B. FETCHOPTIONS FETCHPKUPDATECOLS is not valid for NSK Guardian Extracts; omit it there.
7. Restart the Extract.
8. Reload the table from source to target using your initial-load method, after deleting the existing rows in the target table.
9. Wait until the initial load is finished, put the table back in your Replicat parameter file, and add the HANDLECOLLISIONS parameter.
10. Restart your Replicat.
11. Wait until the Replicat catches up, then stop the source Extract and target Replicat.
12. Comment out HANDLECOLLISIONS in the Replicat parameter file, and comment out "FETCHOPTIONS FETCHPKUPDATECOLS" in the Extract parameter file.
13. Restart the Extract and Replicat.
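For reference, a minimal sketch of the Replicat parameter file during the resync window of steps 9-11 (group, login, and table names are hypothetical):
REPLICAT rep1
USERID ggadmin, PASSWORD ggadmin
ASSUMETARGETDEFS
-- Resync window only: handle collisions for the table that was reloaded
MAP scott.emp, TARGET scott.emp, HANDLECOLLISIONS;
-- Other tables keep replicating normally
MAP scott.dept, TARGET scott.dept;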
This procedure applies to all platforms and all databases.
62. What Causes the "Bad Column Index (xxxx)" Error in Replicat?
There are some potential reasons and workarounds for this error:
1) Source definitions mismatch
The GGS ERROR 160 Bad Column Index (xxxx) error in Replicat is caused by a source column index "xxxx" that is greater than the number of columns in the source table definitions file used as input to the Replicat process, or, if the ASSUMETARGETDEFS parameter is used and the source and target tables do not have the same structure, by the source table having more columns than the target table.
Example
GGS ERROR 160 Bad column index(129) specified for table {table name}, max columns = 127
Explanation
The source table trail record has an index and data for column number 129, but only 127 columns are defined in the source table definitions file, or, when the ASSUMETARGETDEFS parameter is used in the Replicat parameter file, the target table contains only 127 columns.
This is generally caused by changes in the source table or target table (i.e., columns have been added or deleted and a new source definitions file has not been created to reflect the source table structure matching the trail records that Replicat is trying to process).
To resolve this error, run DEFGEN on the source system for the table causing the Replicat abend and copy that definitions file to the target system. Add this SOURCEDEFS file to the Replicat parameter file and restart the Replicat process.
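A minimal sketch of that DEFGEN step (file, schema, and table names are examples):
-- defgen parameter file on the source, e.g. dirprm/defgen.prm
DEFSFILE ./dirdef/scott_emp.def
USERID ggadmin, PASSWORD ggadmin
TABLE scott.emp;
-- run it from the GoldenGate home on the source
$ ./defgen paramfile dirprm/defgen.prm
-- copy ./dirdef/scott_emp.def to the target, then reference it in the Replicat parameter file
SOURCEDEFS ./dirdef/scott_emp.def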
2) Encryption and decrypttrail
- Check whether the trail source is encrypted and, if it is, add the DECRYPTTRAIL parameter to the Replicat to eliminate the error.
- If the Extract does not use encryption and the Replicat has DECRYPTTRAIL enabled, the Replicat will read an unencrypted number and translate it to a column index that does not conform to the proper index number. In that case, remove the DECRYPTTRAIL parameter from the Replicat.
- Note: the DECRYPTTRAIL/ENCRYPTTRAIL parameters must be set before the RMTTRAIL parameter for correct encryption/decryption behavior.
63. Delete extract/replicat forcefully
To delete an Extract/Replicat when the simple DELETE command is not working, we can do it forcibly by adding an exclamation mark (!) at the end.
1. How to stop (kill) an Extract/Replicat:
kill extract group_name
kill replicat group_name
2. How to delete an Extract/Replicat:
delete extract group_name [!]
delete replicat group_name [!]
3. How to clean up (stop the process first):
cleanup extract group_name
cleanup replicat group_name
Error:
GGSCI (XXXn1) 27> delete replicat r_XXX
ERROR: Could not delete DB checkpoint for REPLICAT R_XXX (SQL operation failed: Deleting from checkpoint table GGATE.CHECKPOINT, group R_XXX, key 2648753342 (0x9de0c4be), SQL SQL Error 972: ORA-00972: identifier is too long).
GGSCI (xxn1) 35> delete replicat r_XXX !
ERROR: REPLICAT R_XXX does not exist.
64. How to Handle Tables without Primary Keys or Unique Indexes with Oracle GoldenGate
In order to maintain data integrity when replicating transactional data, Oracle GoldenGate will use primary key columns or unique index columns to uniquely identify a row when issuing update or delete statements against the target database. If no primary keys or unique indexes exist on the table being replicated, Oracle GoldenGate will use all columns to uniquely identify a row.
It is perfectly acceptable to use all columns to uniquely identify a row under the following conditions:
- A logical key cannot be defined for the table using the KEYCOLS parameter.
- No duplicate rows exist in the table.
- The table contains a small number of rows, so full table lookups on the target database are minimal.
- Table DML activity is very low, so "all column" table supplemental log groups do not negatively impact the source database redo logs.
If the table being replicated does not meet all of the conditions listed above, it is recommended to add a column to the table with a SYS_GUID default value to uniquely identify the row, as sketched below.
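A sketch of that approach (schema, table, and column names are hypothetical):
-- Add a surrogate key column populated by SYS_GUID on the source (and target) table
ALTER TABLE scott.orders ADD (row_guid RAW(16) DEFAULT SYS_GUID() NOT NULL);
-- GGSCI on the source: supplementally log the new key column
GGSCI> ADD TRANDATA scott.orders, COLS (row_guid)
-- Extract parameter file: use the column as the logical key
TABLE scott.orders, KEYCOLS (row_guid);
-- Replicat parameter file
MAP scott.orders, TARGET scott.orders, KEYCOLS (row_guid);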
65. Why Does GoldenGate Report the Lag Is Unknown or a Timestamp Mismatch?
The first thing to check is that the timezone (TZ) environment variable is set correctly. Here are the steps to set the correct TZ environment variable.
Example
$ > env | grep TZ
TZ=PST9PDT
$ > date
Mon Mar 12 12:40:57 PDT 2007
$ > export TZ=PST8PDT
$ > date
Mon Mar 12 13:41:08 PDT 2007
You must re-start the Manager and all other GoldenGate groups.
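As an alternative (an assumption about configuration style rather than part of the original note), TZ can be pinned per process with SETENV near the top of the Extract/Replicat parameter file, so it no longer depends on the shell that started Manager:
EXTRACT ext1
-- Pin the timezone for this process (value is an example)
SETENV (TZ = "PST8PDT")
USERID ggadmin, PASSWORD ggadmin
EXTTRAIL ./dirdat/ea
TABLE scott.*;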
If the TZ environment variable is not the issue, then there is another condition under which the "timestamp mismatch" message may be displayed. The "timestamp mismatch" may also be displayed when the Extract or Replicat is completely caught up and a LAG command is issued. The LAG command will try to compute the difference between the current timestamp on the target and the actual timestamp of the processed record (which comes from the source). When the difference is actually a negative value, we display the "timestamp mismatch" message. If these times are very close, this can happen when we calculate a negative lag (target timestamp - record timestamp).
The following can be done from SQL*Plus to compare the clocks on both systems:
sqlplus ggadmin@FILCRDP1
ALTER SESSION set NLS_DATE_FORMAT='MM/DD/YY HH24:Mi:SS';
select sysdate from dual;
host date
sqlplus ggadmin@FILCRDP2
ALTER SESSION set NLS_DATE_FORMAT='MM/DD/YY HH24:Mi:SS';
select sysdate from dual;
If the TZ difference is in the other direction, the related Extract and Replicat may display unexpected lag. If the process is at EOF, the lag may be on and off; when on, it shows several hours (the TZ difference) of lag.
66. GoldenGate Extract Abends with "Unable to lock file" Error for Trail File
Cause
The trail files cannot be exclusively locked for writes by the server/collector process running on the target. As of v10.4, the Server/Collector locks the trail file to prevent multiple processes from writing to the same trail file, so new Server/Collector processes are unable to lock the trail files.
Network outages that last longer than the time the TCP/IP stack is configured to retransmit unacknowledged packets may result in "orphan" TCP/IP connections on the RMTHOST system. Since the local system has closed the connections and the "RST" packets were lost due to the network outage, no packets (data or "control") will ever be sent for these connections.
Since the RST packets were not delivered to the RMTHOST, the TCP/IP stack will not present an error to the Server/Collector process. The Server/Collector process will continue to wait, passively, forever, for new data that will never arrive, because the Extract process on the other system is no longer running.
A second cause for this symptom is that the remote server was rebooted and the Network-Attached Storage (NAS) device where the target trails reside did not detect and was not notified of the reboot, so the locks acquired prior to the reboot are still considered to be in force.
Solution
1) Investigate why a server/collector process is still running when a new server/collector process is started to access the same trail. You can kill the orphan server/collector to resolve the immediate issue.
2) You can override the lock by using the RMTHOST UNLOCKEDTRAILS option. Use this option with CAUTION, as it can cause trail corruption. You must still investigate why the trails are locked by another server, or kill those server/collector processes.
NOTE that if an Extract pump is stopped normally, the server/collector process stops immediately. By default, current versions (11.1/11.2 onwards) have a default timeout of 5 minutes. One can override this value using the RMTHOST TIMEOUT option. Example setting the timeout to 40 seconds:
RMTHOST 192.168.10.1, MGRPORT 7809, PARAMS TIMEOUT 40
This tells the Server/Collector to terminate if it doesn't receive any checkpoint information for more than 40 seconds. DO NOT set too low a value; TCP/IP communication performance varies throughout the day.
Other notes:
NAS-related issue:
In the case where the NAS was unaware that the system had been rebooted, the best long-term solution is to contact the NAS vendor, who might be able to provide a utility program that can be run early in the system startup process to notify the NAS that it should release all locks owned by this system. The following procedure might offer a short-term workaround:
1. Stop all Replicat processes that read the trail file.
2. Stop the target MGR process.
3. Copy trail file xx000000 to xx000000.bk.
4. Delete trail file xx000000.
5. Move (mv) xx000000.bk back to xx000000.
6. Repeat the copy/delete/move steps (3-5) for each trail file that can't be locked.
7. From the shell, kill the server (collector) process that was writing to the trail, i.e., check at the OS level for orphan processes, e.g. on Unix-style OSs:
ps -ef | grep server
If any such orphan servers exist, e.g.:
oracle 27165 1 0 11:20 ? 00:00:00 ./server -p 7840 -k -l /opt/oracle/gg/ggserr.log
then kill 27165 (or kill -9 27165) for this particular case.
8. Start MGR.
9. Start the Replicat processes.
10. Restart the Extract that abended and gave this error message.
Note that this may not work, depending on the NAS and the way it keeps track of advisory file locks acquired using fcntl(F_GETLK).
Cluster failover:
When a system fails over to another node, the GoldenGate processes should be stopped, typically by using the GGSCI commands stop * and stop mgr; however, processes such as server/collectors may remain running. Stop the Extract pumps manually or kill the processes. You should check that no processes are running from the GoldenGate directory before switching GoldenGate to run on another node.
67. Does GoldenGate Support Tables With an UNUSED Column?
Yes. Starting in version 9.5, GoldenGate supports tables with unused columns, but the support is disabled by default, and Extract abends on them. You can use the DBOPTIONS parameter with the ALLOWUNUSEDCOLUMN option to force Extract to generate a warning and continue processing. When using ALLOWUNUSEDCOLUMN, either the same unused column must exist in the target table, or a source definitions file must be created for Replicat with the DEFGEN utility. You can include the appropriate ALTER TABLE...SET UNUSED statements in a DDL replication configuration.
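A minimal sketch of where the option sits in the Extract parameter file (group, login, trail, and table names are examples):
EXTRACT ext1
USERID ggadmin, PASSWORD ggadmin
-- Warn and continue instead of abending on tables that contain UNUSED columns
DBOPTIONS ALLOWUNUSEDCOLUMN
EXTTRAIL ./dirdat/ea
TABLE scott.*;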
Prior to version 9.5, GoldenGate does not support tables with UNUSED columns. The Extract will abend with an error such as:
table Schema.TABLENAME;
2007-02-21 13:58:08 GGS ERROR 118 unable to validate table Schema.TABLENAME 57148 due to unused column.
The only solution if running a pre-9.5 version of GoldenGate is to drop the unused columns.
68. Extract abends with "missing trailing blocks", "unable to queue I/O", or "I/O beyond file size"
Symptoms
An Extract that reads archived logs on ASM hits the following error. The problem occurs with a single-Extract configuration; Extract abends with:
SQL <BEGIN dbms_diskgroup.read(:handle, :offset, :length, :buffer); END;>: (27091) ORA-27091: unable to queue I/O
212146455571218590 Redo Thread 2: ORA-17510: Attempt to do i/o beyond file size
212146455571218590 Redo Thread 2: ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 124
212146455571218590 Redo Thread 2: ORA-06512: at line 1 )-> 500
Cause
There can be 2 causes for this error:
1. The ASM diskgroup is frequently mounted and dismounted, which makes the ASM diskgroup temporarily unavailable.
2. When Extract is processing an archived log, it waits for the log to be completely written out before starting to process it. However, if Extract cannot validate this via a check on the block header of the very last block and the file size is not increasing, Extract abends after a wait of about 10 seconds.
Solution
Based on the cause, the following solution/workaround may be used:
1. Check the database alert logs. If there are many mount and dismount messages for the ASM diskgroup, this is likely a known issue, particularly in an environment where multiple databases are being supported: when an archive log is written, the diskgroup is dismounted, and if another environment tries to access the diskgroup while it is being dismounted, errors occur. A small dummy tablespace may be created in the related diskgroup; this prevents the diskgroup from constantly being mounted and dismounted and eliminates this as a possible cause.
e.g., CREATE TABLESPACE dummy DATAFILE '+FLASH01' SIZE 10M;
In this way there is always a file open in the diskgroup, and therefore it will not be repeatedly mounted and dismounted.
2. If the above is not the cause of the problem, the following parameter may be used:
TRANLOGOPTIONS COMPLETEARCHIVEDLOGTIMEOUT 600
When Extract is processing an archived log, it waits for the log to be completely written out before it starts processing it. However, if Extract cannot validate this via a check on the block header of the very last block and the file size is not increasing, Extract abends after a wait of about 10 seconds. The COMPLETEARCHIVEDLOGTIMEOUT option is followed by an integer indicating the number of seconds to wait, as sketched below.
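For context, a sketch of an archived-log-only Extract parameter file carrying that option (group, credentials, and the 600-second value are examples):
EXTRACT earch
USERID ggadmin, PASSWORD ggadmin
-- Wait up to 10 minutes for an archived log to be completely written before abending
TRANLOGOPTIONS ARCHIVEDLOGONLY
TRANLOGOPTIONS COMPLETEARCHIVEDLOGTIMEOUT 600
EXTTRAIL ./dirdat/ea
TABLE scott.*;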
69. When Replicat is Running but Not Applying Records
Occasionally, Replicat will show a status of RUNNING but a report will show that no records have been applied. There are four common causes for this.
Troubleshooting Steps
These causes are:
1) The wrong trail is being read
2) The wrong trail file is being read
3) The table name format is incorrectly specified
4) The wrong table name is specified
Each of these cases will show a status of RUNNING. A review of the report will show that no records have been replicated.
Case 1: The wrong trail is being read
- Do an INFO DETAIL of the Extract, or look at the RMTTRAIL statement in the Extract parameters.
- Do an INFO DETAIL of the Replicat, or review the Replicat report.
- The trail file being read by the Replicat must have the same name, machine, and location as the trail the Extract is writing.
- Common errors are specifying the wrong target machine or the wrong subdirectory for the trail.
- Erroneous Extract parameter files and ADD EXTTRAIL or ADD REPLICAT statements are often the cause.
- This is a setup error and needs to be reconfigured.
Case 2: The wrong trail file is being read
- Do an INFO DETAIL of the Replicat, or review the Replicat report.
- Examine the trail files (ls -l) on the target.
- Does the trail file being read exist?
- Is the Replicat sitting at the end of a trail file, not advancing to the next available file?
- The solution is to do GGSCI> alter replicat <repname>, extseqno <NNNN>, extrba 0 to cause the Replicat to advance to the next file, and then restart the Replicat.
- This is commonly caused by doing an ETROLLOVER on the source, which generally requires doing an ALTER on all downstream processes. This is an administrative error.
Case 3: The table name format is incorrectly specified
- The schema.table name in the Replicat parameter file must match the schema.table name in the trail.
- Do a logdump on a trail file and run a COUNT command with DETAIL turned on, or use logdump to look directly at a row to be applied with GHDR turned on.
- The schema.table in the Replicat must match that in the trail file. On NSK, this includes matching any system name occurrence or absence.
- Additionally, changes may be made on the source side that change the table specification.
- This is commonly a setup error that is easily corrected. Adjust the Replicat specification, repoint the Replicat to the beginning of the data, and rerun; a logdump session like the one sketched below can confirm what is actually in the trail.
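A short logdump session of that kind (the trail file name is an example):
$ ./logdump
Logdump> open ./dirdat/ea000000
Logdump> ghdr on
Logdump> detail on
Logdump> count
Logdump> next
COUNT summarizes the operations per table name as recorded in the trail, and NEXT with GHDR ON shows the table name carried in each record header, which can then be compared against the MAP statements in the Replicat parameter file.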
Case 4: Wrong table name specified
- This could be a case where the source table name of interest is not the one that was intended.
- Check the parameter file and correct the table name. Once that is done, alter the Replicat back to the earlier sequence number or to a particular RBA and restart it to apply the transactions.
70. How to recover from Extract ERROR 180 "encountered commit SCN <n.xxx> that is not greater than the highest SCN already processed" (GoldenGate Version 10)
Solution
This applies to Oracle GoldenGate Version 10 and above only, using a Version 10 formatted trail and NOT using the RMTTRAIL or EXTTRAIL FORMAT RELEASE parameter.
Description:
This error occurs in an Oracle RAC environment after a transaction is written to the idle node but does not yet appear in the redo log when the current transaction is processed. This transaction will have a higher SCN than the previous transaction.
Solution Overview:
Verify that the Extract pump and Replicat have completely processed the records in the trail, then alter the Extract pump and Replicat to process the new trail created by the ETROLLOVER from the main Extract.
Solution Details:
1. Do an ETROLLOVER on Extract, and take note of the new sequence number of the trail file.
ALTER EXTRACT [name], ETROLLOVER
2. Start extract
START EXTRACT [name]
3. Send PUMP, LOGEND, to see if it's at the end of the previous trail.
SEND EXTRACT [pump_name], LOGEND
4. Once it is at the end of the trail file, you must stop the pump and do an ETROLLOVER for it too. Take note of the new trail file sequence number created in this step.
STOP EXTRACT [pump_name]
ALTER EXTRACT [pump_name], ETROLLOVER
5. Alter the pump's EXTSEQNO to the new trail file created in step 1.
ALTER EXTRACT [pump_name], EXTSEQNO ##### EXTRBA 0
6. Restart pump
START EXTRACT [pump_name]
7. Send Replicat, LOGEND to make sure it has processed all the remaining data, and stop Replicat.
SEND REPLICAT [name], LOGEND
STOP REPLICAT [name]
8. If the Replicat is not at the end of the trail, generate a report and force-stop the Replicat.
SEND REPLICAT [name], REPORT
STOP REPLICAT [name]!
9. Add the following parameters to replicat parameter file to allow replicat to process each trail record as a single transaction, or set them to 1 if you have any of these parameters:
GROUPTRANSOPS 1
MAXTRANSOPS 1
10. Restart replicat
START REPLICAT [name]
11. Once replicat has completely processed the trail, stop the replicat
STOP REPLICAT [name]
12. Edit the replicat parameter file:
- Add parameter HANDLECOLLISIONS to Replicat parameter file
- Remove or comment out GROUPTRANSOPS and MAXTRANSOPS or revert them back to their original values.
Note: There are pre-conditions for using HANDLECOLLISIONS. There must be either:
a. no PK updates, or
b. Extract has "FETCHOPTIONS FETCHPKUPDATECOLS"
Also, all the tables should have a primary key or unique index to avoid data integrity issues when using HANDLECOLLISIONS.
13. Alter the Replicat to the SEQNO of the new trail file created in step 4.
ALTER REPLICAT [name], EXTSEQNO ###### EXTRBA 0
14. Start Replicat
START REPLICAT [name]
15. Once Replicat has processed the out of order SCN operations, disable HANDLECOLLISIONS. You could also look for the CSN and wait for Replicat to checkpoint past it.
SEND REPLICAT [name], NOHANDLECOLLISIONS.
16. Edit the Replicat parameter file and comment out the HANDLECOLLISIONS parameter. You do not need to stop/restart the Replicat. This ensures that on any subsequent Replicat restarts the parameter is disabled.
Note:
If the out-of-order transactions happen to the SAME record on the SAME table, then the Replicat will probably abend in step 7 or step 10. If that is the case, look at the discard file to find that particular record (the primary key value), then add HANDLECOLLISIONS and continue with the remaining steps. Later, once the Replicat has caught up, that particular record needs to be manually synced.
However, the chance of out-of-order transactions happening to the SAME record on the SAME table is rare. As an example, it would mean that the user application inserts record A from node 1 and then immediately updates record A on node 2. Out-of-order transactions usually happen to different tables or to different records in the same table.
In other words, if you do see collisions being handled in this recovery procedure, you only have to re-sync those particular records that had collisions.