Recently I came across the “ORA-00353: log corruption near block change time” error.
Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490189):
ORA-00353: log corruption near block change time
ORA-00353: log corruption near block 4631374 change 263470473806052 time 04/16/2015 06:53:16
ORA-00312: online log 43 thread 3: '+REDO02/test/onlinelog/group_43.258.875922665'
ORA-00312: online log 43 thread 3: '+REDO01/test/onlinelog/group_43.258.875922633'
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490189/test_ms00_54284_i1490189.trc
Thu Apr 16 08:14:42 2015
Sweep [inc][1490188]: completed
Thu Apr 16 08:14:55 2015
LOGMINER: WARNING: error 353 encountered, failed to read corrupt logfile +REDO01/test/onlinelog/group_43.258.875922633
As per “ORA-354 ORA-353 and ORA-312 on Redo Log Group members (Doc ID 332672.1)”:
Possible corruption in Online Redo Log File Members in a Redo Log Group.
If Oracle detects a corruption when archiving a log member, it will try to read the same redo block from the second member of the group. If the block is found corrupt in all the members of the redo log group, archiving will not proceed.
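The ORA-00312 messages above already name the members of the affected group (group 43). The group-to-member mapping can be cross-checked with a query like the following (a minimal sketch, not from the original post; the group number 43 is taken from the ORA-00312 messages):

-- list all members of the affected redo log group
select l.group#, l.thread#, l.sequence#, l.status, f.member
from   v$log l, v$logfile f
where  f.group# = l.group#
and    l.group# = 43
order by f.member;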
I started validating the online redo logfile members and, luckily, none of them were corrupt. In case you find corrupt members, please refer to the above-mentioned note (Doc ID 332672.1) for the solution.
As a precaution I validated all the online redo logfiles to check for corruption. If a redo log file is corrupt, the validation should produce an error.
SQL> alter system dump logfile '+REDO01/test/onlinelog/group_43.258.875922633' validate;

System altered.

Elapsed: 00:01:21.89

SQL> alter system dump logfile '+REDO02/test/onlinelog/group_43.258.875922665' validate;

System altered.

Elapsed: 00:01:10.61
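Rather than typing each statement by hand, the validate commands can be generated from v$logfile (a small sketch, not from the original post; spool the output and run it as a script):

-- generate a "dump logfile ... validate" statement for every online redo log member
select 'alter system dump logfile ''' || member || ''' validate;' as cmd
from   v$logfile
where  type = 'ONLINE'
order by group#;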
Interestingly, I noticed the trace file “test_ms00_54284.trc” was 49 GB in size and continuously growing. The file was being written to continuously by the GoldenGate process.
The issue seemed similar to the one described in
Bug 13840711 – ORA-353 in Standby / Streams Data Capture or ORA-272 in PRIMARY: Redo log corruption by ASYNC redo shipping (Doc ID 13840711.8)
Corrupt redo blocks are read from in memory log buffer (not from disk) by ASYNC redo shipping or hotmining clients.
ORA-353 may be reported by the process reading the redo log block from in memory log buffer (e.g. Logminer Reader – MSnn background process, Streams data capture).
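In this case the reader hitting the error was the LogMiner reader MS00 spawned for integrated capture. The MSnn processes and the trace files they write can be identified with a query like this (a sketch, assuming 11.2 where v$process exposes PNAME and TRACEFILE):

-- LogMiner reader (MSnn) background processes and their trace files
select pname, spid, tracefile
from   v$process
where  pname like 'MS%';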
The GoldenGate status showed:
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DPUMP       00:00:00      00:00:03
EXTRACT     RUNNING     EXTL        02:05:36      00:00:07
REPLICAT    RUNNING     REPL        00:00:00      00:00:06
Meanwhile we received another error: this time the extract process, which had earlier failed to read from the online redo logfile, failed to read from the archived log as well (see the listing below). I also checked the standby database, and it was in sync with the Primary.
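A check along the following lines (a minimal sketch, run on the standby, assuming a Data Guard configuration) is enough to confirm the standby is keeping up:

-- transport/apply lag as seen by the standby
select name, value, time_computed
from   v$dataguard_stats
where  name in ('transport lag', 'apply lag');

-- or compare the highest applied sequence per thread with the primary
select thread#, max(sequence#) as last_applied
from   v$archived_log
where  applied = 'YES'
group by thread#;

The new errors reported in the alert log were: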
Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490190):
ORA-00353: log corruption near block 4631374 change 605163911323 time 01/06/1988 07:26:12
ORA-00334: archived log: '+FLASH01/test/archivelog/2015_04_16/thread_3_seq_9370.7790.877158359'
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490190/test_ms00_54284_i1490190.trc
Thu Apr 16 09:19:13 2015
Dumping diagnostic data in directory=[cdmp_20150416091913], requested by (instance=1, osid=54284 (MS00)), summary=[incident=1490190].
Thu Apr 16 09:19:27 2015
Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490191):
ORA-00356: inconsistent lengths in change description
ORA-00353: log corruption near block 4631374 change 605163911323 time 01/06/1988 07:26:12
ORA-00334: archived log: '+FLASH01/test/archivelog/2015_04_16/thread_3_seq_9370.7790.877158359'
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490191/test_ms00_54284_i1490191.trc
At this point, the client stopped all the GoldenGate processes, deleted the trace file, and restarted the processes. I then started monitoring the Extract process:
send extract EXTL status

Sending STATUS request to EXTRACT EXTL ...

EXTRACT EXTL (PID 102390)
  Current status: In recovery[1]: Processing data

  Current read position:
  Redo thread #: 3
  Sequence #: 9351
  RBA: 4023516192
  Timestamp: 2015-04-16 03:37:35.000000
  SCN: 140.3838686301

  Current write position:
  Sequence #: 1637
  RBA: 1551
  Timestamp: 2015-04-16 09:31:56.423132
  Extract Trail: /oracle2/app/product/ogg12.1.2/dirdat/capture/sd
I monitored the Extract, and it was able to successfully read and extract from sequence 9370, which it had earlier failed on:
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DPUMP       00:00:01      00:00:07
EXTRACT     RUNNING     EXTL        00:00:01      00:00:00
REPLICAT    RUNNING     REPL        00:00:00      00:00:06

send extract EXTL status

Sending STATUS request to EXTRACT EXTL ...

EXTRACT EXTL (PID 102390)
  Current status: Recovery complete: Processing data

  Current read position:
  Redo thread #: 3
  Sequence #: 9395
  RBA: 1721029512
  Timestamp: 2015-04-16 12:08:00.000000
  SCN: 140.3917853576

  Current write position:
  Sequence #: 1641
  RBA: 997030325
  Timestamp: 2015-04-16 12:07:57.056673
  Extract Trail: /oracle2/app/product/ogg12.1.2/dirdat/capture/sd

lag extract EXTL

Sending GETLAG request to EXTRACT EXTL ...
Last record lag 0 seconds.

lag extract DPUMP

Sending GETLAG request to EXTRACT DPUMP ...
Last record lag 2 seconds.
At EOF, no more records to process.
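For completeness, the archived log that had earlier been reported corrupt (thread 3, sequence 9370) can be located and checked with a query like this (a sketch, not from the original post):

-- locate the archived log for thread 3, sequence 9370
select thread#, sequence#, name, first_time, completion_time
from   v$archived_log
where  thread# = 3
and    sequence# = 9370;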
The current DB version is 11.2.0.3.13. As mentioned in the note, the fix for bug 13840711 is first included in the 11.2.0.3.14 (Apr 2015) Database Patch Set Update (DB PSU); we plan to apply it in the coming weeks.
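Once the PSU is applied, the patch level can be confirmed from the registry history (a sketch; in 11.2 the PSU post-install catbundle step records an entry here):

-- confirm the applied PSU level
select action_time, action, version, comments
from   dba_registry_history
order by action_time;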