ORA-00353: log corruption near block change time – 11.2.0.3

Recently I came across “ORA-00353: log corruption near block change time” error.

Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc  (incident=1490189):
ORA-00353: log corruption near block  change  time
ORA-00353: log corruption near block 4631374 change 263470473806052 time 04/16/2015 06:53:16
ORA-00312: online log 43 thread 3: '+REDO02/test/onlinelog/group_43.258.875922665'
ORA-00312: online log 43 thread 3: '+REDO01/test/onlinelog/group_43.258.875922633'
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490189/test_ms00_54284_i1490189.trc
Thu Apr 16 08:14:42 2015
Sweep [inc][1490188]: completed
Thu Apr 16 08:14:55 2015
LOGMINER: WARNING: error 353 encountered, failed to read corrupt logfile +REDO01/test/onlinelog/group_43.258.875922633

As per ORA-354 ORA-353 and ORA-312 on Redo Log Group members (Doc ID 332672.1)

Possible corruption in Online Redo Log File Members in a Redo Log Group.

If Oracle detects a corruption when archiving a log member, it will try to read the same redo block from the second member of the group. If the block is found corrupt in all the members of the redo log group, archiving will not proceed.

I started validating the online redo logfile member and luckily none of the members were corrupt. Incase you find corrupt members please refer to above mentioned Doc “Doc ID 332672.1” for solution.

As a precaution I validated all the online redo logfiles to confirm for corruption. If the redo log file is corrupt it should produce an error.

SQL> alter system dump logfile '+REDO01/test/onlinelog/group_43.258.875922633' validate;

System altered.

Elapsed: 00:01:21.89

SQL> alter system dump logfile '+REDO02/test/onlinelog/group_43.258.875922665' validate;

System altered.

Elapsed: 00:01:10.61

Interestingly I noticed trace file “test_ms00_54284.trc” of size 49Gb and continuously increasing. The file was being continuously written by GoldenGate process.

The issue seemed similar to one mentioned in

Bug 13840711 – ORA-353 in Standby / Streams Data Capture or ORA-272 in PRIMARY: Redo log corruption by ASYNC redo shipping (Doc ID 13840711.8)

Corrupt redo blocks are read from in memory log buffer (not from disk) by ASYNC redo shipping or hotmining clients.

ORA-353 may be reported by the process reading the redo log block from in memory log buffer (e.g. Logminer Reader – MSnn background process, Streams data capture).

Looking at the goldengate status showed

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DPUMP       00:00:00      00:00:03
EXTRACT     RUNNING     EXTL        02:05:36      00:00:07
REPLICAT    RUNNING     REPL        00:00:00      00:00:06

Meanwhile we received another error, this time the extract process which failed reading from online redo logfile, failed reading from archivelog file too. I checked the standby database, and it was insync with Primary.

Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490190):
ORA-00353: log corruption near block 4631374 change 605163911323 time 01/06/1988 07:26:12
ORA-00334: archived log: ‘+FLASH01/test/archivelog/2015_04_16/thread_3_seq_9370.7790.877158359′

Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490190/test_ms00_54284_i1490190.trc
Thu Apr 16 09:19:13 2015
Dumping diagnostic data in directory=[cdmp_20150416091913], requested by (instance=1, osid=54284 (MS00)), summary=[incident=1490190].
Thu Apr 16 09:19:27 2015
Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490191):
ORA-00356: inconsistent lengths in change description
ORA-00353: log corruption near block 4631374 change 605163911323 time 01/06/1988 07:26:12
ORA-00334: archived log: ‘+FLASH01/test/archivelog/2015_04_16/thread_3_seq_9370.7790.877158359’
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490191/test_ms00_54284_i1490191.trc

At this point, client stopped all the GoldenGate processes, deleted the trace file and started the processes.I started monitoring the Extract Process

 send extract EXTL status

Sending STATUS request to EXTRACT EXTL ...


EXTRACT EXTL (PID 102390)
  Current status: In recovery[1]: Processing data

  Current read position:
  Redo thread #: 3
  Sequence #: 9351
  RBA: 4023516192
  Timestamp: 2015-04-16 03:37:35.000000
  SCN: 140.3838686301
  Current write position:
  Sequence #: 1637
  RBA: 1551
  Timestamp: 2015-04-16 09:31:56.423132
  Extract Trail: /oracle2/app/product/ogg12.1.2/dirdat/capture/sd

Monitored the Extract, and it was successfully able to read and extract from SEQ 9370, for which it had earlier failed.

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DPUMP       00:00:01      00:00:07
EXTRACT     RUNNING     EXTL        00:00:01      00:00:00
REPLICAT    RUNNING     REPL        00:00:00      00:00:06

send extract EXTL status

Sending STATUS request to EXTRACT EXTL ...


EXTRACT EXTL (PID 102390)
  Current status: Recovery complete: Processing data

  Current read position:
  Redo thread #: 3
  Sequence #: 9395
  RBA: 1721029512
  Timestamp: 2015-04-16 12:08:00.000000
  SCN: 140.3917853576
  Current write position:
  Sequence #: 1641
  RBA: 997030325
  Timestamp: 2015-04-16 12:07:57.056673
  Extract Trail: /oracle2/app/product/ogg12.1.2/dirdat/capture/sd

lag extract EXTL

Sending GETLAG request to EXTRACT EXT1OL ...
Last record lag 0 seconds.

lag extract DPUMP

Sending GETLAG request to EXTRACT DPUMP ...
Last record lag 2 seconds.
At EOF, no more records to process.

Current DB version is 11.2.0.3.13. As mentioned in the doc, fix for bug 13840711 is first included in 11.2.0.3.14 (Apr 2015) Database Patch Set Update (DB PSU) , we do plan to apply it in coming weeks.

Advertisements