ORA-00600: [4194], [a], [b]

One of the servers having 2 single instance databases hosted on it,was having some I/O issue.The control files, redo log files and the data files for both the database were present on these two mount points.

bash-3.00$ cd /disk1
bash-3.00$ ls -lrt
.: I/O error
bash-3.00$ cd /disk2
bash-3.00$ ls -lrt
.: I/O error

Cause if this I/O issue, both the databases were down.The SysAdmin team, dismounted and then mounted back these two disk,and the I/O issue was no more.It was time to startup the database.One of the database started up successfully.The other one, started throwing ORA-00600: [4194], [38], [20]

Successfully onlined Undo Tablespace 29.
Sun May 16 01:52:16 2010
SMON: enabling tx recovery
Sun May 16 01:52:16 2010
Database Characterset is WE8ISO8859P1
Sun May 16 01:52:17 2010
Errors in file /disk1/oracle/admin/idcdb/udump/idc_ora_22161.trc:
ORA-00600: internal error code, arguments: [4194], [38], [20], [], [], [], [], []
Doing block recovery for file 66 block 213923
Block recovery from logseq 15340, block 64 to scn 5970802072110
Sun May 16 01:52:20 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 15340 Reading mem 0
  Mem# 0: /disk1/oradata/redo/redolog6.log
  Mem# 1: /disk1/oradata/redo/redomlplx/redolog6a.log
Block recovery stopped at EOT rba 15340.66.16
Block recovery completed at rba 15340.66.16, scn 1390.797530529
Doing block recovery for file 66 block 361
Block recovery from logseq 15340, block 64 to scn 5970802071968
Sun May 16 01:52:20 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 15340 Reading mem 0
  Mem# 0: /disk1/oradata/redo/redolog6.log
  Mem# 1: /disk1/oradata/redo/redomlplx/redolog6a.log
Block recovery completed at rba 15340.66.16, scn 1390.797530529
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=20, OS id=23281
Sun May 16 01:52:25 2010
Completed: ALTER DATABASE OPEN

After few minutes, the instance was terminated by PMON process

Errors in file /disk1/oracle/admin/idcdb/bdump/idc_mmon_21649.trc:
ORA-00600: internal error code, arguments: [4194], [38], [20], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [38], [20], [], [], [], [], []
ORA-00600: internal error code, arguments: [4194], [38], [20], [], [], [], [], []
Sun May 16 01:52:31 2010
Errors in file /disk1/oracle/admin/idcdb/bdump/idc_pmon_21599.trc:
ORA-00600: internal error code, arguments: [4194], [9], [5], [], [], [], [], []
Sun May 16 01:52:31 2010
Errors in file /disk1/oracle/admin/idcdb/udump/idc_ora_24006.trc:
ORA-00600: internal error code, arguments: [4194], [54], [39], [], [], [], [], []
Sun May 16 01:52:32 2010
Errors in file /disk1/oracle/admin/idcdb/bdump/idc_pmon_21599.trc:
ORA-00600: internal error code, arguments: [4194], [9], [5], [], [], [], [], []
PMON: terminating instance due to error 472
Sun May 16 01:52:32 2010
DEBUG: Replaying xcb 0x47fb9e970, pmd 0x47fde0728 for failed op 8
Doing block recovery for file 66 block 293914
Block recovery from logseq 15340, block 76 to scn 5970802072130
Sun May 16 01:52:32 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 15340 Reading mem 0
  Mem# 0: /disk1/oradata/redo/redolog6.log
  Mem# 1: /disk1/oradata/redo/redomlplx/redolog6a.log
Block recovery completed at rba 15340.78.16, scn 1390.797530691
Sun May 16 01:52:38 2010
Instance terminated by PMON, pid = 21599

As per MOS

DESCRIPTION: A mismatch has been detected between Redo records and rollback (Undo) records.
We are validating the Undo record number relating to the change being applied against the maximum undo record number recorded in the undo block.
This error is reported when the validation fails.

ARGUMENTS:
Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

FUNCTIONALITY:
Kernel Transaction Undo called from Cache layer

IMPACT:
PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

SUGGESTIONS:This error may indicate a rollback segment corruption.
This may require a recovery from a database backup depending on the situation.

Now, it was time to come in action 🙂 . Below is the workaround steps :-

1. Startup database in mount mode

 SQL> startup mount
ORACLE instance started.

Total System Global Area 4294967296 bytes
Fixed Size                  2078264 bytes
Variable Size            4160752072 bytes
Database Buffers          117440512 bytes
Redo Buffers               14696448 bytes
Database mounted.

2. Check the values for Undo related parameter

SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1

3. Change the UNDO_MANAGEMENT to ‘MANUAL’

SQL>
SQL> alter system set undo_management='MANUAL' scope=spfile;

4. As, the above parameter is a static one, bounce the database

SQL> shu immediate;
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 4294967296 bytes
Fixed Size                  2078264 bytes
Variable Size            4160752072 bytes
Database Buffers          117440512 bytes
Redo Buffers               14696448 bytes
Database mounted.
Database opened.

5. In the meantime, keep checking the alert log.Check if there is some error after opening the database successfully with undo_management set to manual.

6.If everything seems fine, then go ahead and create a new undo tablespace.

SQL> create undo tablespace UNDOTBS datafile '/disk1/oradata/idcdb/undotbs1.dbf' size 1G autoextend on;

Tablespace created.

SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1
SQL>

7. Set the undo_tablespace to the new Undo Tablespace name and undo_management back to ‘AUTO’

SQL> alter system set undo_tablespace='UNDOTBS' scope=spfile;

System altered.

SQL> alter system set undo_management='AUTO' scope=spfile;

System altered.

8. Bounce the database

SQL> shu immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 4294967296 bytes
Fixed Size                  2078264 bytes
Variable Size            4160752072 bytes
Database Buffers          117440512 bytes
Redo Buffers               14696448 bytes
Database mounted.
Database opened.

9. Check the Alert log for if any errors and check the undo parameters

SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS

After this the database was open successfully without any error.But,things don’t end up here.
After few minutes started getting another ORA-600
ORA-00600: [13009], [5000], [1], [17],[1], [], [], []
As per MOS

DESCRIPTION: Select for update fetches the rows, satisfying the where-clause, from the Consistent Read (CR) buffers and matches each row with the current buffer version of the row.
If the row is still unchanged in the current version, then the row is locked.If the CR version and Current version does not match, the select for update repeats the whole operation, starting from fetch.
It will stop when all the interested rows are locked or the number of
locking attempt exceeds 5000.
The latter case results in ORA-600 [13009]

Tried to analyze the trace file getting generated by this ORA-600 and checked that only when a particular job ran, this error occurred.After few more analysis found 2 application tables doing select for update.Checked for the indexes on these tables and did a rebuild.This resolved the ORA-600 [13009].It might have been that the index would have got corrupted cause of the disk issue.But still i am not very sure for it.

Advertisements

15 thoughts on “ORA-00600: [4194], [a], [b]

  1. GREAT!!!

    Thank you very much! I got the same problem (I guess from a disk failure) and I was faced to reinstall my test database. Your hints saved me lots of time!

  2. Thanks a lot!!! It solved our problem.

    The Oracle Support team requested us that this was due to BUG 13931044 – ORA-600 [13009], [5000], [1], [17]

    Three solutions were proposed:
    – Apply patch 13931044 which is specific for por OS version and RDBMS
    – Update RDBMS from 11.2.0.2 to 11.2.0.4
    – As a workaround set hidden parameter _NLJ_BATCHING_ENABLED to 0

    As we haven’t faced a problem like this before, we were not sure about applying those solutions. However, it makes sense that a problem like this may be solved after a rebuild of the index involved.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s