ORA-17629 ORA-20079

Yesterday I received page for ” ORA-17629: Cannot connect to the remote database server”  reported in Primary Database alert log.

Fri Apr 10 06:47:41 2015
Errors in file /oracle/app/diag/rdbms/testdb/testdb/trace/testdb_ora_161991.trc:
ORA-17627:
ORA-17629: Cannot connect to the remote database server
Errors in file /oracle/app/diag/rdbms/testdb/testdb/trace/testdb_ora_161991.trc:
ORA-17629: Cannot connect to the remote database server
ORA-17627:
ORA-17629: Cannot connect to the remote database server
Fri Apr 10 06:48:22 2015

The error trace file shows

*** 2015-04-10 06:47:13.005
*** SESSION ID:(3024.45081) 2015-04-10 06:47:13.005

*** CLIENT ID:() 2015-04-10 06:47:13.005
*** SERVICE NAME:(testdb) 2015-04-10 06:47:13.005
*** MODULE NAME:(oracle@oracle2.testdbds.com (TNS V1-V3)) 2015-04-10 06:47:13.005

*** ACTION NAME:() 2015-04-10 06:47:13.005

krbmsrvgrcf:Expected db_unique_name: testdb
krbmsrvgrcf:mydbuname = :testdb:
krbmsrvgrcf:Instance with connect id testdbSBY requested controlfile.
krbmsrvgrcf:Filespec: /oracle/app/product/11.2_1/dbs/snappcf_testdb.file

*** 2015-04-10 06:47:41.467
OCI error val is 184283056 and errmsg is ”
ORA-17627:
ORA-17629: Cannot connect to the remote database server
******************** WARNING ***************************
The errors during Server autobackup are not fatal, as it
is attempted after sucessful completion of the command.
However, it is recomended to take an RMAN control file
backup as soon as possible because the Autobackup failed
with the following error:
ORA-17629: Cannot connect to the remote database server
ORA-17627:
ORA-17629: Cannot connect to the remote database server
******************** END OF WARNING *******************
ORA-17629: Cannot connect to the remote database server
ORA-17627:
ORA-17629: Cannot connect to the remote database server

The trace file mentions the SID,SERIAL# and the module name. The module shows “oracle@oracle2.testdbds.com” which is the standby database.

I thought to check the sid details from v$active_session_history and the session is coming in from standby database and definitely related to RMAN.

SQL> @ash_sid
Enter value for sid: 3024
Enter value for serial: 45081
old  17: where    session_id = &sid and session_serial# = &serial
new  17: where    session_id = 3024 and session_serial# = 45081

                                                                         SQL                                                                                        Dur   CPU DB Time
Sid,Serial<Blk  STIME    Module               SqlId:ChildNo    SqlExecId Start CPH S PRXJBZ Sta    SEQ# EVENT                            P1:P2:P3   WCLASS           ms    ms      ms
--------------- -------- -------------------- ---------------- --------- ----- ------------ ---- ------ ------------------------ ------------------ ----------- ------- ----- -------
3024,45081      06:47:13 oracle@oracle2.testdbds.com                           NNN N NNNNN  Wait    129 control file single writ 9999999:  1024:  9 System I/O      105     8     113
                06:47:14                                                       NNN N NNNNN  Wait    269 control file single writ 9999999:  5504:  9 System I/O
                06:47:15                                                       NNN N NNNNN  Wait    395 control file single writ 9999999:  9536:  9 System I/O
                06:47:16                                                       NNN N NNNNN  Wait    473 control file single writ 9999999: 12032:  9 System I/O
                06:47:17                                                       NNN N NNNNN  Wait    560 control file sequential  9999999: 14848:  9 System I/O
                06:47:18                                                       NNN N NNNNN  Wait    673 control file single writ 9999999: 18432:  9 System I/O
                06:47:19                                                       NNN N NNNNN  Wait    768 control file sequential  9999999: 21504:  9 System I/O
                06:47:20                                                       NNN N NNNNN  Wait    863 control file single writ 9999999: 24512:  9 System I/O
                06:47:21                                                       NNN N NNNNN  Wait    959 control file single writ 9999999: 27584:  9 System I/O
                06:47:22                                                       NNN N NNNNN  cpu    1044                          9999999: 30336:  9
                06:47:23                                                       NNN N NNNNN  cpu    1132                          9999999: 33152:  9
                06:47:24                                                       NNN N NNNNN  Wait   1220 control file sequential  9999999: 35968:  9 System I/O
                06:47:25                                                       NNN N NNNNN  Wait   1318 control file sequential  9999999: 39104:  9 System I/O
                06:47:26                                                       NNN N NNNNN  Wait   1405 control file single writ 9999999: 41856:  9 System I/O
                06:47:27                                                       NNN N NNNNN  Wait   1489 control file single writ 9999999: 44544:  9 System I/O
                06:47:28                                                       NNN N NNNNN  Wait   1567 control file single writ 9999999: 47040:  9 System I/O
                06:47:29                                                       NNN N NNNNN  Wait   1649 control file single writ 9999999: 49664:  9 System I/O
                06:47:30                                                       NNN N NNNNN  Wait   1733 control file single writ 9999999: 52352:  9 System I/O
                06:47:31                                                       NNN N NNNNN  cpu    1817                          9999999: 55040:  9
                06:47:32                                                       NNN N NNNNN  Wait   1895 control file single writ 9999999: 57536:  9 System I/O
                06:47:33                                                       NNN N NNNNN  Wait   1965 control file single writ 9999999: 59776:  9 System I/O
                06:47:34                                                       NNN N NNNNN  Wait   2041 control file single writ 9999999: 62208:  9 System I/O
                06:47:35                                                       NNN N NNNNN  Wait   2113 control file single writ 9999999: 64512:  9 System I/O
                06:47:36                                                       NNN N NNNNN  Wait   2179 control file single writ 9999999: 66624:  9 System I/O
                06:47:37                                                       NNN N NNNNN  Wait   2238 control file sequential  9999999: 68544:  9 System I/O
                06:47:38                                                       NNN N NNNNN  Wait   2307 control file single writ 9999999: 70720:  9 System I/O
                06:47:39                                                       NNN N NNNNN  Wait   2369 control file single writ 9999999: 72704:  9 System I/O
                06:47:40                                                       NNN N NNNNN  Wait   2444 control file sequential  9999999: 75136:  9 System I/O


28 rows selected.

Looking into MOS I found “RMAN-06820 ORA-17629 During Backup at Standby Site (Doc ID 1616074.1)”. It stated something interesting, so I thought to mention it here, which I did not know

Change in 11.2.0.4 onward

Per ‘unpublished’ Bug 8740124, as of 11.2.0.4, we now include the current standby redo log as part of an RMAN archivelog backup at the standby site. This is achieved by forcing a log switch at the primary site.

I looked at the archive log backup trace file on standby database and there was no error reported. Then I started looking into the backup script and saw, after the backup is completed, resync function is called which connects to rcat database and perform resync catalog.

cle2.testdbs logs]$ more testdb_backup_testdb_resync_201504100617_Fri.log

Recovery Manager: Release 11.2.0.3.0 – Production on Fri Apr 10 06:46:55 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

RMAN>
echo set on

RMAN>

RMAN> connect target *
connected to target database: testdb (DBID=3948365078)

RMAN> connect catalog *
connected to recovery catalog database

RMAN> resync catalog;
starting partial resync of recovery catalog
ORA-20079: full resync from primary database is not done

doing automatic resync from primary
resyncing from database with DB_UNIQUE_NAME testdb
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 04/10/2015 06:47:13
ORA-17629: Cannot connect to the remote database server

ORA-17628: Oracle error 17629 returned by remote Oracle server

RMAN> exit;

Recovery Manager complete.

Tried “show all for db_unique_name testdb;” from standby

RMAN> show all for db_unique_name testdb;

ORA-20079: full resync from primary database is not done

doing automatic resync from primary
resyncing from database with DB_UNIQUE_NAME testdb
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of show command at 04/10/2015 07:53:40
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 04/10/2015 07:53:40
ORA-17629: Cannot connect to the remote database server
ORA-17628: Oracle error 17629 returned by remote Oracle server

Connected to Primary Database and ran the same command

[oracle@oracle1 anand]$ rman target /

Recovery Manager: Release 11.2.0.3.0 – Production on Fri Apr 10 07:55:43 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: testdb (DBID=3948365078)

RMAN> connect catalog rcat/xxxxxxxx@rcat

connected to recovery catalog database

RMAN> show all for db_unique_name testdb;

starting full resync of recovery catalog

full resync complete
RMAN configuration parameters for database with db_unique_name testdb are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 10 DAYS;
CONFIGURE BACKUP OPTIMIZATION OFF;
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO ‘%F’;
CONFIGURE DEVICE TYPE DISK PARALLELISM 8 BACKUP TYPE TO BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT ‘/mnt/oracle-backup/testdb/%U’;
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM ‘AES192’;
CONFIGURE COMPRESSION ALGORITHM ‘BASIC’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD FALSE;
CONFIGURE DB_UNIQUE_NAME ‘testdbSBY’ CONNECT IDENTIFIER ‘testdbSBY’;
CONFIGURE DB_UNIQUE_NAME ‘testdb’ CONNECT IDENTIFIER ‘testdb’;
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;
CONFIGURE SNAPSHOT CONTROLFILE NAME TO ‘/oracle/app/product/11.2_1/dbs/snappcf_testdb.file’;

RMAN>
RMAN> exit

The full resync completed on Primary database and then I performed “resync catalog” on standby as it completed successfully.

RMAN> resync catalog
2> ;

starting partial resync of recovery catalog
partial resync complete

RMAN> exit

Advertisements

ORA-00600:[kcbsor_2], [3], [2]

Recently we had a node crash of one node in 3-node 11.2.0.3 RAC database configuration.The alert log showed –

Thu Apr 26 00:15:16 2012
Hex dump of (file 488, block 345039) in trace file /u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_dbw1_16218.trc
Corrupt block relative dba: 0x7a0543cf (file 488, block 345039)
Bad header found during preparing block for write
Data in bad block:
 type: 6 format: 2 rdba: 0x644ebb3d
 last change scn: 0x0b7a.f77b7353 seq: 0x1 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x73530601
 check value in block header: 0x80ae
 computed block checksum: 0x0
Thu Apr 26 00:15:16 2012
Block Corruption Recovery: buf=0x453d5e9b8 seqno=1
Block recovery from logseq 58471, block 224143 to scn 12695053893631
Recovery of Online Redo Log: Thread 3 Group 19 Seq 58471 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot3g19m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot3g19m2.rdo
Hex dump of (file 488, block 345039) in trace file /u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc
Reading datafile '/u10/oradata/matrix/test_idx_01_177.dbf' for corruption at rdba: 0x7a0543cf (file 488, block 345039)
Reread (file 488, block 345039) found same corrupt data (logically corrupt)
Errors in file /u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc  (incident=48017):
ORA-00600: internal error code, arguments: [kcbsor_2], [3], [2], [], [], [], [], [], [], [], [], []
Errors in file /u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc:
ORA-00600: internal error code, arguments: [kcbsor_2], [3], [2], [], [], [], [], [], [], [], [], []
PMON (ospid: 16123): terminating the instance due to error 472
Thu Apr 26 00:15:18 2012
opiodr aborting process unknown ospid (30802) as a result of ORA-1092
Thu Apr 26 00:15:18 2012
opiodr aborting process unknown ospid (15384) as a result of ORA-1092
Thu Apr 26 00:15:18 2012
opiodr aborting process unknown ospid (1661) as a result of ORA-1092
Thu Apr 26 00:15:18 2012
System state dump requested by (instance=3, osid=16123 (PMON)), summary=[abnormal instance termination].
Thu Apr 26 00:15:30 2012
Termination issued to instance processes. Waiting for the processes to exit
Thu Apr 26 00:15:33 2012
ORA-1092 : opitsk aborting process

The trace file “/u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc” shows –

PMON: received BCR msg

*** 2012-04-26 00:15:16.373
Block Corruption Recovery: buf=0x453d5e9b8 seqno=1 tries=1
Trying to recover a block. Dump of buffer header.
----------------------------------------------------------
BH (0x453d5e9b8) file#: 488 rdba: 0x7a0543cf (488/345039) class: 1 ba: 0x45321a000
  set: 38 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25
  dbwrid: 1 obj: 61329 objn: 61329 tsn: 7 afn: 488 hint: f
  hash: [0x83f1fa230,0x83f1fa230] lru-req: [0x8345f9390,0x8345f9390]
  obj-flags: object_ckpt_list
  ckptq: [0x547bd2c78,0xb3cfff70] fileq: [0xdbc48778,0x8395dec88] objq: [0x287d85f68,0x7c833f750] objaq: [0x46bd59140,0x303b53fc8]
  st: XCURRENT md: NULL fpin: 'kdiwh22: kdifind' tch: 1 le: 0x40fdd8170
  flags: buffer_dirty being_written block_written_once corrupt_being_recovered
          redo_since_read remote_transfered
  LRBA: [0xe467.36b8f.0] LSCN: [0xb8b.cc2d7fff] HSCN: [0xb8b.cc2d7fff] HSUB: [1]
GLOBAL CACHE ELEMENT DUMP (address: 0x40fdd8170):
  id1: 0x543cf id2: 0x1e8 pkey: OBJ#61329 block: (488/345039)
  lock: XG rls: 0x0 acq: 0x0 latch: 10
  flags: 0x20 fair: 0 recovery: 0 fpin: 'kdiwh22: kdifind'
  bscn: 0xb8b.cc2bc912 bctx: (nil) write: 0 scan: 0x0
  lcp: (nil) lnk: [NULL] lch: [0x453d5eae8,0x453d5eae8]
  seq: 3347 hist: 225 502 72 257 324 21 143:0 19 16 352 32 197 48 121
  LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
    flg: 0x0aa00021 state: XCURRENT tsn: 7 tsh: 1 foq: 4
      addr: 0x453d5e9b8 obj: 61329 cls: DATA bscn: 0xb7a.f77b7353
Trying to recover a block with no users.
Doing block recovery for file 488 block 345039
Block header before block recovery:
buffer tsn: 7 rdba: 0x7a0543cf (488/345039)
scn: 0x0b8b.c8d4d9f4 seq: 0x01 flg: 0x04 tail: 0xd9f40601
frmt: 0x02 chkval: 0xb26b type: 0x06=trans data
Resuming block recovery (PMON) for file 488 block 345039
Block recovery from logseq 58471, block 224143 to scn 12695053893631

*** 2012-04-26 00:15:16.692
Recovery of Online Redo Log: Thread 3 Group 19 Seq 58471 Reading mem 0

*** 2012-04-26 00:15:16.933
CHANGE IN FUTURE OF BLOCK
CHANGE #2 TYP:0 CLS:1 AFN:488 DBA:0x7a0543cf OBJ:61329 SCN:0x0b8b.cc2bc912 SEQ:1 OP:10.2 ENC:0 RBL:0
buffer tsn: 7 rdba: 0x7a0543cf (488/345039)
scn: 0x0b8b.c8d4d9f4 seq: 0x01 flg: 0x04 tail: 0xd9f40601
frmt: 0x02 chkval: 0xb26b type: 0x06=trans data
GLOBAL CACHE ELEMENT DUMP (address: 0x40fdd8170):
  id1: 0x543cf id2: 0x1e8 pkey: OBJ#61329 block: (488/345039)
  lock: X rls: 0x0 acq: 0x0 latch: 10
  flags: 0x20 fair: 0 recovery: 0 fpin: 'kdiwh22: kdifind'
  bscn: 0xb8b.cc2bc912 bctx: (nil) write: 0 scan: 0x0
  lcp: (nil) lnk: [NULL] lch: [0x453d5eae8,0x453d5eae8]
  seq: 3350 hist: 104 147:0 68 225 502 72 257 324 21 143:0 19 16
  LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
    flg: 0x0aa00005 state: XCURRENT tsn: 7 tsh: 1 mode: EXCL foq: 4
      pin: 'ktuwh61: ktugus:ktucmt'
      addr: 0x453d5e9b8 obj: 61329 cls: DATA bscn: 0xb8b.c8d4d9f4
kcra_dump_redo_internal: skipped for critical process
Hex dump of (file 488, block 345039)
Dump of memory from 0x000000045321A000 to 0x000000045321C000
45321A000 0000A206 7A0543CF C8D4D9F4 04010B8B  [......z........]
........................
45321BFF0 73736563 FFFFFFFF 4E01FFFF D9F40601  [c.s.......N....]
Reading datafile '/u10/oradata/matrix/test_idx_01_177.dbf' for corruption at rdba: 0x7a0543cf (file 488, block 345039)
Reread (file 488, block 345039) found same corrupt data (logically corrupt)
Reread for stuck block recovery failed
Incident 48017 created, dump file: /u01/oradiag/diag/rdbms/matrix_adc/matrix3/incident/incdir_48017/matrix3_pmon_16123_i48017.trc
ORA-00600: internal error code, arguments: [kcbsor_2], [3], [2], [], [], [], [], [], [], [], [], []

error 472 detected in background process
ORA-00600: internal error code, arguments: [kcbsor_2], [3], [2], [], [], [], [], [], [], [], [], []
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1332<-ksbrdp()+3344<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<
-main()+201<-__libc_start_main()+244<-_start()+37
----- End of Abridged Call Stack Trace -----

*** 2012-04-26 00:15:18.418
PMON (ospid: 16123): terminating the instance due to error 472

*** 2012-04-26 00:15:20.990
ksuitm: waiting up to [5] seconds before killing DIAG(16141)

This relates to Bug 9451143 where PMON starts online block recovery from a wrong LOW RBA which is provided in the block header.As per the MOS note 9451143.8 , the issue is fixed in 11.2.0.3, though we still had a hit. Instance startup was hung at –

Thu Apr 26 00:47:41 2012
Started redo application at
 Thread 1: logseq 86819, block 358549
 Thread 2: logseq 62762, block 134200
 Thread 3: logseq 58471, block 195623
Recovery of Online Redo Log: Thread 1 Group 2 Seq 86819 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot1g2m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot1g2m2.rdo
Recovery of Online Redo Log: Thread 2 Group 6 Seq 62762 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot2g6m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot2g6m2.rdo
Recovery of Online Redo Log: Thread 3 Group 19 Seq 58471 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot3g19m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot3g19m2.rdo
Thu Apr 26 00:47:42 2012
Hex dump of (file 488, block 345039) in trace file /u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_p000_30168.trc
Reading datafile '/u10/oradata/matrix/test_idx_01_177.dbf' for corruption at rdba: 0x7a0543cf (file 488, block 345039)
Reread (file 488, block 345039) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 3 STUCK AT BLOCK 345039 OF FILE 488

“Logical Corruption happens when a data block has a valid checksum, etc., but the block contents are logically inconsistent. Logical block corruption can also occur when the structure below the beginning of the block (below the block header) is corrupt. In this case, the block checksum is correct but the block structures may be corrupt. Logical corruption can also result from a lost write in which a version of the block that was sent to disk somehow was not actually written. The result is that the version of that block on disk is older than the version in the buffer cache. Lost writes are usually caused by bugs in the operating system or hardware.”

We tried recover datafile 488 block 345039

RMAN>  RECOVER DATAFILE 488 BLOCK  345039;

Starting recover at 26-APR-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2550 instance=matrix1 device type=DISK

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 26-APR-12
Thu Apr 26 01:13:25 2012
alter database recover datafile list clear
Completed: alter database recover datafile list clear

There wasn’t any error, but media recovery complete showed elapsed time of 00:00:00 , which made us doubtful.We tried starting the instance and again it got stuck at the same “RECOVERY OF THREAD 3 STUCK AT BLOCK 345039 OF FILE 488”. Meanwhile the other 2 instances also got evicted cause of high load average on the server causing downtime.We then performed datafile recovery

Thu Apr 26 01:22:17 2012
ALTER DATABASE RECOVER  datafile '/u10/oradata/matrix/test_idx_01_177.dbf'
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 86819 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot1g2m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot1g2m2.rdo
Recovery of Online Redo Log: Thread 2 Group 5 Seq 62761 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot2g5m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot2g5m2.rdo
Recovery of Online Redo Log: Thread 3 Group 22 Seq 58470 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot3g22m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot3g22m2.rdo
Thu Apr 26 01:22:50 2012
Recovery of Online Redo Log: Thread 3 Group 19 Seq 58471 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot3g19m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot3g19m2.rdo
Recovery of Online Redo Log: Thread 2 Group 6 Seq 62762 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot2g6m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot2g6m2.rdo
Thu Apr 26 01:23:22 2012
Recovery of Online Redo Log: Thread 1 Group 3 Seq 86820 Reading mem 0
  Mem# 0: /u02/oraredo/matrix/redot1g3m1.rdo
  Mem# 1: /u05/oraredo/matrix/redot1g3m2.rdo
Media Recovery Complete (matrix1)
Completed: ALTER DATABASE RECOVER  datafile '/u10/oradata/matrix/test_idx_01_177.dbf'

The datafile was recovered and the instance started.All 3-nodes were brought up.

From 11gr2 we have Automatic Block recovery (ABMR) feature, but as per my knowledge it would be useful in physical block corruption scenario only.ABMR requires data guard to be in real-time apply mode and sync or async transport modes.For more read-

http://docs.oracle.com/cd/E11882_01/server.112/e17157/unplanned.htm#BABCECIC

http://emrebaransel.blogspot.com.au/2012/02/automatic-block-media-recovery-on.html

set newname command made easier in 11gR2

From 11gR2 oracle introduced new options for “SET NEWNAME” command.

1.SET NEWNAME FOR DATAFILE and SET NEWNAME FOR TEMPFILE
2.SET NEWNAME FOR TABLESPACE
3.SET NEWNAME FOR DATABASE

The following variables are introduced for SET NEWNAME from 11gR2 :-

%b The file name remains same as the original. For example, if a datafile is named D:\oracle\oradata\matrix\test.dbf, then %b results in test.dbf.
%f Specifies the absolute file number of the datafile for which the new name is generated.
%I Specifies the DBID.
%N Specifies the tablespace name.
%U Specifies the following format: data-D-%d_id-%I_TS-%N_FNO-%f.

Time to test 🙂

In my test, i am creating a duplicate database ORCL, from the rman backup of MATRIX database. As i am doing it on Windows box, i do hit ORA-600 [KSMFPG5], [0xAEC0000], which is a bug, and also mentioned in one of my previous blog.

1. taking backup of matrix database –

RMAN> run{
2> backup database format 'D:\oracle\backup\matrix\%d_%s_%p';
3> backup archivelog all format 'D:\oracle\backup\matrix\arc_%d_%s_%p';
4> }

Starting backup at 11-JAN-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=D:\ORACLE\ORADATA\MATRIX\SYSTEM01.DBF
..............
channel ORA_DISK_1: finished piece 1 at 11-JAN-12
piece handle=D:\ORACLE\BACKUP\MATRIX\ARC_MATRIX_10_1 tag=TAG20120111T133016 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 11-JAN-12

The database as well as the archivelog backupset exist in D:\ORACLE\BACKUP\MATRIX\

2. Duplicating db MATRIX to ORCL. Before executing the command, created ORCL pfile , passwordfile and windows service using oradim utility.

D:\scripts>set oracle_sid=ORCL

D:\scripts>sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Wed Jan 11 13:32:14 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area  150667264 bytes
Fixed Size                  1382112 bytes
Variable Size              92277024 bytes
Database Buffers           50331648 bytes
Redo Buffers                6676480 bytes
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

D:\scripts>
D:\scripts>
D:\scripts>rman auxiliary /

Recovery Manager: Release 11.2.0.2.0 - Production on Wed Jan 11 13:33:02 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to auxiliary database: ORCL (not mounted)

RMAN> run {
2>
3> SET NEWNAME FOR DATABASE   TO 'D:\oracle\oradata\orcl\%b';
4> SET NEWNAME FOR TEMPFILE 1 TO 'D:\oracle\oradata\orcl\temp01.dbf' ;
5>
6> DUPLICATE DATABASE 'MATRIX' DBID 2312606933
7>  TO ORCL
8>   BACKUP LOCATION 'D:\oracle\backup\matrix'
9> LOGFILE
10> GROUP 1 ('D:\oracle\oradata\orcl\redo01a.log',
11> 'D:\oracle\oradata\orcl\redo01b.log') SIZE 50M REUSE,
12> GROUP 2 ('D:\oracle\oradata\orcl\redo02a.log',
13> 'D:\oracle\oradata\orcl\redo02b.log') SIZE 50M REUSE,
14> GROUP 3 ('D:\oracle\oradata\orcl\redo03a.log',
15> 'D:\oracle\oradata\orcl\redo03b.log') SIZE 50M REUSE;
16> }

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting Duplicate Db at 11-JAN-12

contents of Memory Script:
{
   sql clone "alter system set  db_name =
 ''MATRIX'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   sql clone "alter system set  db_unique_name =
 ''ORCL'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   shutdown clone immediate;
   startup clone force nomount
   restore clone primary controlfile from  'D:\ORACLE\BACKUP\matrix\MATRIX_9_1';
   alter clone database mount;
}
executing Memory Script

sql statement: alter system set  db_name =  ''MATRIX'' comment= ''Modified by RMAN duplicate'' scope=spfile

sql statement: alter system set  db_unique_name =  ''ORCL'' comment= ''Modified by RMAN duplicate'' scope=spfile

Oracle instance shut down

Oracle instance started

Total System Global Area     150667264 bytes

Fixed Size                     1382112 bytes
Variable Size                 92277024 bytes
Database Buffers              50331648 bytes
Redo Buffers                   6676480 bytes

Starting restore at 11-JAN-12
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=130 device type=DISK

channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
output file name=D:\ORACLE\ORADATA\ORCL\CONTROL01.CTL
output file name=D:\ORACLE\ORADATA\ORCL\CONTROL02.CTL
Finished restore at 11-JAN-12

database mounted
released channel: ORA_AUX_DISK_1
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=130 device type=DISK

contents of Memory Script:
{
   set until scn  859591;
   set newname for datafile  1 to
 "D:\oracle\oradata\orcl\SYSTEM01.DBF";
   set newname for datafile  2 to
 "D:\oracle\oradata\orcl\SYSAUX01.DBF";
   set newname for datafile  3 to
 "D:\oracle\oradata\orcl\UNDOTBS01.DBF";
   set newname for datafile  4 to
 "D:\oracle\oradata\orcl\USERS01.DBF";
   set newname for datafile  5 to
 "D:\oracle\oradata\orcl\EXAMPLE01.DBF";
   restore
   clone database
   ;
}
executing Memory Script

executing command: SET until clause

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 11-JAN-12
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to D:\oracle\oradata\orcl\SYSTEM01.DBF
channel ORA_AUX_DISK_1: restoring datafile 00002 to D:\oracle\oradata\orcl\SYSAUX01.DBF
channel ORA_AUX_DISK_1: restoring datafile 00003 to D:\oracle\oradata\orcl\UNDOTBS01.DBF
channel ORA_AUX_DISK_1: restoring datafile 00004 to D:\oracle\oradata\orcl\USERS01.DBF
channel ORA_AUX_DISK_1: restoring datafile 00005 to D:\oracle\oradata\orcl\EXAMPLE01.DBF
channel ORA_AUX_DISK_1: reading from backup piece D:\ORACLE\BACKUP\MATRIX\MATRIX_8_1
channel ORA_AUX_DISK_1: piece handle=D:\ORACLE\BACKUP\MATRIX\MATRIX_8_1 tag=TAG20120111T132903
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:01:05
Finished restore at 11-JAN-12

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=6 STAMP=772292110 file name=D:\ORACLE\ORADATA\ORCL\SYSTEM01.DBF
datafile 2 switched to datafile copy
input datafile copy RECID=7 STAMP=772292110 file name=D:\ORACLE\ORADATA\ORCL\SYSAUX01.DBF
datafile 3 switched to datafile copy
input datafile copy RECID=8 STAMP=772292110 file name=D:\ORACLE\ORADATA\ORCL\UNDOTBS01.DBF
datafile 4 switched to datafile copy
input datafile copy RECID=9 STAMP=772292110 file name=D:\ORACLE\ORADATA\ORCL\USERS01.DBF
datafile 5 switched to datafile copy
input datafile copy RECID=10 STAMP=772292111 file name=D:\ORACLE\ORADATA\ORCL\EXAMPLE01.DBF

contents of Memory Script:
{
   set until scn  859591;
   recover
   clone database
    delete archivelog
   ;
}
executing Memory Script

executing command: SET until clause

Starting recover at 11-JAN-12
using channel ORA_AUX_DISK_1

starting media recovery

channel ORA_AUX_DISK_1: starting archived log restore to default destination
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=10
channel ORA_AUX_DISK_1: reading from backup piece D:\ORACLE\BACKUP\MATRIX\ARC_MATRIX_10_1
channel ORA_AUX_DISK_1: piece handle=D:\ORACLE\BACKUP\MATRIX\ARC_MATRIX_10_1 tag=TAG20120111T133016
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
archived log file name=D:\ORACLE\ARCHIVELOG\ORCL\ORCL_0000000010_0772282200_0001.ARC thread=1 sequence=10
channel clone_default: deleting archived log(s)
archived log file name=D:\ORACLE\ARCHIVELOG\ORCL\ORCL_0000000010_0772282200_0001.ARC RECID=1 STAMP=772292118
media recovery complete, elapsed time: 00:00:03
Finished recover at 11-JAN-12
Oracle instance started

Total System Global Area     150667264 bytes

Fixed Size                     1382112 bytes
Variable Size                 92277024 bytes
Database Buffers              50331648 bytes
Redo Buffers                   6676480 bytes

contents of Memory Script:
{
   sql clone "alter system set  db_name =
 ''ORCL'' comment=
 ''Reset to original value by RMAN'' scope=spfile";
   sql clone "alter system reset  db_unique_name scope=spfile";
   shutdown clone immediate;
}
executing Memory Script

sql statement: alter system set  db_name =  ''ORCL'' comment= ''Reset to original value by RMAN'' scope=spfile

sql statement: alter system reset  db_unique_name scope=spfile

Oracle instance shut down
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 01/11/2012 13:36:16
RMAN-05501: aborting duplication of target database
RMAN-04014: startup failed: ORA-00600: internal error code, arguments: [17099], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [KSMFPG5], [0xAEC0000], [], [], [], [], [], [], [], [], [], []

RMAN> exit

For the ORA-600, recreated the controlfile, opened the database using RESETLOGS option and added temp file.

In earlier versions of oracle we had to mention SET NEWNAME command to rename the duplicate datafiles while restoring/duplicating the database, where we mentioned it for each and every datafile.

# set new filenames for the datafiles
SET NEWNAME FOR DATAFILE 1 TO ‘/dup/oracle/oradata/trgt/system01.dbf’;
SET NEWNAME FOR DATAFILE 2 TO ‘/dup/oracle/oradata/trgt/undotbs01.dbf’;

From 11gR2 simply using SET NEWNAME FOR DATABASE has made things easier.

Shell Script for RMAN Backup

Few days back  i devoted my time in creating a shell script to take RMAN backup for a 2-node RAC database on Sun Solaris.Both the datafiles and the archive logfiles(for both the instances) are on a shared SAN box. Below is the script :-

ORACLE_HOME=/projects/product/10.2.0/db_1
export ORACLE_HOME
ORACLE_SID=test1
export ORACLE_SID
PATH=$PATH:$ORACLE_HOME/bin
export PATH

set -x

RMAN_BACKUP=/projects/test_backup
Rman_Log=$RMAN_BACKUP/"$ORACLE_SID"_backup_log

touch $Rman_Log

echo "\n\n ****RMAN FULL BACKUP****" >> $Rman_Log
echo "\n rman full startup time: `date`" >> $Rman_Log
DD=`date +%d%m%y`
mkdir /backup/rman/datafiles_backup/bkp_${DD}
mkdir /backup/rman/controlfile_backup/bkp_${DD}
mkdir /backup/rman/arch_backup/bkp_${DD}

sqlplus -s "sys/xxxxxx@TEST1 as sysdba" <<EOF  >> $Rman_Log
set feedback off;
alter system archive log current
/
set serveroutput on
declare
x number;
y number;
begin
select max(SEQUENCE#) into x from gv\$archived_log where thread#=1;
dbms_output.put_line('The Strat Sequence number on instance 1 = '||x||'');
sys.dbms_system.ksdwrt(2,'Note to DBA : On Instance 1 RMAN Backup Starts at '||to_char(x)||' on '||to_char(sysdate,'DD/MON/YYYY HH:MM:MI'));
select max(SEQUENCE#) into y from gv\$archived_log where thread#=2;
dbms_output.put_line('The Strat Sequence number on instance 2 = '||y||'');
sys.dbms_system.ksdwrt(2,'Note to DBA : On Instance 2 RMAN Backup Starts at '||to_char(y)||' on '||to_char(sysdate,'DD/MON/YYYY HH:MM:MI'));
end;
/
exit
EOF
rman target / nocatalog log=$RMAN_BACKUP/rman_fullbackup_${DD}.log << EOF1
change archivelog all crosscheck;
run {
CONFIGURE RETENTION POLICY TO REDUNDANCY 1;
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO BACKUPSET;
backup as compressed backupset incremental level 0 database tag 'TEST_FULL_BACKUP' format '/backup/rman/datafiles_backup/bkp_${DD}/%d_%s_%p';
backup as compressed backupset archivelog all not backed up 1 times FORMAT '/backup/rman/arch_backup/bkp_${DD}/ARCH_%d_%s_%p';
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO  '/ora_backup/rman/controlfile_backup/bkp_${DD}/%F';
delete noprompt obsolete;
delete noprompt archivelog all completed before 'sysdate-2';
}
EXIT;
EOF1

sqlplus -s "sys/xxxxxxx@TEST1 as sysdba" <<EOF2 >> $Rman_Log
set feedback off;
alter system archive log current
/
set serveroutput on
declare
x number;
y number;
begin
select max(SEQUENCE#+1) into x from gv\$archived_log where thread#=1;
dbms_output.put_line('The Finished Sequence number on instance 1 = '||x||'');
sys.dbms_system.ksdwrt(2,'Note to DBA : On Instance 1 RMAN Backup Finished at '||to_char(x)||' on '||to_char(sysdate,'DD/MON/YYYY HH:MM:MI'));
select max(SEQUENCE#+1) into y from gv\$archived_log where thread#=2;
dbms_output.put_line('The Finished Sequence number on instance 2 = '||y||'');
sys.dbms_system.ksdwrt(2,'Note to DBA : On Instance 2 RMAN Backup Finished at '||to_char(y)||' on '||to_char(sysdate,'DD/MON/YYYY HH:MM:MI'));
end;
/
exit
EOF2

echo "\n rman fullbackup end time: `date`" >> $Rman_Log

Some Features:-

1. Two log files will be generated.One with the name “$ORACLE_SID_backup_log” and the other with “rman_fullbackup_${DD}.log” (DD will be the current date).The first log file ($ORACLE_SID_backup_log) will have the

a. The startup time of the script run

b. The maximum archive log sequence number – before the start of the backup and after the end of the backup

c. The end time of the backup.

The rman_fullbackp_${DD} will look like rman_fullbackp_030310,if run on 03rd March 2010.Every time the script runs it will generate a new logfile.It will contain the details of the RUN block in RMAN backup.

2. A line in the alert log of both the instances ,with the maximum sequence number (before the start of the backup and after the end of the backup) will be written.For example

Note to DBA : On Instance 2 RMAN Backup Starts at 2289 on 03/03/2010 02:40:00

3. All the archivelogs will be backed up as  “NOT BACKED UP 1 TIMES” is being used.Even if the crontab didn’t run a particular day or time , the next time it runs it will take the backup of all those archivelogs that haven’t been backed up even once.

4. The backup pieces goes to the current date folder which is created using

     mkdir /backup/rman/datafiles_backup/bkp_${DD}

Suppose the backup is run on 03rd March 2010, so it will create a folder bkp_030310 inside “/backup/rman/datafiles_backup” and place all the backup pieces inside it.Similar is the case for archivelogs and the controlfile.

5. The retention and the deletion policy varies as per the requirements .I take a full backups on Sunday and rest all the days its incremental, with the same script with few modifications and the most important one is

     backup as compressed backupset incremental level 1 database

Hope this helps someone and as always, suggestions are welcomed. 🙂