ORA-15042: ASM disk “N” is missing

This was pending from a long time.As i am little free these days, so thought of writing it down.A little background on the issue –

The RDBMS and ASM HOME had 10.2.0.3 version on AIX OS(version i don’t remember).A new disk 100Gb had to be added to the ASM diskgroup.As part of initial check, the consultant on site checked for the ownership and permission of the provided disk on both the nodes of RAC setup.The ownership on node 2 was incorrectly set(showed root:system).It was changed by the sysadmin and after re-checking the OS visibility of the disk,ownership and permission on both nodes, add disk command was fired(from node 1) and it failed.The alert log (+ASM2 instance) showed

Fri Oct 22 15:35:52 2010
NOTE: reconfiguration of group 2/0x1771b7a6 (ASMDB), full=1
NOTE: disk validation pending for group 2/0x1771b7a6 (ASMDB)
ERROR: group 2/0x1771b7a6 (ASMDB): could not validate disk 3
SUCCESS: validated disks for 2/0x1771b7a6 (ASMDB)
NOTE: PST refresh pending for group 2/0x1771b7a6 (ASMDB)
NOTE: PST update: grp = 2, dsk = 3, mode = 0x4
Fri Oct 22 15:35:57 2010
ERROR: too many offline disks in PST (grp 2)
Fri Oct 22 15:35:57 2010
NOTE: PST not enabling heartbeating (grp 2): group dismounted
Fri Oct 22 15:35:57 2010
SUCCESS: refreshed PST for 2/0x1771b7a6 (ASMDB)
ERROR: ORA-15040 thrown in RBAL for group number 2
Fri Oct 22 15:35:57 2010
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_rbal_1159174.trc:
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "3" is missing
Fri Oct 22 15:35:57 2010
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_ckpt_1060874.trc:
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [2], [3], [7], [], [], [], []
Fri Oct 22 15:36:00 2010
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_ckpt_1060874.trc:
ORA-00600: internal error code, arguments: [kfcbCloseCIC10], [2], [3], [7], [], [], [], []
Fri Oct 22 15:36:00 2010
CKPT: terminating instance due to error 469
Fri Oct 22 15:36:00 2010
Trace dumping is performing id=[cdmp_20101022153600]
Fri Oct 22 15:36:02 2010
Shutting down instance (abort)
License high water mark = 9
Fri Oct 22 15:36:05 2010
Instance terminated by CKPT, pid = 1060874
Fri Oct 22 15:36:07 2010
Instance terminated by USER, pid = 1536124
Fri Oct 22 15:36:11 2010
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0

Now things are interesting…insn’t it ;).So what all is mandatory to check before adding the disk to diskgroup :-

1. The OS visibility of the disk across all the nodes.
2. The ownership of the disk across all the nodes.
3. The permissions of the disk across all the nodes.
4. Checking the visibility of the disk using view GV$ASM_DISK.

The above 4 are the key points of areas (i suppose) to look before adding the disk.For the disk to be added the column HEADER_STATUS in GV$ASM_DISK must show as CANDIDATE.And this is what was not checked.I was called to check what could have the been reason for the same and also to add 2 more disk in the diskgroup.

After arriving on client’s site, i checked for the above 4 points and saw that the point 4 is where things went wrong.The view gv$asm_disk showed :-

SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk15';
   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk15             CLOSED  MEMBER

The above confirms that the disk (/dev/rhdisk15) was not discovered by the ASM instance on node 2 and hence the error ORA-15042: ASM disk “3” is missing.The only reason i could think of was that the disk was exclusively acquired by node1 as ownership and permissions were correctly set.To confirm the same,i had to make few MOS and google search.The Metalink Doc Id :- 400005.1 confirms the same issue.Also found this useful link http://oraclue.com/2009/03/10/kernel-files-editor-kfed-and-kfod-osm-discovery-utility/.

As i had to add 2 more disk , asked the sysadmin team to provide the disk.After they made it available, i started checking the pre-requisites and again point 4 seemed problematic.The disk was getting recognized only be node 1.

SQL>select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';
   INST_ID     PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
     1             /dev/rhdisk17             CLOSED  CANDIDATE

The newly added disk showed CANDIDATE, but was recognized by only node 1.To confirm that the disk was acquired exclusively by node1 used the kfod utility along with truss as suggested by the metalink doc from node1.

truss -aefo kfod_17.txt kfod disks=all

For the disk(rhdisk17) the output showed :-

1192140: 2490391: statx("/dev/rhdisk17", 0x0FFFFFFFFFFFCF00, 176, 010) = 0
1192140: 2490391: open("/dev/rhdisk17", O_RDONLY|O_LARGEFILE) Err#16 EBUSY
1192140: 2490391: open("/dev/rhdisk17", O_RDONLY|O_LARGEFILE) Err#16 EBUSY

So, it was time to catch the sysadmin.It was hard to make him understand the problem because he had only one sentence ‘Everything is OK from myside’.Finally after having discussions with him, he was ok in diagnosing the analyzing the problem.In between the time he was diagnosing,output of gv$asm_disk kept changing.

17:03:12 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk17             CLOSED  CANDIDATE

17:08:13 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         2 /dev/rhdisk17             CLOSED  CANDIDATE
         1 /dev/rhdisk17             CLOSED  CANDIDATE

17:38:07 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';
   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         1 /dev/rhdisk17             CLOSED  CANDIDATE

To resolve it below steps were taken on AIX server (luckily i was part of doing this) :-


1. Logged in as root user on the server.
2. root@db1:/ $ cd /dev
3. root@db1:/dev $ ls -l | grep rhdisk17
crw-------   1 root     system       40, 19 Oct 25 18:03 rhdisk17
root@db1:/dev $ chmod 660 rhdisk17
root@db1:/dev $ ls -lrt rhdisk17
crw-rw----   1 root     system       40, 19 Oct 25 18:03 rhdisk17
root@db1:/dev $ chown -R oracle:oinstall rhdisk17
root@db1:/dev $ ls -lrt rhdisk17
crw-rw----   1 oracle   oinstall     40, 19 Oct 25 18:03 rhdisk17
root@db1:/dev $
root@db1:/dev $ ls -lrt hdisk17
brw-------   1 root     system       40, 20 Oct 25 18:03 hdisk17
root@db1:/dev $ lsattr -El hdisk17
PR_key_value   none                             Persistant Reserve Key Value           True
cache_method   fast_write                       Write Caching method                   False
ieee_volname   600A0B800042193E000009984CC4D6EB IEEE Unique volume name                False
lun_id         0x0010000000000000               Logical Unit Number                    False
max_transfer   0x100000                         Maximum TRANSFER Size                  True
prefetch_mult  1                                Multiple of blocks to prefetch on read False
pvid           none                             Physical volume identifier             False
q_type         simple                           Queuing Type                           False
queue_depth    10                               Queue Depth                            True
raid_level     1                                RAID Level                             False
reassign_to    120                              Reassign Timeout value                 True
reserve_policy single_path                      Reserve Policy                         True
rw_timeout     30                               Read/Write Timeout value               True
scsi_id        0x10000                          SCSI ID                                False
size           51200                            Size in Mbytes                         False
write_cache    yes                              Write Caching enabled                  False
root@db1:/dev $

The reserve_policy was set as single_path(by default).So, which ever node it was made visible later, had the exclusive lock on the disk.To encounter it the disk’s attribute must be changed so that they are shared by all the nodes.

root@db1:/dev $ lsattr -El hdisk17  | grep lun_id
lun_id         0x0010000000000000               Logical Unit Number
root@db1:/dev $ chdev -l hdisk17 -a reserve_policy=no_reserve
hdisk17 changed
root@db1:/dev $ lsattr -El hdisk17 | grep reserve_policy
reserve_policy no_reserve                       Reserve Policy                         True
root@db1:/dev $ lspv | grep hdisk17
hdisk17         none                                None
root@db1:/dev $

The same steps were performed on the node 2 to change the reserve_policy of the disk to no reserve and made it sharable.
Once it was completed checked the output from gv$asm_disk

18:28:29 SQL> select inst_id,path,mount_status,header_status from gv$asm_disk where path='/dev/rhdisk17';

   INST_ID PATH                      MOUNT_S HEADER_STATU
---------- ------------------------- ------- ------------
         2 /dev/rhdisk17             CLOSED  CANDIDATE
         1 /dev/rhdisk17             CLOSED  CANDIDATE

The ‘alter diskgroup add disk’ for rhdisk17 and rhdisk18 succeeded.

The issue with ‘rhdisk15’ was resolved by Oracle Support using the ‘dd’ command.

References :-

1. Disks Added to the ASM From One Node Are Not Discovered on the Other Node [ID 400005.1]
2. http://askdba.org/weblog/2008/05/ora-15063-asm-discovered-insufficient-amount-of-disks-2/
3. http://oraclue.com/2009/03/10/kernel-files-editor-kfed-and-kfod-osm-discovery-utility/
4. http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphcg/lspv.htm

Advertisements

5 thoughts on “ORA-15042: ASM disk “N” is missing

  1. Hi!

    I just wanted to say that these troubleshooting steps are still relevant for 11gR2. One more item you can check if multi-pathing is set correctly:

    ‘oracleasm listdisks’

    If your new disk is missing there, then have your sysadmin run the scandisks command. That is what fixed my ora-15042 error.

    Thanks much!
    Kevin

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s