OHASD doesn’t start – 11gR2

A few weeks back we had an issue where the 2nd node of a 4-node RAC got evicted, and the alert log showed the errors below before the instance was evicted:

Errors in file /u04/oraout/matrix/diag/rdbms/matrix_adc/matrix2/trace/matrix2_ora_8418.trc  (incident=16804):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 169.254.*.* not found. Check output from ifconfig command
Sat Oct 22 23:54:41 2011

ORA-29740: evicted by instance number 2, group incarnation 24
LMON (ospid: 29328): terminating the instance due to error 29740
Sun Oct 23 00:00:01 2011
Instance terminated by LMON, pid = 29328

We tried starting the instance with srvctl and manually with the startup command, but both failed. During the startup, the interesting thing I noticed was:

Private Interface 'bond2' configured from GPnP for use as a private interconnect.
  [name='bond2', type=1, ip=144.xx.xx.xxx, mac=xx-xx-xx-xx-xx-xx, net=144.20.xxx.xxx/xx, mask=255.255.x.x, use=cluster_interconnect/6]

In normal cases it should have looked like this:

Private Interface 'bond2:1' configured from GPnP for use as a private interconnect.
  [name='bond2:1', type=1, ip=169.254.*.*, mac=xx-xx-xx-xx-xx-xx, net=169.254.x.x/xx, mask=255.255.x.x, use=haip:cluster_interconnect/62]

That raises the question of what “haip” is. HAIP stands for High Availability IP.

Grid Infrastructure automatically picks free link-local addresses from the reserved 169.254.*.* subnet for HAIP. According to RFC 3927, the link-local subnet 169.254.*.* should not be used for any other purpose. With HAIP, interconnect traffic is load balanced by default across all active interconnect interfaces, and the corresponding HAIP address fails over transparently to another adapter if one fails or becomes non-communicative.
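Since the ORA-27303 message asks for a 169.254.*.* interface, it can help to test whether a given address falls in that link-local range. The function below is only a minimal sketch; the name `is_link_local` is made up for illustration and is not part of any Oracle tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the IPv4 address in $1 lies in
# 169.254.0.0/16, the RFC 3927 link-local range that HAIP addresses come from.
is_link_local() {
    printf '%s\n' "$1" | awk -F. '
        # Require four dotted fields, the 169.254 prefix, and numeric
        # third and fourth octets no larger than 255.
        NF == 4 && $1 == 169 && $2 == 254 &&
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ &&
        $3 <= 255 && $4 <= 255 { ok = 1 }
        END { exit (ok ? 0 : 1) }'
}
```

For example, `is_link_local 169.254.12.7 && echo "HAIP-style address"` prints the message, while an address such as 144.20.1.1 does not pass the check.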

The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster. If there’s only one active private network, Grid will create one. Grid Infrastructure can activate a maximum of four private network adapters at a time, even if more are defined.

A few commands to check the interconnect configuration:

$ oifcfg iflist -p -n

$ crsctl stat res -t -init      (ora.cluster_interconnect.haip must be ONLINE)

$ oifcfg getif

select inst_id,name,ip_address from gv$cluster_interconnects;
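The HAIP check can also be scripted. The sketch below assumes the non-tabular form, `crsctl stat res ora.cluster_interconnect.haip -init` (without `-t`), prints NAME=, TYPE=, TARGET= and STATE= lines; verify that against your own version before relying on it. The function name `haip_is_online` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read crsctl key=value output on stdin and succeed
# only if a STATE= line starts with ONLINE (e.g. "STATE=ONLINE on matrix2").
# The KEY=VALUE layout is an assumption about the non-tabular crsctl output.
haip_is_online() {
    awk -F= '
        $1 == "STATE" { if ($2 ~ /^ONLINE/) ok = 1 }
        END { exit (ok ? 0 : 1) }'
}
```

Usage would look like `crsctl stat res ora.cluster_interconnect.haip -init | haip_is_online || echo "HAIP is not ONLINE"`.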

We got the network team involved, but according to them everything was fine on the network side, so we finally decided to reboot the server. After the reboot, the OHAS daemon wasn’t coming up automatically, even though autostart was enabled:

$ cat crsstart
enable

TEST:oracle> (matrix2:11.2.0.2_matrix) /etc/oracle/scls_scr/test/root
$ cat ohasdstr
enable
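Both flag files can be checked in one pass. This is a rough sketch: the function name `check_autostart_flags` is invented here, and the directory is passed in by the caller because the scls_scr path contains the host name.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of crsstart and ohasdstr found in
# the directory given as $1, and return non-zero unless both say "enable".
check_autostart_flags() {
    dir=$1
    rc=0
    for f in crsstart ohasdstr; do
        if [ ! -r "$dir/$f" ]; then
            echo "$f: missing or unreadable"
            rc=1
            continue
        fi
        # Strip whitespace and newlines so the comparison is exact.
        val=$(tr -d '[:space:]' < "$dir/$f")
        echo "$f: $val"
        [ "$val" = "enable" ] || rc=1
    done
    return $rc
}
```

On the node above this would be called as `check_autostart_flags /etc/oracle/scls_scr/test/root`.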

No logs in $GRID_HOME/log/test/ were getting updated, so it was a little difficult to diagnose. As ohasd.bin is responsible for starting all the other clusterware processes, directly or indirectly, it needs to start properly for the rest of the stack to come up, which wasn’t happening.

One of the reasons for ohasd not coming up is an rc Snn<command> script that is stuck at the OS level:

 root      2744     1  0 02:20 ?        00:00:00 /bin/bash /etc/rc.d/rc 3
 root      4888  2744  0 02:30 ?        00:00:00 /bin/sh /etc/rc3.d/S98gcstartup start

This S98gcstartup script was stuck. Checking the script showed that it was related to OMS startup. We renamed the file and rebooted the server, after which OHASD and all other resources came up successfully.

$ ls -lrt /etc/rc3.d/old_S98gcstartup
lrwxrwxrwx 1 root root 27 Jun  1 07:09 /etc/rc3.d/old_S98gcstartup -> /etc//rc.d/init.d/gcstartup
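To spot a start script that is still running long after boot, the `ps -ef` output can be filtered for S-scripts under the rc directories. The function below is a sketch under the assumption that the PID is the second column, as in the listing above; the name `find_running_rc_scripts` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print
# "<pid> <script>" for every process whose command line contains an
# Snn start script under /etc/rcN.d or /etc/rc.d/rcN.d.
find_running_rc_scripts() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\/etc\/(rc\.d\/)?rc[0-9]\.d\/S[0-9][0-9]/) {
                print $2, $i   # $2 is assumed to be the PID column
                break
            }
    }'
}
```

Run as `ps -ef | find_running_rc_scripts`. A script listed here is not necessarily hung; it only shows which start scripts have not finished yet, so compare against how long ago the server booted.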

There are a few other possible reasons too, such as an inaccessible or corrupted OLR, CRS autostart being disabled, etc.

But I was still unable to find out why we suddenly got “additional information: requested interface 169.254.*.* not found” when things had been running fine.

    • swapnil mhetre
    • November 9th, 2011

    Hi Anand,

    That was very good information. Although I have never had a chance to work on RAC, this will surely help me.
