ologgerd daemon – 11gR2

In the past few weeks I have been involved in upgrading RAC as well as non-RAC databases, and with every upgrade I learn something new 🙂

After the upgrade of one of the development databases, which is a 2-node RAC, EMGC started showing Swap Utilization at 99.99%. The “top” command showed:

node 1 –> az8500
node 2 –> az8501

top - 08:39:20 up 12 days, 10:13,  7 users,  load average: 1.51, 1.21, 1.16
Tasks: 233 total,   1 running, 232 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.4% us,  1.3% sy,  0.0% ni, 92.4% id,  0.5% wa,  0.1% hi,  0.4% si
Mem:  16319928k total, 16237460k used,    82468k free,    97948k buffers
Swap:  6289384k total,  6288964k used,      420k free,  5998720k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 7414 root      RT   0 13.3g 8.5g  56m S 53.8 54.6   1447:04 ologgerd           
 7062 root      RT   0  109m  84m  54m S  3.7  0.5 223:40.41 osysmond.bin       

[root@az8501 ~]# ps -ef | grep 7414
root      7414     1 14 Feb21 ?        1-05:14:29 /u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501
root     18569 15708  0 04:47 pts/8    00:00:00 grep 7414
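
As a quick cross-check of the figure reported by EMGC, the swap utilization can be recomputed from the Swap line of the top summary above. The following is a minimal sketch; the function name swap_used_pct is hypothetical, and it assumes the older "Swap: ...k total, ...k used" layout shown here (newer top versions label this line differently).

```shell
# Print swap utilization (percent, two decimals) from `top` summary text on stdin.
# Assumes the "Swap:  <total>k total,  <used>k used, ..." layout shown above.
swap_used_pct() {
    awk '/^Swap:/ {
        total = $2 + 0   # "6289384k" -> 6289384 (awk keeps the numeric prefix)
        used  = $4 + 0   # "6288964k" -> 6288964
        if (total > 0) printf "%.2f\n", used * 100 / total
    }'
}

# Usage with the Swap line from the output above:
printf 'Swap:  6289384k total,  6288964k used,      420k free,  5998720k cached\n' | swap_used_pct
# prints 99.99
```

On a live system the same function could be fed with something like `top -b -n 1 | swap_used_pct`, provided the local top prints the summary in this format.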

The question that comes to mind is: what is this daemon, and what does it do?

Starting with 11gR2, Oracle introduced a new resource, “ora.crf”, which is run by the “orarootagent” agent with “root” as the owner. This resource in turn spawns the osysmond process, which spawns the ologgerd daemon, one daemon per cluster node. It appears to be more or less an implementation of the IPD/OS (Instantaneous Problem Detector) Cluster Health Monitor tool, which was available for CRS 10gR2 and above.

$ crsctl stat res ora.crf -init -t
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Cluster Resources
      1        ONLINE  ONLINE       az8501

[root@az8501 ~]# ps -ef | grep osysmond
oracle   24181 19835  0 04:02 pts/5    00:00:00 grep osysmond
root     24813     1  0 Feb21 ?        03:46:44 /u01/app/oracle/grid/11.2.0/bin/osysmond.bin

[root@az8501 ~]# ps -ef | grep ologgerd
oracle   24246 19835  0 04:03 pts/5    00:00:00 grep ologgerd
root      7414     1 14 Feb21 ?        1-05:14:29 /u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501

The 2 daemons are

a.) osysmond –> the monitoring and OS metric collection daemon, running on every node.
b.) ologgerd –> follows a master/standby paradigm if there is more than one node in the cluster. The master manages the OS metric database, which is a BDB (Berkeley DB) based database, and interacts with the standby to maintain a replica of the master's metrics.

So, osysmond collects the OS metrics and sends the data to ologgerd. ologgerd receives the information from all the nodes and persists it in the Berkeley DB based database.

The crf folder in $GRID_HOME has 2 folders

az8501:/u01/app/grid/11.2.0/crf> ls -lrt
total 6
drwxr-x---  3 root dba 3 Feb 17 17:27 db
drwxr-x---  3 root dba 5 Mar  2 16:54 admin

For the ologgerd daemon, the directory specified by “-d” ($GRID_HOME/crf/db/) is the location where the ologgerd process stores its data on the server. A number of *.bdb (Berkeley DB) files and a few other files can be found in that directory. “-r” marks the daemon as the replica.

The admin folder ($GRID_HOME/crf/admin) contains the files crf(hostname).cfg and crf(hostname).ora.

The crf(hostname).ora file shows:

HOSTS=az8500,az8501  --> Hostnames of the cluster nodes
MASTER=az8500 -->   Hostname for Master daemon
MYNAME=az8501 -->   Server's Hostname 
BDBLOC=/u01/app/oracle/grid/11.2.0/crf/db/az8500   --> Location of BDB (Berkeley DB)
az8500 1= localhost.localdomain 0
az8500 0=xx.xxx.xx.xxx 16020 
az8500 2=xx.xxx.xx.xxx 23188
az8501 1= localhost.localdomain 0
az8501 0=xx.xxx.xx.xxx 16020
az8501 2=xx.xxx.xx.xxx 16021
MASTERPUB=xx.xxx.xxx.xx --> host ip of the master daemon
REPLICA=az8501  --> Hostname of Replica daemon
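
To pull a single setting such as MASTER or REPLICA out of this file, a small lookup helper is enough. This is only a sketch: the function name crf_get is hypothetical, and it assumes the file really consists of KEY=VALUE lines as in the listing above. The "-->" notes in the listing are annotations added for this post; the helper strips them in case the input is pasted from here.

```shell
# Print the value of one KEY=VALUE setting read from stdin.
# $1 is the key, e.g. MASTER, REPLICA, HOSTS or BDBLOC.
crf_get() {
    awk -v key="$1" '
        {
            eq = index($0, "=")              # position of the first "="
            if (eq == 0) next                # not a KEY=VALUE line
            if (substr($0, 1, eq - 1) != key) next
            v = substr($0, eq + 1)
            sub(/[ \t]*-->.*$/, "", v)       # drop a trailing "--> note"
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding blanks
            print v
        }'
}

# Usage with a few lines copied from the listing above:
crf_get MASTER <<'EOF'
HOSTS=az8500,az8501  --> Hostnames of the Clusters
MASTER=az8500 -->   Hostname for Master daemon
MYNAME=az8501 -->   Server's Hostname
REPLICA=az8501  --> Hostname of Replica daemon
EOF
# prints az8500
```

Against the real file the call would look like `crf_get MASTER < "$GRID_HOME/crf/admin/<file>.ora"`, with the file name following the crf(hostname).ora pattern mentioned above.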

Now, back to the swap utilization issue. As it was a dev box, we decided to kill the process and see what happens.

[root@az8501 ~]# kill -9 7414
[root@az8501 ~]# ps -ef | grep ologgerd
root      7414     1 14 Feb21 ?        1-05:14:49 [ologgerd] 
root     18971 15708  0 04:51 pts/8    00:00:00 grep ologgerd

After a few seconds, the daemon re-spawned:

[root@az8501 ~]# ps -ef | grep ologgerd
root     19558     1  4 04:54 ?        00:00:01 /u01/app/grid/11.2.0/bin/ologgerd -M -d /u01/app/grid/11.2.0/crf/db/az8501
root     19585 15708  0 04:54 pts/8    00:00:00 grep ologgerd
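
The role of an ologgerd process can be read off its command line in the ps output. The sketch below is based purely on what is observed in this post, not on Oracle documentation: “-r” marks the replica, “-m <host>” is assumed to name the master it replicates from, and “-M” is assumed to mark the master. The function name ologgerd_role is hypothetical.

```shell
# Classify an ologgerd command line (as printed by `ps -ef`) as master or replica.
# Flag meanings are assumptions drawn from the ps output in this post.
ologgerd_role() (
    set -f          # no glob expansion while splitting the command line
    set -- $1       # split the command line into words
    role=unknown
    master=
    while [ $# -gt 0 ]; do
        case $1 in
            -M) role=master ;;
            -r) role=replica ;;
            -m) master=${2:-} ;;   # the word after -m is the master host
        esac
        shift
    done
    if [ "$role" = replica ] && [ -n "$master" ]; then
        echo "replica (master: $master)"
    else
        echo "$role"
    fi
)

# Usage with the two command lines seen on az8501, before and after the kill:
ologgerd_role '/u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501'
ologgerd_role '/u01/app/grid/11.2.0/bin/ologgerd -M -d /u01/app/grid/11.2.0/crf/db/az8501'
```

The function body runs in a subshell, so the word splitting and the helper variables do not leak into the calling shell.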

It is interesting to note that the ologgerd daemon on az8501, which earlier was the replica, has now become the master. The content of crf(hostname).ora has changed as well:

az8500 1= localhost.localdomain 0
az8500 0=xx.xxx.xx.xxx 16020 
az8500 2=xx.xxx.xx.xxx 23188
az8501 1= localhost.localdomain 0
az8501 0=xx.xxx.xx.xxx 16020
az8501 2=xx.xxx.xx.xxx 16021
MASTERPUB=xxx.xx.xx.xxx --> IP of the new Master daemon

The above shows that the master daemon is now running on node 2 (MASTERPUB holds the IP of node 2), while the master process that died on node 1 is shown as DEAD with STATE=mutated. The swap was released and there were no more alerts 🙂

On node 1, ologgerd now runs as the replica of az8501:

[root@az8500 ~]# ps -ef | grep ologgerd
root     4620  4422  0 12:16 pts/1    00:00:00 grep ologgerd
root     11122     1  0 05:01 ?        00:01:00 /u01/app/grid/11.2.0/bin/ologgerd -m az8501 -r -d /u01/app/grid/11.2.0/crf/db/az8500
