ologgerd daemon – 11gR2 | Learn. Share. Repeat.

In the past few weeks I have been involved in upgrading RAC as well as non-RAC databases from 10.2.0.5 to 11.2.0.2. With every upgrade I learn something new 🙂

After an upgrade to 11.2.0.2, EMGC started showing Swap Utilization at 99.99% for one of the development databases, which is a 2-node RAC. The “top” command showed:

node 1 –> az8500
node 2 –> az8501

top - 08:39:20 up 12 days, 10:13,  7 users,  load average: 1.51, 1.21, 1.16
Tasks: 233 total,   1 running, 232 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.4% us,  1.3% sy,  0.0% ni, 92.4% id,  0.5% wa,  0.1% hi,  0.4% si
Mem:  16319928k total, 16237460k used,    82468k free,    97948k buffers
Swap:  6289384k total,  6288964k used,      420k free,  5998720k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 7414 root      RT   0 13.3g 8.5g  56m S 53.8 54.6   1447:04 ologgerd           
 7062 root      RT   0  109m  84m  54m S  3.7  0.5 223:40.41 osysmond.bin

[root@az8501 ~]# ps -ef | grep 7414
root      7414     1 14 Feb21 ?        1-05:14:29 /u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501
root     18569 15708  0 04:47 pts/8    00:00:00 grep 7414
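To cross-check the figure EMGC reports, swap utilization can be computed directly on the host. The sketch below assumes a Linux /proc/meminfo with SwapTotal and SwapFree lines; swap_used_pct is a name made up for this example, not an Oracle or OS tool.

```shell
#!/bin/sh
# Print swap utilization as a percentage with two decimals.
# Usage: swap_used_pct [file]   (defaults to /proc/meminfo)
swap_used_pct() {
    LC_ALL=C awk '
        $1 == "SwapTotal:" { total = $2 }   # value in kB
        $1 == "SwapFree:"  { free  = $2 }   # value in kB
        END {
            if (total == 0) { print "0.00"; exit }   # no swap configured
            printf "%.2f\n", (total - free) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Fed the values from the top output above (6289384k total, 420k free), it prints 99.99, which matches the EMGC alert.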

The question that comes to mind is: what is this daemon, and what does it do?

Starting with 11gR2, Oracle introduced a new resource “ora.crf”, which is run by the “orarootagent” agent and owned by “root”. This resource in turn spawns the osysmond process, which spawns the ologgerd daemon, one daemon per cluster node. It seems to be more or less an implementation of the IPD/OS (Instantaneous Problem Detector) Cluster Health Monitor tool, which was available for CRS 10gR2 and above.

$ crsctl stat res ora.crf -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.crf
      1        ONLINE  ONLINE       az8501

[root@az8501 ~]# ps -ef | grep osysmond
oracle   24181 19835  0 04:02 pts/5    00:00:00 grep osysmond
root     24813     1  0 Feb21 ?        03:46:44 /u01/app/oracle/grid/11.2.0/bin/osysmond.bin

[root@az8501 ~]# ps -ef | grep ologgerd
oracle   24246 19835  0 04:03 pts/5    00:00:00 grep ologgerd
root      7414     1 14 Feb21 ?        1-05:14:29 /u01/app/grid/11.2.0/bin/ologgerd -m az8500 -r -d /u01/app/grid/11.2.0/crf/db/az8501

The two daemons are:

a.) osysmond –> the monitoring and OS metric collection daemon that runs on every node.
b.) ologgerd –> follows a master/standby paradigm if there is more than one node in the cluster. The master manages the OS metric database, a BDB (Berkeley DB) based database, and interacts with the standby to manage a replica of the master metrics.

So osysmond collects the OS metrics and sends the data to ologgerd, which receives the information from all the nodes and persists it in the Berkeley DB based database.

The crf folder in $GRID_HOME has two subfolders:

az8501:/u01/app/grid/11.2.0/crf> ls -lrt
total 6
drwxr-x---  3 root dba 3 Feb 17 17:27 db
drwxr-x---  3 root dba 5 Mar  2 16:54 admin

For the ologgerd daemon, the directory ($GRID_HOME/crf/db/) specified by “-d” is the location where the ologgerd process stores its logging information on the server. A number of *.bdb (Berkeley DB) files and a few other files can be found in that directory. “-r” marks the daemon as the replica.
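The role of an ologgerd process can be read off its command line. The sketch below is only an illustration: ologgerd_role is a hypothetical helper, “-r” with “-m <host>” is taken as “replica of the master on <host>” as in the listing above, and treating “-M” as the master marker is an assumption based on the process listing that appears after the restart later in this post.

```shell
#!/bin/sh
# Classify an ologgerd command line as master, replica, or unknown.
# The body runs in a subshell, so set -f and the variables do not leak.
ologgerd_role() (
    set -f              # no glob expansion while word-splitting
    # shellcheck disable=SC2086
    set -- $1           # split the command line into positional parameters
    role="unknown"
    master=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -M) role="master" ;;                                   # assumed master flag
            -r) role="replica" ;;                                  # replica flag
            -m) if [ $# -gt 1 ]; then master="$2"; shift; fi ;;    # host of the master
        esac
        shift
    done
    if [ "$role" = "replica" ] && [ -n "$master" ]; then
        echo "replica (master: $master)"
    else
        echo "$role"
    fi
)
```

For example, passing the command column of the ps output above (everything from /u01/app/grid/11.2.0/bin/ologgerd onward) reports a replica whose master is az8500.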

The admin folder ($GRID_HOME/crf/admin) contains the crf(hostname).cfg and crf(hostname).ora files.

The crf(hostname).ora file shows:

HOSTS=az8500,az8501  --> Hostnames of the cluster nodes
MASTER=az8500 -->   Hostname for Master daemon
MYNAME=az8501 -->   Server's Hostname 
CLUSTERNAME=crs  
USERNAME=oracle
BDBLOC=/u01/app/oracle/grid/11.2.0/crf/db/az8500   --> Location of BDB (Berkeley DB)
CRFHOME=/u01/app/oracle/grid/11.2.0
az8500 1= localhost.localdomain 0
az8500 0=xx.xxx.xx.xxx 16020 
az8500 2=xx.xxx.xx.xxx 23188
az8501 1= localhost.localdomain 0
az8501 0=xx.xxx.xx.xxx 16020
az8501 2=xx.xxx.xx.xxx 16021
BDBSIZE=12623
MASTERPUB=xx.xxx.xxx.xx --> host ip of the master daemon
DEAD=
REPLICA=az8501  --> Hostname of Replica daemon
ACTIVE=az8500,az8501
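Since the file is plain KEY=value text, individual settings are easy to pull out in a script. A minimal sketch, assuming the “-->” notes in the listing above are annotations added in this post rather than file content (they are stripped either way); crf_get is a made-up helper name.

```shell
#!/bin/sh
# Print the value of one key from a crf(hostname).ora style file.
# Usage: crf_get KEY FILE   (exit status 1 if the key is absent)
crf_get() {
    awk -F= -v key="$1" '
        $1 == key {
            val = substr($0, length(key) + 2)   # everything after "KEY="
            sub(/[ \t]*-->.*/, "", val)         # drop a trailing "--> note"
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim surrounding whitespace
            print val
            found = 1
            exit
        }
        END { exit !found }
    ' "$2"
}
```

Calling it with MASTER and then REPLICA against the file shown above would give az8500 and az8501 respectively.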

Now, back to the swap utilization issue. As it was a dev box, we decided to kill the process and see what happens.

[root@az8501 ~]# kill -9 7414

[root@az8501 ~]# ps -ef | grep ologgerd
root      7414     1 14 Feb21 ?        1-05:14:49 [ologgerd] 
root     18971 15708  0 04:51 pts/8    00:00:00 grep ologgerd

After a few seconds, the daemon re-spawned:

[root@az8501 ~]# ps -ef | grep ologgerd
root     19558     1  4 04:54 ?        00:00:01 /u01/app/grid/11.2.0/bin/ologgerd -M -d /u01/app/grid/11.2.0/crf/db/az8501
root     19585 15708  0 04:54 pts/8    00:00:00 grep ologgerd

It is interesting to note that the ologgerd daemon on az8501, which was the replica earlier, has now become the master. The content of crf(hostname).ora has changed as well:

HOSTS=az8500,az8501
MYNAME=az8501
CLUSTERNAME=crs
USERNAME=oracle
BDBLOC=/u01/app/grid/11.2.0/crf/db/az8501
CRFHOME=/u01/app/grid/11.2.0
az8500 1= localhost.localdomain 0
az8500 0=xx.xxx.xx.xxx 16020 
az8500 2=xx.xxx.xx.xxx 23188
az8501 1= localhost.localdomain 0
az8501 0=xx.xxx.xx.xxx 16020
az8501 2=xx.xxx.xx.xxx 16021
STATE=mutated
BDBSIZE=12623
MASTERPUB=xxx.xx.xx.xxx --> IP of the new Master daemon
MASTER=az8501
DEAD=az8500
ACTIVE=az8500,az8501
REPLICA=az8500

The above shows that the master daemon is now running on node 2 (MASTER and MASTERPUB carry the hostname and IP of node 2), that the master process on node 1 died, as shown by DEAD, and that STATE=mutated. The swap was released and there were no more alerts 🙂
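If copies of the file are kept from before and after such a switch, the changed keys can be listed mechanically instead of by eye. This is a rough sketch under the same assumption that “-->” notes are annotations only; crf_changed_keys is a hypothetical name, and keys that disappear from the newer file are not reported.

```shell
#!/bin/sh
# List keys whose value differs between two crf(hostname).ora snapshots.
# Usage: crf_changed_keys OLD_FILE NEW_FILE
crf_changed_keys() {
    awk -F= '
        !/=/ { next }                               # skip lines without KEY=value
        {
            key = $1
            val = substr($0, length(key) + 2)       # everything after "KEY="
            sub(/[ \t]*-->.*/, "", val)             # drop a trailing "--> note"
            gsub(/^[ \t]+|[ \t]+$/, "", val)        # trim surrounding whitespace
        }
        FILENAME == ARGV[1] { old[key] = val; next }    # first file: remember values
        !(key in old)       { print key " (added)"; next }
        old[key] != val     { print key }
    ' "$1" "$2"
}
```

Keys are printed in the order they appear in the newer file, so for the two listings in this post STATE would show up as added and MASTER, DEAD and REPLICA as changed, among others.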

On node 1, ologgerd now runs as the replica:

[root@az8500 ~]# ps -ef | grep ologgerd
root     4620  4422  0 12:16 pts/1    00:00:00 grep ologgerd
root     11122     1  0 05:01 ?        00:01:00 /u01/app/grid/11.2.0/bin/ologgerd -m az8501 -r -d /u01/app/grid/11.2.0/crf/db/az8500
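The kill -9 was acceptable on a dev box. A gentler way to bounce the daemons would be to stop and start the ora.crf resource with crsctl, the same tool used for the status check earlier. The sketch below only prints the commands unless DRY_RUN=0 is set; restart_crf is a made-up name, the default GRID_HOME is taken from the paths in this post and is an assumption for any other system, and the commands need to run as root. Whether the master role moves the same way after such a restart was not tested here.

```shell
#!/bin/sh
# Stop and then start the ora.crf resource on the local node.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
restart_crf() (
    crsctl="${GRID_HOME:-/u01/app/grid/11.2.0}/bin/crsctl"   # assumed Grid home
    for action in stop start; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$crsctl $action res ora.crf -init"
        else
            "$crsctl" "$action" res ora.crf -init || exit 1   # give up on first failure
        fi
    done
)
```

Running restart_crf with no settings shows the two crsctl commands for review before anything is executed.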

References –

http://martincarstenbach.wordpress.com/2010/10/19/first-contact-with-oracle-11-2-0-2-rac/

http://surachartopun.com/2009/12/oracle-cluster-health-monitoripdos.html

2 thoughts on “ologgerd daemon – 11gR2”

Yong Huang says:

July 23, 2020 at 8:37 AM

Can ologgerd be kept down permanently or deleted? Someone (https://www.rocworks.at/wordpress/?p=271) did that. The worry I have is that when you patch the GI and database next time, maybe the patching process will throw errors if it sees some component not running. That’s what happened to us when we kept mgmtdb down.

1. Anand says:
  
  July 23, 2020 at 9:41 AM
  
  Hi Yong,
  
  Thank you for visiting the blog. I do not work on Oracle databases anymore. But what is your actual issue? Why do you want to shut down the process? Which Oracle version is the database running?
  
  Regards,
  Anand
