12cR1 RAC Installation on OEL7

To build Oracle Clusterware Database at Home, I believe , RAC ATTACK is the best place to learn. Its is a free curriculum and platform for hands-on learning labs related to Oracle RAC. While reviewing the article, I thought to perform 12cR1 RAC installation on OEL 7.2.

Attached is the document :- 12c_RAC_on_OEL7

The attached article is inspired by

RAC ATTACK :- https://en.wikibooks.org/wiki/RAC_Attack_-_Oracle_Cluster_Database_at_Home/RAC_Attack_12c

Tim Hall’s article :- https://oracle-base.com/articles/12c/oracle-db-12cr1-rac-installation-on-oracle-linux-7-using-virtualbox 

Deploying Oracle RAC Database 12c on RHEL 7 – Best Practices :- https://www.redhat.com/en/resources/deploying-oracle-rac-database-12c-rhel-7-best-practices

Big Thank you to RAC Attack members!!!

I hope the document helps some of you. Please feel free to comment.

Its all about learning 🙂

Advertisements

DB startup/shutdown error after downgrade from 11gR2 to 10gR2

I was asked to work on a 2-node RAC db which was downgraded to 10.2.0.5 from 11.2.0.3. While working, i had stopped the database using “shutdown immediate” command from sqlplus instead of srvctl. While start the db using srvctl –

wdlab1: (matrix1) /u01/app/oracle/product/rdbms/10205> srvctl  start database -d matrix_lab -o open
PRKR-1001 : cluster database matrix_lab does not exist
PRKO-2005 : Application error: Failure in getting Cluster Database Configuration for: matrix_lab
wdlab1: (matrix1) /u01/app/oracle/product/rdbms/10205/bin> which srvctl
/u01/app/oracle/product/rdbms/10205/bin/srvctl

Hmmm…lets try starting it using srvctl from 11gR2 grid home

wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./srvctl start database -d matrix_lab -o open
PRCR-1079 : Failed to start resource ora.matrix_lab.db
CRS-5017: The resource action "ora.matrix_lab.db start" encountered the following error:
ORA-02095: specified initialization parameter cannot be modified
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/11.2.0/log/wdlab1/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.matrix_lab.db' on 'wdlab1' failed
CRS-5017: The resource action "ora.matrix_lab.db start" encountered the following error:
ORA-02095: specified initialization parameter cannot be modified
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/11.2.0/log/wdlab2/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2632: There are no more servers to try to place resource 'ora.matrix_lab.db' on that would satisfy its placement policy
CRS-2674: Start of 'ora.matrix_lab.db' on 'wdlab2' failed

Hmmm…lets try to see what is the issue

wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./crsctl stat res -t
..........
ora.matrix_lab.db
      1        ONLINE  OFFLINE                               Instance Shutdown
      2        ONLINE  OFFLINE                               Instance Shutdown
..........

wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./crsctl stat res ora.matrix_lab.db -p
....................
GEN_USR_ORA_INST_NAME@SERVERNAME(wdlab1)=matrix1
GEN_USR_ORA_INST_NAME@SERVERNAME(wdlab2)=matrix2
HOSTING_MEMBERS=
INSTANCE_FAILOVER=0
....................
USR_ORA_INST_NAME=
USR_ORA_INST_NAME@SERVERNAME(wdlab1)=matrix1
USR_ORA_INST_NAME@SERVERNAME(wdlab2)=matrix2
USR_ORA_OPEN_MODE=open
....................

For a pre-11gR2 database registered to the 11gR2 grid should not have entry like “GEN_USR_ORA_INST_NAME@SERVERNAME” and also should have “ora.dbname.instancename.inst” in the crsctl stat res -t output.That was enough of the clue that, though the db was downgraded seemed like it wasn’t removed and added back using the 10gR2 OH srvctl.

wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./srvctl remove instance -d matrix_lab -i matrix1 -f
wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./srvctl remove instance -d matrix_lab -i matrix2 -f
wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./srvctl remove database -d matrix_lab -f
wdlab1: (matrix1) /u01/app/grid/11.2.0/bin>
wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./crsctl stat res -t

crsctl stat res -t didn;t show any entry for it. Add the instance and db info using srvctl from 10gR2 home

wdlab1: (matrix1) /u01/app/oracle/product/rdbms/10205/bin> which srvctl
/u01/app/oracle/product/rdbms/10205/bin/srvctl
wdlab1: (matrix1) /u01/app/oracle/product/rdbms/10205/bin> srvctl add database -d matrix_lab -o /u01/app/oracle/product/rdbms/10205
wdlab1: (matrix1) /u01/app/oracle/product/rdbms/10205/bin> srvctl add instance -d matrix_lab -i matrix1 -n wdlab1
wdlab1: (matrix1) /u01/app/oracle/product/rdbms/10205/bin> srvctl add instance -d matrix_lab -i matrix2 -n wdlab2
wdlab1: (matrix1) /home/oracle> srvctl modify database -d matrix_lab -p '/u01/admin/matrix/spfile/spfilematrix.ora' -s open

crsctl stat res -t output showed –

wdlab1: (matrix1) /u01/app/grid/11.2.0/bin> ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
......................
......................
ora.matrix_lab.db
      1        OFFLINE OFFLINE
ora.matrix_lab.matrix1.inst
      1        OFFLINE OFFLINE
ora.matrix_lab.matrix2.inst
      1        OFFLINE OFFLINE
.....................

Extract keeps ABENDING – No valid archive log dest for Thread N – OGG

Recently we refreshed one of the 2-node UAT database from 3-node production.The 2-node UAT acts as the source for the GoldenGate.After the refresh of source and the initial load on the target, we started the extract and replicat process.As, the source was 2-node RAC, extract was added with 2 Threads information

ADD EXTRACT ext, TRANLOG, THREADS 2, BEGIN NOW

Everything ran well for almost 2 weeks when all of a sudden extract kept abending and GG error log started showing

2011-11-14 03:44:47  INFO    OGG-01629  Oracle GoldenGate Capture for Oracle, ext.prm:  BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED: <>.
2011-11-14 03:44:51  WARNING OGG-01423  Oracle GoldenGate Capture for Oracle, ext.prm:  No valid default archive log destination directory found for thread 3.

Though the source was 2-node RAC the alert showed – No valid default archive log destination directory found for thread 3.

On looking at the source db more carefully showed thread 3 in v$thread.Someone had mistakenly added redo log groups with thread 3 and they were in UNUSED state.Once we “dropped” the LOG GROUPs for Thread 3 and restarted the Extract, everything is working fine.

Now,suppose you have source with 3 threads and you mistakenly added extract with 2 threads, you can get the same error in future, then –

1. Take the showch ouput of the existing extract
2. Consider the earliest Recovery checkpoint timestamp to begin the extract
3. ggsci> delete ext
4. ggsci> add extract ext TRANLOG, THREADS 3, BEGIN {Earliest Recovery Checkpiont timestamp taken in step2}
5. ggsci> add exttrail {path}, extract ext, megabytes XXX
6. ggsci> start ext

Incase, if you donot wish to capture the transactions from node N, edit the extract parameter file –

THREADOPTIONS PROCESSTHREADS EXCEPT N –> N is the thread number.

OHASD doesn’t start – 11gR2

Few weeks back had an issue where 2nd node of 4-node RAC got evicted and the alert log showed the below error before the instance was evicted –

Errors in file /u04/oraout/matrix/diag/rdbms/matrix_adc/matrix2/trace/matrix2_ora_8418.trc  (incident=16804):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 169.254.*.* not found. Check output from ifconfig command
Sat Oct 22 23:54:41 2011

ORA-29740: evicted by instance number 2, group incarnation 24
LMON (ospid: 29328): terminating the instance due to error 29740
Sun Oct 23 00:00:01 2011
Instance terminated by LMON, pid = 29328

We tried starting the instance with srvctl and manually using startup command, but both failed.During the startup the interesting thing i noticed was

Private Interface 'bond2' configured from GPnP for use as a private interconnect.
  [name='bond2', type=1, ip=144.xx.xx.xxx, mac=xx-xx-xx-xx-xx-xx, net=144.20.xxx.xxx/xx, mask=255.255.x.x, use=cluster_interconnect/6]

But in normal cases it should have been like

Private Interface 'bond2:1' configured from GPnP for use as a private interconnect.
  [name='bond2:1', type=1, ip=169.254.*.*, mac=xx-xx-xx-xx-xx-xx, net=169.254.x.x/xx, mask=255.255.x.x, use=haip:cluster_interconnect/62]

Now, the question comes up what is “haip”. HAIP is High Availability IP,

Grid automatically picks free link local addresses from reserved 169.254.*.* subnet for HAIP. According to RFC-3927, link local subnet 169.254.*.* should not be used for any other purpose. With HAIP, by default, interconnect traffic will be load balanced across all active interconnect interfaces, and corresponding HAIP address will be failed over transparently to other adapters if one fails or becomes non-communicative. .

The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster . If there’s only one active private network, Grid will create one.Grid Infrastructure can activate a maximum of four private network adapters at a time even if more are defined.

Few commands to check –

$oifcfg iflist -p -n

$crsctl stat res -t -init  --> ora.cluster_interconnect.haip must be ONLINE

$ oifcfg getif

select inst_id,name,ip_address from gv$cluster_interconnects;

We got network team involved, but as per them everything was well on network side, so we finally decided to go for server rebooted, after which OHAS deamon wasn’t coming up automatically, though

$ cat crsstart
enable

TEST:oracle> (matrix2:11.2.0.2_matrix) /etc/oracle/scls_scr/test/root
$ cat ohasdstr
enable

No logs in $GRID_HOME/log/test/ were getting updated, so it was little difficult to diagnose it.As ohasd.bin is responsible to start up all other cluserware processes directly or indirectly, it needs to start up properly for the rest of the stack to come up, which wasn’t happening.

One of the reasons for ohasd not coming up is, if any rc Snncommand script is stuck at OS level

 root      2744     1  0 02:20 ?        00:00:00 /bin/bash /etc/rc.d/rc 3
 root      4888  2744  0 02:30 ?        00:00:00 /bin/sh /etc/rc3.d/S98gcstartup start

This S98gcsstartup was stuck.Checked the script which showed related to OMS startup. Renamed the file and got server rebooted, OHASD and all other resources came up successfully.

$ ls -lrt /etc/rc3.d/old_S98gcstartup
lrwxrwxrwx 1 root root 27 Jun  1 07:09 /etc/rc3.d/old_S98gcstartup -> /etc//rc.d/init.d/gcstartup

There are few other reasons too like ,inaccessible/corrupted OLR , CRS autostart disabled etc.

But still i was unable to find why we got “additional information: requested interface 169.254.*.* not found” all of a sudden when things were running fine.

ORA-01567: dropping log N would leave less than 2 log files for instance UNNAMED_INSTANCE_N (thread N)

Recently we had a database refresh where a standalone db got refreshed from the 4-node RAC. After the completion of the refresh, saw 10 redo log groups.As it was a single instance database and wouldn’t have much workload, thought of dropping 6 groups.


SQL> select GROUP# from v$log;

    GROUP#
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

SQL> alter database drop logfile group 10;
alter database drop logfile group 10
*
ERROR at line 1:
ORA-01567: dropping log 10 would leave less than 2 log files for instance UNNAMED_INSTANCE_4 (thread 4)
ORA-00312: online log 10 thread 4: '/u05/flashback/matrix/MATRIX/onlinelog/o1_mf_10_79t3s884_.log'

As it was single instance database, was little astonished seeing “UNNAMED_INSTANCE_4 (thread 4)”.So started checking for some more clue

SQL> show parameter thread

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
parallel_threads_per_cpu             integer                           2
thread                               integer                           0

SQL> show parameter cluster

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
cluster_database                     boolean                           FALSE
cluster_database_instances           integer                           1

Parameter settings seems fine.Lets check what v$thread has to say –


SQL> select thread#,status from v$thread;

   THREAD# STATUS
---------- ------------------
         1 OPEN
         2 CLOSED
         3 CLOSED
         4 CLOSED

This is where the problem is. So, disabled the thread which in turn removed the row from v$thread


SQL> alter database disable thread 4;

Database altered.

SQL> alter database drop logfile group 10;

Database altered.

SQL>  alter database drop logfile group 9;

Database altered.

Similarly, disabled thread 2 and 3 and dropped the redo log group.


SQL> select thread#,status from v$thread;

   THREAD# STATUS
---------- ------------------
         1 OPEN