Enabling Flashback – Be Cautious

Enabling flashback on 11.2.0.4 database can take minutes and cause multiple database wait events degrading database performance.

On a quite busy system, during the low peak hours, I tried to enable flashback on 11.2.0.4.6 version database and multiple sessions waited on

1. buffer exterminate
2. enq: SQ – contention
3. latch: cbc contention
4. SGA: allocation forcing component growth

PRAKANAN_DBA@:1> show parameter sga_target
 
NAME                           TYPE        VALUE
------------------------------ ----------- ---------------
sga_target                     big integer 0

23:35:40 PRAKANAN_DBA@:1> show parameter db_cache_size
NAME                           TYPE        VALUE
------------------------------ ----------- ------------
db_cache_size                  big integer 39G

23:35:53 PRAKANAN_DBA@:1> show parameter shared_pool
 
NAME                           TYPE        VALUE
------------------------------ ----------- -------------
shared_pool_reserved_size      big integer 128M
shared_pool_size               big integer 4G

23:36:21 PRAKANAN_DBA@:1> @free_memory_in_shared_pool.sql
 
POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  free memory                 283039728

23:38:48 PRAKANAN_DBA@:1> @shared_pool_detail.sql
 
POOL         NAME                               MB
------------ -------------------------- ----------
shared pool  KGLH0                      785.823174
shared pool  SQLA                        592.82975
shared pool  free memory                269.572815
shared pool  db_block_hash_buckets      268.503906
shared pool  SQLP                       192.214272
shared pool  KKSSP                       191.37767
shared pool  FileOpenBlock              156.083519
shared pool  event statistics per sess  142.083984
shared pool  ksunfy : SSO free list     135.169922
shared pool  KGLHD                      132.602089
shared pool  Checkpoint queue              128.125
shared pool  dirty object counts array         128
shared pool  dbktb: trace buffer          117.1875
shared pool  ASH buffers                       100
shared pool  parameter table block      83.7121429
shared pool  KGLDA                      36.5461655
shared pool  object queue hash buckets          36
shared pool  procs: ksunfy              30.6129456
shared pool  transaction                30.5776443
shared pool  enqueue                    27.7831573

SQL> alter database flashback on;
 
Database altered.

23:41:28 PRAKANAN_DBA@:1> /
 
USERNAME                      SID    SERIAL# SPID       EVENT                          LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE               SQL_ID        SQL_CHILD_NUMBER PLAN_HASH_VALUE MODULE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- ------------------- ------------- ---------------- --------------- -------------------------
APPS_USER_2                  3458       9875 15518      SQL*Net message from client               0          0               0 WAITING             b4pav99kxux32                0      3764792327 SessionV2
PRAKANAN_DBA                 9290      35933 21649      SQL*Net message to client                 0         -1               0 WAITED SHORT TIME   8q6awbsup6jm5                0      1761607409 SQL*Plus
APPS_USER_2                  1963      13699 28348      buffer exterminate                        1          0               0 WAITING             2zrudchusrcwa                3               0 JDBC Thin Client
SQL_240                      7301      36487 28346      buffer exterminate                        1          0               0 WAITING             2zrudchusrcwa                3               0
APPS_USER_2                  7467      12721 28352      buffer exterminate                        1          0               0 WAITING             2zrudchusrcwa                3               0 JDBC Thin Client
APPS_USER_2                  9038      42107 28351      buffer exterminate                        1          0               0 WAITING             2zrudchusrcwa                3               0 JDBC Thin Client
APPS_USER_2                 11153      29227 28354      buffer exterminate                        1          0               0 WAITING             2zrudchusrcwa                3               0 JDBC Thin Client
APPS_USER_2                   438      52879 28356      buffer exterminate                        1          0               0 WAITING             2zrudchusrcwa                3               0 JDBC Thin Client
APPS_USER_2                  1449      29585 3901       read by other session                     0          0               0 WAITING             4n4u2qqbfcmhu                1      1270994832 ResourceMonitor
APPS_USER_2                  5405      26175 6055       latch: cache buffers chains               0         -1               0 WAITED SHORT TIME   4n4u2qqbfcmhu                1      1270994832 ResourceMonitor
APPS_USER_2                  7807       9423 6057       db file sequential read                   0          0               0 WAITING             2ksm378ta54rw                2       231135236 ResourceMonitor
APPS_USER_2                  2319      34579 12665      db file sequential read                   0          0               0 WAITING             4n4u2qqbfcmhu                1      1270994832 ResourceMonitor
APPS_USER_2                   570      11479 5193       read by other session                     0          0               0 WAITING             4n4u2qqbfcmhu                1      1270994832 ResourceMonitor
SYS                          7106      60689 14609      SGA: allocation forcing compon           11          0               0 WAITING             1pbrwgrgmfu40                0               0 sqlplus@
                                                        ent growth                                                                                                                                
APPS_USER_2                   196      22161 5270       read by other session                     0          0               0 WAITING             4n4u2qqbfcmhu                1      1270994832 ResourceMonitor
 
15 rows selected.
 
Elapsed: 00:00:00.45
23:41:29 PRAKANAN_DBA@:1> /
 
USERNAME                      SID    SERIAL# SPID       EVENT                          LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE               SQL_ID        SQL_CHILD_NUMBER PLAN_HASH_VALUE MODULE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- ------------------- ------------- ---------------- --------------- -------------------------
APPS_USER_2                   899      26003 12761      enq: TX - index contention                0          0               0 WAITING             7mvrrf2cr8w26                1      3334570644 ResourceMonitor
PRAKANAN_DBA                 9290      35933 21649      SQL*Net message to client                 0         -1               0 WAITED SHORT TIME   8q6awbsup6jm5                0      1761607409 SQL*Plus
SYS                          7106      60689 14609      SGA: allocation forcing compon           12          0               1 WAITING             1pbrwgrgmfu40                0               0 sqlplus@
                                                        ent growth 
 
Elapsed: 00:00:00.44
23:41:30 PRAKANAN_DBA@:1> /
 
USERNAME                      SID    SERIAL# SPID       EVENT                          LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE               SQL_ID        SQL_CHILD_NUMBER PLAN_HASH_VALUE MODULE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- ------------------- ------------- ---------------- --------------- -------------------------
APPS_USER_2                    45       5745 18731      db file sequential read                   0         -1               0 WAITED SHORT TIME   7mvrrf2cr8w26                0      3334570644 ResourceMonitor
PRAKANAN_DBA                 9290      35933 21649      SQL*Net message to client                 0         -1               0 WAITED SHORT TIME   8q6awbsup6jm5                0      1761607409 SQL*Plus
APPS_USER_2                 10446      23443 14617      SQL*Net message from client               0          5               0 WAITED KNOWN TIME   8tk54xvssnx13                0      3718310410 SessionV2
SYS                          7106      60689 14609      SGA: allocation forcing compon           13          0               2 WAITING             1pbrwgrgmfu40                0               0 sqlplus@
                                                        ent growth

23:42:17 PRAKANAN_DBA@:1> show parameter shared_pool
 
NAME                           TYPE        VALUE
------------------------------ ----------- ----------------------
shared_pool_reserved_size      big integer 128M
shared_pool_size               big integer 4224M

23:42:48 PRAKANAN_DBA@:1> @free_memory_in_shared_pool.sql
 
POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  free memory                2147133056

PRAKANAN_DBA@:1> @hidden_param
Enter value for parameter: _memory_imm_mode_without_autosga
Enter value for 1: _memory_imm_mode_without_autosga
old  11: AND    a.ksppinm like DECODE(LOWER('%&parameter%'), 'all', a.ksppinm, LOWER('%&1%'))
new  11: AND    a.ksppinm like DECODE(LOWER('%_memory_imm_mode_without_autosga%'), 'all', a.ksppinm, LOWER('%_memory_imm_mode_without_autosga%'))
 
PARAMETER                             DESCRIPTION                    SESSION_VA INSTANCE_V
------------------------------------- ------------------------------ ---------- ----------
_memory_imm_mode_without_autosga      Allow immediate mode without   TRUE       TRUE
                                      sga/memory target

23:43:18 PRAKANAN_DBA@:1> select start_time,COMPONENT, PARAMETER, INITIAL_SIZE/1024/1024 "INITIAL_SIZE",FINAL_SIZE/1024/1024 "FINAL_SIZE", STATUS from v$SGA_RESIZE_OPS where start_time > sysdate -1 order by 1;
 
START_TIME          COMPONENT                                                        PARAMETER                                                                        INITIAL_SIZE FINAL_SIZE STATUS
------------------- ---------------------------------------------------------------- -------------------------------------------------------------------------------- ------------ ---------- ---------
2016.11.01 23:41:21 shared pool                                                      shared_pool_size                                                                         4096       4096 ERROR
2016.11.01 23:41:21 DEFAULT buffer cache                                             db_cache_size                                                                           39936      39936 ERROR
2016.11.01 23:41:31 DEFAULT buffer cache                                             db_cache_size                                                                           39936      39808 COMPLETE
2016.11.01 23:41:31 shared pool                                                      shared_pool_size                                                                         4096       4224 COMPLETE
 
23:43:33 PRAKANAN_DBA@:1> @shared_pool_resize_ops.sql
 
END                     OPER_TYPE     INITIAL_SIZE TARGET_SIZE FINAL_SIZE
----------------------- ------------- ------------ ----------- ----------
01-Nov-2016 23:41       GROW            4294967296  4429185024 4294967296
01-Nov-2016 23:41       GROW            4294967296  4429185024 4429185024
29-Nov-2015 09:01       STATIC                   0  4294967296 4294967296

23:43:56 PRAKANAN_DBA@:1> @shared_pool_detail.sql
 
POOL         NAME                               MB
------------ -------------------------- ----------
shared pool  free memory                2044.86245
shared pool  db_block_hash_buckets      268.503906
shared pool  KKSSP                      191.963104
shared pool  FileOpenBlock              156.083519
shared pool  event statistics per sess  142.083984
shared pool  ksunfy : SSO free list     135.169922
shared pool  Checkpoint queue              128.125
shared pool  dirty object counts array         128
shared pool  flashback generation buff  124.156067
shared pool  dbktb: trace buffer          117.1875
shared pool  ASH buffers                       100
shared pool  parameter table block      83.8214722
shared pool  object queue hash buckets          36
shared pool  procs: ksunfy              30.6129456
shared pool  transaction                30.5776443
shared pool  enqueue                    27.7831573
shared pool  buffer handles             25.1770096
shared pool  state objects              22.5323486
shared pool  write state object         19.9843826
shared pool  KGLH0                      18.8174286

Alert log :-

alter database flashback on
Starting background process RVWR
Tue Nov 01 23:40:08 2016
RVWR started with pid=6159, OS id=1030
Tue Nov 01 23:40:29 2016
Errors in file /dumps-01/diag/rdbms/db1/db1/trace/db1_ora_29214.trc:
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K

Trace file shows --

Scanned into hot portion of LRU
Deadlock because all buffers pinned.
In set=527, bsz=8192, wsi=0, poolid=3, pool_name=DEFAULT
Buffers originally scanned and found pinned = [189 0], evict=189, aged=0
Buffers rescanned on auxiliary list = 0
Buffers rescanned on main list = 140
Buffers in working set 189 (0), hbufs=50, hbmax=50
Buffers on repl list main=189, aux=0
Scanned into hot portion of LRU
Deadlock because all buffers pinned.
In set=528, bsz=8192, wsi=0, poolid=3, pool_name=DEFAULT
Buffers originally scanned and found pinned = [185 1], evict=184, aged=0
Buffers rescanned on auxiliary list = 0
Buffers rescanned on main list = 136
Buffers in working set 185 (0), hbufs=50, hbmax=50
Buffers on repl list main=185, aux=0

*** 2016-11-01 23:40:29.473
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
ORA-00379: no free buffers available in buffer pool DEFAULT for block size 8K
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

From Metalink Doc id 2195173.1

The normal behavior is that the flashback buffer is only allocated at mount time if flashback is already enabled, otherwise the flashback buffer is allocated later when the Database Administrator decides to enable flashback (either explicitly, or by creating the first restore point) but in the latter case, there is the possibility that the database performance may be impacted as the instance searches for the required amount of contiguous memory in shared pool to allocate to the flashback buffer.

The issue is related to

Bug 24838599 Turning on Flashback Database May Take Minutes to Complete and may cause errors ORA-4031 ORA-379″ Doc ID 24838599.8

Patch 24838599: TURNING ON FLASHBACK DATABASE TAKES 15 MINUTES TO COMPLETE

Post Install Instructions for Bug 24838599 – Turning on Flashback Database May Take Minutes to Complete (Doc ID 2195173.1)

Hope this helps!!

Advertisements

Oracle ASMLib

This blog post is more of a note for myself on configuring ASMLib.

ASMLib is an optional utility that can be used on Linux systems to manage Oracle ASM devices.

ASMLib consists of the following components:

An open source (GPL) kernel module package: kmod-oracleasm
An open source (GPL) utilities package: oracleasm-support
A closed source (proprietary) library package: oracleasmlib

On my Oracle VMBox, I performed the below steps –

1. Check the installed packages related to oracleasm

[root@oel67-12c dev]# rpm -qa | grep oracleasm
[root@oel67-12c dev]#

2. As no rpm were installed, I started installing the packages

[root@oel67-12c dev]# yum install oracleasm-support
Loaded plugins: refresh-packagekit, security, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package oracleasm-support.x86_64 0:2.1.8-1.el6 will be installed
--> Finished Dependency Resolution
..........................
Installed:
  oracleasm-support.x86_64 0:2.1.8-1.el6

Complete!
[root@oel67-12c dev]#
[root@oel67-12c dev]# yum install kmod-oracleasm
Loaded plugins: refresh-packagekit, security, ulninfo
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package kmod-oracleasm.x86_64 0:2.0.8-13.el6_8 will be installed
--> Processing Dependency: kernel >= 2.6.32-642.el6 for package: kmod-oracleasm-2.0.8-13.el6_8.x86_64

...........................
Is this ok [y/N]: y
Downloading Packages:
(1/2): kernel-2.6.32-642.el6.x86_64.rpm
(2/2): kmod-oracleasm-2.0.8-13.el6_8.x86_64.rpm   

...........................

Installed:
  kmod-oracleasm.x86_64 0:2.0.8-13.el6_8

Dependency Installed:
  kernel.x86_64 0:2.6.32-642.el6

Complete!
[root@oel67-12c dev]#

3. Download the oracleasmlib rpm from http://www.oracle.com/technetwork/server-storage/linux/asmlib/rhel6-1940776.html and install it. Without this RPM, you will not have the oracleasm-discover executable


[root@oel67-12c dev]# ls -lrt /media/sf_12cR1/oracleasmlib*
-rwxrwx--- 1 root vboxsf    13336 May 18 21:44 oracleasmlib-2.0.12-1.el6.x86_64.rpm

[root@oel67-12c sf_12cR1]# rpm -ihv oracleasmlib-2.0.12-1.el6.x86_64.rpm
Preparing...                ########################################### [100%]
   1:oracleasmlib           ########################################### [100%]
[root@oel67-12c sf_12cR1]#

4. Verify all the components are installed

[root@oel67-12c sf_12cR1]# rpm -qa | grep oracleasm
oracleasm-support-2.1.8-1.el6.x86_64
oracleasmlib-2.0.12-1.el6.x86_64
kmod-oracleasm-2.0.8-13.el6_8.x86_64
[root@oel67-12c sf_12cR1]# cd 
[root@oel67-12c ~]# which oracleasm
/usr/sbin/oracleasm
[root@oel67-12c ~]#

5. Configure oracleasm — will configure the on-boot properties of the Oracle ASM library driver as well as set the appropriate file permissions.

[root@oel67-12c dev]# oracleasm configure -i
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver.  The following questions will determine whether the driver is
loaded on boot and what permissions it will have.  The current values
will be shown in brackets ('[]').  Hitting  without typing an
answer will keep that current value.  Ctrl-C will abort.

Default user to own the driver interface []: oracle
Default group to own the driver interface []: oinstall
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done
[root@oel67-12c dev]#

The configuration setting done above, can be checked in /etc/sysconfig/oracleasm

[root@oel67-12c dev]# ls -lrt /etc/sysconfig/oracleasm
lrwxrwxrwx 1 root root 24 May 18 22:02 /etc/sysconfig/oracleasm -> oracleasm-_dev_oracleasm

6. Check the status

[root@oel67-12c ~]# oracleasm status
Checking if ASM is loaded: no
Checking if /dev/oracleasm is mounted: no
[root@oel67-12c ~]#

7. Load oracleasm module with the init command

[root@oel67-12c ~]# oracleasm init
Creating /dev/oracleasm mount point: /dev/oracleasm
Loading module "oracleasm": oracleasm
Configuring "oracleasm" to use device physical block size
Mounting ASMlib driver filesystem: /dev/oracleasm
[root@oel67-12c ~]#

8. Verify the oracleasm configuration

a. Perform ‘df –ha’, this should show you that /dev/oracleasmfs is mounted.


[root@oel67-12c ~]# df -ha
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_oel6712c-lv_root
                       45G  7.4G   36G  18% /
....................
oracleasmfs              0     0     0    - /dev/oracleasm

b. Execute the ‘lsmod’ command, to show the loaded oracleasm module.

[root@oel67-12c ~]# lsmod | grep oracleasm
Module                  Size  Used by
oracleasm              53591  1

9. Create the partition on disk , if not already done

[root@oel67-12c oracleasm]# lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   50G  0 disk
├─sda1                           8:1    0  500M  0 part /boot
└─sda2                           8:2    0 49.5G  0 part
  ├─vg_oel6712c-lv_root (dm-0) 252:0    0 45.6G  0 lvm  /
  └─vg_oel6712c-lv_swap (dm-1) 252:1    0  3.9G  0 lvm  [SWAP]
sdb                              8:16   0   10G  0 disk
sdc                              8:32   0   12G  0 disk
sdd                              8:48   0   12G  0 disk
sde                              8:64   0   10G  0 disk
sdf                              8:80   0   12G  0 disk
sr0                             11:0    1 55.5M  0 rom  /media/VBOXADDITIONS_5.0.20_106931


[root@oel67-12c disks]# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x50e6a869.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4):
Value out of range.
Partition number (1-4): 1
First cylinder (1-1305, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1305, default 1305):
Using default value 1305

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
[root@oel67-12c disks]#
[root@oel67-12c disks]#
[root@oel67-12c disks]#
[root@oel67-12c disks]# lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   50G  0 disk
├─sda1                           8:1    0  500M  0 part /boot
└─sda2                           8:2    0 49.5G  0 part
  ├─vg_oel6712c-lv_root (dm-0) 252:0    0 45.6G  0 lvm  /
  └─vg_oel6712c-lv_swap (dm-1) 252:1    0  3.9G  0 lvm  [SWAP]
sdb                              8:16   0   10G  0 disk
└─sdb1                           8:17   0   10G  0 part
sdc                              8:32   0   12G  0 disk
sdd                              8:48   0   12G  0 disk
sde                              8:64   0   10G  0 disk
sdf                              8:80   0   12G  0 disk
sr0                             11:0    1 55.5M  0 rom /media/VBOXADDITIONS_5.0.20_106931

Perform the same above steps for remaining disk.

10. Once the disk are created, label the disks as ASM disks and give then an ASM name.


[root@oel67-12c disks]# oracleasm createdisk data0101 /dev/sdb1
Writing disk header: done
Instantiating disk: done
[root@oel67-12c disks]# oracleasm createdisk data0102 /dev/sde1
Writing disk header: done
Instantiating disk: done
[root@oel67-12c disks]# ls -lrt
total 0
brw-rw---- 1 oracle oinstall 8, 17 May 18 22:49 DATA0101
brw-rw---- 1 oracle oinstall 8, 65 May 18 22:50 DATA0102

11. Check the configured disk using listdisks

[root@oel67-12c disks]# oracleasm listdisks
DATA0101

12. You can query the disk to check if its a valid ASM Disk

[root@oel67-12c disks]# oracleasm querydisk -d DATA0101
Disk "DATA0101" is a valid ASM disk on device [8,17]

[root@oel67-12c disks]# more /proc/partitions
major minor  #blocks  name
   8    17   10482381 sdb1

[root@oel67-12c sbin]# blkid|grep asm
/dev/sdb1: LABEL="DATA0101" TYPE="oracleasm"
/dev/sde1: LABEL="DATA0102" TYPE="oracleasm"
[root@oel67-12c sbin]#

The [8,17] is the major,minor number of the device with this partition. The value comes from /proc/partitions

major — The major number of the device with this partition. The major number in the /proc/partitions, (3), corresponds with the block device ide0, in /proc/devices.

minor — The minor number of the device with this partition. This serves to separate the partitions into different physical devices and relates to the number at the end of the name of the partition.

#blocks — Lists the number of physical disk blocks contained in a particular partition.

name — The name of the partition.

13. oracleasm-discover


[root@oel67-12c sbin]# oracleasm-discover
-bash: /usr/sbin/oracleasm-discover: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

For the above error install/lib/ld-linux.so.2

[root@oel67-12c ~]# yum install /lib/ld-linux.so.2
Loaded plugins: refresh-packagekit, security, ulninfo
Setting up Install Process
public_ol6_latest/filelists                                                                                                                                                             |  61 MB     00:43
Resolving Dependencies
--> Running transaction check
---> Package glibc.i686 0:2.12-1.192.el6 will be installed
Dependencies Resolved

..................
Complete!
[root@oel67-12c ~]#

[root@oel67-12c ~]# ls -lrt /lib/ld-linux.so.2
lrwxrwxrwx 1 root root 10 May 18 23:25 /lib/ld-linux.so.2 -> ld-2.12.so
[root@oel67-12c ~]#
[root@oel67-12c ~]# oracleasm-discover
Using ASMLib from /opt/oracle/extapi/32/asm/orcl/1/libasm.so
[ASM Library - Generic Linux, version 2.0.4 (KABI_V2)]
Discovered disk: ORCL:DATA0101 [20964762 blocks (10733958144 bytes), maxio 512]
Discovered disk: ORCL:DATA0102 [20964762 blocks (10733958144 bytes), maxio 512]
[root@oel67-12c ~]#

latch: cache buffers chains and rollback

I had an interesting encounter with latch: cbc contention early this week. During my oncall I received page for Load of 206.81 exceeded threshold of 150. After I logged into server , the server load average was continously increasing and all the top PIDs were of oracle processes. After logging into database, I saw multiple sessions waiting on latch: cache buffer chains wait event

load average: 258.52, 244.27, 226.15

select username,sql_id,event,count(*) from v$session where event like 'latch%' group by username,sql_id,event order by 4;
 
USERNAME                  SQL_ID        EVENT                            COUNT(*)
------------------------- ------------- ------------------------------ ----------
USER_EAS4                5j9xrj07xqysz latch free                              1
USER_EAS7                4f6dfhuxbz18y latch free                              1
USER_EAS7                a1wwm6tpukyh8 latch free                              1
USER_EAS7                c1sa78t1rvxkq latch free                              1
USER_EAS7                gc4xfk544n98v latch free                              2
USER_EAS7                bmhmdqvt9c2qz latch free                              2
USER_EAS7                4ycfn45y3vj5p latch free                              8
USER_EAS7                4ycfn45y3vj5p latch: cache buffers chains            23
USER_EAS7                9cg18w6mfqphn latch: cache buffers chains            51
USER_EAS7                9cg18w6mfqphn latch free                            144

sql monitor report of one session for sqlid 9cg18w6mfqphn

Global Stats
=======================================================
| Elapsed |   Cpu   | Concurrency |  Other   | Buffer |
| Time(s) | Time(s) |  Waits(s)   | Waits(s) |  Gets  |
=======================================================
|    2243 |    2132 |        0.00 |      111 |   260M |
=======================================================
 
SQL Plan Monitoring Details (Plan Hash Value=2956480150)
=============================================================================================================================================================
| Id   |               Operation               |         Name         |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail |
|      |                                       |                      | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |
=============================================================================================================================================================
|    0 | SELECT STATEMENT                      |                      |         |      |           |        |     1 |          |          |                 |
|    1 |   FILTER                              |                      |         |      |           |        |     1 |          |          |                 |
|    2 |    PARTITION RANGE SINGLE             |                      |      18 |  133 |           |        |     1 |          |          |                 |
|    3 |     TABLE ACCESS BY LOCAL INDEX ROWID | MESSAGE_DEMO_CTC     |      18 |  133 |           |        |     1 |          |          |                 |
| -> 4 |      INDEX RANGE SCAN                 | MESSAGE_DEMO_CTC_ID6 |      18 |  133 |      2246 |     +1 |     1 |        0 |   100.00 | Cpu (2194)      |
=============================================================================================================================================================

AWR performance trend for the sqlid for past 5 days:-

@awr_sqlid_perf_trend 9cg18w6mfqphn 5 .25
 
TIME                 EXECUTIONS   ET_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC
------------------- ----------- ------------ ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- -----------------
23.09.2015 00:15:00          12      117.923          117.667           .001           .000           .000           .000                 .000      23049569.333              .083
23.09.2015 00:30:00          14     1284.493         1280.963           .002           .000           .000           .000                 .000     243817703.571             1.214
23.09.2015 00:45:00          26     1579.532         1574.306           .016           .000           .000           .000                 .000              .000             5.962
23.09.2015 01:00:00          41     1729.918         1718.699           .002           .000           .000           .000                 .000      26916357.415              .317
23.09.2015 01:15:00          26     3850.767         3739.324           .011           .000           .000           .000                 .000              .000             1.731
23.09.2015 01:30:00          27     4565.527         3868.424           .017           .000           .000           .000                 .000      82060590.481             4.407
 

At the first glance it seemed to me as the sql was newly introduced to the database as AWR showed it from the same day, even though I had asked from 5 days snapshots.

The buffer gets per execution for the sql was too high and with multiple sessions with the same sqlid caused latch cbc and in turn high load. So, had to find what was causing this high buffer gets.

The execution plan for sql looked fine. The stats were fine and the index picked by the optimizer was fine too.At this point came in “latchprofx” by Tanel Poder 🙂

SELECT * FROM (
        SELECT
            event
          , TRIM(TO_CHAR(p1, 'XXXXXXXXXXXXXXXX')) latch_addr
          , TRIM(ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1))||'%' PCT
          , COUNT(*)
        FROM
            v$active_session_history
        WHERE
           event = 'latch: cache buffers chains'
       AND session_state = 'WAITING'
       GROUP BY
           event
        , p1
       ORDER BY
           COUNT(*) DESC
   )
   WHERE ROWNUM <= 10
  /  

EVENT                                  LATCH_ADDR        PCT              COUNT(*)
-----------------------------------  ----------------- ---------------- ----------
latch: cache buffers chains           7B9A57DB50        92.2%                 448
latch: cache buffers chains           7AE4EBCD68        2.5%                   12
latch: cache buffers chains           7B682705B8        1%                      5
latch: cache buffers chains           7B6FB0AFD0        .8%                     4
latch: cache buffers chains           7ADA39D6B8        .4%                     2
latch: cache buffers chains           7AE4D206F8        .2%                     1
latch: cache buffers chains           7B4A1A04B0        .2%                     1
latch: cache buffers chains           7B475726C0        .2%                     1
latch: cache buffers chains           7ACB819888        .2%                     1
latch: cache buffers chains           7AC6958BA8        .2%                     1

@latchprofx sid,name,sqlid,object % 7B9A57DB50 100000
 
-- LatchProfX 1.21 by Tanel Poder ( http://www.tanelpoder.com )
 
  SID NAME                                SQLID                      OBJECT       Held       Gets  Held %     Held ms Avg hold ms
----- ----------------------------------- --------------- ----------------- ---------- ---------- ------- ----------- -----------
 7155 cache buffers chains                9cg18w6mfqphn            1ECA461B          4          4     .00       2.025        .506
 1696 cache buffers chains                4ycfn45y3vj5p            1ECA461B          3          3     .00       1.519        .506
 8689 cache buffers chains                9cg18w6mfqphn            1ECA461B          3          3     .00       1.519        .506
 3809 cache buffers chains                9cg18w6mfqphn            1ECA461B          3          3     .00       1.519        .506
 3969 cache buffers chains                9cg18w6mfqphn            1ECA461B          3          3     .00       1.519        .506
 2742 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 2482 cache buffers chains                4ycfn45y3vj5p            1ECA461B          2          2     .00       1.013        .506
  439 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 1514 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 6894 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 6524 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 6101 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 3545 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506

All rows point to object “1ECA461B”

@dba 1ECA461B
 
    RFILE#     BLOCK# DUMP_CMD
---------- ---------- ------------------------------------------------
       123     673307 -- alter system dump datafile 123 block 673307

select name from v$datafile where file#=123;
 
NAME
---------------------------------------------------------------
+DATA01/usrprd04/datafile/undotbs1.1134.818027637

Ok, so something related to undo!!!!.

The alert log showed ORA-01555 for an Insert SQL in the same table

Wed Sep 23 02:44:32 2015
ORA-01555 caused by SQL statement below (SQL ID: 72pwjk9bxyygw, Query Duration=13521 sec, SCN: 0x0891.8fff8587):
 
                        INSERT INTO message_demo_ctc (.....)  select ........
						
Wed Sep 23 02:45:26 2015
Completed checkpoint up to RBA [0x2a16.2.10], SCN: 9421305667060

User/Pid                   Megs Used MINUTES_ACTIVE MODULE                                             Recs upd/del
-------------------- --------------- -------------- -------------------------------------------------- ------------
USER_EAS7/(3858)             392.82            264 cli@mesagae_demo_ctc.php                              3,945,675

Once the rollback completed, eventhing was back to normal. The buffer gets for sqlid ‘9cg18w6mfqphn’ was down to 24,950 from ~652617k. We had waited for almost 30-40mins for the rollback, could have killed the SID and SMON would have made rollback using parallel processes, which could have been faster.

ORDIM – Oracle Multimedia – Invalid after upgrade

ORDIM – The Oracle Multimedia component might be in INVALID state after the database upgrade to 12c. While I was testing the upgrade on test server, received below errors after the catctl.pl completion.

Serial   Phase #:69 Files: 1 Use of uninitialized value $gsRTInclusion in concatenation (.) or string at catctl.pl line 1500.
    Time: 49s

------------------------------------------------------
Phases [0-73]         End Time:[2015_08_08 08:06:45]
------------------------------------------------------

Grand Total Time: 4848s

*** WARNING: ERRORS FOUND DURING UPGRADE ***

Due to errors found during the upgrade process, the post
upgrade actions in catuppst.sql have not been automatically run.

*** THEREFORE THE DATABASE UPGRADE IS NOT YET COMPLETE ***

 1. Evaluate the errors found in the upgrade logs
    and determine the proper action.
 2. Execute the post upgrade script as described in Chapter 3
    of the Database Upgrade Guide.


REASON:
catuppst.sql unable to run in Database: TESTSX Id: 0
        ERRORS FOUND: during upgrade CATCTL ERROR COUNT=7
------------------------------------------------------
Identifier ORDIM 15-08-08 07:56:26 Script = /oracle/app/product/12.1.0.2/rdbms/admin
ERROR = [ORA-06550: line 76, column 17: PL/SQL: ORA-00942: table or view does not exist
ORA-06550: line 76, column 5:
PL/SQL: SQL Statement ignored
]

Multiple insert and update sqls failed for tables in MDSYS schema.At this point, dba_registry showed

SQL> @dba_registry

COMP_ID  SCHEMA       STATUS       VERSION      COMP_NAME
-------- ------------ ------------ ------------ -----------------------------------
APS      SYS          REMOVED      11.1.0.7.0   OLAP Analytic Workspace
CATALOG  SYS          UPGRADED     12.1.0.2.0   Oracle Database Catalog Views
CATJAVA  SYS          VALID        12.1.0.2.0   Oracle Database Java Packages
CATPROC  SYS          INVALID      12.1.0.2.0   Oracle Database Packages and Types
CONTEXT  CTXSYS       VALID        12.1.0.2.0   Oracle Text
JAVAVM   SYS          VALID        12.1.0.2.0   JServer JAVA Virtual Machine
ORDIM    ORDSYS       INVALID      12.1.0.2.0   Oracle Multimedia
OWM      WMSYS        VALID        12.1.0.2.0   Oracle Workspace Manager
RAC      SYS          OPTION OFF   12.1.0.2.0   Oracle Real Application Clusters
SDO      MDSYS        REMOVED      11.1.0.7.0   Spatial
XDB      XDB          VALID        12.1.0.2.0   Oracle XML Database
XML      SYS          VALID        12.1.0.2.0   Oracle XDK
XOQ      SYS          REMOVED      11.1.0.7.0   Oracle OLAP API

Running UTLRP.sql would show below in the end :-

Locator INVALID OBJECTS: CS_SRS_TRIGGER – INVALID – TRIGGER
Locator INVALID OBJECTS: OGIS_CRS_DELETE_TRIGGER – INVALID – TRIGGER
Locator INVALID OBJECTS: OGIS_CRS_INSERT_TRIGGER – INVALID – TRIGGER

The reason for this is the missing VIEW “MDSYS”.”SDO_COORD_REF_SYSTEM”.

CREATE OR REPLACE FORCE NONEDITIONABLE VIEW MDSYS.SDO_COORD_REF_SYSTEM ("SRID", "COORD_REF_SYS_NAME",
"COORD_REF_SYS_KIND", "COORD_SYS_ID", "DATUM_ID", "GEOG_CRS_DATUM_ID", "SOURCE_GEOG_SRID", "PROJECTION_CONV_ID",
"CMPD_HORIZ_SRID","CMPD_VERT_SRID", "INFORMATION_SOURCE", "DATA_SOURCE", "IS_LEGACY", "LEGACY_CODE", "LEGACY_WKTEXT",
"LEGACY_CS_BOUNDS", "IS_VALID", "SUPPORTS_SDO_GEOMETRY") AS SELECT SRID,COORD_REF_SYS_NAME,COORD_REF_SYS_KIND,COORD_SYS_ID,
DATUM_ID,GEOG_CRS_DATUM_ID,SOURCE_GEOG_SRID,PROJECTION_CONV_ID,CMPD_HORIZ_SRID,CMPD_VERT_SRID,INFORMATION_SOURCE,DATA_SOURCE,
IS_LEGACY,LEGACY_CODE,LEGACY_WKTEXT,LEGACY_CS_BOUNDS,IS_VALID,SUPPORTS_SDO_GEOMETRY FROM MDSYS.SDO_COORD_REF_SYS;

After creating the view, recompile the triggers and run utlrp.

grant SELECT on MDSYS.SDO_COORD_REF_SYSTEM to PUBLIC;

ALTER TRIGGER MDSYS.CS_SRS_TRIGGER COMPILE;
show errors
ALTER TRIGGER MDSYS.OGIS_CRS_DELETE_TRIGGER COMPILE;
show errors
ALTER TRIGGER MDSYS.OGIS_CRS_INSERT_TRIGGER COMPILE;
show errors

On production, I created the view as part of pre-task and then upgraded the database.The only change in view ddl was I had to remove “NONEDITIONABLE” keyword to create it as compatible parameter was set to 11.2.0.3.

On Production

SQL> @find_obj
Enter value for object_name: SDO_COORD_REF_SYSTEM
old   1: select owner,object_name,object_id,object_type,status,created,last_ddl_time from dba_objects where object_name=upper('&object_name')
new   1: select owner,object_name,object_id,object_type,status,created,last_ddl_time from dba_objects where object_name=upper('SDO_COORD_REF_SYSTEM')

OWNER        OBJECT_NAME                OBJECT_ID OBJECT_TYPE        STATUS  CREATED                   LAST_DDL_TIME
------------ ------------------------- ---------- ------------------ ------- ------------------------- -------------------------
PUBLIC       SDO_COORD_REF_SYSTEM        39533538 SYNONYM            VALID   18-08-15 01:30:26         18-08-15 01:30:26
MDSYS        SDO_COORD_REF_SYSTEM        39500022 VIEW               VALID   17-08-15 18:28:03         18-08-15 01:30:26

ORA-00353: log corruption near block change time – 11.2.0.3

Recently I came across “ORA-00353: log corruption near block change time” error.

Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc  (incident=1490189):
ORA-00353: log corruption near block  change  time
ORA-00353: log corruption near block 4631374 change 263470473806052 time 04/16/2015 06:53:16
ORA-00312: online log 43 thread 3: '+REDO02/test/onlinelog/group_43.258.875922665'
ORA-00312: online log 43 thread 3: '+REDO01/test/onlinelog/group_43.258.875922633'
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490189/test_ms00_54284_i1490189.trc
Thu Apr 16 08:14:42 2015
Sweep [inc][1490188]: completed
Thu Apr 16 08:14:55 2015
LOGMINER: WARNING: error 353 encountered, failed to read corrupt logfile +REDO01/test/onlinelog/group_43.258.875922633

As per ORA-354 ORA-353 and ORA-312 on Redo Log Group members (Doc ID 332672.1)

Possible corruption in Online Redo Log File Members in a Redo Log Group.

If Oracle detects a corruption when archiving a log member, it will try to read the same redo block from the second member of the group. If the block is found corrupt in all the members of the redo log group, archiving will not proceed.

I started validating the online redo logfile member and luckily none of the members were corrupt. Incase you find corrupt members please refer to above mentioned Doc “Doc ID 332672.1” for solution.

As a precaution I validated all the online redo logfiles to confirm for corruption. If the redo log file is corrupt it should produce an error.

SQL> alter system dump logfile '+REDO01/test/onlinelog/group_43.258.875922633' validate;

System altered.

Elapsed: 00:01:21.89

SQL> alter system dump logfile '+REDO02/test/onlinelog/group_43.258.875922665' validate;

System altered.

Elapsed: 00:01:10.61

Interestingly I noticed trace file “test_ms00_54284.trc” of size 49Gb and continuously increasing. The file was being continuously written by GoldenGate process.

The issue seemed similar to one mentioned in

Bug 13840711 – ORA-353 in Standby / Streams Data Capture or ORA-272 in PRIMARY: Redo log corruption by ASYNC redo shipping (Doc ID 13840711.8)

Corrupt redo blocks are read from in memory log buffer (not from disk) by ASYNC redo shipping or hotmining clients.

ORA-353 may be reported by the process reading the redo log block from in memory log buffer (e.g. Logminer Reader – MSnn background process, Streams data capture).

Looking at the goldengate status showed

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DPUMP       00:00:00      00:00:03
EXTRACT     RUNNING     EXTL        02:05:36      00:00:07
REPLICAT    RUNNING     REPL        00:00:00      00:00:06

Meanwhile we received another error, this time the extract process which failed reading from online redo logfile, failed reading from archivelog file too. I checked the standby database, and it was insync with Primary.

Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490190):
ORA-00353: log corruption near block 4631374 change 605163911323 time 01/06/1988 07:26:12
ORA-00334: archived log: ‘+FLASH01/test/archivelog/2015_04_16/thread_3_seq_9370.7790.877158359′

Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490190/test_ms00_54284_i1490190.trc
Thu Apr 16 09:19:13 2015
Dumping diagnostic data in directory=[cdmp_20150416091913], requested by (instance=1, osid=54284 (MS00)), summary=[incident=1490190].
Thu Apr 16 09:19:27 2015
Errors in file /oracle/app/diag/rdbms/test/test/trace/test_ms00_54284.trc (incident=1490191):
ORA-00356: inconsistent lengths in change description
ORA-00353: log corruption near block 4631374 change 605163911323 time 01/06/1988 07:26:12
ORA-00334: archived log: ‘+FLASH01/test/archivelog/2015_04_16/thread_3_seq_9370.7790.877158359’
Incident details in: /oracle/app/diag/rdbms/test/test/incident/incdir_1490191/test_ms00_54284_i1490191.trc

At this point, client stopped all the GoldenGate processes, deleted the trace file and started the processes.I started monitoring the Extract Process

 send extract EXTL status

Sending STATUS request to EXTRACT EXTL ...


EXTRACT EXTL (PID 102390)
  Current status: In recovery[1]: Processing data

  Current read position:
  Redo thread #: 3
  Sequence #: 9351
  RBA: 4023516192
  Timestamp: 2015-04-16 03:37:35.000000
  SCN: 140.3838686301
  Current write position:
  Sequence #: 1637
  RBA: 1551
  Timestamp: 2015-04-16 09:31:56.423132
  Extract Trail: /oracle2/app/product/ogg12.1.2/dirdat/capture/sd

Monitored the Extract, and it was successfully able to read and extract from SEQ 9370, for which it had earlier failed.

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DPUMP       00:00:01      00:00:07
EXTRACT     RUNNING     EXTL        00:00:01      00:00:00
REPLICAT    RUNNING     REPL        00:00:00      00:00:06

send extract EXTL status

Sending STATUS request to EXTRACT EXTL ...


EXTRACT EXTL (PID 102390)
  Current status: Recovery complete: Processing data

  Current read position:
  Redo thread #: 3
  Sequence #: 9395
  RBA: 1721029512
  Timestamp: 2015-04-16 12:08:00.000000
  SCN: 140.3917853576
  Current write position:
  Sequence #: 1641
  RBA: 997030325
  Timestamp: 2015-04-16 12:07:57.056673
  Extract Trail: /oracle2/app/product/ogg12.1.2/dirdat/capture/sd

lag extract EXTL

Sending GETLAG request to EXTRACT EXT1OL ...
Last record lag 0 seconds.

lag extract DPUMP

Sending GETLAG request to EXTRACT DPUMP ...
Last record lag 2 seconds.
At EOF, no more records to process.

Current DB version is 11.2.0.3.13. As mentioned in the doc, fix for bug 13840711 is first included in 11.2.0.3.14 (Apr 2015) Database Patch Set Update (DB PSU) , we do plan to apply it in coming weeks.

ORA-01628: max # extents (32765) reached for rollback segment _SYSSMUxxx

Recently I came across “ORA-01628: max # extents (32765)” error. The database had resumable_timeout parameter set I received page for a session in resumable state. Looking into the alert log showed

statement in resumable session ‘User USER_E3(375), Session 9028, Instance 1’ was suspended due to
ORA-01628: max # extents (32765) reached for rollback segment _SYSSMU375_247595031$
Wed Apr 08 02:41:42 2015

Looking at the sid# 9028 details, INSERT sql was running and the session was on waiting on “statement suspended, wait error to be cl” as resumable_timeout parameter was set.

PRD04> @undo_stats
 
Undo Stats for Automatic Undo Management for Last 3 Hours
 
BEGIN_TIME         END_TIME           ACTIVEBLKS UNEXPIREDBLKS EXPIREDBLKS TUNED_UNDORETENTION   TXNCOUNT MAXQUERYLEN NOSPACEERRCNT
------------------ ------------------ ---------- ------------- ----------- ------------------- ---------- ----------- -------------
07-APR-15          08-APR-15              207496      22622976     6746320               46384     670317       11587             0
08-APR-15          08-APR-15              204488      22562776     6797160               46215     736330       10327             0
08-APR-15          08-APR-15              119832      22329656     7295296               46251     645728       10934             0
08-APR-15          08-APR-15              171992      22368920     7304336               46010     532932       11539             0
08-APR-15          08-APR-15              109968      22591672     7245352               46057     649534       12145             0
08-APR-15          08-APR-15              113016      22505688     7349552               45925     679114       12751             0
08-APR-15          08-APR-15              218048      22355184     7317048               46147     665002       13357             0
08-APR-15          08-APR-15              242368      22534704     7360800               46188     695974       13962             0
08-APR-15          08-APR-15              298368      22489016     7546968               45427     622305        5092             0
08-APR-15          08-APR-15               98792      22417864     7941104               44771     587418         702             0
08-APR-15          08-APR-15              151888      22744560     7860176               43790     703158        1308             0
08-APR-15          08-APR-15              353272      22606848     8085272               42483     670949        1913             0
08-APR-15          08-APR-15              679376      21995016     8654552               40691     662446        1245             0
08-APR-15          08-APR-15             1172864      21419400     8860792               39531     780894        1851             0
08-APR-15          08-APR-15             1093784      21696040     8664424               36463          0        2457             0
08-APR-15          08-APR-15             1374528      21315992     8761640               33362     733162        3062             0
08-APR-15          08-APR-15             1433128      21166848     8852944               31892     755982        3668           158
08-APR-15          08-APR-15             1454984      21040312     8960960               31767     692704        4274           293
08-APR-15          08-APR-15             1454984      21040312     8960960               31596     266300        4577           108
 
19 rows selected.

Transactions Using Rolback:
 
RBS        OS User         DB User         SID   Schema     Object Name          Type       Used RBS Blocks  # of Records
---------- --------------- --------------- ----- ---------- -------------------- ---------- ---------------- ------------------
...................
...................
_SYSSMU375 root            USER_E3         9028  ADMIN_E TESTS             TABLE PART 385255           23794634
_SYSSMU375 root            USER_E3         9028  ADMIN_E TESTS             TABLE      385255           23794634
...................
...................

26 rows selected.

 
PRD04> show parameter undo
 
NAME                                 TYPE       VALUE
------------------------------------ ---------- ------------------------------
undo_management                      string     AUTO
undo_retention                       integer    7200
undo_tablespace                      string     UNDOTBS4

PRD04> @undo_extent_stats
 
STATUS                                 MB       PERC
------------------------------ ---------- ----------
ACTIVE                              11741          3
EXPIRED                            225670         49
UNEXPIRED                          170579         37

Looking at the sessions with active transactions

PRD04> SELECT s.sid, s.serial#, s.username, u.segment_name, count(u.extent_id) "Extent Count", t.used_ublk, t.used_urec, s.program
FROM v$session s, v$transaction t, dba_undo_extents u
WHERE s.taddr = t.addr and u.segment_name like '_SYSSMU'||t.xidusn||'_%$' and u.status = 'ACTIVE'
GROUP BY s.sid, s.serial#, s.username, u.segment_name, t.used_ublk, t.used_urec, s.program
ORDER BY t.used_ublk desc, t.used_urec desc, s.sid, s.serial#, s.username, s.program;
 
    SID    SERIAL# USERNAME               SEGMENT_NAME                   Extent Count  USED_UBLK  USED_UREC PROGRAM
------- ---------- ---------------------- ------------------------------ ------------ ---------- ---------- ------------------------------------------------
   9028      15075 USER_E3                _SYSSMU375_247595031$                 32726     389285   24043132 php@script01.example.com
   2317      38231 USER_E2                _SYSSMU344_2152184027$                   11       8303     509953 php@script06.example.com
   8930      34713 USER_E3                _SYSSMU366_2128258416$                    6       4930     401890 php@script02.example.com
   5469       3267 USER_E4                _SYSSMU362_25371301$                      3       1704     116802 php@script02.example.com
..........
..........

13 rows selected

As per Metalink Doc “Troubleshooting ORA-1628 – max # extents (32765) reached for rollback segment (Doc ID 1580182.1)”

With automatic undo, you have no control over the extent sizes, nor do you have the ability to shrink them. This all happens automatically. The system automatically decides on the extent size, however it will generally start allocating larger extents if an undo segment is extending a lot. The large number of extents in the rollback segment is likely due to fragmentation in the undo tablespace: Oracle can probably only allocate extents of 64k due to fragmentation, so it is very probable to hit the max extents issue.

The maximum number of extents for undo segments is limited to 32K and a long/large running transaction can exhaust this limit by adding new extents if the next extent of the current one is not an expired one and finally will receive ORA-1628.

So, after getting the ORA-1628 errors in the transaction that extended the undo segment until its limit, future transactions will not be allowed to bind to the undo segment until it is not shrinked (you may see that the number of extents is decreased).

So, The two major causes of ORA-1628 issue are a very large transaction or undo tablespace fragmentation.

In case of large transaction, That is solved by splitting the large transaction to smaller ones (e.g. frequent commits).

In case of undo tablespace fragmentation, That is solved by recreating the undo tablespace (this is also the recommended solution of Bug 10330444 and Bug 10229998 which were filed for the same issue and closed as not a bug).

The UNDO tablespace had around 10Gb free, so adding space to UNDO wouldn’t resolve the issue. We informed the client’s Dev team and it was taken care.

Below are few sqls related to UNDO which I think can be useful while diagnosing issues


col "Parameter" format A32
col "Session Value" format A32
col "Instance Value" format A32
select a.inst_id, a.ksppinm "Parameter",
b.ksppstvl "Session Value",
c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and a.inst_id=b.inst_id and b.inst_id=c.inst_id
and a.ksppinm in ('_smu_debug_mode', '_highthreshold_undoretention')
order by 2;

   INST_ID Parameter			    Session Value		     Instance Value
---------- -------------------------------- -------------------------------- --------------------------------
	 1 _highthreshold_undoretention     4294967294			     4294967294
	 1 _smu_debug_mode		    0				     0


select segment_name,
   round(nvl(sum(act),0)/(1024*1024*1024),3 ) "ACT GB BYTES",
   round(nvl(sum(unexp),0)/(1024*1024*1024),3) "UNEXP GB BYTES",
   round(nvl(sum(exp),0)/(1024*1024*1024),3) "EXP GB BYTES",
   NO_OF_EXTENTS
   from ( select segment_name, nvl(sum(bytes),0) act,00 unexp, 00 exp, count(*) NO_OF_EXTENTS
   from DBA_UNDO_EXTENTS
   where status='ACTIVE' and tablespace_name = 'UNDOTBS4'
   group by segment_name
   union
   select segment_name,00 act, nvl(sum(bytes),0) unexp, 00 exp , count(*) NO_OF_EXTENTS
   from DBA_UNDO_EXTENTS
   where status='UNEXPIRED' and tablespace_name = 'UNDOTBS4'
   group by segment_name
   union
   select segment_name, 00 act, 00 unexp, nvl(sum(bytes),0) exp, count(*) NO_OF_EXTENTS
   from DBA_UNDO_EXTENTS
   where status='EXPIRED' and tablespace_name = 'UNDOTBS4'
   group by segment_name
   ) group by segment_name, NO_OF_EXTENTS having NO_OF_EXTENTS >= 30 order by 5 desc;

SEGMENT_NAME		       ACT GB BYTES UNEXP GB BYTES EXP GB BYTES NO_OF_EXTENTS
------------------------------ ------------ -------------- ------------ -------------
_SYSSMU375_247595031$			  0	    10.082	      0 	32765
_SYSSMU313_3664260743$			  0	     1.209	      0 	 2925
_SYSSMU316_1833322353$			  0		 0	   .607 	 2299
_SYSSMU367_1976067076$			  0		 0	   .788 	 2003
_SYSSMU577_2158293664$			  0		 0	  1.186 	 1710
_SYSSMU503_170622332$			  0		 0	   .604 	 1617
_SYSSMU330_231237204$			  0	      .899	      0 	 1523
_SYSSMU336_3955560249$			  0	     1.375	      0 	 1240
_SYSSMU5059_1468117272$ 		  0		.5	      0 	 1089
_SYSSMU306_1485117688$			  0	    12.145	      0 	  964

We can now take look at extent details for the specific undo segment which caused error. The undo segment was primarily made up of tiny (64 KB) extents

PRD04> break on report
PRD04> compute sum label Total of Extent_Count Extent_MB on report
PRD04> col Extent_MB format 999,999.00
PRD04> SELECT segment_name, bytes/1024 "Extent_Size_KB", count(extent_id) "Extent_Count", bytes * count(extent_id) / power(1024, 2) "Extent_MB" FROM dba_undo_extents WHERE segment_name = '_SYSSMU375_247595031$' group by segment_name, bytes order by 1, 3 desc;

SEGMENT_NAME		       Extent_Size_KB Extent_Count   Extent_MB
------------------------------ -------------- ------------ -----------
_SYSSMU375_247595031$			   64	     31858    1,991.13
_SYSSMU375_247595031$			 8192	       493    3,944.00
_SYSSMU375_247595031$			 4096	       178	712.00
_SYSSMU375_247595031$			 2048		23	 46.00
_SYSSMU375_247595031$			 1024		22	 22.00
_SYSSMU375_247595031$			16384		22	352.00
_SYSSMU375_247595031$			65536		17    1,088.00
_SYSSMU375_247595031$			 3072		12	 36.00
_SYSSMU375_247595031$			24576		12	288.00
_SYSSMU375_247595031$			12288		10	120.00
_SYSSMU375_247595031$			49152		 8	384.00
_SYSSMU375_247595031$			 1152		 7	  7.88
_SYSSMU375_247595031$			 1088		 6	  6.38
_SYSSMU375_247595031$			 3136		 6	 18.38
_SYSSMU375_247595031$			32768		 5	160.00
_SYSSMU375_247595031$			57344		 5	280.00
_SYSSMU375_247595031$			20480		 5	100.00
_SYSSMU375_247595031$			28672		 5	140.00
_SYSSMU375_247595031$			40960		 4	160.00
_SYSSMU375_247595031$			 1216		 4	  4.75
_SYSSMU375_247595031$			 6144		 4	 24.00
_SYSSMU375_247595031$			 3264		 3	  9.56
_SYSSMU375_247595031$			 3200		 3	  9.38
_SYSSMU375_247595031$			 5248		 2	 10.25
_SYSSMU375_247595031$			 2240		 2	  4.38
_SYSSMU375_247595031$			 3776		 2	  7.38
_SYSSMU375_247595031$			 2368		 2	  4.63
_SYSSMU375_247595031$			 2112		 2	  4.13
_SYSSMU375_247595031$			 5120		 2	 10.00
_SYSSMU375_247595031$			 1344		 2	  2.63
_SYSSMU375_247595031$			 3328		 2	  6.50
_SYSSMU375_247595031$			 2752		 2	  5.38
_SYSSMU375_247595031$			 3392		 2	  6.63
_SYSSMU375_247595031$			 2432		 2	  4.75
_SYSSMU375_247595031$			 7616		 1	  7.44
_SYSSMU375_247595031$			36864		 1	 36.00
_SYSSMU375_247595031$			 3584		 1	  3.50
_SYSSMU375_247595031$			 4032		 1	  3.94
_SYSSMU375_247595031$			 2304		 1	  2.25
_SYSSMU375_247595031$			 7872		 1	  7.69
_SYSSMU375_247595031$			 1280		 1	  1.25
_SYSSMU375_247595031$			11776		 1	 11.50
_SYSSMU375_247595031$			 5440		 1	  5.31
_SYSSMU375_247595031$			 7360		 1	  7.19
_SYSSMU375_247595031$			11264		 1	 11.00
_SYSSMU375_247595031$			 3712		 1	  3.63
_SYSSMU375_247595031$			35520		 1	 34.69
_SYSSMU375_247595031$			14336		 1	 14.00
_SYSSMU375_247595031$			59648		 1	 58.25
_SYSSMU375_247595031$			 6400		 1	  6.25
_SYSSMU375_247595031$			 2560		 1	  2.50
_SYSSMU375_247595031$			 3456		 1	  3.38
_SYSSMU375_247595031$			37888		 1	 37.00
_SYSSMU375_247595031$			 9536		 1	  9.31
_SYSSMU375_247595031$			 6336		 1	  6.19
_SYSSMU375_247595031$			15168		 1	 14.81
_SYSSMU375_247595031$			14400		 1	 14.06
_SYSSMU375_247595031$			 3520		 1	  3.44
_SYSSMU375_247595031$			 6528		 1	  6.38
_SYSSMU375_247595031$			19520		 1	 19.06
_SYSSMU375_247595031$			 1600		 1	  1.56
_SYSSMU375_247595031$			 2176		 1	  2.13
_SYSSMU375_247595031$			 6272		 1	  6.13
_SYSSMU375_247595031$			 9344		 1	  9.13
_SYSSMU375_247595031$			 5376		 1	  5.25
					      ------------ -----------
Total						     32765   10,324.25

65 rows selected.

ORA-17629 ORA-20079

Yesterday I received page for ” ORA-17629: Cannot connect to the remote database server”  reported in Primary Database alert log.

Fri Apr 10 06:47:41 2015
Errors in file /oracle/app/diag/rdbms/testdb/testdb/trace/testdb_ora_161991.trc:
ORA-17627:
ORA-17629: Cannot connect to the remote database server
Errors in file /oracle/app/diag/rdbms/testdb/testdb/trace/testdb_ora_161991.trc:
ORA-17629: Cannot connect to the remote database server
ORA-17627:
ORA-17629: Cannot connect to the remote database server
Fri Apr 10 06:48:22 2015

The error trace file shows

*** 2015-04-10 06:47:13.005
*** SESSION ID:(3024.45081) 2015-04-10 06:47:13.005

*** CLIENT ID:() 2015-04-10 06:47:13.005
*** SERVICE NAME:(testdb) 2015-04-10 06:47:13.005
*** MODULE NAME:(oracle@oracle2.testdbds.com (TNS V1-V3)) 2015-04-10 06:47:13.005

*** ACTION NAME:() 2015-04-10 06:47:13.005

krbmsrvgrcf:Expected db_unique_name: testdb
krbmsrvgrcf:mydbuname = :testdb:
krbmsrvgrcf:Instance with connect id testdbSBY requested controlfile.
krbmsrvgrcf:Filespec: /oracle/app/product/11.2_1/dbs/snappcf_testdb.file

*** 2015-04-10 06:47:41.467
OCI error val is 184283056 and errmsg is ”
ORA-17627:
ORA-17629: Cannot connect to the remote database server
******************** WARNING ***************************
The errors during Server autobackup are not fatal, as it
is attempted after sucessful completion of the command.
However, it is recomended to take an RMAN control file
backup as soon as possible because the Autobackup failed
with the following error:
ORA-17629: Cannot connect to the remote database server
ORA-17627:
ORA-17629: Cannot connect to the remote database server
******************** END OF WARNING *******************
ORA-17629: Cannot connect to the remote database server
ORA-17627:
ORA-17629: Cannot connect to the remote database server

The trace file mentions the SID,SERIAL# and the module name. The module shows “oracle@oracle2.testdbds.com” which is the standby database.

I thought to check the sid details from v$active_session_history and the session is coming in from standby database and definitely related to RMAN.

SQL> @ash_sid
Enter value for sid: 3024
Enter value for serial: 45081
old  17: where    session_id = &sid and session_serial# = &serial
new  17: where    session_id = 3024 and session_serial# = 45081

                                                                         SQL                                                                                        Dur   CPU DB Time
Sid,Serial<Blk  STIME    Module               SqlId:ChildNo    SqlExecId Start CPH S PRXJBZ Sta    SEQ# EVENT                            P1:P2:P3   WCLASS           ms    ms      ms
--------------- -------- -------------------- ---------------- --------- ----- ------------ ---- ------ ------------------------ ------------------ ----------- ------- ----- -------
3024,45081      06:47:13 oracle@oracle2.testdbds.com                           NNN N NNNNN  Wait    129 control file single writ 9999999:  1024:  9 System I/O      105     8     113
                06:47:14                                                       NNN N NNNNN  Wait    269 control file single writ 9999999:  5504:  9 System I/O
                06:47:15                                                       NNN N NNNNN  Wait    395 control file single writ 9999999:  9536:  9 System I/O
                06:47:16                                                       NNN N NNNNN  Wait    473 control file single writ 9999999: 12032:  9 System I/O
                06:47:17                                                       NNN N NNNNN  Wait    560 control file sequential  9999999: 14848:  9 System I/O
                06:47:18                                                       NNN N NNNNN  Wait    673 control file single writ 9999999: 18432:  9 System I/O
                06:47:19                                                       NNN N NNNNN  Wait    768 control file sequential  9999999: 21504:  9 System I/O
                06:47:20                                                       NNN N NNNNN  Wait    863 control file single writ 9999999: 24512:  9 System I/O
                06:47:21                                                       NNN N NNNNN  Wait    959 control file single writ 9999999: 27584:  9 System I/O
                06:47:22                                                       NNN N NNNNN  cpu    1044                          9999999: 30336:  9
                06:47:23                                                       NNN N NNNNN  cpu    1132                          9999999: 33152:  9
                06:47:24                                                       NNN N NNNNN  Wait   1220 control file sequential  9999999: 35968:  9 System I/O
                06:47:25                                                       NNN N NNNNN  Wait   1318 control file sequential  9999999: 39104:  9 System I/O
                06:47:26                                                       NNN N NNNNN  Wait   1405 control file single writ 9999999: 41856:  9 System I/O
                06:47:27                                                       NNN N NNNNN  Wait   1489 control file single writ 9999999: 44544:  9 System I/O
                06:47:28                                                       NNN N NNNNN  Wait   1567 control file single writ 9999999: 47040:  9 System I/O
                06:47:29                                                       NNN N NNNNN  Wait   1649 control file single writ 9999999: 49664:  9 System I/O
                06:47:30                                                       NNN N NNNNN  Wait   1733 control file single writ 9999999: 52352:  9 System I/O
                06:47:31                                                       NNN N NNNNN  cpu    1817                          9999999: 55040:  9
                06:47:32                                                       NNN N NNNNN  Wait   1895 control file single writ 9999999: 57536:  9 System I/O
                06:47:33                                                       NNN N NNNNN  Wait   1965 control file single writ 9999999: 59776:  9 System I/O
                06:47:34                                                       NNN N NNNNN  Wait   2041 control file single writ 9999999: 62208:  9 System I/O
                06:47:35                                                       NNN N NNNNN  Wait   2113 control file single writ 9999999: 64512:  9 System I/O
                06:47:36                                                       NNN N NNNNN  Wait   2179 control file single writ 9999999: 66624:  9 System I/O
                06:47:37                                                       NNN N NNNNN  Wait   2238 control file sequential  9999999: 68544:  9 System I/O
                06:47:38                                                       NNN N NNNNN  Wait   2307 control file single writ 9999999: 70720:  9 System I/O
                06:47:39                                                       NNN N NNNNN  Wait   2369 control file single writ 9999999: 72704:  9 System I/O
                06:47:40                                                       NNN N NNNNN  Wait   2444 control file sequential  9999999: 75136:  9 System I/O


28 rows selected.

Looking into MOS I found “RMAN-06820 ORA-17629 During Backup at Standby Site (Doc ID 1616074.1)”. It stated something interesting, so I thought to mention it here, which I did not know

Change in 11.2.0.4 onward

Per ‘unpublished’ Bug 8740124, as of 11.2.0.4, we now include the current standby redo log as part of an RMAN archivelog backup at the standby site. This is achieved by forcing a log switch at the primary site.

I looked at the archive log backup trace file on standby database and there was no error reported. Then I started looking into the backup script and saw, after the backup is completed, resync function is called which connects to rcat database and perform resync catalog.

cle2.testdbs logs]$ more testdb_backup_testdb_resync_201504100617_Fri.log

Recovery Manager: Release 11.2.0.3.0 – Production on Fri Apr 10 06:46:55 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

RMAN>
echo set on

RMAN>

RMAN> connect target *
connected to target database: testdb (DBID=3948365078)

RMAN> connect catalog *
connected to recovery catalog database

RMAN> resync catalog;
starting partial resync of recovery catalog
ORA-20079: full resync from primary database is not done

doing automatic resync from primary
resyncing from database with DB_UNIQUE_NAME testdb
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 04/10/2015 06:47:13
ORA-17629: Cannot connect to the remote database server

ORA-17628: Oracle error 17629 returned by remote Oracle server

RMAN> exit;

Recovery Manager complete.

Tried “show all for db_unique_name testdb;” from standby

RMAN> show all for db_unique_name testdb;

ORA-20079: full resync from primary database is not done

doing automatic resync from primary
resyncing from database with DB_UNIQUE_NAME testdb
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of show command at 04/10/2015 07:53:40
RMAN-03014: implicit resync of recovery catalog failed
RMAN-03009: failure of partial resync command on default channel at 04/10/2015 07:53:40
ORA-17629: Cannot connect to the remote database server
ORA-17628: Oracle error 17629 returned by remote Oracle server

Connected to Primary Database and ran the same command

[oracle@oracle1 anand]$ rman target /

Recovery Manager: Release 11.2.0.3.0 – Production on Fri Apr 10 07:55:43 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: testdb (DBID=3948365078)

RMAN> connect catalog rcat/xxxxxxxx@rcat

connected to recovery catalog database

RMAN> show all for db_unique_name testdb;

starting full resync of recovery catalog

full resync complete
RMAN configuration parameters for database with db_unique_name testdb are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 10 DAYS;
CONFIGURE BACKUP OPTIMIZATION OFF;
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO ‘%F’;
CONFIGURE DEVICE TYPE DISK PARALLELISM 8 BACKUP TYPE TO BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT ‘/mnt/oracle-backup/testdb/%U’;
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM ‘AES192’;
CONFIGURE COMPRESSION ALGORITHM ‘BASIC’ AS OF RELEASE ‘DEFAULT’ OPTIMIZE FOR LOAD FALSE;
CONFIGURE DB_UNIQUE_NAME ‘testdbSBY’ CONNECT IDENTIFIER ‘testdbSBY’;
CONFIGURE DB_UNIQUE_NAME ‘testdb’ CONNECT IDENTIFIER ‘testdb’;
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;
CONFIGURE SNAPSHOT CONTROLFILE NAME TO ‘/oracle/app/product/11.2_1/dbs/snappcf_testdb.file’;

RMAN>
RMAN> exit

The full resync completed on Primary database and then I performed “resync catalog” on standby as it completed successfully.

RMAN> resync catalog
2> ;

starting partial resync of recovery catalog
partial resync complete

RMAN> exit