Purging SYSAUX

In continuation of my previous post “SYSAUX Growing Rapidly!!! What can be done”, here I want to present a second method of purging SYSAUX.

Basically, I tried to perform the steps mentioned in the previous post, but drop_snapshot_range was taking too long (> 24 hrs) and was still running on the test db. Once again, WRH$_ACTIVE_SESSION_HISTORY was at the top of the list, occupying most of the SYSAUX space.

SYS01> EXEC dbms_workload_repository.drop_snapshot_range(25155,26155,3179571572);


From another session, after some time:

SYS01> @asw

USERNAME		      SID    SERIAL# SPID	EVENT			       LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE		   SQL_ID	 PLAN_HASH_VALUE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- ------------------- ------------- ---------------
SYS			     7654	8641 47879	db file sequential read 		 28	    -1		     0 WAITED SHORT TIME   fqq01wmb4hgt8       763705880

SYS01> @orax fqq01wmb4hgt8
old   7:       sql_id  = '&&1'
new   7:       sql_id  = 'fqq01wmb4hgt8'

SQL_ID
-------------
SQL_FULLTEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
fqq01wmb4hgt8
delete from WRH$_FILESTATXS tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap and dbid = :dbid) and not exists (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id))

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('fqq01wmb4hgt8',NULL,'typical +peeked_binds allstats last'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	fqq01wmb4hgt8, child number 0
-------------------------------------
delete from WRH$_FILESTATXS tab where (:beg_snap <= tab.snap_id and
tab.snap_id <= :end_snap and dbid = :dbid) and not exists (select 1
from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >=
b.start_snap_id) and (tab.snap_id <= b.end_snap_id))

Plan hash value: 763705880

---------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation			 | Name 	      | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT		 |		      |        |       |   325K(100)|	       |       |       |       |       |	  |
|   1 |  DELETE 			 | WRH$_FILESTATXS    |        |       |	    |	       |       |       |       |       |	  |
|*  2 |   FILTER			 |		      |        |       |	    |	       |       |       |       |       |	  |
|   3 |    MERGE JOIN ANTI		 |		      |    494M|    23G|   325K  (1)| 01:05:08 |       |       |       |       |	  |
|   4 |     PARTITION RANGE ITERATOR	 |		      |    494M|  8957M|   325K  (1)| 01:05:08 |   KEY |   KEY |       |       |	  |
|*  5 |      INDEX RANGE SCAN		 | WRH$_FILESTATXS_PK |    494M|  8957M|   325K  (1)| 01:05:08 |   KEY |   KEY |       |       |	  |
|*  6 |     FILTER			 |		      |        |       |	    |	       |       |       |       |       |	  |
|*  7 |      SORT JOIN			 |		      |      1 |    33 |     2	(50)| 00:00:01 |       |       | 73728 | 73728 |	  |
|*  8 |       TABLE ACCESS BY INDEX ROWID| WRM$_BASELINE      |      1 |    33 |     1	 (0)| 00:00:01 |       |       |       |       |	  |
|*  9 |        INDEX RANGE SCAN 	 | WRM$_BASELINE_PK   |      1 |       |     1	 (0)| 00:00:01 |       |       |       |       |	  |
---------------------------------------------------------------------------------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 0
   2 - (NUMBER): 95781
   3 - (NUMBER): 3179571572

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((:BEG_SNAP<=:END_SNAP AND "TAB"."SNAP_ID">=:BEG_SNAP AND "TAB"."SNAP_ID"<=:END_SNAP))
   9 - access("B"."DBID"=:DBID)

SYS01> col name format a10;
col VALUE_STRING format a30;
select name, position, datatype_string, was_captured, value_string,
anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = '&sqlid';
SYS01>   2  Enter value for sqlid: fqq01wmb4hgt8
old   2: anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = '&sqlid'
new   2: anydata.accesstimestamp(value_anydata) from v$sql_bind_capture where sql_id = 'fqq01wmb4hgt8'

NAME	     POSITION DATATYPE_STRING						   WAS VALUE_STRING
---------- ---------- ------------------------------------------------------------ --- ------------------------------
ANYDATA.ACCESSTIMESTAMP(VALUE_ANYDATA)
---------------------------------------------------------------------------
:BEG_SNAP	    1 NUMBER							   YES 0


:END_SNAP	    2 NUMBER							   YES 95781


:DBID		    3 NUMBER							   YES 3179571572

Interestingly, the captured bind values show VALUE_STRING 0 and 95781 for :BEG_SNAP and :END_SNAP respectively, even though the input range for the drop snapshot call was 25155 to 26155.

The database was then refreshed by the client (so my session was gone), and I decided not to take the drop_snapshot_range approach. After going through a few blogs and MOS documents, I saw two approaches:

1. “Recreate the AWR tables as in MOS note 782974.1”, which basically drops all WRH$* tables and then recreates them. The AWR tables contain a wealth of performance data, which can be very useful for performance tuning, trend analysis and for comparing performance between two separate periods of time. Hence recreating AWR should, I believe, be the last resort. The activity needs to be done in startup restrict mode, so it requires downtime.

If you do plan to go forward with it, I would recommend exporting the AWR snapshot data using @?/rdbms/admin/awrextr.sql and keeping the dump. In the future it can be used by simply importing it into some other repository database to get the AWR data back.
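The flow is roughly as below (a sketch: both scripts are interactive and prompt for the DBID, the snapshot range, a directory object and a dump file name):

-- On the source db, before recreating the AWR tables
SQL> @?/rdbms/admin/awrextr.sql

-- Later, on any other repository db, load the dump back
SQL> @?/rdbms/admin/awrload.sql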

2. Simply delete the orphaned rows from the WRH$_ACTIVE_SESSION_HISTORY table and perform a shrink space cascade.

I went ahead with the 2nd approach and performed the steps below (note: the DB was a single-instance database).


SYS01> SELECT COUNT(1) Orphaned_ASH_Rows
FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
        301206452

SYS01> alter table wrh$_active_session_history parallel 4;

Table altered.

SYS01> alter session force parallel dml;

Session altered.

SYS01> DELETE /*+ PARALLEL(a,4) */
FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );


From another session:

SYS01> @asw

USERNAME		      SID    SERIAL# SPID	EVENT			       LAST_CALL_ET  WAIT_TIME SECONDS_IN_WAIT STATE
------------------------- ------- ---------- ---------- ------------------------------ ------------ ---------- --------------- -------------------
SQL_ID	      PLAN_HASH_VALUE
------------- ---------------
SYS			      921	1329 107213	db file sequential read 		 60	     0		     0 WAITING
144bpj4qg68m1	   2217072169

SYS			     1227	 889 107215	db file sequential read 		 60	     0		     0 WAITING
144bpj4qg68m1	   2217072169

SYS			     9181	3277 107211	db file sequential read 		 60	     1		     0 WAITED KNOWN TIME
144bpj4qg68m1	   2217072169

SYS			     3370	 455 107727	SQL*Net message to client		  0	    -1		     0 WAITED SHORT TIME
8tfjp8cd2xtd1	    193683216

SYS			     1840	 809 107217	PX Deq Credit: need buffer		 60	     0		     0 WAITING
144bpj4qg68m1	   2217072169

SYS			     8875	3889 107209	db file sequential read 		 60	     1		     0 WAITED KNOWN TIME
144bpj4qg68m1	   2217072169

SYS			     8266	3139 90257	PX Deq: Execute Reply	 60	     0		    60 WAITING
144bpj4qg68m1	   2217072169


SYS01> @parallel_sess

Username     QC/Slave SlaveSet SID					Slave INS STATE    WAIT_EVENT			  QC SID QC INS Req. DOP Actual DOP
------------ -------- -------- ---------------------------------------- --------- -------- ------------------------------ ------ ------ -------- ----------
SYS	     QC 	       8266					1	  WAIT	   PX Deq: Execute Reply	  8266
 - p000      (Slave)  1        8875					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p003      (Slave)  1        1227					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p001      (Slave)  1        9181					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p002      (Slave)  1        921					1	  WAIT	   db file sequential read	  8266	 1	       4	  4
 - p004      (Slave)  2        1840					1	  WAIT	   PX Deq Credit: send blkd	  8266	 1	       4	  4
 - p007      (Slave)  2        2757					1	  WAIT	   PX Deq: Execution Msg	  8266	 1	       4	  4
 - p006      (Slave)  2        2450					1	  WAIT	   PX Deq: Execution Msg	  8266	 1	       4	  4
 - p005      (Slave)  2        2147					1	  WAIT	   PX Deq: Execution Msg	  8266	 1	       4	  4

SYS01> SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('144bpj4qg68m1',NULL,'typical +peeked_binds allstats last'))

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID	144bpj4qg68m1, child number 0
-------------------------------------
DELETE /*+ PARALLEL(a,4) */ FROM wrh$_active_session_history a WHERE
NOT EXISTS   (SELECT 1	 FROM wrm$_snapshot   WHERE snap_id	  =
a.snap_id   AND dbid		= a.dbid   AND instance_number =
a.instance_number   )

Plan hash value: 2217072169

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation			 | Name 			  | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | Pstart| Pstop |	TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT		 |				  |	   |	   |   106K(100)|	   |	   |	   |	    |	   |		|
|   1 |  PX COORDINATOR 		 |				  |	   |	   |		|	   |	   |	   |	    |	   |		|
|   2 |   PX SEND QC (RANDOM)		 | :TQ10001			  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,01 | P->S | QC (RAND)	|
|   3 |    DELETE			 | WRH$_ACTIVE_SESSION_HISTORY	  |	   |	   |		|	   |	   |	   |  Q1,01 | PCWP |		|
|   4 |     PX RECEIVE			 |				  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,01 | PCWP |		|
|   5 |      PX SEND HASH (BLOCK ADDRESS)| :TQ10000			  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,00 | P->P | HASH (BLOCK|
|   6 |       NESTED LOOPS ANTI 	 |				  |    298M|	11G|   106K  (1)| 00:21:14 |	   |	   |  Q1,00 | PCWP |		|
|   7 |        PX PARTITION RANGE ALL	 |				  |    298M|  7404M|   106K  (1)| 00:21:14 |	 1 |	 3 |  Q1,00 | PCWC |		|
|   8 | 	INDEX FULL SCAN 	 | WRH$_ACTIVE_SESSION_HISTORY_PK |    298M|  7404M|   106K  (1)| 00:21:14 |	 1 |	 3 |  Q1,00 | PCWP |		|
|*  9 |        INDEX UNIQUE SCAN	 | WRM$_SNAPSHOT_PK		  |	 1 |	15 |	 0   (0)|	   |	   |	   |  Q1,00 | PCWP |		|
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - access("DBID"="A"."DBID" AND "SNAP_ID"="A"."SNAP_ID" AND "INSTANCE_NUMBER"="A"."INSTANCE_NUMBER")

The deletion of 301,206,452 rows completed with an elapsed time of 12:59:38.44.


301206452 rows deleted.

Elapsed: 12:59:38.44

SYS01> alter table wrh$_active_session_history noparallel ;

Table altered.

SYS01> select degree from dba_tables where table_name=upper('wrh$_active_session_history');

DEGREE
----------
         1

SYS01> SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
	   309984

SYS01> DELETE /*+ PARALLEL(a,4) */ FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );  

309984 rows deleted.

Elapsed: 00:00:19.08
SYS01> commit;

Commit complete.

Elapsed: 00:00:00.07

SYS01> SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

ORPHANED_ASH_ROWS
-----------------
		0

SYS01> alter table wrh$_active_session_history shrink space cascade;

Table altered.

Elapsed: 06:47:21.36

Before this activity SM/AWR was occupying 339 GB, which was reduced to 209 GB. SM/OPTSTAT was also occupying 143 GB, and after confirmation from the client I purged the stats, as it was a test db.

SYS01> exec DBMS_STATS.PURGE_STATS(DBMS_STATS.PURGE_ALL);

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.54
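While on SM/OPTSTAT, the stats history retention itself can also be lowered so the history does not grow back. A hedged example (10 days is an arbitrary value, choose what suits your environment):

SYS01> select dbms_stats.get_stats_history_retention from dual;
SYS01> exec dbms_stats.alter_stats_history_retention(10);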

SYS01> COLUMN "Item" FORMAT A25
 COLUMN "Space Used (GB)" FORMAT 999.99
 COLUMN "Schema" FORMAT A25
 COLUMN "Move Procedure" FORMAT A40

    SELECT  occupant_name "Item",
    space_usage_kbytes/1048576 "Space Used (GB)",
    schema_name "Schema",
    move_procedure "Move Procedure"
    FROM v$sysaux_occupants
    WHERE occupant_name in  ('SM/AWR','SM/OPTSTAT')
    ORDER BY 1
    /

Item                      Space Used (GB) Schema                    Move Procedure
------------------------- --------------- ------------------------- ----------------------------------------
SM/AWR                             209.16 SYS
SM/OPTSTAT                          19.72 SYS

Savings in SYSAUX:


TABLESPACE_NAME                  TSP_SIZE USED_SPACE FREE_SPACE   PCT_FREE
------------------------------ ---------- ---------- ---------- ----------
SYSAUX                             505856     496310       9546       1.89  --> Before Size

SYSAUX                             505856     237833     268023      52.98  --> After Size

Hope this helps :)

SYSAUX Growing rapidly!!! What can be done

Recently I have been working on cleaning up the SYSAUX tablespace for a few clients, so I thought I would put down my steps, which might be helpful to some of you out there.

Why does the SYSAUX tablespace grow much larger than expected?

There could be a number of potential reasons:

1. ASH data has grown too large (SM/AWR)
2. High retention period
3. Segment Advisor data has grown too large
4. Accumulation of older versions of optimizer statistics (SM/OPTSTAT)
5. Bugs Bugs Bugs!!!!!

How do we identify the SYSAUX space usage?

There are basically 2 ways that I know of:

1. Running @?/rdbms/admin/awrinfo.sql --> detailed info like schema breakdown, SYSAUX occupants space usage, etc.

2. Querying v$sysaux_occupants and dba_segments directly:

COLUMN "Item" FORMAT A25
COLUMN "Space Used (GB)" FORMAT 999.99
COLUMN "Schema" FORMAT A25
COLUMN "Move Procedure" FORMAT A40

SELECT  occupant_name "Item",
   space_usage_kbytes/1048576 "Space Used (GB)",
   schema_name "Schema",
   move_procedure "Move Procedure"
   FROM v$sysaux_occupants
   ORDER BY 2
   /

 col owner for a6
 col segment_name for a50
  select * from
 (select owner,segment_name||'~'||partition_name segment_name,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSAUX' ORDER BY BLOCKS desc) where rownum < 11;

In my case, the two below were occupying most of the space:

1. SM/AWR
2. SM/OPTSTAT

SM/AWR refers to the Automatic Workload Repository. Data in this section is retained for a certain amount of time (default 8 days). The setting can be checked through DBA_HIST_WR_CONTROL.

SM/OPTSTAT stores older versions of optimizer statistics. The setting can be checked through dbms_stats.get_stats_history_retention. This is not part of AWR and is not controlled by the AWR retention.
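For reference, both settings can be checked like this:

select dbid, snap_interval, retention from dba_hist_wr_control;
select dbms_stats.get_stats_history_retention from dual;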

When looking at the top segments, I saw WRH$_ACTIVE_SESSION_HISTORY occupying most of the space. Sometimes the AWR tables are not purged according to the settings in sys.wrm$_wr_control.

As per Oracle :-

Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.

If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.

Diagnose and Reduce Used Space of SYSAUX.

Once the major occupants and top segments are identified as discussed above, we can start with the steps to rectify it.

With SM/AWR occupying most of the space, I think we can follow three methods. In this blog I will be posting only one of them :)

To check for orphaned ASH rows:

 SELECT COUNT(1) Orphaned_ASH_Rows FROM wrh$_active_session_history a
  WHERE NOT EXISTS
  (SELECT 1
  FROM wrm$_snapshot
  WHERE snap_id       = a.snap_id
  AND dbid            = a.dbid
  AND instance_number = a.instance_number
  );

Check the minimum snap_id in the ASH table and compare it to the minimum snap_id in dba_hist_snapshot.

select min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY;
select min(snap_id) from dba_hist_snapshot;

Example :-

select min(snap_id),MAX(snap_id) from dba_hist_snapshot;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
       17754        18523

select min(snap_id),MAX(snap_id) from WRH$_ACTIVE_SESSION_HISTORY;

MIN(SNAP_ID) MAX(SNAP_ID)
------------ ------------
           1        18523

As per the retention period, we should have data from snap_id 17754 to 18523, but the WRH$_ACTIVE_SESSION_HISTORY table has data from snap_id 1.

From Oracle MOS Doc :-

A potential solution to this issue is to manually split the partitions of the partitioned AWR objects so that there is more chance of the split partitions being purged. You will still have to wait for all the rows in the new partitions to reach their retention time, but with split partitions there is more chance of this happening. You can manually split the partitions using the following undocumented command:

alter session set "_swrf_test_action" = 72;



select table_name, count(*) from dba_tab_partitions where table_name like 'WRH$%' and table_owner = 'SYS'
group by table_name order by 1;

TABLE_NAME                                           COUNT(*)
-------------------------------------------------- ----------
WRH$_ACTIVE_SESSION_HISTORY                                 2
WRH$_DB_CACHE_ADVICE                                        2
WRH$_DLM_MISC                                               2
WRH$_EVENT_HISTOGRAM                                        2
WRH$_FILESTATXS                                            11
WRH$_INST_CACHE_TRANSFER                                    2
WRH$_INTERCONNECT_PINGS                                     2
........................
25 rows selected.

SQL>  alter session set "_swrf_test_action"=72; 

Session altered.

SQL>  select table_name,partition_name from dba_tab_partitions where table_name = 'WRH$_ACTIVE_SESSION_HISTORY';

TABLE_NAME                                         PARTITION_NAME
------------------------------  -------------------------------------------------------
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_1798927129_0
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_1798927129_18531  --> New Partition created 
WRH$_ACTIVE_SESSION_HISTORY                        WRH$_ACTIVE_SES_MXDB_MXSN

col table_name for a80
select table_name, count(*) from dba_tab_partitions where table_name like 'WRH$%' and table_owner = 'SYS' group by table_name order by 1

TABLE_NAME                                   COUNT(*)
------------------------------------------- ----------
WRH$_ACTIVE_SESSION_HISTORY                     3
WRH$_DB_CACHE_ADVICE                            3
WRH$_DLM_MISC                                   3
WRH$_EVENT_HISTOGRAM                            3
......................

25 rows selected.

In the above example, WRH$_ACTIVE_1798927129_18531 is the newly created partition, where 1798927129 is the DBID and 18531 is the max(snap_id) at the time of the split. So now we can start dropping the snapshot range, which in my case is from 1 to 17753, as 17754 is the min(snap_id) in dba_hist_snapshot.

SQL> EXEC dbms_workload_repository.drop_snapshot_range(1,17753,1798927129);

It can generate a good amount of redo and use a lot of undo, so keep monitoring the undo tablespace and make sure you have sufficient space.
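A simple way to keep an eye on undo from another session while the purge runs (a sketch, not output from the original run):

select tablespace_name, status, round(sum(bytes)/1024/1024) used_mb
from   dba_undo_extents
where  status in ('ACTIVE','UNEXPIRED')
group by tablespace_name, status;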

So, what happens when we run the above:

SQL> @sqlid ft7m07stk3dws
old   9:        sql_id = ('&1')
new   9:        sql_id = ('ft7m07stk3dws')

SQL_ID                                  HASH_VALUE SQL_TEXT
--------------------------------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ft7m07stk3dws                            857847704 delete from WRH$_SYSTEM_EVENT tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap and dbid = :dbid) and
                                                   not exists (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id))

SQL> @sqlid 854knbb15976z
old   9:        sql_id = ('&1')
new   9:        sql_id = ('854knbb15976z')

SQL_ID                                  HASH_VALUE SQL_TEXT
--------------------------------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
854knbb15976z                           3260325087 delete from WRH$_SQLSTAT tab where (:beg_snap <= tab.snap_id and tab.snap_id <= :end_snap and dbid = :dbid) and
                                                   not exists (select 1 from WRM$_BASELINE b where (tab.dbid = b.dbid) and (tab.snap_id >= b.start_snap_id) and (tab.snap_id <= b.end_snap_id))

So, internally Oracle runs delete commands, which cause high redo and undo generation :)

Once the procedure is completed successfully, check the min(snap_id) in WRH$_ACTIVE_SESSION_HISTORY and perform shrink space cascade.


select owner,segment_name,round(sum(bytes/1024/1024),2) MB, tablespace_name from dba_segments where segment_name = upper('WRH$_ACTIVE_SESSION_HISTORY') group by owner,segment_name,tablespace_name;

OWNER       SEGMENT_NAME                      MB           TABLESPACE_NAME
-------  ---------------------------------- -----------  -------------------
SYS        WRH$_ACTIVE_SESSION_HISTORY        3538.06          SYSAUX

SQL> alter table WRH$_ACTIVE_SESSION_HISTORY shrink space cascade;

Table altered.


OWNER       SEGMENT_NAME                      MB           TABLESPACE_NAME
-------  ---------------------------------- -----------  -------------------
SYS        WRH$_ACTIVE_SESSION_HISTORY        46.75          SYSAUX

In a similar fashion, other WRH$ tables can be shrunk to free up space in SYSAUX; a helper to generate the commands is sketched below.
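A hedged helper to generate those commands from dba_segments (review the output before running it, and note that each shrink can take a while, as seen above):

select distinct 'alter table sys.'||segment_name||' shrink space cascade;' cmd
from   dba_segments
where  tablespace_name = 'SYSAUX'
and    segment_type like 'TABLE%'
and    segment_name like 'WRH$%';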

Hope this helps!!!

Reference :-

WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged Based Upon the Retention Policy (Doc ID 387914.1)
Suggestions if Your SYSAUX Tablespace Grows Rapidly or Too Large (Doc ID 1292724.1)

ASM Diskgroup shows negative USABLE_FILE_MB value

Today while working on an ASM diskgroup I noticed a negative value for USABLE_FILE_MB. I was a little surprised, as it had been pretty long since I last worked on ASM, so I started looking around for blogs and MOS docs and found a few really nice ones.

A negative value for USABLE_FILE_MB means that you do not have sufficient free space to tolerate a disk failure. If a disk were to fail, the subsequent rebalance would run out of space before full redundancy could be restored to all files.

I would really recommend reading :-

http://prutser.wordpress.com/2013/01/03/demystifying-asm-required_mirror_free_mb-and-usable_file_mb/

The box I was working on was an Exadata quarter rack, so it had 3 storage servers. Each storage server on an Exadata has 12 cell disks. Grid disks are created within cell disks. In a simple configuration, one grid disk is created per cell disk, and grid disks are what the storage cells present to the DB servers. So basically:

GRID DISK = ASM DISK.

When creating disk groups, ASM automatically puts all grid disks from the same storage cell into the same failgroup. The failgroup is then named after the storage cell.


[oracle@test~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  40697856  8464936         13565952        -2550508              0             N  DATA1/
MOUNTED  NORMAL  N         512   4096  4194304    415296   367220           138432          114394              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304  10176480  9018276          3392160         2813058              0             N  RECO1/

compute sum Label total_FG of total_mb on FAILGROUP
compute sum Label total of total_mb on report
col diskgroup for a20
col failgroup for a30
col name for a30
select g.name diskgroup, d.failgroup,  d.name, d.total_mb from v$asm_disk d, v$asm_diskgroup g where g.name = 'DATA1' and d.GROUP_NUMBER = g.GROUP_NUMBER order by g.name, d.failgroup;

DISKGROUP            FAILGROUP                      NAME                                   TOTAL_MB
-------------------- ------------------------------ ------------------------------ ----------------
DATA1               CELL01                         DATA1_CD_00_CELL01             2260992
DATA1                                              DATA1_CD_05_CELL01             2260992
DATA1                                              DATA1_CD_03_CELL01             2260992
DATA1                                              DATA1_CD_04_CELL01             2260992
DATA1                                              DATA1_CD_01_CELL01             2260992
DATA1                                              DATA1_CD_02_CELL01             2260992
                     ******************************                                ----------------
                     total_FG                                                              13565952
DATA1               CELL02                         DATA1_CD_01_CELL02             2260992
DATA1                                              DATA1_CD_05_CELL02             2260992
DATA1                                              DATA1_CD_02_CELL02             2260992
DATA1                                              DATA1_CD_03_CELL02             2260992
DATA1                                              DATA1_CD_00_CELL02             2260992
DATA1                                              DATA1_CD_04_CELL02             2260992
                     ******************************                                ----------------
                     total_FG                                                              13565952
DATA1               CELL03                         DATA1_CD_02_CELL03             2260992
DATA1                                              DATA1_CD_05_CELL03             2260992
DATA1                                              DATA1_CD_01_CELL03             2260992
DATA1                                              DATA1_CD_04_CELL03             2260992
DATA1                                              DATA1_CD_03_CELL03             2260992
DATA1                                              DATA1_CD_00_CELL03             2260992
                     ******************************                                ----------------
                     total_FG                                                              13565952
                                                                                   ----------------
total                                                                                      40697856

For the DATA1 diskgroup, USABLE_FILE_MB shows a negative value (-2550508 MB).

SQL> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free,  usable_file_mb use_mb from v$asm_diskgroup where name = 'DATA1';

NAME                      STATE       TYPE     TOTAL_MB    FREE_MB   REQ_FREE     USE_MB
------------------------- ----------- ------ ---------- ---------- ---------- ----------
DATA1                      MOUNTED     NORMAL   40697856    8464936   13565952   -2550508
                                                                                                              ----------
total                                                                                                           40697856

TOTAL_MB :- Refers to the total capacity of the diskgroup in MB.
FREE_MB :- Refers to the raw free space available in the diskgroup in MB.

FREE_MB = (TOTAL_MB - (HOT_USED_MB + COLD_USED_MB))

REQUIRED_MIRROR_FREE_MB :- Indicates how much free space is required in an ASM disk group to restore redundancy after the failure of an ASM disk or ASM failure group. On Exadata it is the disk capacity of one failure group.

USABLE_FILE_MB :- Indicates how much space is available in an ASM disk group considering the redundancy level of the disk group.

It is calculated as:

USABLE_FILE_MB = (FREE_MB - REQUIRED_MIRROR_FREE_MB) / 2 --> for normal redundancy
USABLE_FILE_MB = (FREE_MB - REQUIRED_MIRROR_FREE_MB) / 3 --> for high redundancy

Also note that an ASM diskgroup does not actually set aside space based on required_mirror_free_mb; it is merely calculated and used to derive usable_file_mb.
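Plugging in the DATA1 numbers from above confirms the math:

USABLE_FILE_MB = (8464936 - 13565952) / 2 = -2550508 MB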

While reading MOS Doc ID 1551288.1 I came across some interesting terms and a script which I wanted to share (at least for those of you who might not be familiar with them).

Failure coverage refers to the amount of space in a disk group that will be used to re-mirror data in the event of some storage failure.

1. Disk Failure Coverage :- Refers to having enough free space to allow data to be re-mirrored (rebalanced) after a single disk failure with normal redundancy (or a double disk failure with high redundancy).

2. Cell Failure Coverage :- Refers to having enough free space to allow data to be re-mirrored after the loss of one entire storage cell.

Reserving space in the disk group means that you monitor the disk group to ensure that FREE_MB never goes below the minimum amount needed for disk or cell failure coverage.
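A rough first-order check along those lines (a sketch; the MOS script adds overhead factors on top of this, so treat it only as an approximation):

select g.name diskgroup, g.free_mb,
       max(fg.fg_mb) largest_failgroup_mb,
       case when g.free_mb >= max(fg.fg_mb) then 'PASS' else 'FAIL' end cell_coverage
from   v$asm_diskgroup g,
       (select group_number, failgroup, sum(total_mb) fg_mb
          from v$asm_disk
         group by group_number, failgroup) fg
where  g.group_number = fg.group_number
group by g.name, g.free_mb;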

I ran the script provided in MOS Doc ID 1551288.1, and below was the output:

Description of Derived Values:
One Cell Required Mirror Free MB : Required Mirror Free MB to permit successful rebalance after losing largest CELL regardless of redundancy type
Disk Required Mirror Free MB     : Space needed to rebalance after loss of single or double disk failure (for normal or high redundancy)
Disk Usable File MB              : Usable space available after reserving space for disk failure and accounting for mirroring
Cell Usable File MB              : Usable space available after reserving space for SINGLE cell failure and accounting for mirroring
.  .  .
ASM Version: 11.2.0.4
.  .  .
----------------------------------------------------------------------------------------------------------------------------------------------------
|          |         |     |          |            |            |            |Cell Req'd  |Disk Req'd  |            |            |    |    |       |
|          |DG       |Num  |Disk Size |DG Total    |DG Used     |DG Free     |Mirror Free |Mirror Free |Disk Usable |Cell Usable |    |    |PCT    |
|DG Name   |Type     |Disks|MB        |MB          |MB          |MB          |MB          |MB          |File MB     |File MB     |DFC |CFC |Util   |
----------------------------------------------------------------------------------------------------------------------------------------------------
|DATA1    |NORMAL   |   18| 2,260,992|  40,697,856|  32,233,944|   8,463,912|  14,922,547|   2,761,008|   2,851,452|  -3,229,318|PASS|FAIL|  79.2%|
|DBFS_DG  |NORMAL   |   12|    34,608|     415,296|      48,076|     367,220|     152,275|      59,425|     153,898|     107,472|PASS|PASS|  11.6%|
|RECO1    |NORMAL   |   18|   565,360|  10,176,480|   1,171,220|   9,005,260|   3,731,376|     703,460|   4,150,900|   2,636,942|PASS|PASS|  11.5%|
----------------------------------------------------------------------------------------------------------------------------------------------------
Cell Failure Coverage Freespace Failures Detected. Warning Message Follows.
Enough Free Space to Rebalance after loss of ONE cell: WARNING (However, cell failure is very rare)
.  .  .
Script completed.

So here I am good for a single disk failure but not for the failure of one entire cell. Basically, I need to either add disks to the disk group or free up some space.

This post is more of a note for myself to refer back to. Hope it is useful for some of you too :)

Plan change using load_plans_from_cursor_cache

This post is more of a note for myself and might be helpful to a few others.

Assuming the DB is 11gR2 and baselines/SPM are in use.

When a new query is introduced in the DB, it might run with a good plan, but sometimes it picks up a wrong plan. It could be that an index range scan was possible, but the SQL goes for a full table scan (FTS), causing a performance issue.

At times we have issues where a SQL is newly introduced to the system and starts taking longer than expected, increases CPU, or sessions start piling up on some particular wait event.

Now suppose a new SQL is introduced and you see lots of sessions running it waiting on 'direct path read', and the SQL is taking more time than expected because it is mostly doing an FTS. What can be done in such a case?

If we have the same sql_id running fine as expected in some other prod db, we can simply migrate the baseline using the steps below (overview only; a minimal sketch follows the list):

1. dbms_spm.create_stgtab_baseline
2. dbms_spm.pack_stgtab_baseline
3. Export the staging table from source db
4. Import the dump in target db
5. dbms_spm.unpack_stgtab_baseline
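A minimal sketch of steps 1, 2 and 5 (the staging table name and owner are examples):

-- On the source db
exec DBMS_SPM.CREATE_STGTAB_BASELINE(table_name => 'SPM_STGTAB', table_owner => 'SCOTT');
variable n number
exec :n := DBMS_SPM.PACK_STGTAB_BASELINE(table_name => 'SPM_STGTAB', table_owner => 'SCOTT', sql_handle => 'SQL_677er77f1f7bf077');

-- expdp the SPM_STGTAB table, impdp it into the target db, then on the target:
exec :n := DBMS_SPM.UNPACK_STGTAB_BASELINE(table_name => 'SPM_STGTAB', table_owner => 'SCOTT');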

In case you don’t have it running anywhere and you know an index scan would be better, you can fake the plan by generating an execution plan using the index and transfer that execution plan to the badly performing SQL using dbms_spm.load_plans_from_cursor_cache().

The original sql_id 9x7g8gyjzr95d is doing an FTS, its sql_handle is SQL_677er77f1f7bf077, and we have an index which can be used for better performance.


sql> !more bind.sql
accept SQL_ID prompt 'Enter SQL_ID:- '
col name format a10;
col VALUE_STRING format a30;
select sql_id, NAME, position,DATATYPE_STRING, VALUE_STRING from v$sql_bind_capture where sql_id = '&SQL_ID';

sql>@bind
Enter SQL_ID:- 9x7g8gyjzr95d
old   1: select sql_id, NAME, position,DATATYPE_STRING, VALUE_STRING from v$sql_bind_capture where sql_id = '&SQL_ID'
new   1: select sql_id, NAME, position,DATATYPE_STRING, VALUE_STRING from v$sql_bind_capture where sql_id = '9x7g8gyjzr95d'

SQL_ID	      NAME	   POSITION DATATYPE_STRING VALUE_STRING
------------- ---------- ---------- --------------- ------------------------------
9x7g8gyjzr95d :1		  1 VARCHAR2(128)   test.xxxxxx

sql>variable lu VARCHAR2(128)
sql>exec :lu:='test.xxxxxx'

PL/SQL procedure successfully completed.

sql>select /*+ index(ia i_demand_src) */ creation_date - time_util.local_time_offset_from_utc() creation_date, ia.* from invent_audits ia where demand_src = :lu

...........
...........

2 rows selected

sql>select sql_id,plan_hash_value from v$sql where sql_text like 'select /*+ index(ia i_demand_src) */ %';

SQL_ID	      PLAN_HASH_VALUE
------------- ---------------
vbw0xrhd0nv00	    1852372768

Now we want plan_hash_value 1852372768 to be used by our original query. So:

SET SERVEROUTPUT ON
DECLARE
  l_plans_loaded  PLS_INTEGER;
BEGIN
  l_plans_loaded := DBMS_SPM.load_plans_from_cursor_cache(
    sql_id => '&sql_id',
    plan_hash_value => '&plan_hash_value',
    sql_handle => '&handle');
  DBMS_OUTPUT.put_line('Plans Loaded: ' || l_plans_loaded);
END;
/
Enter value for sql_id: vbw0xrhd0nv00  (the new sql_id, with the index)
Enter value for plan_hash_value: 1852372768 (the new plan_hash_value for the new sql_id using the index)
Enter value for handle: SQL_677er77f1f7bf077 (the sql_handle of the original/problematic sql with sql_id 9x7g8gyjzr95d)

The sql_id and plan_hash_value are enough to create a SQL baseline for one query using the execution plan of another query.

The new baseline will have ENABLED and ACCEPTED set to 'YES', and for the older baseline ACCEPTED will be changed to 'NO'.
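This can be verified from dba_sql_plan_baselines:

select sql_handle, plan_name, enabled, accepted
from   dba_sql_plan_baselines
where  sql_handle = 'SQL_677er77f1f7bf077';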

Flush the sql_id once the new baseline is created ;)
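One hedged way to do that without bouncing anything is dbms_shared_pool.purge, where the name is the address,hash_value pair from v$sqlarea:

select address||','||hash_value from v$sqlarea where sql_id = '9x7g8gyjzr95d';
exec sys.dbms_shared_pool.purge('&address_hash_value','C');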

Hope this helps!!!!

Few Scripts for Identifying Performance Issues using DBA_HIST Views

It has been pretty long since I last blogged. The past year was a little busy on both the personal and professional fronts. But this year I am planning to be more active in sharing and learning, not only around Oracle DBMS but possibly a few other things too.

Now, coming back to this blog, I wanted to share certain SQL scripts which I have been using mostly for doing RCA on issues. Sometimes issues are reported late and the DBA is asked for an RCA. In the past few months I worked on multiple commit latency and high CPU spike issues, which led to writing these scripts to identify the events, sql_ids, modules, etc., in order to catch the culprit.

If the issue is currently being worked on, then V$ACTIVE_SESSION_HISTORY is the best place to start, along with v$lock and a few other views. In case the spike was seen a few hours or a day back, the data to diagnose can be retrieved from DBA_HIST_ACTIVE_SESS_HISTORY.

As we know the time when the issue occurred, we can use the SQL below to identify the top EVENTs during that time frame. I am displaying only the events with count(*) > 50.

break on TIME skip 1 

accept start_time prompt 'Enter start time [ DD-MM-YY HH24:MI ]: '
accept end_time prompt 'Enter end time [ DD-MM-YY HH24:MI ]: '


select to_char(sample_time,'DD-MM HH24:MI') time,event,count(*) from dba_hist_active_sess_history where sample_time between to_date ('&start_time','DD-MM-YY HH24:MI') and to_date ('&end_time','DD-MM-YY HH24:MI') group by to_char(sample_time,'DD-MM HH24:MI'),event having count(*) > 50 order by 1;

Output –


TIME	    EVENT						 COUNT(1)
----------- -------------------------------------------------- ----------
29-11 09:29 db file sequential read		                    143

29-11 09:31 db file sequential read		                    183
	    library cache: mutex X		                    120
	    log file sync			                    656

29-11 09:32 db file sequential read		                    153
	    library cache: mutex X		                    129
	    log file sync			                    285

To check the sqlid which were most active during that time frame we can use the below sql –

break on TIME skip 1 

accept start_time prompt 'Enter start time [ DD-MM-YY HH24:MI ]: '
accept end_time prompt 'Enter end time [ DD-MM-YY HH24:MI ]: '
accept N prompt 'Enter rownumber to display [N]: '


col module for a45
col opname for a15
select time,sql_id,plan_hash_value,opname,module,count from 
        (select to_char(sample_time,'DD-MM-YY HH24:MI') time,sql_id,sql_plan_hash_value plan_hash_value,module,SQL_OPNAME opname,count(1) count,
          ROW_NUMBER ()  OVER (PARTITION BY to_char(sample_time,'DD-MM-YY HH24:MI') order by count(1) DESC) as rownumber 
          from  dba_hist_active_sess_history  where sample_time between to_date ('&start_time','DD-MM-YY HH24:MI') and to_date ('&end_time','DD-MM-YY HH24:MI') and sql_id is not null
group by to_char(sample_time,'DD-MM-YY HH24:MI'),sql_id,sql_plan_hash_value,module,SQL_OPNAME)
where rownumber <=&N;


undef start_time
undef end_time
undef N

Output –


TIME	       SQL_ID	     PLAN_HASH_VALUE OPNAME	     MODULE						COUNT
-------------- ------------- --------------- --------------- --------------------------------------------- ----------
10-01-14 07:35 aurvkajbfxr0z	  2569592323 SELECT	     ABC				         3
	       ccdks1ftnc7x5	   641461876 SELECT	     ABC				         3
	       27yu9pxlppscn	  2950873079 SELECT	     XYZ			                 2
	       38u8w2hohzhha		   0 INSERT	     DEF			                 2
	       89bqc3gp18zya	   438142338 INSERT	     XYZ				         2

10-01-14 07:36 0y95krfumnbr5	   488120578 DELETE	     XYZ				         5
	       1gqi6rs1nj113	  3888582233 INSERT	     DEF				         3
	       2sktfcq1vmd9r		   0 INSERT	     ABC				         2
	       3nbmuejym2ppk		   0 PL/SQL EXECUTE  ABC					    2
	       4vdjfsmzqbyhq	   324236703 SELECT	     ABC				         2

To find top executions between certain time frame –

col BEGIN_INTERVAL_TIME format a30
col module format a50;
col plan_hash_value for 99999999999999

break on TIME skip 1 

accept start_time prompt 'Enter start time [ DD-MM-YY HH24:MI ]: '
accept end_time prompt 'Enter end time [ DD-MM-YY HH24:MI ]: '
accept N prompt 'Enter rownumber to display [N]: '

select to_char(BEGIN_INTERVAL_TIME,'DD-MM-YY HH24:MI') time,sql_id,PLAN_HASH_VALUE,module,"Executions","BG/exec","DR/exec","ET/exec","CT/exec" from (
select  hs.BEGIN_INTERVAL_TIME,
        hss.sql_id,
        hss.plan_hash_value,
        hss.MODULE,
        sum(hss.EXECUTIONS_DELTA) "Executions",
        round(sum(hss.BUFFER_GETS_DELTA)/decode(nvl(sum(hss.EXECUTIONS_DELTA),0),0,1,sum(hss.EXECUTIONS_DELTA))) "BG/exec",
        round(sum(hss.DISK_READS_DELTA)/decode(nvl(sum(hss.EXECUTIONS_DELTA),0),0,1,nvl(sum(hss.EXECUTIONS_DELTA),0))) "DR/exec",
        round(sum(hss.ELAPSED_TIME_DELTA)/decode(nvl(sum(hss.EXECUTIONS_DELTA),0),0,1,nvl(sum(hss.EXECUTIONS_DELTA),0))/1000000) "ET/exec",
        round(sum(hss.CPU_TIME_DELTA)/decode(nvl(sum(hss.EXECUTIONS_DELTA),0),0,1,nvl(sum(hss.EXECUTIONS_DELTA),0))/1000000) "CT/exec",
        ROW_NUMBER ()  OVER (PARTITION BY hs.BEGIN_INTERVAL_TIME ORDER BY sum(hss.EXECUTIONS_DELTA) DESC) as rownumber   
from    dba_hist_sqlstat hss, dba_hist_snapshot hs
where   hss.snap_id=hs.snap_id
and     hs.BEGIN_INTERVAL_TIME between to_date ('&start_time','DD-MM-YY HH24:MI') and to_date ('&end_time','DD-MM-YY HH24:MI')
group by hs.BEGIN_INTERVAL_TIME, hss.sql_id, hss.plan_hash_value, hss.MODULE order by 1) where rownumber <= &N;

undef start_time
undef end_time
undef N

ASH and the DBA_HIST views are awesome tools for diagnosing an issue.
I hope the scripts will be useful for you too!!!

Identifying Bloated Index in Oracle

Indexes have always been a topic of interest for DBAs/developers. When it comes to index rebuilds, there are many opinions floating around the internet on when to rebuild. Many say that when the BLEVEL is > 3 one should rebuild the index. I don’t believe in that, and I think I have never seen BLEVEL > 3 for an index so far.

Over a period of time, an index can get fragmented because of the DML occurring on the table. The free space within an index block can only be reused by incoming column values that belong in that block, since the index structure is kept sorted.

Now, suppose you have a table with a CREATION_DATE column (DEFAULT sysdate); every night data is loaded into it, and as per the application logic the previous day's data is deleted. An index on the creation_date column will slowly and gradually increase in size, as the left side of the index stays empty while the index keeps growing toward the right.

With this kind of index, performance may degrade for SQLs, plans may flip, etc. It is sometimes good to rebuild such indexes. But how to identify which indexes to rebuild is the question.
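As a side note, a classic (if intrusive) way to check a single suspect index is ANALYZE ... VALIDATE STRUCTURE, which populates INDEX_STATS; it blocks DML on the table while it runs, so use it with care. The index name below is just an example taken from the output further down:

analyze index anand.I_DEMAND validate structure;
select lf_rows, del_lf_rows,
       round(del_lf_rows*100/nullif(lf_rows,0),2) pct_deleted
from   index_stats;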

I was working on finding which indexes are bloated, and below is the SQL I came up with, based on a few pieces of logic (explained after the output):

WITH spv AS (select di.table_name,
spv.object_name,
di.leaf_blocks,
di.index_type,
di.num_rows,
decode(di.uniqueness,'UNIQUE',0,1)uniq_ind,
di.last_analyzed,sum(bytes),
sum(io_cost) 
from v$sql_plan spv,dba_indexes di
     where spv.object_owner = 'ANAND' and         
           spv.object_type LIKE '%INDEX%' and 
           spv.object_name=di.index_name and 
           spv.object_owner=di.owner and 
           di.leaf_blocks > 1000
     group by di.table_name,
              spv.object_name,
              di.leaf_blocks,
              di.index_type,
              di.num_rows,
              di.uniqueness,
              di.last_analyzed order by 3)
select spv.table_name,
       spv.OBJECT_NAME index_name,
       spv.leaf_blocks leaf_blocks,
       round (100 / 90 *(spv.num_rows * (tab.rowid_length + spv.uniq_ind + 4) + sum((tc.avg_col_len)*(tab.num_rows)))/(8192 - 192))target_blocks, 
       round(((((spv.LEAF_BLOCKS) - (100 / 90 *(spv.num_rows * (tab.rowid_length + spv.uniq_ind + 4) +  sum((tc.avg_col_len)*(tab.num_rows)))/(8192 - 192)))/spv.LEAF_BLOCKS)*100)) DIFF_PCT
from spv, 
     (select table_name,num_rows,decode(partitioned,'YES',10,6) rowid_length  from dba_tables where owner='ANAND') tab,
     dba_tab_cols tc,
     dba_ind_columns ic 
where
     spv.table_name=tab.table_name and
     tc.column_name = ic.column_name  and
     tab.TABLE_NAME=tc.table_name and 
     ic.TABLE_NAME=tab.table_name and 
     spv.object_name=ic.INDEX_NAME
having round(((((spv.LEAF_BLOCKS) - (100 / 90 *(spv.num_rows * (tab.rowid_length + spv.uniq_ind + 4) + sum((tc.avg_col_len)*(tab.num_rows)))/(8192 - 192)))/spv.LEAF_BLOCKS)*100)) > 70
group by spv.table_name,
         spv.object_name,
         spv.leaf_blocks,
         spv.num_rows,spv.uniq_ind, 
         tab.rowid_length 
order by 5

Output from a test db –

TABLE_NAME                     INDEX_NAME                     LEAF_BLOCKS TARGET_BLOCKS   DIFF_PCT
------------------------------ ------------------------------ ----------- ------------- ----------
COMPLETED_CUST                 I_CCS_COND                           25340          7236         71
CUST_ONE_ITEMS                 I_CSI_REQ_ID                          7999          2351         71
DEMANDS                        I_DEMAND                             26920          7478         72
.....................................
.....................................
PROP_VALUES                    PK_SID_VALUE                         11847           831         93
SLA_METRS                      PK_SLA_METCS                         12840           129         99

--> leaf_blocks is the actual number of leaf blocks of the index, from dba_indexes.
--> target_blocks is the estimated number of leaf blocks based on the current stats on the table, so the stats need to be fresh.
--> The sql displays all index names where the % difference between leaf_blocks and the estimated target_blocks is > 70.
--> The index names come from v$sql_plan, as those are the indexes actually being used by the optimizer.
--> Only indexes having leaf_blocks > 1000 are selected.

Now, let's rebuild the indexes and see whether we get anywhere close to the estimated target_blocks.

12:53:24 DBA@test:1> select leaf_blocks from dba_indexes where index_name='PK_SLA_METCS';

LEAF_BLOCKS
-----------
        12840

12:52:59 DBA@test:1> alter index xxx.PK_SLA_METCS rebuild online parallel 4;

Index altered.

Elapsed: 00:00:00.87
12:53:24 DBA@test:1> select leaf_blocks from dba_indexes where index_name='PK_SLA_METCS';

LEAF_BLOCKS
-----------
        135  <-- Estimated TARGET_BLOCK was 129 

Elapsed: 00:00:00.43

13:15:46 DBA@test:1> select leaf_blocks from dba_indexes where index_name='PK_SID_VALUE';

LEAF_BLOCKS
-----------
      11847

Elapsed: 00:00:00.45
13:15:55 DBA@test:1> alter index xxx.PK_SID_VALUE rebuild online parallel 4;


Index altered.

Elapsed: 00:00:25.25
13:16:42 DBA@test:1> select leaf_blocks from dba_indexes where index_name='PK_SID_VALUE';

LEAF_BLOCKS
-----------
        840 <-- Estimated TARGET_BLOCK was 831

 

I would say this is just version 1.0 :)

REFERENCE –> Script to investigate a b-tree index structure (Doc ID 989186.1)