CBC Latch Contention – My Experience

Recently, I got an opportunity to look into CBC (cache buffers chains) latch contention on a 9.2.0.8 database at one of the client's sites. As the number of users connecting to the database increased, its performance degraded and the application became noticeably slow.

The first thing was to check the Statspack report. Let's see what the “Top 5 Wait Events” section had to say:

Top 5 Wait events

Event                                             Waits    Time (s) % Total Ela Time
----------------------------------------- ------------- ----------- ----------------
latch free                                    1,509,709     280,494            77.44
CPU time                                                     36,721            10.14
db file scattered read                       12,965,784      21,092             5.82
db file sequential read                       8,060,280      11,007             3.04
buffer busy waits                               171,452       7,484             2.07
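On 9i, “latch free” is still a single wait event, so a quick way to see which latch sessions are actually waiting on is to decode P2 of the wait (this is a generic sketch, not output from the actual system):

select l.name   latch_name,
       count(*) sessions_waiting
  from v$session_wait w,
       v$latch        l
 where w.event = 'latch free'
   and w.p2    = l.latch#          -- for 'latch free', P2 carries the latch number
 group by l.name
 order by sessions_waiting desc;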

A few more details:

Wait Events for DB (Top 3)
                                                                   Avg
                                                   Total           wait     Waits
         Event                Waits     Timeouts   WaitTime(s)     (ms)     /txn
-------------------------- ------------ ---------- -------------  -------- --------
latch free                  1,509,709    755,962     280,494         186     31.9
db file scattered read      12,965,784       0       21,092           2       274
db file sequential read     8,060,280        0       11,007           1      170.3

Latch Sleep breakdown for DB: Instance:  Snaps: 812 -813
-> ordered by misses desc
Latch Name                    Get Requests       Misses     Sleeps
-------------------------- --------------- ------------ ----------
multiblock read objects         39,774,699    2,237,038    192,388
cache buffers chains           914,095,905    2,147,734    745,743
library cache                   95,843,429    1,742,253    346,113

Checking the topmost SQL statement in “SQL ordered by Gets”:

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     66,719,592          575      116,034.1   14.9  7142.96 #########  674460927

Checking the topmost SQL statement in “SQL ordered by Reads”:

                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     37,702,363          575       65,569.3   57.3  7142.96 #########  674460927

The SQL with hash value 674460927 is at the top for both gets and reads. “Top 5 Physical Reads per Segment for DB” showed the table used in that SQL at the very top as well, accounting for 60% of the total physical reads during the one-hour snapshot.
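The full text of the statement can be pulled from the shared pool using its hash value (the statement itself is not reproduced in this post):

-- Retrieve the complete SQL text for the statement flagged by Statspack
select sql_text
  from v$sqltext
 where hash_value = 674460927
 order by piece;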

Interestingly, the explain plan (generated using EXPLAIN PLAN FOR) for that particular SQL showed:

--------------------------------------------------------------------------------------------
| Id  | Operation                     |  Name                      | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                            |     3 |    99 |    18 |
|*  1 |  FILTER                       |                            |       |       |       |
|   2 |   NESTED LOOPS                |                            |     3 |    99 |    18 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TABLE_A                    |     3 |    69 |    12 |
|*  4 |     INDEX RANGE SCAN          | IN_TAB_A                   |    10 |       |     3 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| TABLE_B                    |     1 |    10 |     2 |
|*  6 |     INDEX UNIQUE SCAN         | IN_TAB_B                   |     1 |       |     1 |
----------------------------------------------------------------------------------------------
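For reference, the plan above comes from EXPLAIN PLAN as mentioned; displaying it via dbms_xplan (available from 9.2 onwards) is one way to do it. The statement below is only a runnable placeholder, since the real SQL is not reproduced here:

-- EXPLAIN PLAN parses the statement and records the estimated plan in PLAN_TABLE
explain plan for
select /* the problem statement (hash value 674460927) goes here */ 1 from dual;

-- One way to display it (@?/rdbms/admin/utlxpls works as well)
select * from table(dbms_xplan.display);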

But the plan from v$sql_plan showed a full table scan on TABLE_A. The same was confirmed by TKPROF.

select id, parent_id,
       lpad(' ', level-1) || operation || ' ' || options operation,
       object_name, cost
  from v$sql_plan
 where hash_value = 674460927;

       ID  PARENT_ID OPERATION                                OBJECT_NAME                                    COST
--------- ---------- ---------------------------------------- ---------------------------------------- ----------
        0            SELECT STATEMENT                                                                       10753
        1          0 NESTED LOOPS                                                                           10753
        2          1 TABLE ACCESS FULL                        TABLE_A                                       10747
        3          1 TABLE ACCESS BY INDEX ROWID              TABLE_B                                         2
        4          3 INDEX UNIQUE SCAN                        IN_TAB_B                                        1

I won't forget to mention and thank Tanel Poder (http://blog.tanelpoder.com/) for the awesome “LatchProf” and “LatchProfX” scripts (http://www.tanelpoder.com/files/scripts/latchprof.sql).

The TKPROF output for the query with hash value 674460927 showed:

call    count  cpu elapsed   disk    query  current rows
------- ------ --- --------  ------  ------ ------- ----
Parse     4     0     0       0        0      0      0
Execute   4     0     0       0        4      0      0
Fetch     4     0     62.26  471166  472065   0      1
------- ------ ----- ------- ------ ------- ------- ----
total    12     0     62.27  471166  472069   0      1

To fetch just one row from the database, the query performed 472,069 logical I/Os and 471,166 physical reads, which was clearly excessive.

About the CBC Latch

This latch is needed when user processes scan the SGA for database cache buffers. When the same block is needed by multiple sessions, each of them must obtain the cache buffers chains latch to walk the same hash chain and reach that block. Access to a block is normally a fast operation, but if enough concurrent sessions access a block frequently and repeatedly, even simple access to the block can become a bottleneck. The most common occurrence of CBC (cache buffers chains) latch contention is when multiple users run nested loop joins on a table and access the inner (driven-into) table via an index.
Reducing contention for the cache buffers chains latch usually requires reducing logical I/O rates by tuning the SQL involved and minimizing its I/O requirements.
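A rough way to drill down from the latch waits to the hot buffers (much cruder than the LatchProf scripts mentioned above, and assuming SYS access for x$bh) is to find the busiest cache buffers chains child latches and then the blocks hanging off them:

-- Busiest cache buffers chains child latches
select addr, child#, gets, misses, sleeps
  from v$latch_children
 where name = 'cache buffers chains'
 order by sleeps desc;

-- Buffers protected by one hot child latch (run as SYS; &hladdr = ADDR from above)
select file#, dbablk, tch
  from x$bh
 where hladdr = '&hladdr'
 order by tch desc;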

The application code was wrapped, so no changes could be made to the SQL itself, which left playing with the index statistics. The clustering factor of the index “IN_TAB_A” was almost equal to the number of rows in the table, which is why the index was not being used. Also worth noting is the COST in the plans shown above: the explain plan showed the index being used with a cost of 18, while the actual plan (from v$sql_plan as well as TKPROF) showed a cost of 10,753. So was it a case of explain plan lying?
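The clustering factor versus row count comparison comes straight from the data dictionary; a quick sketch using the sanitized names from above:

-- A clustering factor close to the table's row count means each index entry
-- tends to point at a different table block, inflating the cost of range scans
select i.index_name,
       i.clustering_factor,
       i.leaf_blocks,
       t.num_rows,
       t.blocks
  from user_indexes i,
       user_tables  t
 where i.table_name = t.table_name
   and i.index_name = 'IN_TAB_A';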

Playing with the index stats did not help much. So, based on the query, I created a composite index and, without gathering the stats on the newly created index, took a 10046 trace of the session running the application. The TKPROF output for the same SQL query with hash value 674460927 showed:

call      count   cpu     elapsed     disk    query   current rows
------- ------- -------- ---------- -------- -------- ------- -------
Parse       3      0.00     0.00        0        0        0      0
Execute     3      0.00     0.00        0        4        0      0
Fetch       3      0.00     2.01       1404     4212      0      1
------- ------- -------- ---------- -------- -------- ------- -------
total       9      0.00     2.02       1404     4216      0      1

The Row Source Operation section in the TKPROF output also confirmed that the new composite index was being used, with no more db file scattered reads. The SQL query also no longer appeared in the Statspack report.

The reason for stressing “without gathering the stats” above is that the moment statistics were gathered on that new composite index, it was no longer used and the query went back to a full table scan with excessive logical I/O.
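In outline, the fix looked something like the sketch below. The index column list and the SID/SERIAL# are hypothetical (they are not given in the post), and the stats-gathering step is shown commented out because running it is exactly what flipped the plan back to the full table scan:

-- Composite index covering the query's predicates (column names hypothetical)
create index in_tab_a_comp on table_a (filter_col1, filter_col2);

-- Level-12 10046 trace (waits + binds) on the application session,
-- identified here by a hypothetical SID and SERIAL#
exec sys.dbms_system.set_ev(123, 456, 10046, 12, '');

-- Deliberately skipped: gathering stats on the new index made the optimizer
-- abandon it again
-- exec dbms_stats.gather_index_stats(ownname => user, indname => 'IN_TAB_A_COMP');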

The latch free wait event dropped from the top of the “Top 5 Wait Events” to fifth position, with 8-15% of total elapsed time. Another major observation was the run queue length on the server: it was around 100-150 during peak hours, so a more powerful server was recommended to the client. A few days later the client bought a new server, and now there is no latch free wait event in the Statspack report. 🙂

In the end, I would like to thank a few Oracle gurus who were helpful in one way or another:

1. Randolf Geist
2. Tanel Poder
3. ViveK Sharma
4. Kyle Hailey
5. Riyaj Shamsudeen

References:

http://sites.google.com/site/embtdbo/wait-event-documentation/oracle-latch-cache-buffers-chains

http://viveklsharma.wordpress.com/2010/05/13/cache-buffers-chains-latch-contention/

http://www.pythian.com/news/1135/tuning-latch-contention-cache-buffers-chain-latches/

http://tech.e2sn.com/oracle/troubleshooting/latch-contention-troubleshooting

http://orainternals.wordpress.com/2008/07/30/tuning-latch-contention-cache-buffers-chain-latches/

2 thoughts on “CBC Latch Contention – My Experience”

  1. Have you finally solved this problem for your client?

    If you want to change the plan of this SQL, you can manually change the clustering factor of the index, or use a stored outline/SQL profile/SQL plan baseline, etc.

    1. Hi,

      Thanks for visiting the blog. Yes, the issue is resolved. I tried playing with the clustering factor but could not arrive at a value that would make the query use the index. So I created a composite index, checked the TKPROF for the module, and it worked. Also, changing the server helped reduce the run queue length.

      Regards,
      Anand
