latch: cache buffers chains and rollback

I had an interesting encounter with latch: cbc contention early this week. During my oncall I received page for Load of 206.81 exceeded threshold of 150. After I logged into server , the server load average was continously increasing and all the top PIDs were of oracle processes. After logging into database, I saw multiple sessions waiting on latch: cache buffer chains wait event

load average: 258.52, 244.27, 226.15

select username,sql_id,event,count(*) from v$session where event like 'latch%' group by username,sql_id,event order by 4;
 
USERNAME                  SQL_ID        EVENT                            COUNT(*)
------------------------- ------------- ------------------------------ ----------
USER_EAS4                5j9xrj07xqysz latch free                              1
USER_EAS7                4f6dfhuxbz18y latch free                              1
USER_EAS7                a1wwm6tpukyh8 latch free                              1
USER_EAS7                c1sa78t1rvxkq latch free                              1
USER_EAS7                gc4xfk544n98v latch free                              2
USER_EAS7                bmhmdqvt9c2qz latch free                              2
USER_EAS7                4ycfn45y3vj5p latch free                              8
USER_EAS7                4ycfn45y3vj5p latch: cache buffers chains            23
USER_EAS7                9cg18w6mfqphn latch: cache buffers chains            51
USER_EAS7                9cg18w6mfqphn latch free                            144

sql monitor report of one session for sqlid 9cg18w6mfqphn

Global Stats
=======================================================
| Elapsed |   Cpu   | Concurrency |  Other   | Buffer |
| Time(s) | Time(s) |  Waits(s)   | Waits(s) |  Gets  |
=======================================================
|    2243 |    2132 |        0.00 |      111 |   260M |
=======================================================
 
SQL Plan Monitoring Details (Plan Hash Value=2956480150)
=============================================================================================================================================================
| Id   |               Operation               |         Name         |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail |
|      |                                       |                      | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |
=============================================================================================================================================================
|    0 | SELECT STATEMENT                      |                      |         |      |           |        |     1 |          |          |                 |
|    1 |   FILTER                              |                      |         |      |           |        |     1 |          |          |                 |
|    2 |    PARTITION RANGE SINGLE             |                      |      18 |  133 |           |        |     1 |          |          |                 |
|    3 |     TABLE ACCESS BY LOCAL INDEX ROWID | MESSAGE_DEMO_CTC     |      18 |  133 |           |        |     1 |          |          |                 |
| -> 4 |      INDEX RANGE SCAN                 | MESSAGE_DEMO_CTC_ID6 |      18 |  133 |      2246 |     +1 |     1 |        0 |   100.00 | Cpu (2194)      |
=============================================================================================================================================================

AWR performance trend for the sqlid for past 5 days:-

@awr_sqlid_perf_trend 9cg18w6mfqphn 5 .25
 
TIME                 EXECUTIONS   ET_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC  DISK_READS_1EXEC
------------------- ----------- ------------ ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- -----------------
23.09.2015 00:15:00          12      117.923          117.667           .001           .000           .000           .000                 .000      23049569.333              .083
23.09.2015 00:30:00          14     1284.493         1280.963           .002           .000           .000           .000                 .000     243817703.571             1.214
23.09.2015 00:45:00          26     1579.532         1574.306           .016           .000           .000           .000                 .000              .000             5.962
23.09.2015 01:00:00          41     1729.918         1718.699           .002           .000           .000           .000                 .000      26916357.415              .317
23.09.2015 01:15:00          26     3850.767         3739.324           .011           .000           .000           .000                 .000              .000             1.731
23.09.2015 01:30:00          27     4565.527         3868.424           .017           .000           .000           .000                 .000      82060590.481             4.407
 

At the first glance it seemed to me as the sql was newly introduced to the database as AWR showed it from the same day, even though I had asked from 5 days snapshots.

The buffer gets per execution for the sql was too high and with multiple sessions with the same sqlid caused latch cbc and in turn high load. So, had to find what was causing this high buffer gets.

The execution plan for sql looked fine. The stats were fine and the index picked by the optimizer was fine too.At this point came in “latchprofx” by Tanel Poder 🙂

SELECT * FROM (
        SELECT
            event
          , TRIM(TO_CHAR(p1, 'XXXXXXXXXXXXXXXX')) latch_addr
          , TRIM(ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1))||'%' PCT
          , COUNT(*)
        FROM
            v$active_session_history
        WHERE
           event = 'latch: cache buffers chains'
       AND session_state = 'WAITING'
       GROUP BY
           event
        , p1
       ORDER BY
           COUNT(*) DESC
   )
   WHERE ROWNUM <= 10
  /  

EVENT                                  LATCH_ADDR        PCT              COUNT(*)
-----------------------------------  ----------------- ---------------- ----------
latch: cache buffers chains           7B9A57DB50        92.2%                 448
latch: cache buffers chains           7AE4EBCD68        2.5%                   12
latch: cache buffers chains           7B682705B8        1%                      5
latch: cache buffers chains           7B6FB0AFD0        .8%                     4
latch: cache buffers chains           7ADA39D6B8        .4%                     2
latch: cache buffers chains           7AE4D206F8        .2%                     1
latch: cache buffers chains           7B4A1A04B0        .2%                     1
latch: cache buffers chains           7B475726C0        .2%                     1
latch: cache buffers chains           7ACB819888        .2%                     1
latch: cache buffers chains           7AC6958BA8        .2%                     1

@latchprofx sid,name,sqlid,object % 7B9A57DB50 100000
 
-- LatchProfX 1.21 by Tanel Poder ( http://www.tanelpoder.com )
 
  SID NAME                                SQLID                      OBJECT       Held       Gets  Held %     Held ms Avg hold ms
----- ----------------------------------- --------------- ----------------- ---------- ---------- ------- ----------- -----------
 7155 cache buffers chains                9cg18w6mfqphn            1ECA461B          4          4     .00       2.025        .506
 1696 cache buffers chains                4ycfn45y3vj5p            1ECA461B          3          3     .00       1.519        .506
 8689 cache buffers chains                9cg18w6mfqphn            1ECA461B          3          3     .00       1.519        .506
 3809 cache buffers chains                9cg18w6mfqphn            1ECA461B          3          3     .00       1.519        .506
 3969 cache buffers chains                9cg18w6mfqphn            1ECA461B          3          3     .00       1.519        .506
 2742 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 2482 cache buffers chains                4ycfn45y3vj5p            1ECA461B          2          2     .00       1.013        .506
  439 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 1514 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 6894 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 6524 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 6101 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506
 3545 cache buffers chains                9cg18w6mfqphn            1ECA461B          2          2     .00       1.013        .506

All rows point to object “1ECA461B”

@dba 1ECA461B
 
    RFILE#     BLOCK# DUMP_CMD
---------- ---------- ------------------------------------------------
       123     673307 -- alter system dump datafile 123 block 673307

select name from v$datafile where file#=123;
 
NAME
---------------------------------------------------------------
+DATA01/usrprd04/datafile/undotbs1.1134.818027637

Ok, so something related to undo!!!!.

The alert log showed ORA-01555 for an Insert SQL in the same table

Wed Sep 23 02:44:32 2015
ORA-01555 caused by SQL statement below (SQL ID: 72pwjk9bxyygw, Query Duration=13521 sec, SCN: 0x0891.8fff8587):
 
                        INSERT INTO message_demo_ctc (.....)  select ........
						
Wed Sep 23 02:45:26 2015
Completed checkpoint up to RBA [0x2a16.2.10], SCN: 9421305667060

User/Pid                   Megs Used MINUTES_ACTIVE MODULE                                             Recs upd/del
-------------------- --------------- -------------- -------------------------------------------------- ------------
USER_EAS7/(3858)             392.82            264 cli@mesagae_demo_ctc.php                              3,945,675

Once the rollback completed, eventhing was back to normal. The buffer gets for sqlid ‘9cg18w6mfqphn’ was down to 24,950 from ~652617k. We had waited for almost 30-40mins for the rollback, could have killed the SID and SMON would have made rollback using parallel processes, which could have been faster.

2 thoughts on “latch: cache buffers chains and rollback

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s