Index Sample Size

Recently i was asked a question, what would be the sample size percentage for indexes when tables stats are gathered with estimate percent of 5 and cascade set to true.My answer to it was 5 within a second.I had this believe that the sample size percentage for an index is same to the estimate_percent given while gathering the stats.So it was time for some test –

SQL> select * from v$version;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for 32-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
 
SQL> drop table STAT_TEST purge;
 
Table dropped.
 
SQL> create table stat_test (
  2  c1 number not null,
  3  c2 number not null,
  4  c3 varchar2(300) not null);
 
Table created.
 
SQL> create index stat_c2_idx on stat_test(c2);
 
Index created.
 
SQL>
SQL> create unique index stat_c1_indx on stat_test(c1);
 
Index created.
 
SQL> create index stat_c3_idx on stat_test(c3);
 
Index created.
 
SQL> insert into stat_test
  2  select
  3  rownum as c1,
  4  1 as c2,
  5  rpad('A',300,'A') as c3
  6  from
  7  dual
  8  connect by
  9  level / 
Enter value for tblown: ANAND
Enter value for tblnm: STAT_TEST
 
SE SORT1                          SORT2                          S INDEX_NAME                     LAST_ANALYZED   NUM_ROWS SAMPLE_PCT DISTINCT_KEYS CLUSTERING_FACTOR FLAGS
-- ------------------------------ ------------------------------ - ------------------------------ ------------- ---------- ---------- ------------- ----------------- ----------
I  ANAND                          STAT_C2_IDX                      STAT_C2_IDX                                                                                N
I  ANAND                          STAT_C1_INDX                     STAT_C1_INDX                                                                               U
I  ANAND                          STAT_C3_IDX                      STAT_C3_IDX                                                                                N

Gathered stats on the index

SQL> exec dbms_stats.gather_index_stats(ownname =>'ANAND',indname =>'STAT_C1_INDX',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE);
 
PL/SQL procedure successfully completed.
 
SQL> exec dbms_stats.gather_index_stats(ownname =>'ANAND',indname =>'STAT_C2_IDX',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE);
 
PL/SQL procedure successfully completed.
 
SQL> exec dbms_stats.gather_index_stats(ownname =>'ANAND',indname =>'STAT_C3_IDX',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE);
 
PL/SQL procedure successfully completed.

As i have given estimate_percent i would expect sample_percentage to be 5%. So lets check –

SQL> / 
Enter value for tblown: ANAND
Enter value for tblnm: STAT_TEST
 
SE SORT1                          SORT2                          S INDEX_NAME                     LAST_ANALYZED   NUM_ROWS SAMPLE_PCT DISTINCT_KEYS CLUSTERING_FACTOR FLAGS
-- ------------------------------ ------------------------------ - ------------------------------ ------------- ---------- ---------- ------------- ----------------- ----------
I  ANAND                          STAT_C2_IDX                      STAT_C2_IDX                    20120518 0035    1011697         40             1             46109 N
I  ANAND                          STAT_C1_INDX                     STAT_C1_INDX                   20120518 0031    1016203         61       1016203             47680 U
I  ANAND                          STAT_C3_IDX                      STAT_C3_IDX                    20120518 0035     998800          5             1             68460 N

Deleted the stats on the index and gathered with dbms_stats.gather_table_stats with estimate
_percentage of 5% to check what happens in such case

SQL> exec dbms_stats.gather_table_stats(ownname =>'ANAND',tabname =>'STAT_TEST',estimate_percent=>5,degree => DBMS_STATS.AUTO_DEGREE);
 
PL/SQL procedure successfully completed.
 
SQL> / 
Enter value for tblown: ANAND
Enter value for tblnm: STAT_TEST
 
SE SORT1                          SORT2                          S INDEX_NAME                     LAST_ANALYZED   NUM_ROWS SAMPLE_PCT DISTINCT_KEYS CLUSTERING_FACTOR FLAGS
-- ------------------------------ ------------------------------ - ------------------------------ ------------- ---------- ---------- ------------- ----------------- ----------
I  ANAND                          STAT_C2_IDX                      STAT_C2_IDX                    20120518 0037    1008329         40             1             45928 N
I  ANAND                          STAT_C1_INDX                     STAT_C1_INDX                   20120518 0037    1007530         61       1007530             47296 U
I  ANAND                          STAT_C3_IDX                      STAT_C3_IDX                    20120518 0037    1008480          5             1             68880 N
 
SQL> select 'T ' seg_type,
  2         t.owner sort1,
  3         t.table_name sort2,
  4         null sort3,
  5         t.table_name,
  6         to_char(t.last_analyzed,'yyyymmdd hh24mi') last_analyzed,
  7         t.num_rows,
  8         -- t.sample_size,
  9         decode(t.num_rows,
 10                null,to_number(null),
 11                0,100,
 12                round((t.sample_size*100)/t.num_rows,0)) sample_pct,
 13    from dba_tables t
 14   where t.owner = upper('&tblown')
 15     and t.table_name like upper('&tblnm')
 16  / 
Enter value for tblown: ANAND
Enter value for tblnm: STAT_TEST
 
SE SORT1                          SORT2                          S TABLE_NAME                LAST_ANALYZED   NUM_ROWS SAMPLE_PCT 
-- ------------------------------ ------------------------------ - ------------------------- ------------- ---------- ---------- -
T  ANAND                          STAT_TEST                        STAT_TEST                 20120518 0037    1001020          5

So the table shows sample_pct as 5% but for indexes is different.

I posted the same on OTN forum and Sir Jonathan Lewis replied —

According to a comment I wrote a couple of years ago ( https://forums.oracle.com/forums/thread.jspa?threadID=1116700 ) Oracle used to have a note on MOS stating that because the data in an index was sorted, and therefore easily able to produce a distorted picture on a small sample, the index sample was adjusted if necessary to aim for a minimum number of leaf blocks which is about 900.

The two indexes where you’ve got large samples very short keys, so the 5% was probably too few blocks. The index where you got a 5% sample was a long key, so maybe the 5% was comfortably over 900.

After which i did few checks and observed that is if an index is having upto ~1100 leaf blocks, then stats with 100% sample is being generated on the index and as the leaf block increase oracle internally determines the estimate percentage for sampling.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s