Histogram on VARCHAR2 column having first 32 character identicals

Recently i read the below link and came to know that,while creating histogram on a VARCHAR2 column, Oracle will only consider the first 32 characters in the column. In other words, if the first 32 bytes of the column values are identical all of the values end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets.

http://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating

1. Create table with column VARCHAR2(35) and insert 35 characters with first 32 characters identical.

SQL> create table hist_test_var (c1 VARCHAR2(35));

Table created.

SQL> 
SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefghi' from dual connect by level  
SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefabc' from dual connect by level  
SQL> insert into hist_test_var select 'abcdefghijklmnopqrstuvwzyzabcdefdef' from dual connect by level  
SQL> insert into hist_test_var select 'sdhskjhdwehdwehdhwdnskdnkwsndkwdhwj' from dual connect by level  
SQL> insert into hist_test_var select 'agjhgsdsjgduywdhsdlksdjlksjdwdjliwj' from dual connect by level  
SQL> commit;

Commit complete.

For the first 3 insert statements the first 32 characters are the same “abcdefghijklmnopqrstuvwzyzabcdef”.Rest 2 are different.

Gather the stats

SQL> exec dbms_stats.gather_table_stats (user,'hist_test_var')

PL/SQL procedure successfully completed.

SQL> 
SQL> col low_val     for a8
SQL> col high_val    for a8
SQL> col data_type   for a9
SQL> col column_name for a11
SQL> 
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS                                                                                                                                                        
----------- -------- -------- --------- ---------- --------------- -----------                                                                                                                                                        
C1          abcdefgh sdhskjhd VARCHAR2          .2 NONE                      1                                                                                                                                                        
            ijklmnop wehdwehd                                                                                                                                                                                                         
            qrstuvwz hwdnskdn                                                                                                                                                                                                         
            yzabcdef kwsndkwd                                     

The above shows lack of histogram on C1. Ran few select statements on the table and gathered the stats back

SQL> exec dbms_stats.gather_table_stats(user,'hist_test_var');

PL/SQL procedure successfully completed.

SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS                                                                                                                                                        
----------- -------- -------- --------- ---------- --------------- -----------                                                                                                                                                        
C1          abcdefgh sdhskjhd VARCHAR2  .000010092 FREQUENCY                 3                                                                                                                                                        
            ijklmnop wehdwehd                                                                                                                                                                                                         
            qrstuvwz hwdnskdn                                                                                                                                                                                                         
            yzabcdef kwsndkwd                     

The above shows NUM_BUCKET as 3, which means while creating the histogram oracle considered “abcdefghijklmnopqrstuvwzyzabcdefghi” , “abcdefghijklmnopqrstuvwzyzabcdefabc” and “abcdefghijklmnopqrstuvwzyzabcdefdef” as same and end up in a single bucket of the histogram even though in reality the values are different and are suppose to be in different histogram buckets.This is because Oracle considers only the first 32 characters in the column.

Lets test with exact 32 characters, keeping 31 characters same.

SQL> create table hist_test_var_31 (c1 VARCHAR2(32));

Table created.

SQL> 
SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeF' from dual connect by level  
SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeG' from dual connect by level  
SQL> insert into hist_test_var_31 select 'abcdefghijklmnopqrstuvwzyzabcdeH' from dual connect by level  
SQL> insert into hist_test_var_31 select 'sdhskjhdwehdwehdhwdnskdnkwsndkwd' from dual connect by level  
SQL> insert into hist_test_var_31 select 'agjhgsdsjgduywdhsdlksdjlksjdwdjl' from dual connect by level  
SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats (user,'hist_test_var_31')

PL/SQL procedure successfully completed.

SQL> 
SQL> 
SQL> col low_val     for a8
SQL> col high_val    for a8
SQL> col data_type   for a9
SQL> col column_name for a11
SQL> 
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR_31' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS                                                                                                                                                        
----------- -------- -------- --------- ---------- --------------- -----------                                                                                                                                                        
C1          abcdefgh sdhskjhd VARCHAR2          .2 NONE                      1                                                                                                                                                        
            ijklmnop wehdwehd                                                                                                                                                                                                         
            qrstuvwz hwdnskdn                                                                                                                                                                                                         
            yzabcdeG kwsndkwd                                                                                                                                                                                                         
                                                                                                                                                                                                                                      
REM #### RAN FEW SELECT QUERIES ####

SQL> exec dbms_stats.gather_table_stats(user,'hist_test_var_31');

PL/SQL procedure successfully completed.

SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_VAR_31' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS                                                                                                                                                        
----------- -------- -------- --------- ---------- --------------- -----------                                                                                                                                                        
C1          abcdefgh sdhskjhd VARCHAR2  .000010121 FREQUENCY                 5                                                                                                                                                        
            ijklmnop wehdwehd                                                                                                                                                                                                         
            qrstuvwz hwdnskdn                                                                                                                                                                                                         
            yzabcdeG kwsndkwd                                                                                                                                                                                                         
                                                                                                                                                                                                                                      

SQL> 

Above shows NUM_BUCKETS 5. The 32nd character of all the distinct values were different, hence making the value different and putting it in a separate bucket.

Does the same happen for column with NUMBER data type.

SQL> create table hist_test_num (c1 NUMBER(35));

Table created.

SQL> 
SQL> 
SQL> insert into hist_test_num select '01234567891011121314151617181910212' from dual connect by level  
SQL> insert into hist_test_num select '01234567891011121314151617181910213' from dual connect by level  
SQL> insert into hist_test_num select '01234567891011121314151617181910214' from dual connect by level  
SQL> insert into hist_test_num select '11111111111111111111111111111111214' from dual connect by level  
SQL> insert into hist_test_num select '22222222222222222222222222222222012' from dual connect by level  
SQL> commit;

Commit complete.

SQL> 
SQL> 
SQL> exec dbms_stats.gather_table_stats (user,'hist_test_num')

PL/SQL procedure successfully completed.

SQL> 
SQL> 
SQL> col low_val     for a8
SQL> col high_val    for a8
SQL> col data_type   for a9
SQL> col column_name for a11
SQL> 
SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_NUM' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS                                                                                                                                                        
----------- -------- -------- --------- ---------- --------------- -----------                                                                                                                                                        
C1          12345678 22222222 NUMBER            .2 NONE                      1                                                                                                                                                        
            91011121 22222222                                                                                                                                                                                                         
            31415161 22222222                                                                                                                                                                                                         
            71819102 22222222                                                                                                                                                                                                         
            12       012                                                                                                                                                                                                              
                                                                                                                                                                                                                                      

REM #### RAN FEW SELECT STATEMENTS ####

SQL> exec dbms_stats.gather_table_stats(user,'hist_test_num');

PL/SQL procedure successfully completed.

SQL> select
  2     a.column_name,
  3     display_raw(a.low_value,b.data_type) as low_val,
  4     display_raw(a.high_value,b.data_type) as high_val,
  5     b.data_type,
  6     a.density,
  7     a.histogram,
  8     a.num_buckets
  9  from
 10     user_tab_col_statistics a, user_tab_cols b
 11  where
 12     a.table_name='HIST_TEST_NUM' and
 13     a.table_name=b.table_name and
 14     a.column_name=b.column_name
 15  /

COLUMN_NAME LOW_VAL  HIGH_VAL DATA_TYPE    DENSITY HISTOGRAM       NUM_BUCKETS                                                                                                                                                        
----------- -------- -------- --------- ---------- --------------- -----------                                                                                                                                                        
C1          12345678 22222222 NUMBER    9.9674E-06 FREQUENCY                 5                                                                                                                                                        
            91011121 22222222                                                                                                                                                                                                         
            31415161 22222222                                                                                                                                                                                                         
            71819102 22222222                                                                                                                                                                                                         
            12       012                                                                                                                                                                                                              
                                                                                                                                                                                                                                      

References –
http://structureddata.org/2011/06/08/implicit-datatype-conversion-histograms-bad-execution-plan/
http://blogs.oracle.com/optimizer/entry/how_do_i_drop_an_existing_histogram_on_a_column_and_stop_the_auto_stats_gathering_job_from_creating

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s