Gather_table_stats - Histograms / Buckets

Hello guys,
I have noticed that the default behavior of the PL/SQL procedure gather_table_stats changed from 9i to 10g.
The default value for method_opt was "FOR ALL COLUMNS SIZE 1" in 9i, and it is now "get_param('METHOD_OPT')", which defaults to "FOR ALL COLUMNS SIZE AUTO".
Now, in my environment I don't need histograms on columns. As far as I understand, "FOR ALL COLUMNS SIZE 1" means no histograms...
BUT... why is one bucket needed for "no histograms"? As far as I understand, all of a column's values are placed in one interval, and the most frequent value (estimated or computed) in each column is stored as the endpoint value.
But if I have one bucket with one most frequent value, I am using a histogram... or does Oracle ignore this endpoint value when method_opt is set to "SIZE 1" and assume that all values of the column are distributed uniformly?
In my opinion it would make more sense to use "SIZE 0" if I don't want histograms... or in other words: "why is one bucket needed for no histograms"?
Maybe you can clarify..
Regards
Stefan

Hello Satheesh,
> Oracle assigns the high value and low value in one bucket for no histograms
Can I have a look at these values? Because the Oracle documentation says:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm
Column statistics
* Number of distinct values (NDV) in column
* Number of nulls in column
* Data distribution (histogram)
But it is nowhere described that a low and high value are stored... or is that included in "Data distribution"? Still, 1 bucket is not a "real" histogram.
Regards
Stefan
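To illustrate Stefan's question: when a column has no usable histogram (method_opt => '... SIZE 1'), the optimizer falls back to the uniform-distribution assumption, where the selectivity of an equality predicate is simply 1/NDV. A minimal Python sketch of that model, using hypothetical numbers (not taken from any real table):

```python
# Without a histogram, the optimizer assumes every distinct value
# occurs equally often: selectivity of 'col = constant' is 1 / NDV.
def uniform_cardinality(num_rows, ndv):
    """Estimated row count for an equality predicate with no histogram."""
    return round(num_rows * (1.0 / ndv))

# Hypothetical table: 100,000 rows, 4 distinct values in the column.
print(uniform_cardinality(100_000, 4))  # -> 25000
```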

Similar Messages

  • Extended statistics issue

    Hello!
    I have a problem with extended statistics on 11.2.0.3
    Here is the script I run
drop table col_stats;
create table col_stats as
select 1 a, 2 b
from dual
connect by level <= 100000;
insert into col_stats (
select 2, 1
from dual
connect by level <= 100000);
-- check the a,b distribution
         A          B   COUNT(1)
         2          1     100000
         1          2     100000
    -- extended stats DEFINITION
    select dbms_stats.create_extended_stats('A','COL_STATS','(A,B)') name
    from dual;
    -- set estimate_percent to 100%
    EXEC dbms_stats.SET_TABLE_prefs ('A','COL_STATS','ESTIMATE_PERCENT',100);
    -- check the changes
    select dbms_stats.get_prefs ('ESTIMATE_PERCENT','A','COL_STATS')
    from dual;
    -- NOW GATHER COLUMN STATS
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS (
        OWNNAME    => 'A',
        TABNAME    => 'COL_STATS',
        METHOD_OPT => 'FOR ALL COLUMNS' );
END;
/
    set autotrace traceonly explain
    select * from col_stats where a=1 and b=1;
    SQL> select * from col_stats where a=1 and b=1;
    Execution Plan
    Plan hash value: 1829175627
    | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |           | 50000 |   683K|   177 (2)| 00:00:03 |
    |*  1 |  TABLE ACCESS FULL| COL_STATS | 50000 |   683K|   177 (2)| 00:00:03 |
    Predicate Information (identified by operation id):
       1 - filter("A"=1 AND "B"=1)
    How come the optimizer expects 50000 rows?
    Thanks in advance.
    Rob

    RobK wrote:
    SQL> select * from col_stats where a=1 and b=1;
    Execution Plan
    Plan hash value: 1829175627
    | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |           | 50000 |   683K|   177 (2)| 00:00:03 |
    |*  1 |  TABLE ACCESS FULL| COL_STATS | 50000 |   683K|   177 (2)| 00:00:03 |
    Predicate Information (identified by operation id):
       1 - filter("A"=1 AND "B"=1)
    How come the optimizer expects 50000 rows?
    Thanks in advance.
    Rob
This is expected behavior.
When you create extended statistics, a histogram is created for the column group (which is a virtual column).
The query predicate "where a=1 and b=1" is actually an out-of-range predicate for that virtual column. In such cases you might expect the optimizer to estimate the selectivity as 0 (so cardinality 1), but instead it uses the density (in our case actually the NewDensity) of the column (your virtual column).
Let's look at the following information (this is exactly your case):
    NUM_ROWS
      200000
COLUMN_NAME                     NUM_DISTINCT     DENSITY   HISTOGRAM
A                                          2   .00000250   FREQUENCY
B                                          2   .00000250   FREQUENCY
SYS_STUNA$6DVXJXTP05EH56DTIR0X             2   .00000250   FREQUENCY
COLUMN_NAME                     ENDPOINT_NUMBER   ENDPOINT_VALUE
A                                        100000                1
A                                        200000                2
B                                        100000                1
B                                        200000                2
SYS_STUNA$6DVXJXTP05EH56DTIR0X           100000       1977102303
SYS_STUNA$6DVXJXTP05EH56DTIR0X           200000       7894566276
Your predicate is "where a=1 and b=1", which is equivalent to "where SYS_STUNA$6DVXJXTP05EH56DTIR0X = sys_op_combined_hash(1, 1)".
As you know, with a frequency histogram the selectivity for an equality (=) predicate is (E_endpoint - B_endpoint)/num_rows, where the predicate's value lies between the B_endpoint and E_endpoint histogram buckets (endpoint numbers). But sys_op_combined_hash(1, 1) = 7026129190895635777, so how can this value be compared against the histogram endpoint values? The answer is that when creating the histogram Oracle does not use the exact sys_op_combined_hash(x, y) result; it also applies a MOD function. So you have to compare MOD(sys_op_combined_hash(1, 1), 9999999999), which equals 1598248696, with the endpoint values.
1598248696 does not match any endpoint value, so the optimizer uses NewDensity as the density.
You can see this clearly in the trace file below:
    BASE STATISTICAL INFORMATION
    Table Stats::
      Table: COL_STATS  Alias: COL_STATS
        #Rows: 200000  #Blks:  382  AvgRowLen:  18.00
    Access path analysis for COL_STATS
    SINGLE TABLE ACCESS PATH
      Single Table Cardinality Estimation for COL_STATS[COL_STATS]
      Column (#1):
        NewDensity:0.250000, OldDensity:0.000003 BktCnt:200000, PopBktCnt:200000, PopValCnt:2, NDV:2
      Column (#2):
        NewDensity:0.250000, OldDensity:0.000003 BktCnt:200000, PopBktCnt:200000, PopValCnt:2, NDV:2
      Column (#3):
        NewDensity:0.250000, OldDensity:0.000003 BktCnt:200000, PopBktCnt:200000, PopValCnt:2, NDV:2
      ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
        Col#: 1 2    CorStregth: 2.00
      ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.2500
      Table: COL_STATS  Alias: COL_STATS
        Card: Original: 200000.000000  Rounded: 50000  Computed: 50000.00  Non Adjusted: 50000.00
      Access Path: TableScan
        Cost:  107.56  Resp: 107.56  Degree: 0
          Cost_io: 105.00  Cost_cpu: 51720390
          Resp_io: 105.00  Resp_cpu: 51720390
      Best:: AccessPath: TableScan
             Cost: 107.56  Degree: 1  Resp: 107.56  Card: 50000.00  Bytes: 0
Note that NewDensity is calculated as 1/(2*num_distinct) = 1/4 = 0.25 for a frequency histogram.
The CBO used the column-group statistics, and the estimated cardinality was 200000*0.25 = 50000.
Remember that these are permanent statistics; the RDBMS gathered them by analyzing the actual table data (including the column correlation). Dynamic sampling could work well in your situation, because it computes the selectivity at run time by sampling with the real predicate.
In other situations you can see that extended statistics are a great help for estimation, for example "where a=2 and b=1", because that combination exists in the actual data and the corresponding information (stats/histograms) is stored in the dictionary.
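The MOD arithmetic mentioned above is easy to verify; the large constant is the sys_op_combined_hash(1, 1) value quoted in this thread:

```python
# Verify: MOD(sys_op_combined_hash(1, 1), 9999999999) as quoted above.
combined_hash_1_1 = 7026129190895635777  # sys_op_combined_hash(1, 1)
print(combined_hash_1_1 % 9999999999)  # -> 1598248696
```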
    SQL>  select * from col_stats where a=2 and b=1;
    Execution Plan
    Plan hash value: 1829175627
    | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |           |   100K|   585K|   108   (3)| 00:00:02 |
    |*  1 |  TABLE ACCESS FULL| COL_STATS |   100K|   585K|   108   (3)| 00:00:02 |
    Predicate Information (identified by operation id):
       1 - filter("A"=2 AND "B"=1)
    and from trace file
    Table Stats::
      Table: COL_STATS  Alias: COL_STATS
        #Rows: 200000  #Blks:  382  AvgRowLen:  18.00
    Access path analysis for COL_STATS
    SINGLE TABLE ACCESS PATH
      Single Table Cardinality Estimation for COL_STATS[COL_STATS]
      Column (#1):
        NewDensity:0.250000, OldDensity:0.000003 BktCnt:200000, PopBktCnt:200000, PopValCnt:2, NDV:2
      Column (#2):
        NewDensity:0.250000, OldDensity:0.000003 BktCnt:200000, PopBktCnt:200000, PopValCnt:2, NDV:2
      Column (#3):
        NewDensity:0.250000, OldDensity:0.000003 BktCnt:200000, PopBktCnt:200000, PopValCnt:2, NDV:2
      ColGroup (#1, VC) SYS_STUNA$6DVXJXTP05EH56DTIR0X
        Col#: 1 2    CorStregth: 2.00
    ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.5000
      Table: COL_STATS  Alias: COL_STATS
        Card: Original: 200000.000000  Rounded: 100000  Computed: 100000.00  Non Adjusted: 100000.00
      Access Path: TableScan
        Cost:  107.56  Resp: 107.56  Degree: 0
          Cost_io: 105.00  Cost_cpu: 51720365
          Resp_io: 105.00  Resp_cpu: 51720365
      Best:: AccessPath: TableScan
             Cost: 107.56  Degree: 1  Resp: 107.56  Card: 100000.00  Bytes: 0
Let's calculate:
MOD(sys_op_combined_hash(2, 1), 9999999999) = 1977102303, and for it (e_endpoint - b_endpoint)/num_rows = (200000 - 100000)/200000 = 0.5,
so the resulting card = sel * num_rows (or simply e_endpoint - b_endpoint) = 100000.
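Both estimates above can be reproduced with a short Python sketch of the frequency-histogram arithmetic. The cumulative endpoint numbers are the ones from the trace; the fallback for a value not found in the histogram is NewDensity = 1/(2*NDV), as explained earlier:

```python
# Frequency-histogram cardinality arithmetic from the example above.
NUM_ROWS = 200_000
NDV = 2
# Cumulative endpoint numbers of the column-group histogram:
# hashed value 1977102303 -> 100000, hashed value 7894566276 -> 200000.
endpoints = {1977102303: 100_000, 7894566276: 200_000}

def frequency_cardinality(value_key):
    """Cardinality for an '=' predicate against a frequency histogram."""
    keys = sorted(endpoints)
    if value_key in endpoints:
        # Popular value: selectivity = (E_endpoint - B_endpoint) / num_rows.
        idx = keys.index(value_key)
        prev = endpoints[keys[idx - 1]] if idx > 0 else 0
        sel = (endpoints[value_key] - prev) / NUM_ROWS
    else:
        # Value not present in the histogram: NewDensity = 1/(2*NDV).
        sel = 1.0 / (2 * NDV)
    return round(NUM_ROWS * sel)

print(frequency_cardinality(1598248696))  # a=1, b=1 -> 50000
print(frequency_cardinality(1977102303))  # a=2, b=1 -> 100000
```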

  • BR0301W SQL error -1017

    Hi,
I want to manage and monitor a remote Oracle database (a JAVA database). I've made the connection from dbacockpit with the schema user sapsr3db, and the connection is established successfully:
    -- MESSAGE Database connection Java_db established successfully
But when I try to schedule the administrative actions for the DB via DB13, I always get the following error message:
    Parameters:-u / -jid LOG__20110501373222 -sid D10 -c force -p initepd.sap -sd
    BR0002I BRARCHIVE 7.00 (52)
    BR0252E Function fopen() failed for 'F:oracleorasap102databaseinitepd.sap' at location BrInitSapRead-1
    BR0253E errno 13: Permission denied
    BR0159E Error reading BR*Tools profile F:oracleorasap102databaseinitepd.sap
    BR0280I BRARCHIVE time stamp: 2011-05-01
    BR0301W SQL error -1017 at location BrDbConnect-2, SQL statement:
    'CONNECT /'
    ORA-01017: invalid username/password; logon denied
    BR0310W Connect to database instance D10 failed
    BR0007I End of offline redo log processing: aefuiaiw.log 2011-05-01
    BR0280I BRARCHIVE time stamp: 2011-05-01
    BR0005I BRARCHIVE terminated with errors
    External program terminated with exit code 3
    BRARCHIVE returned error status E
I've found several notes with this error (480266, 400241, 113747, 651351), but I'm not sure they describe the same problem.
Many thanks.

    <owner> | (<owner_list>)
    default: all SAP owners
    next_owner = sapr3
    database objects to adapt next extents
    all | all_ind | special | [<owner>.]<table> | [<owner>.]<index>
    | [<owner>.][<prefix>]*[<suffix>] | <tablespace> | (<object_list>)
default: all objects of selected owners, example:
    next_table = (SDBAH, SAPR3.SDBAD)
    database objects to be excluded from adapting next extents
    all_part | [<owner>.]<table> | [<owner>.]<index>
    | [<owner>.][<prefix>]*[<suffix>] | <tablespace> | (<object_list>)
    default: no exclusion, example:
    next_exclude = (SDBAH, SAPR3.SDBAD)
    database objects to get special next extent size
    all_sel:<size>[/<limit>] | [<owner>.]<table>:<size>[/<limit>]
    | [<owner>.]<index>:<size>[/<limit>]
    | [<owner>.][<prefix>]*[<suffix>]:<size>[/<limit>]
    | (<object_size_list>)
    default: according to table category, example:
    next_special = (SDBAH:100K, SAPR3.SDBAD:1M/200)
    maximum next extent size
    default: 2 GB - 5 * <database_block_size>
    next_max_size = 1G
    maximum number of next extents
    default: 0 - unlimited
    next_limit_count = 300
    database owner of objects to update statistics
    <owner> | (<owner_list>)
    default: all SAP owners
    stats_owner = sapr3
    database objects to update statistics
    all | all_ind | all_part | missing | info_cubes | dbstatc_tab
    | dbstatc_mon | dbstatc_mona | [<owner>.]<table> | [<owner>.]<index>
    | [<owner>.][<prefix>]*[<suffix>] | <tablespace> | (<object_list>)
    | harmful | locked | system_stats | oradict_stats | oradict_tab
default: all objects of selected owners, example:
    stats_table = (SDBAH, SAPR3.SDBAD)
    database objects to be excluded from updating statistics
    all_part | info_cubes | [<owner>.]<table> | [<owner>.]<index>
    | [<owner>.][<prefix>]*[<suffix>] | <tablespace> | (<object_list>)
    default: no exclusion, example:
    stats_exclude = (SDBAH, SAPR3.SDBAD)
    method for updating statistics for tables not in DBSTATC
    E | EH | EI | EX | C | CH | CI | CX | A | AH | AI | AX | E= | C= | =H
    | =I | =X | +H | +I
    default: according to internal rules
    stats_method = E
    sample size for updating statistics for tables not in DBSTATC
    P<percentage_of_rows> | R<thousands_of_rows>
    default: according to internal rules
    stats_sample_size = P10
    number of buckets for updating statistics with histograms
    default: 75
    stats_bucket_count = 75
    threshold for collecting statistics after checking
    <threshold> | (<threshold> [, all_part:<threshold>
    | info_cubes:<threshold> | [<owner>.]<table>:<threshold>
    | [<owner>.][<prefix>]*[<suffix>]:<threshold>
    | <tablespace>:<threshold> | <object_list>])
    default: 50%
    stats_change_threshold = 50
    number of parallel threads for updating statistics
    default: 1
    stats_parallel_degree = 1
    processing time limit in minutes for updating statistics
    default: 0 - no limit
    stats_limit_time = 0
    parameters for calling DBMS_STATS supplied package
    all:R|B[<buckets>|A|S|R]:0|<degree>A|D
    | all_part:R|B[<buckets>|A|S|R]:0|<degree>A|D
    | info_cubes:R|B:A|D|0|<degree>
    | [<owner>.]<table>:R|B[<buckets>|A|S|R]:0|<degree>A|D
    | [<owner>.][<prefix>]*[<suffix>]:R|B[<buckets>|A|S|R]:0|<degree>A|D
    | (<object_list>) | NO
    R|B[<buckets>|A|S|R]:
    'R' - row sampling, 'B' - block sampling,
    <buckets> - histogram buckets count, 'A' - auto buckets count,
    'S' - skew only, 'R' - repeat
    <degree>A|D:
    <degree> - dbms_stats parallel degree, '0' - table degree,
    'A' - auto degree, 'D' - default degree
    default: ALL:R:0
    stats_dbms_stats = ([ALL:R:1,][<owner>.]<table>:R:<degree>,...)
    definition of info cube tables
    default | rsnspace_tab | [<owner>.]<table>
    | [<owner>.][<prefix>]*[<suffix>] | (<object_list>) | null
    default: rsnspace_tab
    stats_info_cubes = (/BIC/D, /BI0/D, ...)
    special statistics settings
    (<table>:[<owner>]:<active>:[<method>]:[<sample>], ...)
    stats_special = (<special_list>)
    recovery type [complete | dbpit | tspit | reset | restore | apply
    | disaster]
    default: complete
    recov_type = complete
    directory for brrecover file copies
    default: $SAPDATA_HOME/sapbackup
    recov_copy_dir = E:\oracle\D10\sapbackup
    time period for searching for backups
    0 - all available backups, >0 - backups from n last days
    default: 30
    recov_interval = 30
degree of parallelism for applying archive log files
    0 - use Oracle default parallelism, 1 - serial, >1 - parallel
    default: Oracle default
    recov_degree = 0
    number of lines for scrolling in list menus
    0 - no scrolling, >0 - scroll n lines
    default: 20
    scroll_lines = 20
    time period for displaying profiles and logs
    0 - all available logs, >0 - logs from n last days
    default: 30
    show_period = 30
    directory for brspace file copies
    default: $SAPDATA_HOME/sapreorg
    space_copy_dir = E:\oracle\D10\sapreorg
    directory for table export dump files
    default: $SAPDATA_HOME/sapreorg
    exp_dump_dir = E:\oracle\D10\sapreorg
    database tables for reorganization
    [<owner>.]<table> | [<owner>.][<prefix>]*[<suffix>]
    | [<owner>.][<prefix>]%[<suffix>] | (<table_list>)
    no default
    reorg_table = (SDBAH, SAPR3.SDBAD)
    database indexes for rebuild
    [<owner>.]<index> | [<owner>.][<prefix>]*[<suffix>]
    | [<owner>.][<prefix>]%[<suffix>] | (<index_list>)
    no default
    rebuild_index = (SDBAH0, SAPR3.SDBAD0)
    database tables for export
    [<owner>.]<table> | [<owner>.][<prefix>]*[<suffix>]
    | [<owner>.][<prefix>]%[<suffix>] | (<table_list>)
    no default
    exp_table = (SDBAH, SAPR3.SDBAD)
    database tables for import
    <table> | (<table_list>)
    no default
    do not specify table owner in the list - use -o|-owner option for this
    imp_table = (SDBAH, SDBAD)

  • Method_opt = 'FOR ALL COLUMNS SIZE REPEAT'

    Hi all Gurus,
We have a script to gather statistics (see below); it runs every day, and all the tables have the "MONITORING" option in a 9.2.0.5 database.
My question concerns "method_opt => 'FOR ALL COLUMNS SIZE REPEAT', OPTIONS => 'GATHER EMPTY', estimate_percent => 5".
So, for a new table with columns and indexes (with the "monitoring" option), will it create statistics or histogram statistics when the script runs on the table for the first time? And will it then continue with or without histograms?
    begin
    DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
    DBMS_STATS.GATHER_DATABASE_STATS(DEGREE=>4,GRANULARITY=>'ALL',CASCADE=>TRUE,method_opt =>'FOR ALL COLUMNS SIZE REPEAT',OPTIONS=>'GATHER EMPTY',estimate_percent =>5);
    DBMS_STATS.GATHER_DATABASE_STATS(estimate_percent =>5,OPTIONS=>'GATHER STALE',method_opt =>'FOR ALL COLUMNS SIZE REPEAT',degree => 4, cascade=>true,STATTAB=>'TABLA_ESTADISTICAS',STATID=>to_char(sysdate,'yymmdd'),STATOWN=>'OPER');
    end;
    Regards,

    Hi all Gurus,
We have a script to gather statistics (see below); it runs every day, and all the tables have the "MONITORING" option in a 9.2.0.5 database.
My question concerns "method_opt => 'FOR ALL COLUMNS SIZE REPEAT', OPTIONS => 'GATHER EMPTY', estimate_percent => 5".
So, for a new table with columns and indexes (with the "monitoring" option), will it create statistics or histogram statistics when the script runs on the table for the first time? And will it then continue with or without histograms?
    begin
    DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO();
    DBMS_STATS.GATHER_DATABASE_STATS(DEGREE=>4,GRANULARITY=>'ALL',CASCADE=>TRUE,method_opt =>'FOR ALL COLUMNS SIZE REPEAT',OPTIONS=>'GATHER EMPTY',estimate_percent =>5);
    DBMS_STATS.GATHER_DATABASE_STATS(estimate_percent =>5,OPTIONS=>'GATHER STALE',method_opt =>'FOR ALL COLUMNS SIZE REPEAT',degree => 4, cascade=>true,STATTAB=>'TABLA_ESTADISTICAS',STATID=>to_char(sysdate,'yymmdd'),STATOWN=>'OPER');
    end;
    Regards,
    {code}
    I have taken following explanation from documentation:
    {code}
    METHOD_OPT - The value controls column statistics collection and histogram creation. It accepts either of the following options, or both in combination:
    FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
    FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]
    size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}
    column is defined as column := column_name | (extension)
    - integer : Number of histogram buckets. Must be in the range [1,254].
    - REPEAT : Collects histograms only on the columns that already have histograms.
    - AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
    - SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.
    - column_name : name of a column
    - extension : can be either a column group in the format of (column_name, colume_name [, ...]) or an expression
    The default is FOR ALL COLUMNS SIZE AUTO.
    {code}
GATHER EMPTY: Gathers statistics on objects which currently have no statistics, and returns a list of objects found to have no statistics.
Since SIZE REPEAT only collects histograms on columns that already have them, a brand-new table will get basic column statistics on the first run but no histograms, and subsequent runs will keep it that way.
Reference: http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_stats.htm
Please go through the link; it will give you a clearer picture of DBMS_STATS.
    Regards,
S.K.

  • INTERNAL_FUNCTION for IN List 11.2.0.2

    deleting this thread..
    Edited by: OraDBA02 on Oct 3, 2012 2:31 PM

    >
    Why would a histogram list show any of a large number of values that may not exist?
    >
Why would I have any way of knowing what data exists and what doesn't? I'm pointing out the lack of a histogram bucket for a value that is being used by the query, so the OP can comment on whether there are any such values or not.
The data posted by the OP appears to show 100 million rows and 15 distinct values for SW_OPERATION, but there are only 11 histogram buckets. That seems a little odd to me, so I would like the OP to explain why the other 4 values have no buckets. The total for the 11 buckets posted is also tiny compared to 100 million.
One possible thing for the OP to test is using ONLY filters that have histogram buckets, with a query that uses an index instead of a FTS.
    >
    And a nitpick, if any partitioning key is involved with changing a column, that will err.
    >
    Can you post the code that you tested with and the error you got?
This code works just fine for me. The table is one from another forum question I had lying around (date data should not be stored this way):
CREATE TABLE SCOTT.FOO1 (
  EVENT_TIME  VARCHAR2(18 BYTE),
  EVENT_ID    NUMBER(38)
)
PARTITION BY RANGE (EVENT_TIME) (
  PARTITION EVT_2011   VALUES LESS THAN ('20120101000000'),
  PARTITION EVT_201201 VALUES LESS THAN ('20120201000000'),
  PARTITION EVT_201202 VALUES LESS THAN ('20120301000000')
);
INSERT INTO FOO1 VALUES ('20120100000000', 1);
INSERT INTO FOO1 VALUES ('20120200000000', 2);
INSERT INTO FOO1 VALUES ('20120300000000', 3);
COMMIT;
alter table foo1 modify (event_time varchar2(18 char));
alter table foo1 modify (event_time varchar2(18 byte));
I don't get any errors. I did not test with either global or local indexes.

  • Help explaining data in dba_histograms

    Hi all,
    On Oracle 11gr2, I need to understand the output of dba_histograms for a column with frequency based histogram:
    COLUMN_ID COLUMN_NAME     DATA_TYPE    NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM
            94 DP_STR          DATE                    7          0 1.6182E-08 FREQUENCY
    OWN  TABLE_NAME COLUMN_NAME    ENDPOINT_NUMBER ENDPOINT_VALUE  ENDPOINT_
    FH   XF_TNOG_SL DP_STR         811              2456473
    FH   XF_TNOG_SL DP_STR         1242             2456474
    FH   XF_TNOG_SL DP_STR         1984             2456475
    FH   XF_TNOG_SL DP_STR         2969             2456476
    FH   XF_TNOG_SL DP_STR         3792             2456477
    FH   XF_TNOG_SL DP_STR         4626             2456478
    FH   XF_TNOG_SL DP_STR         5486             2456479
    7 rows selected.
    SQL> select DP_STR from FH.XF_TNOG_SL where rownum <= 10;
    DAT_POST_
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    05-JUL-13
    10 rows selected.
My question here is: since it's a frequency-based histogram, shouldn't it show me the actual 7 distinct dates and their respective frequencies instead of numbers like 811, 1242, etc.?

ENDPOINT_NUMBER is defined as the "Histogram Bucket Number" in the documentation. In your case of a frequency histogram, it is the cumulative number of rows up to and including each bucket. Thus, the first bucket has 811 rows, the second bucket has (1242-811) rows, the third bucket has (1984-1242) rows, etc.
The ENDPOINT_VALUE is a numeric (Oracle-internal) representation of the DATE values for that column. Thus, the 7 distinct dates are represented as 2456473, 2456474, 2456475, etc.
    Hemant K Chitale
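Hemant's reading of the cumulative endpoint numbers can be checked with a short Python sketch (the endpoint numbers are the ones from the listing above):

```python
# Convert the cumulative ENDPOINT_NUMBERs of a frequency histogram into
# per-value row counts by differencing consecutive endpoints.
endpoint_numbers = [811, 1242, 1984, 2969, 3792, 4626, 5486]

def bucket_frequencies(cumulative):
    """Per-bucket row counts from cumulative endpoint numbers."""
    prev = 0
    freqs = []
    for e in cumulative:
        freqs.append(e - prev)
        prev = e
    return freqs

print(bucket_frequencies(endpoint_numbers))
# -> [811, 431, 742, 985, 823, 834, 860]
```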

  • Misaligned SQLPlus output

    Hi,
    Yes, I know this should go in the SQLPlus (tumbleweed) forum, but somebody there asked exactly the same question over a month ago and it still hasn't received even one attempted reply!
    So, simple question; why is the following output misaligned? Specifically, the value 14 for SAMPLE_SIZE is placed in the middle of its column, and consequently LAST_ANALYZED (21-feb-13) is shoved into the SAMPLE_SIZE column:
      1  select column_name, data_type, avg_col_len, density, num_distinct NDV, histogram, num_buckets buckets, sample_
    size, last_analyzed,data_type
      2  from dba_tab_cols
      3  where owner  = 'SCOTT'
      4  and table_name = 'EMP'
      5  and column_name = 'EMPNO'
      6* order by internal_column_id
    SYS@ORCL> /
    COLUMN_NAME               DATA_TYPE  AVG_COL_LEN     DENSITY          NDV HISTOGRAM       BUCKETS SAMPLE_SIZE LAST_
    ANAL DATA_TYPE
    EMPNO                     NUMBER               4  .071428571           14 NONE                  1       14 21-FEB-1
    3 NUMBER
    SYS@ORCL>Btw, the **** above should read A N A L (without the spaces) as in LAST_ANALYZED but a rather enthusiastic filter seems to be at work.
    Second question - I was wondering if I had entered a COLUMN ... FORMAT command that had screwed things up..but as far as I can tell, there is no way to retrieve the list of column formats that SQLPlus is currently using - or is there?
*****Edit - ignore the second question - I just found that you can simply type COLUMN to get a listing of all column formatting instructions currently in use. I checked to see if SAMPLE_SIZE has any formatting applied to it, and it does not.
    Many thanks,
    Jason
    Edited by: Jason_942375 on Mar 25, 2013 9:53 PM
    Edited by: Jason_942375 on Mar 25, 2013 9:55 PM

    Jason_942375 wrote:
    Hi guys
    Sorry, forgot to "watch" the thread.
    Well, thanks for your advice, which all seems to boil down to setting linesize. I should have said, I had already looked at this and discounted it as responsible for the issue. I've been able to refine the problem with the following use case:
    SCOTT@ORCL> SHOW LINESIZE
    linesize 120
    SCOTT@ORCL> DROP TABLE T;
    Table dropped.
    SCOTT@ORCL> create table T (X number);
    Table created.
    SCOTT@ORCL> insert into t values (14);
    1 row created.
    SCOTT@ORCL> commit;
    Commit complete.
    SCOTT@ORCL> select * from t;
    X
    14
    SCOTT@ORCL> select lpad('Z',100,'Z') dummy , X from T;
    DUMMY                                                                                                      X
    ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ      14
SCOTT@ORCL>
What you can see is that the formatting of the column X changes. When just that column is selected, all is fine. When it is output next to other column output, it gets misaligned.
I think it may have something to do with the window width, which I currently have set to 115.
If I set the linesize to a few less (not one less) than 115, it seems to wrap the next column okay. But more testing is needed (though it's not exactly the kind of thing you want to waste hours on... although it's bloody annoying!!).

Probably a combination of your window width AND the font. If it is a proportional font, a space will not be as wide as a "Z" or a "W": a string of spaces will tend to pull everything following it to the left. Oracle and SQL*Plus have no control over how your Windows client renders things.

  • Creating buckets or histograms with a query

    I know I can do what is below with a case statement defining each range. How do I make it dynamic?
    I know about the width_bucket and ntile functions. I am having trouble getting them to do what I want.
    I created the following table to get counts of duplicate names
    create table name_dupe as
    select name,count(*) mycount
    from mynametable
    group by name
    having count(*) > 1
I now want to find out how many duplicate names fit into certain counts.
For example:
How many names have between 101 and 200 duplicates, 201 and 300, 301 and 400,
and so on. I may want to change the bucket ranges.
How do I do this? I think it's with width_bucket and not ntile, but I can't get it to work. This is what I tried:
select distinct mycount, dupes
from (
select mycount, width_bucket(mycount,200,300,30) dupes
from name_dupe
)
where dupes > 0
order by mycount
    sample output:
    mycount dupe
        887         31
        909         31
        993         31
       1034         31
       1341         31
       1431         31
       1490         31
       1604         31
       1664         31
       1721         31
   2106         31
Edited by: Guess2 on Oct 6, 2011 11:15 AM
    Edited by: Guess2 on Oct 6, 2011 11:50 AM

    Guess2 wrote:
width_bucket(mycount,200,300,30)
According to this call, you tried to build a 30-bucket histogram from 200 to 300, with a bucket width of (300-200)/30 ≈ 3.33; all values greater than or equal to 300 overflowed into bucket 31.
If you want to cover your whole range, you should call the function with parameters like this:
width_bucket(mycount,200,3000,30)
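The overflow behaviour can be checked outside the database; below is a minimal Python model of WIDTH_BUCKET's documented semantics for an ascending range (equi-width buckets, with values below the minimum in bucket 0 and values at or above the maximum in bucket num_buckets + 1):

```python
def width_bucket(expr, low, high, num_buckets):
    """Simplified model of Oracle's WIDTH_BUCKET for low < high.

    Buckets 1..num_buckets are equi-width; values below `low` fall
    into bucket 0 and values at or above `high` into num_buckets + 1.
    """
    if expr < low:
        return 0
    if expr >= high:
        return num_buckets + 1          # the overflow bucket
    width = (high - low) / num_buckets  # here: (300 - 200) / 30
    return int((expr - low) // width) + 1

# All of the poster's counts (887, 909, ..., 2106) exceed 300,
# so every one of them lands in the overflow bucket 31:
print(width_bucket(887, 200, 300, 30))   # 31
# Widening the range to 200..3000 spreads them out:
print(width_bucket(887, 200, 3000, 30))  # 8
```

With the widened range each bucket covers (3000-200)/30 ≈ 93 counts, so the sample values no longer collapse into the overflow bucket.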

  • Histograms num buckets

    Hi.
We have a table with about 1,400,000 rows. It is accessed
with select * from table where status = :1.
:1 equals the value 1.
Status is number(1) with three distinct values: 1, 2, 3.
Value 1 represents 0 to 100 rows, depending on the time of day;
value 2 represents 70,000 rows and value 3 represents 1,330,000 rows.
The column is indexed, and the statistics gathered on the table produced a one-bucket histogram on this column (the default out of gather_schema_stats or something).
Anyhow, the CBO chose a full table scan on the table. Any suggestion on how to build up the statistics on this one to prevent the full table scan?
Thanks.
Kjell Ove

    Hi Rob.
    This really did the trick.
    New histograms:
OWNER      TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER  ENDPOINT_VALUE  ENDPOINT_ACTUAL_VALUE
TV2_WEBTV  EMAIL_LOG   STATUS       1                1
TV2_WEBTV  EMAIL_LOG   STATUS       1334058          2
TV2_WEBTV  EMAIL_LOG   STATUS       1406975          3
    Explain plan:
    Execution Steps:
    Step # Step Name
    3 SELECT STATEMENT
    2 TV2_WEBTV.EMAIL_LOG TABLE ACCESS [BY INDEX ROWID]
    1 TV2_WEBTV.IDX_EMAIL_LOG_STATUS INDEX [RANGE SCAN]
So, is this really three buckets, one representing each of the unique values in this column?
    Thanks
    Kjell Ove
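Yes: with as many buckets as distinct values you get a frequency histogram, and ENDPOINT_NUMBER holds the cumulative row count up to each value. A small Python sketch, using the numbers from the histogram above, shows how the per-value counts fall out of the cumulative endpoints:

```python
# ENDPOINT_NUMBER in a frequency histogram is cumulative: the row count
# for a value is the difference from the previous endpoint.
endpoints = [(1, 1), (2, 1334058), (3, 1406975)]  # (endpoint_value, endpoint_number)

prev = 0
for value, cum in endpoints:
    # status=1 -> 1 row, status=2 -> 1334057 rows, status=3 -> 72917 rows
    print(f"status={value}: {cum - prev} rows")
    prev = cum
```

That per-value breakdown is what lets the optimizer pick the index for status = 1 and a full scan for status = 3.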

  • Trying to make sense of histogram data

    Hi,
    I'm trying to create some histograms for some highly skewed columns and try to make sense of the gathered statistics.
    The table I am analyzing has two columns:
    TRE_ID_O indicates the transaction number in which the row was added.
    TRE_ID_V indicates the transaction number in which the row was deleted.
    Transaction # 1 loads a lot of data. All loaded rows will have TRE_ID_O = 1 and TRE_ID_V = null.
    In subsequent transactions rows are added and deleted. After some time the data distribution over the transaction ID's looks like this:
    Count(*) TRE_ID_O
    944940      1
    1        2
    1        4
    2        5
    1        6
    1        7
    1        8
    1        12
    1        13
    1        14
    1        15
    1        16
    1        17
    1        18
    1        19
    1        20
    1        21
    COUNT(*) TRE_ID_V
    944940      <null>
    1        2
    1        4
    2        5
    1        6
    1        7
    1        8
    1        12
    1        13
    1        14
    1        15
    1        16
    1        17
    1        18
    1        19
    1        20
1        21
Using the index on tre_id_o or tre_id_v for transaction numbers > 1 will be very selective.
    Histogram data:
DBMS_STATS.GATHER_TABLE_STATS('NGM101','NGG_BASISCOMPONENT', METHOD_OPT => 'FOR COLUMNS SIZE auto tre_id_o');
DBMS_STATS.GATHER_TABLE_STATS('NGM101','NGG_BASISCOMPONENT', METHOD_OPT => 'FOR COLUMNS SIZE auto tre_id_v');
In DBA_HISTOGRAMS I find:
    COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
    TRE_ID_V    1               2     
    TRE_ID_V    2               4     
    TRE_ID_V    4               5     
    TRE_ID_V    5               6     
    TRE_ID_V    6               7     
    TRE_ID_V    7               8     
    TRE_ID_V    8               12     
    TRE_ID_V    9               13     
    TRE_ID_V    10              14     
    TRE_ID_V    11              15     
    TRE_ID_V    12              16     
    TRE_ID_V    13              17     
    TRE_ID_V    14              18     
    TRE_ID_V    15              19     
    TRE_ID_V    16              20     
    TRE_ID_V    17              21     
TRE_ID_O    5500            1

Why the difference between TRE_ID_V and TRE_ID_O? (I found that the <null> values in tre_id_v make the difference, but why?)
Why is there only 1 bucket for TRE_ID_O?
thanks, Rene

    Hello Rene,
if only one bucket exists... you have no histogram, or in other words:
One bucket means no histogram. (See also Metalink note #175258.1.)
-> method_opt => 'FOR ALL COLUMNS SIZE 1': table and column statistics, but no histogram generated.
Why the difference between TRE_ID_V and TRE_ID_O? (I found that the <null> value in tre_id_v makes the difference but why?)
Let's have a look at the Oracle documentation for DBMS_STATS.GATHER_TABLE_STATS:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#sthref8129
-> AUTO: Oracle determines the columns to collect histograms for based on data distribution and the workload of the columns.
Is the column TRE_ID_O not used in many queries, or only used very infrequently? That would be an explanation.
Try to gather the statistics based on data distribution only:
-> DBMS_STATS.GATHER_TABLE_STATS('NGM101','NGG_BASISCOMPONENT', METHOD_OPT => 'FOR COLUMNS SIZE SKEWONLY tre_id_o');
Btw, you may wonder why the null values are not shown in the buckets for column TRE_ID_V:
    Have a look here:
    http://www.ioug.org/client_files/members/select_pdf/05q2/SelectQ205_Kanagaraj.pdf
    -> Before we move on, let us state that histograms are essentially “buckets” that
    specify a range of values in a column. The Oracle kernel sorts the non-null
    values in the column and groups them into the specified number of these
    buckets so that each bucket holds the same number of data points, bounded
    by the end point value of the previous bucket.
    In the pdf document there are also some sql traces which show how oracle determine the columns, etc..
    @ Rob:
As Sybrand already suggested, Oracle already has the statistics that your table contains 944957 rows, of which 944940 are NULL for column tre_id_v.
Ok, but that makes no sense... because histograms were created for the column tre_id_v.
    Regards
    Stefan
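The bucketing described in the quoted passage can be sketched in a few lines. This is a simplified Python illustration of the principle (equal-height buckets over the sorted non-null values), not Oracle's exact algorithm:

```python
def height_balanced_endpoints(values, num_buckets):
    """Pick bucket endpoints from sorted non-null data so that each
    bucket covers (roughly) the same number of rows.
    Endpoint 0 is the minimum; endpoint i closes bucket i."""
    data = sorted(v for v in values if v is not None)  # NULLs are excluded
    n = len(data)
    return [data[0]] + [data[(i * n) // num_buckets - 1]
                        for i in range(1, num_buckets + 1)]

# The 944940 NULLs in tre_id_v contribute nothing to the histogram;
# only the handful of non-null transaction IDs get bucketed.
sample = [None] * 10 + [2, 4, 5, 5, 6, 7, 8]
print(height_balanced_endpoints(sample, 7))
```

This also shows why the NULL-heavy column still gets a multi-bucket histogram: the buckets are built over the 17 non-null values alone.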

  • Auto estimate percent and histograms or small est percent no histograms by default?

    In the past I've used custom written statistics gathering scripts that by default gather statistics on large tables with a small estimate percentage and FOR ALL COLUMNS SIZE 1.  They allow the estimate percentage to be set higher on individual tables and allow me to choose individual columns to have histograms with the maximum number of buckets.  The nice thing about this approach was that the statistics ran very efficiently by default and they could be dialed up as needed to tune individual queries.
    But, with 11g you can set preferences at the table level so that the automatic stats job or even a manual run of gather_table_stats will use your settings.  Also, there is some special sampling algorithm associated with auto_sample_size that you miss out on if you manually set the estimate percentage.  So, I could change my approach to query tuning and statistics gathering to use AUTO_SAMPLE_SIZE and FOR ALL COLUMNS SIZE AUTO by default and then override these if needed to tune a query or if needed to make the statistics run in an acceptable length of time.
I work in a largish company with a team of about 10 DBAs, a bunch of developers, and over 100 Oracle databases, so we can't really carefully keep track of each system. Having the defaults be less resource intensive saves me the grief of having stats jobs run too long, but it requires intervention when a query needs to be tuned. Also, with my custom scripts I get a lot of hassle from people wanting to know why I'm not using the Oracle default settings, so that is a negative. But I've seen the default settings, even on 11.2, run much too long on large tables. Typically our systems start small and then data grows and grows until things start breaking, so the auto stats may seem fine at first, but eventually they will start running too long. Also, when the stats jobs run forever, or the automatic jobs don't finish in the window, I get complaints about the stats taking too long or not being updated. So either direction has its pros and cons as far as I can tell.
So, my question, fueled by a post-work-day Sierra Nevada Pale Ale, is whether anyone else has this same dilemma and how you have resolved it. I appreciate any input on this subject.
    - Bobby

    Hi Bobby,
    > Also, when the stats jobs run forever or with the automatic jobs don't finish in the window I get complaints about the stats taking too long or not being updated
    Complaints about "not updated" statistics are very common, but they don't need to be "up-to-date" ... they just need to be representative ... a very important difference
    > I think it comes down to estimate percentage and histograms.  If you can't get the percentage and histograms you want to complete in time you have to dial them back but then the stats aren't as good.
    Be aware of how histograms are gathered under-the-hood for each column. There are some nice improvements with AUTO_SAMPLE_SIZE (Blog: How does AUTO_SAMPLE...). However if you want to know more about the NDV improvements or the synopses - please check out the paper of Amit Poddar One Pass Distinct Sampling.
Personally, I usually start with "FOR ALL COLUMNS SIZE 1" and AUTO_SAMPLE_SIZE (from scratch) and change only the table preferences that need to be adjusted (like histograms due to data skew, etc.). The other basic stuff works pretty well in most cases, so I usually don't need to put much effort into it (you don't need to fix something that is not broken). There are also issues with aggressive sampling and CBO behavior changes (like fix_control 5483301:off/on), but this needs to be cross-checked from case to case, of course.
    Regards
    Stefan

  • Selectivity for non-popular values in height-based histograms

    Hi,
I wanted to check how the optimizer calculates the cardinality/selectivity for a value which is not popular when the histogram is height-based.
Following is a small test case (version 11.2.0.1, platform HP-UX):
create table t1 (
       skew    not null,
       padding
)
as
with generator as (
select --+ materialize
       rownum id
from all_objects
where rownum <= 5000
)
select /*+ ordered use_nl(v2) */
     v1.id,
     rpad('x',400)
from
    generator  v1,
    generator v2
where
   v1.id <= 80
and
   v2.id <= 80
and
   v2.id <= v1.id
;
Following are the table stats:
    SQL> select count(*) from t1;
      COUNT(*)
          3240
    SQL> exec dbms_stats.gather_table_stats('SYS','T1',cascade=>TRUE, estimate_percent => null, method_opt => 'for all columns size 75');
    PL/SQL procedure successfully completed.
    SQL> select column_name,num_distinct,density,num_buckets from dba_tab_columns where table_name='T1';
    COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS
    SKEW                                     80 .013973812          75
    PADDING                                   1 .000154321           1
    SQL> select endpoint_number, endpoint_value from dba_tab_histograms where column_name='SKEW' and table_name='T1' order by endpoint_number;
    ENDPOINT_NUMBER ENDPOINT_VALUE
                  0              1
                  1              9
                  2             13
                  3             16
                  4             19
                  5             21
                  6             23
                  7             25
                  8             26
                  9             28
                 10             29
    ENDPOINT_NUMBER ENDPOINT_VALUE
                 11             31
                 12             32
                 13             33
                 14             35
                 15             36
                 16             37
                 17             38
                 18             39
                 19             40
                 20             41
                 21             42
    ENDPOINT_NUMBER ENDPOINT_VALUE
                 22             43
                 23             44
                 24             45
                 25             46
                 26             47
                 27             48
                 28             49
                 29             50
                 30             51
                 32             52
                 33             53
    ENDPOINT_NUMBER ENDPOINT_VALUE
                 34             54
                 35             55
                 37             56
                 38             57
                 39             58
                 41             59
                 42             60
                 43             61
                 45             62
                 46             63
                 48             64
    ENDPOINT_NUMBER ENDPOINT_VALUE
                 49             65
                 51             66
                 52             67
                 54             68
                 56             69
                 57             70
                 59             71
                 60             72
                 62             73
                 64             74
                 66             75
    ENDPOINT_NUMBER ENDPOINT_VALUE
                 67             76
                 69             77
                 71             78
                 73             79
                 75             80
60 rows selected.
Checking the selectivity for value 75 (which is a popular value, per the information from dba_tab_histograms):
    SQL> set autotrace on
    SQL> select count(*) from t1 where skew=75;
      COUNT(*)
            75
    Execution Plan
    Plan hash value: 4273422929
    | Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |       |     1 |     3 |     1   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE   |       |     1 |     3 |            |          |
    |*  2 |   INDEX RANGE SCAN| T1_I1 |    86 |   258 |     1   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
   2 - access("SKEW"=75)
(The Statistics section is skipped to keep the example short.)
selectivity for 75 (popular value) = 2/75 = 0.02666
cardinality for 75 = selectivity * num_rows = 0.02666 * 3240 = 86.3784 (rounded to 86) >> Here the selectivity and cardinality are correct and displayed in autotrace.
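The popular-value arithmetic can be checked directly. A small Python sketch, assuming (per my reading of the histogram listing above) that skew = 75 spans two buckets because its ENDPOINT_NUMBER jumps from 64 to 66:

```python
num_rows = 3240
num_buckets = 75

# skew = 75 appears as a histogram endpoint whose ENDPOINT_NUMBER
# jumps by 2 (64 -> 66), so it is a popular value covering 2 buckets:
buckets_for_value = 2
selectivity = buckets_for_value / num_buckets   # 2/75 = 0.02666...
cardinality = round(selectivity * num_rows)     # 86, matching the plan
print(selectivity, cardinality)
```

The same rule (buckets spanned / total buckets) reproduces the estimate for any popular value in the listing.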
    SQL> select count(*) from t1 where skew=8;
      COUNT(*)
             8
    Execution Plan
    Plan hash value: 4273422929
    | Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |       |     1 |     3 |     1   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE   |       |     1 |     3 |            |          |
    |*  2 |   INDEX RANGE SCAN| T1_I1 |    29 |    87 |     1   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
   2 - access("SKEW"=8)
How is the cardinality of 29 calculated? I thought the formula for a non-popular value's selectivity gives:
cardinality (non-popular value) = density * num_rows = .013973812 * num_rows (which is approximately 45), but in autotrace it's 29.
    SQL> select count(*) from t1 where skew = 46;
      COUNT(*)
            46
    Execution Plan
    Plan hash value: 4273422929
    | Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |       |     1 |     3 |     1   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE   |       |     1 |     3 |            |          |
    |*  2 |   INDEX RANGE SCAN| T1_I1 |    29 |    87 |     1   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
   2 - access("SKEW"=46)
46 is also a non-popular value.
So how is the cardinality calculated for these values?

Your example seems to be based on Jonathan Lewis's article:
http://jonathanlewis.wordpress.com/2012/01/03/newdensity/
In this article, he walks through the calculation of selectivity for non-popular values.
The calculation uses not density but NewDensity, as seen in a 10053 trace, which takes into account the number of non-popular values AND the number of non-popular buckets.
The article describes exactly how 29 is derived.

Hi Dom,
Yes, I used the same sample script to create the data sets. I should have checked Jonathan's blog for the NewDensity calculation. So the selectivity works out either of two ways:
1) cardinality (non-popular) = NewDensity (taken from the 10053 trace) * num_rows
or
2) non-popular rows / non-popular values, where the non-popular values can be derived from the 10053 trace and the non-popular rows are (3240 * (74-31)/74 =) 1883
Thanks for pointing me to the right blog.
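For the non-popular case, the NewDensity arithmetic from the article can be reproduced from the histogram listing above. Counting the values whose ENDPOINT_NUMBER jumps by 2 gives 16 popular values occupying 32 of the 75 buckets; these counts are my own reading of the listing, so treat them as assumptions to verify against the 10053 trace:

```python
num_rows = 3240
ndv = 80          # num_distinct for SKEW
bkt_cnt = 75      # histogram buckets
pop_val_cnt = 16  # values whose ENDPOINT_NUMBER jumps by 2 (assumed count)
pop_bkt_cnt = 32  # buckets they occupy: 16 values * 2 buckets each

# NewDensity = (non-popular buckets / total buckets) / non-popular NDV
new_density = ((bkt_cnt - pop_bkt_cnt) / bkt_cnt) / (ndv - pop_val_cnt)
cardinality = round(new_density * num_rows)
print(cardinality)  # 29, matching autotrace for skew = 8 and skew = 46
```

This is why every non-popular value gets the same estimate of 29: the formula no longer looks at the individual value at all, only at the non-popular portion of the histogram.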

  • Use of Index, Histograms, etc

    Hi all,
We're using Oracle 9.2.0.4.
I have a table with 500000 rows.
And I have a query that returns only 30242 rows for one month, like:
    SELECT * FROM T1
    WHERE TO_CHAR(DT, 'MM/YYYY') = TO_CHAR(ADD_MONTHS(SYSDATE, -1), 'MM/YYYY')
    I have a index for this column:
    CREATE INDEX IND_T1_DT_FMT ON T1 (TO_CHAR(DT, 'MM/YYYY'))
    TABLESPACE TBS_SOME_USER;
    There are statistics for this table.
Looking at the table data, I have the following distribution:
Qty      MON/YY   %
1        Feb-09   0.000219142
99       Apr-09   0.021695016
38439    May-09   8.42358314
98231    Jun-09   21.52649641
1        Jul-06   0.000219142
139959   Jul-09   30.6708362
1        Aug-02   0.000219142
1        Aug-07   0.000219142
141362   Aug-09   30.97829184
30242    Sep-09   6.62727962
7990              1.750941213
But when I perform the query (that returns 30242 rows, 6.63% of the table):
    SELECT * FROM T1
    WHERE TO_CHAR(DT, 'MM/YYYY') = TO_CHAR(ADD_MONTHS(SYSDATE, -1), 'MM/YYYY')
Oracle uses an FTS:
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=432 Card=45633 Bytes=3011778)
1 0 TABLE ACCESS (FULL) OF 'T1' (Cost=432 Card=45633 Bytes=3011778)
So, shouldn't Oracle use the index in this case?
Is there any way to gather statistics for this table with a function-based index?
    Something like this:
    EXECUTE DBMS_STATS.GATHER_TABLE_STATS(ownname => 'U1',
    tabname => 'T1', method_opt => 'FOR COLUMNS TO_CHAR(DT, ''MM/YYYY'')',
    cascade => true, degree => 4);
    How can I create histograms for this case?
    Or other solution, like Partition?
    thank you very much!!!!

Always treat dates like dates.
This:
SELECT * FROM T1
WHERE TO_CHAR(DT, 'MM/YYYY') = TO_CHAR(ADD_MONTHS(SYSDATE, -1), 'MM/YYYY')
should be more like this:
SELECT * FROM T1
WHERE DT BETWEEN TRUNC(ADD_MONTHS(SYSDATE,-1),'MM') AND TRUNC(SYSDATE,'MM')-1 ;
Then you should index DT.
But should this query use the index?
Touch and go at 6.63%.
Give it a go using dates as dates and see if it makes a difference.
Is there a problem with the performance of the FTS?

  • Performance issue with extreme data distribution using histogram

    Hi
We have a performance stability issue which we later found out is caused by a bind variable and the histogram for a particular column when it is used as part of an equality predicate. Assume the column name is parent_id0.
There is also an index on parent_id0.
Our temporary workaround is to install the good plan when it is started.
This is on database 10.2.0.3. I have a table with 2570149 rows; there is one common value (value 0) that represents about 99.91% of the total rows.
    When i do
    select parent_id0, count(*)
    from table1
    group by parent_id0
    order by parent_id0;
I'm getting 187 rows, and I would assume that 187 buckets would give a better representation. The first row has a count of nearly 99.91% of the total; the rest have counts of 1 or 2, or less than 200.
With the auto gather, Oracle came up with 5 buckets. When I checked the sample size, I saw Oracle used only 2.215% of the total rows at that time.
Column name  Endpoint num  Endpoint value
PARENT_ID0   5,579         0
PARENT_ID0   5,582         153,486,811
PARENT_ID0   5,583         156,240,279
PARENT_ID0   5,584         163,081,173
PARENT_ID0   5,585         168,255,656
Is the problem due to the small sample size, and hence a miscalculated histogram?
When I trace the SQL with event 10053, I see something like this... it seems some value is not captured in the histogram:
Using prorated density: 3.9124e-07 of col #2 as selectivity of out-of-range value pred
What do I need to do to get a correct and stable execution plan?
    Thank you

Hi, it's an OLTP environment.
The problem is that this SQL joins 4 tables: table1 (2.5 mil rows), table2 (4 mil), table3 (4.6 mil) and table4 (20 mil).
By rights, the table with the highest filter ratio is table1. However, from the plan, Oracle is using table3 as the driving table. The moment I take away parent_id0 as part of the predicate, Oracle chooses the right driving table (table1).
    Here is the sql structure
    select ...
    from table1, table2, table3, table4
where table1.id = :1 and table1.parent_id0 = :2
    and ...
    We have index on id column too.
From the application, we will never pass in the value 0 for parent_id0. Therefore, we will be querying the remaining 0.09 percent of the rows all the time with that particular query.
    p/s: i'm sorry that i'm not able to paste the exact sql text here

  • Gather_table_stats with a method opt of "for all indexed columns size 0"

I have 9 databases I support that contain the same structure and very similar data concentrations. We are seeing inconsistent performance in the different databases due to bind variable peeking. I have tracked it down to the min and max values that are gathered during the analyze. I analyze on one cluster, and export/import those statistics into the other clusters. I then go about locking down the gathered stats. Some of the statistics are on tables that contain transient data (the older data is purged, and new data gets a new PK sequence number).
Since I am gathering statistics with 'FOR ALL INDEXED COLUMNS SIZE 1', a min and max value are grabbed. These values are only appropriate for a short period of time, and only for a specific database. I do want Oracle to know the density to help its calculations, but I don't want cardinality estimates based on whether the current bind values fall in this range.
Example:
COLUMN PK
When I analyze, the min is 1 and the max is 5. I then let the database run, and the new min is 100 and the max is 105: same number of rows, but different min/max. At first, a select * from table where pk >= 1 and pk <= 5 would return a cardinality of 5. Later, a select * from table where pk >= 100 and pk <= 105 would return a cardinality of 1.
Any ideas how to avoid this, other than trying to set the min and max to something myself (like min = 1, max = 99999999)?
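What you are describing is the optimizer's out-of-range prorating: once a peeked value falls outside the stored [low, high] interval, the estimated selectivity is scaled down roughly linearly with the distance from the range, bottoming out at a cardinality of 1. A rough Python model of the idea (a sketch of the behaviour, not Oracle's exact internal formula):

```python
def prorated_cardinality(value, low, high, num_rows, ndv):
    """Rough model of range-based prorating for an equality predicate."""
    base = num_rows / ndv            # in-range estimate (uniform assumption)
    if low <= value <= high:
        return round(base)
    span = high - low if high > low else 1
    distance = (low - value) if value < low else (value - high)
    # selectivity decays linearly with distance and is floored at 1 row
    factor = max(0.0, 1.0 - distance / span)
    return max(1, round(base * factor))

# Stats say pk ranges over [1, 5]; after purge/reload the data is 100..105.
print(prorated_cardinality(3, 1, 5, 1000, 5))    # 200 (in range)
print(prorated_cardinality(102, 1, 5, 1000, 5))  # 1 (far out of range)
```

This is why the second query's estimate collapses to 1 even though the row counts never changed: the peeked binds are simply far outside the stale min/max.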

MarkDPowell wrote:
The Oracle documentation on bind variable peeking said it did not peek without histograms, and I cannot remember ever seeing on 9.2 where the trace showed otherwise.
Mark,
see this simple test case run on 9.2.0.8. No histograms, but bind variable peeking, as you can see from the fact that the EXPLAIN PLAN output generated by AUTOTRACE differs from the estimated cardinality of the actual plan used at runtime.
Which documentation do you refer to?
    SQL>
    SQL> alter session set nls_language = 'AMERICAN';
    Session altered.
    SQL>
    SQL> drop table bind_peek_test;
    Table dropped.
    SQL>
    SQL> create table bind_peek_test
      2  as
      3  select
      4             100 as n1
      5           , cast(dbms_random.string('a', 20) as varchar2(20)) as filler
      6  from
      7             dual
      8  connect by
      9             level <= 1000;
    Table created.
    SQL>
    SQL> exec dbms_stats.gather_table_stats(null, 'bind_peek_test', method_opt=>'FOR ALL COLUMNS SIZE 1')
    PL/SQL procedure successfully completed.
    SQL>
    SQL> variable n number
    SQL>
    SQL> variable n2 number
    SQL>
    SQL> alter system flush shared_pool;
    System altered.
    SQL>
    SQL> exec :n := 1; :n2 := 50;
    PL/SQL procedure successfully completed.
    SQL>
    SQL> set autotrace traceonly
    SQL>
    SQL> select * from bind_peek_test where n1 >= :n and n1 <= :n2;
    no rows selected
Execution Plan
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1000 Bytes=24000)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIND_PEEK_TEST' (Cost=2 Card=1000 Bytes=24000)
    Statistics
            236  recursive calls
              0  db block gets
             35  consistent gets
              0  physical reads
              0  redo size
            299  bytes sent via SQL*Net to client
            372  bytes received via SQL*Net from client
              1  SQL*Net roundtrips to/from client
              4  sorts (memory)
              0  sorts (disk)
              0  rows processed
    SQL>
    SQL> set autotrace off
    SQL>
    SQL> select
      2             cardinality
      3  from
      4             v$sql_plan
      5  where
      6             cardinality is not null
      7  and      hash_value in (
      8    select
      9            hash_value
    10    from
    11            v$sql
    12    where
    13            sql_text like 'select * from bind_peek_test%'
    14    );
    CARDINALITY
              1
    SQL>
    SQL> alter system flush shared_pool;
    System altered.
    SQL>
    SQL> exec :n := 100; :n2 := 100;
    PL/SQL procedure successfully completed.
    SQL>
    SQL> set autotrace traceonly
    SQL>
    SQL> select * from bind_peek_test where n1 >= :n and n1 <= :n2;
    1000 rows selected.
Execution Plan
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1000 Bytes=24000)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIND_PEEK_TEST' (Cost=2 Card=1000 Bytes=24000)
    Statistics
            236  recursive calls
              0  db block gets
            102  consistent gets
              0  physical reads
              0  redo size
          34435  bytes sent via SQL*Net to client
           1109  bytes received via SQL*Net from client
             68  SQL*Net roundtrips to/from client
              4  sorts (memory)
              0  sorts (disk)
           1000  rows processed
    SQL>
    SQL> set autotrace off
    SQL>
    SQL> select
      2             cardinality
      3  from
      4             v$sql_plan
      5  where
      6             cardinality is not null
      7  and      hash_value = (
      8    select
      9            hash_value
    10    from
    11            v$sql
    12    where
    13            sql_text like 'select * from bind_peek_test%'
    14    );
    CARDINALITY
           1000
    SQL>
SQL> spool off

Regards,
    Randolf
    Oracle related stuff blog:
    http://oracle-randolf.blogspot.com/
    SQLTools++ for Oracle (Open source Oracle GUI for Windows):
    http://www.sqltools-plusplus.org:7676/
    http://sourceforge.net/projects/sqlt-pp/
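The behaviour the test case demonstrates can be summarised in a toy model: the optimizer peeks at the binds only at hard parse, and later executions reuse the cached plan and its cardinality estimate regardless of the new bind values. A hypothetical Python sketch (all names here are illustrative, not Oracle APIs):

```python
# Toy model of bind variable peeking: the plan, with its cardinality
# estimate, is fixed at hard-parse time from the first bind values seen.
plan_cache = {}

def execute(sql_text, estimate_fn, binds):
    if sql_text not in plan_cache:            # hard parse: peek at binds
        plan_cache[sql_text] = estimate_fn(binds)
    return plan_cache[sql_text]               # soft parse: reuse the plan

# Estimator for "n1 >= :n and n1 <= :n2" on 1000 rows that all have n1 = 100.
def estimate(binds):
    n, n2 = binds
    return 1000 if n <= 100 <= n2 else 1      # crude: range hits 100 or not

sql = "select * from bind_peek_test where n1 >= :n and n1 <= :n2"
print(execute(sql, estimate, (1, 50)))    # peeked: range misses 100 -> 1
print(execute(sql, estimate, (100, 100))) # reuses the plan: still 1
```

That matches the transcript: after the shared pool flush the plan is re-peeked with (100, 100) and v$sql_plan then shows CARDINALITY 1000 instead of 1.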
