HASH JOIN Probe Residual

Hello!
I have a query that joins two large tables. The optimizer chooses a Hash Match operator (that's OK).
But when I inspect the operator in the query plan, there is a Probe Residual that compares THE SAME COLUMNS I join on.
Is it normal to always have a Probe Residual comparing the same columns as the join?
The Hash Match operator takes 63% of the query cost and I want to optimize it.
The join columns are of the SAME type; I know that's important.
Thank you!

Since I don't know much of this by heart, I googled Probe Residual, and this blog post from SQL Server MVP Rob Farley could be a start:
http://sqlblog.com/blogs/rob_farley/archive/2011/03/22/probe-residual-when-you-have-a-hash-match-a-hidden-cost-in-execution-plans.aspx
One thing I do know: the percentages you see in the plan are estimates, even if you look at the actual plan. They certainly need to be taken with a grain of salt, as your real bottleneck may be elsewhere.
Erland Sommarskog, SQL Server MVP
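To illustrate why a probe residual can repeat the join columns: a hash join first matches rows by hash bucket, and different key values can share a bucket, so the engine may still have to compare the actual column values. The sketch below is only a toy model of that idea, not SQL Server's implementation; `hash_join`, `n_buckets` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key, n_buckets=4):
    """Toy hash join: bucket rows by hash, then re-check equality (the "residual")."""
    buckets = defaultdict(list)
    for row in build_rows:
        # Build phase: several distinct keys can land in the same bucket.
        buckets[hash(row[key]) % n_buckets].append(row)

    matches = []
    residual_checks = 0
    for probe in probe_rows:
        # Probe phase: the bucket only narrows down the candidates.
        for build in buckets[hash(probe[key]) % n_buckets]:
            residual_checks += 1
            # Residual predicate: compare the real join columns.
            if build[key] == probe[key]:
                matches.append((build, probe))
    return matches, residual_checks


# Hypothetical sample data: 8 build rows squeezed into 4 buckets.
build = [{"id": i, "name": f"b{i}"} for i in range(8)]
probe = [{"id": i, "qty": i * 10} for i in range(0, 8, 2)]
matches, residual_checks = hash_join(build, probe, "id")
print(len(matches), "matches after", residual_checks, "residual comparisons")
```

With fewer buckets than distinct keys, the number of residual comparisons exceeds the number of matches, which is the extra work the Probe Residual represents in this simplified picture.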

Similar Messages

Parallel Hash Join always swapping to TEMP

Hi,
I've experienced some strange behaviour on Oracle 9.2.0.5 recently: a simple query hash joining two tables (the smaller with 16k records / 1 MB, the bigger with 2.5m records / 1.5 GB) spills to TEMP when launched in parallel mode (4 sets of PQ slaves). What is strange is that serial execution runs as expected: an in-memory hash join occurs. It's worth adding that both parallel and serial execution properly select the smaller table as the inner one, but the parallel query always decides to buffer the source data (no matter how big it is).
To be more precise: all table stats are gathered, I have enough PGA memory assigned to queries (WORKAREA_SIZE_POLICY=AUTO, PGA_AGGREGATE_TARGET=6GB) and I analyze the results properly. Even the hidden parameter _SMM_PX_MAX_SIZE is set to about 2GB, yet parallel execution still decides to spill (even though the inner data size for each slave is about 220 KB!).
I dug into the traces (10104 event) and found a substantial difference between serial and parallel execution. It looks like some internal flag orders the PQ slaves to always buffer the data. Here is what I found in the PQ slave trace:
HASH JOIN STATISTICS (INITIALIZATION)
Original memory: 4428800
Memory after all overhead: 4283220
Memory for slots: 3809280
Calculated overhead for partitions and row/slot managers: 473940
Hash-join fanout: 8
Number of partitions: 9
Number of slots: 15
Multiblock IO: 31
Block size(KB): 8
Cluster (slot) size(KB): 248
Hash-join fanout (manual): 8
Cluster/slot size(KB) (manual): 280
Minimum number of bytes per block: 8160
Bit vector memory allocation(KB): 128
Per partition bit vector length(KB): 16
Maximum possible row length: 1455
Estimated build size (KB): 645
Estimated Row Length (includes overhead): 167
Immutable Flags:
BUFFER the output of the join for Parallel Query
kxhfSetPhase: phase=BUILD
kxhfAddChunk: add chunk 0 (sz=32) to slot table
kxhfAddChunk: chunk 0 (lbs=800003ff640ebb50, slotTab=800003ff640ebce8) successfuly added
kxhfSetPhase: phase=PROBE_1
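The "BUFFER the output of the join for Parallel Query" line under Immutable Flags (bolded in the original post) is the part that is not present in serial mode. Unfortunately, I cannot find anything that could help identify the reason or the setting that drives this behaviour :(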
Best regards
Bazyli
Edited by: user10419027 on Oct 13, 2008 3:53 AM
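For anyone who wants to compare such 10104 traces without reading them line by line, here is a minimal sketch that pulls out the lines listed under "Immutable Flags:" and diffs them between two traces. It assumes the layout seen in the excerpts in this thread (flag lines follow the heading and end at the next `name:` trace line); `immutable_flags` is a hypothetical helper, not an Oracle tool.

```python
import re


def immutable_flags(trace_text):
    """Return the lines listed under 'Immutable Flags:' in a 10104 trace excerpt."""
    flags = []
    in_flags = False
    for raw in trace_text.splitlines():
        line = raw.strip()
        # The heading appears both as 'Immutable Flags:' and '# Immutable Flags:'.
        if line.lstrip("# ").startswith("Immutable Flags:"):
            in_flags = True
            continue
        if in_flags:
            # Assumption: the flag list ends at the next 'kxhfXxx:'-style line.
            if not line or re.match(r"^\w+(\(\))?:", line):
                break
            flags.append(line)
    return flags


# Excerpts copied from the traces posted in this thread.
serial_trace = """# Immutable Flags:
kxhfSetPhase: phase=BUILD
"""
parallel_trace = """# Immutable Flags:
BUFFER the output of the join for Parallel Query
kxhfSetPhase: phase=BUILD
"""

only_in_parallel = set(immutable_flags(parallel_trace)) - set(immutable_flags(serial_trace))
print(only_in_parallel)
```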

Jonathan,
The distribution seems to be as expected (HASH/HASH); please have a look at the query plan:
PLAN_TABLE_OUTPUT
| Id | Operation            | Name                         | Rows | Bytes | Cost | TQ    |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT     |                               |   456K|    95M|   876 |        |      |            |
|* 1 | HASH JOIN           |                               |   456K|    95M|   876 | 43,02 | P->S | QC (RAND) |
|   2 |   TABLE ACCESS FULL | SH30_8700195_9032_0_TMP_TEST | 16555 |   468K|    16 | 43,00 | P->P | HASH       |
|   3 |   TABLE ACCESS FULL | SH30_8700195_9031_0_TMP_TEST | 2778K|   503M|   860 | 43,01 | P->P | HASH       |
Predicate Information (identified by operation id):
   1 - access(NVL("A"."PROD_ID",'NULL!')=NVL("B"."PROD_ID",'NULL!') AND
              NVL("A"."PROD_UNIT_OF_MEASR_ID",'NULL!')=NVL("B"."PROD_UNIT_OF_MEASR_ID",'NULL!'))
Let me also share with you the trace files from parallel and serial execution.
First, serial execution (only 10104 event details):
Dump file /opt/oracle/admin/cdwep4/udump/cdwep401_ora_18729.trc
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name:     HP-UX
Node name:     ethp1018
Release:     B.11.11
Version:     U
Machine:     9000/800
Instance name: cdwep401
Redo thread mounted by this instance: 1
Oracle process number: 100
Unix process pid: 18729, image: oracle@ethp1018 (TNS V1-V3)
kxhfInit(): enter
kxhfInit(): exit
*** HASH JOIN STATISTICS (INITIALIZATION) ***
Original memory: 4341760
Memory after all overhead: 4163446
Memory for slots: 3301376
Calculated overhead for partitions and row/slot managers: 862070
Hash-join fanout: 8
Number of partitions: 8
Number of slots: 13
Multiblock IO: 31
Block size(KB): 8
Cluster (slot) size(KB): 248
Hash-join fanout (manual): 8
Cluster/slot size(KB) (manual): 240
Minimum number of bytes per block: 8160
Bit vector memory allocation(KB): 128
Per partition bit vector length(KB): 16
Maximum possible row length: 1455
Estimated build size (KB): 1083
Estimated Row Length (includes overhead): 67
# Immutable Flags:
kxhfSetPhase: phase=BUILD
kxhfAddChunk: add chunk 0 (sz=32) to slot table
kxhfAddChunk: chunk 0 (lbs=800003ff6c063b20, slotTab=800003ff6c063cb8) successfuly added
kxhfSetPhase: phase=PROBE_1
qerhjFetch: max build row length (mbl=110)
*** END OF HASH JOIN BUILD (PHASE 1) ***
Revised row length: 68
Revised row count: 16555
Revised build size: 1089KB
kxhfResize(enter): resize to 12 slots (numAlloc=8, max=13)
kxhfResize(exit): resized to 12 slots (numAlloc=8, max=12)
Slot table resized: old=13 wanted=12 got=12 unload=0
*** HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Total number of partitions: 8
Number of partitions which could fit in memory: 8
Number of partitions left in memory: 8
Total number of slots in in-memory partitions: 8
Total number of rows in in-memory partitions: 16555
   (used as preliminary number of buckets in hash table)
Estimated max # of build rows that can fit in avail memory: 55800
### Partition Distribution ###
Partition:0    rows:2131       clusters:1      slots:1      kept=1
Partition:1    rows:1975       clusters:1      slots:1      kept=1
Partition:2    rows:1969       clusters:1      slots:1      kept=1
Partition:3    rows:2174       clusters:1      slots:1      kept=1
Partition:4    rows:2041       clusters:1      slots:1      kept=1
Partition:5    rows:2092       clusters:1      slots:1      kept=1
Partition:6    rows:2048       clusters:1      slots:1      kept=1
Partition:7    rows:2125       clusters:1      slots:1      kept=1
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Revised number of hash buckets (after flushing): 16555
Allocating new hash table.
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Requested size of hash table: 4096
Actual size of hash table: 4096
Number of buckets: 32768
kxhfResize(enter): resize to 14 slots (numAlloc=8, max=12)
kxhfResize(exit): resized to 14 slots (numAlloc=8, max=14)
freeze work area size to: 4357K (14 slots)
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Total number of rows (may have changed): 16555
Number of in-memory partitions (may have changed): 8
Final number of hash buckets: 32768
Size (in bytes) of hash table: 262144
kxhfIterate(end_iterate): numAlloc=8, maxSlots=14
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
### Hash table ###
# NOTE: The calculated number of rows in non-empty buckets may be smaller
#       than the true number.
Number of buckets with   0 rows:      21129
Number of buckets with   1 rows:       8755
Number of buckets with   2 rows:       2024
Number of buckets with   3 rows:        433
Number of buckets with   4 rows:        160
Number of buckets with   5 rows:         85
Number of buckets with   6 rows:         69
Number of buckets with   7 rows:         41
Number of buckets with   8 rows:         32
Number of buckets with   9 rows:         18
Number of buckets with between 10 and 19 rows:         21
Number of buckets with between 20 and 29 rows:          1
Number of buckets with between 30 and 39 rows:          0
Number of buckets with between 40 and 49 rows:          0
Number of buckets with between 50 and 59 rows:          0
Number of buckets with between 60 and 69 rows:          0
Number of buckets with between 70 and 79 rows:          0
Number of buckets with between 80 and 89 rows:          0
Number of buckets with between 90 and 99 rows:          0
Number of buckets with 100 or more rows:          0
### Hash table overall statistics ###
Total buckets: 32768 Empty buckets: 21129 Non-empty buckets: 11639
Total number of rows: 16555
Maximum number of rows in a bucket: 24
Average number of rows in non-empty buckets: 1.422373
=====================
.... (lots of fetching) ....
qerhjFetch: max probe row length (mpl=0)
qerhjFreeSpace(): free hash-join memory
kxhfRemoveChunk: remove chunk 0 from slot table
And finally, the PQ slave output (only one trace; please note the Immutable Flag that I believe orders Oracle to buffer to TEMP):
Dump file /opt/oracle/admin/cdwep4/bdump/cdwep401_p002_4640.trc
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name:     HP-UX
Node name:     ethp1018
Release:     B.11.11
Version:     U
Machine:     9000/800
Instance name: cdwep401
Redo thread mounted by this instance: 1
Oracle process number: 86
Unix process pid: 4640, image: oracle@ethp1018 (P002)
kxhfInit(): enter
kxhfInit(): exit
*** HASH JOIN STATISTICS (INITIALIZATION) ***
Original memory: 4428800
Memory after all overhead: 4283220
Memory for slots: 3809280
Calculated overhead for partitions and row/slot managers: 473940
Hash-join fanout: 8
Number of partitions: 9
Number of slots: 15
Multiblock IO: 31
Block size(KB): 8
Cluster (slot) size(KB): 248
Hash-join fanout (manual): 8
Cluster/slot size(KB) (manual): 280
Minimum number of bytes per block: 8160
Bit vector memory allocation(KB): 128
Per partition bit vector length(KB): 16
Maximum possible row length: 1455
Estimated build size (KB): 645
Estimated Row Length (includes overhead): 167
# Immutable Flags:
BUFFER the output of the join for Parallel Query
kxhfSetPhase: phase=BUILD
kxhfAddChunk: add chunk 0 (sz=32) to slot table
kxhfAddChunk: chunk 0 (lbs=800003ff640ebb50, slotTab=800003ff640ebce8) successfuly added
kxhfSetPhase: phase=PROBE_1
qerhjFetch: max build row length (mbl=96)
*** END OF HASH JOIN BUILD (PHASE 1) ***
Revised row length: 54
Revised row count: 4203
Revised build size: 221KB
kxhfResize(enter): resize to 16 slots (numAlloc=8, max=15)
kxhfResize(exit): resized to 16 slots (numAlloc=8, max=16)
Slot table resized: old=15 wanted=16 got=16 unload=0
*** HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Total number of partitions: 8
Number of partitions which could fit in memory: 8
Number of partitions left in memory: 8
Total number of slots in in-memory partitions: 8
Total number of rows in in-memory partitions: 4203
   (used as preliminary number of buckets in hash table)
Estimated max # of build rows that can fit in avail memory: 85312
### Partition Distribution ###
Partition:0    rows:537        clusters:1      slots:1      kept=1
Partition:1    rows:554        clusters:1      slots:1      kept=1
Partition:2    rows:497        clusters:1      slots:1      kept=1
Partition:3    rows:513        clusters:1      slots:1      kept=1
Partition:4    rows:498        clusters:1      slots:1      kept=1
Partition:5    rows:543        clusters:1      slots:1      kept=1
Partition:6    rows:547        clusters:1      slots:1      kept=1
Partition:7    rows:514        clusters:1      slots:1      kept=1
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Revised number of hash buckets (after flushing): 4203
Allocating new hash table.
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Requested size of hash table: 1024
Actual size of hash table: 1024
Number of buckets: 8192
kxhfResize(enter): resize to 18 slots (numAlloc=8, max=16)
kxhfResize(exit): resized to 18 slots (numAlloc=8, max=18)
freeze work area size to: 5812K (18 slots)
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
Total number of rows (may have changed): 4203
Number of in-memory partitions (may have changed): 8
Final number of hash buckets: 8192
Size (in bytes) of hash table: 65536
kxhfIterate(end_iterate): numAlloc=8, maxSlots=18
*** (continued) HASH JOIN BUILD HASH TABLE (PHASE 1) ***
### Hash table ###
# NOTE: The calculated number of rows in non-empty buckets may be smaller
#       than the true number.
Number of buckets with   0 rows:       5284
Number of buckets with   1 rows:       2177
Number of buckets with   2 rows:        510
Number of buckets with   3 rows:        104
Number of buckets with   4 rows:         51
Number of buckets with   5 rows:         14
Number of buckets with   6 rows:         14
Number of buckets with   7 rows:         13
Number of buckets with   8 rows:         12
Number of buckets with   9 rows:          4
Number of buckets with between 10 and 19 rows:          9
Number of buckets with between 20 and 29 rows:          0
Number of buckets with between 30 and 39 rows:          0
Number of buckets with between 40 and 49 rows:          0
Number of buckets with between 50 and 59 rows:          0
Number of buckets with between 60 and 69 rows:          0
Number of buckets with between 70 and 79 rows:          0
Number of buckets with between 80 and 89 rows:          0
Number of buckets with between 90 and 99 rows:          0
Number of buckets with 100 or more rows:          0
### Hash table overall statistics ###
Total buckets: 8192 Empty buckets: 5284 Non-empty buckets: 2908
Total number of rows: 4203
Maximum number of rows in a bucket: 16
Average number of rows in non-empty buckets: 1.445323
kxhfWrite: hash-join is spilling to disk
kxhfWrite: Writing dba=950281 slot=8 part=8
kxhfWrite: Writing dba=950312 slot=9 part=8
kxhfWrite: Writing dba=950343 slot=10 part=8
kxhfWrite: Writing dba=950374 slot=11 part=8
.... (lots of writing) ....
kxhfRead(): Reading dba=950281 into slot=15
kxhfIsDone: waiting slot=15 lbs=800003ff640ebb50
kxhfRead(): Reading dba=950312 into slot=16
kxhfIsDone: waiting slot=16 lbs=800003ff640ebb50
kxhfRead(): Reading dba=950343 into slot=17
kxhfFreeSlots(800003ff7c068918): all=0 alloc=18 max=18
EmptySlots:15 8 9 10 11 12 13
PendingSlots:
kxhfIsDone: waiting slot=17 lbs=800003ff640ebb50
kxhfRead(): Reading dba=950374 into slot=15
kxhfFreeSlots(800003ff7c068918): all=0 alloc=18 max=18
EmptySlots:16 8 9 10 11 12 13
PendingSlots:
.... (lots of reading) ....
qerhjFetchPhase2(): building a hash table
kxhfFreeSlots(800003ff7c068980): all=1 alloc=18 max=18
EmptySlots:2 4 6 1 0 7 5 3 14 17 16 15 8 9 10 11 12 13
PendingSlots:
qerhjFreeSpace(): free hash-join memory
kxhfRemoveChunk: remove chunk 0 from slot table
Why do you think it's surprising that Oracle utilizes TEMP? Based on the traces, Oracle seems to be very sure it should spill to disk. I believe the key to the answer is the immutable flag printing "BUFFER the output of the join for Parallel Query". As I mentioned in one of my previous posts, it is the opposite of "Not BUFFER(execution) output of the join for PQ", which appears in some traces found on the internet.
Best regards
Bazyli
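As a sanity check on the "Hash table overall statistics" sections of the two traces, the figures can be recomputed from the bucket counts. The sketch below only uses numbers printed in the traces above; `bucket_stats` and the `load_factor` figure are illustrative additions, not something the trace reports.

```python
def bucket_stats(total_buckets, empty_buckets, total_rows):
    """Recompute the summary figures printed in the 10104 hash table section."""
    non_empty = total_buckets - empty_buckets
    return {
        "non_empty": non_empty,
        "avg_rows_per_non_empty": total_rows / non_empty,
        # Rows per bucket overall (not printed by the trace).
        "load_factor": total_rows / total_buckets,
    }


# Figures taken from the serial trace and the PQ slave trace.
serial = bucket_stats(total_buckets=32768, empty_buckets=21129, total_rows=16555)
slave = bucket_stats(total_buckets=8192, empty_buckets=5284, total_rows=4203)

for name, stats in (("serial", serial), ("PQ slave", slave)):
    print(name, stats["non_empty"], round(stats["avg_rows_per_non_empty"], 6))
```

Both hash tables come out with roughly 1.4 rows per non-empty bucket, so on these numbers the build side of the PQ slave does not look any worse than the serial one.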

Query Degradation--Hash Join Degraded

Hi All,
I found a query degradation issue. I am on 10.2.0.3.0 (Sun OS) with optimizer_mode=ALL_ROWS.
This is a data warehouse DB.
All 3 tables involved are partitioned tables (with daily partitions). Partitions are created in advance and ELT jobs load bulk data into the daily partitions.
I have checked that the CBO is not using the local indexes created on them, which I believe is appropriate, because when I used an INDEX hint the elapsed time increased.
I tried giving an index hint for each table one by one but didn't get any performance improvement.
Partitions are loaded daily and, after loading, partition-level stats are gathered with dbms_stats.
We are collecting stats at partition level (granularity=>'PARTITION'). Even after collecting global stats, there is no change in the access pattern. The stats gathering command is given below.
PROCEDURE gather_table_part_stats(i_owner_name,i_table_name,i_part_name,i_estimate:= DBMS_STATS.AUTO_SAMPLE_SIZE, i_invalidate IN VARCHAR2 := 'Y',i_debug:= 'N')
Only SOT_KEYMAP.IPK_SOT_KEYMAP is GLOBAL. All the other indexes are LOCAL.
Earlier, we had a bind peeking issue, which I fixed by introducing NO_INVALIDATE=>FALSE in the stats gathering job.
Here, the partition name (20090219) is passed through bind variables.
SELECT a.sotrelstg_sot_ud sotcrct_sot_ud,
       b.sotkey_ud sotcrct_orig_sot_ud, a.ROWID stage_rowid
FROM (SELECT sotrelstg_sot_ud, sotrelstg_sys_ud,
             sotrelstg_orig_sys_ord_id, sotrelstg_orig_sys_ord_vseq
      FROM sot_rel_stage
      WHERE sotrelstg_trd_date_ymd_part = '20090219'
        AND sotrelstg_crct_proc_stat_cd = 'N'
        AND sotrelstg_sot_ud NOT IN(
              SELECT sotcrct_sot_ud
              FROM sot_correct
              WHERE sotcrct_trd_date_ymd_part ='20090219')) a,
     (SELECT MAX(sotkey_ud) sotkey_ud, sotkey_sys_ud,
             sotkey_sys_ord_id, sotkey_sys_ord_vseq,
             sotkey_trd_date_ymd_part
      FROM sot_keymap
      WHERE sotkey_trd_date_ymd_part = '20090219'
        AND sotkey_iud_cd = 'I'
      --not to select logical deleted rows
      GROUP BY sotkey_trd_date_ymd_part,
               sotkey_sys_ud,
               sotkey_sys_ord_id,
               sotkey_sys_ord_vseq) b
WHERE a.sotrelstg_sys_ud = b.sotkey_sys_ud
  AND a.sotrelstg_orig_sys_ord_id = b.sotkey_sys_ord_id
  AND NVL(a.sotrelstg_orig_sys_ord_vseq, 1) = NVL(b.sotkey_sys_ord_vseq, 1);
During normal business hours, I found that the query takes 5-7 min (which is also not acceptable), but during high-load business hours it takes 30-50 min.
I found that most of the time is spent on the HASH JOIN (direct path write temp). We have sufficient RAM (64 GB total / 41 GB available).
Below is the execution plan I got during normal business hours.
| Id | Operation                 | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
|   1 | HASH GROUP BY            |                     |      1 |      1 |   7844K|00:05:28.78 |      16M|    217K| 35969 |       |       |          |         |
|* 2 |   HASH JOIN               |                     |      1 |      1 |   9977K|00:04:34.02 |      16M|    202K| 20779 |   580M|    10M| 563M (1)|     650K|
|   3 |    NESTED LOOPS ANTI      |                     |      1 |      6 |   7855K|00:01:26.41 |      16M|   1149 |      0 |       |       |          |         |
|   4 |     PARTITION RANGE SINGLE|                     |      1 |    258K|   8183K|00:00:16.37 |   25576 |   1149 |      0 |       |       |          |         |
|* 5 |      TABLE ACCESS FULL    | SOT_REL_STAGE       |      1 |    258K|   8183K|00:00:16.37 |   25576 |   1149 |      0 |       |       |          |         |
|   6 |     PARTITION RANGE SINGLE|                     |   8183K|    326K|    327K|00:01:10.53 |      16M|      0 |      0 |       |       |          |         |
|* 7 |      INDEX RANGE SCAN     | IDXL_SOTCRCT_SOT_UD |   8183K|    326K|    327K|00:00:53.37 |      16M|      0 |      0 |       |       |          |         |
|   8 |    PARTITION RANGE SINGLE |                     |      1 |    846K|     14M|00:02:06.36 |     289K|    180K|      0 |       |       |          |         |
|* 9 |     TABLE ACCESS FULL     | SOT_KEYMAP          |      1 |    846K|     14M|00:01:52.32 |     289K|    180K|      0 |       |       |          |         |
I will attach the same for high-load business hours once the query returns results. It has been executing for the last 50 minutes.
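One quick way to read the plan above is to compare E-Rows with A-Rows for the operations that were started once. The sketch below does that for a few rows copied from the plan; it assumes the K and M suffixes mean thousands and millions, and `parse_count` is a hypothetical helper. Operations with Starts greater than 1 are left out, since their A-Rows are totals over all starts and are not directly comparable to E-Rows.

```python
def parse_count(text):
    """Convert a plan figure such as '7844K' or '14M' to an integer.

    Assumption: K = 1,000 and M = 1,000,000 (G = 1,000,000,000).
    """
    text = text.strip()
    multipliers = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


# (operation, E-Rows, A-Rows) copied from the plan, Starts = 1 only.
plan = [
    ("HASH GROUP BY", "1", "7844K"),
    ("HASH JOIN", "1", "9977K"),
    ("NESTED LOOPS ANTI", "6", "7855K"),
    ("TABLE ACCESS FULL SOT_REL_STAGE", "258K", "8183K"),
    ("TABLE ACCESS FULL SOT_KEYMAP", "846K", "14M"),
]

# Ratio of actual to estimated rows per operation.
ratios = {op: parse_count(actual) / parse_count(est) for op, est, actual in plan}
worst = max(ratios, key=ratios.get)
for op, ratio in ratios.items():
    print(f"{op}: actual/estimated = {ratio:,.1f}")
print("largest misestimate:", worst)
```

On these figures the HASH JOIN is estimated at 1 row but returns about 10 million, so the cardinality estimate at the join is off by several orders of magnitude.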
INDEX STATS (INDEXES ARE LOCAL INDEXES)
TABLE_NAME     INDEX_NAME                COLUMN_NAME                  COLUMN_POSITION   NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR
SOT_REL_STAGE  IDXL_SOTRELSTG_SOT_UD     SOTRELSTG_SOT_UD                           1   25461560      25461560            184180
SOT_REL_STAGE                            SOTRELSTG_TRD_DATE_YMD_PART                2   25461560      25461560            184180
SOT_KEYMAP     IDXL_SOTKEY_ENTORDSYS_UD  SOTKEY_ENTRY_ORD_SYS_UD                    1 1012306940             3          38308680
SOT_KEYMAP     IDXL_SOTKEY_HASH          SOTKEY_HASH                                1 1049582320    1049582320        1049579520
SOT_KEYMAP                               SOTKEY_TRD_DATE_YMD_PART                   2 1049582320    1049582320        1049579520
SOT_KEYMAP     IDXL_SOTKEY_SOM_ORD       SOTKEY_SOM_UD                              1 1023998560     268949136         559414840
SOT_KEYMAP                               SOTKEY_SYS_ORD_ID                          2 1023998560     268949136         559414840
SOT_KEYMAP     IPK_SOT_KEYMAP            SOTKEY_UD                                  1 1030369480    1015378900          24226580
SOT_CORRECT    IDXL_SOTCRCT_SOT_UD       SOTCRCT_SOT_UD                             1  412484756     412484756         411710982
SOT_CORRECT                              SOTCRCT_TRD_DATE_YMD_PART                  2  412484756     412484756         411710982
INDEX partition stats (from dba_ind_partitions)
INDEX_NAME                     PARTITION_NAME       STATUS       BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS SAMPLE_SIZE LAST_ANALYZ GLO
IDXL_SOTCRCT_SOT_UD            P20090219            USABLE            1         372        327879            216663     327879      327879 20-Feb-2009 YES
IDXL_SOTKEY_ENTORDSYS_UD       P20090219            USABLE            2        2910             3             36618     856229      856229 19-Feb-2009 YES
IDXL_SOTKEY_HASH               P20090219            USABLE            2        7783        853956            853914     853956      119705 19-Feb-2009 YES
IDXL_SOTKEY_SOM_ORD            P20090219            USABLE            2        6411        531492            157147     799758      132610 19-Feb-2009 YES
IDXL_SOTRELSTG_SOT_UD          P20090219            USABLE            2       13897       9682052             45867    9682052      794958 20-Feb-2009 YES
Thanks in advance.
Bhavik Desai

Hi Randolf,
Thanks for the time you spent on this issue. I appreciate it.
Please see my comments below:
1. You've mentioned several times that you're passing the partition name as bind variable, but you're obviously testing the statement with literals rather than bind
variables. So your tests obviously don't reflect what is going to happen in case of the actual execution. The cardinality estimates are potentially quite different when
using bind variables for the partition key.
Yes. I intentionally used literals in my tests. I found a couple of times that the plan used by the application and the plan generated by the AUTOTRACE + EXPLAIN PLAN command are the same and caused the hourly elapsed time.
As I pointed out earlier, last month we solved a couple of bind peeking issues by introducing NO_INVALIDATE=>FALSE in the stats gathering procedure, which we execute just after the data load into such daily partitions and before the start of the jobs that execute this query.
Execution plans from AWR (with parallelism on at table level, DEGREE>1). This is the plan the CBO used when the degradation occurred. This plan is used most of the time.
ELAPSED_TIME_DELTA BUFFER_GETS_DELTA DISK_READS_DELTA CURSOR(SELECT*FROMTA
        1918506000          46154275              918 CURSOR STATEMENT : 4
CURSOR STATEMENT : 4
PLAN_TABLE_OUTPUT
SQL_ID 39708a3azmks7
SELECT A.SOTRELSTG_SOT_UD SOTCRCT_SOT_UD, B.SOTKEY_UD SOTCRCT_ORIG_SOT_UD, A.ROWID STAGE_ROWID FROM (SELECT SOTRELSTG_SOT_UD,
SOTRELSTG_SYS_UD, SOTRELSTG_ORIG_SYS_ORD_ID, SOTRELSTG_ORIG_SYS_ORD_VSEQ FROM SOT_REL_STAGE WHERE SOTRELSTG_TRD_DATE_YMD_PART = :B1 AND
SOTRELSTG_CRCT_PROC_STAT_CD = 'N' AND SOTRELSTG_SOT_UD NOT IN( SELECT SOTCRCT_SOT_UD FROM SOT_CORRECT WHERE SOTCRCT_TRD_DATE_YMD_PART =
:B1 )) A, (SELECT MAX(SOTKEY_UD) SOTKEY_UD, SOTKEY_SYS_UD, SOTKEY_SYS_ORD_ID, SOTKEY_SYS_ORD_VSEQ, SOTKEY_TRD_DATE_YMD_PART FROM
SOT_KEYMAP WHERE SOTKEY_TRD_DATE_YMD_PART = :B1 AND SOTKEY_IUD_CD = 'I' GROUP BY SOTKEY_TRD_DATE_YMD_PART, SOTKEY_SYS_UD,
SOTKEY_SYS_ORD_ID, SOTKEY_SYS_ORD_VSEQ) B WHERE A.SOTRELSTG_SYS_UD = B.SOTKEY_SYS_UD AND A.SOTRELSTG_ORIG_SYS_ORD_ID =
B.SOTKEY_SYS_ORD_ID AND NVL(A.SOTRELSTG_ORIG_SYS_ORD_VSEQ, 1) = NVL(B.SOTKEY_SYS_ORD_VSEQ, 1)
Plan hash value: 1213870831
| Id | Operation                     | Name                | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT              |                     |       |       | 19655 (100)|          |       |       |        |      |            |
|   1 | PX COORDINATOR               |                     |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)         | :TQ10003            |     1 |   116 | 19655   (1)| 00:05:54 |       |       | Q1,03 | P->S | QC (RAND) |
|   3 |    HASH GROUP BY              |                     |     1 |   116 | 19655   (1)| 00:05:54 |       |       | Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                |                     |     1 |   116 | 19655   (1)| 00:05:54 |       |       | Q1,03 | PCWP |            |
|   5 |      PX SEND HASH             | :TQ10002            |     1 |   116 | 19655   (1)| 00:05:54 |       |       | Q1,02 | P->P | HASH       |
|   6 |       HASH GROUP BY           |                     |     1 |   116 | 19655   (1)| 00:05:54 |       |       | Q1,02 | PCWP |            |
|   7 |        NESTED LOOPS ANTI      |                     |     1 |   116 | 19654   (1)| 00:05:54 |       |       | Q1,02 | PCWP |            |
|   8 |         HASH JOIN             |                     |     1 |   102 | 19654   (1)| 00:05:54 |       |       | Q1,02 | PCWP |            |
|   9 |          PX JOIN FILTER CREATE| :BF0000             |    13M|   664M| 2427   (3)| 00:00:44 |       |       | Q1,02 | PCWP |            |
| 10 |           PX RECEIVE          |                     |    13M|   664M| 2427   (3)| 00:00:44 |       |       | Q1,02 | PCWP |            |
| 11 |            PX SEND HASH       | :TQ10000            |    13M|   664M| 2427   (3)| 00:00:44 |       |       | Q1,00 | P->P | HASH       |
| 12 |             PX BLOCK ITERATOR |                     |    13M|   664M| 2427   (3)| 00:00:44 |   KEY |   KEY | Q1,00 | PCWC |            |
| 13 |              TABLE ACCESS FULL| SOT_REL_STAGE       |    13M|   664M| 2427   (3)| 00:00:44 |   KEY |   KEY | Q1,00 | PCWP |            |
| 14 |          PX RECEIVE           |                     |    27M| 1270M| 17209   (1)| 00:05:10 |       |       | Q1,02 | PCWP |            |
| 15 |           PX SEND HASH        | :TQ10001            |    27M| 1270M| 17209   (1)| 00:05:10 |       |       | Q1,01 | P->P | HASH       |
| 16 |            PX JOIN FILTER USE | :BF0000             |    27M| 1270M| 17209   (1)| 00:05:10 |       |       | Q1,01 | PCWP |            |
| 17 |             PX BLOCK ITERATOR |                     |    27M| 1270M| 17209   (1)| 00:05:10 |   KEY |   KEY | Q1,01 | PCWC |            |
| 18 |              TABLE ACCESS FULL| SOT_KEYMAP          |    27M| 1270M| 17209   (1)| 00:05:10 |   KEY |   KEY | Q1,01 | PCWP |            |
| 19 |         PARTITION RANGE SINGLE|                     | 16185 |   221K|     0   (0)|          |   KEY |   KEY | Q1,02 | PCWP |            |
| 20 |          INDEX RANGE SCAN     | IDXL_SOTCRCT_SOT_UD | 16185 |   221K|     0   (0)|          |   KEY |   KEY | Q1,02 | PCWP |            |
Other Execution plan from AWR
ELAPSED_TIME_DELTA BUFFER_GETS_DELTA DISK_READS_DELTA CURSOR(SELECT*FROMTA
        1053251381                 0             2925 CURSOR STATEMENT : 4
CURSOR STATEMENT : 4
PLAN_TABLE_OUTPUT
SQL_ID 39708a3azmks7
SELECT A.SOTRELSTG_SOT_UD SOTCRCT_SOT_UD, B.SOTKEY_UD SOTCRCT_ORIG_SOT_UD, A.ROWID STAGE_ROWID FROM (SELECT SOTRELSTG_SOT_UD,
SOTRELSTG_SYS_UD, SOTRELSTG_ORIG_SYS_ORD_ID, SOTRELSTG_ORIG_SYS_ORD_VSEQ FROM SOT_REL_STAGE WHERE SOTRELSTG_TRD_DATE_YMD_PART = :B1 AND
SOTRELSTG_CRCT_PROC_STAT_CD = 'N' AND SOTRELSTG_SOT_UD NOT IN( SELECT SOTCRCT_SOT_UD FROM SOT_CORRECT WHERE SOTCRCT_TRD_DATE_YMD_PART =
:B1 )) A, (SELECT MAX(SOTKEY_UD) SOTKEY_UD, SOTKEY_SYS_UD, SOTKEY_SYS_ORD_ID, SOTKEY_SYS_ORD_VSEQ, SOTKEY_TRD_DATE_YMD_PART FROM
SOT_KEYMAP WHERE SOTKEY_TRD_DATE_YMD_PART = :B1 AND SOTKEY_IUD_CD = 'I' GROUP BY SOTKEY_TRD_DATE_YMD_PART, SOTKEY_SYS_UD,
SOTKEY_SYS_ORD_ID, SOTKEY_SYS_ORD_VSEQ) B WHERE A.SOTRELSTG_SYS_UD = B.SOTKEY_SYS_UD AND A.SOTRELSTG_ORIG_SYS_ORD_ID =
B.SOTKEY_SYS_ORD_ID AND NVL(A.SOTRELSTG_ORIG_SYS_ORD_VSEQ, 1) = NVL(B.SOTKEY_SYS_ORD_VSEQ, 1)
Plan hash value: 3434900850
| Id | Operation                     | Name                | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT              |                     |       |       | 1830 (100)|          |       |       |        |      |            |
|   1 | PX COORDINATOR               |                     |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)         | :TQ10003            |     1 |   131 | 1830   (2)| 00:00:33 |       |       | Q1,03 | P->S | QC (RAND) |
|   3 |    HASH GROUP BY              |                     |     1 |   131 | 1830   (2)| 00:00:33 |       |       | Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                |                     |     1 |   131 | 1830   (2)| 00:00:33 |       |       | Q1,03 | PCWP |            |
|   5 |      PX SEND HASH             | :TQ10002            |     1 |   131 | 1830   (2)| 00:00:33 |       |       | Q1,02 | P->P | HASH       |
|   6 |       HASH GROUP BY           |                     |     1 |   131 | 1830   (2)| 00:00:33 |       |       | Q1,02 | PCWP |            |
|   7 |        NESTED LOOPS ANTI      |                     |     1 |   131 | 1829   (2)| 00:00:33 |       |       | Q1,02 | PCWP |            |
|   8 |         HASH JOIN             |                     |     1 |   117 | 1829   (2)| 00:00:33 |       |       | Q1,02 | PCWP |            |
|   9 |          PX JOIN FILTER CREATE| :BF0000             | 1010K|    50M|   694   (1)| 00:00:13 |       |       | Q1,02 | PCWP |            |
| 10 |           PX RECEIVE          |                     | 1010K|    50M|   694   (1)| 00:00:13 |       |       | Q1,02 | PCWP |            |
| 11 |            PX SEND HASH       | :TQ10000            | 1010K|    50M|   694   (1)| 00:00:13 |       |       | Q1,00 | P->P | HASH       |
| 12 |             PX BLOCK ITERATOR |                     | 1010K|    50M|   694   (1)| 00:00:13 |   KEY |   KEY | Q1,00 | PCWC |            |
| 13 |              TABLE ACCESS FULL| SOT_KEYMAP          | 1010K|    50M|   694   (1)| 00:00:13 |   KEY |   KEY | Q1,00 | PCWP |            |
| 14 |          PX RECEIVE           |                     |    11M|   688M| 1129   (3)| 00:00:21 |       |       | Q1,02 | PCWP |            |
| 15 |           PX SEND HASH        | :TQ10001            |    11M|   688M| 1129   (3)| 00:00:21 |       |       | Q1,01 | P->P | HASH       |
| 16 |            PX JOIN FILTER USE | :BF0000             |    11M|   688M| 1129   (3)| 00:00:21 |       |       | Q1,01 | PCWP |            |
| 17 |             PX BLOCK ITERATOR |                     |    11M|   688M| 1129   (3)| 00:00:21 |   KEY |   KEY | Q1,01 | PCWC |            |
| 18 |              TABLE ACCESS FULL| SOT_REL_STAGE       |    11M|   688M| 1129   (3)| 00:00:21 |   KEY |   KEY | Q1,01 | PCWP |            |
| 19 |         PARTITION RANGE SINGLE|                     | 5209 | 72926 |     0   (0)|          |   KEY |   KEY | Q1,02 | PCWP |            |
| 20 |          INDEX RANGE SCAN     | IDXL_SOTCRCT_SOT_UD | 5209 | 72926 |     0   (0)|          |   KEY |   KEY | Q1,02 | PCWP |            |
EXECUTION PLAN AFTER SETTING DEGREE=1 (It was also degraded)
| Id | Operation                 | Name                | Rows | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
|   0 | SELECT STATEMENT          |                     |     1 |   129 |       | 42336   (2)| 00:12:43 |       |       |
|   1 | HASH GROUP BY            |                     |     1 |   129 |       | 42336   (2)| 00:12:43 |       |       |
|   2 |   NESTED LOOPS ANTI       |                     |     1 |   129 |       | 42335   (2)| 00:12:43 |       |       |
|* 3 |    HASH JOIN              |                     |     1 |   115 |    51M| 42334   (2)| 00:12:43 |       |       |
|   4 |     PARTITION RANGE SINGLE|                     |   846K|    41M|       | 8241   (1)| 00:02:29 |    81 |    81 |
|* 5 |      TABLE ACCESS FULL    | SOT_KEYMAP          |   846K|    41M|       | 8241   (1)| 00:02:29 |    81 |    81 |
|   6 |     PARTITION RANGE SINGLE|                     | 8161K|   490M|       | 12664   (3)| 00:03:48 |    81 |    81 |
|* 7 |      TABLE ACCESS FULL    | SOT_REL_STAGE       | 8161K|   490M|       | 12664   (3)| 00:03:48 |    81 |    81 |
|   8 |    PARTITION RANGE SINGLE |                     | 6525K|    87M|       |     1   (0)| 00:00:01 |    81 |    81 |
|* 9 |     INDEX RANGE SCAN      | IDXL_SOTCRCT_SOT_UD | 6525K|    87M|       |     1   (0)| 00:00:01 |    81 |    81 |
Predicate Information (identified by operation id):
   3 - access("SOTRELSTG_SYS_UD"="SOTKEY_SYS_UD" AND "SOTRELSTG_ORIG_SYS_ORD_ID"="SOTKEY_SYS_ORD_ID" AND
              NVL("SOTRELSTG_ORIG_SYS_ORD_VSEQ",1)=NVL("SOTKEY_SYS_ORD_VSEQ",1))
   5 - filter("SOTKEY_TRD_DATE_YMD_PART"=20090219 AND "SOTKEY_IUD_CD"='I')
   7 - filter("SOTRELSTG_CRCT_PROC_STAT_CD"='N' AND "SOTRELSTG_TRD_DATE_YMD_PART"=20090219)
   9 - access("SOTRELSTG_SOT_UD"="SOTCRCT_SOT_UD" AND "SOTCRCT_TRD_DATE_YMD_PART"=20090219)
2. Why are you passing the partition name as bind variable? A statement executing 5 mins. best, > 2 hours worst obviously doesn't suffer from hard parsing issues and
doesn't need to (shouldn't) share execution plans therefore. So I strongly suggest to use literals instead of bind variables. This also solves any potential issues caused
by bind variable peeking.
This is a custom application that uses bind variables to extract data from daily partitions. The extract runs automatically against the daily partition after the load and ELT process.
Here, the value of the bind variable is passed through a procedure parameter, so it would be a bit difficult to use literals in such an application.
3. All your posted plans suffer from bad cardinality estimates. The NO_MERGE hint suggested by Timur only caused a (significant) damage limitation by obviously reducing
the row source size by the group by operation before joining, but still the optimizer is way off, apart from the obviously wrong join order (larger row set first) in
particular the NESTED LOOP operation is causing the main troubles due to excessive logical I/O, as already pointed out by Timur.
Can I ask for alternatives to NESTED LOOP?
4. Your PLAN_TABLE seems to be old (you should see a corresponding note at the bottom of the DBMS_XPLAN.DISPLAY output), because none of the operations have a
filter/access predicate information attached. Since your main issue are the bad cardinality estimates, I strongly suggest to drop any existing PLAN_TABLEs in any non-Oracle
owned schemas because 10g already provides one in the SYS schema (GTT PLAN_TABLE$) exposed via a public synonym, so that the EXPLAIN PLAN information provides the
"Predicate Information" section below the plan covering the "Filter/Access" predicates.
Please post a revised explain plan output including this crucial information so that we get a clue why the cardinality estimates are way off.
I have dropped the old plan table and got the execution plan listed above (in the first point) with the predicate information.
"As already mentioned the usage of bind variables for the partition name makes this issue potentially worse."
Is there any workaround that doesn't require replacing the bind variable? I am on 10g, so 11g features will not help.
How are you gathering the statistics daily, can you post the exact command(s) used?
gather_table_part_stats(i_owner_name,i_table_name,i_part_name,i_estimate:= DBMS_STATS.AUTO_SAMPLE_SIZE, i_invalidate IN VARCHAR2 := 'Y',i_debug:= 'N')
Thanks & Regards,
Bhavik Desai

Simultaneous hash joins of the same large table with many small ones?

Hello
I've got a typical data warehousing scenario where a HUGE_FACT table is to be joined with numerous very small lookup/dimension tables for data enrichment. Joins with these small lookup tables are mutually independent, which means that the result of any of these joins is not needed to perform another join.
So this is a typical scenario for a hash join: the lookup table is converted into a hash map in memory, fits there without drama because it's small, and a single pass over the HUGE_FACT suffices to get the results.
Problem is, so far as I can see it in the query plan, these hash joins are not executed simultaneously but one after another, which renders Oracle to do the full scan of the HUGE_FACT (or any intermediary enriched form of it) as many times as there are joins.
Questions:
- is my interpretation correct that the mentioned joins are sequential, not simultaneous?
- if this is the case, is there any possibility to force Oracle to perform these joins simultaneously (building more than one hashed map in memory and doing the single pass over the HUGE_FACT while looking up in all of these hashed maps for matches)? If so, how to do it?
Please note that the parallel execution of a single join at a time is not the matter of the question.
Database version is 10.2.
Thank you very much in advance for any response.

user13176880 wrote:
Questions:
- is my interpretation correct that the mentioned joins are sequential, not simultaneous?
Correct. But why do you think this is an issue? Because of this:
which renders Oracle to do the full scan of the HUGE_FACT (or any intermediary enriched form of it) as many times as there are joins.
That is not (or at least should not be) true. Oracle does one pass of the big table, and then sequentially joins to each of the hash maps (one for each of the smaller tables).
If you show us the execution plan, we can be sure of this.
- if this is the case, is there any possibility to force Oracle to perform these joins simultaneously (building more than one hashed map in memory and doing the single pass over the HUGE_FACT while looking up in all of these hashed maps for matches)? If so, how to do it?
Yes, there is. But again, you should not need to resort to such a solution. What you can do is use subquery factoring (the WITH clause) in conjunction with the MATERIALIZE hint to first construct the Cartesian join of all of the smaller (dimension) tables, and then join the big table to that.
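The point made in the reply above (one pass over the big table, with each row probed against several small in-memory hash tables in turn) can be sketched conceptually in Python. This is only an illustration of the idea, not Oracle's implementation: plain dicts stand in for the hash tables, and the table and column names (sales facts, region and product lookups) are hypothetical.

```python
# Conceptual sketch only: Python dicts stand in for in-memory hash tables.
# All table and column names here are hypothetical.

def build_hash_table(rows, key):
    """Build phase: index a small lookup table by its join key."""
    return {row[key]: row for row in rows}

def enrich(fact_rows, lookups):
    """Probe phase: a single pass over the fact rows, probing each lookup in turn.

    lookups is a list of (fact_key, hash_table, output_column, lookup_column).
    Inner-join semantics: a fact row is dropped if any probe finds no match.
    """
    scanned = 0
    result = []
    for fact in fact_rows:
        scanned += 1  # every fact row is read exactly once
        out = dict(fact)
        for fact_key, table, out_col, lookup_col in lookups:
            match = table.get(fact[fact_key])
            if match is None:
                out = None
                break
            out[out_col] = match[lookup_col]
        if out is not None:
            result.append(out)
    return result, scanned

regions = [{"region_id": 1, "region_name": "North"},
           {"region_id": 2, "region_name": "South"}]
products = [{"product_id": 10, "product_name": "Widget"},
            {"product_id": 20, "product_name": "Gadget"}]
facts = [
    {"sale_id": 100, "region_id": 1, "product_id": 10},
    {"sale_id": 101, "region_id": 2, "product_id": 20},
    {"sale_id": 102, "region_id": 3, "product_id": 10},  # no matching region
]

lookups = [
    ("region_id", build_hash_table(regions, "region_id"), "region_name", "region_name"),
    ("product_id", build_hash_table(products, "product_id"), "product_name", "product_name"),
]
enriched, rows_scanned = enrich(facts, lookups)
```

The joins are still applied one after another for each row, but the number of fact rows read equals the size of the fact table, regardless of how many lookups are probed.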

End of file on communication channel when doing a (hash) join

I'm having a problem with a simple join between two tables (hash join, from the explain), one containing about 650k rows, the other 730k.
The query runs fine until I add to the select list a geometry field (oracle spatial). In that case I receive an end-of-file error, which seems to be caused by a memory saturation. I'm not sure about this, as I can't understand the trace very well...
I've increased the pga_aggregate_target and the pgamax_size, but this has had little effect on the problem...
Any hint? Could it be caused by the geometry field size?
thanks,
giovanni

Thanks for the quick reply. Here it is what you asked:
select * from v$version;
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
"CORE     10.2.0.4.0     Production"
TNS for 32-bit Windows: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
an excerpt from the trace file:
*** ACTION NAME:() 2010-04-16 12:48:17.796
*** MODULE NAME:(SQL*Plus) 2010-04-16 12:48:17.796
*** SERVICE NAME:(ABRUZZO) 2010-04-16 12:48:17.796
*** SESSION ID:(149.10) 2010-04-16 12:48:17.796
*** 2010-04-16 12:48:17.796
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_kokekd2m+24] [PC:0x11CF9E0] [ADDR:0xBA8C193C] [UNABLE_TO_READ] []
Current SQL statement for this session:
select g.cr375_idobj,g.cr375_geometria,t.cr374_codicectr,t.cr374_scarpt_cont
from DBTI.Cr374_Scarpata t, DBTI.Cr375g_a_Scarp g
where t.cr374_idobj=g.cr375_idobj
check trace file c:\oracle\product\10.2.0\db_1\rdbms\trace\abruzzo_ora_0.trc for preloading .sym file messages
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
_kokekd2m+24                  00000000
kokeq2iro+178       CALLrel kokekd2m+0 99384C8 9937624 BA8C18D8
9968588 3660D954 3A52AC00
__VInfreq__rworupo+ CALLrel _kokeq2iro+0         9968218 A05C810 EA7
321
_kxhrUnpack+71       CALLreg 00000000             99693C8 A05C810 0 9 0
qerhjWalkHashBucke CALLrel kxhrUnpack+0 9967DD0 99693C8 9 A05C810 0
t+210
__PGOSF352__qerhjIn CALLrel _qerhjWalkHashBucke A05CED4 99693C0 8
nerProbeHashTable+4 t+0
78
_kdstf0000101km+230 CALLreg 00000000
kdsttgr+1263        CALLrel kdstf0000101km+0

Setting PGA_AGGREGATE_TARGET to resolve Hash Join issues?

I'm running some fairly big queries against our data warehouse, with tens of millions of rows in some of the tables we are joining. The response times vary, but it's not unusual for these queries to run for many hours (8 and 12 for a couple of the more recent ones). Looking at the long ops, I see that a hash join can take 30000 seconds. Full table scans are reasonably fast and the cost of the query is pretty good, but the hash/sort/merge operations kill the queries every time.
As a developer with access to Google I am obviously well qualified to be telling our DBAs what they need to do to improve the performance of my query, and I stumbled upon the PGA_AGGREGATE_TARGET parameter. It's pretty low in my current environment: 50 MB until I got them to raise it to 200 MB (they wouldn't go bigger). But from what I've been reading, I think this needs to be a lot bigger, since some of my tables are huge (and often I'll be joining entire datasets and grouping/aggregating them).
But am I looking at the wrong thing? Some comments I've read suggest it's limited to 200 MB anyway (a hidden parameter?) and that we'd be better off setting manual AREASIZE parameters. Is this the case?
I know I've not supplied explain plans, real sizings, etc., but any advice or a pointer in the right direction would be greatly appreciated.
Cheers
Richard

Hi Richard,
Could you give some more details related to your problem:
Exactly which DB version is used?
Which OS is installed on the DB Server?
Is it a 32 bit or 64 bit OS?
How large is your SGA?
How much free memory (RAM) do you have on your server?
Is the query executed in parallel?
But even without this information I can clearly say that 50 MB is far too small.
As a starting point for a DWH, I would recommend setting the PGA twice as large as the SGA.
You may also have a look at the following documentation:
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/memory.htm#49321
http://www.pythian.com/documents/Working_with_Automatic_PGA.ppt
http://www.miracleltd.com/msdbf2007/js.ppt
http://www.vldb.org/conf/2002/S29P03.pdf
http://www.sloug.org/presentations/If_Memory_Richmond_Shee.ppt
Regards
Maurice

Why optimizer prefers nested loop over hash join?

What do I look for if I want to find out why the server prefers a nested loop over hash join?
The server is 10.2.0.4.0.
The query is:
SELECT p.*
    FROM t1 p, t2 d
    WHERE d.emplid = p.id_psoft
      AND p.flag_processed = 'N'
      AND p.desc_pool = :b1
      AND NOT d.name LIKE '%DUPLICATE%'
      AND ROWNUM < 2
tkprof output is:
Production
call     count       cpu    elapsed       disk      query    current        rows
Parse        1      0.01       0.00          0          0          4           0
Execute      1      0.00       0.01          0          4          0           0
Fetch        1    228.83     223.48          0    4264533          0           1
total        3    228.84     223.50          0    4264537          4           1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 108 (SANJEEV)
Rows     Row Source Operation
      1 COUNT STOPKEY (cr=4264533 pr=0 pw=0 time=223484076 us)
      1   NESTED LOOPS (cr=4264533 pr=0 pw=0 time=223484031 us)
10401    TABLE ACCESS FULL T1 (cr=192 pr=0 pw=0 time=228969 us)
      1    TABLE ACCESS FULL T2 (cr=4264341 pr=0 pw=0 time=223182508 us)
Development
call     count       cpu    elapsed       disk      query    current        rows
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.01          0          4          0           0
Fetch        1      0.05       0.03          0        512          0           1
total        3      0.06       0.06          0        516          0           1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 113 (SANJEEV)
Rows     Row Source Operation
      1 COUNT STOPKEY (cr=512 pr=0 pw=0 time=38876 us)
      1   HASH JOIN (cr=512 pr=0 pw=0 time=38846 us)
     51    TABLE ACCESS FULL T2 (cr=492 pr=0 pw=0 time=30230 us)
    861    TABLE ACCESS FULL T1 (cr=20 pr=0 pw=0 time=2746 us)

sanjeevchauhan wrote:
What do I look for if I want to find out why the server prefers a nested loop over hash join?
The server is 10.2.0.4.0.
The query is:
SELECT p.*
FROM t1 p, t2 d
WHERE d.emplid = p.id_psoft
AND p.flag_processed = 'N'
AND p.desc_pool = :b1
AND NOT d.name LIKE '%DUPLICATE%'
AND ROWNUM < 2
You've already got some suggestions, but the most straightforward way is to run the unhinted statement in both environments and then force the join and access methods you would like to see using hints. In your case, "USE_HASH(P D)" in your production environment and "FULL(P) FULL(D) USE_NL(P D)" in your development environment should probably be sufficient to see the costs and estimates returned by the optimizer when using the alternate access and join patterns.
This gives you a first indication of why the optimizer considers the plan it selected in production to be cheaper, even though it is obviously less efficient.
As already mentioned by Hemant using bind variables complicates things a bit since EXPLAIN PLAN is not reliable due to bind variable peeking performed when executing the statement, but not when explaining.
Since you're already on 10g you can get the actual execution plan used for all four variants using DBMS_XPLAN.DISPLAY_CURSOR which tells you more than the TKPROF output in the "Row Source Operation" section regarding the estimates and costs assigned.
Of course the result of your whole exercise might be highly dependent on the actual bind variable value used.
By the way, your statement is questionable in principle since you're querying for the first row of a non-deterministic result set. It's not deterministic because you've defined no particular order, so depending on the way Oracle executes the statement and the physical storage of your data, this query might return different results on different runs.
This is either an indication of a bad design (if the query is supposed to return exactly one row, then you don't need the ROWNUM restriction) or an incorrect attempt at a Top 1 query, which requires you to specify an order somehow, either by adding an ORDER BY to the statement and wrapping it in an inline view, or e.g. by using analytic functions that allow you to specify a RANK by a defined ORDER.
This is an example of what a deterministic Top N query could look like:
SELECT *
FROM (
  SELECT p.*
    FROM t1 p, t2 d
    WHERE d.emplid = p.id_psoft
      AND p.flag_processed = 'N'
      AND p.desc_pool = :b1
      AND NOT d.name LIKE '%DUPLICATE%'
  ORDER BY <order_criteria>
)
WHERE ROWNUM <= 1;
Regards,
Randolf
Oracle related stuff blog:
http://oracle-randolf.blogspot.com/
SQLTools++ for Oracle (Open source Oracle GUI for Windows):
http://www.sqltools-plusplus.org:7676/
http://sourceforge.net/projects/sqlt-pp/

Seeking advice on a heavy hash join

We have to self-join a 190-million-row table 160 times. On our production 9i database we use the "RULE" hint and the query finishes in about an hour using nested loop joins. On our test 10g database, "RULE" is no longer available, so the optimiser chooses hash joins (160 levels) and the query never finishes in a reasonable time. We have gradually increased hash_area_size from 8M to 512M, thinking that would help, but apparently it does not. Can anyone provide suggestions? Thanks.

There is an approach using Analytics Functions that may be useful here. The idea is to get all the data values required with 1 reference to Tab2 rather than 160, and let Analytics do the work of building the result set.
The goal here is to 'never do multiple references to the same table where 1 reference can suffice (usually with the aid of Analytics)'.
There are 2 variations depending on whether the ID values here (0,10,20,30,...) always follow an ascending pattern or if they don't. The second example is more generic and will also cover the ascending case.
name ID val ID val ID val ID val ID val
name1 0 1 10 1 20 1 30 1 40 1
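The idea of replacing N references to the same table with a single reference can be sketched outside the database as well. The following Python is only a conceptual illustration, not the analytic SQL itself: it reads the hypothetical tab2 rows once, remembers which of the wanted ids exist for each value, and then emits one wide row per matching tab1 row. It assumes each (id, tab1value) pair occurs at most once in tab2 and mirrors the inner self-joins by requiring every wanted id to be present.

```python
from collections import defaultdict

def pivot_one_pass(tab1, tab2, name, wanted_ids):
    """Return one wide row per matching tab1 row, scanning tab2 only once.

    tab1 is a list of (name, value) tuples, tab2 a list of (id, tab1value)
    tuples. Assumes (id, tab1value) is unique in tab2.
    """
    wanted = set(wanted_ids)
    ids_by_value = defaultdict(set)
    tab2_rows_read = 0
    for row_id, tab1value in tab2:  # the only scan of tab2
        tab2_rows_read += 1
        if row_id in wanted:
            ids_by_value[tab1value].add(row_id)

    result = []
    for row_name, value in tab1:
        # Mirror the inner self-joins: keep the row only if every id is present.
        if row_name == name and wanted <= ids_by_value.get(value, set()):
            result.append((row_name, value, tuple(wanted_ids)))
    return result, tab2_rows_read

# Small hypothetical data set: value 2 is missing id 40, so it is dropped.
tab1 = [("name1", 1), ("name1", 2), ("name2", 1)]
tab2 = [(i, v) for i in range(0, 50, 10) for v in (1, 2)
        if not (i == 40 and v == 2)]

rows, tab2_rows_read = pivot_one_pass(tab1, tab2, "name1", [0, 10, 20, 30, 40])
```

However many ids are requested, tab2 is read once, which is the same property the analytic views in this post rely on.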
-- Performance Summary - This demo used a max value of 1600 vs 15000
for a net number of rows of 2.5 million versus 225 million.
Any of the test views ( replacing 5,10, or 160 tab2 references) using the Analytics function used at most 4099 consistent gets. The original approach used 20553 consistent gets for 5 tab2 references, 41016 consistent gets for 10 tab2 references. This was in 10g, doing the hash join. I did try alter session set optimizer_mode = rule (just for test purposes) and that resulted in 46462 consistent gets for the 10 tab2 reference view while it did a merge join operation.
Autotrace for the Analytics version to replace the original 160 table sql.
JT(147)@JTDB10G>select * from analytics_2_joins;
1600 rows selected.
Statistics
0 recursive calls
0 db block gets
4099 consistent gets
3710 physical reads
0 redo size
1112904 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1600 rows processed
-- Minimal examples:
--As always, test thoroughly before using in production.
select name,
     id0, value value0,
     id1, value value1,
     id2, value value2,
     id3, value value3,
     id4, value value4
from ( select name, id, value,
     lead(id,0 ) over(partition by name, value order by rn ) id0 ,
     lead(id,1 ) over(partition by name, value order by rn ) id1 ,
     lead(id,2 ) over(partition by name, value order by rn ) id2 ,
     lead(id,3 ) over(partition by name, value order by rn ) id3 ,
     lead(id,4 ) over(partition by name, value order by rn ) id4 ,
rn
from( select tab1.name, i0.id, i0.tab1value value,
(row_number() over (partition by tab1value order by i0.id )) -1 rn
from tab1, tab2 i0 where tab1.name='name1' and i0.tab1value=tab1.value
and i0.id in (0,10,20,30,40)
)) where rn = 0;
-- execute a version of the smaller analytics approach with bind variables
-- referencing the binds within the numbered in-line view is needed only if
-- the id0, id1, id2 values do not follow the ascending pattern shown in the
-- example. This will handle case where id0 = 30, id2 =20, id4 = 40 , etc.
variable l_id0 number;
variable l_id1 number;
variable l_id2 number;
variable l_id3 number;
variable l_id4 number;
exec :l_id1 := 0;
exec :l_id3 := 10;
exec :l_id2 := 20;
exec :l_id0 := 30;
exec :l_id4 := 40;
select name, bind_rn,
     id0, value value0,
     id1, value value1,
     id2, value value2,
     id3, value value3,
     id4, value value4
from ( select name, id, value,
     lead(id,0 ) over(partition by name, value order by bind_rn ) id0 ,
     lead(id,1 ) over(partition by name, value order by bind_rn ) id1 ,
     lead(id,2 ) over(partition by name, value order by bind_rn ) id2 ,
     lead(id,3 ) over(partition by name, value order by bind_rn ) id3 ,
     lead(id,4 ) over(partition by name, value order by bind_rn ) id4 ,
     bind_rn
from( select tab1.name, i0.id, i0.tab1value value, bind_rn
     from tab1, tab2 i0,
          (select 0 bind_rn, :l_id0 arg_value from dual union
          select 1 , :l_id1 from dual union
          select 2 , :l_id2 from dual union
          select 3 , :l_id3 from dual union
          select 4 , :l_id4 from dual ) table_of_args
     where tab1.name='name1' and i0.tab1value=tab1.value
-- and i0.id in (0,10,20,30,40)
and i0.id = table_of_args.arg_value
)) where bind_rn = 0;
-- Full Test Case
-- table setup
drop table tab1;
drop table tab2;
create table tab1(name varchar2(100), value number) pctfree 0;
create table tab2(id number, tab1value number) pctfree 0;
begin
for x in 0 .. 1600 loop
for y in 1 .. 1600 loop
     insert into tab1 values ('name' || x, y);
end loop;
end loop;
end;
/
-- 15000 results in 225,000,000
-- 1600 results in 2,560,000
begin
for x in 0 .. 1600 loop
for y in 1 .. 1600 loop
     insert into tab2 values (x, y);
end loop;
end loop;
end;
/
commit;
CREATE BITMAP INDEX NAME_BITMAP ON TAB1(NAME);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'JTOMMANEY',TABNAME => 'TAB1', -
     estimate_percent => 20,     CASCADE => TRUE);
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'JTOMMANEY',TABNAME => 'TAB2', -
     estimate_percent => 20,     CASCADE => TRUE);
alter session set optimizer_mode = 'RULE';
-- set up some views for both the original approach and the analytics approach
create view original_5_tab2_tables_join as
select tab1.name name,
     i0.id id0, i0.tab1value value0,
     i1.id id1, i1.tab1value value1,
     i2.id id2, i2.tab1value value2,
     i3.id id3, i3.tab1value value3,
     i4.id id4, i4.tab1value value4
from tab1,
tab2 i0, tab2 i1, tab2 i2, tab2 i3, tab2 i4
where tab1.name='name1'
and (i0.id=0 and i0.tab1value=tab1.value)
and (i1.id=10 and i1.tab1value=tab1.value)
and (i2.id=20 and i2.tab1value=tab1.value)
and (i3.id=30 and i3.tab1value=tab1.value)
and (i4.id=40 and i4.tab1value=tab1.value);
create view replace_5_tab2_joins as
select name,
     id0, value value0,
     id1, value value1,
     id2, value value2,
     id3, value value3,
     id4, value value4
from ( select name, id, value,
     lead(id,0 ) over(partition by name, value order by rn ) id0 ,
     lead(id,1 ) over(partition by name, value order by rn ) id1 ,
     lead(id,2 ) over(partition by name, value order by rn ) id2 ,
     lead(id,3 ) over(partition by name, value order by rn ) id3 ,
     lead(id,4 ) over(partition by name, value order by rn ) id4 ,
rn from( select tab1.name, i0.id, i0.tab1value value,
(row_number() over (partition by tab1value order by i0.id )) -1 rn
from tab1, tab2 i0 where tab1.name='name1' and i0.tab1value=tab1.value
and i0.id in (0,10,20,30,40)
)) where rn = 0;
create view original_10_tab2_tables_join as
select tab1.name name,
     i0.id id0, i0.tab1value value0,
     i1.id id1, i1.tab1value value1,
     i2.id id2, i2.tab1value value2,
     i3.id id3, i3.tab1value value3,
     i4.id id4, i4.tab1value value4,
     i5.id id5, i5.tab1value value5,
     i6.id id6, i6.tab1value value6,
     i7.id id7, i7.tab1value value7,
     i8.id id8, i8.tab1value value8,
     i9.id id9, i9.tab1value value9
from tab1,
tab2 i0, tab2 i1, tab2 i2, tab2 i3, tab2 i4,
tab2 i5, tab2 i6, tab2 i7, tab2 i8, tab2 i9
where tab1.name='name1'
and (i0.id=0 and i0.tab1value=tab1.value)
and (i1.id=10 and i1.tab1value=tab1.value)
and (i2.id=20 and i2.tab1value=tab1.value)
and (i3.id=30 and i3.tab1value=tab1.value)
and (i4.id=40 and i4.tab1value=tab1.value)
and (i5.id=50 and i5.tab1value=tab1.value)
and (i6.id=60 and i6.tab1value=tab1.value)
and (i7.id=70 and i7.tab1value=tab1.value)
and (i8.id=80 and i8.tab1value=tab1.value)
and (i9.id=90 and i9.tab1value=tab1.value);
create view replace_10_tab2_joins as
select name,
     id0, value value0,
     id1, value value1,
     id2, value value2,
     id3, value value3,
     id4, value value4,
     id5, value value5,
     id6, value value6,
     id7, value value7,
     id8, value value8,
     id9, value value9
from ( select name, id, value,
     lead(id,0 ) over(partition by name, value order by rn ) id0 ,
     lead(id,1 ) over(partition by name, value order by rn ) id1 ,
     lead(id,2 ) over(partition by name, value order by rn ) id2 ,
     lead(id,3 ) over(partition by name, value order by rn ) id3 ,
     lead(id,4 ) over(partition by name, value order by rn ) id4 ,
     lead(id,5 ) over(partition by name, value order by rn ) id5 ,
     lead(id,6 ) over(partition by name, value order by rn ) id6 ,
     lead(id,7 ) over(partition by name, value order by rn ) id7 ,
     lead(id,8 ) over(partition by name, value order by rn ) id8 ,
     lead(id,9 ) over(partition by name, value order by rn ) id9 ,
rn from( select tab1.name, i0.id, i0.tab1value value,
(row_number() over (partition by tab1value order by i0.id )) -1 rn
from tab1, tab2 i0 where tab1.name='name1' and i0.tab1value=tab1.value
and i0.id in (0,10,20,30,40,50,60,70,80,90)
)) where rn = 0;
-- set up some views both the original approach, and the analytics approach
--spool cr_v1.sql  (may need to clean up heading, linefeed from created file)
begin
dbms_output.put_line('create or replace view original_160_joins as select /*+ rule */ tab1.name ');
for x in 0 .. 160 loop
dbms_output.put_line( ',i' || x || '.id id' || x || ' ,i' || x || '.tab1value value' || x ) ;
end loop;
dbms_output.put_line('from tab1' );
for x in 0 .. 160 loop
dbms_output.put_line( ',tab2 i' || x ) ;
end loop;
dbms_output.put_line(' where tab1.name = ''name1''' );
for x in 0 .. 160 loop
dbms_output.put_line( ' and i' || x || '.id=' || (x * 10) || ' and i' || x || '.tab1value=tab1.value ' ) ;
end loop;
dbms_output.put_line( ' ;');
end;
/
--spool off
--@cr_v1.sql
--spool cr_v2.sql  (may need to clean up heading, linefeed from created file)
begin
dbms_output.put_line('create or replace view analytics_2_joins as select name ');
for x in 0 .. 160 loop
dbms_output.put_line( ',id' || x || ', value value' || x ) ;
end loop;
dbms_output.put_line('from ( select name, id, value ' );
for x in 0 .. 160 loop
dbms_output.put_line( ',lead(id,' || x || ') over(partition by name, value order by rn ) id' || x ) ;
end loop;
dbms_output.put_line(' , rn from( select tab1.name, i0.id, i0.tab1value value, ');
dbms_output.put_line(' (row_number() over (partition by tab1value order by i0.id )) -1 rn ');
dbms_output.put_line(' from tab1, tab2 i0 where tab1.name=''name1'' and i0.tab1value=tab1.value and i0.id in ( ');
for x in 0 .. 159 loop
dbms_output.put_line( (x * 10) || ',' ) ;
end loop;
dbms_output.put_line( ' 1600))) where rn = 0;');
end;
/
--spool off
--@cr_v2.sql
-- We now have 6 views established
--              Original approach              Analytics approach w/ 1 tab2 reference
--   5 tab2s    original_5_tab2_tables_join    replace_5_tab2_joins
--  10 tab2s    original_10_tab2_tables_join   replace_10_tab2_joins
-- 160 tab2s    original_160_joins             analytics_2_joins
-- plus we will call the version with bind variables, but not from a view.
-- Data validation:
select 'orig_minus_new: ' || count(*) from
( select * from original_5_tab2_tables_join minus select * from replace_5_tab2_joins ) union
select 'new_minus_orig: ' || count(*) from
( select * from replace_5_tab2_joins minus select * from original_5_tab2_tables_join );
select 'orig_minus_new: ' || count(*) from
( select * from original_10_tab2_tables_join minus select * from replace_10_tab2_joins ) union
select 'new_minus_orig: ' || count(*) from
( select * from replace_10_tab2_joins minus select * from original_10_tab2_tables_join );
select 'orig_minus_new: ' || count(*) from
( select * from original_160_joins minus select * from analytics_2_joins );
select 'new_minus_orig: ' || count(*) from
( select * from analytics_2_joins minus select * from original_160_joins );
-- Performance test
alter session set workarea_size_policy=manual ;
alter session set sort_area_size = 64000000;
alter session set hash_area_size = 64000000;
set autotrace traceonly stat
select * from original_5_tab2_tables_join;
select * from replace_5_tab2_joins;
select * from original_10_tab2_tables_join;
select * from replace_10_tab2_joins;
select * from analytics_2_joins;
--select * from original_160_joins;
-- execute a version of the smaller analytics approach with bind variables
-- referencing the binds within the numbered in-line view is needed only if
-- the id0, id1, id2 values do not follow the ascending pattern shown in the
-- example. This will handle case where id0 = 30, id2 =20, id4 = 40 , etc.
variable l_id0 number;
variable l_id1 number;
variable l_id2 number;
variable l_id3 number;
variable l_id4 number;
exec :l_id1 := 0;
exec :l_id3 := 10;
exec :l_id2 := 20;
exec :l_id0 := 30;
exec :l_id4 := 40;
select name, bind_rn,
     id0, value value0,
     id1, value value1,
     id2, value value2,
     id3, value value3,
     id4, value value4
from ( select name, id, value,
     lead(id,0 ) over(partition by name, value order by bind_rn ) id0 ,
     lead(id,1 ) over(partition by name, value order by bind_rn ) id1 ,
     lead(id,2 ) over(partition by name, value order by bind_rn ) id2 ,
     lead(id,3 ) over(partition by name, value order by bind_rn ) id3 ,
     lead(id,4 ) over(partition by name, value order by bind_rn ) id4 ,
     bind_rn
from( select tab1.name, i0.id, i0.tab1value value, bind_rn
     from tab1, tab2 i0,
          (select 0 bind_rn, :l_id0 arg_value from dual union
          select 1 , :l_id1 from dual union
          select 2 , :l_id2 from dual union
          select 3 , :l_id3 from dual union
          select 4 , :l_id4 from dual ) table_of_args
     where tab1.name='name1' and i0.tab1value=tab1.value
-- and i0.id in (0,10,20,30,40)
and i0.id = table_of_args.arg_value
)) where bind_rn = 0;
JT(147)@JTDB10G>select * from original_5_tab2_tables_join;
1600 rows selected.
Statistics
8 recursive calls
2 db block gets
20553 consistent gets
18555 physical reads
0 redo size
52052 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1600 rows processed
JT(147)@JTDB10G>select * from replace_5_tab2_joins;
1600 rows selected.
Statistics
8 recursive calls
2 db block gets
4101 consistent gets
3710 physical reads
0 redo size
52052 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1600 rows processed
JT(147)@JTDB10G>select * from original_10_tab2_tables_join;
1600 rows selected.
Statistics
0 recursive calls
0 db block gets
41016 consistent gets
37115 physical reads
0 redo size
85636 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1600 rows processed
JT(147)@JTDB10G>select * from replace_10_tab2_joins;
1600 rows selected.
Statistics
0 recursive calls
0 db block gets
4099 consistent gets
3710 physical reads
0 redo size
85636 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1600 rows processed
JT(147)@JTDB10G>select * from analytics_2_joins;
1600 rows selected.
Statistics
0 recursive calls
0 db block gets
4099 consistent gets
3710 physical reads
0 redo size
1112904 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1600 rows processed
JT(147)@JTDB10G>--select * from original_160_joins;
JT(147)@JTDB10G>
JT(147)@JTDB10G>----------------------------------------------------------------------------------------
JT(147)@JTDB10G>-- execute a version of the smaller analytics approach with bind variables
JT(147)@JTDB10G>-- referencing the binds within the numbered in-line view is needed only if
JT(147)@JTDB10G>-- the id0, id1, id2 values do not follow the ascending pattern shown in the
JT(147)@JTDB10G>-- example. This will handle case where id0 = 30, id2 =20, id4 = 40 , etc.
JT(147)@JTDB10G>----------------------------------------------------------------------------------------
JT(147)@JTDB10G>
JT(147)@JTDB10G>variable l_id0 number;
JT(147)@JTDB10G>variable l_id1 number;
JT(147)@JTDB10G>variable l_id2 number;
JT(147)@JTDB10G>variable l_id3 number;
JT(147)@JTDB10G>variable l_id4 number;
JT(147)@JTDB10G>
JT(147)@JTDB10G>exec :l_id1 := 0;
PL/SQL procedure successfully completed.
JT(147)@JTDB10G>exec :l_id3 := 10;
PL/SQL procedure successfully completed.
JT(147)@JTDB10G>exec :l_id2 := 20;
PL/SQL procedure successfully completed.
JT(147)@JTDB10G>exec :l_id0 := 30;
PL/SQL procedure successfully completed.
JT(147)@JTDB10G>exec :l_id4 := 40;
PL/SQL procedure successfully completed.
JT(147)@JTDB10G>
JT(147)@JTDB10G>select name, bind_rn,
     id0, value value0,
     id1, value value1,
     id2, value value2,
     id3, value value3,
     id4, value value4
from ( select name, id, value,
     lead(id,0 ) over(partition by name, value order by bind_rn ) id0 ,
     lead(id,1 ) over(partition by name, value order by bind_rn ) id1 ,
     lead(id,2 ) over(partition by name, value order by bind_rn ) id2 ,
     lead(id,3 ) over(partition by name, value order by bind_rn ) id3 ,
     lead(id,4 ) over(partition by name, value order by bind_rn ) id4 ,
     bind_rn
from( select tab1.name, i0.id, i0.tab1value value, bind_rn
     from tab1, tab2 i0,
          (select 0 bind_rn, :l_id0 arg_value from dual union
          select 1 , :l_id1 from dual union
          select 2 , :l_id2 from dual union
          select 3 , :l_id3 from dual union
          select 4 , :l_id4 from dual ) table_of_args
     where tab1.name='name1' and i0.tab1value=tab1.value
-- and i0.id in (0,10,20,30,40)
and i0.id = table_of_args.arg_value
)) where bind_rn = 0;
1600 rows selected.
Statistics
1 recursive calls
0 db block gets
4099 consistent gets
3707 physical reads
0 redo size
52111 bytes sent via SQL*Net to client
1250 bytes received via SQL*Net from client
81 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1600 rows processed

How to hint hash join order on indexes?

Hi,
In my 9.2.0.8 DB I've got a query like this:
SELECT COUNT (agreementno) cnt
FROM followup
WHERE agreementno = :v001 AND actioncode = :v002 AND resultcode = :v003;
Plan
SELECT STATEMENT CHOOSE Cost: 11 Bytes: 18 Cardinality: 1
     5 SORT AGGREGATE Bytes: 18 Cardinality: 1
          4 VIEW index$_join$_001 Cost: 11 Bytes: 18 Cardinality: 1
               3 HASH JOIN Bytes: 18 Cardinality: 1
                    1 INDEX RANGE SCAN NON-UNIQUE IDX_FOLLOWUP06 Cost: 13 Bytes: 18 Cardinality: 1
                    2 INDEX RANGE SCAN UNIQUE PK_FOLLOWUP Cost: 13 Bytes: 18 Cardinality: 1
I need to change the join order of the indexes, so that the probe one would be PK_FOLLOWUP.
Of course the best plan would be an index range scan on the PK, but due to the high CF Oracle is combining the two indexes.
Regards.
Greg

Hmm, sorry for the red herring.
I guess you could also consider hand-coding the index join, something like:
SELECT /*+ ORDERED USE_HASH (b) */
       COUNT (a.empno)
FROM (SELECT e.empno
       FROM emp e
       WHERE e.empno > 7000) a,
      (SELECT e.empno
       FROM emp e
       WHERE e.ename LIKE 'S%') b
WHERE a.ROWID = b.ROWID;
or...
WITH e AS
     (SELECT e.empno, e.ename
        FROM emp e)
SELECT /*+ ORDERED USE_HASH (b) */
       COUNT (a.empno)
FROM   e a, e b
WHERE b.ename LIKE 'S%'
AND    a.empno > 7000
AND    a.ROWID = b.ROWID;
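
For readers who have not met an index join before, here is a rough Python sketch of what the hand-coded version above is asking for: each index is treated as a small table of (rowid, key) entries, each side is filtered on its own predicate, and the two sides are hash-joined on rowid. The index contents and rowid values are made up for illustration, and this is only a model of the idea, not Oracle's actual implementation.

```python
def index_hash_join(build_entries, probe_entries):
    """Hash-join two lists of (rowid, key) index entries on rowid."""
    # Build phase: hash the first input on rowid.
    build = {rowid: key for rowid, key in build_entries}
    # Probe phase: look up each entry of the second input by rowid.
    return [(rowid, build[rowid], key)
            for rowid, key in probe_entries
            if rowid in build]


# Hypothetical index contents: (rowid, indexed column value).
empno_idx = [("r1", 6900), ("r2", 7369), ("r3", 7788), ("r4", 7844)]
ename_idx = [("r1", "ADAMS"), ("r2", "SMITH"), ("r3", "SCOTT"), ("r4", "TURNER")]

# Each "range scan" applies its own predicate before the join.
a = [(rid, empno) for rid, empno in empno_idx if empno > 7000]
b = [(rid, ename) for rid, ename in ename_idx if ename.startswith("S")]

# Listing a first and hashing it mirrors the intent of ORDERED USE_HASH (b).
matches = index_hash_join(a, b)
count = len(matches)
print(count, matches)
```

Swapping the two arguments changes which side is hashed and which side probes, without changing the result; that is the ordering the original poster wants to control.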

Bind join and hash join

Hi
I would like to know if there is a difference between a bind join and a hash join.
For example, if I write SQL code in my query tool (in Data Federator Query Server Administrator XI 3.0), it is translated into a hashjoin(....); if I set the system parameters the right way to produce a bind join, hashjoin(...) is present again!
How do I understand the difference?
Bye

Query to track the holder and waiter info
People who reach this place for a similar problem can use the above link to find their answer
Regards,
Vishal

Performance Tuning - remove hash join

Hi everyone,
Can someone help me tune the query below? I have a hash join taking around 84%:
SELECT
PlanId
,ReplacementPlanId
FROM
( SELECT
pl.PlanId
,xpl.PlanId ReplacementPlanId
,ROW_NUMBER() OVER(PARTITION BY pl.PlanId ORDER BY xpl.PlanId) RN
FROM [dbo].[Plan] pl
JOIN [dbo].[Plan] xpl
ON pl.PPlanId = xpl.PPlanId
AND pl.Name = xpl.Name
WHERE
pl.SDate > (CONVERT(CHAR(10),DATEADD(M,-12,GETDATE()),120)) AND
xpl.Live = 1
AND pl.TypeId = 7
AND xpl.TypeId = 7
) p
WHERE RN = 1
Thanks in advance

Can you show an execution plan of the query? Is it possible to rewrite the query as follows?
Sorry, I cannot test it right now.
SELECT
pl.PlanId
,Der.ReplacementPlanId
FROM [dbo].[Plan] pl
CROSS APPLY
(
-- first matching replacement plan for each pl row; the xpl filters and the
-- ORDER BY sit inside the APPLY so that TOP (1) picks the same row as RN = 1
SELECT TOP (1) xpl.PlanId AS ReplacementPlanId
FROM [dbo].[Plan] xpl
WHERE pl.PPlanId = xpl.PPlanId
AND pl.Name = xpl.Name
AND xpl.Live = 1
AND xpl.TypeId = 7
ORDER BY xpl.PlanId
) AS Der
WHERE pl.SDate > (CONVERT(CHAR(10),DATEADD(M,-12,GETDATE()),120))
AND pl.TypeId = 7
Best Regards, Uri Dimant, SQL Server MVP,
http://sqlblog.com/blogs/uri_dimant/

Nested loop vs Hash Join

Hi,
Both queries return the same results, but the first query uses a hash join and the second uses nested loops. How? Please explain.
select *
from emp a,dept b
where a.deptno=b.deptno and b.deptno>20;
6 rows
Plan hash value: 4102772462
| Id | Operation                    | Name    | Rows | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT             |         |     6 |   348 |     6 (17)| 00:00:01 |
|* 1 | HASH JOIN                   |         |     6 |   348 |     6 (17)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     3 |    60 |     2   (0)| 00:00:01 |
|* 3 |    INDEX RANGE SCAN          | PK_DEPT |     3 |       |     1   (0)| 00:00:01 |
|* 4 |   TABLE ACCESS FULL          | EMP     |     7 |   266 |     3   (0)| 00:00:01 |
Predicate Information (identified by operation id):
   1 - access("A"."DEPTNO"="B"."DEPTNO")
   3 - access("B"."DEPTNO">20)
   4 - filter("A"."DEPTNO">20)
select *
from emp a,dept b
where a.deptno=b.deptno and b.deptno=30;
6 rows
Plan hash value: 568005898
| Id | Operation                    | Name    | Rows | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT             |         |     5 |   290 |     4   (0)| 00:00:01 |
|   1 | NESTED LOOPS                |         |     5 |   290 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    20 |     1   (0)| 00:00:01 |
|* 3 |    INDEX UNIQUE SCAN         | PK_DEPT |     1 |       |     0   (0)| 00:00:01 |
|* 4 |   TABLE ACCESS FULL          | EMP     |     5 |   190 |     3   (0)| 00:00:01 |
Predicate Information (identified by operation id):
   3 - access("B"."DEPTNO"=30)
   4 - filter("A"."DEPTNO"=30)

Hi,
Unless specifically requested, Oracle picks the best execution plan based on estimates of table sizes, column selectivity and many other variables. Even though Oracle does its best to have the estimates as accurate as possible, they are frequently different, and in some cases quite different, from the actual values.
In the first query, Oracle estimated that the predicate "b.deptno>20" would limit the number of records to 6, and based on that it decided to use a Hash Join.
In the second query, Oracle estimated that the predicate "b.deptno=30" would limit the number of records to 5, and based on that it decided to use a Nested Loops Join.
The fact that the actual number of records is the same is irrelevant, because Oracle used the estimate, rather than the actual number of records, to pick the best plan.
HTH,
Iordan
Iotzov
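
As background to the answer above, the two operators compute the same result by different mechanics, so the optimizer is free to choose between them on estimated cost alone. The Python sketch below is a simplified illustration with made-up emp/dept rows, not Oracle's implementation. Nested loops rescans the inner input once per outer row, while a hash join builds a hash table on one input and probes it once per row of the other.

```python
from collections import defaultdict


def nested_loops_join(outer, inner, key):
    """For every outer row, scan the inner rows for matching keys."""
    result = []
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                result.append((o, i))
    return result


def hash_join(build, probe, key):
    """Hash the build input on the key, then probe it with each probe row."""
    table = defaultdict(list)
    for b in build:
        table[b[key]].append(b)
    result = []
    for p in probe:
        for b in table.get(p[key], []):
            result.append((b, p))
    return result


# Hypothetical sample rows, loosely modelled on the emp/dept example.
dept = [{"deptno": 30, "dname": "SALES"}, {"deptno": 40, "dname": "OPERATIONS"}]
emp = [{"ename": "ALLEN", "deptno": 30},
       {"ename": "WARD", "deptno": 30},
       {"ename": "CLARK", "deptno": 10}]

nl = nested_loops_join(dept, emp, "deptno")
hj = hash_join(dept, emp, "deptno")
print(len(nl), len(hj))
```

With a single outer row (as estimated for deptno=30), the nested loop does one pass over the inner input, which is why it can be costed lower than building a hash table first.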

What is HASH JOIN ANTI

Hi ,
In my explain plan I see HASH JOIN ANTI.
What in my select query gives rise to this join?
select * from ( select * from tbl1 where not exists (select * from tbl2 where tbl1.col1 = tbl2.col1)) jj
where not exists (select * from tbl3 where jj.col1 = tbl3.col1);
I think it's the double NOT EXISTS that causes this hash join anti,
but what impact will this type of join have on my query performance?
Thanks & regards

Hi ,
But I thought that with SELECT ... WHERE xx IN ( ) or xx NOT IN ( ), the query inside the IN ( ) or NOT IN ( ) is performed for every single record,
whereas EXISTS or NOT EXISTS actually builds a list first and then compares against it from there onwards?
Thanks & regards

Hash join

I have an index on the CAT_MAP_ID column of STM_RPT_ITEM_PH6_MV, but I don't know why it's not using a nested loop join for the 21 rows returned (outer) to join to the 641k rows in the STM_RPT_ITEM_PH6_MV table. I think that's the reason this query is consuming very high TEMP (87GB). Can anyone help me explain this:
| Id | Operation                             | Name                        | E-Rows | OMem | 1Mem | Used-Mem | Used-Tmp|
|   0 | INSERT STATEMENT                      |                             |        |       |       |          |         |
|   1 | SORT GROUP BY                        |                             |      1 |   131M| 3585K|   90M (1)|     118K|
|* 2 |   VIEW                                |                             |      1 |       |       |          |         |
|   3 |    SORT UNIQUE                        |                             |      1 |   261M| 4982K|   90M (1)|         |
|* 4 |     WINDOW SORT PUSHED RANK           |                             |      1 |   511M| 6878K| 172M (1)|         |
|   5 |      NESTED LOOPS OUTER               |                             |      1 |       |       |          |         |
|   6 |       NESTED LOOPS                    |                             |      1 |       |       |          |         |
|* 7 |        HASH JOIN                      |                             |      1 | 2047M|    75M| 470M (1)|      87M|
|   8 |         MAT_VIEW ACCESS BY INDEX ROWID| BSC_RPT_REBSUM_STRUCT_LI_MV |     21 |       |       |          |         |
|   9 |          NESTED LOOPS                 |                             |   3960 |       |       |          |         |
| 10 |           NESTED LOOPS                |                             |    185 |       |       |          |         |
|* 11 |            HASH JOIN                  |                             |   2638 | 1465K|   902K| 4966K (0)|         |
|* 12 |             HASH JOIN                 |                             |   4193 |   841K|   841K| 4992K (0)|         |
| 13 |              TABLE ACCESS FULL        | BSC_RPT_REBSUM_TEMP         |   4193 |       |       |          |         |
| 14 |              TABLE ACCESS FULL        | BSC_RPT_REBSUM_S1           | 83548 |       |       |          |         |
| 15 |             MAT_VIEW ACCESS FULL      | BSC_RPT_REBSUM_MEM_MV       | 52555 |       |       |          |         |
|* 16 |            TABLE ACCESS BY INDEX ROWID| MN_BUCKET_LINE              |      1 |       |       |          |         |
|* 17 |             INDEX RANGE SCAN          | REBATEPAYMENTGRP            |   1460 |       |       |          |         |
|* 18 |           INDEX RANGE SCAN            | BSC_RPT_STRUCT_LI_MV_IDX1   |     22 |       |       |          |         |
| 19 |         MAT_VIEW ACCESS FULL          | BSC_RPT_ITEM_PH6_MV         |    641K|       |       |          |         |
|* 20 |        INDEX UNIQUE SCAN              | MN_10291_PK                 |      1 |       |       |          |         |
| 21 |       TABLE ACCESS BY INDEX ROWID     | BSC_SPLIT_PMT               |      1 |       |       |          |         |
|* 22 |        INDEX RANGE SCAN               | BSC_904200_IDX1             |      1 |       |       |          |         |
Predicate Information (identified by operation id):
   2 - filter("RPT"."RNK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUCK_LI"."BUCKET_LINE_ID" ORDER BY
              INTERNAL_FUNCTION("ITEM_VW"."BSC_PHLEVEL") DESC )<=1)
   7 - access("LIMV"."CAT_MAP_ID"="ITEM_VW"."CAT_MAP_ID" AND "ITEM_VW"."ITEM_ID"="BUCK_LI"."SALE_ITEM_ID")
11 - access("MEM_MV"."PRC_PROGRAM_ID"="S1"."PRC_PROGRAM_ID" AND
              "MEM_MV"."PARENT_MEMBER_ID"="S1"."MEMBER_ID_CUST")
12 - access("TEMP"."CONTRACT_ID"="S1"."CONTRACT_ID" AND "TEMP"."REBATE_PMT_ID"="S1"."REBATE_PMT_ID")
16 - filter(("BUCK_LI"."PMT_BENEFIT_ID" IS NOT NULL AND "BUCK_LI"."INCLUSION_TYPE"='INC' AND
              "MEM_MV"."CHILD_MEMBER_ID"="BUCK_LI"."SALE_CONTRACTED_CUST_ID"))
17 - access("TEMP"."REBATE_PMT_ID"="BUCK_LI"."REBATE_PMT_ID")
18 - access("LIMV"."PRC_PROGRAM_ID"="S1"."PRC_PROGRAM_ID")
20 - access("PRC"."PRC_PROGRAM_ID"="S1"."PRC_PROGRAM_ID")
22 - access("TEMP"."REBATE_PMT_ID"="SPLIT"."REBATE_PMT_ID")
       filter("SPLIT"."REBATE_PMT_ID" IS NOT NULL)
Note
   - dynamic sampling used for this statement
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level
Query is:
INSERT INTO MDK_TAB1
            (sale_id, struct_doc_id, pdflg, member_id_cust, paid_end_date, payee, map_name, STM_phlevel, STM_phcode, STM_old_material_num, sold_to_cust_id, member_grp_name,
               prc_program_id, STM_r_af_type, rebate_pmt_id_num, rebate_pmt_id, sale_inv_qty, sales, earn_rebate_amt)
   SELECT   rpt.sale_id, rpt.struct_doc_id, rpt.pdflg, rpt.member_id_cust, rpt.paid_end_date, rpt.payee, rpt.map_name, rpt.STM_phlevel,
            rpt.STM_phcode, rpt.STM_old_material_num, rpt.sold_to_cust_id, rpt.member_grp_name, rpt.prc_program_id, rpt.STM_r_af_type, rpt.rebate_pmt_id_num,
               rpt.rebate_pmt_id, rpt.sales_units, rpt.sales, SUM (rpt.earn_rebate_amt) earn_rebate_amt
       FROM (SELECT DISTINCT buck_li.sale_id, temp.contract_id struct_doc_id,
                             temp.pdflg, s1.member_id_cust,
                             temp.paid_end_date, SPLIT.payee,
                             item_vw.map_name, item_vw.STM_phlevel,
                             item_vw.STM_phcode, item_vw.STM_old_material_num,
                             buck_li.sale_contracted_cust_id sold_to_cust_id,
                             mem_mv.child_name member_grp_name, s1.prc_program_id, s1.STM_r_af_type,
                             s1.rebate_pmt_id_num, s1.rebate_pmt_id,
                             buck_li.sale_inv_qty AS sales_units,
                             NULLIF (buck_li.sale_ext_amt, 0) sales,
                             NULLIF
                                 (buck_li.STM_payment_amount, 0) earn_rebate_amt,
                             ROW_NUMBER () OVER (PARTITION BY buck_li.bucket_line_id ORDER BY item_vw.STM_phlevel DESC)
                   rnk
                        FROM STM_rpt_rebsum_s1 s1,
                             mn_bucket_line buck_li,
                             STM_rpt_item_ph6_mv item_vw,
                             STM_rpt_rebsum_struct_li_mv limv,
                             STM_rpt_rebsum_mem_mv mem_mv,
                             STM_rpt_rebsum_temp temp,
                             STM_split_pmt SPLIT,
                             mn_prc_program prc
                       WHERE temp.contract_id = s1.contract_id
                         AND temp.rebate_pmt_id = s1.rebate_pmt_id
                         AND temp.rebate_pmt_id = SPLIT.rebate_pmt_id(+)
                         AND temp.rebate_pmt_id = buck_li.rebate_pmt_id
                         AND limv.prc_program_id = s1.prc_program_id
                         AND prc.prc_program_id = s1.prc_program_id
                         AND mem_mv.prc_program_id = s1.prc_program_id
                         AND mem_mv.parent_member_id = s1.member_id_cust
                         AND limv.cat_map_id = item_vw.cat_map_id
                         AND item_vw.item_id = buck_li.sale_item_id
                         AND mem_mv.child_member_id = buck_li.sale_contracted_cust_id
                         AND buck_li.pmt_benefit_id IS NOT NULL
                         AND buck_li.inclusion_type = 'INC') rpt
      WHERE rpt.rnk = 1
   GROUP BY rpt.sale_id,
            rpt.struct_doc_id,
            rpt.pdflg,
            rpt.member_id_cust,
            rpt.paid_end_date,
            rpt.payee,
            rpt.map_name,
            rpt.STM_phlevel,
            rpt.STM_phcode,
            rpt.STM_old_material_num,
            rpt.sold_to_cust_id,
            rpt.member_grp_name,
            rpt.prc_program_id,
            rpt.STM_r_af_type,
            rpt.rebate_pmt_id_num,
            rpt.rebate_pmt_id,
            rpt.sales_units,
            rpt.sales;

Sorry for messing with the code thingy, it's the first timer's mistake, here is the formatted version:
| Id | Operation                             | Name                        | E-Rows | OMem | 1Mem | Used-Mem | Used-Tmp|
|   0 | INSERT STATEMENT                      |                             |        |       |       |          |         |
|   1 | SORT GROUP BY                        |                             |      1 |   133M| 3616K|   90M (1)|     120K|
|* 2 |   VIEW                                |                             |      1 |       |       |          |         |
|   3 |    SORT UNIQUE                        |                             |      1 |   266M| 5026K|   90M (1)|         |
|* 4 |     WINDOW SORT PUSHED RANK           |                             |      1 |   520M| 6933K| 172M (1)|         |
|   5 |      NESTED LOOPS OUTER               |                             |      1 |       |       |          |         |
|   6 |       NESTED LOOPS                    |                             |      1 |       |       |          |         |
|* 7 |        HASH JOIN                      |                             |      1 | 2047M|   120M| 419M (1)|      93M|
|   8 |         MAT_VIEW ACCESS BY INDEX ROWID| BSC_RPT_REBSUM_STRUCT_LI_MV |     21 |       |       |          |         |
|   9 |          NESTED LOOPS                 |                             |   5468 |       |       |          |         |
| 10 |           NESTED LOOPS                |                             |    255 |       |       |          |         |
|* 11 |            HASH JOIN                  |                             |   2645 | 1465K|   902K| 3386K (0)|         |
|* 12 |             HASH JOIN                 |                             |   4202 |   841K|   841K| 5115K (0)|         |
| 13 |              TABLE ACCESS FULL        | BSC_RPT_REBSUM_TEMP         |   4202 |       |       |          |         |
| 14 |              TABLE ACCESS FULL        | BSC_RPT_REBSUM_S1           | 83564 |       |       |          |         |
| 15 |             MAT_VIEW ACCESS FULL      | BSC_RPT_REBSUM_MEM_MV       | 52596 |       |       |          |         |
|* 16 |            TABLE ACCESS BY INDEX ROWID| MN_BUCKET_LINE              |      1 |       |       |          |         |
|* 17 |             INDEX RANGE SCAN          | REBATEPAYMENTGRP            |   1465 |       |       |          |         |
|* 18 |           INDEX RANGE SCAN            | BSC_RPT_STRUCT_LI_MV_IDX1   |     22 |       |       |          |         |
| 19 |         MAT_VIEW ACCESS FULL          | BSC_RPT_ITEM_PH6_MV         |    641K|       |       |          |         |
|* 20 |        INDEX UNIQUE SCAN              | MN_10291_PK                 |      1 |       |       |          |         |
| 21 |       TABLE ACCESS BY INDEX ROWID     | BSC_SPLIT_PMT               |      1 |       |       |          |         |
|* 22 |        INDEX RANGE SCAN               | BSC_904200_IDX1             |      1 |       |       |          |         |
Predicate Information (identified by operation id):
   2 - filter("RPT"."RNK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUCK_LI"."BUCKET_LINE_ID" ORDER BY
              INTERNAL_FUNCTION("ITEM_VW"."BSC_PHLEVEL") DESC )<=1)
   7 - access("LIMV"."CAT_MAP_ID"="ITEM_VW"."CAT_MAP_ID" AND "ITEM_VW"."ITEM_ID"="BUCK_LI"."SALE_ITEM_ID")
11 - access("MEM_MV"."PRC_PROGRAM_ID"="S1"."PRC_PROGRAM_ID" AND
              "MEM_MV"."PARENT_MEMBER_ID"="S1"."MEMBER_ID_CUST")
12 - access("TEMP"."CONTRACT_ID"="S1"."CONTRACT_ID" AND "TEMP"."REBATE_PMT_ID"="S1"."REBATE_PMT_ID")
16 - filter(("BUCK_LI"."PMT_BENEFIT_ID" IS NOT NULL AND "BUCK_LI"."INCLUSION_TYPE"='INC' AND
              "MEM_MV"."CHILD_MEMBER_ID"="BUCK_LI"."SALE_CONTRACTED_CUST_ID"))
17 - access("TEMP"."REBATE_PMT_ID"="BUCK_LI"."REBATE_PMT_ID")
18 - access("LIMV"."PRC_PROGRAM_ID"="S1"."PRC_PROGRAM_ID")
20 - access("PRC"."PRC_PROGRAM_ID"="S1"."PRC_PROGRAM_ID")
22 - access("TEMP"."REBATE_PMT_ID"="SPLIT"."REBATE_PMT_ID")
       filter("SPLIT"."REBATE_PMT_ID" IS NOT NULL)
Note
   - dynamic sampling used for this statement
   - Warning: basic plan statistics not available. These are only collected when:
       * hint 'gather_plan_statistics' is used for the statement or
       * parameter 'statistics_level' is set to 'ALL', at session or system level
INSERT INTO BSC_RPT_REBSUM_S3_TEMP
            (sale_id, struct_doc_id, pdflg, member_id_cust, paid_end_date, payee, map_name, bsc_phlevel, bsc_phcode, bsc_old_material_num, sold_to_cust_id, member_grp_name,
               prc_program_id, bsc_r_af_type, rebate_pmt_id_num, rebate_pmt_id, sale_inv_qty, sales, earn_rebate_amt)
   SELECT   rpt.sale_id, rpt.struct_doc_id, rpt.pdflg, rpt.member_id_cust, rpt.paid_end_date, rpt.payee, rpt.map_name, rpt.bsc_phlevel,
            rpt.bsc_phcode, rpt.bsc_old_material_num, rpt.sold_to_cust_id, rpt.member_grp_name, rpt.prc_program_id, rpt.bsc_r_af_type, rpt.rebate_pmt_id_num,
               rpt.rebate_pmt_id, rpt.sales_units, rpt.sales, SUM (rpt.earn_rebate_amt) earn_rebate_amt
       FROM (SELECT DISTINCT buck_li.sale_id, temp.contract_id struct_doc_id,
                             temp.pdflg, s1.member_id_cust,
                             temp.paid_end_date, SPLIT.payee,
                             item_vw.map_name, item_vw.bsc_phlevel,
                             item_vw.bsc_phcode, item_vw.bsc_old_material_num,
                             buck_li.sale_contracted_cust_id sold_to_cust_id,
                             mem_mv.child_name member_grp_name, s1.prc_program_id, s1.bsc_r_af_type,
                             s1.rebate_pmt_id_num, s1.rebate_pmt_id,
                             buck_li.sale_inv_qty AS sales_units,
                             NULLIF (buck_li.sale_ext_amt, 0) sales,
                             NULLIF
                                 (buck_li.bsc_payment_amount, 0) earn_rebate_amt,
                             ROW_NUMBER () OVER (PARTITION BY buck_li.bucket_line_id ORDER BY item_vw.bsc_phlevel DESC)
                   rnk
                        FROM bsc_rpt_rebsum_s1 s1,
                             mn_bucket_line buck_li,
                             bsc_rpt_item_ph6_mv item_vw,
                             bsc_rpt_rebsum_struct_li_mv limv,
                             bsc_rpt_rebsum_mem_mv mem_mv,
                             bsc_rpt_rebsum_temp temp,
                             bsc_split_pmt SPLIT,
                             mn_prc_program prc
                       WHERE temp.contract_id = s1.contract_id
                         AND temp.rebate_pmt_id = s1.rebate_pmt_id
                         AND temp.rebate_pmt_id = SPLIT.rebate_pmt_id(+)
                         AND temp.rebate_pmt_id = buck_li.rebate_pmt_id
                         AND limv.prc_program_id = s1.prc_program_id
                         AND prc.prc_program_id = s1.prc_program_id
                         AND mem_mv.prc_program_id = s1.prc_program_id
                         AND mem_mv.parent_member_id = s1.member_id_cust
                         AND limv.cat_map_id = item_vw.cat_map_id
                         AND item_vw.item_id = buck_li.sale_item_id
                         AND mem_mv.child_member_id = buck_li.sale_contracted_cust_id
                         AND buck_li.pmt_benefit_id IS NOT NULL
                         AND buck_li.inclusion_type = 'INC') rpt
      WHERE rpt.rnk = 1
   GROUP BY rpt.sale_id,
            rpt.struct_doc_id,
            rpt.pdflg,
            rpt.member_id_cust,
            rpt.paid_end_date,
            rpt.payee,
            rpt.map_name,
            rpt.bsc_phlevel,
            rpt.bsc_phcode,
            rpt.bsc_old_material_num,
            rpt.sold_to_cust_id,
            rpt.member_grp_name,
            rpt.prc_program_id,
            rpt.bsc_r_af_type,
            rpt.rebate_pmt_id_num,
            rpt.rebate_pmt_id,
            rpt.sales_units,
            rpt.sales;
COLUMN_NAME          NUM_DISTINCT NUM_NULLS NUM_BUCKETS SAMPLE_SIZE HISTOGRAM
BSC_OLD_MATERIAL_NUM        55394     583842           1       57907 NONE
BSC_NUM_SHEETS                 28          0           1      641691 NONE
BSC_PHCODE                  69997     207552           1      434160 NONE
BSC_PHLEVEL                     6     207552           1      434160 NONE
BSC_PROFIT_CENTER              10          0           1      641691 NONE
BSC_COST                    14966      42821           1      598874 NONE
MAP_NAME                    61277          0           1      641691 NONE
CAT_MAP_ID                  72241          0           1      641691 NONE
ITEM_ID                     64299          0           1      641691 NONE
I still don't get why it is hashing 21 rows against 641k. It looks like a bad idea; maybe I should build histograms here?
Any ideas?

Hash Join Anti NA

Another simple day at the office.....
What was the case.
A colleague approached me telling that he had two similar queries. One of them returning data, the other not.
The "simplified" version of the two queries looked like:
SELECT col1
FROM tab1
WHERE col1 NOT IN (SELECT col1 FROM tab2);
This query returned no data; however, he (and later on I as well) was sure that there was a mismatch in the data, which should have returned rows.
This was also shown by the second query:
SELECT col1
FROM tab1
WHERE NOT EXISTS
          (SELECT col1
             FROM tab2
            WHERE tab1.col1 = tab2.col1);
This query returned the expected difference. And this query does, in fact, the same thing as the first query!!
Even when we hardcoded an extra WHERE clause, the result was the same. No rows for:
SELECT *
FROM tab1
WHERE tab1.col1 NOT IN (SELECT col1 FROM tab2)
       AND tab1.col1 = 'car';
and the correct rows for:
SELECT *
FROM tab1
WHERE     NOT EXISTS
              (SELECT 1
                 FROM tab2
                WHERE tab1.col1 = tab2.col1)
       AND tab1.col1 = 'car';
After an hour of searching and trying to reproduce the issue, I was almost about to give up and send it to Oracle Support, qualifying it as a bug.
However, there was one difference that I saw that could be the cause of the problem.
Although the statements are almost the same, the execution plan showed a slight difference. The execution plan for the NOT IN query looked like:
Plan
SELECT STATEMENT ALL_ROWS Cost: 5 Bytes: 808 Cardinality: 2
3 HASH JOIN ANTI NA Cost: 5 Bytes: 808 Cardinality: 2
1 TABLE ACCESS FULL TABLE PIM_KRG.TAB1 Cost: 2 Bytes: 606 Cardinality: 3
2 TABLE ACCESS FULL TABLE PIM_KRG.TAB2 Cost: 2 Bytes: 404 Cardinality: 2
Whereas the execution plan of the query with the NOT EXISTS looked like:
Plan
SELECT STATEMENT ALL_ROWS Cost: 5 Bytes: 808 Cardinality: 2
3 HASH JOIN ANTI Cost: 5 Bytes: 808 Cardinality: 2
1 TABLE ACCESS FULL TABLE PIM_KRG.TAB1 Cost: 2 Bytes: 606 Cardinality: 3
2 TABLE ACCESS FULL TABLE PIM_KRG.TAB2 Cost: 2 Bytes: 404 Cardinality: 2
See the difference?
Not knowing what a "HASH JOIN ANTI NA" exactly was, I entered it as a search command into the knowledge base of My Oracle Support. Besides a couple of patch-set lists, I also found Document 1082123.1, which explains all about the HASH JOIN ANTI NULL_AWARE.
In this document the behaviour we saw is explained, with the most important remark being:
'If t2.n2 contains NULLs,do not return any t1 rows and terminate'
And then it suddenly hit me as I was unable to reproduce the case using my own created test tables.
In our case, it meant that if tab2.col1 would have contained any rows with a NULL value, the join between those two tables could not be made based on a "NOT IN" clause.
The query would terminate without giving any results !!!
And that is exactly what we saw.
The query with the NOT EXISTS doesn't use a NULL_AWARE ANTI JOIN and therefore does return the results
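
The rule quoted from the support document can be modelled in a few lines of Python. This is only a sketch of the logical behaviour (with `None` standing in for NULL and plain lists standing in for the single-column tables), not a description of how Oracle implements the operator internally.

```python
def anti_join(t1, t2):
    """NOT EXISTS style: keep t1 values with no equal value in t2.

    NULL never compares equal to anything, so NULL rows in t1 are kept
    and NULL rows in t2 can never cause a match.
    """
    seen = {v for v in t2 if v is not None}
    return [v for v in t1 if v is None or v not in seen]


def null_aware_anti_join(t1, t2):
    """NOT IN style: a NULL on either side makes the comparison unknown."""
    if not t2:
        # NOT IN over an empty set is true for every row, including NULLs.
        return list(t1)
    if any(v is None for v in t2):
        # "If t2 contains NULLs, do not return any t1 rows and terminate."
        return []
    seen = set(t2)
    # A NULL in t1 compared with a non-empty list is unknown, so it is dropped.
    return [v for v in t1 if v is not None and v not in seen]


tab1 = ["bike", "car", None]
tab2 = ["bike", None]

print(anti_join(tab1, tab2))             # NOT EXISTS result
print(null_aware_anti_join(tab1, tab2))  # NOT IN result
```

Removing the `None` from `tab2` makes the null-aware version return `'car'`, which matches the observation that the problem could not be reproduced with test tables that contained no NULLs.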
Also the mentioned workaround:
alter session set "_optimizer_null_aware_antijoin" = false;
does not seem to work. Although the execution plan changes to:
Plan
SELECT STATEMENT ALL_ROWS Cost: 4 Bytes: 202 Cardinality: 1
3 FILTER
1 TABLE ACCESS FULL TABLE PIM_KRG.TAB1 Cost: 2 Bytes: 606 Cardinality: 3
2 TABLE ACCESS FULL TABLE PIM_KRG.TAB2 Cost: 2 Bytes: 404 Cardinality: 2
it still returns no rows !!
And Now??
Since there is a document explaining the behaviour, I doubt whether we can classify this as a bug. But in my opinion, if developers do not know about this strange behaviour, they will easily call it a bug.
The "problem" is easily solved (or worked around) using the NOT EXISTS solution, or using NVL with the JOINed columns. However, I would expect the optimizer to sort these things out itself.
For anyone who wants to reproduce/investigate this case, I have listed my test-code.
The database version we used was 11.1.0.7 on Windows 2008 R2. I'm sure the OS doesn't matter here.
-- Create two tables, make sure they allow NULL values
CREATE TABLE tab1 (col1 VARCHAR2 (100) NULL);
CREATE TABLE tab2 (col1 VARCHAR2 (100) NULL);
INSERT INTO tab1
VALUES ('bike');
INSERT INTO tab1
VALUES ('car');
INSERT INTO tab1
VALUES (NULL);
INSERT INTO tab2
VALUES ('bike');
INSERT INTO tab2
VALUES (NULL);
COMMIT;
-- This query returns No results
SELECT col1
FROM tab1
WHERE col1 NOT IN (SELECT col1 FROM tab2);
-- This query returns results
SELECT col1
FROM tab1
WHERE NOT EXISTS
          (SELECT col1
             FROM tab2
            WHERE tab1.col1 = tab2.col1);
I've also written a blog entry with the above text at http://managingoracle.blogspot.com
Does anyone have the real explanation for this behaviour, as in why the HASH JOIN ANTI terminates? Please elaborate.
Thanks
Kind Regards
FJFranken

fjfranken wrote:
SELECT col1
FROM tab1
WHERE col1 NOT IN (SELECT col1 FROM tab2);
SELECT col1
FROM tab1
WHERE NOT EXISTS
(SELECT col1
FROM tab2
WHERE tab1.col1 = tab2.col1);
These two queries are NOT logically equivalent - if there are any rows in tab2 where col1 is null then the first query SHOULD return no rows.
See http://jonathanlewis.wordpress.com/2007/02/25/not-in/ for an explanation of the difference between NOT EXISTS and NOT IN.
> Anyone who has the real explanation for this behaviour as in why the HASH-JOIN ANTI terminates. Please elaborate
The purpose of the null-aware anti-join is to allow a NOT IN subquery that has to deal with the null problem to run as an anti-join; historically it would HAVE to run as a FILTER SUBQUERY.
Regards
Jonathan Lewis
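
The difference between the two queries comes from standard SQL three-valued logic rather than from Oracle specifically, so it can be reproduced anywhere. As a quick check, the sketch below loads the test data from the original post into an in-memory SQLite database through Python's `sqlite3` module (SQLite and the `TEXT` column type are substitutions made here for convenience; the original test used Oracle 11.1.0.7 and VARCHAR2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (col1 TEXT);
    CREATE TABLE tab2 (col1 TEXT);
    INSERT INTO tab1 VALUES ('bike'), ('car'), (NULL);
    INSERT INTO tab2 VALUES ('bike'), (NULL);
""")

# NOT IN: the NULL in tab2 makes every comparison unknown or false.
not_in = conn.execute(
    "SELECT col1 FROM tab1 WHERE col1 NOT IN (SELECT col1 FROM tab2)"
).fetchall()

# NOT EXISTS: rows without an equal partner in tab2 are returned.
not_exists = conn.execute(
    "SELECT col1 FROM tab1 WHERE NOT EXISTS "
    "(SELECT 1 FROM tab2 WHERE tab1.col1 = tab2.col1)"
).fetchall()

# Filtering the NULLs out of the subquery makes NOT IN return 'car'.
not_in_filtered = conn.execute(
    "SELECT col1 FROM tab1 WHERE col1 NOT IN "
    "(SELECT col1 FROM tab2 WHERE col1 IS NOT NULL)"
).fetchall()

print(not_in, not_exists, not_in_filtered)
conn.close()
```

Note that NOT EXISTS also returns the NULL row from tab1, while the filtered NOT IN does not, so even with the NULLs removed from tab2 the two forms are only equivalent when tab1.col1 cannot be NULL either.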
