Does hash partition distribute data evenly across partitions?

As per the Oracle documentation, hash partitioning uses Oracle's hashing algorithm to assign a hash value to each row's partitioning key and place the row in the appropriate partition, and the data will be evenly distributed across the partitions, provided the following conditions hold:
1. The partition count should be a power of two (2^n).
2. The data in the partition key column should have high cardinality.
I have used hash partitioning in some of our application tables, but the data isn't distributed evenly across partitions. To verify this, I performed a small test:
Table script:
create table ch_acct_mast_hash (
  cod_acct_no number
)
partition by hash (cod_acct_no)
partitions 128;
Data population script:
declare
  i number := 1000000000000000;
begin
  for l in 1 .. 100000 loop  -- l is declared implicitly by the for loop
    insert into ch_acct_mast_hash values (i);
    i := i + 1;
  end loop;
  commit;
end;
/
Row-count check:
select count(1) from ch_acct_mast_hash; -- rowcount is 100000
Gather stats script:
begin
  dbms_stats.gather_table_stats('C43HDEV', 'CH_ACCT_MAST_HASH');
end;
/
Data distribution check:
select min(num_rows), max(num_rows)
from dba_tab_partitions
where table_name = 'CH_ACCT_MAST_HASH';
Result is:
min(num_rows) = 700
max(num_rows) = 853
As per the result, there seems to be a lot of skew in the data distribution across the partitions. Maybe I am missing something, or something is not right.
Can anybody help me to understand this behavior?
Edited by: Kshitij Kasliwal on Nov 2, 2012 4:49 AM

>
I have used hash partitioning in some of our application tables, but data isn't distributed evenly across partitions.
>
All keys with the same data value will also have the same hash value and so will be in the same partition.
So the actual hash distribution in any particular case will depend on the actual data distribution. And, as Iordan showed, the data distribution depends not only on cardinality but on the standard deviation of the key values.
To use a shorter version of that example, consider these data samples, each of which has 10 values. There is a standard-deviation calculator here:
http://easycalculation.com/statistics/standard-deviation.php
0,1,0,2,0,3,0,4,0,5 - total 10, distinct 6, %distinct 60, mean 1.5, std dev 1.9, variance 3.6 - similar to Iordan's example
0,5,0,5,0,5,0,5,0,5 - total 10, distinct 2, %distinct 20, mean 2.5, std dev 2.64, variance 6.9
5,5,5,5,5,5,5,5,5,5 - total 10, distinct 1, %distinct 10, mean 5, std dev 0, variance 0
0,1,2,3,4,5,6,7,8,9 - total 10, distinct 10, %distinct 100, mean 4.5, std dev 3.03, variance 9.2
The first and last examples have the highest cardinality but only the last has unique values (i.e. 100% distinct).
Note that the first example is lower for all other attributes but that doesn't mean it would hash more evenly.
Also note that the last example, the unique values, has the highest variance.
So there is no single attribute that controls the distribution. As Iordan showed, the first example has a high %distinct, but all of those '0' values will hash to the same partition, so even with a perfect hash the data would use only 6 partitions.
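You can also approximate the original experiment without creating a table at all. Below is a minimal sketch using ORA_HASH, a documented SQL function; whether it matches the internal partitioning hash byte-for-byte is version-dependent, so treat the numbers as indicative only. It buckets the same 100,000 sequential keys into 128 buckets:
-- Bucket 100,000 unique sequential keys into 128 hash buckets (0..127)
-- and report the smallest and largest bucket size.
select min(cnt) as min_rows, max(cnt) as max_rows
from (select count(*) as cnt
      from (select 1000000000000000 + level - 1 as key_val
            from dual connect by level <= 100000)
      group by ora_hash(key_val, 127));
Even a perfectly uniform hash of unique keys shows this kind of spread: with 100,000 rows over 128 partitions the expected count per partition is about 781, with a standard deviation of roughly sqrt(781), about 28, so a min/max range of 700 to 853 is around plus or minus 3 standard deviations and is entirely normal.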

Similar Messages

  • Hash Partitioning

    Hi All,
    Does hash partitioning always use the same hashing function, and will it always produce the same result if a new table is created with the same number of hash partitions hashed on the same field?
    For example, I have to join a multi-million record data set to table1 this morning. table1 is hash partitioned on row_id into 32 partitions.
    If I create a temp table to hold the data I want to join and hash partition it likewise into 32 partitions on row_id, will any given record from partition number N in my new table find its match in partition number N of table1?
    If so, that would allow us to join one partition at a time, which performs dramatically better in this resource-contested environment.
    I hope you can help.

    Using 10gR2
    Partition pruning does occur when a hash-partitioned table is joined to a global temporary table, provided the join column is the partitioned table's hash key (here backed by its local primary key index):
    SQL> create table t (
      2    a number)
      3    partition by hash(a) (
      4      partition p1 ,
      5      partition p2 ,
      6      partition p3 ,
      7      partition p4
      8    )
      9  /
    Table created.
    SQL>
    SQL> alter table t add (constraint t_pk primary key (a)
      2  using index local (partition p1_idx
      3                   , partition p2_idx
      4                   , partition P3_idx
      5                   , partition p4_idx)
      6  )
      7  /
    Table altered.
    SQL> insert into t (a) values (1);
    1 row created.
    SQL> insert into t (a) values (2);
    1 row created.
    SQL> insert into t (a) values (3);
    1 row created.
    SQL>  insert into t (a) values (4);
    1 row created.
    SQL> commit;
    Commit complete.
    SQL>
    SQL> create global temporary table tm (a number)
      2  /
    Table created.
    SQL> insert into tm (a) values (2);
    1 row created.
    SQL> set autotrace traceonly explain
    SQL> select tm.a from tm, t
      2  where tm.a = t.a
      3  /
    Execution Plan
       0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=26)
       1    0   NESTED LOOPS (Cost=2 Card=1 Bytes=26)
       2    1     TABLE ACCESS (FULL) OF 'TM' (TABLE (TEMP)) (Cost=2 Card=
              1 Bytes=13)
       3    1     PARTITION HASH (ITERATOR) (Cost=0 Card=1 Bytes=13)
       4    3       INDEX (UNIQUE SCAN) OF 'T_PK' (INDEX (UNIQUE)) (Cost=0
                Card=1 Bytes=13)
    As you can see from the above, a full scan was performed on the global temp table TM, but partition pruning occurred on T. So, in theory, whatever data you load the global temp table with will be matched to the corresponding partition.
    P;

  • Product Revenue Bookings and Backlog Dashboard does not display any data

    Product Revenue Bookings and Backlog Dashboard does not display any data even though the load completed successfully.
    They are able to see just the parameters.
    Not sure if the upgrade of the database from 9.2.0.6 to 10.2.0.3 is a factor.
    What can I check?
    Is there some table to verify that the data exists for display in the Product Revenue Bookings and Backlog Dashboard?
    Screenshot is at:
    https://gtcr.oracle.com/gtcr-dir/gtcr_5637/6415786.993/Product_Revenue_Bookings_Backlog_Dashboard.doc
    Support suggested to create a new request set and run the initial load with load all summaries option; but there was no change in the Product Revenue Bookings and Backlog Dashboard.
    Any ideas?

    Hi,
    We faced a similar problem after the upgrade to 10g.
    What we did was run the initial load of the time dimension, the Item setup request set, and the request sets of all the dashboards in clear-and-initial-load mode.
    We were able to get the data once the clear and load completed successfully.
    Regards
    Ramesh Kumar S

  • How data is distributed in HASH partitions

    Guys,
    I want to partitions my one big table into 5 different partitions based on HASH value of the LOCATION field of the table.
    My question is: will the data be distributed equally across the partitions, will it all end up in one partition, or do I need 5 different HASH values in the location key to end up with five populated partitions?

    Hash partitioning enables easy partitioning of data that does not lend itself to range or list partitioning. It does this with a simple syntax and is easy to implement. It is a better choice than range partitioning when:
    1) You do not know beforehand how much data maps into a given range
    2) The sizes of range partitions would differ quite substantially or would be difficult to balance manually
    3) Range partitioning would cause the data to be undesirably clustered
    4) Performance features such as parallel DML, partition pruning, and partition-wise joins are important
    The concepts of splitting, dropping or merging partitions do not apply to hash partitions. Instead, hash partitions can be added and coalesced.
    What I think is that, in your case, list partitioning may be the better choice; a sketch follows below the link.
    http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14220/partconc.htm#i462869
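    A minimal sketch of that list-partitioning alternative (the table, columns, and location values are hypothetical):
    create table big_table_by_loc (
      location varchar2(10),   -- partition key
      payload  varchar2(100)
    )
    partition by list (location) (
      partition p_loc1  values ('LOC1'),
      partition p_loc2  values ('LOC2'),
      partition p_loc3  values ('LOC3'),
      partition p_loc4  values ('LOC4'),
      partition p_other values (default)  -- catch-all for unexpected values
    );
    With list partitioning you decide explicitly which LOCATION values share a partition, so the distribution is whatever you make it rather than whatever the hash happens to produce.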

  • How to accelerate by partitioning drives & how to distribute data among 'em

    Dear forum,
    I have read guides to storage acceleration and guides to Photoshop acceleration, but they always warn that the best solution depends on the work i do, the hardware i have, and the hardware i think i can afford to buy. I'm hoping that if i tell you what photoshop work i do, what hardware i have, and what hardware i'm intending to buy, you can tell me how to accelerate by partitioning my drives and how to distribute data among them. My biggest questions are about how big the volumes should be, and what should go on each volume. It sounds vague here, but I get more specific below:
    THE PHOTOSHOP WORK I DO:
    *wet-mount raw scans of 6x7 cm film using silverfast software on microtek artixscan 120tf 4000dpi scanner: resulting 16-bit TIFF file is typically 550 MB in size.
    *working in Photoshop CS2 on same file, adding multiple layers makes file 1 GB to 1.4 GB in size
    *my system's limitations show up most painfully when I OPEN a file (this can take five minutes) SAVE a file (this can take more than ten minutes!), when i FLATTEN the image's layers for printing (this can take 5 minutes), and when i CONVERT the file from 16-bit to 8-bit (this can take 5 minutes). most other operations in Ps CS2 are fast enough (not snappy, but fast enough) for me to stay with my current processor for the time being.
    THE HARDWARE I HAVE:
    *Power Mac G5 dual 1.8GHz, made in 2004, with only 4 slots for RAM (not 8 slots).
    (I'm told this has quite limited bus speed, as compared with other dual-processor G5s, and that this hardware will not benefit much at all from adding a RAID array.)
    *one internal seagate 80GB 7200rpm SATA drive. this is half-full (it has 39 GB on it): it holds my OS and my Users folder, but NOT my photoshop image files.
    *one internal Western DIgital 400 GB 7200rpm SATA drive. this holds my photoshop image files, but not my user folder.(This WD drive turns out to cause the G5 to hang up occasionally, requiring a re-boot; to avoid this, i recently learned, i can connect it with a host card adapter [see below].)
    *two 500 GB external firewire drives
    *two 300GB external USB drives
    *I have 2.25 GB of RAM, but I'm about to buy 2 more GB to max out at 4GB.
    THE HARDWARE I'M INTENDING TO BUY:
    *2GB of RAM, of course.
    *two Hitachi T7K500 500 GB SATAII HD 16MB Cache 7200rpm drives to occupy both internal drive slots in the G5
    *a 2-drive external enclosure to hold my old seagate 80GB drive and my old WD400GB drive.
    *a seritek host card adaptor for connecting the external enclosure to the G5.
    THE PLAN:
    What follows is a combination of suggestions I have received about what I could do and my speculation about how I could do it. Please see my Questions, embedded in the lines below: I'd be very grateful for any amendments or directions you can offer on this topic.
    Drive A: first newly internal Hitachi 500GB drive:
    partition into 2 volumes:
    first (faster) volume, "volume A1," of 100GB to hold OS and Users folder but NOT photoshop image files.
    (Question: how much space should I leave free on volume A1 for optimum performance? is 50% free of 100GB optimal? is 60% free of 100GB better? Is 50% free of 150GB better still? or does that cut into the other volume's space too much (indirectly cutting into the space of "volume B1" on Drive B, which is to be the WorkDisk/ScratchDisk)?
    second (slower) volume, "volume A2" of remainder GB (almost 400GB) as backup for 400GB "volume B1" of the OTHER internal Hitachi Drive, a.k.a. Drive B.
    Drive B: second newly internal Hitachi 500GB drive:
    partition into 2 volumes:
    first (faster) volume, "volume B1" of almost 400GB as designated WorkDisk/ScratchDisk for large photoshop image files;
    second (slower) partition "volume B2" (exactly 100GB) as backup for 100GB volume 1 (OS volume) of the OTHER internal Hitachi Drive, a.k.a. Drive A.
    (Question: how much space should I leave free on this WorkDisk/ScratchDisk for optimum performance? is 50% free of almost 400GB optimal? is 60% free of almost 400GB better? Is 50% free of 300GB just as good, with the additional advantage of indirectly allowing "volume A1" on Drive A to be 150+GB?
    Drive C: old Seagate 80GB drive, in external enclosure: disk designated for running the Photoshop Application? How would I set this up? any pitfalls to watch out for? should i partition this drive, or leave the whole thing for Photoshop? or is it better to run photoshop off Drive D?
    Drive D: old WD 400 GB Drive: second scratch disk? Storage disk? Both storage and scratch disk? how large should an empty volume on this disk be in order to be useful as a scratch disk? volume 1 or volume 2? if i run the Photoshop Application off of this drive, how large should the volume for that be? should it be volume 1, the faster, outside volume, leaving volume 2 for scratch disk space? or vice versa?
    External Firewire and USB drives: i guess i'll just use them for storage/archiving and extra backup? or am i much safer buying more SATAs and Enclosures? or are the external firewire and USB drives plenty safe (so long as i double-back up), since i'll only power them up for the data transfer, and then power them back down?
    Given that the large Photoshop files are not in my User folder, does it matter whether i keep the User folder (with its MS Word docs and a bunch of PDFs and so on) on my OS volume, "volume A1"? would it speed things up when I'm using photoshop if i moved the Users folder to another drive? what if i'd like to play iTunes while also working on photoshop? my iTunes music folder (with all the song data) is already on an external firewire drive. but the iTunes Library and iTunes application are, of course, in my User folder, which is on the OS drive. would moving the Users folder to another drive make much difference when i use photoshop and iTunes simultaneously?
    But I wonder whether it makes sense to be using volume A2 on Drive A as a backup drive: wouldn't it make more sense to back up my working files to two external drives that can be traded out, one on-site and one off-site, back and forth (not so convenient when one of the backup drives is internal!)? and after all, why would i devote a 400GB volume to the task of backing up another 400GB volume that will never be more than half full? I need to leave a WorkDisk/ScratchDisk half empty for efficient use, but i can back up that 200GB of working files on a 200GB volume, right? so for a backup drive, I might as well use a slow, inexpensive external USB drive that will only be tuned on for backup and will then stay powered off, a drive that's easily transportable on and off site, right? or am i misunderstanding something?
    by the way, what backup software do you recommend for backing up back and forth between Drive A and Drive B? I've been using Carbon Cpy Cloner. do you recommend that? or something that does more archiving of progressive states of data?
    Thank you for any help you can offer!
    Sincerely,
    Mark Woods
    Dual 1.8 GHz PowerPC G5 (2.2), 512 KB L2 Cache per CPU, w/ 4 RAM slots   Mac OS X (10.3.9)   2.25 GB DDR SDRAM (2x128MB plus 2x1GB)


  • Best Way to Load Data in Hash Partition

    Hi,
    I have partitioning by hash on a large table of 5 TB. We have to load more than 500 GB daily into that table from ETL.
    What is the best way to load data into that big table with hash partitions?
    Regards
    Sahil Soni

    Do you have any specific requirements to match records to lookup tables, or is it just a straight load, that is, an insert?
    Do you have any specific performance requirements?
    The easiest and fastest way to load data into Oracle is via an external file and parallel query/parallel insert. Remember that parallel DML is not enabled by default; you have to enable it via an ALTER SESSION command. You can then leverage multiple CPU cores and direct-path operations to perform the load.
    Assuming your database is on a Linux/Unix server, you could NFS-mount the file if it is on a remote system, but then you will most likely be limited by network transfer speed.
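    A rough sketch of that approach (the external table, target table, and degree of parallelism are illustrative placeholders, not a tested recipe):
    -- Assumes an external table etl_feed_ext has already been defined
    -- over the daily flat file.
    alter session enable parallel dml;

    insert /*+ append parallel(t, 8) */ into big_hash_table t
    select /*+ parallel(s, 8) */ *
    from   etl_feed_ext s;

    commit;
    The APPEND hint requests a direct-path insert, which writes above the high-water mark and greatly reduces undo generation for the inserted rows.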

  • A question on Hash Partition

    Hi,
    I'm facing a problem. My table has 16 partitions, and all of them are hash partitions.
    But I found that one partition is being populated far more heavily than the others: it holds nearly 3-4 times as many rows as the other partitions.
    My database version is 9i.
    Can anyone suggest me in this.
    Thanks in advance
    say my table structure is like this:
    CREATE TABLE TAB1 (COL1 NUMBER, COL2 VARCHAR2(10), COL3 VARCHAR2(10))
    PARTITION BY HASH (COL3) (
      PARTITION P1 TABLESPACE TS1,
      -- ... P2 through P15 ...
      PARTITION P16 TABLESPACE TS1
    );
    And I have only one index, i.e.:
    create index indx on tab1(col1, col2, col3);
    Edited by: bp on Feb 17, 2009 4:40 AM

    bp wrote:
    My table has near 1000 million data as it is a history table.
    Partition_key (col3) has distinct 926 values.
    One thing is sure: as the cardinality of col3 is very low in comparison with the amount of data in the table, the data is not evenly distributed.
    Now another problem: one value (say col3 = 1) of the 926 distinct values goes to partition p16, and surprisingly no other value goes to that partition. We have no other objects on this table to control the flow of data between partitions.
    I really could not find any reason for such behaviour.
    I'm not sure I understand what you are trying to describe. You mean to say that in partition p16 there is only one value of COL3 found, and this partition holds more rows than the other partitions, whereas the remaining partitions cover more COL3 values but hold fewer rows?
    A single COL3 value always maps to the same hash value, so if you don't change the number of hash partitions and cause a "rebalancing" the same value should always map to the same partition (it still does after rebalancing but it might be a different partition now). You might be unlucky that there is currently no other value than "1" that maps to the same hash value. You could think about adding/removing hash partitions to change the distribution of the rows, but this could be a quite expensive operation given the amount of data in your table.
    Are these 926 distinct values evenly distributed or is the data skewed in this column? Your description suggests that the data is skewed if a single value in a partition holds more rows than the other partitions that cover multiple values.
    You could do a simple
    SELECT COUNT(*), COL3
    FROM TAB
    GROUP BY COL3
    ORDER BY COUNT(*) DESC
    to find this out, or check the column statistics to see if there is a histogram on that column describing the column skew. If that query takes too long, use a SAMPLE clause (... FROM TAB SAMPLE (1) ... indicates a 1 percent sample); you then need to scale the counts by the sampling factor.
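    For example (a sketch; the scale factor must match the sample percentage):
    -- 1% sample, so multiply the counts by 100 to estimate the true figures
    select col3, count(*) * 100 as estimated_rows
    from   tab sample (1)
    group  by col3
    order  by estimated_rows desc;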
    Regards,
    Randolf
    Oracle related stuff blog:
    http://oracle-randolf.blogspot.com/
    SQLTools++ for Oracle (Open source Oracle GUI for Windows):
    http://www.sqltools-plusplus.org:7676/
    http://sourceforge.net/projects/sqlt-pp/

  • Uneven distribution in Hash Partitioning

    Version: 11.1.0.7.0 - 64bit Production
    OS: RHEL 5.3
    I have range partitioning on the ACCOUNTING_DATE column with 24 monthly partitions.
    To get rid of buffer busy waits on the index, I created a global partitioned index using the DDL below.
    DDL:
    CREATE INDEX IDX_GL_BATCH_ID ON SL_JOURNAL_ENTRY_LINES(GL_BATCH_ID)
    GLOBAL PARTITION BY HASH (GL_BATCH_ID) PARTITIONS 16 TABLESPACE OTC_IDX PARALLEL 8 INITRANS 8 MAXTRANS 8 PCTFREE 0 ONLINE;After index creation, i realized that only one index hash partition got all rows.
    select partition_name,num_rows from dba_ind_partitions where index_name='IDX_GL_BATCH_ID';
    PARTITION_NAME                   NUM_ROWS
    SYS_P77                                 0
    SYS_P79                                 0
    SYS_P80                                 0
    SYS_P81                                 0
    SYS_P83                                 0
    SYS_P84                                 0
    SYS_P85                                 0
    SYS_P87                                 0
    SYS_P88                                 0
    SYS_P89                                 0
    SYS_P91                                 0
    SYS_P92                                 0
    SYS_P78                                 0
    SYS_P82                                 0
    SYS_P86                                 0
    SYS_P90                         256905355
    As far as I understand, HASH partitioning should distribute rows evenly. Looking at the above distribution, I think I did not get the benefit of having multiple insert points with HASH partitioning either.
    Here is index column statistics :
    select TABLE_NAME,COLUMN_NAME,NUM_DISTINCT,NUM_NULLS,LAST_ANALYZED,SAMPLE_SIZE,HISTOGRAM,AVG_COL_LEN from dba_tab_col_statistics where table_name='SL_JOURNAL_ENTRY_LINES'  and COLUMN_NAME='GL_BATCH_ID';
    TABLE_NAME                     COLUMN_NAME          NUM_DISTINCT  NUM_NULLS LAST_ANALYZED        SAMPLE_SIZE HISTOGRAM       AVG_COL_LEN
    SL_JOURNAL_ENTRY_LINES         GL_BATCH_ID                     1          0 2010/12/28 22:00:51    259218636 NONE                      4

    It looks like the inserted data always has the same value for the partitioning key (NUM_DISTINCT is 1 in the statistics above), so it is expected that the same partition is always used, because
    >
    For optimal data distribution, the following requirements should be satisfied:
    Choose a column or combination of columns that is unique or almost unique.
    Create multiple partitions and subpartitions for each partition that is a power of two. For example, 2, 4, 8, 16, 32, 64, 128, and so on.
    >
    See http://download.oracle.com/docs/cd/E11882_01/server.112/e16541/part_avail.htm#VLDBG1270.
    Edited by: P. Forstmann on Dec 29, 2010 09:06

  • Cost to change hash partition key column in a history table

    Hi All,
    I have the following scenario.
    We have a history table in production which has 16 hash partitions based on key_column.
    But the data we have in the history table has 878 distinct values of key_column, about 1000 million rows, and all partitions are in the same tablespace.
    Now we have a Pro*C module which purges data from this history table in the following way:
    > DELETE FROM hsitory_tab
    > WHERE p_date < (TO_DATE(sysdate+1, 'YYYYMMDD') - 210)
    > AND t_date < (TO_DATE(sysdate+1, 'YYYYMMDD') - 210)
    > AND ROWNUM <= 210;
    Now p_date and t_date are two of the columns in the history table; data is deleted using these two date-column conditions, but the key_column for partitioning is different.
    So as per the above statement this history table contains 6 months of data.
    The DBA is asking to change this query and partition date-wise. Now, will it be proper to change the partition key_column (the existing hash partition key_column has 810 distinct values), and what things do we need to consider to calculate the cost behind this hash partition key_column change (if it is appropriate to change the partition key_column)? I hope I explained my problem clearly, and I am waiting for your suggestions.
    Thanks in advance.

    Hi Sir,
    Many thanks for the reply.
    On the first point: we are planning to move the database to 10g after a lot of hassle with the client.
    On the second point: if we partition by date or week we will have 30 or 7 partitions. As you suggested, given that we have 16 partitions in the table, the best approach would be to partition by week; then we will have 7 partitions and each query will hit 7 partitions.
    On the third point: our main aim is to reduce the run time of a job (a Pro*C program) which contains the following delete query against the history table. According to the query, it deletes data every day, keeping 7 months of history, and while deleting it queries this huge table by date. So in this case, which will be more suitable: hash partitioning, range partitioning, or hash/range composite partitioning?
    DELETE FROM hsitory_tab
    WHERE p_date < (TO_DATE(sysdate+1, 'YYYYMMDD') - 210)
    AND t_date < (TO_DATE(sysdate+1, 'YYYYMMDD') - 210)
    AND ROWNUM <= 210;
    I have read that hash partitioning is used so that data will be evenly distributed across all partitions (though that depends on the nature of the data). In my case I would like some suggestions from you on the best approach.
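    For context, if the table were range partitioned on the purge date, the daily delete could become a near-instant metadata operation instead of a row-by-row DELETE; a minimal sketch (table, columns, and week boundaries are illustrative):
    create table history_tab (
      p_date  date,
      t_date  date,
      payload varchar2(100)
    )
    partition by range (p_date) (
      partition p_2009_w01 values less than (to_date('2009-01-08', 'YYYY-MM-DD')),
      partition p_2009_w02 values less than (to_date('2009-01-15', 'YYYY-MM-DD'))
      -- ... one partition per week ...
    );

    -- Purging data older than the retention window then becomes:
    alter table history_tab drop partition p_2009_w01;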

  • What is the significance of Hash Partition ?

    Hi All,
    First time I am going to implement hash partitioning as well as subpartitioning, and before implementing I have some questions:
    1. What is the maximum number of partitions or subpartitions we can specify, and what is the default?
    2. How do we know which data comes under which hash partition? I mean, in the case of range partitioning, based on the specified ranges we know what data comes under what partition; likewise for list partitioning.
    Does anyone have any idea?
    Thanks in advance.
    Anwar

    1. Take a look here : http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14237/limits003.htm
    2. Take a look here : Re: Access to HASH PARTITION
    Nicolas.
    Correction of link
    Message was edited by:
    N. Gasparotto
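    On the second question, one way to see which hash partition each row actually landed in is to map rowids back to partition names (a sketch; my_hash_tab is a hypothetical hash-partitioned table in your own schema):
    select o.subobject_name as partition_name,
           count(*)         as row_count
    from   my_hash_tab t
           join user_objects o
             on o.data_object_id = dbms_rowid.rowid_object(t.rowid)
    group  by o.subobject_name
    order  by o.subobject_name;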

  • Hash partitions+pruning+star transformation

    Hi there,
    We are considering partitioning our fact table by hash on product_id (range and list are not suitable for us; most queries are by product).
    Couple of questions
    1) Does Oracle create the hash partitions automatically? Say we hash by quantity: what happens if we later decide we need more partitions? We are keen to keep maintenance to a minimum.
    2) Will partition pruning work?
    simplified structure of fact table
    sales_qty
    product_id
    customer_id
    day_id (date of sale)
    simple query
    select sum(sale_qty)
    from sales, products, customers
    where sales.product_id = product.product_id
    and sales.customer_id = customer.customer_id
    and product.creation_week between 200901 and 200952
    All the IDs in the fact table are surrogate IDs and are simply the dimension keys of the related dimensions, therefore users won't query on these columns directly.
    If we hash partition on product_id with the database parameter star_transformation_enabled set, will this give us query performance benefits (i.e. partition pruning)?
    Many Thanks

  • Design capture of hash partitioned tables

    Hi,
    Designer version 9.0.2.94.11
    I am trying to capture from a server model where the tables are hash partitioned, but this errors because Designer only knows about range partitions. Does anyone know how I can get Designer to capture these tables and their constraints?
    Thanks
    Pete

    Pete,
    I have tried all three "current" Designer clients 6i, 9i, and 10g, at the "current" revision of the repository (I can post details if interested). I have trawled the net for instances of this too, there are many.
    As stated by Sue, the Designer product model does not support this functionality (details can be found on ORACLE Metalink under [Bug No. 1484454] if you have access), if not, see excerpt below. It appears that at the moment ORACLE have no urgent plans to change this (the excerpt is dated as raised in 2001 and last updated in May 2004).
    Composite partitioning and List partitioning are equally affected.
    >>>>> ORACLE excerpt details STARTS >>>>>
    CDS-18014 Error: Table Partition 'P1' has a null String parameter
    'valueLessThan' in file ..\cddo\cddotp.cpp function
    cddotp_table_partition::cddotp_table_partition and line 122
    *** 03/02/01 01:16 am ***
    *** 06/19/01 03:49 am *** (CHG: Pri->2)
    *** 06/19/01 03:49 am ***
    Publishing bug, and upping priority - user is stuck hitting this issue.
    *** 09/27/01 04:23 pm *** (CHG: FixBy->9.0.2?)
    *** 10/03/01 08:30 am *** (CHG: FixBy->9.1)
    *** 10/03/01 08:30 am ***
    This should be considered seriously when looking at ERs we should be able to
    do this
    *** 05/01/02 04:37 pm ***
    *** 05/02/02 11:44 am ***
    I have reproduced this problem in 6.5.82.2.
    *** 05/02/02 11:45 am *** ESCALATION -> WAITING
    *** 05/20/02 07:38 am ***
    *** 05/20/02 07:38 am *** ESCALATED
    *** 05/28/02 11:24 pm *** (CHG: FixBy->9.0.3)
    *** 05/30/02 06:23 am ***
    Hash partitioning is not modelled in repository and to do so would require a
    major model change. This is not feasible at the moment but I am leaving this
    open as an enhancement request because it is a much requested facility.
    Although we can't implement this I think we should try to detect 'partition by
    hash', output a warning message that it is not supported and then ignore it.
    At least then capture can continue. If this is possible, it should be tested
    and the status re-set to '15'
    *** 05/30/02 06:23 am *** (CHG: FixBy->9.1)
    *** 06/06/02 02:16 am *** (CHG: Sta->15)
    *** 06/06/02 02:16 am RESPONSE ***
    It was not possible to ignore the HASH and continue processing without a
    considerable amount of work so we have not made any changes. The existing
    ERROR message highlights that the problem is with the partition. To enable
    the capture to continue the HASH clause must be removed from the file.
    *** 06/10/02 08:32 am *** ESCALATION -> CLOSED
    *** 06/10/02 09:34 am RESPONSE ***
    *** 06/12/02 06:17 pm RESPONSE ***
    *** 08/14/02 06:07 am *** (CHG: FixBy->10)
    *** 01/16/03 10:05 am *** (CHG: Asg->NEW OWNER)
    *** 02/13/03 06:02 am RESPONSE ***
    *** 05/04/04 05:58 am RESPONSE ***
    *** 05/04/04 07:15 am *** (CHG: Sta->97)
    *** 05/04/04 07:15 am RESPONSE ***
    <<<<< ORACLE excerpt details ENDS <<<<<
    I (like I'm sure many of us) have an urgent immediate need for this sort of functionality, and have therefore resolved to looking at some form of post process to produce the required output.
    I imagine that it will be necessary to flag the Designer meta-data content and then manipulate the generator output once it's done its "raw" generation as RANGE partition stuff (probably by using the VALUE_LESS_THAN field, as it's mandatory, and meaningless for HASH partitions!).
    An alternative would be to write an API level generator for this using the same flag, probably using PL/SQL.
    If you have (or anyone else has) any ideas on this, then I'd be happy to share them to see what we can cobble together in the absence of an ORACLE interface to their own product.
    Peter

  • Hash partition algorithm

    If I hash partition a table on CUSTOMER_ID into, say, p partitions, and I receive a daily batch feed of flat-file transaction records that contain CUSTOMER_ID, I need to split the batch of incoming source records into p parts, with each part corresponding to one of the p partitions. I can do this if I am able to execute the same hash algorithm with CUSTOMER_ID as a parameter, giving me a number between 1 and p. Then I know which partition Oracle has assigned this CUSTOMER_ID to, and I can distribute the batch records among parallel threads with affinity between threads and Oracle table partitions.
    Can anybody let me know if the hash algorithm is available to call? Is it available in any package?

    I hope I understood your requirement well: you want to divide the input file into 3 files corresponding to the partitions, right?
    Since your partitioned table is based on a hash algorithm, nothing is obvious.
    But since you are doing updates only, you could do a pre-check in the database to learn which partition a row is in: based on the partition key read from the input file, write one file per partition accordingly, and then run your 3 batches against the different partitions, each using its own file. It will require one full scan of the input file before processing, so I don't know how much gain you could hope for from such a thing, though.
    Nicolas.
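    One function often suggested for this is ORA_HASH, which takes an expression and a maximum bucket number. For power-of-two partition counts it is widely reported to reproduce the hash partition position, but the documentation does not guarantee that it matches the internal partitioning hash, so verify against your own table before building the batch splitter on it:
    -- Sketch: map a CUSTOMER_ID to a partition position in a table with
    -- 32 hash partitions (buckets 0..31, so position = bucket + 1).
    select ora_hash(123456789, 31) + 1 as partition_position
    from   dual;

    -- Cross-check against the real table: every key stored in a given
    -- partition should produce that partition's position.
    select distinct ora_hash(customer_id, 31) + 1 as computed_position
    from   my_hash_tab partition (p16);  -- hypothetical table/partition names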

  • Hash partitioning v. list partitioning on surrogate key + partition pruning

    Hi,
    Have a large fact table with surrogate keys, therefore queries are of the form:
    select dimension.attribute..
    from fact, dima, dimb..
    where facta.dima_surrogate_key = dima.dimension_key
    and facta.dimb_surrogate_key = dimb.dimension_key
    and dima.attribute = <value>
    and dimb.attribute = <value>
    Would ideally like partition pruning to happen, but will this happen if we hash partition on facta's surrogate keys?
    Likewise, could we list partition on facta.dima_surrogate_key and further subpartition on a hash of facta.dimb_surrogate_key?
    Any advice much appreciated.

    user5716448 wrote:
    Hi,
    Version 11.2.0.1
    fact table structure
    PRODUCT_ID NUMBER
    RETAILER_ID NUMBER
    OUTLET_ID NUMBER
    CALENDAR_ID NUMBER
    BRANCH_ID NUMBER
    PUBLISHER_ID NUMBER
    DISTRIBUTOR_ID NUMBER
    TRANS_TYPE_ID NUMBER
    TRANS_QTY NUMBER (10)
    TRANS_VALUE NUMBER (10,4)
    No date on fact table (just a surrogate ID for calendar, which links to the calendar/date dimension).
    Although queries can be by date of transaction, most aren't.
    Potential to grow to 3 billion rows.
    Considering hash partitioning on the product_id, simply to break the data down, and product_id is the largest dimension.
    About hash partitioning: in this case it is probably all about the ability to run in parallel. I do not have any info on that, so I cannot comment further.
    >
    SQLs are varied, lots of different types: some query all dimensions, some only a few. Not the straightforward date examples in the manual.
    You can pick a dimension that is frequently used by the SQLs. I understand that there is no perfect one, but even if you pick just a "good" one you might get a good deal of partition elimination.
    >
    Users run 3rd part ad-hoc reporting layer which has to allow them to report against the star in any way they want.
    Star transformation hint enabled. I have heard that in deciding the number of hash partitions, partition size should generally be < 2 GB.
    e.g transactions for a given product for customers belonging to a given multiple in a given week
    select trans_qty, trans_value, m.prod_name, m.prod_num, r.cust_name, w.branch_name, rtt.trans_date, rtt.trans_type
    from retailer_transaction rt, media m, wholesaler w, calendar c, retailer r, trans_type rtt
    where rt.issue_id = m.dimension_key
    and m.prod_num = 600
    and rt.branch_id = w.dimension_key
    and rt.outlet_id = r.dimension_key
    and r.multiple_num = 700
    and rt.calendar_id = calendar.dimension_key
    and m.issue_year_week = 201110
    and rt.trans_type_id = rtt.dimension_key
    Lastly, you need to focus on whether and how to partition your indexes (I assume you have a bunch of bitmaps). This decision is at least as important as partitioning the table.
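    One detail worth noting on that last point: bitmap indexes on a partitioned table must be created LOCAL (one index segment per table partition). A sketch against the fact table above:
    create bitmap index rt_outlet_bix   on retailer_transaction (outlet_id)   local;
    create bitmap index rt_calendar_bix on retailer_transaction (calendar_id) local;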

  • IOT or Hash partition

    Hi all,
    I want to insert a large amount of data into a table that will be retrieved later using a key column (like emp no).
    From the performance point of view, which is more efficient: an IOT (Index Organized Table) or a hash-partitioned table?

    I highly appreciate your time Justin. Your explanation clarified many things to me.
    However, I have small notes on your comments:
    First:
    <<IOT's tend to be useful when you have thin, tall tables (many rows, few columns) where you always want to retrieve all the rows.>>
    Regarding this claim, I referred to the following sources:
    1. Sybex-Oracle9i Performance Tuning book
    "If you access the table using its primary key, an IOT will return the rows more quickly than a traditional table."
    2. http://www.tlingua.com/articles/iot.html
    For single row fetch,"IOTs could provide a substantial performance gain as well as reducing the demand for disk drives"
    For Index Range Scans,"IOTs significantly outperform the standard B-tree/table model during index range scans."
    3. Oracle9i Database Administrator’s Guide Release 2 (9.2)
    "Index-organized tables are particularly useful when you are using applications that must retrieve data based on a primary key."
    As you can see Justin, none of them mentioned the thin-tall-table fact. Did you obtain it from practical experience or from some source?
    Also they all showed that IOT is most useful when retreiving based on PK.
    Second:
    "In general, partitioning works best when you are doing set-based processing where you can use partition elimination to concentrate on a particular subset of the data."
    In Sybex-Oracle9i Performance Tuning book it is stated that "Hash partitions work best when applications retrieve the data from the partitioned table via the unique key. Range lookups on hash partitioned tables derive no benefit from the partitioning.".
    I can see there is some conflict, isn't there?
    Thanks in advance.
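    For reference, a minimal sketch of the IOT option (the table and columns are illustrative):
    -- Index-organized table: the rows live inside the primary-key B-tree,
    -- so a lookup by emp_no goes straight to the row data with no separate
    -- table access.
    create table emp_iot (
      emp_no   number primary key,
      emp_name varchar2(100),
      salary   number(10,2)
    )
    organization index;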
