Warehouse partitioning - performance of queries across multiple partitions?

Hi,
We are using Oracle 11.2.0.3 and have a large central fact table with several surrogate-key columns, each with a bitmap index and a foreign key to a dimension table, plus several measures:
(PRODUCT_ID,
CUSTOMER_ID,
DAY_ID,
TRANS_TYPE_ID,
REGION_ID,
QTY,
VALUE)
We have two distinct sets of queries users run for the most part: ones accessing all transactions for a product regardless of when those transactions happened (i.e. non-financial queries, about 70%), and queries determining what happened in a particular week (about 20% of queries).
The table will eventually hold approximately 4 billion rows.
We are considering adding an extra DATE column and range partitioning on it, to allow us to drop old partitions every year; however, this column wouldn't be joined to any other table.
We are then considering sub-partitioning by hash of PRODUCT_ID, which is the surrogate key for the product dimension.
Thoughts on performance?
Queries by their nature would hit several sub-partitions.
Thoughts on the query performance of queries which access several sub-partitions/partitions versus queries running against a single table?
Any other thoughts on a partitioning strategy in our situation would be much appreciated.
Thanks

>
Thoughts on the query performance of queries which access several sub-partitions/partitions versus queries running against a single table?
>
Accessing multiple partitions can improve performance in two cases: 1) when only a subset of the entire table is needed, and 2) when the access is done in parallel.
Even if 9 of 10 partitions are needed, that can still be better than scanning a single table containing all of the data. And when there is a logical partitioning key (transaction date) that matches typical query predicate conditions, you can get guaranteed benefits by limiting a query to only one partition (or a small number of them), where an index on a single unpartitioned table might not get used at all.
Conversely, if all table data is needed (perhaps there is no good partition key) and the parallel option is not available, then I wouldn't expect any performance difference between a single table and a partitioned one.
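For illustration, a minimal sketch of how range partitioning on a date column enables pruning (all object names here are hypothetical, not from the original post):

-- Hypothetical DDL: range partition the fact table by the proposed TRANS_DATE column.
CREATE TABLE sales_fact (
  product_id    NUMBER NOT NULL,
  customer_id   NUMBER NOT NULL,
  day_id        NUMBER NOT NULL,
  trans_type_id NUMBER NOT NULL,
  region_id     NUMBER NOT NULL,
  qty           NUMBER,
  value         NUMBER,
  trans_date    DATE   NOT NULL
)
PARTITION BY RANGE (trans_date)
( PARTITION p2012 VALUES LESS THAN (DATE '2013-01-01'),
  PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01')
);

-- A predicate on the partition key lets the optimizer prune to a single partition:
SELECT SUM(qty), SUM(value)
  FROM sales_fact
 WHERE trans_date >= DATE '2013-06-03'
   AND trans_date <  DATE '2013-06-10';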
You don't mention if you have licensed the parallel option.
>
Any other thoughts on a partitioning strategy in our situation would be much appreciated.
>
You provide some confusing information. On the one hand you say that 70% of your queries are
>
ones accessing all transactions for a product regardless of when those transactions happened
>
But then you add that you are
>
considering adding an extra DATE column and range partitioning on it, to allow us to drop old partitions every year
>
How can you drop old partitions every year if 70% of the queries need product data 'regardless of the time those transactions happened'?
What is the actual 'datetime' requirement? And what is your definition of 'a particular week'? Does a week cross month and year boundaries? Does the requirement include MONTHLY, QUARTERLY or ANNUAL reporting?
Those 'boundary' requirements (and the online/offline need) are critical inputs to the best partitioning strategy. A MONTHLY partitioning strategy means that for some weeks two partitions are needed. A WEEKLY partitioning strategy means that for some months two partitions are needed. Which queries are run more frequently, weekly or monthly?
Why did you mention sub-partitioning? What benefit do you expect, or what problem are you trying to address? And why hash? Hash partitioning only supports partition pruning for equality and IN-list predicates on the hash key; for range predicates Oracle cannot prune, and ALL hash subpartitions will be accessed.
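For the product queries, which presumably use an equality predicate on PRODUCT_ID, pruning might still apply. A minimal sketch of the composite range-hash layout being considered (all names are illustrative, not from the original post):

-- Hypothetical composite range-hash layout.
CREATE TABLE sales_fact_rh (
  product_id NUMBER NOT NULL,
  trans_date DATE   NOT NULL,
  qty        NUMBER,
  value      NUMBER
)
PARTITION BY RANGE (trans_date)
SUBPARTITION BY HASH (product_id) SUBPARTITIONS 16
( PARTITION p2013 VALUES LESS THAN (DATE '2014-01-01')
);

-- Equality on the hash key can prune to one subpartition per range partition:
SELECT SUM(qty) FROM sales_fact_rh WHERE product_id = 42;

-- A range predicate on the hash key cannot be pruned and will touch
-- every subpartition:
SELECT SUM(qty) FROM sales_fact_rh WHERE product_id BETWEEN 40 AND 50;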
The biggest performance benefit of partitioning is when the partition keys used have a high correspondence with the filter predicates used in the queries that you run.
Conversely, the biggest management benefit of partitioning is when you can use interval partitioning to automate the creation of new partitions (and subpartitions if used) based solely on the data.
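A minimal interval-partitioning sketch (11g syntax; names are illustrative):

-- Interval partitioning creates new monthly partitions automatically
-- as rows arrive; only the first partition is declared up front.
CREATE TABLE sales_fact_i (
  product_id NUMBER NOT NULL,
  trans_date DATE   NOT NULL,
  qty        NUMBER
)
PARTITION BY RANGE (trans_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
( PARTITION p0 VALUES LESS THAN (DATE '2013-01-01')
);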
The other big consideration for partitioning, for both performance and management, is the use of global versus local indexes. With global indexes (e.g. a global primary key) you can't just drop a partition in isolation; the global index needs to be maintained by deleting the corresponding index entries.
On the other hand if your partition key includes the primary key column(s) then you can use a local index for the primary key. Then partition maintenance (drop, exchange) is very efficient.
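A hedged sketch of that arrangement, reusing the hypothetical SALES_FACT table from above:

-- The partition key (TRANS_DATE) is included in the primary key, so a
-- LOCAL index can enforce it.
ALTER TABLE sales_fact ADD CONSTRAINT sales_fact_pk
  PRIMARY KEY (product_id, customer_id, day_id, trans_date)
  USING INDEX LOCAL;

-- Dropping an old partition then requires no global index maintenance:
ALTER TABLE sales_fact DROP PARTITION p2012;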

Similar Messages

  • Load balancing across multiple machines

    I am looking for assistance in configuring Tuxedo to perform load balancing across
    multiple machines. I have successfully performed load balancing for a service
    across different servers hosted on one machine but not to another server that's
    hosted on a different machine.
    Any assistance in this matter is greatly appreciated.

    Hello, Christina.
    Load balancing with multiple machines is a little bit different than
    on the same machine. One of the important resources in this kind
    of application is network bandwidth, so Tuxedo tries to keep the
    traffic among the machines as low as possible. So, it only
    balances the load (calls services on another machine) in case all the
    services are busy on the machine where they are called.
    I mean, if you have workstation clients attached to only one
    machine, then Tuxedo will call services on this machine until
    all servers are busy.
    If you want load balancing, try to put one WSL on each machine,
    and the corresponding configuration in your WSC (with the | to
    make Tuxedo randomly choose one or the other), or spread your
    native clients among all the machines.
    And so, be careful with the routing!
    Ramón Gordillo
    "Christina" <[email protected]> wrote:
    >
    I am looking for assistance in configuring Tuxedo to perform load balancing
    across
    multiple machines. I have successfully performed load balancing for a
    service
    across different servers hosted on one machine but not to another server
    that's
    hosted on a different machine.
    Any assistance in this matter is greatly appreciated.

  • Filestream Partitioning across multiple drives

    I have a SQL 2008 R2 ENT database with the single [PRIMARY] filegroup, and a single FilestreamGroup.  The filestream has millions of records, cannot be restored, and is about to exceed the drive space limit.
    The table with the single filestream column has a primary key column that is also the Cluster index key.  There is a Full Text index and several foreign key constraints to this table's primary key. All must be disabled prior to dropping and rebuilding
    the index for partitioning (tried and tested).
    The filestream must be spread across multiple drive letters, and have multiple partitions on each drive, to facilitate file-restores within SLA.  Due to its size, it may exceed the weekend maintenance window, and therefore must be done ONLINE to allow
    the business to save new documents while the rebuild is in operation.
    How should I organize this into filegroups/files? A data filegroup per drive? What is best practice for the filestream?

    I have never worked with it, but it seems very logical. If you create a partition scheme that says some data should be in another partition, the data has to be moved to that partition. And, yes, it has to remain in the old partition as well, in case you do a restore to a point in time. This is no different than if you just delete a row.
    To get rid of the rows in the old partition, you need to back up the transaction log, checkpoint, and back up the log again, if memory serves.
    As for the font issue, the editor in the web UI stinks. That's one reason I stick to the NNTP bridge.
    Erland Sommarskog, SQL Server MVP, [email protected]

  • OCFS2 Partition across multiple Disks

    Does anyone know if you can create an OCFS2 partition that spans multiple drives? I tried creating an LVM, but this will not work for sharing. Is a clustered volume group needed?

    OCFS2 is like any file system and functions in the same manner with respect to implementations. Meaning, if you can create a file system or a volume across multiple disks (LUNs) using any third-party file system, you can do so with OCFS2 as well.
    I have done implementations (pre-ASM) with OCFS on EMC LUNs: format the entire LUN as if it were a single disk and mount it.

  • Can ARD fetch reports automatically across multiple partitions, without being in a particular partition?

    I'm trying to run reports across multiple client computers that have data on 2-3 partitions per Mac.  Is there any way ARD can report data across partitions, and not just the current partition the Mac is booted from? What I have to do now is restart into every partition and pull the ARD data off, which is too time consuming.
    Also, where does ARD store its reporting data?

    Partitions are... old school. And it will interfere with Startup Disk, with OS X.
    You could have used Windows 7 entirely and no sign of OS X (which can be installed and booted from external drives).
    I like to use multiple hard drives: boot drive for system, another drive for data, and also backup, scratch etc.
    If you use Windows 7 system image backup you should be able to restore any system in reasonable time and manner.

  • Capture performance metrics across multiple servers

    Hello. I'm still very new to PowerShell, but does anyone know of a good PowerShell v3-4 script that can capture performance metrics across multiple servers, with an emphasis on HPC (high performance computing), and generate a helpful report, perhaps in HTML or Excel format?
    The closest thing I've found and used is this PowerShell approach:
    http://www.microsoftpro.nl/2013/11/21/powershell-performance-monitor-on-multiple-remote-computers/
    Maybe figure out a way to present that in a better format, such as HTML or Excel.
    Also, can someone suggest some performance metrics to look at from an HPC perspective? For example, if a CPU is running at 100% utilization, figure out which cores are running high, see how many threads are queued waiting for CPU time, etc.

    As far as formatting is concerned,
    ConvertTo-HTML is a basic HTML output format, but you can spice it up as much as you like:
    http://technet.microsoft.com/en-us/library/ff730936.aspx
    Out-GridView is very functional and pretty simple:
    http://powertoe.wordpress.com/2011/09/19/out-gridview-now-has-a-passthru-parameter/
    Here's an example with Excel:
    Excel
    Worksheets Example
    This might be a good reference for HPC; I don't have access to an HPC environment, so I can't offer much advice there.
    http://technet.microsoft.com/en-us/library/ff950195.aspx
    It might be better to keep unrelated questions separate, so a thread doesn't focus on one question and you lose time getting an answer to another.
    I hope this post has helped!

  • How do you share Aperture file across multiple users on same Mac?

    How do you share an Aperture library across multiple users on the same Mac? Seems this should be a preferences choice.

    When you share your library between users, you may run into permission and ownership problems if both users are editing the Aperture library and not only reading it. To avoid that, it helps to put the Aperture library onto a separate disk or a separate partition of your hard drive. For a separate partition or disk you can enable the "Ignore ownership on this volume" flag. Then all users can access the library as owners of this library.
    You might try to put the Aperture library into a shared folder on your Mac, but that has caused problems recently, i.e. when the library also contains video files.
    Regards
    Léonie

  • Outer joins across multiple databases

    I'm trying to join three tables in OBIEE: two from the data warehouse and one from the Siebel database. I created the physical and logical joins (no errors or warnings). The model looks like this: Account Address (table 1) -> Account (table 2) -> Siebel Customer Data (table 3)
    I created two queries in Answers:
    (1) Account.Number and Siebel.Address (the query works)
    (2) Account Address.Zip, Account.Number, and Siebel.Address (the query gets the following error):
    State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred. [nQSError: 42019] Join condition, D901.c3 <> 0, contains predicates that are currently not supported for outer joins across multiple databases. (HY000)
    Can someone help me understand why the join between the two databases works for the first query but when I add a second table it fails?
    thanks...

    Either you can link the Oracle DB into Access (by using File/Get External Data/Link Tables)
    or you have to simulate the join in Java by doing two separate queries and looping through the results.

  • Process Chains across multiple BW instances?

    We're considering implementing a "CIF" (corporate information factory) approach for BW, using multiple BW layers (staging, storage, analysis, across regions).  It seems to make sense from a performance management perspective.
    However, one concern is administration / monitoring.
    Can standard BW tools (e.g. process chains, admin cockpit) be customized for use across multiple BW instances (plus, ideally, OLTP reporting, which we also have)? Or would we end up having to log in to each system individually?  My gut / understanding tells me it is the latter, but I would appreciate any thoughts.
    Thanks!

    Hi Ingo,
    Thanks for the information. I have the same scenario to work out.
    Is there any possibility of automatic source/server name conversion from system A to system B while copying all the universes and queries?
    For example: in BI we have an option for datasource server name conversion while transporting from one system to another throughout the landscape.
    As you said in your reply, in Lifecycle Manager, do we need to edit the connection for each and every universe, or is there an option covering all universes at once?
    Any documentation on this would be appreciated...
    Please provide me the information.
    Thanks in advance.
    Regards,
    Ravi Kanth

  • Time Machine to back up across multiple disks?

    I've run out of space on my Time Machine disk because I have a lot of video stored on my system. (It's a 2TB drive and it's maxed out)
    Is there a way to set up to share the Time Machine back up across multiple drives? If not, any recommendations for how to address this issue?
    Thanks for any assistance!

    No, Time Machine backups cannot span multiple volumes.
    If you're doing a lot of video editing, and a lot of the backup space is taken up by intermediate versions, there are ways to minimize that.
    If you just have too much video, your best bet may be to get an additional external HD, and use a different app to back up the video files to it, and exclude them from being backed-up by Time Machine.
    Tell us a bit more about your setup -- how many drives/partitions, and how large, is Time Machine backing-up, and how much of that is video?    

  • APO gATP vs R/3 ATP - To check sales order ATP across multiple plants

    Hi There,
    I am trying to evaluate gATP functionality for SD sales orders.
    The primary requirement is to have sales order ATP checking take place across multiple plants.
    E.G.
    Sales order line is entered for qty 100
    60 is available in plant A, 40 is available in plant B
    System checks both plants and creates 2 lines - one for delivery from plant A and one for delivery from plant B
    (we are currently heading down the road of writing ABAP to do this 'multi-plant' check in R/3 but the more complex the requirements get the more interested I am in understanding more about APO/gATP)
    I would like to understand the benefit of implementing APO / gATP as opposed to using standard R/3 ATP and perhaps writing custom ABAP code to search for inventory across multiple plants.
    I would appreciate any insight regarding what is required to set up gATP to perform such checking, and any other feedback regarding this issue - especially if you have had to implement something similar at your company.
    I have looked here but did not find much clear help:
    http://help.sap.com/saphelp_scm50/helpdata/en/26/c2d63b18bc7e7fe10000000a114084/frameset.htm
    Thanks,
    Niall

    Hi Niall
    you are probably looking at RBATP (rule-based ATP). Look at transaction /sapapo/rba04 in APO, where you develop your own location and product substitution rules. Going down the ABAP road in R/3 may work short-term but not long-term, as the requirements may get more complex.
    Regards
    Srinivas

  • Searching for links across multiple pdf files

    We have thousands of pdf files that are being moved to a new website. Some of these pdf files have links within them (either as text or as a hyperlink). This number is unknown.
    The issue is how to programmatically search across multiple pdf files (numbering in the thousands) looking for links, using a regular expression or part of a path. This will have to be able to search behind the text for the link URL.
    We first need to identify the number of files with links and create a list of the files with links that need modifying. If the number is too great to modify manually, then we would need the ability to programmatically edit these links.
    The pdf files are stored in a database. Also, the pdf files are different versions and some are password protected.
    Is there an Adobe product that will perform this? If not, are there any 3rd party vendor products that will accomplish this?
    Thanks in advance for your help.

    I have no solution, but a thought: the database factor may seem to be
    a killer. But you could look for a solution designed to read PDF files
    from a web site (by spidering or from a list), which would presumably
    load them.
    Or you could do a one-off extraction of the files from the database into a
    directory and use that for your process. Probably a very good idea,
    since extracting all files from the database is likely to be costly
    and hammer the server (but can be scheduled at a sensible pace), while
    the search process will (if it is possible at all) doubtless need to
    be run countless times.
    Aandi Inston

  • Spreading user data across multiple HD's?

    I hope this is the right forum for this. I'm on a Mac Pro 1,1 and recently installed a few extra hard drives to optimize performance for video editing (according to recommendations over at another big software company's support docs -- this is not about FCP).
    My goal is to have this general set-up:
    SSD: Boot Drive (OS & Apps) -- already set up
    Disk 2: Editing Project Files, media and exports
    Disk 3: Cache, renders and previews
    Disk 4: Ideally this would be for docs, itunes, photos and anythings non video related.
    The thing is, this advice comes mainly from PC users, for whom the OS X user folder structure isn't an issue.
    Question: Can the same user folder's contents be spread across multiple HD's? I realize I'll need to physically place a different "house" icon folder in each HD, probably, but can those have different contents? How can I make them all boot on start-up so the disk allocation is pretty much not noticeable?
    Thanks!

    A simple version of what you are trying is to establish a Boot Drive, with only System, Library, Applications, and the hidden unix files. All user files are moved off to a different drive.
    Here are some simple recipes for moving the "Home" folder:
    Japamac's Blog: Make space for Performance -- Moving the Home Folder
    http://chris.pirillo.com/how-to-move-the-home-folder-in-os-x-and-why/
    You can embellish this basic setup any way you wish, especially putting Movie data on a different drive, and Movie cache data on yet another drive.

  • Single result set across multiple tables

    Hi - what's the best way to perform a single query that can pull
    a single result set across multiple tables, i.e., a master table
    containing subject details and a child table containing multiple
    records with detail?
    I know how to do this for two columns in the same table via
    indexing, but how about across tables?
    Cheers,
    John

    I am not sure if I understood your question, but you can use
    interMedia Text with USER_DATASTORE to create an index whose data
    source spans multiple tables.
    (see technet.oracle.com -> products -> oracle text)
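    A minimal sketch of the USER_DATASTORE approach (all names are hypothetical; depending on your Oracle version, CTXSYS may need execute access to the procedure):

    -- Hypothetical procedure that concatenates master and child text into
    -- one virtual document per master row.
    CREATE OR REPLACE PROCEDURE concat_doc (rid IN ROWID, doc IN OUT NOCOPY CLOB) IS
    BEGIN
      SELECT m.subject INTO doc FROM master_t m WHERE m.rowid = rid;
      FOR c IN (SELECT d.detail
                  FROM child_t d, master_t m
                 WHERE m.rowid = rid AND d.master_id = m.id) LOOP
        doc := doc || ' ' || c.detail;
      END LOOP;
    END;
    /

    -- Register the procedure as a user datastore and index the master table.
    BEGIN
      ctx_ddl.create_preference('my_ds', 'USER_DATASTORE');
      ctx_ddl.set_attribute('my_ds', 'procedure', 'concat_doc');
    END;
    /

    CREATE INDEX master_txt_idx ON master_t (subject)
      INDEXTYPE IS ctxsys.context PARAMETERS ('datastore my_ds');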
    Thomas

  • Using ATMI and tuxedo to institue distributed transactions across multiple DBs

    I am creating the framework for a given application that needs to ensure that data
    integrity is maintained spanning multiple databases not necessarily within an
    instance of weblogic. In other words, I need to basically have 2 phase commit
    "internet transactions" between a given coordinator and n participants without
    having any real knowledge of their internal systems.
    Originally I was thinking of using Weblogic but it appears that I may need to
    have all my particular data stores registered with my weblogic instance. This
    cannot be the case as I will not have access to that information for the other
    participating systems.
    I next thought I would write my own TP...ouch. Every time I got through another
    iteration I kept hitting the same issue of falling into an infinite loop trying
    to ensure that my coordinator and the set of participants were each able to perform
    the directed action.
    My next attempt has led me to the world of ATMI. Would ATMI be able to help me
    here. Granted I am using JAVA so I am assuming that I would have to use CORBA
    to make the calls but will ATMI enable me to truly manage and create distributed
    transactions across multiple databases. Please, any advice at all would be greatly
    appreciated.
    Thanks
    Chris

    Andy
    I will not have multiple instances of weblogic as I cannot enforce that
    the other participants involved in the transaction have weblogic as
    their application server. That being said, I may not have the choice
    but to use WTC.
    Does this make more sense?
    Andy Piper <[email protected]> wrote in message news:<[email protected]>...
    "Chris" <[email protected]> writes:
    I am creating the framework for a given application that needs to ensure that data
    integrity is maintained spanning multiple databases not necessarily within an
    instance of weblogic. In other words, I need to basically have 2 phase commit
    "internet transactions" between a given coordinator and n participants without
    having any real knowlegde of their internal system.
    Originally I was thinking of using Weblogic but it appears that I may need to
    have all my particular data stores registered with my weblogic instance. This
    cannot be the case as I will not have access to that information for the other
    participating sytems.I don't really understand this. From 6.0 onwards you can do 2PC
    between weblogic instances, so as long as the things you are calling
    are transaction (EJBs for instance) it should all work out fine.
    I next thought I would write my own TP...ouch. Everytime I get through another
    iteration I kept hitting the same issue of falling into an infinite loop trying
    to ensure that my coordinator and the set of participants were each able to perform
    the directed action.
    My next attempt has led me to the world of ATMI. Would ATMI be able to help me
    here. Granted I am using JAVA so I am assuming that I would have to use CORBA
    to make the calls but will ATMI enable me to truly manage and create distributed
    transactions across multiple databases. Please, any advice at all would be greatly
    appreciated.I don't see that ATMI would give you anything different. Transaction
    management Tux is fairly similar to WebLogic (it was written by the
    same people). If you are trying to do interposed transactions
    (i.e. multiple co-ordinators) then WTC would give you this but it is
    only a beta feature in WLS 6.1. Using Tux domain gateways would also
    give you interposed behaviour but would require you write your servers
    in C or C++ ....
    andy
