Tablespaces and block size in Data Warehouse

We are preparing to implement a data warehouse on Oracle 11g R2 and I am currently trying to work out a storage strategy - unfortunately I have very little experience with that. The question is: what is the general advice regarding tablespaces and block size? I did some research and it is hard to find a clear answer; some resources advise that block size is not important and can be left small (8 KB), others state that it is crucial and should be as big as possible (64 KB).
The other question is which data should be placed where? Many resources state that keeping indexes apart from their data is a myth and a bad practice, as it may lead to a decrease in performance; others say that although there is no performance benefit, index tablespaces do not need to be backed up and that is why they should be separate. The next idea is to have separate tablespaces for big tables, small tables, and tables accessed frequently and infrequently. How should I organize partitions in terms of tablespaces? Is it a good idea to have "old" (read-only) data partitions on separate tablespaces?
Any help is highly appreciated, and thank you in advance.

Wojtus-J wrote:
> We are preparing to implement a data warehouse on Oracle 11g R2 and I am currently trying to work out a storage strategy - unfortunately I have very little experience with that.
With little experience, the key thing is to avoid big mistakes - don't try to get too clever.
> The question is: what is the general advice regarding tablespaces and block size?
If you need to ask about block sizes, use the default (i.e. 8KB).
> I did some research and it is hard to find a clear answer.
But if you get contradictory advice from this forum, how would you decide which bits to follow?
A couple of sensible guidelines when researching on the internet: look for material that is date-stamped with recent dates (the last couple of years), or that references recent - or at least relevant - versions of Oracle. Give preference to material that explains WHY an idea might be relevant, and greater preference to material that DEMONSTRATES why an idea might be relevant. Check that any explanations and demonstrations are relevant to your planned setup.
> The other question is which data should be placed where? Many resources state that keeping indexes apart from their data is a myth and a bad practice, as it may lead to a decrease in performance; others say that although there is no performance benefit, index tablespaces do not need to be backed up and that is why they should be separate. The next idea is to have separate tablespaces for big tables, small tables, and tables accessed frequently and infrequently. How should I organize partitions in terms of tablespaces? Is it a good idea to have "old" (read-only) data partitions on separate tablespaces?
It is often convenient, and sometimes very important, to separate data into different tablespaces based on some aspect of functionality. The performance argument was mooted (badly) in an era when discs were small and (disk) partitions were hard, but all your other examples of why to split are potentially valid for administrative reasons: big/small, table/index, old/new, read-only/read-write, fact/dimension, etc.
For data warehouses a fairly common practice is to identify some sort of aging pattern for the data, and try to pick a boundary that allows you to partition the data so that a large fraction of it can eventually be made read-only. Using tablespaces to mark time boundaries can be a great convenience - note that the tablespace boundary need not match the partition boundary, e.g. daily partitions in a monthly tablespace. If you take this type of approach, you might have a "working" tablespace for recent data, and then copy the older data to a "time-specific" tablespace, packing it and making it read-only as you do so.
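As a rough illustration of that aging approach, the mechanics might look something like the sketch below (all tablespace, table, partition and index names are hypothetical):

-- Create a tablespace for one time period, e.g. the year 2010
CREATE TABLESPACE arch_2010
  DATAFILE '/u01/oradata/dwh/arch_2010_01.dbf' SIZE 20G
  EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;

-- Move an old partition out of the "working" tablespace, compressing it as it goes
ALTER TABLE sales_fact MOVE PARTITION p_2010_12 TABLESPACE arch_2010 COMPRESS;

-- Local index partitions become unusable after the move and need rebuilding
ALTER INDEX sales_fact_cust_bix REBUILD PARTITION p_2010_12 TABLESPACE arch_2010;

-- Once everything for the period has been packed into the tablespace, freeze it
ALTER TABLESPACE arch_2010 READ ONLY;

A read-only tablespace only needs to be backed up once, which is one of the administrative benefits mentioned in the question.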
Tablespaces are (broadly speaking) about strategy, not performance. (Temporary tablespaces / tablespace groups are probably the exception to this thought.)
Regards
Jonathan Lewis

Similar Messages

  • DASYLAB QUERIES on Sampling Rate and Block Size

    HELP!!!! I have been dwelling on DASYLab for a few weeks over certain problems I have faced, yet haven't come to any conclusion. I hope that someone will be able to help. Lots of thanks!
    1. I need more data points, so I increase the sampling rate (SR). When the sampling rate is increased, the block size (BS) increases correspondingly.
    For a low sampling rate (SR < 100 Hz) and a block size of 1, the recorded time in DASYLab and the real experimental time are the same. But problems start when SR > 100 Hz with BS = 1: the recorded time in DASYLab differs from the real time. To solve the time difference problem, I decided to use "AUTO" block size.
    Qn1: Is there any way to solve the time difference problem for a high SR?
    Qn2: For auto block size, is the result recorded in DASYLab at a given moment the actual value, or has it been overwritten by the value from the previous block when AUTO BS is chosen?
    2. I have tried getting the result both for BS = 1 and for auto BS. Regardless of the sampling rate, the values obtained when BS = 1 are always larger than those with auto block size. Qn1: Which is the actual result of the test?
    Qn2: Is there any best combination of block size and sampling rate that can be used?
    Hope someone is able to help me with the above problem.
    Thanks-a-million!!!!!

    Generally, the DASYLab sampling rate to block size ratio should be between 2:1 and 10:1.
    If your sample rate is 1000, the block size should be 500 to no smaller than 100.
    Very large block sizes that encompass more than 1 second worth of data often cause display delays that frustrate users.
    Very small block sizes that have less than 10 ms of data cause DASYLab to bog down.
    Sample rate of 100 samples / second and a block size of 1 is going to cause DASYLab to bog down.
    There are many factors that contribute to performance, or lack thereof - the speed and on-board buffers of the data acquisition device; the speed, memory, and video capabilities of the computer; and the complexity of the worksheet. As a result, we cannot be more specific, other than to provide you with the rule of thumb above, and suggest that you experiment with various settings, as you have done.
    Usually the only reason that you want a small block size is for closed loop control applications. My usual advice is that DASYLab control is around 1 to 10 samples/second. Much faster, and delays start to set in. If you need fast, tight control loops, there are better solutions that don't involve Microsoft Windows and DASYLab.
    Q1 - without knowing more about your hardware, I cannot answer the question, but, see above. Keep the block size ratio between 2:1 and 10:1.
    Q2 - without knowing more about your hardware and the driver, I'm not sure that I can fully answer the question. In general, the DASYLab driver instructs the DAQ device driver to program the DAQ device to a certain sampling rate and buffer size. The DASYLab driver then retrieves the data from the intermediate buffers and feeds it to the DASYLab A/D Input module. If the intermediate buffers are too small, or the sample rate exceeds the capability of the built-in buffers on the hardware, then data might be overwritten. You should have received warning or error messages from the driver.
    Q3 - See above.
    It may be that your hardware driver is not configured correctly. What DAQ device, driver, DASYLab version, and operating system are you using? How much memory do you have? How complex is your worksheet? Are you doing control?
    Have you contacted your DASYLab reseller for more help? They should know your hardware better than I do.
    - cj
    Measurement Computing (MCC) has free technical support. Visit www.mccdaq.com and click on the "Support" tab for all support options, including DASYLab.

  • Buffer data before charting it, and block size

    I hope you can help me with this situation, because I have been stuck on it for two days and I think I won't see the light any time soon, and time is a scarce resource.
    I want to use an NI DAQCard-AI-16XE-50 (20 kS/s according to the specifications). To acquire data in DASYLab I have been using the OPC DA system, but when I try to get a chart from the signal I get awful results.
    I guess the origin of the problem is that the PC is not powerful enough to generate a chart in real time, so is there a block to buffer the data and then graph it, without using the "Write Data" block, to avoid writing data to disk?
    Another possible cause of the problem is an incorrect block size setting, but in my view 10 kHz with a block size of 4096 is more than enough to acquire a 26 Hz signal (shown in the attached image). If I reduce the block size to 1, the signal shown in the graph is a constant at the first acquired value. Why could this be?
    Thanks in advance for your answers, 

    Is there someone who can help me?
    I have connected a CN606TC2 to DASYLab 11 using the RS232 cable, and I have followed the CN606TC2 instruction manual.
    In the DASYLab RS232 module there are fields for:
    1) Measurement data request,
    2) Measurement data format ...
    What should I write in these fields so that the DASYLab digital meter modules can read values from the CN606TC2?
    To start communication, the command module must send the alert code ASCII [L], hex 4C.
    Commands requesting data from the scanner (CN606TC2):
    ASCII [A] hex 41 = Zones / Alarms / Scan time
    ASCII [M] hex 4D = Model / Password / ID# / # of zones
    ASCII [S] hex 53 = Setpoints
    ASCII [T] hex 54 = Temperature
    I do not understand the program and the ASCII codes.
    I send [T] from the RS232 monitor and DASYLab returns 54.
    I am very grateful for your help

  • RAID, ASM, and Block Size

    (This was posted in the "Installation" thread, but I copied it here to see if I can get more responses. Thank you.)
    Hello,
    I am about to set up a new Oracle 10.2 database server. In the past, I used RAID 5 since 1) it was a fairly small database, 2) there were not a lot of writes, 3) high availability, 4) it wasted less space compared to other RAID techniques.
    However, even though our database is still small (around 100GB), we are noticing that when we update our data, the time it takes is starting to grow to a point whereby the update that used to take about an hour, now takes 10-12 hours or more. One thing we noticed that if we created another tablespace which had a block size of 16KB versus our normal tablespace which had a block size of 8KB, we almost cut the update time in half.
    So, we decided that we should really start from scratch on a new server and tune it optimally. Here are some questions I have:
    1) Our server is a DELL PowerEdge 2850 with 4x146GB Hard Drives (584GB total). What is the best way to set up the disks? Should I use RAID 1+0 for everything? Should I use ASM? If I use ASM, how is the RAID configured? Do I use RAID0 for ASM since ASM handles mirroring and striping? How should I setup the directory structure? How about partitioning?
    2) I am installing this on Linux and when I tried on my old system to use 32K block size, it said I could only use 16K due to my OS. Is there a way to use a 32K block size with Linux? Should I use a 32K block size?
    Thanks!
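    For reference, a tablespace with a non-default block size needs a matching buffer cache configured before it can be created. A minimal sketch (sizes, paths and names are hypothetical):
    -- A buffer cache for the non-default block size must exist first
    ALTER SYSTEM SET db_16k_cache_size = 256M SCOPE = BOTH;
    -- Then a tablespace can be created with that block size
    CREATE TABLESPACE ts_data_16k
      DATAFILE '/u01/oradata/orcl/ts_data_16k_01.dbf' SIZE 10G
      BLOCKSIZE 16K
      EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;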


  • Transaction execution time and block size

    Hi,
    I have an Oracle Database 11g R2 64-bit database on Oracle Linux 5.6. My system has ONE hard drive.
    Recently I experimented with an 8.5 GB database in a TPC-E test. I was watching transaction time for 2K, 4K and 8K Oracle block sizes. Each time I started a new test on a different block size, I created a new database from scratch to avoid messing something up (each time the SGA and PGA parameters were identical).
    In all experiments I gave my own tablespace (NEWTS) a different configuration because of Oracle block/datafile size limits:
    2K Oracle block database had 3 datafiles, each 7GB.
    4K Oracle block database had 2 datafiles, each 10GB.
    8K Oracle block database had 1 datafile of 20GB.
    The best transaction execution time was with the 8K block, the 4K block had a slightly longer transaction time, but the 2K Oracle block definitely had the worst transaction time.
    I identified a SQL query (when using 2K and 4K blocks) that was creating hot segments on the E_TRANSACTION table, which is the largest table in the database (2.9GB), and was executed slowly (the number of executions was low compared to the 8K numbers).
    Now here is my question. Is it possible that multiple datafiles are the reason for these poor transaction times? I have AWR reports from that period, but as someone who is still learning about DBA work, I would like to ask how I could identify this multi-datafile problem (if that is THE problem) by looking at the AWR statistics.
    THX to all.

    >
    It's always interesting to see the results of serious attempts to quantify the effects of variation in block sizes, but it's hard to do proper tests and eliminate side effects.
    > I have an Oracle Database 11g R2 64-bit database on Oracle Linux 5.6. My system has ONE hard drive.
    A single drive does make it a little too easy for apparently random variation in performance.
    > Recently I experimented with an 8.5 GB database in a TPC-E test. I was watching transaction time for 2K, 4K and 8K Oracle block sizes. Each time I started a new test on a different block size, I created a new database from scratch to avoid messing something up
    Did you do anything to ensure that the physical location of the data files was a very close match across databases - inner tracks vs. outer tracks could make a difference.
    > (each time the SGA and PGA parameters were identical).
    Can you give us the list of parameters you set? As you change the block size, identical parameters DON'T necessarily result in the same configuration. Typically a large change in response time turns out to be due to a change in execution plan, and this can often be associated with a different configuration. Did you also check that the system statistics were appropriately matched (which doesn't mean identical across all databases)?
    > In all experiments I gave my own tablespace (NEWTS) a different configuration because of Oracle block/datafile size limits:
    > 2K Oracle block database had 3 datafiles, each 7GB.
    > 4K Oracle block database had 2 datafiles, each 10GB.
    > 8K Oracle block database had 1 datafile of 20GB.
    If you use bigfile tablespaces I think you can get 8TB in a single file for a tablespace.
    > The best transaction execution time was with the 8K block, the 4K block had a slightly longer transaction time, but the 2K Oracle block definitely had the worst transaction time.
    We need some values here, not just "best/worst" - it doesn't even begin to get interesting unless you have at least a 5% variation - and then it has to be consistent and reproducible.
    > I identified a SQL query (when using 2K and 4K blocks) that was creating hot segments on the E_TRANSACTION table, which is the largest table in the database (2.9GB), and was executed slowly (the number of executions was low compared to the 8K numbers).
    Query, or DML? What do you mean by "hot"? Is E_TRANSACTION a partitioned table - if not then it consists of one segment, so did you mean to say "blocks" rather than segments? If blocks, which class of blocks?
    > Now here is my question. Is it possible that multiple datafiles are the reason for these poor transaction times? I have AWR reports from that period, but as someone who is still learning about DBA work, I would like to ask how I could identify this multi-datafile problem (if that is THE problem) by looking at the AWR statistics.
    On a single disc drive I could probably set something up that ensured you got different performance because of different numbers of files per tablespace. As SB has pointed out there are some aspects of extent allocation that could have an effect - roughly speaking, extents for a single object go round-robin on the files, so if you have small extent sizes for a large object then a tablescan is more likely to result in larger (slower) head movements if the tablespace is made from multiple files.
    If the results are reproducible, then enable extended tracking (dbms_monitor, with waits) and show us what the tkprof summaries for the slow transactions look like. That may give us some clues.
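    A minimal sketch of gathering that evidence (the session id, serial number and trace file name are hypothetical):
    -- Enable extended SQL trace (wait events included) for the test session
    BEGIN
      DBMS_MONITOR.session_trace_enable(
        session_id => 123,      -- hypothetical SID
        serial_num => 4567,     -- hypothetical serial#
        waits      => TRUE,
        binds      => FALSE);
    END;
    /
    -- ... run the slow transactions, then switch tracing off
    BEGIN
      DBMS_MONITOR.session_trace_disable(session_id => 123, serial_num => 4567);
    END;
    /
    -- Summarise the resulting trace file from the OS prompt (file name hypothetical):
    --   tkprof orcl_ora_12345.trc slow_txn.txt sys=no sort=exeela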
    Regards
    Jonathan Lewis

  • Specifying segments and block size manually

    Hi, just a quick question,
    But could anyone help me understand why someone might manually add segments to a tablespace (or is it a datafile they would be added to)? Does autoextend not take care of this?
    And secondly... why would you increase or decrease the block size of a segment? Is this because you may have small or large rows within a table and want a block size to accommodate this?
    Any help would be appreciated

    Hi,
    In Oracle, free space can be managed automatically or manually. You specify automatic segment space management when you create a locally managed tablespace.
    Free space can be managed automatically inside database segments. The in-segment free/used space is tracked using bitmaps, as opposed to free lists. Automatic segment-space management offers the following benefits:
    -Ease of use
    -Better space utilization, especially for the objects with highly varying size rows
    -Better run-time adjustment to variations in concurrent access
    -Better multi-instance behavior in terms of performance/space utilization
    For manually managed tablespaces, two space management parameters, PCTFREE and PCTUSED, enable you to control the use of free space for inserts and updates to the rows in all the data blocks of a particular segment. Specify these parameters when you create or alter a table or cluster (which has its own data segment). You can also specify the storage parameter PCTFREE when creating or altering an index (which has its own index segment).
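    For illustration, a sketch of both styles (all names are hypothetical):
    -- Locally managed tablespace with automatic segment space management (free space tracked in bitmaps)
    CREATE TABLESPACE app_data_assm
      DATAFILE '/u01/oradata/orcl/app_data_assm01.dbf' SIZE 1G AUTOEXTEND ON NEXT 100M
      EXTENT MANAGEMENT LOCAL
      SEGMENT SPACE MANAGEMENT AUTO;
    -- Manual segment space management: PCTFREE/PCTUSED control when blocks leave and rejoin the free lists
    CREATE TABLESPACE app_data_mssm
      DATAFILE '/u01/oradata/orcl/app_data_mssm01.dbf' SIZE 1G
      EXTENT MANAGEMENT LOCAL
      SEGMENT SPACE MANAGEMENT MANUAL;
    CREATE TABLE orders_hist (
      order_id NUMBER,
      payload  VARCHAR2(200)
    )
    PCTFREE 20 PCTUSED 40   -- keep 20% free for updates; reuse a block once it drops below 40% used
    TABLESPACE app_data_mssm;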
    see this link
    http://download.oracle.com/docs/cd/B10500_01/server.920/a96524/b_deprec.htm#634923 :)

  • Raid storage usage and block size

    We have two XServe RAID units Raid 5 and we are adding a new 16 bay ACNC raid with 16 1.5TB drives in Raid 6 + Hot Spare. I initialized the Raid 6 with 128K block size. The total data moving from the older raid volumes is around 5.7TB, but on the new Raid it is taking around 7.4TB of space. Is this due to the 128K block size? This is a prepress server so most of the files are quite large, but there may be lots of small files as well.

    Hi
    RAID 0 does indeed offer best performance, however if any one drive of the striped set fails you will lose all your data. If you have not considered a backup strategy now would be the time to do so. For redundancy RAID 1 Mirror might be a better option as this will offer a safety net in case of a single drive failure. A RAID is not a backup and you should always consider a workable backup strategy.
    Purchase another 2x1TB drives and you could consider a RAID 10? Two Stripes mirrored.
    Not all your files will be large ones as I'm guessing you'll be using this workstation for the usual mundane matters such as email etc? Selecting a larger block size with small file sizes usually decreases performance. You have to consider all applications and file sizes, in which case the best block size would be 32k.
    My 2p
    Tony

  • Compression and query performance in data warehouses

    Hi,
    Using Oracle 11.2.0.3, we have a large fact table with bitmap indexes to the associated dimensions.
    I understand bitmap indexes are compressed by default, so I assume they cannot be compressed further.
    Is this correct?
    We wish to try compressing the large fact table to see if this will reduce the I/O on reads and therefore give performance benefits.
    ETL speed is fine; we just want to increase report performance.
    Thoughts - has anyone seen significant gains in data warehouse report performance with compression?
    Also, PCTFREE on the table is currently 10%.
    As we only insert into the table, we are considering making this 1% to improve report performance.
    Thoughts?
    Thanks

    First of all:
    Table Compression and Bitmap Indexes
    To use table compression on partitioned tables with bitmap indexes, you must do the following before you introduce the compression attribute for the first time:
    Mark bitmap indexes unusable.
    Set the compression attribute.
    Rebuild the indexes.
    The first time you make a compressed partition part of an existing, fully uncompressed partitioned table, you must either drop all existing bitmap indexes or mark them UNUSABLE before adding a compressed partition. This must be done irrespective of whether any partition contains any data. It is also independent of the operation that causes one or more compressed partitions to become part of the table. This does not apply to a partitioned table having B-tree indexes only.
    This rebuilding of the bitmap index structures is necessary to accommodate the potentially higher number of rows stored for each data block with table compression enabled. Enabling table compression must be done only for the first time. All subsequent operations, whether they affect compressed or uncompressed partitions, or change the compression attribute, behave identically for uncompressed, partially compressed, or fully compressed partitioned tables.
    To avoid the recreation of any bitmap index structure, Oracle recommends creating every partitioned table with at least one compressed partition whenever you plan to partially or fully compress the partitioned table in the future. This compressed partition can stay empty or even can be dropped after the partition table creation.
    Having a partitioned table with compressed partitions can lead to slightly larger bitmap index structures for the uncompressed partitions. The bitmap index structures for the compressed partitions, however, are usually smaller than the appropriate bitmap index structure before table compression. This highly depends on the achieved compression rates.
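    A hedged sketch of those steps for a partitioned fact table (table, partition and index names are hypothetical):
    -- 1. Mark the bitmap indexes unusable
    ALTER INDEX sales_fact_cust_bix UNUSABLE;
    ALTER INDEX sales_fact_prod_bix UNUSABLE;
    -- 2. Set the compression attribute, and actually compress an existing partition by moving it
    ALTER TABLE sales_fact COMPRESS;
    ALTER TABLE sales_fact MOVE PARTITION p_2011_q1 COMPRESS;
    -- 3. Rebuild the index partitions (repeat for each affected partition)
    ALTER INDEX sales_fact_cust_bix REBUILD PARTITION p_2011_q1;
    ALTER INDEX sales_fact_prod_bix REBUILD PARTITION p_2011_q1;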

  • Raid 0 (Stripe) for OS X boot disk? Best Performance and block size

    Hi,
    so this is a new thread to an older question I had and would like some feedback on;
    I have a new Mac Pro with 4 matched 1TB Caviar Black drives. I WILL be doing full Time Machine backups, as well as an independent full-system backup regularly.
    That being said, I have 4 drives open and am looking for suggestions. I am leaning toward 2 striped sets (one for the OS and one for 'work space', the former with a 32k stripe block size, the latter with 64k - it will hold video, audio, scratch, and, yes, games).
    Does this sound alright? Is there an issue with striping the boot drive? Is a block size of 32k (or 64k) optimal?
    Thanks!
    Dan

    Hi D3 Shooter, regarding your question,
    D3 Shooter wrote:
    You brought to mind something I did not take into consideration: Time Machine. I really like the simplicity of TM, as it saved me once before. So, could you tell me, for photo files and some video, how much (% wise) does striping improve the accessing and filing of such files compared to no striping, using internal drives (7200/WD/1TB/Caviar)? I have not done striping before and want to weigh it up because of the backup storage issues now. Thanks.
    Just give it a try and see if it is worth it for you.
    Striping:
    • just enhances (reduces) the access/transfer time, because in practice the access is distributed in parallel across several DDMs (old school, but it works great!). I think for video and file work the advantage is that you can access the whole object sooner (rather than faster).
    • this distribution also reduces a lot of old-style queuing on the device over the path. This was resolved in the late 1980s, so no real rocket science here.
    The issues with striping are few, and apply broadly across all the RAID implementations (except JBOD, which of course is not RAID) when compared to a single spindle. The discussions are enormous and plentiful via Google, and experiences and opinions vary widely.
    For the I.T. people, the advantage is the access they get using a smart disk controller that caches goodies like indexes and such, so that they can sustain a zillion trivial transactions/sec (i.e. banking & internet stuff) - stuff that is of no interest to me.
    For creative people and the many applications that work with BLOBs (like video, film and remote sensing objects), getting use of the objects sooner (not faster) is of prime importance for workflow efficiency. If you have this need, then striping stuff across disks is for you!
    TimeMachine.app works fine, as it seems fairly agnostic to what is implemented under the disk file system. My issue with Time Machine is that I don't want it looking after my production stuff, only keeping an eye on my admin I.T. type stuff such as ~/ and data files.
    As posted on this thread:
    • availability is the major concern with any file system (cloud or RAID or other). RAID with parity schemes and double-parity schemes (RAID 1, 3, 5, 6) and implementations such as RAID 6 + LSF (log structured file) are all wonderful for business workflows that need it.
    • timely access in a workflow is another
    • cost benefit is another
    However, a great benefit for me of consolidating small storage components under one huge file system is that you don't have to COPY anything around. This is marvelous, especially when you think you have to move 2TB of stuff from one place to another. That takes a lot of time with el-cheapo disks that don't have fast interfaces such as SATA/SAS or FC, for example.
    As always, and as has been addressed by others on this thread (Hatter), if you lose a component storage device the whole file system is hosed or severely degraded, unless you spend a lot of money on full ranks of DDMs with hot spares and a very good RAID controller card. Again, it's money.
    Yeah, sure, you can carry some parity RAID implementation across 3 disks, but the storage capacity usage is dreadful. This is why the more complex RAID implementations are in groups of 10+ DDMs (yep, people can argue, but this is the mainstream).
    My external disk arrays are merely two LUNs (SAS domains) that hold two file systems, each around 4TB built from 1TB DDMs - all RAID 0 - no parity (no availability) - I just want speed. I look after my own "availability" with my archive solution. If the operation dies, I start again. I'm happy with that. RAID 5 has write-penalty performance hits (the well known "update in place"), and RAID 6+ is lousy for huge objects but good for I.T., though OK if you lose two disks in a stripe (rank).
    They all have their flaws... and mirroring a RAID 0 (RAID 1/0) seems to be popular with storage vendors because they can sell you more disk, and that's just business - workflow depends on it.
    However you can achieve this stuff if you change your workflow slightly.
    Other than these, the rest is tech specs and stuff under the cover.
    So do what is right for you and your business.
    I don't like spending money on nasty el-cheapo FW800 LaCie disk enclosures with their junky components and their ilk, having been burned by several corrupted devices and lost TBs of content - this is why I invested in a high-speed LTO4 Ultrium data tape archive solution.
    Sorry for the long post..
    w

  • RSA key and block size

    Let's say that I have an RSA key pair that has been generated in a keystore using the keytool utility.
    I am now accessing this key pair through some java code (using the Keystore class) and I want to encrypt/decrypt data using this public/private key.
    In order to encrypt/decrypt arbitray length data, I need to know the maximum block size that I can encrypt/decrypt.
    Based upon my experiment, this block size seems to be the size of the key divided by 8 and minus 11.
    But how can I determine all that programmatically when the only thing that I have is the keystore?
    I did not find a way to figure out the size of the key from the keystore (unless it can be computed from the RSA exponent or modulus, but this is where my knowledge of RSA keys stops), and I did not find out where this "magic" number 11 comes from.
    I can always encrypt 1 byte of data and look at the size of the result. This will give me the block size, and the key size by multiplying it by 8. But it means that I always need the public key around to compute this size (I cannot do it if I only have the private key).
    And this is not helping much on the number 11 side.
    Am I missing something obvious?
    Thanks.
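    For what it is worth, the key size can be read from the modulus (java.security.interfaces.RSAKey.getModulus().bitLength() works for both the public and the private key), and the 11 is the minimum overhead of PKCS#1 v1.5 padding, which frames each block as 00 || 02 || at least 8 non-zero random bytes || 00 || data. As a sketch of the arithmetic:
    \[ k = \left\lceil \frac{\text{modulus bits}}{8} \right\rceil \ \text{bytes (RSA block size)}, \qquad \text{max plaintext per block} = k - 11 \ \text{bytes} \]
    For example, a 1024-bit key gives 128 - 11 = 117 bytes, which matches the behaviour described above.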

    It is probably a bug. A naive implementation of RSA key generation that would exhibit this bug would work as follows (I'm ignoring the encrypt and decrypt exponents intentionally):
    input: an rsa modulus bit size k, k is even:
    output: the rsa modulus n.
    k is even, so let k=2*l
    step 1: generate an l-bit prime p, 2^(l-1) < p < 2^l
    step 2: generate another l-bit prime q, 2^(l-1) < q < 2^l
    step 3: output n = p*q
    Now the above might seem reasonable, but when you multiply the inequalities you get
    2^(2l-2) < n < 2^(2l)
    That lower bound means that n can be 1 bit smaller than you expect.. The correct smallest lower bound for generating the primes p and q is (2^l) / sqrt(2), rounded up to the nearest integer.
    I'll bet the IBM code implements something like the first algorithm.

  • Table and Index compression in data warehouse - thoughts?

    Hi,
    We have a data warehouse with large fact tables and materialized views of this data.
    Approx 3 million inserts per day; at weekends about 12 million.
    The fact tables are expected to have around 200 million rows, and a couple will have 1-3 billion.
    Tables are partitioned and have bitmap indexes.
    I just wondered what the thoughts were about compressing large fact tables and mviews, both from the point of view of ETL into them and of reporting from them afterwards.
    I take it we can compress/uncompress accordingly without any problem?
    Many Thanks

    After compression, most SELECT statements would not get slower. Actually, many can get faster due to reduced IO and buffer needs.
    The situation with DMLs is more complex. It depends on the exact compression options (basic or advanced) and the DML (INSERT,UPDATE, direct load,..),but generally DML are negatively affected by compression.
    In Data Warehouses (DWs), it is usually quite beneficial to compress partitions or tables that contain data that is not supposed to be modified (read only or read mostly). Please note that in many cases you do not have to compress while you are loading the data – you can do that later.
    You can also consider compressing some of your B-tree indexes (if you use them in your DW system).
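    For illustration, one common pattern (all names hypothetical) is to compress the older, read-mostly partitions after they are loaded, and optionally key-compress the supporting B-tree indexes:
    -- Compress an older partition that will no longer be modified
    ALTER TABLE sales_fact MOVE PARTITION p_2011 COMPRESS UPDATE INDEXES;
    -- Key-compress a B-tree index whose leading column repeats heavily
    -- (for a local partitioned index, rebuild partition by partition instead)
    ALTER INDEX sales_fact_dt_ix REBUILD COMPRESS 1;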
    Iordan Iotzov
    http://iiotzov.wordpress.com/

  • Why do we need SSIS and star schema of Data Warehouse?

    If SSAS in MOLAP mode stores the data, what is the role of SSIS, and why do we need a Data Warehouse and the ETL process of SSIS?
    I have a SQL Server OLTP database. I am using SSIS to transfer my SQL Server data from the OLTP database to a Data Warehouse database that contains fact and dimension tables.
    After that I want to create cubes using SSAS from the Data Warehouse data.
    I know that MOLAP stores data. Do I need a Data Warehouse with fact and dimension tables at all?
    Isn't it better to avoid creating a Data Warehouse and create cubes directly from the OLTP database?

    Another thing to note is that data stored in a transactional system may not always be in an end-user-consumable format. For example, we may use bit fields/flags to represent some details in the OLTP system because the storage required is minimal, but presenting them as-is would not make any sense to users, as they would not know what each bit value represents. In such cases we apply transformations and convert the data into useful information for users to understand. This is also done in the warehouse, so that information in the warehouse can be used directly for reporting. Also, in many cases a report will merge data from multiple source systems; merging it on the fly in the report would be tedious and would hit the report server. In comparison, bringing the data onto a common layer (the warehouse) and prebuilding aggregates is beneficial for report performance.
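    As a rough sketch of that idea (table and column names are hypothetical), the warehouse layer might decode the raw flag into a reportable dimension attribute:
    -- Dimension: the raw OLTP flag is decoded into a user-readable description during ETL
    CREATE TABLE dim_order_status (
      status_key   INTEGER      PRIMARY KEY,
      status_flag  CHAR(1)      NOT NULL,  -- raw bit/flag value from the OLTP system
      status_desc  VARCHAR(30)  NOT NULL   -- label users actually see in reports
    );
    -- Fact: lowest granularity, keyed to the dimensions; aggregates are built on top (e.g. in cubes)
    CREATE TABLE fact_sales (
      date_key     INTEGER       NOT NULL,
      product_key  INTEGER       NOT NULL,
      status_key   INTEGER       NOT NULL REFERENCES dim_order_status (status_key),
      sales_amount NUMERIC(12,2) NOT NULL
    );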
    I think (though I am not sure) that we join tables in SSAS queries and calculate aggregations there.
    I think SSAS stores these values and joined tables, and we do not need to evaluate those values again, and this behavior is like a Data Warehouse.
    Isn't it?
    So if I do not need historical data, can I avoid creating a Data Warehouse?
    On the backend, SSAS uses queries only to extract the data.
    By the way, I was not explaining SSAS. I was explaining what happens inside the data warehouse, which is a relational database by itself. SSAS is used to build cubes (OLAP structures) on top of the data warehouse. A star schema makes it easier to define relationships and build aggregations inside SSAS, as it is simple and requires minimal lookups to be performed. Also, data is held at the lowest granularity level, which can easily be aggregated to the required levels inside the OLAP cubes. Cube processing is very resource intensive, and using the OLTP system would have a huge impact on processing performance, as it is not denormalized, and doing transformations etc. on the fly adds complexity. Pre-creating a layer (the data warehouse) holding data in the required format makes cube processing easier and simpler, as it only has to join tables and aggregate data based on the relationships defined and the level needed inside the cube.
    Visakh - http://visakhm.blogspot.com/

  • Foreign keys in SCD2 dimensions and fact tables in data warehouse

    Hello.
    I have a data warehouse in a snowflake schema. All dimensions are SCD2; the columns are like this:
    ID (PK)  SID  NAME  ...  START_DATE  END_DATE    IS_ACTUAL
    1        1    XXX        01.01.2000  01.01.2002  0
    2        1    YYX        02.01.2002  01.01.2004  1
    3        2    SYX        02.01.2002              1
    4        3    AYX        02.01.2002  01.01.2004  0
    5        3    YYZ        02.01.2004              1
    On this table there are relations from other dimension and fact table.
    Do I need to create foreign keys for the relation?
    And if I do, on which columns? SID (serial ID) is not unique. If I create them on ID, I have to fetch the SID and the active row in every query.

    >
    I have a data warehouse in a snowflake schema. All dimensions are SCD2; the columns are like this:
    ID (PK)  SID  NAME  ...  START_DATE  END_DATE    IS_ACTUAL
    1        1    XXX        01.01.2000  01.01.2002  0
    2        1    YYX        02.01.2002  01.01.2004  1
    3        2    SYX        02.01.2002              1
    4        3    AYX        02.01.2002  01.01.2004  0
    5        3    YYZ        02.01.2004              1
    On this table there are relations from other dimension and fact table.
    Do I need to create foreign keys for the relation?
    >
    Are you still designing your system? Why did you choose NOT to use a star schema? Star schemas are simpler and have some performance benefits over snowflakes. Although there may be some data redundancy, that is usually not an issue for data warehouse systems, since any DML is usually well managed and normalization is often sacrificed for better performance.
    Only YOU can determine what foreign keys you need. Generally you will create foreign keys between any child table and its parent table and those need to be created on a primary key or unique key value.
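    For example (all names hypothetical), the fact table would normally reference the dimension's surrogate primary key (ID) rather than the non-unique serial id (SID):
    -- Hypothetical fact-to-dimension relationship on the surrogate key
    ALTER TABLE fact_orders
      ADD CONSTRAINT fk_orders_customer
      FOREIGN KEY (customer_key) REFERENCES dim_customer (id);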
    >
    And if I do, on what columns? SID (serial ID) is not unique. If I create on ID, I have to get SID and actual row in any query.
    >
    I have no idea what that means. There isn't any way to tell from just the DDL for one dimension table that you provided.
    It is not clear if you are saying that your fact table will have a direct relationship to the star-flake dimension tables or only link to them through the top-level dimensions.
    Some types of snowflakes do nothing more than normalize a dimension table to eliminate redundancy. For those types the dimension table is, in a sense, a 'mini' fact table and the other normalized tables become its children. The fact table only has a relation to the main dimension table; any data needed from the dimensions 'child' tables is obtained by joining them to their 'parent'.
    Other snowflake types have the main fact table having relations to one or more of the dimensions 'child' tables. That complicates the maintenance of the fact table since any change to the dimension 'child' table impacts the fact table also. It is not recommended to use that type of snowflake.
    See the 'Snowflake Schemas' section of the Data Warehousing Guide
    http://docs.oracle.com/cd/B28359_01/server.111/b28313/schemas.htm
    >
    Snowflake Schemas
    The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.
    Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. Figure 19-3 presents a graphical representation of a snowflake schema.

  • RMAN crosscheck and expire guidelines for data warehouse environment

    O/S: Windows Server 2008
    DB: Oracle 11gR2
    Are there any guidelines for how often one should run an RMAN crosscheck and set an expiration on archivelogs in data warehouse environments?
    It would seem once a day would be enough for the crosscheck in a data warehouse environment that gets refreshed nightly. For expiration I would expect no less than 1 week.
    Cheers!
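    For reference, a minimal sketch of the housekeeping commands involved (the policy shown is an example, not a recommendation):
    RMAN> CROSSCHECK ARCHIVELOG ALL;
    RMAN> DELETE EXPIRED ARCHIVELOG ALL;
    RMAN> CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DEVICE TYPE DISK;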

    I agree with damorgan
    Refer to the links below for best practices.
    http://www.oracle.com/technetwork/database/features/availability/311394-132335.pdf
    https://blogs.oracle.com/datawarehousing/entry/data_warehouse_in_archivelog_m
    Hope this helps,
    Regards
    http://www.oracleracepxert.com
    Understand the Power of Oracle RMAN
    http://www.oracleracexpert.com/2011/10/understand-power-of-oracle-rman.html
    Duplicating RAC database using RMAN
    http://www.oracleracexpert.com/2009/12/duplicate-rac-database-using-rman.html

  • Stripe Breadth and Block size Allocation..

    Hi,
    Could anyone please advise me whether there is a formula or utility to calculate, or to investigate, the stripe breadth and the block size to be used while creating the pools? I know it differs with the different kinds of data to be stored.
    There should be a document or a utility that helps with this; I'm still searching for it.
    waddah

    Check out Andre Aulich's site:
    http://www.andre-aulich.de/en/perm/optimized-xsan-settings-for-several-video-file-formats
    A lot of good testing went into those results.
