B-tree vs. Bitmap: Optimizing the load process in a Data Warehouse

Hi,
I'm working on fine-tuning a data warehousing system. I understand that bitmap indexes are very good for OLAP systems, especially when cardinality is low and the WHERE clause filters on multiple fields that each have their own bitmap index.
However, what I'm fine-tuning is not the queries but the load process: I want to minimize the total load time. If I create a bitmap index on a field with a cardinality of one million, and the table has one million rows (i.e. each row has a distinct field value), then my understanding is:
The total size of the bitmap index = number of rows * cardinality / 8 bytes
(because each distinct value gets its own bitmap with one bit per row, and there are 8 bits in a byte).
Hence the size of my bitmap index will be
Million * Million / 8 = 125,000,000,000 bytes, roughly 116 GB.
Also, does anyone know what the size of my B-tree index would be? I'm thinking:
The total size of the B-tree index = number of rows * (field length + 20) bytes
(assuming that a rowid is 20 characters long).
Hence the size of my B-tree index will be
Million * (10 + 20) bytes = 30,000,000 bytes, roughly 0.03 GB (assuming that my field length is 10 characters).
That means the B-tree index would be much smaller than the bitmap index.
Is my math correct? If so, the disk activity will be much higher for the bitmap index than for the B-tree index, and hence creating the bitmap index should take much longer than creating the B-tree index when the cardinality is high.
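One way I could verify this empirically, I suppose, is to build each index in turn on a test copy and compare the actual segment sizes. A rough sketch (hypothetical table T and column COL; building one index at a time, since older Oracle releases won't allow two indexes on the same column list):

    -- Hypothetical 1M-row test table T with high-cardinality column COL.
    CREATE BITMAP INDEX t_bmx ON t (col);
    SELECT ROUND(bytes / 1024 / 1024) AS size_mb
    FROM   user_segments
    WHERE  segment_name = 'T_BMX';
    DROP INDEX t_bmx;

    CREATE INDEX t_btx ON t (col);
    SELECT ROUND(bytes / 1024 / 1024) AS size_mb
    FROM   user_segments
    WHERE  segment_name = 'T_BTX';

If the measured bitmap size comes out far below the raw rows * cardinality / 8 figure, that would tell me the bitmaps are stored compressed rather than as literal bit strings.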
Please let me know your opinions.
Thanks
Sankar

Hi Jaffar,
Thanks to you and Jonathan. This is the kind of answer I have been looking for.
If I understand your email correctly, for the scenario from my original email the bitmap index will be 32 MB whereas the B-tree index will be 23 MB. Is that right?
Suppose there is an order table with 10 orders and four possible values for OrderType. Based on your reply, I now understand that the bitmap index is organized as shown below.
Data Table:
RowId     OrderNo     OrderType
1     23456     A
2     23457     A
3     23458     B
4     23459     C
5     23460     C
6     23461     C
7     23462     B
8     23463     B
9     23464     D
10     23465     A
Index table:
OrderType     FROM     TO
A     1     2     
B     3     3     
C     4     6     
B     7     8     
D     9     9     
A     10     10     
That means you might have more entries in the index table than the cardinality, right? In other words, the size of the index table cannot be determined EXACTLY from the cardinality: in our example the cardinality is 4, while there are 6 entries in the index table.
In an extreme case, if no two adjacent records have the same OrderType, then there will be as many records in the index table as in the data table (10 here), as shown in the second example below.
Data Table (second example):
RowId     OrderNo     OrderType
1     23456     A
2     23457     B
3     23458     C
4     23459     D
5     23460     A
6     23461     B
7     23462     C
8     23463     D
9     23464     A
10     23465     B
Index table (second example):
OrderType     FROM     TO
A     1     1     
B     2     2     
C     3     3     
D     4     4     
A     5     5     
B     6     6     
C     7     7
D     8     8
A     9     9
B     10     10
That means the size of the index table will be somewhere between the cardinality (at minimum) and the number of rows in the table (at maximum).
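To convince myself, I sketched how those FROM/TO entries could be derived in SQL, assuming a hypothetical ORDERS table with ROW_ID and ORDER_TYPE columns, using the usual gaps-and-islands trick:

    -- Rows whose ROW_ID minus their per-type row number share a value
    -- form one consecutive run of the same OrderType.
    SELECT order_type,
           MIN(row_id) AS from_row,
           MAX(row_id) AS to_row
    FROM  (SELECT row_id,
                  order_type,
                  row_id - ROW_NUMBER() OVER
                    (PARTITION BY order_type ORDER BY row_id) AS grp
           FROM   orders)
    GROUP  BY order_type, grp
    ORDER  BY from_row;

Against the first example this returns 6 rows, and against the second example 10 rows, matching the two index tables above.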
Please let me know if I make sense.
Regards
Sankar

Similar Messages

  • How to automate the data load process using data load file & task Scheduler

    Hi,
    I am automating the process of loading data into a Hyperion Planning application with the help of a data_Load.bat file and Task Scheduler.
    I have created the Data_Load.bat file but am unable to complete the rest of the process.
    Could you help me automate the data load process using the Data_load.bat file and Task Scheduler, and tell me what other files are required to achieve this?
    Thanks

    To follow up on your question: are you using MaxL scripts for the data load?
    If so, I have seen an issue within the batch (e.g. load_data.bat): if you do not include the full path to the MaxL script in the batch, then when running it through Task Scheduler the task will appear to work, but the log and/or error file will not be created. In other words, the batch claims it ran from the Task Scheduler although it didn't do what you needed it to.
    If you are using MaxL, use this as the batch:
    "essmsh C:\data\DataLoad.mxl" (or use the full path for essmsh; either way works). The only reason the MaxL might then not work is if the batch has not been updated to apply all the PATH changes MaxL needs, or if you need to update your environment variables so that the essmsh command works in a command prompt.

  • Load balancing BW data loads

    Hi all,
    I have a BW 3.5 system running on Oracle 9.2.0.5.
    What I would like to do is to set BW to pick a particular R/3 application server for my data loads.
    Is it possible?
    Best Regards
    Mark

    Hi Mark,
    Load balancing can be done on the BW side to pick a specific application server.
    You have an SAP BW system with several (application) servers. You would like to distribute the workload of the data loads and other data warehouse management activities in a way that fits your needs best. This could mean that you would like to have all processes distributed across all available servers or that you would like to have one dedicated server for these processes.
    Check: [Load Balancing|http://www.scribd.com/doc/7838988/KnowHow-to-SAP-BW-Data-Load-Performance-Analysis-and-Tuning]
    [Load balancing 1|http://rapidshare.com/files/157988927/load_balancing_in_BW_data_loads.pdf]
    Hope it Helps
    Srini

  • HOW TO LOAD R/3 DATA INTO SAP BI USING PROCESS CHAINS?

    Hi,
    Can we load R/3 data into BI using process chains? I loaded data from R/3 into an InfoCube using generic extraction via a view: I took the two tables EBKN and EBAN and created a view on them.
    In the PSA I can find all 2388 records, but when I load into the data target, the Transferred tab shows 2388 records while the Added column shows only 2096.
    I deleted the request and now want to load through process chains, but how do I do that? Without a flat file, can we load using process chains?
    I appreciate any inputs.......
    Regards,
    Prasanthi.

    Did you even bother looking at the links in my previous posts?
    Read the docs and try it yourself; if you encounter specific issues, you can post them on the forum.
    If you're really expecting somebody to post a step-by-step guide for process chains, I think you'll be waiting a long, long time...

  • Data Load process for 0FI_AR_4  failed

    Hi!
    I am about to implement the SAP Best Practices scenario "Accounts Receivable Analysis".
    When I schedule the data load process in dialog (immediately) for transaction data 0FI_AR_4 and check it in the Monitor, the status is yellow:
    On the top I can see the following information:
    12:33:35  (194 from 0 records)
    Request still running
    Diagnosis
    No errors found. The current process has probably not finished yet.
    System Response
    The ALE inbox of BI is identical to the ALE outbox of the source system
    or
    the maximum wait time for this request has not yet been exceeded
    or
    the background job has not yet finished in the source system.
    Current status
    No Idocs arrived from the source system.
    Question:
    Which actions can I take to run the loading process successfully?

    Hi,
    It seems the job is still in progress.
    You could monitor the job that was created in R/3 (by copying the technical name from the monitor, adding "BI" as a prefix, and searching for it in SM37 in R/3).
    Keep an eye on ST22 as well if this job is taking too long, as you may already have gotten a short dump for it that has not yet been reported to the monitor.
    Regards,
    De Villiers

  • How to design data load process chain?

    Hello,
    I am designing data load process chains for the first time and would like some general information on best practices in that area.
    My situation is as follows:
    I have 3 source systems (R3 and two for which I use flat files).
    What do you suggest: should I define one big chain for the entire loading process (I have about 20 InfoSources), or define a few shorter ones, e.g.
    1. Master data R3
    2. Master data flat file system 1
    3. Master data flat file system 2
    4. Transaction data R3
    5. Transaction data file sys 1
    ... and execute them one after another upon successful completion?
    Could you also suggest any links or manuals on that topic?
    Thank you
    Andrzej

    Andrzej,
    My advice is to make separate chains for master and transaction data (always load in this order!) and then create a 'master chain' into which you insert these two chains one after the other (so: start process -> master data chain -> transaction data chain).
    Regarding the separate chains: parallelize as much as possible (where functionally allowed). Normally, the number of parallel ('vertical') chains equals the number of CPUs available (check with your Basis person).
    Hope this provides you with enough info to start off with!
    Regards,
    Marco

  • Data load process for FI module

    Dear all,
    We are using BI 7.00, and one of our FI DataSources, 0EC_PCA_1, had a data load failure. We analysed the cause of the failure and did the following:
    1) deleted the data from cube and the PSA
    2) reloaded (full load) data - without disturbing the init.
    This solved our problem. But now that data reconciliation has been done, we find doubled entries for some of the G/L codes.
    I have a doubt here.
    Since there is no setup table for FI transactions (correct me if I am wrong), the full load picked up data that was also present in the delta queue, and the subsequent delta load loaded the same data again
    (some G/L records that were available as delta).
    Kindly explain the functioning of FI data loads. Should we go for downtime, and how do FI data loads work without setup tables?
    Can the experts provide a valuable solution for addressing this problem? Can anyone provide a step-by-step process to solve it permanently?
    Regards,
    M.M

    Hi Magesh,
    The FI DataSources do not involve setup tables for full loads, and they do not involve an outbound queue during delta loads.
    A full load happens directly from your DataSource view to BI, and deltas are captured in the delta queue.
    Yes, you are right in saying that when you did the full load, some of the values pulled were also present in the delta queue; hence you have double loads.
    You need to completely reinitialise, as the full load process has been disturbed. Whether to take downtime depends on how frequently transactions are happening.
    You need to:
    1. Completely delete the data in BW, including the initialisation.
    2. Take downtime if necessary.
    3. Reinitialise the whole DataSource from scratch.
    Regards,
    Pramod

  • Increase the number of background work processes for data load performance

    Hi all,
    There are 10 available background work processes in the BW system. We're doing a mass load to multiple ODS objects, but the system uses only 3 background processes. How can I
    increase the number of background work processes used for a new data load?
    I tried to change the number of processes with RSODSO_SETTINGS, but with no success. Are there any other settings that need to change?
    thanks,
    Yigit

    Hi sankar,
    I entered the max. proc. number into ROIDOCPRMS, but it doesn't make a difference; the system still uses only 3 background processes. RSCUSTA2 has been replaced by
    RSODSO_SETTINGS in BI 7.0, and that transaction can only change the processes for data activation, SID generation and rollback. I need to change the number of processes for data extraction.

  • Load .csv file data with OWB Process flow using Web

    Hi,
    I have a file on my local machine (machines of multiple users) and need to load the data through a web user interface.
    Let's say we have a web page with multiple radio buttons corresponding to different sources; clicking a button passes the path of the .csv file to the application (via an API or Java programming interface), which executes an OWB process flow accepting the file path as an input parameter for the load.
    It should also facilitate viewing and updating the data through the web, based on user requests.
    I need your guidance on how I can implement this with OWB 11g R2,
    assuming web browser functionality. Please confirm whether this is possible and, if yes, please throw some light on what the implementation steps could be.
    Thanks

    Hi David,
    Thanks for your reply.
    I understand your proposed solution, but my requirement is as follows.
    1. We are currently considering a web page, likely to be implemented in Java, allowing users to load .csv file data into a staging area (loading a flat file into a database table).
    Case 1: assuming the OWB software is not installed on the user's machine (I think it is not).
    Is it possible through the web page (in this case a Java page) to trigger a Java procedure, a PL/SQL procedure, or an integration of both to load data into the staging area? If yes, how would it affect the performance of a data load with a 1 GB file?
    Case 2: the OWB client software is installed on the user's machine. Does passing parameters at runtime mean passing them manually?
    In case it is automated, how should I pass the machine name and path to the OWB runtime web browser?
    Could you please give me guidance on how I should achieve this functionality with the APEX customization part?
    Thanks again for your support.
    Anil
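    For Case 1, what I have in mind is something like the following sketch (hypothetical names throughout; it assumes the uploaded .csv can be placed in a folder the database server can read through a DIRECTORY object), so that the web tier only has to trigger one insert:

        -- DATA_DIR is a hypothetical DIRECTORY object created by the DBA,
        -- pointing at the folder where the web tier drops uploaded files.
        CREATE TABLE stg_orders_ext (
          order_no   NUMBER,
          order_type VARCHAR2(10)
        )
        ORGANIZATION EXTERNAL (
          TYPE ORACLE_LOADER
          DEFAULT DIRECTORY data_dir
          ACCESS PARAMETERS (
            RECORDS DELIMITED BY NEWLINE
            FIELDS TERMINATED BY ','
            MISSING FIELD VALUES ARE NULL
          )
          LOCATION ('orders.csv')
        );

        -- Direct-path insert into the real staging table.
        INSERT /*+ APPEND */ INTO stg_orders
        SELECT * FROM stg_orders_ext;
        COMMIT;

    Whether that performs acceptably for a 1 GB file is exactly what I would want to test.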

  • Automate the data load process using task scheduler

    Hi,
    I am automating the process of loading data into a Hyperion Planning application with the help of Task Scheduler.
    I have created the Data_Load.bat file but am unable to complete the rest of the process.
    Could you help me automate the data load process using the Data_load.bat file and Task Scheduler?
    Thanks

    Thanks for your help.
    I have done the data loading using the Data_load.bat file; the .bat file calls a .msh file. Following your steps, automatic data load is now possible.
    If you need this then please let me know.
    Thanks again.

  • EIS to Essbase data load process

    I have two user-defined queries in a metaoutline. When I perform a data load operation from EIS, the Essbase application log says that the data load updated [] cells and Data Load Elapsed Time [] seconds, but the EIS data load process is still running. It seems that Essbase loads the data only for the first query and completes the data load while EIS is still running the second query. Has anyone else experienced this problem? Essbase and EIS are both 6.5.0.

    In EIS:
    1) You have to define the logical OLAP model connecting to the relational source; it defines the joins between the fact table and the dimension tables.
    2) Based on the OLAP model, you have to create a metaoutline, which defines the rules for loading members and data into Essbase.

  • Loading data into essbase - can't stop the loading process

    Hello Everyone
    I have built an interface which loads data into essbase.
    The interface was working great until last night; I don't know why it got stuck at the stage of loading data into Essbase.
    (This is my bonus question: why? Does it have anything to do with the fact that I have refreshed the database via Planning?)
    Anyway, we tried to kill the loading process by clicking Stop on the interface execution in Operator, and even after we killed the
    session in Essbase it was still there.
    Does anyone know why?
    Thanks

    Hi,
    I can't answer your questions but ...
    I'm all too familiar with attempts to stop an execution failing.
    The only solution I found is to go through an agent for all my executions.
    With an agent, if I have a problem, as an emergency I can stop the agent and it will stop the execution...
    I'm sure it is a bit dirty, but it works...
    And that is better than nothing...
    The ugliest (and only) way I found to stop an execution without an agent was to drop a table used by the job (like an I$ table), which forces it to fail...
    Sorry, no more clues.
    Regards,
    Brice

  • Initial full load of Master data using process chain

    Hi All,
    Could you please help me with the initial master data load for characteristics with attributes and texts? I need to load master data to 23 InfoObjects; using a process chain, can I do a full load of master data to all InfoObjects at once? One more doubt: as far as I know, we can't maintain more than one variant in an InfoPackage. Is that right, or can we?
    I mean: Start Variant -> InfoPackage (0Customer_Text, 0Customer_Attr, 0BILL_TYPE_TEXT, BILL_CAT_TEXT) -> DTP ( ", ", ", ") -> ACR.
    Your Help will be appreciated.
    Thanks & Regards
    Sunil

    Hi,
    "I need to load master data to 23 info objects, by using process chain can I do full load of master data to all info objects at a time."
    If there is no dependency between the attributes, then you can create the process chains and trigger them at the same time. No issues.
    we can't maintain more than one variant in an info package, is that right ? or we can ?
    With one InfoPackage you can't load data to all 23 PSAs, because each DataSource has its own PSA; you need to use 23 InfoPackages.
    In general: start variant -> InfoPackage -> DTP (assuming your BW is 7.x) -> attribute change run.
    So you would need to create 23 chains,
    or create two big chains:
    one for attributes and another for texts.
    For the attributes:
    start variant -> InfoPackage (InfoObject 1) -> DTP (InfoObject 1) -> InfoPackage (InfoObject 2) -> DTP (InfoObject 2), and so on.
    That way you can create series and parallel chains to load attribute data into the InfoObjects; at the end you add a change run for 6 InfoObjects each. The same can be done for the text loads.
    Thanks

  • Optimize the data load process into BPC Cubes on BW

    Hello Gurus,
    We would like to know how to optimize the data load process. Our scenario is that we have the ECC classic ledger, and we are looking for the best way to load data into the BW InfoCubes from an ECC source.
    To complement the question above: from which tables must the data be extracted and then passed to BW so that the consolidation is done? Also, does any other module, such as FI or EC-CS, have to be considered for this?
    Best Regards,
    Rodrigo

    Hi Rodrigo,
    Have you looked at the BW Business Content extractors available for the classic GL? If not, I suggest you take a look. BW business content provides all the business logic you will normally need to get data out of ECC and into BW for pretty much every ECC application component in existence: [http://help.sap.com/saphelp_nw70/helpdata/en/17/cdfb637ca5436fa07f1fdc0123aaf8/frameset.htm]
    Ethan

  • Load Process in 11g

    Hi,
    How do you model a load process in BPM 11g using the new BPMN palette? The load process queries an external Oracle database table and creates tasks for the end users in the workspace. Each task has a user interface that displays the data passed in.
    thanks.

    OracleStudent,
    I am not going to recommend fiddling with your redo log size; that would be my last option if I had to.
    Number of records = 8,413,427
    Text file size = 3.59 GB
    Columns = 91
    You said remote server: is that the case? I have no idea; could you tell me what 'remote server' means and how I can check this? I recently joined this company. I asked the developer, who showed me the code where he is using direct=true. Please help me; this loading process is very annoying for me. Please tell me what I need to check.
    A couple of questions.
    1. How are you loading this data? You mentioned using some .NET application. My question is: does this .NET application reside on the same server as your database, or does it run from a different machine? Also, if you are invoking sqlldr (as you mentioned), please post your sqlldr control file. During the load it should also generate a log file; check it and look for the following lines to verify and confirm that you are using the direct path:
    Number to load: ALL
    Number to skip: 0
    Errors allowed: 50
    Continuation:    none specified
    Path used:      Direct
    2. Do you have any indexes on this table? If yes, how many and of what type: regular B-tree, bitmap, or both?
    3. Is this table in LOGGING or NOLOGGING state?
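    Both of those can be checked quickly from the data dictionary; for example (assuming your target table is called ORDERS; substitute your own table name):

        -- How many indexes are there, and which are B-tree ('NORMAL') vs 'BITMAP'?
        SELECT index_name, index_type
        FROM   user_indexes
        WHERE  table_name = 'ORDERS';

        -- Is the table in LOGGING or NOLOGGING state?
        SELECT table_name, logging
        FROM   user_tables
        WHERE  table_name = 'ORDERS';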
    Regards
