Identify high volume characteristics in dimension tables
Hi,
We have identified some cubes where the dimension table to fact table ratio is more than 20%.
The next step is to find the characteristics in those dimension tables which we can either mark as a line item dimension or move to a new dimension. Is there any function module or table which can identify such characteristics for a particular cube?
Regards,
Shital
Hi,
LISTSCHEMA won't give you the size of the tables, nor the option to analyze the dimensions in detail.
Similarly, SAP_INFOCUBE_DESIGNS won't give you the detail of what's inside a dimension. It just gives you the size of the dimension.
The only way (as far as I know) to do it is as I suggested in earlier posts - take your dimension table to your DBA. It will take two minutes for them to find out.
Please correct me if I am wrong.
-RMP
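As RMP says, the check itself is simple: count the distinct values per SID column and compare against the dimension table's row count. A column whose cardinality approaches the row count is what inflates the dimension, and is the candidate for a line-item dimension or a split. A rough Python sketch of that logic (the table layout and column names are invented for illustration; in practice you would run the equivalent SELECT COUNT(DISTINCT ...) statements against the /BIC/D* table):

```python
def profile_dimension(rows, threshold=0.2):
    """Return {column: distinct_ratio} for characteristic columns whose
    distinct-value count exceeds `threshold` of the table's row count."""
    if not rows:
        return {}
    total = len(rows)
    flagged = {}
    for col in rows[0]:
        if col == "DIMID":  # skip the key itself, it is distinct by definition
            continue
        ratio = len({r[col] for r in rows}) / total
        if ratio > threshold:
            flagged[col] = ratio
    return flagged

# Hypothetical dimension rows: DIMID plus two characteristic SID columns.
rows = [{"DIMID": i, "SID_DOCNUMBER": i, "SID_DOCTYPE": i % 3}
        for i in range(1000)]
print(profile_dimension(rows))  # {'SID_DOCNUMBER': 1.0}
```

Here SID_DOCNUMBER has one distinct value per dimension row, so it is the line-item candidate, while SID_DOCTYPE (3 distinct values) is harmless.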
Similar Messages
-
How to identify fact and dimension tables
Hi ,
We have a list of parent-child relationships for each database table. Based upon those parent-child tables, we need to identify which table is a fact and which is a dimension. Could you please help me identify the fact table and the dimension tables?
Thanks in advance
Hi,
Refer to this link:
http://www.oraclebidwh.com/2007/12/fact-dimension-tables-in-obiee/
Please mark if it helps you. -
Identify the cubes where dimension table size is exactly 100% of the fact table
Hello,
We want to identify the cubes where the dimension table size is exactly 100% of the fact table.
Is there any table or standard query which can give me this data?
Regards,
Shital
Use report (SE38) SAP_INFOCUBE_DESIGNS
M. -
Drawbacks of big volume into Dimension Tables
Experts,
Below are counts for DIM and FACT tables.
Can somebody please help me understand the implications of a big volume of data in the dimension tables versus the FACT tables?
What are the drawbacks, and how can DIM tables with a big volume of data hurt in data warehousing?
Also, in the near future we are targeting cube designs on top of these DIM & FACT tables.
Your thoughts/responses are much appreciated.
Thanks in advance
Please do let us know your feedback. Thank You - KG, MCTS
Hi gk1393,
Incremental loading is fundamental when facing volumes like the ones you are showing. But, besides the ETL considerations, you may want to re-think the design of this data warehouse.
Are these dimensions actually dimensions? In some businesses dimensions are very big, but it is not common to find many dimensions bigger than the facts (at least not in a mature data warehouse).
Building dimensions on top of these big tables presents some performance problems, both when processing and when querying the cube. Disabling attributes for processing, removing extra text attributes that are not used in aggregations, and reducing data type sizes (if you can store data in a smallint data type, don't use bigint, for example) are useful techniques to mitigate these drawbacks.
But, again, I'd think about the data warehouse design itself to check whether these cardinalities make sense for your business needs.
Regards.
Pau -
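To put rough numbers on the data-type advice above, here is a back-of-the-envelope sketch; the 100 million row count is purely illustrative, not taken from the thread:

```python
# Per-column storage for a dimension-key column at different integer widths.
rows = 100_000_000  # hypothetical row count

for name, width_bytes in [("smallint", 2), ("int", 4), ("bigint", 8)]:
    gib = rows * width_bytes / 1024**3
    print(f"{name}: {gib:.2f} GiB")
```

Going from bigint to smallint saves 6 bytes per value, i.e. roughly 600 MB per column at this row count, before even considering the knock-on effect on index size and cache efficiency.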
How to Maintain Surrogate Key Mapping (cross-reference) for Dimension Tables
Hi,
What would be the best approach on ODI to implement the Surrogate Key Mapping Table on the STG layer according to Kimball's technique:
"Surrogate key mapping tables are designed to map natural keys from the disparate source systems to their master data warehouse surrogate key. Mapping tables are an efficient way to maintain surrogate keys in your data warehouse. These compact tables are designed for high-speed processing. Mapping tables contain only the most current value of a surrogate key— used to populate a dimension—and the natural key from the source system. Since the same dimension can have many sources, a mapping table contains a natural key column for each of its sources.
Mapping tables can be equally effective if they are stored in a database or on the file system. The advantage of using a database for mapping tables is that you can utilize the database sequence generator to create new surrogate keys. And also, when indexed properly, mapping tables in a database are very efficient during key value lookups."
We have a requirement to implement cross-reference mapping tables with natural and surrogate keys for each dimension table. These mapping tables will be populated automatically (inserts only) during the E-LT execution, right after inserting into the dimension table.
Someone have any idea on how to implement this on ODI?
Thanks,
Danilo
Hi,
First of all, please avoid bolding everything. That said, according to Kimball (if I remember well) this is a 1:1 mapping, so no surrogate key is needed.
Beyond that, you could personally use a lookup table:
http://www.odigurus.com/2012/02/lookup-transformation-using-odi.html
or make a simple outer join filtering by your "Active_Flag" column (remember that this filter needs to be inside your outer join).
Let us know
Francesco -
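As a sketch of the mapping-table idea from the Kimball quote above (one current surrogate key per source system and natural key, new keys drawn from a sequence), here is a minimal in-memory Python illustration. In ODI this would be a lookup against a database table backed by a sequence; all names here are hypothetical:

```python
import itertools

class SurrogateKeyMap:
    """Minimal sketch of a Kimball surrogate-key mapping table: maps a
    (source_system, natural_key) pair to its current surrogate key,
    allocating a new key from a sequence the first time a pair is seen."""

    def __init__(self, start=1):
        self._seq = itertools.count(start)  # stands in for a DB sequence
        self._map = {}

    def lookup_or_create(self, source, natural_key):
        key = (source, natural_key)
        if key not in self._map:
            self._map[key] = next(self._seq)
        return self._map[key]

skm = SurrogateKeyMap()
print(skm.lookup_or_create("ERP", "CUST-001"))  # new surrogate key
print(skm.lookup_or_create("CRM", "C-9001"))    # new key, different source
print(skm.lookup_or_create("ERP", "CUST-001"))  # returns the first key again
```

The same lookup-or-insert shape is what the E-LT step would perform right after loading the dimension row.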
Tool to export and import high volume data from/to Oracle and MS Excel
We use certain reports (developed in XLS and CSV) to extract more than 500K to 1M records in a single report. There are around 1,000 reports generated daily. The business users review those reports and apply certain rules to identify exceptions, then apply the corrections back to the system through Excel upload.
The XL reports are developed in TIBCO BW and deployed in AMX platform. The user interface is running on TIBCO GI.
Database Version: Oracle 11.2.0.3.0 (RAC - 2 node)
The inputs around following points will be of great help:
1) Recommendations for handling such high-volume reports, and a mechanism to apply bulk corrections back to the system?
2) Suggestions for any Oracle tool or third-party tool
If you were to install Oracle client software on the PC where Excel is installed, then you could utilize ODBC so that Excel can connect directly to the DB & issue SQL. -
Should my dimension table be this big
I'm in the process of building my first product dimension for a star schema and am not sure if I'm doing this correctly. A brief explanation of our setup:
We have products (dresses) made by designers for specific collections, each of which has a range of colors, and each color can come in many sizes. For our UK market this equates to some 1.9 million product variants. Flattening the structure out to product+designer+collection gives about 33,000 records, but when you add all the colors and then all the color sizes it pumps that figure up to 1.9 million. My "rolled my own" incremental ETL load runs OK just now, but we are expanding into the US market and our projections indicate that our product variants could multiply 10-fold. I'm a bit worried about performance of the ETL now and in the future.
Is 1.9m records not an awful lot to incrementally load (well, analyse) nightly for a dimension table, never mind what that figure may grow to when we go to the US?
I thought of somehow reducing this by using a snowflake, but would this not just reduce the number of columns in the dimension and not the row count?
I then thought of separating the colors and sizes into their own dimensions, but this doesn't seem right as they are attributes of products, and I would also lose the relationship between products, size & color, i.e. I would have to go through the fact table (which I've read is not a good choice) for any analysis.
Am I correct in thinking these are big numbers for a dimension table? Is it even possible to reduce the number somehow?
Still learning so welcome any help.
Thanks
Hi Plip71,
In my opinion, it is always good to reduce the dimension volume as much as possible for better performance.
Is there any hierarchy in your product dimension? Going for a snowflake for this problem is a bad idea.
Solution 1:
From the details given, it is good to split colour and size out as separate dimensions. This will reduce the volume of the dimension and increase the column count in the fact table (a separate WID has to be maintained in the fact table), but it will improve the performance of the cube. Before doing this, please check the layout requirements with the business.
Solution 2:
Check the distinct count of item variants used in the fact table. If it is very low, then try creating a linear product dimension, i.e. create a view for the product dimension doing an inner join with the fact table, so that only the used dimension members are loaded into the cube's product dimension. Volume is thereby reduced, with an improvement in the performance and stability of the cube.
Thanks in advance, Sorry for the delayed reply ;)
Anand
Please vote as helpful or mark as answer, if it helps Regards, Anand -
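The arithmetic behind Solution 1 is worth spelling out: a flattened product dimension multiplies the cardinalities, while separate colour and size dimensions keep every table small, because the combinations then live only in the fact table. A quick sketch (the colour and size counts are invented to reproduce the 1.9M figure from the question):

```python
# Illustrative counts: 33,000 flattened product records, with hypothetical
# colour and size cardinalities chosen to land near the 1.9M variants quoted.
products, colours, sizes = 33_000, 12, 5

flattened_rows = products * colours * sizes  # one dimension row per variant
split_rows = products + colours + sizes      # three small dimensions instead

print(flattened_rows)  # 1,980,000 -> the ~1.9M variant explosion
print(split_rows)      # 33,017 dimension rows in total
```

The trade-off, as noted above, is an extra key column per new dimension in the fact table.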
Consistency Warning - [39008] Dimension Table not joined to Fact Source
I have a schema in which I have the following tables:
A) Patient Transaction Fact Table (i.e. supplies used, procedures performed, etc.)
B) Demographic Dimension table (houses info like patient location code)
C) Location Dimension table (tells me what Hospital each unique Location maps to)
So table A is the fact, and table B is a dimension table joined to table A based on Patient ID, so I can get general info on the patient. This would allow me to apply logic to just see patient transactions where the patient was FEMALE, or was in the Emergency Room, by applying conditions to these fields in table B.
Table C is a simple lookup table joined to table B by Location Code, so I can identify which hospital's emergency room the patient was located in for instance.
So the schema is: A<---B<---C, where B and C are both dimension tables.
The query works as desired, but my consistency check gives me the following WARNING:
*[39008] Logical dimension table LOCATION MASTER D has a source LOCATION MASTER D that does not join to any fact source.*
How do I resolve this WARNING, or at least suppress it?
Hi,
What you need to do is add the (physical) location dimension table to the logical table source of the demographic dimension, for example by dragging it from the physical layer on top of the logical table source of the demographic logical dimension table in the BMM layer.
Regards,
Stijn -
What is '#Distinct values' in Index on dimension table
Gurus!
I have loaded my BW Quality system (master data and transaction data) with almost equivalent volume as in Production.
I am comparing the sizes of dimension and fact tables of one of the cubes in Quality and PROD.
I am taking one of the dimension tables into consideration here.
Quality:
/BIC/DCUBENAME2 Volume of records: 4,286,259
Index /BIC/ECUBENAME~050 on the E fact table /BIC/ECUBENAME for this dimension key KEY_CUBENAME2 shows #Distinct values as 4,286,259
Prod:
/BIC/DCUBENAME2 Volume of records: 5,817,463
Index /BIC/ECUBENAME~050 on the E fact table /BIC/ECUBENAME for this dimension key KEY_CUBENAME2 shows #Distinct values as 937,844
I would want to know why the distinct value is different from the dimension table count in PROD
I am getting this information from the SQL execution plan, if I click on the /BIC/ECUBENAME table in the code. This screen gives me all details about the fact table volumes, indexes etc..
The index and statistics on the cube is up to date.
Quality:
E fact table:
Table /BIC/ECUBENAME
Last statistics date 03.11.2008
Analyze Method 9,767,732 Rows
Number of rows 9,767,732
Number of blocks allocated 136,596
Number of empty blocks 0
Average space 0
Chain count 0
Average row length 95
Partitioned YES
NONUNIQUE Index /BIC/ECUBENAME~P:
Column Name #Distinct
KEY_CUBENAMEP 1
KEY_CUBENAMET 7
KEY_CUBENAMEU 1
KEY_CUBENAME1 148,647
KEY_CUBENAME2 4,286,259
KEY_CUBENAME3 6
KEY_CUBENAME4 322
KEY_CUBENAME5 1,891,706
KEY_CUBENAME6 254,668
KEY_CUBENAME7 5
KEY_CUBENAME8 9,430
KEY_CUBENAME9 122
KEY_CUBENAMEA 10
KEY_CUBENAMEB 6
KEY_CUBENAMEC 1,224
KEY_CUBENAMED 328
Prod:
Table /BIC/ECUBENAME
Last statistics date 13.11.2008
Analyze Method 1,379,086 Rows
Number of rows 13,790,860
Number of blocks allocated 187,880
Number of empty blocks 0
Average space 0
Chain count 0
Average row length 92
Partitioned YES
NONUNIQUE Index /BIC/ECUBENAME~P:
Column Name #Distinct
KEY_CUBENAMEP 1
KEY_CUBENAMET 10
KEY_CUBENAMEU 1
KEY_CUBENAME1 123,319
KEY_CUBENAME2 937,844
KEY_CUBENAME3 6
KEY_CUBENAME4 363
KEY_CUBENAME5 691,303
KEY_CUBENAME6 226,470
KEY_CUBENAME7 5
KEY_CUBENAME8 8,835
KEY_CUBENAME9 124
KEY_CUBENAMEA 14
KEY_CUBENAMEB 6
KEY_CUBENAMEC 295
KEY_CUBENAMED 381
Arun,
The cube in QA and PROD are compressed. Index building and statistics are also up to date.
But I am not sure what other jobs are run by BASIS as far as this cube in production is concerned.
Is there any other Tcode / function module etc. which can give information about the #distinct values of this index or dimension table?
One basic question: as the DIM key is the primary key of the dimension table, there can't be duplicates.
So how could the index on the fact table for this dimension show fewer #distinct values than the entries in that dimension table?
Should the entries in the dimension table not exactly match the #Distinct entries shown in Index /BIC/ECUBENAME~P for this DIM KEY? -
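One detail in the two statistics snapshots above may explain the gap: the "Analyze Method" row shows how many rows the optimizer statistics were computed from, and distinct-value counts estimated from a sample typically understate the true cardinality. A quick check of the sampling rates, using the figures from the output above:

```python
# QA: statistics analyzed all rows; PROD: only a fraction of the table.
qa_analyzed, qa_rows = 9_767_732, 9_767_732
prod_analyzed, prod_rows = 1_379_086, 13_790_860

print(qa_analyzed / qa_rows)      # 1.0 -> QA stats computed on every row
print(prod_analyzed / prod_rows)  # 0.1 -> PROD stats from a 10% sample
```

So the PROD #Distinct of 937,844 is a sample-based estimate rather than an exact count, which would be consistent with the dimension's primary key guaranteeing uniqueness while the index statistics report a lower figure. Whether that is the whole story here is an assumption worth verifying with exact COUNT(DISTINCT ...) queries.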
Regarding Dimension Table and Fact table
Hello,
I am having basic doubts regarding the star schema.
Let me explain first regarding star schema.
The fact table contains key figures and DIM IDs, OK.
These DIM IDs are connected to my dimension tables. The dimension table contains characteristics and these DIM IDs, OK.
Then my basic doubts:
1. How will the DIM ID be linked to the SID tables?
2. If I have not maintained any master data, texts or hierarchies, will the SID tables be generated or not?
3. If they are generated, I think there is no use for the SID now, as we have not maintained master data.
4. I have 18 characteristics which are in no way related to each other; in that scenario how should the dimensions be identified? Do we need to include all the characteristics in one dimension, or create separate dimensions for each of them? (max is 13 dimensions)
5. If the dimension table contains DIM IDs and characteristics, then where are the values for the characteristics stored? (For example, sales rep is a characteristic; we give it values, some names - where are those values stored?)
Hi Vasu,
e.g we have infocube with
- dimension 'location' -> characteristic 'sales rep', 'country'
- dimension 'partner'.
fact table
dim-id('sales person') dim-id('partner') revenue
1001 9001 500
1002 9002 300
1003 9004 200
dimension table 'location'
dim-id sid-id(sales rep) sid-id(country)
1001 3001 5001
1002 3004 5004
1003 3005 5001
'sales rep' sid table
sid sales rep
3001 abc
3004 pqr
3005 xyz
'country' sid table
5001 country1
5004 country2
so from the link dim-id and sid, we get
"sales rep report"
sales-rep revenue
abc 500
pqr 300
xyz 200
"country report"
country revenue
country1 700
country2 300
hope it's clear. -
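The DIM-ID/SID resolution above can be sketched in a few lines of Python, using exactly the sample data from this post (dictionaries stand in for the dimension and SID tables):

```python
# fact rows carry DIM-IDs; the dimension row maps a DIM-ID to SIDs;
# the SID tables resolve each SID to the characteristic value.
fact = [(1001, 9001, 500), (1002, 9002, 300), (1003, 9004, 200)]
dim_location = {1001: (3001, 5001), 1002: (3004, 5004), 1003: (3005, 5001)}
sid_sales_rep = {3001: "abc", 3004: "pqr", 3005: "xyz"}
sid_country = {5001: "country1", 5004: "country2"}

sales_rep_report = {}
country_report = {}
for dim_id, _partner_dim_id, revenue in fact:
    rep_sid, country_sid = dim_location[dim_id]
    rep = sid_sales_rep[rep_sid]
    country = sid_country[country_sid]
    sales_rep_report[rep] = sales_rep_report.get(rep, 0) + revenue
    country_report[country] = country_report.get(country, 0) + revenue

print(sales_rep_report)  # {'abc': 500, 'pqr': 300, 'xyz': 200}
print(country_report)    # {'country1': 700, 'country2': 300}
```

This reproduces both the "sales rep report" and the "country report" from the example, showing how the same fact rows aggregate differently depending on which SID you resolve.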
Dimension Table populating data
Hi
I am in the process of creating a data mart with a star schema.
The star schema has been defined with the fact and dimension tables and the primary and foreign keys.
I have written the script for one of the dimensions and would like to know, when the job runs on a daily basis, should the job truncate the table every day and rebuild the dimension table, or should it only add new records to the table? If it should only add new records, how is this done?
I assume that the fact table job is run once a day and only new data is added to it?
Thanks
It will depend on the volume of your dimensions. In most of our projects we do not truncate; we update only changed rows based on a fingerprint (to make the comparison faster than column by column), and insert new rows (SCD1). For SCD2 we apply a similar approach for updates and inserts, with expirations in batch (one UPDATE for all applicable rows at the end of the package/ETL).
If your dimension is very large, you can consider truncating all data, or deleting only the affected modified rows (based on the business key) and reloading those later, but you have to be careful to maintain the same surrogate keys referenced by your existing facts.
HTH,
Please, mark this post as Answer if this helps you to solve your question/problem.
Alan Koo | "Microsoft Business Intelligence and more..."
http://www.alankoo.com -
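A minimal in-memory sketch of the fingerprint-based SCD1 approach described above, in Python. The hashing scheme and the dictionary "tables" are illustrative only; in a real ETL this would be a MERGE against the dimension table:

```python
import hashlib

def fingerprint(attrs):
    """Hash the non-key attributes so a changed row is detected with one
    comparison instead of column-by-column checks."""
    return hashlib.md5("|".join(str(v) for v in attrs).encode()).hexdigest()

def scd1_merge(dimension, incoming):
    """dimension: {natural_key: (attrs, fp)}; incoming: {natural_key: attrs}.
    Insert new rows and overwrite rows whose fingerprint changed (Type 1).
    Returns (inserts, updates)."""
    inserts = updates = 0
    for key, attrs in incoming.items():
        fp = fingerprint(attrs)
        if key not in dimension:
            dimension[key] = (attrs, fp)
            inserts += 1
        elif dimension[key][1] != fp:  # attributes changed -> overwrite
            dimension[key] = (attrs, fp)
            updates += 1
    return inserts, updates

dim = {}
print(scd1_merge(dim, {"C1": ("Alice", "NY")}))                        # insert
print(scd1_merge(dim, {"C1": ("Alice", "LA"), "C2": ("Bob", "TX")}))   # update + insert
```

An SCD2 variant would expire the old row (end-date it) and insert a new surrogate key instead of overwriting in place.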
How and when does a dimension table gets generated
Hi Gurus,
I am new to BI and I will be put on a project within 2 months. I have learned that the dimension table contains the SIDs of all the characteristics in that dimension. My conclusions are:
1. The dimension table contains the DIM ID as the primary key.
2. The dimension table contains SIDs of the characteristics.
3. Though the SIDs in the dimension table are primary keys in their 'S' tables, they are not keys in the dimension table.
My questions:
1. Is there any chance of generating new DIM IDs for the same combination of SIDs, since the SIDs are not part of the key?
2. I am confused about when and how the dimension table gets generated.
I have searched the forum and Google but my doubts still haven't been clarified. If anyone could throw some light on this topic I would really appreciate it.
Hi,
All your conclusions are correct.
Now for your questions the answers are in line:
1. Is there any chance of generating new DIM IDs for the same combination of SIDs, since the SIDs are not part of the key?
No new DIM IDs will be generated; the DIM ID is unique for a given combination of SIDs.
2. When and how does the dimension table get generated?
They get generated when you activate the InfoProvider.
Hope this helps.
thanks,
Rahul -
None of the dimension tables are compatible with the query request
Hi,
I am experiencing the below error while querying columns from the employee dimension (W_EMPLOYEE_D) in the Workforce Profile SA. There is only one column in my report, employee number, coming from the employee dimension. When I query other information like job, region, location etc. I do not get any error; the below error appears only when querying columns from the employee dimension. The content tab level for the LTS of the employee dimension is set to employee detail.
View Display Error
Odbc driver returned an error (SQLExecDirectW).
Error Details
Error Codes: OPR4ONWY:U9IM8TAC:OI2DL65P
State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred. [nQSError: 43113] Message returned from OBIS. [nQSError: 43119] Query Failed: [nQSError: 14077] None of the dimension tables are compatible with the query request Dim - Employee.Employee Number. (HY000)
SQL Issued: SELECT 0 s_0, "Human Resources - Workforce Profile"."Employee Attributes"."Employee Number" s_1 FROM "Human Resources - Workforce Profile" FETCH FIRST 65001 ROWS ONLY.
I couldn't work out the exact reason. Any suggestions would be highly appreciated.
Regards.
Hi user582149,
It is difficult to answer your question with so few details. Could you specify:
- how many facts/dimensions are you using in the query?
- what is the structure of your Business Model?
- which version of OBI are you using?
- what does your log say?
I hope to tell you more once I have the information above.
Cheers -
Problem with populating a fact table from dimension tables
My aim: there are 5 dimension tables that have been created:
Student->s_id primary key,upn(unique pupil no),name
Grade->g_id primary key,grade,exam_level,values
Subject->sb_id primary key,subjectid,subname
School->sc_id primary key,schoolno,school_name
year->y_id primary key,year(like 2008)
s_id,g_id,sb_id,sc_id,y_id are sequences
select * from student;
S_ID UPN FNAME COMMONNAME GENDER DOB
==============================
9062 1027 MELISSA ANNE f 13-OCT-81
9000 rows selected
select * from grade;
G_ID GRADE E_LEVEL VALUE
73 A a 120
74 B a 100
75 C a 80
76 D a 60
77 E a 40
78 F a 20
79 U a 0
80 X a 0
18 rows selected
These are basically the dimensional views.
Now, according to the specification given, I need to create a fact table, facts_table, which contains all the dimension tables' primary keys as foreign keys.
The problem: I am going to consider a smaller example than the actual number of dimension tables. Let's say there are 2 dim tables, student and grade, with s_id and g_id as primary keys.
create materialized view facts_table(s_id,g_id)
as
select s.s_id,g.g_id
from (select distinct s_id from student)s
, (select distinct g_id from grade)g
This results in massive duplication as there is no join between the two tables. But basically there is nothing in common between the two tables to join on - how do I solve it?
Consider, when I do it for 5 tables, the amount of duplication involved; that's why there is not enough tablespace.
I was hoping, if there is no other way, to create a fact table with just one column initially:
create materialized view facts_table(s_id)
as
select s_id
from student;
then
alter materialized view facts_table add column g_id number;
Then populate this g_id column by fetching all the g_id values from the grade table using some sort of loop, even though we should not use PL/SQL. I don't know if this works?
Any suggestions?
Basically you're quite right to say that without any logical common columns between the dimension tables it will produce results implying that every student studied every subject and got every grade, which is rubbish.
I am confused as to whether the dimension tables can contain duplicated columns, i.e. a column like upn (unique pupil no) that I also copy into another table so that a join can be placed when writing queries. I don't know whether that's right.
These are the required queries from the star schema
Design a conformed star schema which will support the following queries:
a. For each year give the actual number of students entered for at A-level in the whole country / in each LEA / in each school.
b. For each A-level subject, and for each year, give the percentage of students who gained each grade.
c. For the most recent 3 years, show the 5 most popular A-level subjects in that year over the whole country (measure popularity as the number of entries for that subject as a percentage of the total number of exam entries).
I wrote the queries earlier based on dimension tables which were highly duplicated; they were like:
student
=======
upn
school
school
======
school(this column substr gives lea,school and the whole is country)
id(id of school)
student_group
=============
upn(unique pupil no)
gid(group id)
grade
year_col
========
year
sid(subject id)
gid(group id)
exam_level
id(school id)
grades_list
===========
exam_level
grade
value
subject
========
sid
subject
compulsory
These were the dimension tables I created earlier and, as you can see, many columns are duplicated in other tables (like upn), and this structure effectively gets the data out of the schema as there are common columns upon which we can link.
But a colleague suggested that these dimension tables are wrong, that they should not be this way, and should not contain duplicated columns.
select distinct count(s.upn) as st_count
, y.year
, c.sn
from student_info s
, student_group sg
, year_col y
, subject sb
, grades_list g
, country c
where s.upn=sg.upn
and sb.sid=y.sid
and sg.gid=y.gid
and c.id=y.id
and c.id=s.school
and y.exam_lev=g.exam_level
and g.exam_level='a'
group by y.year,c.sn
order by y.year;
This is the code for the 1st query.
I am confused now which structure is right: my earlier dimension tables which I am describing here, or the new dimension tables which I explained above.
If what I am describing now is right, I mean the dimension tables and the columns are all right, then I just need to create a fact table with foreign keys to all the dimension tables. -
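For what it's worth, the duplication problem disappears once the fact table is loaded from a transactional source (one row per exam result) rather than by combining the dimensions themselves; each dimension's surrogate key is then looked up by its natural key. A Python sketch with invented sample data (the keys and values are hypothetical, chosen only to match the shapes in this thread):

```python
# Natural-key -> surrogate-key lookups, one per dimension.
student_dim = {"1027": 9062}                  # upn -> s_id
grade_dim = {("A", "a"): 73, ("B", "a"): 74}  # (grade, exam_level) -> g_id
subject_dim = {"MATH": 11}                    # subjectid -> sb_id
school_dim = {"SCH1": 21}                     # schoolno -> sc_id
year_dim = {2008: 31}                         # year -> y_id

# Transactional source: one row per exam entry (no cross join anywhere).
results = [
    {"upn": "1027", "grade": "A", "level": "a",
     "subject": "MATH", "school": "SCH1", "year": 2008},
]

fact = [
    (student_dim[r["upn"]],
     grade_dim[(r["grade"], r["level"])],
     subject_dim[r["subject"]],
     school_dim[r["school"]],
     year_dim[r["year"]])
    for r in results
]
print(fact)  # [(9062, 73, 11, 21, 31)]
```

In SQL terms this corresponds to selecting from the results source table and joining each dimension on its natural key, so the fact table gets exactly one row per real-world event.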
What kind of throughput should I expect? Anyone using AQ in high volume?
Hi,
I am working with AQ in a 10.2 environment and have been doing some testing with AQ. What I have is a very simple Queue with 1 queue table. The queue table structure is:
id number
message varchar(256)
message_date date
I have not done anything special with storage parameters, etc., so it's all default at this point. Then I created a stored procedure that will generate messages given message text and the number of times to loop. When I run this procedure with 10,000 iterations it runs in 15 seconds (if I commit all messages at the end) and 24 seconds if I commit after each message (probably more realistic).
Now, on the same database I have a plain table that contains one column (message varchar(256)). I have also created a similar stored procedure to insert into it. For this, 10,000 inserts take about 1 second.
As you can see there is an order of magnitude of difference so I am looking to see if others have been able to achieve higher throughput than 500-700 messages per second and if so what was done to achieve it.
Thanks in advance,
Bill
Yes, I have seen it. My testing so far hasn't even gotten to the point of concurrent enqueue/dequeue. So far I have focused on enqueue time, and it is dramatically slower than a plain old database table. That link also discussed multiple index-organized tables being created behind the scenes. I'm guessing that the 15x factor I am seeing is because of 4 underlying tables, plus they are index-organized, which adds additional overhead.
So my question remains - Is anyone using AQ for high volume processing? I suppose I could create a bunch of queues. However, that will create additional management on my side which is what I was trying to avoid by using AQ in the first place.
Can one queue be served by multiple queue tables? Can queue tables be partitioned? I would like to minimize the number of queue so that the dequeue processes don't have to contain multiplexed logic.
Thanks