Star schema design question

Hi!
I'm trying to build a data warehouse with patient data. I'm mostly interested in the patient visits table, which will feed the fact table, and in the patient table.
The patient table contains among others the following data:
patient (id, sex, marital_status, nationality, profession, zip).
My question is what should I do with the "patient" table. There seem to be three options:
1. Use it to build a one-level dimension and just use the above fields as simple dimension attributes.
2. Create separate hierarchies for each attribute in the patient dimension
(e.g. patient -> sex, patient -> nationality, patient -> zip -> city, etc.).
3. Create separate dimensions for each attribute (e.g. a profession dimension joined to the patient visits fact table) and no patient dimension.
What would you do?
Thank you...

I guess my answer would depend on how you intend to implement this physically. If you are deploying this relationally, then I would definitely have a single patient dimension table. Perhaps build a hierarchy for patient -> zip -> city, but I'd keep gender and nationality as simple "attributes" of the dimension.
If you're going to deploy this through an OLAP cube, it's a bit more complex, because the standard Oracle OLAP functionality makes it impossible to do sums over attributes unless they are part of a hierarchy. In that case, I guess I'd deploy multiple dimensions (gender, nationality, and a generic patient dimension that includes the hierarchy patient -> zip -> city).
If you attempt to do your option #2, you'll be in trouble if you need to show queries that cross the hierarchies, i.e. "Male American patients who live in zip code 12345".
Hope this helps,
Scott
P.S. You could also deploy this relationally, and then put an Analytic Workspace (AW) on top of the relational model.
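
A minimal sketch of the relational option, using Python's built-in sqlite3 module. The table names, key columns and sample rows are assumptions made up for illustration; only the patient attributes come from the question. It shows that with a single patient dimension, a query that crosses attributes (such as "male American patients in zip code 12345") is just a set of filters on one joined table, and that the zip -> city rollup works from the same table.

```python
import sqlite3

# Table names, keys and sample rows are hypothetical; only the patient
# attributes (sex, nationality, zip, ...) come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (
    patient_key    INTEGER PRIMARY KEY,
    sex            TEXT,
    marital_status TEXT,
    nationality    TEXT,
    profession     TEXT,
    zip            TEXT,
    city           TEXT   -- upper level of the patient -> zip -> city hierarchy
);
CREATE TABLE fact_visit (
    visit_id    INTEGER PRIMARY KEY,
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    visit_date  TEXT
);
""")
conn.executemany("INSERT INTO dim_patient VALUES (?,?,?,?,?,?,?)", [
    (1, "M", "single",  "American", "teacher",  "12345", "Springfield"),
    (2, "F", "married", "American", "nurse",    "12345", "Springfield"),
    (3, "M", "married", "Greek",    "engineer", "54321", "Athens"),
])
conn.executemany("INSERT INTO fact_visit VALUES (?,?,?)", [
    (1, 1, "2024-01-05"), (2, 1, "2024-02-10"),
    (3, 2, "2024-01-07"), (4, 3, "2024-03-01"),
])

# Cross-attribute query: every filter applies to the same dimension row.
visits = conn.execute("""
    SELECT COUNT(*)
    FROM fact_visit f
    JOIN dim_patient p ON p.patient_key = f.patient_key
    WHERE p.sex = 'M' AND p.nationality = 'American' AND p.zip = '12345'
""").fetchone()[0]

# Rollup along the zip -> city hierarchy.
city_counts = conn.execute("""
    SELECT p.city, COUNT(*)
    FROM fact_visit f
    JOIN dim_patient p ON p.patient_key = f.patient_key
    GROUP BY p.city
    ORDER BY p.city
""").fetchall()

print(visits, city_counts)
```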

Similar Messages

Star schema design, metrics dimension or not.

Hello Guys,
I just heard from one of my colleagues that it's wise to
have a "KPI" or "metrics" dimension in my DWH star schema (later used in OBIEE).
Now, we have quite a lot of data: 100,000 rows per day (bottom level, non-aggregated; the aggregations are obviously far fewer than that, let's say 200 rows per day), and
we have built pre-aggregated data marts for each of the 5 very static reports (OBIEE Publisher).
The table structure is very simple
e.g.
Date,County,NumberofCars,RevenuePerCar, ExpensesPerCar, BreakEvenPerCar, CarType
One could take the metrics "NumberofCars", "RevenuePerCar", "ExpensesPerCar", "BreakEvenPerCar" out of the table
and put them into a metrics dimension.
MetricID Metric
1 NumberofCars
2 RevenuePerCar
3 ExpensesPerCar
4 BreakEvenPerCar
and hence the fact table design would be simpler.
Date,County,MetricID,Metric, CarType
Disadvantages:
- A join is required
- We would have to redesign our tables
- Tables are no longer aggregated for specific metric types
- If we notice performance is bad, we would need to go back to the old design
Advantages:
- Should new metrics appear, we don't have to change the design of the tables
- It's probably best practice
Note: date, country and cartype are already dimensions. We are just missing one to differentiate the metrics/KPIs.
So I struggle a bit: what should I do? Redesign, or stick with the way I have done it, having
performance optimization in mind?
Thanks
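
A small sketch of what the redesign amounts to, in plain Python. The sample row is invented, and the "Value" column name is an assumption (the post lists the column as "Metric"). It unpivots one wide row into one narrow row per metric and shows that reading a single metric back then requires a lookup through the metrics dimension, which is the extra join mentioned in the disadvantages.

```python
# Hypothetical sample row in the original wide layout.
wide_rows = [
    {"Date": "2024-01-01", "County": "North", "CarType": "SUV",
     "NumberofCars": 10, "RevenuePerCar": 500.0,
     "ExpensesPerCar": 300.0, "BreakEvenPerCar": 200.0},
]

# Metrics dimension as listed in the post: MetricID -> Metric.
metric_dim = {
    1: "NumberofCars",
    2: "RevenuePerCar",
    3: "ExpensesPerCar",
    4: "BreakEvenPerCar",
}

KEY_COLUMNS = ("Date", "County", "CarType")


def unpivot(rows, metrics):
    """Turn each wide row into one narrow row per metric."""
    narrow = []
    for row in rows:
        keys = {col: row[col] for col in KEY_COLUMNS}
        for metric_id, name in metrics.items():
            # "Value" is an assumed column name for the measure itself.
            narrow.append({**keys, "MetricID": metric_id, "Value": row[name]})
    return narrow


narrow_rows = unpivot(wide_rows, metric_dim)

# Reading one metric back goes through the metrics dimension (the "join").
revenue = [r["Value"] for r in narrow_rows
           if metric_dim[r["MetricID"]] == "RevenuePerCar"]

print(len(narrow_rows), revenue)
```

Note that the narrow layout multiplies the row count by the number of metrics, which is worth weighing against the flexibility of adding a metric with a single new dimension row.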

"Usually the date is stored in sales table or product table.
But here, why did they create a separate dimension table for date (Dim_date)?"
You should provide the link.
A good place to start with the basic concepts is:
http://www.ralphkimball.com/
Pick up some of his books and start going through them.
My recommendation would be
The Data Warehouse Toolkit, 2nd Edition: The Complete Guide to Dimensional Modeling
John Wiley & Sons, 2002 (436 pages)
Good luck.

Star schema design

Hi,
I know that in the classic star schema the dimension tables sit within the InfoCube, so we cannot use a dimension table in any other cube; we need a separate dimension table for that cube even though it might hold the same data. I also know that, to overcome this redundancy, the extended star schema came into the picture, where we have SID tables, keep the dimension tables out of the cube, and reuse them across many cubes.
What I don't understand is this: instead of having separate SID tables to link the dimension and fact tables, why can't we make the dimension tables generic and keep them out of the InfoCube, so that we can use the same dimension table for many InfoCubes? In that case we wouldn't need SID tables.
Suppose I have one InfoCube with the dimensions vendor, material and customer, whose key figures are quantity and price, and a separate InfoCube with the dimensions material, customer and location, whose key figure is something else. Why can't I keep the dimensions out of the InfoCubes and use the material and customer dimensions for both?

Your dimension tables are filled based on your transaction data, which is why dimension table design is very important: you decide how to group related characteristics of the incoming transaction data into your dimension tables.
The dimension tables hold SIDs, which in turn point to master data. In the classic star schema the dimension tables are outside the cube but have the master data within them, which is overcome by the extended star schema.
The reason why dimension tables cannot be reused is that the DIM IDs and SIDs in the dimension table correspond to the transaction data in the cube. Unless the DIM IDs in both your cubes match, you cannot reuse the dimension tables; and if they did match, you would have exactly the same data in both cubes, in which case you would not need two cubes.
Example :
Cube 1: Fact Table
Dim1ID | Dim2ID | KF1
1 | 01 | 100
2 | 02 | 200
Dimension Table Dim 1 (assuming that there are 2 characteristics in this dimension); here Dim1ID is the key
Dim1ID | SID1 | SID2
1 | 20 | 25
2 | 30 | 35
Dimension Table Dim 2; here Dim2ID is the key
Dim2ID | SID1 | SID2 | SID3
01 | 30 | 45
02 | 45 | 40
Here the DIM IDs for the cube's fact table are generated at load time from the NRIV table (read up on number ranges). This means that you cannot control DIM ID generation across cubes, and therefore you cannot reuse dimension tables.
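
A toy model of the point about DIM ID generation, in Python. It is only an illustration under simplifying assumptions: the `Cube` class and its counter are hypothetical and do not reproduce how NRIV number ranges actually work. Each cube hands out DIM IDs from its own counter in load order, so the same SID combination ends up with different DIM IDs in different cubes.

```python
from itertools import count


class Cube:
    """Toy cube: DIM IDs come from a counter private to this cube (assumption)."""

    def __init__(self, name):
        self.name = name
        self._next_dim_id = count(1)
        self.dim_table = {}  # SID combination -> DIM ID

    def dim_id_for(self, sid_combination):
        # A new combination of SIDs gets the next free DIM ID at load time.
        if sid_combination not in self.dim_table:
            self.dim_table[sid_combination] = next(self._next_dim_id)
        return self.dim_table[sid_combination]


cube1 = Cube("cube1")
cube2 = Cube("cube2")

# Same SID combinations, but loaded in a different order in each cube.
for combo in [(20, 25), (30, 35)]:
    cube1.dim_id_for(combo)
for combo in [(30, 35), (20, 25)]:
    cube2.dim_id_for(combo)

print(cube1.dim_table)
print(cube2.dim_table)
```

Because the mapping depends on each cube's own load history, a fact row in one cube that carries DIM ID 1 would point at the wrong SID combination if it were resolved against the other cube's dimension table.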

STAR schema designing

Hi,
What factors should we consider when designing a star schema?
Thanks
Vaishali

Vaishali,
The major things to consider while designing a star schema could be:
1. Proper maintenance of master data, which will be shared across all the InfoCubes.
2. Deciding upon dimensions: which characteristics should be assigned to which dimension? Try to reduce the number of dimensions.
3. Design dimensions based on the characteristic relationships (1:M).
4. If a characteristic has a large number of values, e.g. document number, which comes into BW in huge volumes, it can be designed as a line item dimension instead of being assigned to a regular dimension.
Hope this helps.

Star Schema/Cube Question

I am fairly new to OLAP and cubes, yet have the task of creating one from a star schema.
The schema has 2 fact tables, instead of most examples I see online with 1 fact table. Should this be 2 cubes? Can it be 1? Any information could prove helpful...thanks!

Assuming the fact tables have related dimensions, then I would recommend one Analytic Workspace with multiple cubes inside that AW. That would allow them to share one or more dimensions. There is an example of this in the Oracle By Example lesson: http://www.oracle.com/technology/obe/obe10gdb/bidw/awm/awm.htm
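
As a relational analogue of two cubes sharing a dimension (not the Analytic Workspace API itself), here is a small sqlite3 sketch. All table names and rows are hypothetical. Each fact table is aggregated on its own and the results are then joined through the shared dimension, which avoids double counting when the two facts have different numbers of rows per dimension member.

```python
import sqlite3

# Hypothetical schema: two fact tables that share one product dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales   (product_key INTEGER, amount REAL);
CREATE TABLE fact_returns (product_key INTEGER, amount REAL);
INSERT INTO dim_product  VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO fact_sales   VALUES (1, 100), (1, 50), (2, 70);
INSERT INTO fact_returns VALUES (1, 20);
""")

# Aggregate each fact separately, then join both results to the dimension.
rows = conn.execute("""
    SELECT p.name,
           COALESCE(s.total, 0) AS sales_total,
           COALESCE(r.total, 0) AS returns_total
    FROM dim_product p
    LEFT JOIN (SELECT product_key, SUM(amount) AS total
               FROM fact_sales GROUP BY product_key) s
           ON s.product_key = p.product_key
    LEFT JOIN (SELECT product_key, SUM(amount) AS total
               FROM fact_returns GROUP BY product_key) r
           ON r.product_key = p.product_key
    ORDER BY p.name
""").fetchall()

for row in rows:
    print(row)
```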

XML Schema Design question

Hello all,
I am new to XML Schema design and struggling with designing my first XML Schema. Here is my problem.
I have almost 200 elements in my database which I have to use to design XSDs. I have customer name (last name, first name, DOB, etc.) and customer address (office address, home address, etc.), and at the same time I have other parties such as patient name (last name, first name, DOB, etc.) and patient address (home address, office address).
And then we have some dollar-related data such as patient insurance amount, copay amount, total amount and many more.
What I want to know is the best approach to designing an XML schema for this kind of system: should I create one schema (XSD) for all 200 attributes, or should I create a separate schema for customers (including names and addresses along with the dollar amount data) and a similar XSD for patient data? Some of the XML documents that I will create from these schemas would be based on both customer and patient information.
Thank you.
Regards
Suhail Ahmad

It's hard to tell what the best design is. But in order to simplify access to this data through other program APIs such as JAXB, you may start by defining objects such as schema types/elements for customer, patient and address.
Then you can associate the related data with these objects.
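
To illustrate the idea of shared types, here is a rough Python sketch (rather than XSD or JAXB) using dataclasses and the standard xml.etree.ElementTree module. The class names, element names and sample values are all assumptions. One `Address` type and one `Person` type are reused for both customer and patient, which mirrors defining a reusable complex type once in a schema and referencing it from several elements.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Address:
    kind: str    # e.g. "home" or "office"
    street: str
    city: str


@dataclass
class Person:
    last_name: str
    first_name: str
    dob: str
    addresses: List[Address] = field(default_factory=list)


def person_element(tag: str, person: Person) -> ET.Element:
    """Serialize a Person under the given tag (customer, patient, ...)."""
    el = ET.Element(tag)
    ET.SubElement(el, "lastName").text = person.last_name
    ET.SubElement(el, "firstName").text = person.first_name
    ET.SubElement(el, "dob").text = person.dob
    for a in person.addresses:
        addr = ET.SubElement(el, "address", kind=a.kind)
        ET.SubElement(addr, "street").text = a.street
        ET.SubElement(addr, "city").text = a.city
    return el


# Hypothetical document that carries both customer and patient data.
root = ET.Element("claim")
root.append(person_element("customer", Person(
    "Doe", "Jane", "1980-01-01", [Address("office", "1 Main St", "Springfield")])))
root.append(person_element("patient", Person(
    "Doe", "John", "2010-05-05", [Address("home", "2 Oak Ave", "Springfield")])))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```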

Download Dimensional Modeling Book (Star Schema Design Book)

Hello Friends,
You can download an IBM Book on Dimensional Modeling from the following
link:
http://www.redbooks.ibm.com/abstracts/sg247138.html?Open
This is an excellent IBM Redbook on Data warehouse design and
Dimensional Modeling design.
Great to read,
Dave

Dear Krish,
Please try this link and let me know if you can open it. If not, I will email you the book:
http://www.redbooks.ibm.com/abstracts/sg247138.html?Open
My Email is [email protected]
Thanks
Dave

Download IBM Book on Star Schema Design (Dimensional Modeling)

Hello Friends,
You can download an IBM Book on Dimensional Modeling from the following link:
http://www.redbooks.ibm.com/redbooks.nsf/e9abd4a2a3406a7f852569de005c909f/e235dc46161249d38525703e00036135?OpenDocument
The Book is very well explained.

How do we design and create a star schema in Oracle BI?

We can use Informatica to generate ETL. But how do we design the star schema? Is there a design tool like Oracle Designer?
What is the purpose of the DAC?

Hi,
You can handle the star schema design in the BMM layer; there is no separate tool for that.
Refer to:
http://gerardnico.com/wiki/data_modeling/star_schema
DAC:
The Data Warehouse Administration Console (DAC) works together with Informatica to accomplish the ETL for the pre-packaged BI applications.
- DAC publishes the changes from the OLTP system
- Informatica extracts the changes from the change log published by DAC as well as from the base tables
- Informatica loads the data into the stage tables and the target tables in the data warehouse
- DAC manages performance by dropping indexes, truncating stage tables, rebuilding the indexes, and analyzing the tables during the process
If you do not use DAC, you have to write your own custom change capture process and design from scratch an ETL method that allows the ETL process to restart from the point of failure at record level. The biggest saving is that DAC can survive an upgrade, while your custom processes cannot.
Refer to:
http://obieetraining11.blogspot.in/2012/06/how-to-use-dac-source-system-parameter.html
Hope this helped/ answered.
Regards
MuRam
Edited by: MuRam on Jun 25, 2012 2:37 AM

Extended Star Schema

Hi
Right now I'm trying to implement SAP HR infotypes in SAP BW. I am trying to combine 10 infotypes (all Personnel Administration) into one InfoCube. The confusing part is that when I look at my dimensions, there is something strange (please see below).
Dimension 1 (from PA0016)
ZDATEFROM16
ZDATETO16
ZXXX (60 characters, no master data attributes or texts)
ZYYY (30 characters, no master data attributes or texts)
ZAAA (3 characters, master data texts only)
Dimension 2 (from PA0023)
ZDATEFROM23
ZDATETO23
ZCCC (60 characters, no master data attributes or texts)
ZBBB (30 characters, no master data attributes or texts)
ZDDD (20 characters, no master data attributes or texts)
When I look at my dimensions, they seem strange because most of the characteristics don't have master data (texts, attributes, hierarchies).
Does this violate the extended star schema design? Thank you.
Regards,
Satria

You can use characteristics such as ZXXX without master data (text, attributes, hierarchy) in the extended star schema. It is not a violation. It just means that, in business terms, you really don't need text, attributes or a hierarchy for this characteristic.
When you do the data modeling for your HR cube, please consider the design: can some characteristics be attributes of another one? Will you use a hierarchy, such as organization level, for some characteristic? Characteristics with text, attributes or hierarchies can provide more flexibility in reporting.

Extended Star Schema doubt

Hi ALL,
Fact table --> Dimension table --> SID table --> Master data values.
Fact table:
Contains Key Figures and Dimensional id's
Dimension table:
Contains Dim Id & SID
SID table:
SID and Master object ID ( eg: Customer ID)
Master Data Table:
contains Customer Attributes , Texts , Hierarchies for that Customer ID.
This is the extended star schema design.
My doubt:
1) What will happen if we place the Customer ID in the dimension table itself (with no SID table after it)?
Fact table --> Dimension table --> Master data table
2) Instead of the SID table, we could place the Customer ID directly in the dimension table and so remove one layer between the dimension table and the master data table. Is that correct or not?
Can anyone clarify my doubt?
Regards,
Arun.M.D

SID means 'surrogate ID'. That is a system-created ID, as you know. Its main purpose is to speed up the search.
Mostly, there is a rule for customer or material IDs,
e.g. that they should be CHAR 10 or CHAR 16.
Alphanumeric fields of this kind are harder to search than integers. Moreover, your customer ID can be 10 digits long, but this does not mean you will have billions of customers. This is the main reason an internal ID is produced. If you have 10000 customers, your SID will be at most 10000.
However, if your customer IDs start from 1 and grow like integers, then your argument would be valid (but there is still no way to skip SID creation and use the characteristic ID directly in the fact table).
Also, as mentioned by other friends, there is the line item dimension property if you have only one characteristic in a dimension. That simply skips the DIM ID creation step and puts your SID into the fact table (since you have only one characteristic in the dimension, no combination is possible).
Hope this helps.
Derya
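
A minimal Python sketch of the surrogate ID idea described above. The customer IDs are invented, and the `assign_sids` helper is hypothetical, not an SAP function. Long character keys are mapped to small, dense integers in order of first appearance, so the highest SID equals the number of distinct customers regardless of how long the business key is.

```python
def assign_sids(business_keys):
    """Map each distinct business key to a dense integer SID (1, 2, 3, ...)."""
    sid_of = {}
    for key in business_keys:
        if key not in sid_of:
            sid_of[key] = len(sid_of) + 1
    return sid_of


# Hypothetical CHAR 10 customer IDs arriving with transaction data.
incoming = ["CUST000042", "CUST900001", "CUST000042", "CUST123456"]
sid_of = assign_sids(incoming)

# Downstream tables store the compact integer instead of the CHAR key.
sids_in_load_order = [sid_of[key] for key in incoming]

print(sid_of)
print(sids_in_load_order)
```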

Star Schema and Oracle 11gR2 ?

I know the star schema (ROLAP) and have implemented a couple of them. Apart from the general design principles of dimensions, facts, surrogate keys, etc., what specific items are needed in Oracle 11gR2?
Someone talked about over 10 conditions/prerequisites for star schema (ROLAP) implementations in Oracle 11gR2. I did some searching, but I did not get any hits.
Do we design a star schema (ROLAP) differently in Oracle 11gR2?
Any pointers welcome.
Thanks for helping.

Hi,
From my experience there are no specific requirements for the star schema design when using OWB 11.2.
When using the OWB ETL Option (extra license required), one may use the OWB dimensions and cubes.
These make mapping development easier, since support for SCD2 is built into the dimension operators. Loading the cube is simplified because the lookup of the surrogate key from the dimension is built into the cube operator.
These OWB objects will deploy specific dimension and fact tables. If you already have existing ones, you must modify them manually.
I implemented several projects without these advanced features. Basically I did the same in OWB as I would have done using hand-coded SQL and PL/SQL. And it worked just fine.
If you find those 10 conditions, please post them here. I'm curious to learn about them!
Regards,
Carsten.

Newbie question : why is star schema fast and efficient?

Hi all,
just a stupid question, but I haven't been able to find a proper
answer so far...
Why is star schema a good design for Data Marts and DWH?
What is the underlying reason that makes it attractive
performance wise?
Why wouldn't just one big table with all the data in it and with
the proper indexes be enough?
Thanks all!!
Regards
Vincent

There are several reasons to use star schemas, particularly in
Oracle.
A flat table like you asked about looks attractive but has
several flaws, i.e. massive data redundancy, no logical
groupings, no aggregation (or additional redundant data
aggregated), etc.
A star schema is semi-denormalized to allow easy reporting. A
truly normalized system is difficult to report against because
you may have to join many tables to return just 2 pieces of
related data. A star schema enables you to join only a single
dimension table to the fact table to return the same 2 pieces of
data. If you're returning many pieces of data, a star schema
keeps access very simple. Most third-party reporting tools
recognize star schemas and will build your where clauses behind
the scenes, making them a lot more useful to end users.
Oracle is adding optimizations to the CBO for star schemas.
Using dimensions, materialized views, partitions, IOTs, etc.
greatly enhances performance for queries against massive amounts
of data. It does make loading the data more difficult, but the
trade-off at query time is worth it.
A flat table structure, besides having a lot of redundant data,
is hard to optimize. When you have terabytes of data, a flat
table structure gets scary even with indexes.
This is just my opinion, hope that helps.
Lewis
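
A small sqlite3 sketch of the redundancy argument, with made-up table names and data. The same sales are stored once as a flat table and once as a fact table plus a product dimension. Re-categorising a product touches every matching row in the flat table but a single row in the dimension, while both layouts return the same totals per category.

```python
import sqlite3

# Hypothetical data: the same four sales stored flat and as a star.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flat_sales  (sale_id INTEGER PRIMARY KEY, product TEXT,
                          category TEXT, amount REAL);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT,
                          category TEXT);
CREATE TABLE fact_sales  (sale_id INTEGER PRIMARY KEY, product_key INTEGER,
                          amount REAL);
INSERT INTO flat_sales VALUES
    (1, 'widget', 'hardware', 10), (2, 'widget', 'hardware', 20),
    (3, 'widget', 'hardware', 30), (4, 'gadget', 'toys', 5);
INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'gadget', 'toys');
INSERT INTO fact_sales  VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 5);
""")

# Changing a descriptive attribute: many rows in the flat table, one in the star.
flat_updated = conn.execute(
    "UPDATE flat_sales SET category = 'tools' WHERE product = 'widget'").rowcount
star_updated = conn.execute(
    "UPDATE dim_product SET category = 'tools' WHERE product = 'widget'").rowcount

flat_totals = conn.execute("""
    SELECT category, SUM(amount) FROM flat_sales
    GROUP BY category ORDER BY category
""").fetchall()
star_totals = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()

print(flat_updated, star_updated, flat_totals == star_totals)
```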

Design Fixed Assets Star Schema from OLTP DB

Hi,
Scope: design a logical Fixed Assets star schema for demonstration with OLTP tables.
Our platform is Oracle 10g Forms and Reports deployed on Oracle 10g App Server. At the moment we don't have a data warehouse constructed. We are pumping OLTP data into a staging DB through jobs and then using materialized views to get the data into the DWH DB, which is in progress.
OBIEE 10g is installed and working fine for testing.
We are planning to implement OBIEE for reporting. As a starting point I would like to design a Fixed Assets star schema to demonstrate FA reports in OBIEE from OLTP tables.
FA OLTP tables:
AC_UNIT_MASTR - Business Units
AC_ACNT_MASTR - Nominal Codes
AC_COST_CENTR_MASTR - Departments
AC_MANTN_ASTS - Assets transaction table
AC_AST_BAL_DETLS - Period-wise Asset summary
AC_AST_BAL_DETLS_V - View
To achieve this:
1) Import the tables in the physical layer and set physical joins.
2) Identify and create dimensions, set complex joins, and then move to the presentation layer.
Please suggest the best approach to designing the repository.
Thanks
Regards,
Kulkarni
Edited by: hai_shailesh on Jan 25, 2012 11:36 PM
Edited by: hai_shailesh on Jan 25, 2012 11:38 PM
Edited by: hai_shailesh on Jan 25, 2012 11:39 PM

Hi Saichand,
Thanks for the response.
I have already referred to that doc and worked through it in practice. Now I have to work on Finance data, and as a starting point I am working with the Fixed Assets module. I have already designed a centralised fact with dimensions as below.
In the physical and BMM layers I defined the relationships as below:
AC_UNIT_MASTR_D --> AC_AST_BAL_DETLS_F
AC_ACNT_YEAR_MASTR_D --> AC_AST_BAL_DETLS_F
AC_ACNT_PERD_MASTR_D --> AC_AST_BAL_DETLS_F
AC_COST_CENTR_MASTR_D --> AC_AST_BAL_DETLS_F
AC_AST_GRP_MASTR_D --> AC_AST_BAL_DETLS_F
AC_AST_SUBGRP_MASTR_D --> AC_AST_BAL_DETLS_F
and
AC_ACNT_YEAR_MASTR_D --> AC_ACNT_PERD_MASTR_D
When I try to create a dimension for periods (AC_ACNT_PERD_MASTR), the periods dimension is created with two tables:
AC_ACNT_YEAR_MASTR_D, AC_ACNT_PERD_MASTR_D
Please advise.
Regards,
Kulkarni

Star schema question

Hi,
I have a question about the realization of the star schema. I have familiarized myself with the basic concepts of dimensions and fact tables. But what I don’t get is how I “combine” the dimensions with the fact table. I know that the fact table includes the dimension IDs and measures. But do I use the joiner operator in OWB to join on the dimension IDs (the IDs of the dimensions being the criteria for the join condition) to create the fact table?
My understanding is as follows, when I have for example 3 dimensions (product dimension, sales dimension, time dimension) and one fact table.
The realization looks like this:
product dim ->
sales dim -> joiner operator = fact table with the IDs of the dims and measure
time dim ->
Please correct me if I am wrong.
If there is something I can read on this subject, it would be very nice if someone could post it.
Thx

Hi,
First you load the dimensions. Every entry has an ID (surrogate key) and some business key (coming from the data source).
When you load the fact, you use the business key from the data source to join (using a joiner or lookup operator) the dimension and get the ID (surrogate key) from it. You only load the ID and the measures into the fact table.
Make sure to handle the case where the business key is null or no entry can be found in the dimension.
If you query the fact table you must always join the dimensions.
Regards,
Carsten.
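
The load steps described above can be sketched in plain Python (this is not OWB mapping code). The business keys, the sample rows and the `UNKNOWN_KEY` convention of -1 are assumptions for illustration. Each source row's business key is looked up in the already loaded dimension, only the surrogate key and the measure go into the fact row, and null or unmatched keys fall back to a placeholder member.

```python
# Assumed convention: a reserved surrogate key for "unknown" dimension members.
UNKNOWN_KEY = -1

# Dimension loaded first: business key -> surrogate key (hypothetical values).
product_dim = {"P-100": 1, "P-200": 2}

# Source rows for the fact load; note the null and the unmatched business key.
source_rows = [
    {"product_code": "P-100", "amount": 10.0},
    {"product_code": None,    "amount": 5.0},
    {"product_code": "P-999", "amount": 7.5},
]


def to_fact_row(row, lookup):
    """Resolve the business key to a surrogate key; keep only key and measure."""
    surrogate_key = lookup.get(row["product_code"], UNKNOWN_KEY)
    return (surrogate_key, row["amount"])


fact_rows = [to_fact_row(row, product_dim) for row in source_rows]
print(fact_rows)
```

Routing unmatched rows to a placeholder member keeps the measures in the fact table; rejecting them into an error table is an equally common choice and depends on the reporting requirements.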
