Surrogate key for fact
Hi all,
I don't know whether existing records in a fact table can be updated during ETL. If they can, how should I handle it? Is it advisable to add a surrogate key to a fact table, acting as its primary key, so that existing fact records can be updated? This isn't specific to OWB; it's a general ETL question.
Regards,
Sumanta
Yes, you can update a fact.
You generally do this by adding a unique constraint to the fact table which comprises the identifying primary key columns from the source tables, but exclusive of the surrogate key. The surrogate key will be the actual primary key of the table.
Now, when you make an insert/update mapping you include a sequence that generates your surrogate key, and then select the 'match by' on the fact table to match by the unique key (actual source table values which determine uniqueness) rather than the primary key. Then, on the surrogate key column on the fact table which is being populated by the sequence, you set its property to not update the surrogate key when an update is being applied to the row.
This way the update statement will not update the surrogate key when updating the fact based upon the source identifiers that identify this fact.
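The matching logic described above can be sketched roughly as follows. This is only an illustration, not OWB-generated code: it uses SQLite through Python as a stand-in for Oracle, an AUTOINCREMENT column in place of a sequence, and a hypothetical sales_fact table whose columns (order_id, line_no, amount) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        sales_sk INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate key (stand-in for a sequence)
        order_id INTEGER NOT NULL,                  -- identifying columns from the source
        line_no  INTEGER NOT NULL,
        amount   REAL    NOT NULL,
        UNIQUE (order_id, line_no)                  -- the key the load matches on
    )
""")

def upsert_fact(conn, order_id, line_no, amount):
    """Update the row matched on the source identifiers; insert if none matches."""
    cur = conn.execute(
        "UPDATE sales_fact SET amount = ? WHERE order_id = ? AND line_no = ?",
        (amount, order_id, line_no),
    )
    if cur.rowcount == 0:
        # No match: a new surrogate key is generated by the insert.
        conn.execute(
            "INSERT INTO sales_fact (order_id, line_no, amount) VALUES (?, ?, ?)",
            (order_id, line_no, amount),
        )

upsert_fact(conn, 1001, 1, 50.0)
upsert_fact(conn, 1001, 2, 20.0)
upsert_fact(conn, 1001, 1, 75.0)  # same source identifiers: updated in place

rows = conn.execute(
    "SELECT sales_sk, order_id, line_no, amount FROM sales_fact ORDER BY sales_sk"
).fetchall()
print(rows)
```

The UPDATE never touches sales_sk, so the surrogate key assigned on first insert survives later updates to the same fact.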
I'm sure I could explain this more clearly... but I'm very pressed for time right now.
Cheers,
Mike
Similar Messages
-
How we use Surrogate Keys for snowflake dimension
Hi All,
My question is: how do we use surrogate keys for a snowflake dimension?
I heard from somebody that surrogate keys only work with a star schema.
Please correct me if I am wrong.
Regards,
Manish
Hi manishcal16PPS,
According to your description, you can create only a natural key in your dimension, and it does not work when you use a surrogate key. Right?
In Analysis Services, a snowflake schema means that a dimension is represented by more than one dimension table; in other words, it takes multiple dimension tables to define a dimension. A surrogate key is just an extra, redundant, unique key based on the natural key, so there is no direct relationship between surrogate keys and the snowflake schema, and no limitation on combining them.
In this scenario, since there is a relationship between the two dimensions, you should create a natural key. For choosing between a natural key and a surrogate key, please refer to the article below:
Surrogate Key vs. Natural Key
For understanding star/snowflake schema, please see:
Understanding Star and Snowflake Schemas
Regards,
Simon Hou
TechNet Community Support -
Best Practice loading Dimension Table with Surrogate Keys for Levels
Hi Experts,
How would you load an Oracle dimension table that has a hierarchy of at least 5 levels, with a surrogate key at each level and a unique dimension key for the dimension table?
With OWB, using surrogate keys at every level of a hierarchy is an integrated feature. You don't have to care about the parent-child relation. The load process of the mapping generates the right keys and takes care of the relation between parent and child inside the dimension key.
I tried to use one interface per level and created a surrogate key with a native Oracle sequence.
After that I put all the interfaces into one big interface with a union data set per level and added lookups for the right parent-child relation.
I think it is a bit too complicated to build the interface like that.
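The idea of one surrogate key per level member plus a lookup of the parent key can be sketched in a few lines. This is a simplified illustration in plain Python, not ODI or OWB code: the three-level geography hierarchy, the column names, and the use of a single counter in place of an Oracle sequence are all assumptions made for the example.

```python
from itertools import count

# Hypothetical source rows: one row per leaf, carrying the business key of each level.
source = [
    {"country": "DE", "region": "Bavaria", "city": "Munich"},
    {"country": "DE", "region": "Bavaria", "city": "Nuremberg"},
    {"country": "DE", "region": "Hesse", "city": "Frankfurt"},
]
levels = ["country", "region", "city"]  # top level first

seq = count(1)   # stands in for a database sequence
keys = {}        # (level, business-key path) -> surrogate key
dimension = []   # one dimension row per level member

for row in source:
    parent_key = None
    path = ()
    for level in levels:
        path += (row[level],)
        ident = (level, path)
        if ident not in keys:
            # First time this member is seen: assign a key and record its parent.
            keys[ident] = next(seq)
            dimension.append({
                "dim_key": keys[ident],
                "level": level,
                "name": row[level],
                "parent_key": parent_key,
            })
        parent_key = keys[ident]  # lookup result used by the next level down

for member in dimension:
    print(member)
```

Keying each member on its full path (not just its own name) keeps two cities with the same name in different regions apart.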
I would be more than happy to hear any suggestions. Thank you in advance!
negib
Edited by: nmarhoul on Jun 14, 2012 2:26 AM
Hi,
I do like the level keys feature of OWB. It makes aggregate tables very easy to implement if you're sticking with a star schema.
Sadly, there is nothing off the shelf in the built-in knowledge modules of ODI. It doesn't support creating dimension objects in the database by default, but there is nothing stopping you from coding your own knowledge module (perhaps using flex fields on the datastore to tag column attributes as needed).
Your approach is what I would have done, possibly using a view (if you don't mind having it external to ODI) to make the interface simpler. -
Loading data into Fact/Cube with surrogate keys from SCD2
We have created 2 dimensions, CUSTOMER & PRODUCT with surrogate keys to reflect SCD Type 2.
We now have the transactional data that we need to load.
The data has a customer id that relates to the natural key of the customer dimension and a product id that relates to the natural key of the product dimension.
Can anyone point us to documentation that explains the steps necessary to populate our fact table with the appropriate surrogate keys?
We assume that we need a lookup between the current version of the customer and the incoming transaction data, but we are not sure how to go about this.
Thanks in advance for your help.
Laura
Hi Laura
There is another way of handling SCD and changing facts: use a different table for the history. Let me explain.
The standard approach has these three steps:
1. Determine if a change has occurred
2. End Date the existing record
3. Insert a new record into the same table with a new Start Date and dummy End Date, using a new surrogate key
The modified approach also has three steps:
1. Determine if a change has occurred
2. Copy the existing record to a history table, setting the appropriate End Date en route
3. Update the existing record with the changed information giving the record a new Start Date, but retaining the original surrogate key
What else do you need to do?
The current table can use the surrogate key as the primary key, with the natural key being a unique key. The history table has the surrogate key and the end date in the primary key, with a unique key on the natural key and the end date. For end-user queries, which go against current data more than 90% of the time, this method is much faster because only current records are in the main table and no date filters are needed. If a user wants to query history and current data combined, a view that uses a union of the main and history tables can be used.
One more thing to note: if you adopt this approach for your dimension tables, each dimension member always keeps the same surrogate key. This means that if you follow a strict Kimball approach and make the primary key of the fact table a composite key made up of the foreign keys from each dimension, you NEVER have to rework this primary key. It always points to the correct dimension row, thereby eliminating the need for a surrogate key on the fact table!
I am using this technique to great effect in my current contract and performance is excellent. The splitter at the end of the map splits the data into three sets. Set one is for an insert into the main table when there is no match on the natural key. Set two is when there is a match on the natural key and the delta comparison has determined that a change occurred. In this case the current row needs to be copied into history, setting the End Date to the system date en route. Set three is also when there is a match on the natural key and the delta comparison has determined that a change occurred. In this case the main record is simply updated with the Start Date being reset to the system date.
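As a rough sketch of the three sets described above, assuming a hypothetical customer_dim / customer_dim_hist pair with a single tracked attribute (city), and using SQLite through Python in place of an OWB map on Oracle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT, -- surrogate key, never changes
        customer_id TEXT NOT NULL UNIQUE,              -- natural key
        city        TEXT NOT NULL,
        start_date  TEXT NOT NULL
    );
    CREATE TABLE customer_dim_hist (
        customer_sk INTEGER NOT NULL,
        customer_id TEXT NOT NULL,
        city        TEXT NOT NULL,
        start_date  TEXT NOT NULL,
        end_date    TEXT NOT NULL,
        PRIMARY KEY (customer_sk, end_date),
        UNIQUE (customer_id, end_date)
    );
""")

def load_customer(conn, customer_id, city, load_date):
    row = conn.execute(
        "SELECT city FROM customer_dim WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        # Set one: no match on the natural key, so insert into the main table.
        conn.execute(
            "INSERT INTO customer_dim (customer_id, city, start_date) VALUES (?, ?, ?)",
            (customer_id, city, load_date),
        )
    elif row[0] != city:
        # Set two: copy the current row to history, setting the end date en route.
        conn.execute(
            """INSERT INTO customer_dim_hist
                   (customer_sk, customer_id, city, start_date, end_date)
               SELECT customer_sk, customer_id, city, start_date, ?
               FROM customer_dim WHERE customer_id = ?""",
            (load_date, customer_id),
        )
        # Set three: update the main row in place, keeping its surrogate key.
        conn.execute(
            "UPDATE customer_dim SET city = ?, start_date = ? WHERE customer_id = ?",
            (city, load_date, customer_id),
        )

load_customer(conn, "C1", "Leeds", "2024-01-01")
load_customer(conn, "C1", "Leeds", "2024-02-01")  # no change detected, nothing happens
load_customer(conn, "C1", "York", "2024-03-01")   # change: history row plus in-place update

current = conn.execute(
    "SELECT customer_sk, customer_id, city, start_date FROM customer_dim"
).fetchall()
history = conn.execute(
    "SELECT customer_sk, city, start_date, end_date FROM customer_dim_hist"
).fetchall()
print(current, history)
```

The load date is passed in here instead of taken from the system clock only to keep the example repeatable.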
By the way, I intend to put a white paper together on this approach if anyone is interested.
Hope this helps
Regards
Michael -
How to Maintain Surrogate Key Mapping (cross-reference) for Dimension Tables
Hi,
What would be the best approach on ODI to implement the Surrogate Key Mapping Table on the STG layer according to Kimball's technique:
"Surrogate key mapping tables are designed to map natural keys from the disparate source systems to their master data warehouse surrogate key. Mapping tables are an efficient way to maintain surrogate keys in your data warehouse. These compact tables are designed for high-speed processing. Mapping tables contain only the most current value of a surrogate key— used to populate a dimension—and the natural key from the source system. Since the same dimension can have many sources, a mapping table contains a natural key column for each of its sources.
Mapping tables can be equally effective if they are stored in a database or on the file system. The advantage of using a database for mapping tables is that you can utilize the database sequence generator to create new surrogate keys. And also, when indexed properly, mapping tables in a database are very efficient during key value lookups."
We have a requirement to implement cross-reference mapping tables with Natural and Surrogate Keys for each dimension table. These mappings tables will be populated automatically (only inserts) during the E-LT execution, right after inserting into the dimension table.
Does anyone have any idea how to implement this in ODI?
Thanks,
Danilo
Hi,
First of all, please avoid bolding text. Second, according to Kimball (if I remember correctly) this is a 1:1 mapping, so there is no surrogate key.
Beyond that, personally I would use a lookup table:
http://www.odigurus.com/2012/02/lookup-transformation-using-odi.html
or make a simple outer join filtered on your "Active_Flag" column (remember that this filter needs to be inside the outer join condition).
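To show why the position of the filter matters, here is a small sketch. The table and column names (stg_orders, customer_dim, crm_id, active_flag) are hypothetical, and SQLite through Python stands in for the real target database; the join behaviour shown is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_sk INTEGER PRIMARY KEY,
        crm_id      TEXT NOT NULL,     -- natural key from the source
        active_flag INTEGER NOT NULL   -- 1 = current version
    );
    CREATE TABLE stg_orders (order_no INTEGER, crm_id TEXT);
    INSERT INTO customer_dim VALUES (1, 'CRM-7', 0);
    INSERT INTO customer_dim VALUES (2, 'CRM-7', 1);
    INSERT INTO stg_orders VALUES (100, 'CRM-7');
    INSERT INTO stg_orders VALUES (101, 'CRM-9');  -- no dimension row yet
""")

# Filter inside the join condition: unmatched staging rows are kept with a NULL key.
filter_in_join = conn.execute("""
    SELECT o.order_no, d.customer_sk
    FROM stg_orders o
    LEFT JOIN customer_dim d
           ON d.crm_id = o.crm_id AND d.active_flag = 1
    ORDER BY o.order_no
""").fetchall()

# Filter in WHERE: the NULL-extended rows fail the test and are dropped.
filter_in_where = conn.execute("""
    SELECT o.order_no, d.customer_sk
    FROM stg_orders o
    LEFT JOIN customer_dim d ON d.crm_id = o.crm_id
    WHERE d.active_flag = 1
    ORDER BY o.order_no
""").fetchall()

print(filter_in_join)
print(filter_in_where)
```

With the filter in the join, order 101 survives with an empty key and can be routed to error handling or a default member; with the filter in WHERE it silently disappears.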
Let us know
Francesco -
Surrogate keys -- which field to be used?
Hello,
I am wondering what field would be best suited to be used as surrogate key for a data warehouse. Since my fact table is likely to have hundreds of thousands of records, I am thinking that uniqueidentifier makes more sense than some type of integer.
Any thoughts on this?
Also I am designing my DW (I am really starting with a data-mart) with a lot of data coming from Dynamics CRM and I was wondering if a surrogate key is really necessary if we are not going to ever delete records from the data sources.
Any advice and insights are greatly appreciated.
Regards,
P.
Since my fact table is likely to have hundreds of thousands of records, I am thinking that uniqueidentifier makes more sense than some type of integer.
Hello,
A few hundred thousand rows is very little data. The integer data type performs best in SQL Server, much better than GUIDs, and a 4-byte int can hold about 4 billion distinct values (about 2 billion if you use only positive numbers).
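A quick back-of-the-envelope check of the sizes involved, as a sketch only: it counts raw key bytes for a hypothetical 500,000-row table and ignores index, page and row overhead, which vary by database.

```python
import struct
import uuid

INT_BYTES = struct.calcsize("<i")      # a 32-bit signed integer
GUID_BYTES = len(uuid.uuid4().bytes)   # a GUID / uniqueidentifier value

int_max = 2**31 - 1        # largest positive 32-bit signed value
distinct_ints = 2**32      # all values, negative range included

rows = 500_000             # hypothetical fact table size
int_key_bytes = rows * INT_BYTES
guid_key_bytes = rows * GUID_BYTES

print(f"int key:  {INT_BYTES} bytes, max {int_max:,}, {distinct_ints:,} distinct values")
print(f"guid key: {GUID_BYTES} bytes")
print(f"{rows:,} rows: {int_key_bytes:,} vs {guid_key_bytes:,} bytes of key data")
```

The same factor of four applies again to every index and every foreign key column that carries the key.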
Olaf Helper -
What is the best practice for creating primary key on fact table?
What is the best practice for the primary key on a fact table?
1. Use a composite key
2. Create a surrogate key
3. No primary key
In the documentation, I can only find: "From a modeling standpoint, the primary key of the fact table is usually a composite key that is made up of all of its foreign keys."
http://download.oracle.com/docs/cd/E11882_01/server.112/e16579/logical.htm#i1006423
I also found a relevant thread stating that a primary key on a fact table is necessary:
Primary Key on Fact Table.
But if no business rule requires the records to be unique and there is no materialized view, do we still need a primary key? Are there any other bad effects if there is no primary key on the fact table? And are there any benefits from not creating one?
Well, the natural combination of the dimensions connected to the fact would be a natural primary key, and it would be composite.
Having an artificial PK might simplify things a bit.
Having no PK leads to a major mess. A fact should represent a business transaction or some general event. If you're loading data, you want to be able to identify the records that have been processed. Also, without a PK, if you forget to create a unique key, access to the fact table will be slow. Plus, having no PK means that tools such as the Data Modeller in Jbuilder or the OWB insert/update functionality won't work, since there is no PK to match on. Defining a PK for every table is good practice. Not defining one is asking for a load of problems, from performance to functionality and data quality.
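To illustrate the data-quality point, here is a minimal sketch of a composite primary key stopping an accidental double load. The sales_fact table and its key columns are hypothetical, and SQLite through Python is used only as a convenient stand-in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        date_key     INTEGER NOT NULL,
        product_key  INTEGER NOT NULL,
        customer_key INTEGER NOT NULL,
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (date_key, product_key, customer_key)  -- composite of the dimension keys
    )
""")

conn.execute("INSERT INTO sales_fact VALUES (20240101, 10, 7, 3)")

try:
    # The same source record arrives again, for example after a rerun of the load.
    conn.execute("INSERT INTO sales_fact VALUES (20240101, 10, 7, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(duplicate_rejected, row_count)
```

Without the key, the second insert would succeed and every aggregate over the table would be silently doubled for that record.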
Edited by: Cortanamo on 16.12.2010 07:12 -
I need to write an MDX query to show the latest product text against historical facts, or a chosen product text at a point in time. I can write this query in T-SQL, but I am new to MDX.
The way I do it in T-SQL is by joining two queries together on the natural key as opposed to the surrogate key.
Can this be done in MDX? I know I could write two separate MDX queries, one that gets the product text I want and the other that gets the measure with the actual product text and natural key, and then use a lookup function in SSRS to show the two result sets in the same tablix by looking up the natural keys. But this should be possible in one query, shouldn't it?
In the DSV, the fact knows to join to the dimension using the surrogate key.
Thanks
J
Hi Jamster,
According to your description, you want to write a query to show the latest product text, right?
In MDX, we can use the LastNonEmpty function to return the latest member within a dimension. LastNonEmpty is an aggregation function available in the Enterprise edition of SQL Server. However, you can create your own with a little bit of recursive MDX. Here is a sample query for your reference.
With Member Measures.LastHits as
    iif(isempty(Measures.Hits),
        ([Date].[Year Month Day].prevmember, Measures.LastHits),
        Measures.Hits)
Reference
http://cwebbbi.wordpress.com/2011/03/24/last-ever-non-empty-a-new-fast-mdx-approach/
http://richardlees.blogspot.com/2010/07/getting-last-non-empty-value.html
If this is not what you want, please provide the detailed structure of your cube and the expected result, so that we can analyze further and give you the exact MDX query.
Regards,
Charlie Liao
TechNet Community Support -
DataModeler v3.3.0 - Naming standards template for surrogate keys creation
I'm using DM 3.3.0.734, and in the logical model we can now set "Create Surrogate Key" in the entity properties.
When we engineer to the relational model, a new column is automatically created for each entity, named using the template {entity}_ID, typed as NUMERIC (without precision), and defined as the primary key.
My questions are:
Is it possible to define a different naming standard for surrogate key creation? We use id_{entity}.
Is it possible to set the precision of the NUMERIC surrogate key?
If we define an entity's columns as Primary UID, these columns are included in a unique constraint, but the constraint uses the naming standard for PKs.
As a consequence, the following are created:
Unique constraint name: entity_PK
Primary key (surrogate) name: entity_PKv1
Is there any way to define naming standards like "{entity}_UID" for unique constraints, or even "{entity}_SK" for the surrogate primary key name?
Can anyone help with some of these topics?
Regards,
Ariel.
Hi Ariel,
Naming standards template for surrogate key creation: I have logged an enhancement request for that.
How to change those bad names (changing them one by one is not an option):
1) If those "transformed" unique keys are the only ones you have in the relational model, then you can simply apply naming standards
2) You can write a transformation script to do that for you
3) You can use the new functionality: search, export to an Excel file, change the names there (using find/replace will be faster) and return the changed data back to the relational model
You can find a description of that here:
https://apex.oracle.com/pls/apex/f?p=44785:24:13179871410726::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:6621,16
http://www.thatjeffsmith.com/archive/2012/11/sql-developer-data-modeler-v3-3-early-adopter-search/
http://www.thatjeffsmith.com/archive/2012/11/sql-developer-data-modeler-v3-3-early-adopter-collaborative-design-via-excel/
You should search for _PK, then filter result on Index and you can export result using report functionality (to XLS or XLSX output format). You can create template and include only table and name (of index) as properties to be included into report.
Regards,
Philip -
Surrogate Key and Map for Cube
Hi
I am new to Data Warehousing and am trying to use OWB 11g.
I am trying to create dimensions with multiple levels. When I create more than one level, each dimension level needs a surrogate key as well as a business key. But I can create only one surrogate key in the dimension; there is no option to create multiple surrogate keys in the same dimension. What am I missing?
My second question is about the cube. Do I need to create a mapping for a cube? If yes, should I move the data into the cube from the dimensions? And where will the measures come from? Do I need to load the measures, or will they be calculated automatically?
Please reply.
Regards,
Arif
Hi,
Got it. Yes, that was the reason:
the table was not properly deployed after the dimension was modified.
Anyway, the describe output for the table is as follows:
describe arif.QUESTION_DIM
Name           Null      Type
DIMENSION_KEY  NOT NULL  NUMBER
IGV_ID                   NUMBER
PER_ID                   NUMBER
DIM_ID                   NUMBER
IGO_ID                   NUMBER
INQ_ID                   NUMBER
ID                       NUMBER
DIM_ORDEM                NUMBER
DIM_AMBITO               VARCHAR2(3)
DIM_NOME                 VARCHAR2(150)
10 rows selected
Now I am having another problem:
when I deploy the map to load the data from three different tables, it gives the following errors:
Name Action Status Log
QUESTION_MAP Create Warning ORA-06550: line 297, column 25:
PLS-00302: component 'ID' must be declared
QUESTION_MAP Create Warning ORA-06550: line 1153, column 11:
PL/SQL: SQL Statement ignored
QUESTION_MAP Create Warning ORA-06550: line 1155, column 15:
PL/SQL: ORA-00904: "QUESTION_DIM"."ID": invalid identifier
QUESTION_MAP Create Warning ORA-06550: line 1155, column 31:
PLS-00302: component 'ID' must be declared
QUESTION_MAP Create Warning ORA-06550: line 233, column 1:
PL/SQL: SQL Statement ignored
QUESTION_MAP Create Warning ORA-06550: line 2539, column 11:
PL/SQL: SQL Statement ignored
QUESTION_MAP Create Warning ORA-06550: line 2541, column 15:
PL/SQL: ORA-00904: "QUESTION_DIM"."ID": invalid identifier
QUESTION_MAP Create Warning ORA-06550: line 2541, column 31:
PLS-00302: component 'ID' must be declared
QUESTION_MAP Create Warning ORA-06550: line 297, column 9:
PL/SQL: ORA-00904: "QUESTION_DIM"."ID": invalid identifier
Edited by: user643560 on Oct 22, 2008 9:38 AM -
Suppress auto sequence and trigger DDL for surrogate keys?
Is there a way to suppress trigger and sequence creation for surrogate keys when exporting to a DDL file?
I know that most of the time the automatic sequence and trigger creation is welcome and very handy.
However, I'm migrating from an old Designer model in which only the needed sequences are created.
They have different names, and the trigger logic is custom (and generated outside Designer).
A lot of package code depends on this, so I prefer to create and use different sequences.
Is there a way to achieve this? Any tips are welcome.
Hi,
Note that generating the DDL for Oracle 12c means that it will attempt to use your Oracle 12c Physical model. So if you normally use Oracle 10g or 11g, you will find that any details from your Oracle 10g or 11g Physical Model will not be included. So this approach may have other implications for you.
If you are not using Oracle 12c, there are some relevant properties on the Auto Increment tab of the Relational Model properties dialog for the Column which may help:
Sequence Name - allows you to specify the name of the Sequence (which can be the name of a Sequence defined in the relevant Physical Model).
Trigger Name - allows you to specify the name of a Trigger (which can be the name of a Trigger that is defined for the Table in the Physical Model).
Generate Trigger - unsetting this will stop the Trigger being generated.
David -
I am using Oracle 11.2.0.3. What is the best practice for choosing data type for surrogate key?
At my workplace, I see other developers using NUMBER or INTEGER.
Thanks.
ranit B wrote:
The above is called the Ingenuity of Sir Frank Kulash.
<snip>
You gotta love the subtleties of language and culture. :-)
To simply address someone as 'Sir' (as in "Sir, would you mind ...?" or simply "Yes, Sir" and "No, Sir") is a sign of respect.
But when "Sir" is used as a title by prefixing it to a name ("Sir Frank Kulash"), it is taken as an official title - specifically that of Knighthood, conferred by the British Crown.
ranit - Please don't think I'm calling you to task on this. Without a doubt, your English is better than my whateveryournativelanguageis. I always get a grin when someone inadvertently tries to confer Knighthood on someone, as it happens a lot around here. But it's a slow day at my office. Of course I quickly spot when something gets mangled in English. I've often wondered what kinds of mangling native English speakers do when they learn another language.
;-) -
Looking for white paper or technical paper about surrogate keys
I am trying to locate any kind of reference concerning the use of surrogate keys vs. natural keys. Does anybody know of an authoritative reference, maybe from Oracle?
Thanks!
Bob
Hi Bob,
Unfortunately we do not have such documentation on our site (http://otn.oracle.com/documentation/index.html).
Please try the Member feedback forum at: Community Feedback (No Product Questions)
You might also consider searching on Metalink, as that site carries most of the available white papers. http://metalink.oracle.com/
Regards,
Les -
About Surrogate Key and Dimension Key on OWB 10.2
Hi, everyone.
I am using OWB 10.2 and I have a question about Surrogate key and Dimension Key.
I defined the foreign key in the fact table as VARCHAR2, and the dimension key, which acts as the primary key of the dimension table, is also VARCHAR2. The dimension has a single level.
I know that the dimension key stores the surrogate ID for the dimension and is the primary key of the table. Also, the surrogate ID can only be of type NUMBER.
So, in this case, the surrogate ID is of type NUMBER,
and the dimension key should be of type NUMBER to store the surrogate ID.
But the dimension key also has to act as the primary key that the VARCHAR2 foreign key refers to.
How can I resolve this conflict?
Please let me know.
JWS
Hi JWS,
From a SQL point of view, joining a NUMBER column to a VARCHAR2 column should not be a problem, because an implicit conversion takes place during execution: Oracle converts the VARCHAR2 value to a NUMBER before comparing, so the character values must hold valid numbers. See the example below.
SELECT * FROM DUAL
WHERE 1 = '1'
From an OWB point of view, it is not possible to have a dimension with a NUMBER key that is related to a VARCHAR2 foreign key in a fact table. This is because OWB builds the foreign keys of a fact table from the dimension tables they refer to.
You will lose the reference to the dimension if you change the type of the foreign key.
To resolve this issue, I would advise you to use a sequence that generates your surrogate key (NUMBER type) for the dimension table and store it in the primary key column (VARCHAR2 type).
When validating the mapping you will get a warning, but execution should give no problems.
Regards,
Ilona -
How to create surrogate key in dimension without unique value
Hi, I have a dimension where there is no column with unique value. I want to add a surrogate key to replace the existing primary key which is derived from concatenating 3 columns(e.g. 'A'||'B'||'C'). I'm thinking of using sequence. But this won't allow me to link the dimension to fact table. How do I come up with surrogate key under this situation? Thanks. ~Tracy
I'm actually trying to accomplish something similar myself.
In my sources I've got two sorts of customers, ones that are directly reported, and ones whose information is provided with sales records (this is stored in module ODS).
Of course identification is different, but in the datamart (module DWH) I'm sort of forced to use an equivalent way of loading (due to the way it first used to work). To accelerate lookups on dimensions, I copy the ODS surrogate key to DWH dimensions, but this does not work for the 'inbuilt' customers because they do not have a surrogate key in the ODS.
They DO have means of unique identification, and at first I thought I could concatenate these (also 3) columns to use as identification code. Unfortunately this is VARCHAR2, where the surrogate key is (naturally) NUMBER.
So now it looks like I'm forced to first build a table in ODS especially for these 'inbuilt' customers and assign a surrogate key (by sequence) to it, this way it conforms to how 'normal' customers are loaded into DWH.
I guess you'll have to pull off the same trick, i.e. create a table with either only the 'translation' of the D-code to a surrogate key or all the information that is fed into the dimension, which can then be used as a lookup or as the complete source when loading data into your datamart.
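Such a 'translation' table could look roughly like this. It is a sketch under assumptions: the table name customer_xref and the columns col_a, col_b, col_c are placeholders for the three identifying columns, and SQLite through Python stands in for Oracle with a sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_xref (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT, -- numeric surrogate key
        col_a TEXT NOT NULL,                           -- the three identifying columns,
        col_b TEXT NOT NULL,                           -- kept separate instead of concatenated
        col_c TEXT NOT NULL,
        UNIQUE (col_a, col_b, col_c)
    )
""")

def surrogate_key_for(conn, a, b, c):
    """Return the surrogate key for a natural key, creating one on first sight."""
    row = conn.execute(
        "SELECT customer_sk FROM customer_xref WHERE col_a = ? AND col_b = ? AND col_c = ?",
        (a, b, c),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO customer_xref (col_a, col_b, col_c) VALUES (?, ?, ?)", (a, b, c)
    )
    return cur.lastrowid

k1 = surrogate_key_for(conn, "A", "B", "C")
k2 = surrogate_key_for(conn, "A", "B", "D")
k3 = surrogate_key_for(conn, "A", "B", "C")  # seen before: same key comes back
print(k1, k2, k3)
```

Both the dimension load and the fact load can then call the same lookup, which is what lets the fact rows link to the dimension by the numeric key. Keeping the three columns separate also avoids the ambiguity of a concatenated string, where 'AB'||'C' and 'A'||'BC' give the same value.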
Good luck, Patrick