Maintenance of surrogate keys

Hi,
Can anyone suggest ways of maintaining the surrogate keys in dimension tables? I would really like to know whether OWB has a feature for this purpose.
Regards,
Raymond

Fair enough, I think I misunderstood your question. I will defer to someone able to better discuss the relational side of things. I know this is / can be done using insert triggers, etc. - but since this is a completely common thing to do in a DW, I would expect OWB to have special functionality to address this without resorting to code.
I don't know OWB though, so sorry I can't give a specific answer.
Thx,
Scott

Similar Messages

How to Maintain Surrogate Key Mapping (cross-reference) for Dimension Tables

Hi,
What would be the best approach in ODI to implement the Surrogate Key Mapping Table in the STG layer, following Kimball's technique:
"Surrogate key mapping tables are designed to map natural keys from the disparate source systems to their master data warehouse surrogate key. Mapping tables are an efficient way to maintain surrogate keys in your data warehouse. These compact tables are designed for high-speed processing. Mapping tables contain only the most current value of a surrogate key— used to populate a dimension—and the natural key from the source system. Since the same dimension can have many sources, a mapping table contains a natural key column for each of its sources.
Mapping tables can be equally effective if they are stored in a database or on the file system. The advantage of using a database for mapping tables is that you can utilize the database sequence generator to create new surrogate keys. And also, when indexed properly, mapping tables in a database are very efficient during key value lookups."
We have a requirement to implement cross-reference mapping tables with Natural and Surrogate Keys for each dimension table. These mapping tables will be populated automatically (inserts only) during the E-LT execution, right after inserting into the dimension table.
Does anyone have an idea how to implement this in ODI?
Thanks,
Danilo
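
The mapping table described in the Kimball quote can be sketched outside ODI as follows. This is only an illustration under assumptions: SQLite (through Python's standard library) stands in for the Oracle staging schema, AUTOINCREMENT stands in for a database sequence, and the table name, the column names and the surrogate_for helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_key_map (
        customer_sk     INTEGER PRIMARY KEY AUTOINCREMENT,  -- stands in for a sequence
        crm_customer_id TEXT UNIQUE,  -- natural key from hypothetical source 1
        erp_customer_id TEXT UNIQUE   -- natural key from hypothetical source 2
    )
""")

ALLOWED_SOURCES = ("crm_customer_id", "erp_customer_id")

def surrogate_for(conn, source_column, natural_key):
    """Return the surrogate key for a natural key; insert a mapping row if none exists."""
    if source_column not in ALLOWED_SOURCES:
        raise ValueError("unknown source column: " + source_column)
    row = conn.execute(
        f"SELECT customer_sk FROM customer_key_map WHERE {source_column} = ?",
        (natural_key,),
    ).fetchone()
    if row is not None:
        return row[0]  # existing mapping: reuse the key
    cur = conn.execute(
        f"INSERT INTO customer_key_map ({source_column}) VALUES (?)",
        (natural_key,),
    )
    return cur.lastrowid  # insert-only: a new key is created exactly once

first = surrogate_for(conn, "crm_customer_id", "C-100")
again = surrogate_for(conn, "crm_customer_id", "C-100")
other = surrogate_for(conn, "crm_customer_id", "C-200")
print(first, again, other)
```

The unique constraint on each natural key column doubles as the index that keeps lookups fast, which is the point the quote makes about properly indexed mapping tables.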

Hi,
First of all, please avoid bolding text. Also, according to Kimball (if I remember correctly), this is a 1:1 mapping, so no surrogate key.
Personally, I would use a lookup table:
http://www.odigurus.com/2012/02/lookup-transformation-using-odi.html
or make a simple outer join filtered on your "Active_Flag" column (remember that this filter needs to be inside your outer join).
Let us know
Francesco

Should surrogate key be numeric?

1. While on a data modeling course at Oracle, I was advised NOT to use numeric data types unless you need to do numeric/arithmetic operations on the attribute. E.g., even though an account_id is all digits, it should still be declared as alphanumeric (VARCHAR2), as it is NOT really a number. This makes sense.
I would like opinions on the above.
2. If above advice is taken, then even a surrogate key should be alpha numeric since it does not require arithmetic/numeric computations. However, I have also heard that numerics are better than VARCHAR2 for joins. If this join performance advantage is true, then using a NUMBER data type for surrogate keys should be preferred over VARCHAR2? This overrides the advice in 1 above for join performance reasons.
I would like comments on 1 and 2.
Thanks
Edited by: user4900730 on Jun 16, 2010 12:19 PM

Sybrand gave a pretty good analysis of the performance issues. The instructor you mention just got carried away with the idea that 'not everything that LOOKS like a number IS a number'. I think his point is valid for things like SSNs, telephone numbers, etc. But my take is that if you are using a "number" for a surrogate key, then it probably really is a number, even if it is true you'd never do arithmetic on it. It comes back to the first lesson of being a DBA (quoted in the first two minutes of the first Oracle class I took at their facility in Atlanta, back at version 7.3): "The answer to almost every question is 'it depends'."
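
The "looks like a number" distinction can be made concrete with a small sketch. It is written in Python rather than Oracle SQL, so it only illustrates the general difference between text and numeric comparison; the sample values are made up.

```python
# Identifiers stored as text are compared character by character.
ids_as_text = ["9", "10", "100", "2"]
ids_as_numbers = [int(s) for s in ids_as_text]

text_order = sorted(ids_as_text)
numeric_order = sorted(ids_as_numbers)
print(text_order)     # ['10', '100', '2', '9']
print(numeric_order)  # [2, 9, 10, 100]

# Leading zeros matter for codes such as phone or account numbers,
# but disappear once the value is treated as a number.
same_as_text = "007" == "7"
same_as_number = int("007") == int("7")
print(same_as_text, same_as_number)  # False True
```

A generated surrogate key has no leading zeros or formatting to preserve, which is one reason a numeric type is a natural fit for it, while codes such as SSNs are safer as text.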

About Surrogate Key and Dimension Key on OWB 10.2

Hi, everyone.
I am using OWB 10.2 and I have a question about the Surrogate Key and the Dimension Key.
I defined the foreign key in the Fact Table as VARCHAR2, and the Dimension Key, which acts as the primary key of the Dimension Table, is also VARCHAR2. The Dimension Table has a single level.
I know that the Dimension Key stores the surrogate ID for the dimension and is the primary key of the table. Also, the Surrogate ID can only be of NUMBER type.
So, in this case, the Surrogate ID is of NUMBER type,
and the Dimension Key should be of NUMBER type to store the surrogate ID.
But the Dimension Key also has to act as the primary key that the VARCHAR2 foreign key refers to.
How can I resolve this conflict?
Please let me know.
JWS

Hi JWS,
From a SQL point of view it should not be a problem to join a NUMBER field to a VARCHAR2 field, because during execution there will be an implicit conversion between the two types (Oracle converts the VARCHAR2 value to a NUMBER for the comparison). See the example below.
SELECT * FROM DUAL
WHERE 1 = '1'
From an OWB point of view it is not possible to have a Dimension with a NUMBER key that is related to a VARCHAR2 foreign key in a Fact table. This is because, when a Fact table is created in OWB, its foreign keys are built from the Dimension tables they refer to.
You will lose the reference to the Dimension when changing the type of the foreign key.
To resolve this issue I would advise you to use a Sequence that generates your Surrogate Key (NUMBER type) for the Dimension table and store it in the Primary Key column (VARCHAR2 type).
When validating the mapping you will get a warning, but executing it should cause no problems.
Regards,
Ilona

Best Practice loading Dimension Table with Surrogate Keys for Levels

Hi Experts,
How would you load an Oracle dimension table with a hierarchy of at least 5 levels, with surrogate keys in each level and a unique dimension key for the dimension table?
With OWB, using surrogate keys in every level of a hierarchy is an integrated feature. You don't have to care about the parent-child relation. The load process of the mapping generates the right keys and takes care of the relation between parent and child inside the dimension key.
I tried to use one interface per level and created a surrogate key with a native Oracle sequence.
After that I put all the interfaces into one big interface with a union data set per level and added lookups for the right parent-child relation.
I think making the interface like that is a bit too complicated.
I would be more than happy to hear any suggestions. Thank you in advance!
negib
Edited by: nmarhoul on Jun 14, 2012 2:26 AM

Hi,
I do like the level keys feature of OWB. It makes aggregate tables very easy to implement if you're sticking with a star schema.
Sadly, there is nothing off the shelf in the built-in knowledge modules of ODI. It doesn't support creating dimension objects in the database by default, but there is nothing stopping you from coding up your own knowledge module (maybe use flex fields on the datastore to tag column attributes as needed).
Your approach is what I would have done, possibly using a view (if you don't mind having it external to ODI) to make the interface simpler.

DataModeler v3.3.0 - Naming standards template for surrogate keys creation

I'm using DM 3.3.0.734, and in the logical model we can now set "Create Surrogate Key" in the entity properties.
When we use Engineer to Relational Model, a new column is automatically created for each entity using the naming template {entity}_ID, typed as NUMERIC (without precision) and defined as the primary key.
My questions are:
Is it possible to define a different naming standard for surrogate key creation? We use id_{entity}.
Is it possible to set the precision of the NUMERIC surrogate key?
If we define an entity's columns as Primary UID, these columns are included in a unique constraint, but the constraint uses the naming standard for PKs.
As a consequence, the following are created:
Unique constraint name: entity_PK
Primary key (surrogate) name: entity_PKv1
Is there any way to define naming standards like "{entity}_UID" for unique constraints, or even "{entity}_SK" for the surrogate primary key name?
Can anyone help with some of these topics?
Regards,
Ariel.

Hi Ariel,
Naming standards template for surrogate key creation: I have logged an enhancement request for that.
How to change those bad names (changing them one by one is not an option):
1) If those "transformed" unique keys are the only ones you have in the relational model, then you can simply apply naming standards.
2) You can write a transformation script to do that for you.
3) You can use the new functionality: search, export to an Excel file, change the names there (using find/replace will be faster) and return the changed data back to the relational model.
You can find a description of that here:
https://apex.oracle.com/pls/apex/f?p=44785:24:13179871410726::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:6621,16
http://www.thatjeffsmith.com/archive/2012/11/sql-developer-data-modeler-v3-3-early-adopter-search/
http://www.thatjeffsmith.com/archive/2012/11/sql-developer-data-modeler-v3-3-early-adopter-collaborative-design-via-excel/
You should search for _PK, then filter the result on Index, and you can export the result using the report functionality (to XLS or XLSX output format). You can create a template and include only the table and the name (of the index) as properties to be included in the report.
Regards,
Philip

How to create surrogate key in dimension without unique value

Hi, I have a dimension where there is no column with unique value. I want to add a surrogate key to replace the existing primary key which is derived from concatenating 3 columns(e.g. 'A'||'B'||'C'). I'm thinking of using sequence. But this won't allow me to link the dimension to fact table. How do I come up with surrogate key under this situation? Thanks. ~Tracy

I'm actually trying to accomplish something similar myself.
In my sources I've got two sorts of customers, ones that are directly reported, and ones whose information is provided with sales records (this is stored in module ODS).
Of course identification is different, but in the datamart (module DWH) I'm sort of forced to use an equivalent way of loading (due to the way it first used to work). To accelerate lookups on dimensions, I copy the ODS surrogate key to DWH dimensions, but this does not work for the 'inbuilt' customers because they do not have a surrogate key in the ODS.
They DO have means of unique identification, and at first I thought I could concatenate these (also 3) columns to use as identification code. Unfortunately this is VARCHAR2, where the surrogate key is (naturally) NUMBER.
So now it looks like I'm forced to first build a table in ODS especially for these 'inbuilt' customers and assign a surrogate key (by sequence) to it, this way it conforms to how 'normal' customers are loaded into DWH.
I guess you'll have to pull off the same trick, i.e. create a table with either only the 'translation' of D-code to a surrogate key or all information that is fed into the dimension, which can then be used as a lookup or as the complete source when loading data into your datamart.
Good luck, Patrick
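
One way to picture the "translation" idea for a dimension identified by three columns: keep the three columns as a composite unique key, let the database assign the surrogate key, and resolve the surrogate key with a three-column join when the fact table is loaded. The sketch below makes assumptions throughout: SQLite stands in for Oracle, the automatically assigned INTEGER PRIMARY KEY stands in for a sequence, and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_sk INTEGER PRIMARY KEY,   -- surrogate key, assigned automatically
        col_a TEXT NOT NULL,
        col_b TEXT NOT NULL,
        col_c TEXT NOT NULL,
        UNIQUE (col_a, col_b, col_c)      -- the three columns together identify a row
    );
    CREATE TABLE sales_stage (col_a TEXT, col_b TEXT, col_c TEXT, amount REAL);
    CREATE TABLE sales_fact (product_sk INTEGER REFERENCES product_dim, amount REAL);

    INSERT INTO product_dim (col_a, col_b, col_c)
    VALUES ('A', 'B', 'C'), ('A', 'B', 'D');

    INSERT INTO sales_stage VALUES ('A', 'B', 'D', 12.5), ('A', 'B', 'C', 3.0);

    -- The fact load swaps the three natural key columns for the surrogate key.
    INSERT INTO sales_fact (product_sk, amount)
    SELECT d.product_sk, s.amount
    FROM sales_stage s
    JOIN product_dim d
      ON d.col_a = s.col_a AND d.col_b = s.col_b AND d.col_c = s.col_c;
""")

fact_rows = conn.execute(
    "SELECT product_sk, amount FROM sales_fact ORDER BY product_sk"
).fetchall()
print(fact_rows)  # [(1, 3.0), (2, 12.5)]
```

Joining on the separate columns avoids the ambiguity of a concatenated string, where 'AB' || 'C' and 'A' || 'BC' would produce the same value.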

Surrogate Key and Map for Cube

Hi
I am new to Data Warehousing and am trying to use OWB 11g.
I am trying to create dimensions with multiple levels. When I create more than one level, it needs a surrogate key as well as a business key for each dimension level. But I can create only one surrogate key in the dimension; there is no option to create multiple surrogate keys in the same dimension. So what am I missing?
My second question is regarding the cube. Do I need to create a mapping for a cube? If yes, should I move the data to the cube from the dimensions? And where will the measures come from? Do I need to load the measures, or will they be calculated automatically?
Please reply.
regards
Arif

hi
Got it, Yes that was the reason,
The table was not properly deployed after the dimension was modified.
Anyway, the describe of the table is as follows
describe arif.QUESTION_DIM
Name           Null      Type
DIMENSION_KEY  NOT NULL  NUMBER
IGV_ID                   NUMBER
PER_ID                   NUMBER
DIM_ID                   NUMBER
IGO_ID                   NUMBER
INQ_ID                   NUMBER
ID                       NUMBER
DIM_ORDEM                NUMBER
DIM_AMBITO               VARCHAR2(3)
DIM_NOME                 VARCHAR2(150)
10 rows selected
Now I am having another problem:
when I deploy the map to load the data from three different tables, it gives the following errors
Name               Action               Status          Log
QUESTION_MAP          Create               Warning          ORA-06550: line 297, column 25:
                                        PLS-00302: component 'ID' must be declared
QUESTION_MAP          Create               Warning          ORA-06550: line 1153, column 11:
                                        PL/SQL: SQL Statement ignored
QUESTION_MAP          Create               Warning          ORA-06550: line 1155, column 15:
                                        PL/SQL: ORA-00904: "QUESTION_DIM"."ID": invalid identifier
QUESTION_MAP          Create               Warning          ORA-06550: line 1155, column 31:
                                        PLS-00302: component 'ID' must be declared
QUESTION_MAP          Create               Warning          ORA-06550: line 233, column 1:
                                        PL/SQL: SQL Statement ignored
QUESTION_MAP          Create               Warning          ORA-06550: line 2539, column 11:
                                        PL/SQL: SQL Statement ignored
QUESTION_MAP          Create               Warning          ORA-06550: line 2541, column 15:
                                        PL/SQL: ORA-00904: "QUESTION_DIM"."ID": invalid identifier
QUESTION_MAP          Create               Warning          ORA-06550: line 2541, column 31:
                                        PLS-00302: component 'ID' must be declared
QUESTION_MAP          Create               Warning          ORA-06550: line 297, column 9:
                                        PL/SQL: ORA-00904: "QUESTION_DIM"."ID": invalid identifier
Edited by: user643560 on Oct 22, 2008 9:38 AM

Building dimensions that are based on surrogate keys

Hi -
I am new to AWM and have a question.
I read the docs, but I still do not understand how to build a cube using AWM.
So we have a DW with dimensions and fact tables. The dimension tables use surrogate keys.
The fact table uses the surrogate keys of the dimensions.
I want to define the time dimension in AWM.
The unique key for the dimension is the surrogate key (time_key).
In AWM I defined levels: day, month and year.
Would the hierarchy for the day level be:
time_key -> day?
Thanks,
Frank

Thank you so much.
Now when I try that, here is the error that I get for that
line:
1118: Implicit coercion of a value with static type
flash.display:DisplayObject to a possibly unrelated type
flash.display:MovieClip.
I've placed a zip file with my FLA, .as and .xml here:
http://bigfins.com/temp/test5xml.zip
The ActionScript alone is below.
Thank you again for any assistance.

Apex and Natural Vs Surrogate keys

Hi
We've been using Apex for a few months now and there's a debate raging in our department over whether we should design our database tables using natural or surrogate (based on Oracle sequences / triggers) keys. Our experience as Apex developers shows that Apex itself looks to lean towards surrogate keys, a few examples are below:
- When creating forms on reports / tables Apex only allows 2 primary key columns without adding 'extras' in the background (see a previous post of mine).
- If we have a form on a table and our natural primary keys can be updated, the Apex-created DML statements break, as they look to do the update using the changed key values in the WHERE clause rather than the old ones. The only way around this seems to be to delete the inbuilt DML statements created by Apex and code your own, which is extra work.
- The Apex sample applications themselves seem to use sequences / surrogate keys.
What are people's opinions on this? In particular is there any guidance from the Apex development team on which is best to use with Apex?
Regards
Antilles

Hi Andrew,
As with abots_d, I only use "natural" keys for lookups.
>
1. the department names were here for 20 years and they never changed
>
But, can you guarantee that they never will? My firm has changed departmental names so many times, it's getting ridiculous! But other things also change over time - consider what happens if a person gets married and changes their name and you've used their previous names as the keys (and consider how much data in other tables may use those keys).
>
2. Server names are uniquely generated by special formula in excel to preciously avoid the duplication problem and guarantee the uniqueness within our glamorous bank.
>
SQL could probably recreate that formula and Unique Key constraints would handle the rest
>
3. no, we are not going to extend this app to cover any other banks
>
Given what's happening with the banking industry right now, who can say ;)
>
4. PK that means smth ( aka "server name" ) has a meaning, whereas meaningless - has no (business) value
>
Why does a bit of data have to have explicit "business value"? I would suggest that a surrogate key is a pointer to a record and allows you to easily create relationships. Once created, the key would never be changed regardless of what happens to the data on its record. Thus, the relationship is maintained. Using personnel (which our firm renamed as "Human Resources" a while back) as an example, it's likely that every employee would have an employee number. Does this number actually mean anything in itself, does it have "business value"? Most likely, it's just a convenient way to identify a person and relate records to them.
I would suggest that any non-numeric/date keys are relatively slow. As strings, the only way to check for their sort order would be to (A) convert to upper or lower case and (B) perform a string comparison left-to-right across the entire string. There's also the possibility of certain characters appearing in the strings that can cause issues - for example, quotes, apostrophes, colons, commas, question marks and percentage signs.
Also, consider the length of a VARCHAR2 that you would have to use - how big would it need to be to cover all possibilities? You may say 20 now but tomorrow you get data with 21 characters in it - do you want to update the table plus all related tables for that?
There are further issues with parent, child, grand-child etc relationships where the keys would have to be passed down in full through the relationships. Depending on how many levels you may have, a fair number of the fields on the bottom-most table would be there just for the keys.
It has been a standard industry practice for many years now to "normalise databases" to avoid lots of issues with keys and "repeating data". Apart from very simple lookup tables, I have stuck with those guidelines for years now without any problems at all.
Andy

Problems maintaining surrogate keys

Hi,
I am trying to load a dimension table. The initial load worked fine. Now, for the regular load, if there is one new record in the source table, it does add 1 new record to the dimension table. But the problem is that the surrogate key (I am using an Oracle sequence for this) skips as many values as there are records in the dimension table and assigns the next value to the new record. The load type is "Insert/Update". The constraint used for matching is the primary key of the source table.
e.g.
-- Initial run: the table had 5 records
-- Second run: the table has 6 records, but the surrogate key value for the 6th record will be '11' instead of '6'
Can you help me with this?

Nawneet,
After I posted this problem on the forum, I read an article that said I should do something like what you said. So I changed the sequence to a function which returns only the next value when called. I imported it into the map and used it instead of the sequence, but still no use: the map does the same thing during the second run. I also tried changing the load type from insert/update to update/insert, and changed the default operating mode to row based. Still no use. :((
What happens is that during the first run 5 records get loaded perfectly and everything is fine. If I run it again with no new changes in the source records, the map runs fine and the data is still good, but the NEXTVAL of the sequence jumps to 11, when I am expecting it to remain at 6 as no new records have been added to the dimension table. I think OWB somehow sees the 5 records in the source table during the second run and takes the next values from the cache, but does not use them, and since they are not used these values go to waste.
Problem: I cannot schedule the load of the dimension table for every day, as the next value of this sequence will keep on skipping, and whenever a new record is added to the dimension table it will have a large value.
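
The symptom above (key 11 instead of 6) is what one would see if a sequence value is drawn for every source row, including rows that only match existing records. Whether the generated OWB code really works this way is an assumption here; the sketch below is a plain Python model of that behaviour, with itertools.count standing in for the Oracle sequence and the load and run helpers being hypothetical.

```python
import itertools

def load(dimension, source_keys, sequence, draw_for_every_row):
    """Merge source rows into dimension, a dict of natural key -> surrogate key."""
    for natural_key in source_keys:
        if draw_for_every_row:
            candidate = next(sequence)  # consumed even when the row already exists
            dimension.setdefault(natural_key, candidate)
        elif natural_key not in dimension:
            dimension[natural_key] = next(sequence)  # consumed only for new rows

def run(draw_for_every_row):
    sequence = itertools.count(1)  # stands in for an Oracle sequence
    dimension = {}
    load(dimension, ["a", "b", "c", "d", "e"], sequence, draw_for_every_row)       # initial run
    load(dimension, ["a", "b", "c", "d", "e", "f"], sequence, draw_for_every_row)  # one new row
    return dimension["f"]

eager_key = run(draw_for_every_row=True)
lazy_key = run(draw_for_every_row=False)
print(eager_key, lazy_key)  # 11 6
```

Note that Oracle sequences do not guarantee gap-free values in any case (caching and rollbacks also lose numbers), so a surrogate key only needs to be unique, not consecutive.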

Surrogate keys -- which field to be used?

Hello,
I am wondering what field would be best suited to be used as surrogate key for a data warehouse. Since my fact table is likely to have hundreds of thousands of records, I am thinking that uniqueidentifier makes more sense than some type of integer.
Any thoughts on this?
Also I am designing my DW (I am really starting with a data-mart) with a lot of data coming from Dynamics CRM and I was wondering if a surrogate key is really necessary if we are not going to ever delete records from the data sources.
Any advice and insights are greatly appreciated.
Regards,
P.

Since my fact table is likely to have hundreds of thousands of records, I am thinking that uniqueidentifier makes more sense than some type of integer.
Hello,
100K rows is very little data. The integer data type performs best in SQL Server, much better than GUIDs, and with type integer you can handle about 4 billion values.
Olaf Helper
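
A rough size comparison behind this advice, as a sketch: a SQL Server int occupies 4 bytes and a uniqueidentifier 16 bytes. The Python below only mirrors those widths with standard library types; the row count of 500,000 is a hypothetical figure for "hundreds of thousands of records".

```python
import struct
import uuid

INT_BYTES = struct.calcsize("<i")      # 4 bytes, the width of a 32-bit integer key
GUID_BYTES = len(uuid.uuid4().bytes)   # 16 bytes, the width of a GUID

rows = 500_000  # hypothetical fact table size
int_key_bytes = rows * INT_BYTES
guid_key_bytes = rows * GUID_BYTES
largest_int_key = 2**31 - 1  # upper bound of a signed 32-bit integer

print(int_key_bytes, guid_key_bytes, largest_int_key)
# 2000000 8000000 2147483647
```

The difference is repeated in every fact table foreign key column and in every index on those columns, which is where the smaller key pays off.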

Loading data into Fact/Cube with surrogate keys from SCD2

We have created 2 dimensions, CUSTOMER & PRODUCT with surrogate keys to reflect SCD Type 2.
We now have the transactional data that we need to load.
The data has a customer id that relates to the natural key of the customer dimension and a product id that relates to the natural key of the product dimension.
Can anyone point us in the direction of some documentation that explains the steps necessary to populate our fact table with the appropriate surrogate key?
We assume that we need a lookup between the current version of the customer and the incoming transaction data, but we are not sure how to go about this.
Thanks in advance for your help.
Laura
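
The lookup asked about here is commonly done as a join on the natural key plus the validity dates of the SCD Type 2 rows. The sketch below is an illustration only: SQLite stands in for the warehouse database, and the table names, column names, dates and the '9999-12-31' open end date are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_sk INTEGER PRIMARY KEY,  -- surrogate key, one per version
        customer_id TEXT NOT NULL,        -- natural key
        city        TEXT,
        start_date  TEXT NOT NULL,
        end_date    TEXT NOT NULL         -- '9999-12-31' marks the current version
    );
    CREATE TABLE sales_stage (customer_id TEXT, sale_date TEXT, amount REAL);
    CREATE TABLE sales_fact (customer_sk INTEGER, sale_date TEXT, amount REAL);

    INSERT INTO customer_dim VALUES
        (1, 'C1', 'Leeds',  '2010-01-01', '2012-06-30'),
        (2, 'C1', 'London', '2012-07-01', '9999-12-31');
    INSERT INTO sales_stage VALUES
        ('C1', '2011-03-15', 10.0),
        ('C1', '2013-01-20', 20.0);

    -- Pick the dimension version that was valid on the transaction date.
    INSERT INTO sales_fact (customer_sk, sale_date, amount)
    SELECT d.customer_sk, s.sale_date, s.amount
    FROM sales_stage s
    JOIN customer_dim d
      ON d.customer_id = s.customer_id
     AND s.sale_date BETWEEN d.start_date AND d.end_date;
""")

facts = conn.execute(
    "SELECT customer_sk, sale_date FROM sales_fact ORDER BY sale_date"
).fetchall()
print(facts)  # [(1, '2011-03-15'), (2, '2013-01-20')]
```

If every incoming transaction should point at the current version regardless of its date, the date condition can be replaced by a filter on the open end date (or on a current-row flag, if the dimension has one).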

Hi Laura
There is another way of handling SCD and changing facts: use a different table for the history. Let me explain.
The standard approach has these three steps:
1. Determine if a change has occurred
2. End Date the existing record
3. Insert a new record into the same table with a new Start Date and dummy End Date, using a new surrogate key
The modified approach also has three steps:
1. Determine if a change has occurred
2. Copy the existing record to a history table, setting the appropriate End Date en route
3. Update the existing record with the changed information giving the record a new Start Date, but retaining the original surrogate key
What else do you need to do?
The current table can use the surrogate key as the primary key, with the natural key being a unique key. The history table has the surrogate key and the end date in the primary key, with a unique key on the natural key and the end date. For end-user queries, which more than 90% of the time go against current data, this method is much faster because only current records are in the main table and no filters are needed on dates. If a user wants to query history and current data combined, a view which uses a union of the main and historical data can be used. One more thing to note is that if you adopt this approach for your dimension tables, they always keep the same surrogate key for the index. This means that if you follow a strict Kimball approach and make the primary key of the fact table a composite key made up of the foreign keys from each dimension, you NEVER have to rework this primary key. It always points to the correct dimension, thereby eliminating the need for a surrogate key on the fact table!
I am using this technique to great effect in my current contract and performance is excellent. The splitter at the end of the map splits the data into three sets. Set one is for an insert into the main table when there is no match on the natural key. Set two is when there is a match on the natural key and the delta comparison has determined that a change occurred. In this case the current row needs to be copied into history, setting the End Date to the system date en route. Set three is also when there is a match on the natural key and the delta comparison has determined that a change occurred. In this case the main record is simply updated with the Start Date being reset to the system date.
By the way, I intend to put a white paper together on this approach if anyone is interested.
Hope this helps
Regards
Michael
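
The three sets described in this reply can be sketched as follows. This is an interpretation, not the poster's OWB map: SQLite stands in for Oracle, the delta comparison is reduced to a single attribute, and the table names, column names and the apply_row helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_cur (
        customer_sk INTEGER PRIMARY KEY,   -- surrogate key, never changes
        customer_id TEXT NOT NULL UNIQUE,  -- natural key
        city        TEXT,
        start_date  TEXT NOT NULL
    );
    CREATE TABLE customer_hist (
        customer_sk INTEGER NOT NULL,
        customer_id TEXT NOT NULL,
        city        TEXT,
        start_date  TEXT NOT NULL,
        end_date    TEXT NOT NULL,
        PRIMARY KEY (customer_sk, end_date),
        UNIQUE (customer_id, end_date)
    );
""")

def apply_row(conn, customer_id, city, load_date):
    """Apply one source row to the current table, archiving the old version on change."""
    row = conn.execute(
        "SELECT customer_sk, city FROM customer_cur WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        # Set one: no match on the natural key, insert with a new surrogate key.
        conn.execute(
            "INSERT INTO customer_cur (customer_id, city, start_date) VALUES (?, ?, ?)",
            (customer_id, city, load_date),
        )
    elif row[1] != city:
        # Set two: copy the existing row to history, setting the end date en route.
        conn.execute(
            "INSERT INTO customer_hist "
            "SELECT customer_sk, customer_id, city, start_date, ? "
            "FROM customer_cur WHERE customer_id = ?",
            (load_date, customer_id),
        )
        # Set three: update in place, keeping the original surrogate key.
        conn.execute(
            "UPDATE customer_cur SET city = ?, start_date = ? WHERE customer_id = ?",
            (city, load_date, customer_id),
        )

apply_row(conn, "C1", "Leeds", "2010-01-01")
apply_row(conn, "C1", "London", "2012-07-01")
apply_row(conn, "C1", "London", "2013-01-01")  # no change, nothing happens

current = conn.execute("SELECT * FROM customer_cur").fetchall()
history = conn.execute("SELECT * FROM customer_hist").fetchall()
print(current)  # [(1, 'C1', 'London', '2012-07-01')]
print(history)  # [(1, 'C1', 'Leeds', '2010-01-01', '2012-07-01')]
```

Because the surrogate key in the current table never changes, fact rows loaded before the change still join to the current record without any rework.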

Implementing surrogate keys in dimensions

hello,
First thing, I'm new to ODI! I am using Oracle data integrator 10.1.3.
I have a dimension table 'Dim_Contracts' as target table. The structure is as follows:
PK_Dim_Contract Primary key (surrogate key - to be populated from an Oracle database sequence in the target)
Contract_ID (normal field in target - no constraints in target- to be populated from source - originally a primary key in source)
+ other dimension attributes.
From what I have found on Google and read in the forum, I cannot define 'PK_Dim_Contract' as the primary key of my dimension (target) table and still populate it from the Oracle sequence; rather 'Contract_ID', which is the natural key, should be the primary key. Is that correct? If yes, isn't that against dimensional modelling principles?
More to the point, my question is: How do I populate a sequence in my primary key field in the target table?
Thanks for your help.
Regards,
Anju

Hello Anju,
Welcome to the ODI community ;).
What I suggest is to set a UNIQUE KEY on Contract_ID in your target. This way you will be able to use flow control and do Incremental Update loading.
PK_Dim_Contract (surrogate key) can be your primary key in the database.
To populate PK_Dim_Contract from an Oracle sequence, create the sequence first in your Oracle DB. Then add a new sequence to your project (left pane), choose Native Sequence, choose your schema and enter the name of your Oracle sequence.
In your interface, define the mapping of PK_Dim_Contract as
:<ODI_SEQUENCE_NAME>_NEXTVAL
and execute this mapping on the target.
Note: :<ODI_SEQUENCE_NAME>_NEXTVAL works only for SQL statements. If you want to use the sequence somewhere else, use the following syntax:
#<ODI_SEQUENCE_NAME>_NEXTVAL
Hope it helps,
Jerome
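
The table design suggested in this reply (surrogate primary key, unique natural key, rows matched on the natural key) can be pictured outside ODI as follows. This is a sketch under assumptions: SQLite stands in for Oracle, its automatically assigned INTEGER PRIMARY KEY stands in for the Oracle sequence, the status column and the incremental_update helper are hypothetical, and the update-then-insert loop only imitates what an incremental update load does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_contracts (
        pk_dim_contract INTEGER PRIMARY KEY,   -- surrogate key, assigned on insert
        contract_id     TEXT NOT NULL UNIQUE,  -- natural key used to match incoming rows
        status          TEXT                   -- hypothetical dimension attribute
    )
""")

def incremental_update(conn, rows):
    """Update rows that match on the natural key; insert the rest with a new surrogate key."""
    for contract_id, status in rows:
        cur = conn.execute(
            "UPDATE dim_contracts SET status = ? WHERE contract_id = ?",
            (status, contract_id),
        )
        if cur.rowcount == 0:  # no existing row matched, so this is a new contract
            conn.execute(
                "INSERT INTO dim_contracts (contract_id, status) VALUES (?, ?)",
                (contract_id, status),
            )

incremental_update(conn, [("K-1", "open"), ("K-2", "open")])
incremental_update(conn, [("K-1", "closed"), ("K-3", "open")])

result = conn.execute(
    "SELECT pk_dim_contract, contract_id, status FROM dim_contracts "
    "ORDER BY pk_dim_contract"
).fetchall()
print(result)  # [(1, 'K-1', 'closed'), (2, 'K-2', 'open'), (3, 'K-3', 'open')]
```

The surrogate key stays the primary key, so the dimensional model is not compromised; the unique natural key exists only so that the load can tell updates from inserts.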

Use of Surrogate Key

When designing a table, is it always safe to use a surrogate key, say a NUMBER(38) column generated by a sequence, even when a naturally occurring candidate key exists, e.g. student_id?
What are the considerations when choosing the PK column(s)?

This is actually situational. Although I personally prefer to use surrogate keys whenever possible, there are valid reasons why natural keys should be preferred in some situations (such as a reference/lookup table for instance). I discuss key strategies at http://www.agiledata.org/essays/dataModeling101.html#AssignKeys and have been meaning to rework this section into its own article one of these days.
- Scott
http://www.ambysoft.com/scottAmbler.html
