Use of Surrogate Key

When designing a table, is it always safe to use a surrogate key, say a NUMBER(38) column populated from a sequence, even when a naturally occurring candidate key such as student_id exists?
What are the considerations when choosing the PK column(s)?

This is actually situational. Although I personally prefer to use surrogate keys whenever possible, there are valid reasons why natural keys should be preferred in some situations (such as a reference/lookup table for instance). I discuss key strategies at http://www.agiledata.org/essays/dataModeling101.html#AssignKeys and have been meaning to rework this section into its own article one of these days.
- Scott
http://www.ambysoft.com/scottAmbler.html
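
A minimal sketch of the usual compromise, assuming a hypothetical student table: the surrogate key is the primary key, and the natural candidate key (student_id) keeps a UNIQUE constraint so its uniqueness is still enforced. SQLite is used here only because it runs anywhere; the table and column names are illustrative and do not come from the thread.

```python
import sqlite3

# Hypothetical schema for illustration: student_pk is the surrogate key,
# student_id is the natural candidate key mentioned in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_pk INTEGER PRIMARY KEY,   -- surrogate, assigned by the database
        student_id TEXT NOT NULL UNIQUE,  -- natural key, still enforced
        full_name  TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO student (student_id, full_name) VALUES (?, ?)",
    ("S-1001", "Ann Lee"),
)


def duplicate_is_rejected(connection):
    """Return True if a second row with the same natural key is refused."""
    try:
        connection.execute(
            "INSERT INTO student (student_id, full_name) VALUES (?, ?)",
            ("S-1001", "Someone Else"),
        )
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = duplicate_is_rejected(conn)
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Other tables would reference student_pk, so student_id can later change format without touching any foreign key.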

Similar Messages

Surrogate keys -- which field to be used?

Hello,
I am wondering what field would be best suited to be used as surrogate key for a data warehouse. Since my fact table is likely to have hundreds of thousands of records, I am thinking that uniqueidentifier makes more sense than some type of integer.
Any thoughts on this?
Also I am designing my DW (I am really starting with a data-mart) with a lot of data coming from Dynamics CRM and I was wondering if a surrogate key is really necessary if we are not going to ever delete records from the data sources.
Any advice and insights are greatly appreciated.
Regards,
P.

Since my fact table is likely to have hundreds of thousands of records, I am thinking that uniqueidentifier makes more sense than some type of integer.
Hello,
100 K rows is very little data. The integer data type performs best in SQL Server, much better than GUIDs, and with type int you can handle about 4 billion distinct values (roughly 2.1 billion if you only use positive numbers).
Olaf Helper
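
The size argument can be checked with simple arithmetic. This is only a rough sketch: it counts raw key bytes (4 for a 32-bit int, 16 for a GUID) and ignores index and row overhead; the row count of 500,000 is an assumed figure standing in for "hundreds of thousands of records".

```python
import uuid

INT_BYTES = 4                          # a 32-bit signed integer
GUID_BYTES = len(uuid.uuid4().bytes)   # a GUID/UUID is 128 bits

int_distinct_values = 2 ** (8 * INT_BYTES)  # all bit patterns of a 32-bit int
int_max_positive = 2 ** 31 - 1              # largest positive 32-bit signed value

rows = 500_000  # assumed fact-table size, for illustration only
key_bytes_int = rows * INT_BYTES
key_bytes_guid = rows * GUID_BYTES
```

Every foreign key that references the dimension repeats the same width, so the factor of four applies again in the fact table for each dimension it points to.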

Building dimensions that are based on surrogate keys

Hi -
I am new to AWM and have a question.
I read the docs, but I still do not understand how to build a cube using AWM.
So we have a DW with dimensions and fact tables. The dimension tables use surrogate keys.
The fact table uses the surrogate keys of the dimensions.
I want to define the time dimension in AWM.
The unique key for the dimension is the surrogate key (time_key).
In AWM I defined levels: day, month and year.
Would the hierarchy for the day level be:
time_key -> day?
Thanks,
Frank

Thank you so much.
Now when I try that, here is the error that I get for that line:
1118: Implicit coercion of a value with static type flash.display:DisplayObject to a possibly unrelated type flash.display:MovieClip.
I've placed a zip file with my FLA, .as and .xml here:
http://bigfins.com/temp/test5xml.zip
The ActionScript alone is below.
Thank you again for any assistance.

Apex and Natural Vs Surrogate keys

Hi
We've been using Apex for a few months now and there's a debate raging in our department over whether we should design our database tables using natural or surrogate (based on Oracle sequences / triggers) keys. Our experience as Apex developers shows that Apex itself looks to lean towards surrogate keys, a few examples are below:
- When creating forms on reports / tables Apex only allows 2 primary key columns without adding 'extras' in the background (see a previous post of mine).
- If we have a form on a table and our natural primary keys can be updated, the Apex-created DML statements break, as they look to do the update using the changed key values in the WHERE clause rather than the old ones. The only way around this seems to be to delete the inbuilt DML statements created by Apex and code your own, which is extra work.
- The Apex sample applications themselves seem to use sequences / surrogate keys.
What are people's opinions on this? In particular is there any guidance from the Apex development team on which is best to use with Apex?
Regards
Antilles

Hi Andrew,
As with abots_d, I only use "natural" keys for lookups.
>
1. the department names were here for 20 years and they never changed
>
But, can you guarantee that they never will? My firm has changed departmental names so many times, it's getting ridiculous! But other things also change over time - consider what happens if a person gets married and changes their name and you've used their previous names as the keys (and consider how much data in other tables may use those keys).
>
2. Server names are uniquely generated by a special formula in Excel precisely to avoid the duplication problem and guarantee uniqueness within our glamorous bank.
>
SQL could probably recreate that formula and Unique Key constraints would handle the rest
>
3. no, we are not going to extend this app to cover any other banks
>
Given what's happening with the banking industry right now, who can say ;)
>
4. A PK that means something (aka "server name") has a meaning, whereas a meaningless one has no (business) value
>
Why does a bit of data have to have explicit "business value"? I would suggest that a surrogate key is a pointer to a record and allows you to easily create relationships. Once created, the key would never be changed regardless of what happens to the data on its record. Thus, the relationship is maintained. Using personnel (which our firm renamed as "Human Resources" a while back) as an example, it's likely that every employee would have an employee number. Does this number actually mean anything in itself, does it have "business value"? Most likely, it's just a convenient way to identify a person and relate records to them.
I would suggest that any non-numeric/date keys are relatively slow. As strings, the only way to check for their sort order would be to (A) convert to upper or lower case and (B) perform a string comparison left-to-right across the entire string. There's also the possibility of certain characters appearing in the strings that can cause issues - for example, quotes, apostrophes, colons, commas, question marks and percentage signs.
Also, consider the length of a VARCHAR2 that you would have to use - how big would it need to be to cover all possibilities? You may say 20 now but tomorrow you get data with 21 characters in it - do you want to update the table plus all related tables for that?
There are further issues with parent, child, grand-child etc relationships where the keys would have to be passed down in full through the relationships. Depending on how many levels you may have, a fair number of the fields on the bottom-most table would be there just for the keys.
It has been a standard industry practice for many years now to "normalise databases" to avoid lots of issues with keys and "repeating data". Apart from very simple lookup tables, I have stuck with those guidelines for years now without any problems at all.
Andy
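
A small sketch of the renaming argument above, assuming two hypothetical designs for the same data: one keyed on the department name, one keyed on a surrogate dept_id. It counts how many rows a rename has to touch in each design. Table names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Design A: the department name itself is the key.
    CREATE TABLE dept_nat (dept_name TEXT PRIMARY KEY);
    CREATE TABLE emp_nat  (emp_name TEXT, dept_name TEXT REFERENCES dept_nat(dept_name));

    -- Design B: a surrogate key; the name is just an attribute.
    CREATE TABLE dept_sur (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL UNIQUE);
    CREATE TABLE emp_sur  (emp_name TEXT, dept_id INTEGER REFERENCES dept_sur(dept_id));

    INSERT INTO dept_nat VALUES ('Personnel');
    INSERT INTO emp_nat  VALUES ('Ann', 'Personnel'), ('Bob', 'Personnel'), ('Cy', 'Personnel');
    INSERT INTO dept_sur VALUES (1, 'Personnel');
    INSERT INTO emp_sur  VALUES ('Ann', 1), ('Bob', 1), ('Cy', 1);
    """
)


def rename_natural(old, new):
    """Rename in design A: the parent row and every child row must change."""
    touched = conn.execute(
        "UPDATE dept_nat SET dept_name = ? WHERE dept_name = ?", (new, old)
    ).rowcount
    touched += conn.execute(
        "UPDATE emp_nat SET dept_name = ? WHERE dept_name = ?", (new, old)
    ).rowcount
    return touched


def rename_surrogate(dept_id, new):
    """Rename in design B: only the department row changes."""
    return conn.execute(
        "UPDATE dept_sur SET dept_name = ? WHERE dept_id = ?", (new, dept_id)
    ).rowcount


rows_touched_natural = rename_natural("Personnel", "Human Resources")
rows_touched_surrogate = rename_surrogate(1, "Human Resources")
```

With three employees the natural-key rename touches four rows and the surrogate-key rename touches one; the gap grows with every table that references the department.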

Loading data into Fact/Cube with surrogate keys from SCD2

We have created 2 dimensions, CUSTOMER & PRODUCT with surrogate keys to reflect SCD Type 2.
We now have the transactional data that we need to load.
The data has a customer id that relates to the natural key of the customer dimension and a product id that relates to the natural key of the product dimension.
Can anyone point us in the direction of some documentation that explains the steps necessary to populate our fact table with the appropriate surrogate key?
We assume that we need to have a lookup table between the current version of the customer and the incoming transaction data - but not sure how to go about this.
Thanks in advance for your help.
Laura

Hi Laura
There is another way of handling SCD and changing facts: use a separate table for the history. Let me explain.
The standard approach has these three steps:
1. Determine if a change has occurred
2. End Date the existing record
3. Insert a new record into the same table with a new Start Date and dummy End Date, using a new surrogate key
The modified approach also has three steps:
1. Determine if a change has occurred
2. Copy the existing record to a history table, setting the appropriate End Date en route
3. Update the existing record with the changed information giving the record a new Start Date, but retaining the original surrogate key
What else do you need to do?
The current table can use the surrogate key as the primary key, with the natural key being a unique key. The history table has the surrogate key and the end date in the primary key, with a unique key on the natural key and the end date.
For end user queries, which more than 90% of the time go against current data, this method is much faster because only current records are in the main table and no filters are needed on dates. If a user wants to query history and current data combined, then a view which uses a union of the main and historical data can be used.
One more thing to note is that if you adopt this approach for your dimension tables then each row always keeps the same surrogate key. This means that if you follow a strict Kimball approach and make the primary key of the fact table a composite key made up of the foreign keys from each dimension, you NEVER have to rework this primary key. It always points to the correct dimension row, thereby eliminating the need for a surrogate key on the fact table!
I am using this technique to great effect in my current contract and performance is excellent. The splitter at the end of the map splits the data into three sets. Set one is for an insert into the main table when there is no match on the natural key. Set two is when there is a match on the natural key and the delta comparison has determined that a change occurred. In this case the current row needs to be copied into history, setting the End Date to the system date en route. Set three is also when there is a match on the natural key and the delta comparison has determined that a change occurred. In this case the main record is simply updated with the Start Date being reset to the system date.
By the way, I intend to put a white paper together on this approach if anyone is interested.
Hope this helps
Regards
Michael
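
The modified approach described above can be sketched as follows. This is an illustration under assumptions, not Michael's actual mapping: the customer and customer_history tables, the single tracked attribute (city) and the apply_change helper are all hypothetical, and dates are passed in as ISO strings instead of using the system date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Current rows only: surrogate key is the PK, natural key is unique.
    CREATE TABLE customer (
        customer_key INTEGER PRIMARY KEY,
        customer_id  TEXT NOT NULL UNIQUE,
        city         TEXT NOT NULL,
        start_date   TEXT NOT NULL
    );
    -- Superseded rows: surrogate key plus end date form the PK.
    CREATE TABLE customer_history (
        customer_key INTEGER NOT NULL,
        customer_id  TEXT NOT NULL,
        city         TEXT NOT NULL,
        start_date   TEXT NOT NULL,
        end_date     TEXT NOT NULL,
        PRIMARY KEY (customer_key, end_date),
        UNIQUE (customer_id, end_date)
    );
    """
)


def apply_change(customer_id, city, load_date):
    """Insert, ignore, or copy-to-history-then-update one incoming record."""
    row = conn.execute(
        "SELECT customer_key, city FROM customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:  # no match on the natural key: plain insert
        conn.execute(
            "INSERT INTO customer (customer_id, city, start_date) VALUES (?, ?, ?)",
            (customer_id, city, load_date),
        )
        return "inserted"
    if row[1] == city:  # match, but nothing changed
        return "unchanged"
    # Match and a change: copy the current row to history with its end date...
    conn.execute(
        "INSERT INTO customer_history "
        "SELECT customer_key, customer_id, city, start_date, ? "
        "FROM customer WHERE customer_id = ?",
        (load_date, customer_id),
    )
    # ...then update in place, keeping the original surrogate key.
    conn.execute(
        "UPDATE customer SET city = ?, start_date = ? WHERE customer_id = ?",
        (city, load_date, customer_id),
    )
    return "updated"


outcomes = [
    apply_change("C1", "Leeds", "2024-01-01"),
    apply_change("C1", "York", "2024-06-01"),
    apply_change("C1", "York", "2024-07-01"),
]
current_rows = conn.execute(
    "SELECT customer_key, city, start_date FROM customer"
).fetchall()
history_rows = conn.execute("SELECT * FROM customer_history").fetchall()
```

The surrogate key stays 1 across the change, so fact rows that already point at it need no rework.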

Joe - Why was SURROGATE KEY left out from ISO table design?

The lack of SURROGATE KEY causes lots of confusion and ultimately loss of productivity. It is common practice in SQL Server development to make the SURROGATE KEY the PRIMARY KEY, which is the source of all the trouble because it is not really the "PRIMARY KEY", just a meaningless integer identifier.
Example:
CREATE TABLE Products (
ProductID INT SURROGATE KEY,
ProductNumber char(12) PRIMARY KEY,
Name nvarchar(100) NOT NULL UNIQUE,
ListPrice DECIMAL (12,2) NOT NULL,
Color varchar(10) );
Is there a hope of correcting this issue?
Thanks.
Kalman Toth Database & OLAP Architect

Thanks Joe.
In SQL Server world we do use SURROGATE IDENTITY (or SEQUENCE object) INT in table design. That's like in our DNA even if it conflicts with Codd. AdventureWorks sample:
SELECT ProductID, ProductNumber, Name, ListPrice, Color
FROM Production.Product ORDER BY ProductNumber;
ProductID  ProductNumber  Name                             ListPrice  Color
899        FR-T67Y-44     LL Touring Frame - Yellow, 44       333.42  Yellow
900        FR-T67Y-50     LL Touring Frame - Yellow, 50       333.42  Yellow
901        FR-T67Y-54     LL Touring Frame - Yellow, 54       333.42  Yellow
902        FR-T67Y-58     LL Touring Frame - Yellow, 58       333.42  Yellow
886        FR-T67Y-62     LL Touring Frame - Yellow, 62       333.42  Yellow
890        FR-T98U-46     HL Touring Frame - Blue, 46        1003.91  Blue
ProductID is the (SURROGATE) PRIMARY KEY
ProductNumber is the NATURAL KEY (created by the accounting department) - this is the real "PRIMARY KEY"
Name is the CANDIDATE KEY (too long to be a key)
In RDBMS theory when we talk about PRIMARY KEY we mean the ProductNumber column which is used in real life.
However, in reality the ProductID INT meaningless number is the PRIMARY KEY, while the meaningful ProductNumber has to settle for a UNIQUE KEY or unique index.
I understand your point that we should not use SURROGATES, but we do. It's like in our (SQL Server) blood. If I go to a company and design for them without SURROGATE IDENTITY/SEQUENCE, they would fire me. From an ORACLE forum: "Basically, always use a surrogate key. There are a few special cases where a surrogate key really isn't any better than some "natural" key, and whatever effort is needed to create and populate a surrogate key would just be wasted. These situations are pretty rare.
Here's one example: Say you have a many-to-many relationship between employees and departments, that is, each employee may be related to 0 or more departments, and each department may be related to 0 or more employees, but a given employee can only be related to a given department 1 time. In that case, a primary key consisting of both columns, dept_id and emp_id, is about as good as a surrogate key. You'd need a unique constraint on (dept_id, emp_id) in any case, and I don't see any need to create a surrogate key."
LINK: https://community.oracle.com/thread/2527771?tstart=0
I tell you Joe, 90% of the world is running on SURROGATE PRIMARY KEY tables, so why should we care about Codd at this point? Even the perfect PRIMARY KEY candidate, the social security number, may have problems such as stolen SSN duplicates, among others: "Natural key. A key that is formed of attributes that already exist in the real world. For example, U.S. citizens are issued a Social Security Number (SSN) that is unique to them (this isn't guaranteed to be true, but it's pretty darn close in practice). SSN could be used as a natural key, assuming privacy laws allow it, for a Person entity (assuming the scope of your organization is limited to the U.S.)."
LINK: http://www.agiledata.org/essays/keys.html
Good advice from the article: "Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys."
Kalman Toth Database & OLAP Architect
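
The many-to-many case quoted from the Oracle forum can be illustrated with a junction table whose primary key is the pair of foreign keys. A minimal sketch, assuming a hypothetical dept_emp table; SQLite stands in for Oracle or SQL Server here, and the link helper is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dept_emp (
        dept_id INTEGER NOT NULL,
        emp_id  INTEGER NOT NULL,
        PRIMARY KEY (dept_id, emp_id)  -- the pair is the key; no extra surrogate
    )
    """
)


def link(dept_id, emp_id):
    """Relate an employee to a department; False if the pair already exists."""
    try:
        conn.execute("INSERT INTO dept_emp VALUES (?, ?)", (dept_id, emp_id))
        return True
    except sqlite3.IntegrityError:
        return False


results = [link(10, 1), link(10, 2), link(20, 1), link(10, 1)]
```

The fourth call repeats the pair (10, 1) and is refused by the composite key, which is the same uniqueness rule a surrogate-keyed junction table would need as a separate constraint.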

Looking for white paper or technical paper about surrogate keys

I am trying to locate any kind of reference concerning the use of surrogate keys vs. natural keys. Does anybody know of any authoritative reference, maybe from Oracle?
Thanks!
Bob

Hi Bob,
Unfortunately we do not have such documentation on our site (http://otn.oracle.com/documentation/index.html).
Please try the Member feedback forum at: Community Feedback (No Product Questions)
You might also consider searching on Metalink, as that site carries most of the available white papers. http://metalink.oracle.com/
Regards,
Les

Reusability of Surrogate Keys

Hi,
Can I use the surrogate keys of the source in my data mart, assuming it's a PK in the source and the source maintains history?
I mean, I am not convinced of the need for different surrogate keys in my warehouse (an additional sequence).
Can anyone give some solid reasons why not to reuse the surrogate keys of the source?
Also, the surrogate keys in the source are numbers.
Thanks

Fair enough, I think I misunderstood your question. I will defer to someone able to better discuss the relational side of things. I know this is / can be done using insert triggers, etc. - but since this is a completely common thing to do in a DW, I would expect OWB to have special functionality to address this without resorting to code.
I don't know OWB though, so sorry I can't give a specific answer.
Thx,
Scott

Surrogate Keys vs. Natural Keys

Hi All,
Is anyone aware of a recommendation regarding the use of surrogate keys vs. natural keys?
Regards,
Irfan Abdul Rehman

The Natural Keys approach came first. This was the approach used when Relational Databases were first introduced, but I believe it was based on the premise that the design of the Database does not change over time. Most people seem to side with one or the other. People's opinions here are usually based on what they were brought up with or personal experiences where they have run into problems with one approach or the other.
When viewed within the context of object-oriented design, the Surrogate approach is more common. Consider: does the uniqueness of a table have any relevance to other tables? If the uniqueness of a table changes, why should this have to impact other tables? If a customer is associated with an Order, using Natural Keys, you need to know what columns make the Customer unique when inserting a record into the ORDERS table. With Surrogate Keys, you don't need to know what columns make the Customer unique, the customer is referenced by its Surrogate Key. With Surrogate Keys, you only need to know what makes a Customer unique when dealing with the CUSTOMERS table.
With SQL Server, the Surrogate approach is very straightforward. In Oracle, the appeal of Surrogate Keys is less than with SQL Server as there is more work to implement them. In SQL Server you specify the column as an identity column. In Oracle you additionally need to create a sequence (releases before 12c have no identity columns). In addition, if you want the value assigned automatically on inserts, you will need to create a trigger; the generated value can then be read back with the RETURNING clause of the INSERT or with the sequence's CURRVAL. If you don't choose to have the value assigned automatically on inserts, your insert SQL statements will require extra SQL code and the Surrogate value is not enforced (i.e. someone could enter any value and this could lead to a Key Violation).
Here are some disadvantages of Natural Keys:
-     Almost always, more columns to join on. If Table B is a detail table of master A and C is a detail to B and D is a detail to C, you will need at least 3 columns to join D to C in an SQL query.
-     If the uniqueness of a table changes (Ex: the number of Columns making your Table unique changes from 2 to 3), with Natural Keys, all of your SQL (Stored Procedures, Reports, Views, SQL Scripts, Application Code) will have to be re-written and your foreign keys relating to that table will have to be changed. With Surrogate Keys, the change is usually as simple as modifying that one table.
-     If the data type of a Primary Key column changes (Ex: you used a varchar(20), now it's not big enough and has to be changed to varchar(100)), with Natural Keys, all Foreign Keys related to that Table will have to be changed (may also impact SQL Code). With Surrogate Keys, the change is usually as simple as modifying that one table.
-     The Classic: a value changes in the Primary Key (Like Last Name). Now you've got to update that in every Foreign Key. Which means you'll have a big headache when you have to temporarily drop the constraint(s).
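
The first point in the list (more columns to join on) can be sketched with a hypothetical A-B-C-D chain. Only the two lowest levels are built here, and all table and column names are invented for illustration: with natural keys, D carries every ancestor's key columns and joins to C on three of them, while the surrogate version joins on one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Natural keys: each level inherits all key columns of its ancestors.
    CREATE TABLE c_nat (a_code TEXT, b_code TEXT, c_code TEXT,
                        PRIMARY KEY (a_code, b_code, c_code));
    CREATE TABLE d_nat (a_code TEXT, b_code TEXT, c_code TEXT, d_code TEXT,
                        PRIMARY KEY (a_code, b_code, c_code, d_code));

    -- Surrogate keys: each level has one key column and one parent reference.
    CREATE TABLE c_sur (c_id INTEGER PRIMARY KEY, c_code TEXT);
    CREATE TABLE d_sur (d_id INTEGER PRIMARY KEY,
                        c_id INTEGER REFERENCES c_sur(c_id), d_code TEXT);

    INSERT INTO c_nat VALUES ('A1', 'B1', 'C1');
    INSERT INTO d_nat VALUES ('A1', 'B1', 'C1', 'D1');
    INSERT INTO c_sur VALUES (1, 'C1');
    INSERT INTO d_sur VALUES (1, 1, 'D1');
    """
)


def pk_columns(table):
    """List the primary-key columns of a table via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0]


natural_join = conn.execute(
    """
    SELECT d.d_code FROM d_nat d
    JOIN c_nat c ON d.a_code = c.a_code
                AND d.b_code = c.b_code
                AND d.c_code = c.c_code
    """
).fetchall()
surrogate_join = conn.execute(
    "SELECT d.d_code FROM d_sur d JOIN c_sur c ON d.c_id = c.c_id"
).fetchall()
```

Both joins return the same row; the difference is how many columns each query, foreign key and index has to carry.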

MDX Query to show the latest product text against historical facts (Type 2 dimension linking on Surrogate key and also Natural Key)

I need to write an MDX query to show the latest product text against historical facts, or a chosen product text at a point in time. I can write this query in TSQL, but I am new to MDX.
The way I do it in TSQL is by joining two queries together on the Natural Key as opposed to the surrogate key.
Can this be done in MDX? I know I could write two separate MDX queries, one which gets the product text I want and the other which gets the measure with the actual product text and Natural Key, and use a lookup function in SSRS to show the two result sets in the same tablix by looking up the Natural Keys. But this should be possible in one query, shouldn't it?
In the DSV the fact knows to join to the dimension using the surrogate key.
Thanks J

Hi Jamster,
According to your description, you want to write a query to show the latest product text, right?
In MDX, we can use the LastNonEmpty function to return the latest member within a dimension. LastNonEmpty is an aggregation function available in the Enterprise edition of SQL Server. However, you can create your own with a little bit of recursive MDX. Here is a sample query for your reference.
With Member Measures.LastHits as
  iif(isempty(Measures.Hits),
      ([Date].[Year Month Day].prevmember, Measures.LastHits),
      Measures.Hits)
Reference
http://cwebbbi.wordpress.com/2011/03/24/last-ever-non-empty-a-new-fast-mdx-approach/
http://richardlees.blogspot.com/2010/07/getting-last-non-empty-value.html
If this is not what you want, please provide us with the detailed structure of your cube and the expected result, so that we can analyse further and give you the exact MDX query.
Regards,
Charlie Liao
TechNet Community Support
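
For readers new to MDX, the recursion in the sample query can be mirrored in a few lines of Python. This is only an analogy under assumptions: the list stands in for the members of the date hierarchy in order, None stands in for an empty cell, and the function name is invented.

```python
def last_non_empty(values, index):
    """Return values[index], or the nearest earlier non-empty value.

    Mirrors the MDX above: if the current cell is empty, look at the
    previous member and apply the same rule again.
    """
    if index < 0:
        return None  # ran off the start of the hierarchy: nothing to carry forward
    if values[index] is not None:
        return values[index]
    return last_non_empty(values, index - 1)


hits = [5, None, None, 8, None]  # made-up measure values per day
filled = [last_non_empty(hits, i) for i in range(len(hits))]
```

Each empty day picks up the most recent earlier value, which is the behaviour the calculated member Measures.LastHits is meant to provide.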

Why does BW use surrogate keys ?

Hi,
can anyone answer me in 1 sentence:
Why does BW use surrogate / artificial keys?
It's not faster while querying - line items are faster, and you need to query more tables.
It's not faster while loading - surrogate keys need to be looked up and built.
ThanXs
Martin

A database doesn't care if it's numeric or not. Index access is index access, and that is what matters. But talking about indexes: there are fewer indexes to maintain when you use surrogates; otherwise you would need an index on each characteristic in the fact table, and that could be a lot. Also there may be a historical technical reason, like the number of key fields available on a table. Remember that SAP is trying to be DB vendor independent, so if a supported DB only accepts, say, 16 key fields, then you need to design your application for that.
-Kristian

How we generate Surrogate Keys without using identity column

Hi All,
How do we generate Surrogate Keys without using an identity column?
Regards,
Manish

There are various options:
1. IDENTITY columns - simplest to implement
2. NEWID() or NEWSEQUENTIALID() functions (if you want to use GUID values as surrogate keys)
3. SEQUENCE object (SQL Server 2012 and above)
4. Custom functions that generate the keys yourself
This is a good article which compares the use of GUIDs against integers as surrogate keys:
http://blog.jonathanoliver.com/integers-vs-guids-and-natural-vs-surrogate-keys/
Visakh
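
Outside the database, options 2 and 4 look roughly like this. A minimal sketch, assuming the keys are generated in application code: make_sequence is a hypothetical helper that behaves like a simple in-process sequence (it is not safe across processes or servers, which is one reason a database SEQUENCE is usually preferred), and uuid4 plays the role of NEWID().

```python
import itertools
import uuid


def make_sequence(start=1, step=1):
    """Return a function that hands out the next integer key on each call."""
    counter = itertools.count(start, step)
    return lambda: next(counter)


next_customer_key = make_sequence()
integer_keys = [next_customer_key() for _ in range(3)]

guid_key = uuid.uuid4()  # random 128-bit identifier, comparable in role to NEWID()
```

The integer keys are small and ordered; the GUID needs no coordination between generators but is four times wider than a 32-bit integer.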

How we use Surrogate Keys for snowflake dimension

Hi All,
My question is: how do we use Surrogate Keys for a snowflake dimension?
I heard from somebody that Surrogate Keys only work with a star schema.
Please correct me if I am wrong.
Regards,
Manish

Hi manishcal16PPS,
According to your description, you can only create a natural key in your dimension, and it does not work when you use a surrogate key. Right?
In Analysis Services, a snowflake schema means a dimension is represented by more than one dimension table; in other words, it takes multiple dimension tables to define a dimension. A surrogate key is just an extra, redundant, unique key that stands in for the natural key. So there is no direct relationship, and no limitation, between surrogate keys and a snowflake schema.
In this scenario, since there is a relationship between the two dimensions, you should create a natural key. For choosing between a natural key and a surrogate key, please refer to the article below:
Surrogate Key vs. Natural Key
For understanding star/snowflake schema, please see:
Understanding Star and Snowflake Schemas
Regards,
Simon Hou
TechNet Community Support
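
A small sketch of surrogate keys in a snowflaked dimension, assuming a hypothetical product dimension split into dim_product and dim_category. It is plain SQL run through SQLite, not Analysis Services, and all names and figures are invented; it only shows that the outer table can be reached through surrogate keys alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,      -- surrogate
        category_name TEXT NOT NULL UNIQUE      -- natural
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,       -- surrogate
        product_code TEXT NOT NULL UNIQUE,      -- natural
        category_key INTEGER NOT NULL REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        amount      REAL NOT NULL
    );
    INSERT INTO dim_category VALUES (1, 'Frames'), (2, 'Wheels');
    INSERT INTO dim_product  VALUES (10, 'FR-1', 1), (11, 'FR-2', 1), (20, 'WH-1', 2);
    INSERT INTO fact_sales   VALUES (10, 100.0), (11, 50.0), (20, 25.0);
    """
)

# Fact -> product -> category, joined on surrogate keys at every hop.
sales_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
```

The natural keys (product_code, category_name) remain available as attributes and unique constraints, but none of the joins depend on them.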

How to Maintain Surrogate Key Mapping (cross-reference) for Dimension Tables

Hi,
What would be the best approach on ODI to implement the Surrogate Key Mapping Table on the STG layer according to Kimball's technique:
"Surrogate key mapping tables are designed to map natural keys from the disparate source systems to their master data warehouse surrogate key. Mapping tables are an efficient way to maintain surrogate keys in your data warehouse. These compact tables are designed for high-speed processing. Mapping tables contain only the most current value of a surrogate key— used to populate a dimension—and the natural key from the source system. Since the same dimension can have many sources, a mapping table contains a natural key column for each of its sources.
Mapping tables can be equally effective if they are stored in a database or on the file system. The advantage of using a database for mapping tables is that you can utilize the database sequence generator to create new surrogate keys. And also, when indexed properly, mapping tables in a database are very efficient during key value lookups."
We have a requirement to implement cross-reference mapping tables with Natural and Surrogate Keys for each dimension table. These mappings tables will be populated automatically (only inserts) during the E-LT execution, right after inserting into the dimension table.
Does anyone have any idea how to implement this in ODI?
Thanks,
Danilo

Hi,
First of all, please avoid bolding text. Beyond that, according to Kimball (if I remember correctly) this is a 1:1 mapping, so no surrogate key is needed.
Personally, I would use a lookup table:
http://www.odigurus.com/2012/02/lookup-transformation-using-odi.html
or make a simple outer join filtered by your "Active_Flag" column (remember that this filter needs to be inside your outer join).
Let us know
Francesco
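
Independent of ODI, the mapping-table idea from the Kimball quote can be sketched like this. It is an illustration under assumptions, not an ODI interface: the customer_key_map table, the surrogate_for helper and the sample source systems are hypothetical, and the sketch stores one row per (source system, natural key) pair where Kimball describes one natural-key column per source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_key_map (
        customer_key  INTEGER PRIMARY KEY,   -- warehouse surrogate key
        source_system TEXT NOT NULL,
        natural_key   TEXT NOT NULL,
        UNIQUE (source_system, natural_key)
    )
    """
)


def surrogate_for(source_system, natural_key):
    """Look up the surrogate key; insert a new mapping row only if none exists."""
    row = conn.execute(
        "SELECT customer_key FROM customer_key_map "
        "WHERE source_system = ? AND natural_key = ?",
        (source_system, natural_key),
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO customer_key_map (source_system, natural_key) VALUES (?, ?)",
        (source_system, natural_key),
    )
    return cursor.lastrowid


# Incoming transactions carry natural keys; the fact rows carry surrogate keys.
transactions = [("CRM", "C-17", 20.0), ("ERP", "900", 5.0), ("CRM", "C-17", 7.5)]
fact_rows = [(surrogate_for(src, key), amount) for src, key, amount in transactions]
```

The mapping table only ever receives inserts, which matches the insert-only requirement stated in the question.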

Should surrogate key be numeric?

1. While on a data modeling course at Oracle, I was advised NOT to use numeric data types unless you need to do numeric/arithmetic operations on the attribute. E.g. even though an account_id is all digits, it should still be declared as alphanumeric (VARCHAR2) as it is NOT really a number. This makes sense.
I would like opinions on the above.
2. If above advice is taken, then even a surrogate key should be alpha numeric since it does not require arithmetic/numeric computations. However, I have also heard that numerics are better than VARCHAR2 for joins. If this join performance advantage is true, then using a NUMBER data type for surrogate keys should be preferred over VARCHAR2? This overrides the advice in 1 above for join performance reasons.
I would like comments on 1 and 2.
Thanks
Edited by: user4900730 on Jun 16, 2010 12:19 PM

Sybrand gave a pretty good analysis of the performance issues. The instructor you mention just got carried away with the idea that 'not everything that LOOKS like a number IS a number'. I think his point is valid for things like SSNs and telephone numbers, etc. But my take is that if you are using a "number" for a surrogate key, then it probably really is a number, even if it is true you'd never do arithmetic on it. It comes back to the first lesson of being a DBA (quoted in the first two minutes of the first Oracle class I took at their facility in Atlanta, back at version 7.3): "The answer to almost every question is 'it depends'".
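
Both sides of that argument can be seen in a few lines. A rough sketch with made-up values: digit strings sort character by character, which is rarely what you want for a generated key, while identifiers such as telephone numbers lose information when forced into a numeric type.

```python
# Keys stored as text sort left to right, one character at a time.
text_keys_sorted = sorted(["9", "10", "100", "2"])

# The same keys stored as numbers sort by value.
numeric_keys_sorted = sorted([9, 10, 100, 2])

# A telephone number only looks like a number: the leading zero matters.
phone = "0123456789"  # made-up value
phone_round_trip = str(int(phone))
```

A sequence-generated surrogate key behaves like the second list, while the telephone number is the kind of attribute the instructor's advice was aimed at.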
