Why do dimension tables contain data?
Hi there,
As far as I understood the BW data model, dimensions group the characteristics that belong to the specific view I want to access via my InfoCube. So the dimension tables should contain just the mapping between DIM IDs and SIDs, shouldn't they? What else do these dimension tables contain?
Thanks,
Pascal
Hi,
I think the idea of using the dimension table as an intermediate between the SID tables and the fact table is to build the extended star schema, which allows more than 13 characteristics in an InfoCube.
With rgds,
Anil Kumar Sharma .P
Similar Messages
-
Reg: Fact table and Dimension table in Data Warehousing -
Hi Experts,
I don't quite understand the criteria that decide what goes into a fact table and what goes into a dimension table.
This link http://stackoverflow.com/questions/9362854/database-fact-table-and-dimension-table states:
A fact table contains data that can be aggregated.
Measures are aggregated data expressions (e.g. sum of costs, count of calls, ...).
A dimension contains data that is used to generate groups and filters.
That's fine, but how does one decide which columns belong in the fact table and which in a dimension table?
Any help is much appreciated.
Pardon me if this is not the correct place for this question. This is my first question in the new forum.
Thanks and Regards,
Ranit Biswas
ranitB wrote:
But my main doubt was - what is the criteria to differentiate between columns for Fact tables and Dimension tables? How can one decide upon the design?
Columns of a fact table will often be 'scalar' attributes of the 'fact' data item. A dimension table will often hold 'compound' attributes of a 'fact'.
Consider employee information. The EMPLOYEE table can be a fact table. It might have scalar attribute columns such as: DATE_HIRED, STATUS, EMPLOYEE_ID, and so on.
Other related information that can't be specified as a single attribute value would often be stored in a 'dimension' table: ADDRESS, PHONE_NUMBER.
Each address requires several columns to define it: ADDRESS1, ADDRESS2, CITY, STATE, ZIP, COUNTRY. And an employee might have several addresses: WORK_ADDRESS, HOME_ADDRESS. That address info would be stored in a 'dimension' table and only the primary key value of the address record would be stored in the EMPLOYEE 'fact' table.
Same with PHONE_NUMBER. Several columns are required to define a phone number and each employee might have several of them. The dimension tables are used to help 'normalize' the data in the employee 'fact' table.
And that EMPLOYEE table might also be a DIMENSION table for other FACT tables. A DEVELOPER table might have an EMPLOYEE_ID column with a value that points to a 'dimension' row in the EMPLOYEE dimension table. -
Dimension Table populating data
Hi
I am in the process of creating a data mart with a star schema.
The star schema has been defined with the fact and dimension tables and the primary and foreign keys.
I have written the script for one of the dimensions and would like to know: when the job runs on a daily basis, should it truncate the table every day and rebuild the dimension table, or should it only add new records to the table? If it should only add new records to the table, how is this done?
I assume that the fact table job is run once a day and only new data is added to it?
Thanks
It will depend on the volume of your dimensions. In most of our projects we do not truncate. We update only the changed rows, detected by comparing a fingerprint (which makes the comparison faster than going column by column), and insert the new rows (SCD1). For SCD2 we apply a similar approach for updates and inserts, and do the expirations in batch (one UPDATE for all applicable rows at the end of the package/ETL).
If your dimension is very large, you can consider truncating all data or deleting only the modified rows (based on the business key) to reload them later, but you have to be careful to maintain the same surrogate keys that are referenced by your existing facts.
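The fingerprint comparison described above could look roughly like the following Python sketch. It is not the poster's ETL package: the column names, the in-memory dictionary standing in for the dimension table, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

def fingerprint(row, columns):
    """Hash the tracked attribute values into one comparable string."""
    # Assumes the values do not contain the "|" separator themselves.
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def scd1_merge(dimension, source_rows, business_key, columns):
    """Update changed rows in place and insert new ones (SCD1).

    `dimension` maps business key -> stored row. Existing surrogate keys
    are never reassigned, so fact rows that reference them stay valid.
    """
    next_key = max((r["surrogate_key"] for r in dimension.values()), default=0) + 1
    inserted = updated = 0
    for src in source_rows:
        fp = fingerprint(src, columns)
        current = dimension.get(src[business_key])
        if current is None:
            # New business key: insert with a fresh surrogate key.
            dimension[src[business_key]] = {**src, "surrogate_key": next_key, "fingerprint": fp}
            next_key += 1
            inserted += 1
        elif current["fingerprint"] != fp:
            # Same business key, different attributes: overwrite (no history kept).
            current.update(src)
            current["fingerprint"] = fp
            updated += 1
    return inserted, updated

dimension = {}
first = scd1_merge(dimension, [
    {"customer_no": "C1", "name": "Acme", "city": "Lima"},
    {"customer_no": "C2", "name": "Bolt", "city": "Cusco"},
], "customer_no", ["name", "city"])
second = scd1_merge(dimension, [
    {"customer_no": "C1", "name": "Acme", "city": "Arequipa"},  # changed
    {"customer_no": "C2", "name": "Bolt", "city": "Cusco"},     # unchanged
    {"customer_no": "C3", "name": "Cord", "city": "Lima"},      # new
], "customer_no", ["name", "city"])
print(first, second)
```

The first load inserts two rows. The second load touches only the changed row and the new row, and customer C1 keeps its original surrogate key.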
HTH,
Please, mark this post as Answer if this helps you to solve your question/problem.
Alan Koo | "Microsoft Business Intelligence and more..."
http://www.alankoo.com -
Dimension table design Data modeling question
Hi Experts,
Sorry if I am putting my question in a wrong forum and please suggest an appropriate forum.
I need your opinion on the existing design of our 10-year-old data warehouse.
There is one dimension table with a structure like the following:
Dimension Table
Column                      Type
Dimension key               Number (THIS IS NOT A PRIMARY KEY)
Natural key (from source)   Number
Source name                 Character
Current record indicator    Char(1)
from_date                   Date
to_date                     Date
There are many other columns; if any of them changes, a new current record is created and the previous one is marked as H (historical).
Data is stored in the dimension table like this:
Dimension_key  Natural key  Source Name  Current record ind  from_date    to_date
1              10001        Source1      H                   1-jan-2005   31-may-2005
1              10001        Source1      H                   1-jun-2005   12-dec-2011
1              10001        Source1      C                   13-dec-2011  NULL
2              20002        Source1      H                   1-jun-2001   12-dec-2011
2              20002        Source1      C                   13-dec-2011  NULL
The problem I see in this design is that there is no surrogate key: if any attribute changes, the new record is inserted by first looking up the existing dimension key based on (natural_key, source_name, current_record_ind), so every version of a record shares the same key.
Shouldn't it be stored as follows, according to data warehousing principles?
Dimension_key  Natural key  Source Name  Current record ind  from_date    to_date
1              10001        Source1      H                   1-jan-2005   31-may-2005
2              10001        Source1      H                   1-jun-2005   12-dec-2011
3              10001        Source1      C                   13-dec-2011  NULL
4              20002        Source1      H                   1-jun-2001   12-dec-2011
5              20002        Source1      C                   13-dec-2011  NULL
Please let me know the pros and cons of the current design.
And what if you have both features, something like this:
Lineno  Dimension_key  Natural key  Source Name  Current record ind  from_date    to_date
1       1              10001        Source1      H                   1-jan-2005   31-may-2005
2       1              10001        Source1      H                   1-jun-2005   12-dec-2011
3       1              10001        Source1      C                   13-dec-2011  NULL
I mean, just add a new column and populate it using the required ORDER BY clause. My guess is that in your second example you simply added a new column, which is something like a line number.
Regards
Girish Sharma -
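To make the difference between the two layouts concrete, here is a small Python sketch of the second one, where every version of a record gets its own surrogate key. It only illustrates the idea: the list of dictionaries stands in for the dimension table, the function name is made up, and the dates follow the sample rows from the question with the garbled years read as 2005 and 2001 (an assumption).

```python
from datetime import date, timedelta

def apply_change(rows, natural_key, source_name, change_date):
    """Expire the current version and append a new one with a fresh surrogate key."""
    for row in rows:
        if (row["natural_key"], row["source_name"], row["current_ind"]) == (natural_key, source_name, "C"):
            row["current_ind"] = "H"                           # mark as historical
            row["to_date"] = change_date - timedelta(days=1)   # close the validity range
    new_key = max((r["dimension_key"] for r in rows), default=0) + 1
    rows.append({
        "dimension_key": new_key,   # unique per version, so it can serve as a primary key
        "natural_key": natural_key,
        "source_name": source_name,
        "current_ind": "C",
        "from_date": change_date,
        "to_date": None,
    })
    return new_key

rows = []
apply_change(rows, 10001, "Source1", date(2005, 1, 1))
apply_change(rows, 10001, "Source1", date(2005, 6, 1))
apply_change(rows, 10001, "Source1", date(2011, 12, 13))
apply_change(rows, 20002, "Source1", date(2001, 6, 1))
apply_change(rows, 20002, "Source1", date(2011, 12, 13))
for r in rows:
    print(r["dimension_key"], r["natural_key"], r["current_ind"], r["from_date"], r["to_date"])
```

With one key per version, a fact row can point at exactly the version that was valid when the fact occurred. In the original layout the key alone cannot tell the versions apart.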
New table contains data after Successful activation
HI All,
One DSO activation failed due to a red request which was present in the target. We deleted the bad request from the DSO and repeated the DSO activation step, and it completed successfully.
Generally, as part of ODS activation, the data moves to the ACTIVE table and the CHANGELOG table, and afterwards the same data is deleted from the ACTIVATION (NEW) queue ("U table").
After that terminated DSO activation, the data for that request is not being deleted from the U table.
According to SAP Note 680480, if a termination happens while activating a DSO, the request may simply be activated without the activation queue being cleared afterwards. From that point on, the new table continues to grow.
The SAP Note mentioned above contains a solution only up to BW 3.5. As my system is BI 7.0, we can't implement the patches mentioned in the note.
Can anyone please tell me from which tables, apart from RSODSACTREQ, I need to delete the entry for that particular request?
Regards,
Sridevi.
Hi Sridevi,
Actually it's not advisable to delete the entries from the DB tables. But at times, we are forced to do that to avoid inconsistencies in the system.
If there are no pending requests in the DSO for activation and you are able to load further to downstream data targets such as cubes, you need not worry about the activation queue.
For the time being, since you have already deleted the bad entries from the RSODSACTREQ and RSREQICODS tables, I feel there will not be any inconsistencies in the system. So do not delete from any more tables.
If any inconsistency is found in the next complete data load, then delete from the other tables.
To avoid this in future, set the status to red in the request monitor before deleting the request in DSO Manage.
Regards,
Suman -
What tables contain data for the CUP requests in 5.3?
What are all of the tables that contain the data that appears in the CUP requests? We are on version 5.3 SP13.
We are selling off one of the divisions of our company and one of the terms is that we have to provide all relevant data, including CUP requests. Since there are several hundreds of requests for this division (last count was over 600), it is not practical to just download the individual requests out of CUP. So plan B is to just give them the data.
I know there are several tables that contain this data, and I know some of them (such as VIRSA_AE_REQD_HDR and VIRSA_AE_RQD_WPHS), but I don't know all of them, and I would rather not have to go through and check every table.
Thanks.
Hi Bob,
I have never looked at the VT_AE tables to extract any information. Since your requirement is unusual, I am hopeful that SAP would help you with it; it may be worth a try.
Otherwise, paste the list of tables here so that someone can help you.
Have a great weekend!!
Cheers,
Raghu -
System table contains data types
Hi all,
I am migrating queries from SQL Server 7.0 to Oracle. I found a system table, systypes, in SQL Server 7.0 which contains all the available data types. Is there a similar table in Oracle from which we can get all data types? Any reply will be appreciated.
Not only at least in 10g R2:
SQL> select type_name from dba_types where predefined = 'YES';
TYPE_NAME
KOKED
KOKED1
KOTAD
KOTADX
KOTMD
KOTMI
KOTTB
KOTTBX
KOTTD
BFILE
BINARY ROWID
BINARY_DOUBLE
BINARY_FLOAT
BLOB
CANONICAL
CFILE
CHAR
CLOB
CONTIGUOUS ARRAY
DATE
DECIMAL
DOUBLE PRECISION
FLOAT
INTEGER
INTERVAL DAY TO SECOND
INTERVAL YEAR TO MONTH
LOB POINTER
NAMED COLLECTION
NAMED OBJECT
NUMBER
OCTET
OID
PL/SQL BINARY INTEGER
PL/SQL BOOLEAN
PL/SQL COLLECTION
PL/SQL LONG
PL/SQL LONG RAW
PL/SQL NATURAL
PL/SQL NATURALN
PL/SQL PLS INTEGER
PL/SQL POSITIVE
PL/SQL POSITIVEN
PL/SQL RECORD
PL/SQL REF CURSOR
PL/SQL ROWID
PL/SQL STRING
POINTER
RAW
REAL
REF
SIGNED BINARY INTEGER(16)
SIGNED BINARY INTEGER(32)
SIGNED BINARY INTEGER(8)
SMALLINT
TABLE
TIME
TIME WITH TZ
TIMESTAMP
TIMESTAMP WITH LOCAL TZ
TIMESTAMP WITH TZ
UNSIGNED BINARY INTEGER(16)
UNSIGNED BINARY INTEGER(32)
UNSIGNED BINARY INTEGER(8)
UROWID
VARCHAR
VARCHAR2
VARYING ARRAY
67 rows selected. -
Foreign keys in SCD2 dimensions and fact tables in data warehouse
Hello.
I have a data warehouse in a snowflake schema. All dimensions are SCD2, with columns like this:
ID (PK)  SID  NAME  ...  START_DATE  END_DATE    IS_ACTUAL
1        1    XXX        01.01.2000  01.01.2002  0
2        1    YYX        02.01.2002  01.01.2004  1
3        2    SYX        02.01.2002              1
4        3    AYX        02.01.2002  01.01.2004  0
5        3    YYZ        02.01.2004              1
Other dimensions and the fact table have relations to this table.
Do I need to create foreign keys for these relations?
And if I do, on which columns? SID (serial ID) is not unique. If I create them on ID, I have to look up the SID and the actual row in every query.
Are you still designing your system? Why did you choose NOT to use a star schema? Star schemas are simpler and have some performance benefits over snowflakes. Although there may be some data redundancy, that is usually not an issue for data warehouse systems, since any DML is usually well managed and normalization is often sacrificed for better performance.
Only YOU can determine what foreign keys you need. Generally you will create foreign keys between any child table and its parent table and those need to be created on a primary key or unique key value.
>
And if I do, on what columns? SID (serial ID) is not unique. If I create on ID, I have to get SID and actual row in any query.
>
I have no idea what that means. There isn't any way to tell from just the DDL for one dimension table that you provided.
It is not clear if you are saying that your fact table will have a direct relationship to the star-flake dimension tables or only link to them through the top-level dimensions.
Some types of snowflakes do nothing more than normalize a dimension table to eliminate redundancy. For those types the dimension table is, in a sense, a 'mini' fact table and the other normalized tables become its children. The fact table only has a relation to the main dimension table; any data needed from the dimension's 'child' tables is obtained by joining them to their 'parent'.
In other snowflake types the main fact table has relations to one or more of the dimension's 'child' tables. That complicates the maintenance of the fact table, since any change to a dimension 'child' table also impacts the fact table. That type of snowflake is not recommended.
See the 'Snowflake Schemas' section of the Data Warehousing Guide
http://docs.oracle.com/cd/B28359_01/server.111/b28313/schemas.htm
>
Snowflake Schemas
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.
Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. Figure 19-3 presents a graphical representation of a snowflake schema. -
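For the SCD2 table in the question, one possible reading is that the foreign key goes on ID (the primary key), and the SID is used only to move between versions of the same entity. The Python sketch below illustrates the two lookups that implies. It is an assumption-laden illustration, not something stated in the thread: the function names are invented, the rows are copied as posted (dates read as dd.mm.yyyy), and END_DATE is treated as inclusive.

```python
from datetime import date

DIM = [
    # (ID, SID, NAME, START_DATE, END_DATE, IS_ACTUAL), as posted in the question
    (1, 1, "XXX", date(2000, 1, 1), date(2002, 1, 1), 0),
    (2, 1, "YYX", date(2002, 1, 2), date(2004, 1, 1), 1),
    (3, 2, "SYX", date(2002, 1, 2), None, 1),
    (4, 3, "AYX", date(2002, 1, 2), date(2004, 1, 1), 0),
    (5, 3, "YYZ", date(2004, 1, 2), None, 1),
]

def version_id_at(sid, day):
    """Return the ID of the version of `sid` valid on `day` (resolved once, at fact load time)."""
    for id_, row_sid, _name, start, end, _actual in DIM:
        if row_sid == sid and start <= day and (end is None or day <= end):
            return id_
    return None

def current_row_for(fact_dim_id):
    """Follow a fact's foreign key (ID) to its SID, then to the row flagged as actual."""
    sid = next(r[1] for r in DIM if r[0] == fact_dim_id)
    return next(r for r in DIM if r[1] == sid and r[5] == 1)

print(version_id_at(3, date(2003, 5, 1)))   # version of SID 3 valid in May 2003
print(current_row_for(4)[2])                # name on the current version of the same entity
```

A fact loaded for SID 3 in May 2003 would store ID 4. A report that wants the current name instead follows ID 4 to SID 3 and then to the row with IS_ACTUAL = 1.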
Dimension table and fact table exists data physically
Hi experts,
Can anyone please tell me whether data physically exists in the dimension tables and the fact table or not?
Hi Sudheer,
SAP's BW is based on the "enhanced star schema" or "InfoCube" database design. This design has a central database table, known as the 'fact table', which is surrounded by associated dimension tables.
The fact table is usually very large; it can contain millions to billions of records.
The dimension tables do not contain master data themselves. They contain references to the pointer (SID) tables, which point to the master data tables, which in turn contain master data objects such as customer, material and destination country, stored in BW as InfoObjects. An InfoObject can contain single field definitions such as transaction data, or complex customer master data that holds attributes, hierarchies and customer texts stored in their own tables.
SID is a surrogate ID generated by the system. The SID tables are created when we create a master data InfoObject. In the SAP BW star schema, a distinction is made between two self-contained areas: the InfoCube and the master data tables/SID tables.
The master data does not reside in the star schema but in separate tables, which are shared across all the star schemas in SAP BW. A numeric ID is generated which connects the dimension tables of the InfoCube to the master data tables.
The dimension tables contain the DIM ID and the SIDs of the InfoObjects in that dimension. Using the SID, the attributes and texts of a master data InfoObject are accessed.
The SID table is connected to the associated master data tables via the characteristic key.
Fact table (transaction data, DIM ID) <-> Dimension table (SID and DIM ID) <-> Master data table (SID, InfoObject)
Thanks,
Abha -
Difference between Data staging and Dimension Table ?
Difference between Data staging and Dimension Table ?
Data Staging:
Data extraction and transformation are done here.
This means that if we have source data in a flat file, we extract it and load it into staging tables, take care of nulls, change the datetime format and so on, and after such cleansing/transformation we load it into the Dim/Fact tables at the end.
Pros: Makes the process simpler and easier, and we can also keep track of the data because we still have it in staging.
Cons: Staging tables need additional storage space.
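As a rough illustration of the cleansing step described above (handling nulls, changing the date format), a staging transformation could be sketched in Python as follows. The field names, the dd/mm/yyyy source format and the decision to default a missing amount to 0.0 are assumptions made for the example, not part of the original answer.

```python
from datetime import datetime

def clean_row(raw):
    """Trim text, turn empty strings into None, and normalise dates to ISO format."""
    name = raw["name"].strip() or None                 # empty text becomes NULL
    amount = float(raw["amount"]) if raw["amount"].strip() else 0.0  # assumed default
    order_date = datetime.strptime(raw["order_date"].strip(), "%d/%m/%Y").date().isoformat()
    return {"name": name, "amount": amount, "order_date": order_date}

# Rows as they might arrive from a flat file (all values are strings).
staging = [
    {"name": " Widget ", "amount": "12.50", "order_date": "03/02/2014"},
    {"name": "", "amount": "", "order_date": "28/02/2014"},
]
cleaned = [clean_row(r) for r in staging]
for row in cleaned:
    print(row)
```

Only the cleaned rows would then be loaded into the dimension and fact tables; the raw staging rows stay available for tracing problems back to the source.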
Dimension Table:
Tables which describe/store the attributes of specific objects.
For example, a star schema has dimensions storing information related to Product, Customer, etc.
-Vaibhav Chaudhari -
Regarding Dimension Table and Fact table
Hello,
I have some basic doubts regarding the star schema.
Let me first explain my understanding of the star schema.
The fact table contains key figures and DIM IDs.
These DIM IDs are connected to my dimension tables. A dimension table contains characteristics and these DIM IDs.
My basic doubts are:
1. How are the DIM IDs linked to the SID tables?
2. If I have not maintained any master data, texts or hierarchies, will the SID tables be generated or not?
3. If they are generated, I think there is no use for the SIDs, as we have not maintained master data.
4. I have 18 characteristics which are in no way related to each other. In that scenario, how do the dimensions have to be identified? Do we need to include all the characteristics in one dimension, or do we need to create separate dimensions for each of them? (The maximum is 13 dimensions.)
5. If a dimension table contains DIM IDs and characteristics, where are the values of the characteristics stored?
(For example, sales rep is a characteristic and we give it values such as names. Where are these values stored?)
Hi Vasu,
For example, we have an InfoCube with
- dimension 'location' -> characteristics 'sales rep', 'country'
- dimension 'partner'.
Fact table
dim-id('location')  dim-id('partner')  revenue
1001                9001               500
1002                9002               300
1003                9004               200
Dimension table 'location'
dim-id  sid(sales rep)  sid(country)
1001    3001            5001
1002    3004            5004
1003    3005            5001
'sales rep' SID table
sid   sales rep
3001  abc
3004  pqr
3005  xyz
'country' SID table
sid   country
5001  country1
5004  country2
So, following the link from dim-id to sid, we get
"sales rep report"
sales-rep  revenue
abc        500
pqr        300
xyz        200
"country report"
country    revenue
country1   700
country2   300
hope it's clear. -
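The lookup chain in the example above (fact table -> dimension table -> SID table) can be replayed in a few lines of Python. This is only a sketch of the join logic using the sample values from the reply; the dictionaries stand in for the BW tables and the function name is made up.

```python
from collections import defaultdict

# (location dim-id, partner dim-id, revenue)
fact = [(1001, 9001, 500), (1002, 9002, 300), (1003, 9004, 200)]
# location dim-id -> (sales rep SID, country SID)
dim_location = {1001: (3001, 5001), 1002: (3004, 5004), 1003: (3005, 5001)}
sid_sales_rep = {3001: "abc", 3004: "pqr", 3005: "xyz"}
sid_country = {5001: "country1", 5004: "country2"}

def report(sid_position, sid_table):
    """Sum revenue per characteristic value by following dim-id -> SID -> value."""
    totals = defaultdict(int)
    for location_dim_id, _partner_dim_id, revenue in fact:
        sid = dim_location[location_dim_id][sid_position]
        totals[sid_table[sid]] += revenue
    return dict(totals)

sales_rep_report = report(0, sid_sales_rep)
country_report = report(1, sid_country)
print(sales_rep_report)
print(country_report)
```

The characteristic values ("abc", "country1") never appear in the fact or dimension table. They are resolved through the SID tables only when the report is built, which is also why country1 sums the revenue of two fact rows.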
Fact and dimension table partition
My team is implementing a new data warehouse. I would like to know when we should plan the partitioning of fact and dimension tables: before the data comes in or after?
Hi,
It is recommended to partition the fact table (where we will have huge data volumes). Automate the partitioning so that each day a new partition is created to hold the latest data (split the previous partition into two). Best practice is to partition on the transaction timestamp: load the incremental data into an empty table (Table_IN) and then switch that data into the main table (Table). Make sure both tables (Table and Table_IN) are on the same filegroup.
Refer to the content below for detailed info.
Designing and Administrating Partitions in SQL Server 2012
A popular method of better managing large and active tables and indexes is the use of partitioning. Partitioning is a feature for segregating I/O workload within a SQL Server database so that I/O can be better balanced against available I/O subsystems while providing better user response time, lower I/O latency, and faster backups and recovery. By partitioning tables and indexes across multiple filegroups, data retrieval and management is much quicker because only subsets of the data are used, meanwhile ensuring that the integrity of the database as a whole remains intact.
Tip
Partitioning is typically used for administrative or certain I/O performance scenarios. However, partitioning can also speed up some queries by enabling lock escalation to a single partition, rather than to an entire table. You must allow lock escalation to move up to the partition level by setting it with either the Lock Escalation option of the Database Options page in SSMS or by using the LOCK_ESCALATION option of the ALTER TABLE statement.
After a table or index is partitioned, data is stored horizontally across multiple filegroups, so groups of data are mapped to individual partitions. Typical scenarios for partitioning include large tables that become very difficult to manage, tables that are suffering performance degradation because of excessive I/O or blocking locks, table-centric maintenance processes that exceed the available time for maintenance, and moving historical data from the active portion of a table to a partition with less activity.
Partitioning tables and indexes warrants a bit of planning before putting them into production. The usual approach to partitioning a table or index follows these steps:
1. Create the filegroup(s) and file(s) used to hold the partitions defined by the partitioning scheme.
2. Create a partition function to map the rows of the table or index to specific partitions based on the values in a specified column. A very common partitioning function is based on the creation date of the record.
3. Create a partitioning scheme to map the partitions of the partitioned table to the specified filegroup(s) and, thereby, to specific locations on the Windows file system.
4. Create the table or index (or ALTER an existing table or index) by specifying the partition scheme as the storage location for the partitioned object.
Although Transact-SQL commands are available to perform every step described earlier, the Create Partition Wizard makes the entire process quick and easy through an intuitive point-and-click interface. The next section provides an overview of using the Create Partition Wizard in SQL Server 2012, and an example later in this section shows the Transact-SQL commands.
Leveraging the Create Partition Wizard to Create Table and Index Partitions
The Create Partition Wizard can be used to divide data in large tables across multiple filegroups to increase performance and can be invoked by right-clicking any table or index, selecting Storage, and then selecting Create Partition. The first step is to identify which columns to partition by reviewing all the columns available in the Available Partitioning Columns section located on the Select a Partitioning Column dialog box, as displayed in Figure 3.13. This screen also includes additional options such as the following:
Figure 3.13. Selecting a partitioning column.
The next screen is called Select a Partition Function. This page is used for specifying the partition function by which the data will be partitioned. The options include using an existing partition function or creating a new one. The subsequent page is called New Partition Scheme. Here a DBA maps the rows of the table being partitioned to a desired filegroup. Either an existing partition scheme should be used or a new one needs to be created. The final screen is used for doing the actual mapping. On the Map Partitions page, specify the filegroup to be used for each partition and then enter a range for the values of the partitions. The ranges and settings on the grid include the following:
Note
By opening the Set Boundary Values dialog box, a DBA can set boundary values based on dates (for example, partition everything in a column after a specific date). The data types are based on dates.
Designing table and index partitions is a DBA task that typically requires a joint effort with the database development team. The DBA must have a strong understanding of the database, tables, and columns to make the correct choices for partitioning. For more information on partitioning, review Books Online.
Enhancements to Partitioning in SQL Server 2012
SQL Server 2012 now supports as many as 15,000 partitions. When using more than 1,000 partitions, Microsoft recommends that the instance of SQL Server have at least 16 GB of available memory. This recommendation particularly applies to partitioned indexes, especially those that are not aligned with the base table or with the clustered index of the table. Other Data Manipulation Language (DML) statements and Data Definition Language (DDL) statements may also run short of memory when processing a large number of partitions.
Certain DBCC commands may take longer to execute when processing a large number of partitions. On the other hand, a few DBCC commands can be scoped to the partition level and, if so, can be used to perform their function on a subset of data in the partitioned table.
Queries may also benefit from a query engine enhancement called partition elimination. SQL Server uses partition elimination automatically if it is available. Here’s how it works. Assume a table has four partitions, with all the data for customers whose names begin with R, S, or T in the third partition. If a query’s WHERE clause filters on customer name looking for ‘System%’, the query engine knows that it needs to read only partition three to answer the request. Thus, it might greatly reduce I/O for that query. On the other hand, some queries might take longer if there are more than 1,000 partitions and the query is not able to perform partition elimination.
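The idea behind partition elimination can be illustrated with a small Python model. This is not how SQL Server implements it: the boundary values below are hypothetical, chosen so that names starting with R, S or T land in the third partition, and plain code-point ordering is used instead of a SQL Server collation.

```python
from bisect import bisect_left, bisect_right

# Hypothetical boundary values; each boundary belongs to the partition on its right.
BOUNDARIES = ["I", "R", "U"]   # 3 boundaries -> 4 partitions

def partition_for(value):
    """1-based partition number that holds `value`."""
    return bisect_right(BOUNDARIES, value) + 1

def partitions_for_prefix(prefix):
    """Partitions that can contain strings starting with `prefix` (like LIKE 'prefix%')."""
    if not prefix:
        return list(range(1, len(BOUNDARIES) + 2))   # no filter: every partition
    # Smallest string that sorts after every string with this prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    first = bisect_right(BOUNDARIES, prefix) + 1
    last = bisect_left(BOUNDARIES, upper) + 1
    return list(range(first, last + 1))

print(partition_for("Adams"))            # a name before the first boundary
print(partitions_for_prefix("System"))   # the partitions a 'System%' filter must read
print(partitions_for_prefix(""))         # without a usable filter, all partitions
```

Because the filter translates into a value range, only the partitions whose ranges overlap it have to be read; here the 'System%' filter touches a single partition out of four.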
Finally, SQL Server 2012 introduces some changes and improvements to the algorithms used to calculate partitioned index statistics. Primarily, SQL Server 2012 samples rows in a partitioned index when it is created or rebuilt, rather than scanning all available rows. This may sometimes result in somewhat different query behavior compared to the same queries running on earlier versions of SQL Server.
Administrating Data Using Partition Switching
Partitioning is useful to access and manage a subset of data while losing none of the integrity of the entire data set. There is one limitation, though. When a partition is created on an existing table, new data is added to a specific partition or to the default partition if none is specified. That means the default partition might grow unwieldy if it is left unmanaged. (This concept is similar to how a clustered index needs to be rebuilt from time to time to reestablish its fill factor setting.)
Switching partitions is a fast operation because no physical movement of data takes place. Instead, only the metadata pointers to the physical data are altered. You can alter partitions using SQL Server Management Studio or with the ALTER TABLE...SWITCH Transact-SQL statement. Both options enable you to ensure partitions are well maintained. For example, you can transfer subsets of data between partitions, move tables between partitions, or combine partitions together. Because the ALTER TABLE...SWITCH statement does not actually move the data, a few prerequisites must be in place:
• Partitions must use the same column when switching between two partitions.
• The source and target table must exist prior to the switch and must be on the same filegroup, along with their corresponding indexes, index partitions, and indexed view partitions.
• The target partition must exist prior to the switch, and it must be empty, whether adding a table to an existing partitioned table or moving a partition from one table to another. The same holds true when moving a partitioned table to a nonpartitioned table structure.
• The source and target tables must have the same columns in identical order with the same names, data types, and data type attributes (length, precision, scale, and nullability). Computed columns must have identical syntax, as well as primary key constraints. The tables must also have the same settings for ANSI_NULLS and QUOTED_IDENTIFIER properties. Clustered and nonclustered indexes must be identical. ROWGUID properties and XML schemas must match. Finally, settings for in-row data storage must also be the same.
• The source and target tables must have matching nullability on the partitioning column. Although both NULL and NOT NULL are supported, NOT NULL is strongly recommended.
Likewise, the ALTER TABLE...SWITCH statement will not work under certain circumstances:
• Full-text indexes, XML indexes, and old-fashioned SQL Server rules are not allowed (though CHECK constraints are allowed).
• Tables in a merge replication scheme are not allowed. Tables in a transactional replication scheme are allowed with special caveats. Triggers are allowed on tables but must not fire during the switch.
• Indexes on the source and target table must reside on the same partition as the tables themselves.
• Indexed views make partition switching difficult and have a lot of extra rules about how and when they can be switched. Refer to the SQL Server Books Online if you want to perform partition switching on tables containing indexed views.
• Referential integrity can impact the use of partition switching. First, foreign keys on other tables cannot reference the source table. If the source table holds the primary key, it cannot have a primary or foreign key relationship with the target table. If the target table holds the foreign key, it cannot have a primary or foreign key relationship with the source table.
In summary, simple tables can easily accommodate partition switching. The more complexity a source or target table exhibits, the more likely that careful planning and extra work will be required to even make partition switching possible, let alone efficient.
Here’s an example where we create a partitioned table using a previously created partition scheme, called Date_Range_PartScheme1. We then create a new, nonpartitioned table identical to the partitioned table residing on the same filegroup. We finish up switching the data from the partitioned table into the nonpartitioned table:
CREATE TABLE TransactionHistory_Partn1 (Xn_Hst_ID int, Xn_Type char(10))
ON Date_Range_PartScheme1 (Xn_Hst_ID);
GO
CREATE TABLE TransactionHistory_No_Partn (Xn_Hst_ID int, Xn_Type char(10))
ON main_filegroup;
GO
-- Switch partition 1 out of the partitioned table into the empty nonpartitioned table.
ALTER TABLE TransactionHistory_Partn1 SWITCH PARTITION 1 TO TransactionHistory_No_Partn;
GO
The next section shows how to use a more sophisticated, but very popular, approach to partition switching called a sliding window partition.
Example and Best Practices for Managing Sliding Window Partitions
Assume that our AdventureWorks business is booming. The sales staff, and by extension the AdventureWorks2012 database, is very busy. We noticed over time that the TransactionHistory table is very active as sales transactions are first entered and are still very active over their first month in the database. But the older the transactions are, the less activity they see. Consequently, we’d like to automatically group transactions into four partitions per year, basically containing one quarter of the year’s data each, in a rolling partitioning. Any transaction older than one year will be purged or archived.
The answer to a scenario like the preceding one is called a sliding window partition because we are constantly loading new data in and sliding old data over, eventually to be purged or archived. Before you begin, you must choose either a LEFT partition function window or a RIGHT partition function window:
1. How data is handled varies according to the choice of LEFT or RIGHT partition function window:
• With a LEFT strategy, partition1 holds the oldest data (Q4 data), partition2 holds data that is 6 to 9 months old (Q3), partition3 holds data that is 3 to 6 months old (Q2), and partition4 holds recent data less than 3 months old.
• With a RIGHT strategy, partition4 holds the oldest data (Q4), partition3 holds Q3 data, partition2 holds Q2 data, and partition1 holds recent data.
• Following the best practice, make sure there are empty partitions on both the leading edge (partition0) and trailing edge (partition5) of the partition range.
• RIGHT range functions usually make more sense to most people because it is natural for most people to start ranges at their lowest value and work upward from there.
2. Assuming
that a RIGHT partition function windows is used, we first use the SPLIT subclause of the ALTER PARTITION FUNCTIONstatement
to split empty partition5 into two empty partitions, 5 and 6.
3. We use the SWITCH subclause of ALTER TABLE to switch out partition4 to a staging table for archiving or simply to drop and purge the data. Partition4 is now empty.
4. We can then use MERGE to combine the empty partitions 4 and 5, so that we’re back to the same number of partitions as when we started. This way, partition3 becomes the new partition4, partition2 becomes the new partition3, and partition1 becomes the new partition2.
5. We can use SWITCH to push the new quarter’s data into the spot of partition1.
Tip
Use the $PARTITION system function to determine where a partition function places values within a range of partitions.
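To make the LEFT/RIGHT distinction concrete, here is a small Python model of the lookup that $PARTITION performs. It is only a sketch of the boundary rule, not SQL Server code: the function name partition_number and the quarterly boundary dates are made up for illustration. The rule it encodes is that with RANGE LEFT a boundary value belongs to the partition on its left, and with RANGE RIGHT it belongs to the partition on its right.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(boundaries, value, range_side="RIGHT"):
    """Return the 1-based partition number that would hold `value`.

    `boundaries` must be sorted in ascending order.
    RANGE LEFT:  a boundary value is the upper bound of the partition on its left.
    RANGE RIGHT: a boundary value is the lower bound of the partition on its right.
    """
    if range_side == "LEFT":
        # Count the boundaries strictly below the value.
        return bisect_left(boundaries, value) + 1
    if range_side == "RIGHT":
        # Count the boundaries at or below the value.
        return bisect_right(boundaries, value) + 1
    raise ValueError("range_side must be 'LEFT' or 'RIGHT'")


# Hypothetical quarterly boundaries for one year of data (4 boundaries, 5 partitions).
quarter_starts = [date(2012, 1, 1), date(2012, 4, 1), date(2012, 7, 1), date(2012, 10, 1)]

for day in (date(2011, 12, 31), date(2012, 4, 1), date(2012, 11, 15)):
    print(day,
          partition_number(quarter_starts, day, "LEFT"),
          partition_number(quarter_starts, day, "RIGHT"))
```

Only a value that falls exactly on a boundary (2012-04-01 here) lands in a different partition depending on the LEFT or RIGHT choice; every other value gets the same partition number either way.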
Some best practices to consider for using a sliding window partition include the following:
• Load newest data into a heap, and then add indexes after the load is finished. Delete oldest data or, when working with very large data sets, drop the partition with the oldest data.
• Keep an empty staging partition at the leftmost and rightmost ends of the partition range so that the split performed when loading new data, and the merge performed after unloading old data, do not cause data movement.
• Do not split or merge a partition already populated with data because this can cause severe locking and explosive log growth.
• Create the load staging table in the same filegroup as the partition you are loading.
• Create the unload staging table in the same filegroup as the partition you are deleting.
• Don’t load a partition until its range boundary is met. For example, don’t create and load a partition meant to hold data that is one to two months old before the current data has aged one month. Instead, continue to allow the latest partition to accumulate data until the data is ready for a new, full partition.
• Unload one partition at a time.
• The ALTER TABLE...SWITCH statement issues a schema lock on the entire table. Keep this in mind if regular transactional activity is still going on while a table is being partitioned.
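The split, switch, and merge cycle, together with the "only touch empty partitions" rule, can be sketched as a toy Python model. This is not SQL Server itself and makes several assumptions: the class SlidingWindow and its methods are invented for illustration, keys are YYYYMMDD integers, the function is RANGE RIGHT, and partitions are numbered in ascending boundary order, so the oldest data sits in the lowest-numbered populated partition (which differs from the numbering used in the steps above). The merge check is deliberately conservative and refuses unless both neighbouring partitions are empty.

```python
from bisect import bisect_right


class SlidingWindow:
    """Toy model of a RANGE RIGHT partitioned table (illustration only)."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # rows[i] holds the keys stored in partition number i + 1.
        self.rows = [[] for _ in range(len(self.boundaries) + 1)]

    def _index(self, key):
        # 0-based index of the partition that holds `key` under RANGE RIGHT.
        return bisect_right(self.boundaries, key)

    def insert(self, key):
        self.rows[self._index(key)].append(key)

    def split(self, boundary):
        """Add a boundary; refuse if the partition being split holds rows."""
        if boundary in self.boundaries:
            raise ValueError("boundary already exists")
        i = self._index(boundary)
        if self.rows[i]:
            raise RuntimeError("split would move data")
        self.boundaries.insert(i, boundary)
        self.rows.insert(i, [])

    def switch_out(self, partition_no):
        """Hand one partition's rows to a staging list, leaving it empty."""
        staged, self.rows[partition_no - 1] = self.rows[partition_no - 1], []
        return staged

    def merge(self, boundary):
        """Remove a boundary; refuse unless both neighbours are empty."""
        i = self.boundaries.index(boundary)
        if self.rows[i] or self.rows[i + 1]:
            raise RuntimeError("merge would move data")
        del self.boundaries[i]
        del self.rows[i]


# Four populated quarters of 2012 with an empty partition at each end.
window = SlidingWindow([20120101, 20120401, 20120701, 20121001, 20130101])
for key in (20120215, 20120520, 20120810, 20121105):
    window.insert(key)

window.split(20130401)           # new empty partition on the leading edge
archived = window.switch_out(2)  # the oldest populated partition is emptied
window.merge(20120101)           # fold the two empty partitions together
print(archived, len(window.rows))
```

After one roll the model has the same number of partitions as before, the oldest quarter sits in the staging list, and an empty partition is waiting for the next quarter's load. An attempt to split or merge a populated partition raises an error instead of silently moving rows, which is the behaviour the best practices above ask you to enforce by procedure.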
Thanks Shiven:) If Answer is Helpful, Please Vote -
[Rookie] Some clarification on Dimension Table
Hi!
I am preparing for BW certification, and sorry to post this very basic query. In the certification notes, I came across a statement that says - "Dimension table contains link to fact table and SID table".
I think this statement is wrong. Dimension table has DIM-ID & SID-Ids so there is no link to fact table in dimension table.
Please give your comment on this.
Hi,
It should be true/correct; check
http://help.sap.com/bp_biv335/BI_EN/documentation/Multi-dimensional_modeling_EN.doc
page 11
... These dimension tables surround the fact table, which contains the facts (key figures), and are linked to that fact table via unique keys, one per dimension table. Each dimension key uniquely identifies a row in the associated dimension table....
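A small sketch may help reconcile the two readings. It uses Python's sqlite3 module with made-up table and column names (sid_customer, dim_customer, fact_sales and so on are assumptions, not BW's generated names): the dimension table carries the DIM ID and the SIDs, while the fact table carries the DIM ID as its key column, and that shared DIM ID is the link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- SID tables: one surrogate ID per characteristic value (hypothetical names).
CREATE TABLE sid_customer (sid INTEGER PRIMARY KEY, customer TEXT UNIQUE);
CREATE TABLE sid_region   (sid INTEGER PRIMARY KEY, region   TEXT UNIQUE);

-- Dimension table: DIM ID as key, plus one SID column per characteristic.
CREATE TABLE dim_customer (
    dimid        INTEGER PRIMARY KEY,
    sid_customer INTEGER REFERENCES sid_customer(sid),
    sid_region   INTEGER REFERENCES sid_region(sid)
);

-- Fact table: one DIM ID column per dimension, plus the key figures.
CREATE TABLE fact_sales (
    key_customer INTEGER REFERENCES dim_customer(dimid),
    amount       REAL
);

INSERT INTO sid_customer VALUES (1, 'ACME'), (2, 'Globex');
INSERT INTO sid_region   VALUES (1, 'EMEA');
INSERT INTO dim_customer VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO fact_sales   VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")

# Walk from the fact table through the dimension table to the SID table.
rows = conn.execute("""
    SELECT c.customer, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.dimid = f.key_customer
    JOIN sid_customer c ON c.sid   = d.sid_customer
    GROUP BY c.customer
    ORDER BY c.customer
""").fetchall()
print(rows)
```

In this layout the dimension table holds no column that points at the fact table, yet it is still linked to it, because the fact table stores the dimension's DIM ID.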
hope this helps. -
How and when does a dimension table get generated
Hi Gurus,
I am new to BI and I will be put on a project within 2 months. I have learned that the dimension table contains the SIDs of all the characteristics in the dimension. My conclusions are:
1. Dimension table contains the DIM ID as the primary key.
2. Dimension table contains the SIDs of the characteristics.
3. Though the SIDs in the dimension table are primary keys in their 'S' tables, they are not keys in the dimension table.
My questions are:
1. Is there any chance of new DIM IDs being generated for the same combination of SIDs, because the SIDs are not part of the key?
2. I am confused about when and how the dimension table gets generated.
I have searched the forum and Google but my doubts still were not clarified. If anyone could throw some light on this topic I would really appreciate it.
Hi,
All your conclusions are correct.
Now for your questions, the answers are in line:
1. Is there any chance of new DIM IDs being generated for the same combination of SIDs, because the SIDs are not part of the key?
No, new DIM IDs will not be generated; the DIM ID is unique for a given combination of SIDs.
2. I am confused about when and how the dimension table gets generated.
They get generated when you activate the info provider.
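To illustrate the answer to question 1, here is a minimal Python sketch of "one DIM ID per combination of SIDs". It is a toy model with invented names (DimensionTable, dimid_for) and does not describe how BW implements the lookup internally; it only shows the behaviour stated above, that asking again for the same SID combination returns the existing DIM ID.

```python
class DimensionTable:
    """Toy dimension table: one DIM ID per distinct combination of SIDs."""

    def __init__(self):
        self._dimid_by_sids = {}
        self._next_dimid = 1

    def dimid_for(self, *sids):
        """Return the DIM ID for this SID combination, creating it if new."""
        key = tuple(sids)
        if key not in self._dimid_by_sids:
            self._dimid_by_sids[key] = self._next_dimid
            self._next_dimid += 1
        return self._dimid_by_sids[key]


dim = DimensionTable()
first = dim.dimid_for(10, 20)   # new combination, gets a new DIM ID
again = dim.dimid_for(10, 20)   # same combination, same DIM ID
other = dim.dimid_for(10, 21)   # different combination, new DIM ID
print(first, again, other)
```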
Hope this helps.
thanks,
Rahul -
Create time dimension table in repository without data warehouse
Hi,
I want to implement only a BI repository solution at my customer (no data warehousing). Is it possible to transform the data with the repository tools, so that the time columns in fact tables are categorized by the "time dimension" table?
To explain further:
The "Sales" table has the "time of sale" column. It contains the timestamp when the sale was performed. I have imported this table into the "physical layer" of the repository. Now I want to create a new "time dimension" table, something like:
CREATE TABLE dimension_time (
Day_Key INT NOT NULL PRIMARY KEY,
Day_Timestamp DATETIME NOT NULL,
Day_Name NVARCHAR(32) NOT NULL,
Day_Text NVARCHAR(32) NOT NULL,
-- ... (remaining columns not shown in the original post)
);
INSERT INTO dimension_time VALUES (20110101, {d '2011-01-01'}, '1/1', 'January 1', 'Saturday', 0, 6, 1, 1, 185, 1, 201052, 'W52', 'Week 52', 52, 201101, '01', 'January', 1, 7, 1004, 'Winter', 'Winter', 20111, 'Q1', '1st Quarter', 1, 20103, 'Q3', '3rd Quarter', 3, 20111, 'S1', '1st Semester', 1, 20102, 'S2', '2nd Semester', 2, 2011, '2011', '2011', 2010, '10/11', '2010/2011', 0);
INSERT INTO dimension_time VALUES (20110102, {d '2011-01-02'}, '2/1', 'January 2', 'Sunday', 0, 7, 2, 2, 186, 2, 201052, 'W52', 'Week 52', 52, 201101, '01', 'January', 1, 7, 1004, 'Winter', 'Winter', 20111, 'Q1', '1st Quarter', 1, 20103, 'Q3', '3rd Quarter', 3, 20111, 'S1', '1st Semester', 1, 20102, 'S2', '2nd Semester', 2, 2011, '2011', '2011', 2010, '10/11', '2010/2011', 0);
and afterwards add a new column to the "sales" fact table for the "time dimension ID", and through the repository populate this column based on the "time of sale" column and the corresponding "time dimension ID".
I know that an ETL process could do this, but I do not want to go for data warehousing (it is not real-time, needs more resources, etc.).
Is it possible to perform such an action in the repository alone?
Thank you.
Hi,
I can do it, but this would be useful only to create the "time dimension" table. The "sales" fact table also needs to be altered (so that the "time" column contains not the value of the time, but the ID of the corresponding time in the "time dimension" table).
I know that in a DW this procedure is done automatically by the ETL process.
My question is: does the repository have any tools similar to this?
Thank you.
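One observation on the sample data: the Day_Key values in the INSERT statements (20110101, 20110102) look like the date written as YYYYMMDD. Assuming that convention holds for every row, the key can be computed directly from the sale timestamp, as in the Python sketch below. The function name day_key is made up for illustration, and the sketch says nothing about whether the repository tool can evaluate such an expression in a join.

```python
from datetime import datetime


def day_key(timestamp):
    """Return the YYYYMMDD integer key for a timestamp (assumed key format)."""
    return timestamp.year * 10000 + timestamp.month * 100 + timestamp.day


# A hypothetical "time of sale" value on the second sample day.
sale_time = datetime(2011, 1, 2, 14, 30)
print(day_key(sale_time))
```

If the key is a pure function of the timestamp like this, the fact table does not need a stored ID column in principle; the mapping to the time dimension row can be derived whenever it is needed.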