Analyze command in a Data warehouse env

We do daily data loads into our data warehouse. On certain target tables we have change data capture enabled. As part of loading one table (4 million rows total), we remove the data for a certain time period (say a month, 50,000+ rows) and load it again from the source. We also run a full-table analyze as part of this load, and it is taking a long time.
The question is: do we need to run the analyze command every day? Would we see a big difference if we ran it only once a week?
Thanks.

Hi srwijese,
My DW is actually 12TB, and after each data load we collect stats on our tables, BUT most of our tables are partitioned, so we collect them at partition level using the DBMS_STATS package. I don't know whether your environment is partitioned; if it is, collect stats just for the partition you loaded.
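A minimal sketch of what that partition-level gather can look like; the schema, table, and partition names are placeholders for whatever your load touched:

BEGIN
   DBMS_STATS.GATHER_TABLE_STATS(
      ownname     => 'DWH',             -- hypothetical schema
      tabname     => 'SALES_FACT',      -- hypothetical target table
      partname    => 'SALES_MAY',       -- only the partition just (re)loaded
      granularity => 'PARTITION',       -- partition-level stats, not the whole table
      cascade     => TRUE);             -- also gather the partition's local index stats
END;
/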
P.S.: If you wish, add [email protected] (MSN) to share experiences.
Jonathan Ferreira
http://oracle4dbas.blogspot.com

Similar Messages

  • Partition based on join in data warehouse env.

    Hi,
    I am working in a DW environment and am quite new to it.
    The scenario is that there are two fact tables and one dimension table.
    I need to partition both fact tables based on the dates in the Dim table. My problem is that there is no date in the fact tables, only surrogate keys.
    Is it possible to partition the tables using a join condition between the fact tables and the Dim table, and if so, how?
    The structure of tables are as follows.
    Please help!
    Daily sales fact table
    planned shipment date sk     NUMBER     PK1, FK1
    source system identifier     VARCHAR2(50)     PK2
    sales type indicators sk     NUMBER     PK3, FK3
    ship-to-customer sk     NUMBER     PK4, FK4
    product sk     NUMBER     PK5, FK5
    ship-from-location sk     NUMBER     PK6, FK6
    quantity in primary units     NUMBER(14,3)     
    quantity in 9LE     NUMBER(14,3)     
    Daily Sales Order Fact
    sales invoiced date sk     NUMBER     PK1, FK1
    source system identifier     VARCHAR2(50)     PK2
    sales type indicators sk     NUMBER     PK3, FK3
    ship-to-customer sk     NUMBER     PK4, FK4
    product sk     NUMBER     PK5, FK5
    ship-from-location sk     NUMBER     PK6, FK6
    quantity in primary units     NUMBER(14,3)     
    quantity in 9LE     NUMBER(14,3)     
    Sales order month Dim
    sales order month sk     NUMBER     PK1
    sales order month full name     VARCHAR2(50)     
    sales order month number     NUMBER(2)     
    sales order month calendar year     NUMBER(4)     
    sales order month financial year     NUMBER(4)     
    sales order month start date     DATE     AK1
    sales order month end date     DATE     
    sales order month end date sk     NUMBER     
    days in sales order month     NUMBER(2)     
    event number     NUMBER(12)     
    last update date     DATE     
    Thanks in advance.

    If you take care that the synthetic key values for your dates are assigned in ascending order with the date value, then you can equi-partition both tables by range without too much trouble.
    Personally I don't use synthetic values as PKs on dates, partly for this very reason.
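    A minimal sketch of what that can look like, assuming the surrogate keys ascend with the calendar (all names and boundary values are illustrative):

    CREATE TABLE daily_sales_fact (
       planned_shipment_date_sk  NUMBER       NOT NULL,
       source_system_identifier  VARCHAR2(50) NOT NULL,
       quantity_in_primary_units NUMBER(14,3)
       -- remaining columns omitted
    )
    PARTITION BY RANGE (planned_shipment_date_sk) (
       -- these boundaries only make sense because the keys were assigned
       -- in ascending order with the dates they represent
       PARTITION p_2009_q1 VALUES LESS THAN (20090401),
       PARTITION p_2009_q2 VALUES LESS THAN (20090701),
       PARTITION p_max     VALUES LESS THAN (MAXVALUE)
    );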

  • Diff b/w Data warehouse and Business Warehouse

    Hi all,
    what is the Diff b/w Data warehouse and Business Warehouse?

    hi..
    The difference between data warehousing and Business Warehouse is as follows.
    Data warehousing is the concept, and BIW is a tool that applies this concept to business applications.
    Data warehousing allows you to analyze tons of data (millions and millions of records) in a convenient and optimal way; it is called BIW when it is applied to business applications, like analyzing the sales of a company.
    Advantages: considering the volume of business data, BIW lets you analyze data, and therefore make decisions, faster. It also supports multiple languages, is easy to use, and so on.
    Refer this
    Re: WHAT IS THE DIFFERENCE BETWEEN BIW & DATAWAREHOUSING
    hope it helps...

  • Performance issues with data warehouse loads

    We have performance issues with our data warehouse ETL load process. I have run analyze and dbms_stats and checked the database environment. What else can I do to optimize performance? I cannot use Statspack since we are running Oracle 8i. Thanks
    Scott

    Hi,
    you should analyze the DB after you have loaded the tables.
    Do you use sequences to generate PKs? Do you have a lot of indexes and/or triggers on the tables?
    If yes:
    make sure your sequences cache values (ALTER SEQUENCE s CACHE 10000);
    drop all unneeded indexes while loading, and disable triggers if possible.
    How big is your redo log buffer? When loading a large amount of data it may be an option to enlarge this buffer.
    Do you have more than one DBWR process? Writing in parallel can speed things up when a checkpoint is needed.
    Is it possible to use a direct-path load, or do you already? A sketch of these steps follows below.
    Dim
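    To make those suggestions concrete, here is a minimal Oracle sketch; the sequence, table, and index names are placeholders, and whether each step is safe for your schema (unique indexes, dependent constraints) needs checking first:

    -- cache more sequence values so PK generation isn't a bottleneck
    ALTER SEQUENCE sales_pk_seq CACHE 10000;

    -- take index and trigger maintenance out of the load
    ALTER INDEX sales_fact_ix1 UNUSABLE;
    ALTER TABLE sales_fact DISABLE ALL TRIGGERS;
    ALTER SESSION SET skip_unusable_indexes = TRUE;  -- don't error on the unusable index

    -- direct-path insert writes above the high-water mark, bypassing the buffer cache
    INSERT /*+ APPEND */ INTO sales_fact
    SELECT * FROM staging_sales;
    COMMIT;

    ALTER TABLE sales_fact ENABLE ALL TRIGGERS;
    ALTER INDEX sales_fact_ix1 REBUILD;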

  • Error in analyzer log file (/sapdb/data/wrk/ACP/analyzer -> DBAN.err)

    Hello All,
    I am getting the following error message in the analyzer log file (/sapdb/data/wrk/ACP/analyzer -> DBAN.err).
    The details are as follows:
    =====================================================
    <i>2006-07-24 08:55:59
    ERROR 5: Cannot execute SQL statement.
    [MySQL MaxDB][LIBSQLOD SO][MaxDB] General error;-4008 POS(1) Unknown user name/password combination
    SELECT YEAR(NOW()),MONTH(NOW()),DAY(NOW()),HOUR(NOW()),MINUTE(NOW()),SECOND(NOW()) FROM DUAL
    2006-07-26 12:15:39
    ERROR 20: Database Analyzer not active in directory "/sapdb/data/wrk/ACP/analyzer".
    2006-08-03 12:33:08
    ERROR 5: Cannot execute SQL statement.
    [MySQL MaxDB][LIBSQLOD SO][MaxDB] Communication link failure;-709 CONNECT: (database not running: no request pipe)
    SELECT YEAR(NOW()),MONTH(NOW()),DAY(NOW()),HOUR(NOW()),MINUTE(NOW()),SECOND(NOW()) FROM DUAL</i>
    =====================================================
    Can you please tell me what this means for my database.
    The main problem I am facing is that I am not able to start my SAP application. When I issue startsap from the <SID>adm login, I get an error message saying it cannot connect to the database, although the database is already up and running.
    Please help!
    Regards,
    Premkishan chourasia

    Hi,
    well, the error -4008 denotes that the user/password combination used by the DB Analyzer for accessing the DB is incorrect. The DB Analyzer tries to issue its SQL commands as the SYSDBA user.
    Do you know the user/password combination of your SYSDBA user?
    Regards,
    Roland

  • Implementing a hierarchical structure in a data warehouse

    I want to create a data warehouse for a credit card application. Each user can have a credit card and multiple supplementary credit cards. Each credit card has a main limit, which can be sub-divided into sub-limits for supplementary credit cards as requested by the user. Consider the following example:
    User “A” has a credit card “CC” with limit “L”, which is $100,000.
    User “A” requests a supplementary credit card “CC1”, which is assigned limit “L1” = $50,000. He requests another supplementary credit card “CC2”, which is assigned limit “L2” = $100,000.
    Source tables contain data like this:
    1. src_client_card_trans: contains transaction data of client/user credit card usage (client_id, credit_card_number, balance_acquired)
    Client_id     Credit_card_number     Balance_acquired
    A     CC1     $20,000
    A     CC2     $50,000
    A     CC     $70,000
    2. src_card_limits: contains client’s credit cards linked to credit limits.
    Credit_card_number     Limit_id
    CC1     L1
    CC2     L2
    CC     L
    3. src_limit_structure: contains the relationship of limits and sub-limits.
    Limit_id     Sub_Limit_id
    L     L1
    L     L2
    I have designed two dimensions and one fact table. The dimensions are:
    1. LIMITS: contains the limit_id.
    2. CLIENTS: contains the credit card users' information.
    The fact table is LIMIT_BALANCES_FACT, which has some fact columns along with the above dimensions.
    How can I implement the above limit hierarchy in the data warehouse? I need your suggestions.
    Thanks in advance

    Much depends on how you want to analyze the data and there are a few options:
    1) Use credit limit as an attribute of the customer dimension. This would allow you to create query filters that can just show those customers with a $100,000 credit limit. This would return a list of credit cards (since the attribute would be assigned to each credit card) and then you can simply add or just keep the parents of that result set.
    However, this assumes you do not want to measure data specifically relating to credit card limit. For example it would not be possible to view a total amount spent by all customers who had a credit-limit of $100,000.
    In this case the attribute, credit limit, is simply used to filter a result set.
    2) Create a separate dimension called Credit Limit and create three levels:
    All
    Range
    Credit Limit
    The Range level would contain groupings of credit limits such as 100-500, 501-1200, and so on.
    This would allow you to analyse your data by customer and by credit limit over time. Allowing you to slice and dice quickly and easily.
    3) A second customer hierarchy could be added to the customer dimension. This would allow you to drill-down through different credit limits to customers to individual credit cards. It would be advisable to follow the same approach as option 2 and create some groupings for the credit limits to make the drill down easier for your business users to navigate:
    All
    Range
    Credit Limit
    Customer
    Credit Card
    Hope this helps
    Keith Laker
    Oracle EMEA Consulting
    BI Blog: http://oraclebi.blogspot.com/
    DM Blog: http://oracledmt.blogspot.com/
    BI on Oracle: http://www.oracle.com/bi/
    BI on OTN: http://www.oracle.com/technology/products/bi/
    BI Samples: http://www.oracle.com/technology/products/bi/samples/
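    As a hedged illustration of option 2, the Credit Limit dimension and its Range banding might be sketched like this (all table, column, and band values are invented, and the source holding the limit amounts is hypothetical, since src_card_limits above carries only limit ids):

    -- levels: All > Range > Credit Limit
    CREATE TABLE credit_limit_dim (
       credit_limit_sk   NUMBER        PRIMARY KEY,
       credit_limit_amt  NUMBER(12,2)  NOT NULL,               -- leaf level
       limit_range       VARCHAR2(30)  NOT NULL,               -- grouping level
       all_limits        VARCHAR2(10)  DEFAULT 'All' NOT NULL  -- single top member
    );

    -- banding rule applied while populating the dimension
    SELECT limit_amount,
           CASE
              WHEN limit_amount <= 50000  THEN 'Up to 50,000'
              WHEN limit_amount <= 100000 THEN '50,001-100,000'
              ELSE 'Over 100,000'
           END AS limit_range
    FROM src_limit_amounts;  -- hypothetical source of the limit amounts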

  • Permanent Job Opportunity - Oracle BI Data Warehouse Developer Chicago, IL

    Submit Resumes to [email protected]
    The Business Intelligence Specialist will play a critical role in designing, developing, deploying, and supporting data warehouse/data mart applications. In this role, the person will be responsible for all BI aspects of a data warehouse/data mart application. Primary duties will be to create reporting standards, as well as coach and support power users with the selected Oracle tools. The ideal candidate will have 3+ years of demonstrated experience in data warehousing and Business Intelligence tools, and must also possess excellent communication skills and an outstanding track record with users.
    Principal Duties:
    Participates with internal clients to define software requirements for development, maintenance and/or improvements
    Maintains accuracy, integrity, and availability of the data warehouse
    Tests, monitors, manages, and validates data warehouse activity, including data extraction, transformation, movement, loading, cleansing, and updating processes
    Designs and optimizes data mart models for Oracle Business Intelligence Suite.
    Translates the reporting requirements into data analysis and reporting solutions.
    Reviews and signs off on project plan(s).
    Reviews and signs off on technical design(s).
    Defines and develops BI reports for accessing/analyzing data in the warehouse.
    Customizes BI tools and data sets for different types of users.
    Designs and develops UAT (User Acceptance Testing).
    Drives improvement of BI system architecture and development process.
    Develops and maintains internal relationships. Actively champions teamwork. Uses internal resources to enhance knowledge and expertise of industry, research, products and services. Provides information and support to others in the company.
    Required Skills:
    Education and Experience:
    BS/MS in Computer Science or equivalent.
    3+ years of experience with Oracle, PL/SQL Development and Data Warehousing.
    Experience with Oracle Business Intelligence Suite and Crystal Reports is a plus.
    2-3 years of dimensional modeling experience.
    Demonstrated hands on experience with Unix/Linux, SQL required.
    Demonstrated hands on experience with Oracle reporting tools.
    Demonstrated experience with translating business requirements into data analysis and reporting solutions.
    Experience in training programs/teach users to use tools.
    Expertise with software development process.
    Effective mediator: able to facilitate constructive and productive discussions with internal customers, external clients, and development personnel pertaining to feature definition, project scope, and status.
    Problem solving: identifies and resolves problems in a timely manner, gathers and analyzes information skillfully and maintains confidentiality.
    Planning/organizing: prioritizes and plans work activities and uses time efficiently. Work requires continual attention to detail in composing and proofing materials, establishing priorities and meeting deadlines. Must be able to work in a fast-paced environment with demonstrated ability to juggle multiple competing tasks and demands.
    Quality control: demonstrates accuracy and thoroughness and monitors own work to ensure quality.
    Adaptability: adapts to changes in the work environment, manages competing demands and is able to deal with frequent change, delays or unexpected events.
    Benefits/Compensation:
    Employees enjoy competitive compensation. We have a full benefits package including medical and dental insurance, long-term disability and life insurance and a 401(k) plan.
    The client operates within the healthcare industry.
    This is a permanent full-time position. After ensuring your availability and qualifications we will put you in direct contact with the client to move forward in the process.

    FORWARD THE UPDATED RESUME AS SOON AS POSSIBLE.

  • Imported sealed Management Pack and its not showing up in Data Warehouse Job MPSyncJob

    Hello,
    When I try to import a sealed Management Pack (with newly defined Dimensions for the Data Warehouse), it works fine, and it shows up under Management Packs in the Administration pane. But it's not showing up in the Data Warehouse job "MPSyncJob".
    There are no errors in the event log on the DW management server, and I can't see any errors in failed Data Warehouse jobs either.
    I imported the MP three days ago and still nothing.
    Now what?
    /Maekee

    I have a smaller SCSM lab where I can import this MP, and the classes are created in the DW. So the question is why it's not working in my PreProd environment, and why I don't get any errors in the event log. Can anyone point me in the right direction?
    /Maekee

  • Unable to register my data warehouse in Service Manager

    I have been trying to register my data warehouse but keep getting the same error message each time: "Invalid URI: the hostname could not be parsed." I know the issue is on the ServiceManager database side of things, but there is not a lot of information related to this error message, and the info I do find is unrelated to ServiceManager. I have gone as far as building a brand new data warehouse server and reinstalling all of the data warehouse databases from scratch. It doesn't matter whether I register with the original data warehouse environment or the new one; I get the same error message. I also received a PowerShell command from Microsoft Support to help clean up any residual entries in the ServiceManager database that might be left over. At this point I'm at a loss on how to proceed. Has anyone run into this issue?
    Here's the event log message:
    Unable to register Service Manager installation with Data Warehouse installation.
     Data Warehouse Server: DW_SCSM
     Service Manager Management Server: FlowSMRC
     Exception: Microsoft.EnterpriseManagement.UI.Core.Shared.PowerShell.PSServiceException: Invalid URI: The hostname could not be parsed.
       at Microsoft.EnterpriseManagement.UI.Core.Shared.PowerShell.PSHostService.executeHelper(String cmd, Object[] input)
       at Microsoft.EnterpriseManagement.UI.Core.Shared.PowerShell.PSHostService.Invoke(String cmd, Object[] input)
       at Microsoft.EnterpriseManagement.ServiceManager.UI.Administration.DWRegistration.Registration.DWRegistrationHelper.AcceptChanges(WizardMode wizardMode)
    Also, I tried registering the data warehouse using the PowerShell cmdlet Add-SCDWMgmtGroup and got an error saying it cannot associate the Service Manager installation on scsm_server with the Service Manager Data Warehouse installation on dw_server.

    Just got additional information about this error. If you run the following query against the ServiceManager database, you should see the name of the management server:
    select * from MT_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore
    There is a column named Server_<some GUID> that should contain the management server's NetBIOS name. If it holds some other name, run an update query to set the column to the NetBIOS name of the management server (a hedged sketch follows below). Then restart the SCSM console and retry the database registration; it should succeed.
    Microsoft has said that this is normally caused by moving the database or database server to another server without the value ever being updated.
    Try registering the DW server again and see if you have better luck.
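    The inspect-and-fix described above is roughly this T-SQL; the GUID-suffixed column name is hypothetical, so substitute the one from your own SELECT output and your management server's NetBIOS name:

    -- inspect the stored management server name (ServiceManager database)
    SELECT *
    FROM MT_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore;

    -- correct a stale value; the column name below is a made-up example
    UPDATE MT_Microsoft$SystemCenter$ResourceAccessLayer$SdkResourceStore
    SET [Server_00000000_0000_0000_0000_000000000000] = 'FLOWSMRC';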

  • Data Warehouse Jobs stuck at running - Since February!

    Folks,
    My incidents have not been groomed out of the console since February. I ran Get-SCDWJob and found that most of the jobs are disabled (see below). I've tried to enable all of them using PowerShell, but they are never set back to Enabled.
    No errors are present in the event log. In fact, the event log shows the jobs being started successfully.
    I've restarted the three services. Rebooted the server.
    I've been using this blog post as a guide.
    http://blogs.msdn.com/b/scplat/archive/2010/06/07/troubleshooting-the-data-warehouse-data-warehouse-isn-t-getting-new-data-or-jobs-seem-to-run-forever.aspx
    Anyone have any ideas?
    Win 08 R2 and SQL 2008 R2 SP 1.
    BatchId  Name                                                 Status   CategoryName     StartTime             EndTime               IsEnabled
    13810    DWMaintenance                                        Running  Maintenance      3/22/2013 4:26:00 PM                        True
    13807    Extract_DW_ServMgr_MG                                Running  Extract          2/28/2013 7:08:00 PM                        False
    13808    Extract_ServMgr_MG                                   Running  Extract          2/28/2013 7:08:00 PM                        False
    13780    Load.CMDWDataMart                                    Running  Load             2/28/2013 7:08:00 PM                        False
    13784    Load.Common                                          Running  Load             2/28/2013 7:08:00 PM                        False
    13781    Load.OMDWDataMart                                    Running  Load             2/28/2013 7:08:00 PM                        False
    13809    MPSyncJob                                            Running  Synchronization  2/28/2013 8:08:00 PM                        True
    3405     Process.SystemCenterChangeAndActivityManagementCube  Running  CubeProcessing   1/31/2013 3:00:00 AM  2/10/2013 2:59:00 PM  True
    3411     Process.SystemCenterConfigItemCube                   Running  CubeProcessing   1/31/2013 3:00:00 AM  2/10/2013 2:59:00 PM  True
    3407     Process.SystemCenterPowerManagementCube              Running  CubeProcessing   1/31/2013 3:00:00 AM  2/10/2013 2:59:00 PM  True
    3404     Process.SystemCenterServiceCatalogCube               Running  CubeProcessing   1/31/2013 3:00:00 AM  2/10/2013 2:59:00 PM  True
    3406     Process.SystemCenterSoftwareUpdateCube               Running  CubeProcessing   1/31/2013 3:00:00 AM  2/10/2013 2:59:00 PM  True
    3410     Process.SystemCenterWorkItemsCube                    Running  CubeProcessing   1/31/2013 3:00:00 AM  2/10/2013 2:59:00 PM  True
    13796    Transform.Common                                     Running  Transform        2/28/2013 7:08:00 PM                        False

    Okay, I've done too much work without writing it down. Using Marcel's script, I've gotten it to show me a new error, below.
    It looks like a cube issue. Not sure how to fix it.
    There is no need to wait anymore for job DWMaintenance because there is an error in module ManageCubeTranslations, and the error is: <Errors><Error EventTime="2013-07-29T19:03:30.1401986Z">The workitem to add cube translations was aborted because a lock was unavailable for a cube.</Error></Errors>
    Also, running the command Get-SCDWJobModule | fl >> c:\temp\jobs290.txt shows the following errors.
    JobId               : 302
    CategoryId          : 1
    JobModuleId         : 6350
    BatchId             : 3404
    ModuleId            : 5869
    ModuleTypeId        : 1
    ModuleErrorCount    : 0
    ModuleRetryCount    : 0
    Status              : Not Started
    ModuleErrorSummary  : <Errors><Error EventTime="2013-02-10T19:58:30.6412697Z">The connection either timed out or was lo
                          st.</Error></Errors>
    ModuleTypeName      : Health Service Module
    ModuleName          : Process_SystemCenterServiceCatalogCube
    ModuleDescription   : Process_SystemCenterServiceCatalogCube
    JobName             : Process.SystemCenterServiceCatalogCube
    CategoryName        : CubeProcessing
    Description         : Process.SystemCenterServiceCatalogCube
    CreationTime        : 7/29/2013 12:57:39 PM
    NotToBePickedBefore :
    ModuleCreationTime  : 7/29/2013 12:57:39 PM
    ModuleModifiedTime  :
    ModuleStartTime     :
    ManagementGroup     : DW_Freeport_ServMgr_MG
    ManagementGroupId   : f61a61f2-e0fe-eb37-4888-7e0be9c08593
    JobId               : 312
    CategoryId          : 1
    JobModuleId         : 6436
    BatchId             : 3405
    ModuleId            : 5938
    ModuleTypeId        : 1
    ModuleErrorCount    : 0
    ModuleRetryCount    : 0
    Status              : Not Started
    ModuleErrorSummary  : <Errors><Error EventTime="2013-02-10T19:58:35.1028411Z">Object reference not set to an instance o
                          f an object.</Error></Errors>
    ModuleTypeName      : Health Service Module
    ModuleName          : Process_SystemCenterChangeAndActivityManagementCube
    ModuleDescription   : Process_SystemCenterChangeAndActivityManagementCube
    JobName             : Process.SystemCenterChangeAndActivityManagementCube
    CategoryName        : CubeProcessing
    Description         : Process.SystemCenterChangeAndActivityManagementCube
    CreationTime        : 2/10/2013 7:58:31 PM
    NotToBePickedBefore : 2/10/2013 7:58:35 PM
    ModuleCreationTime  : 2/10/2013 7:58:31 PM
    ModuleModifiedTime  : 2/10/2013 7:58:35 PM
    ModuleStartTime     : 2/10/2013 7:58:31 PM
    ManagementGroup     : DW_Freeport_ServMgr_MG
    ManagementGroupId   : f61a61f2-e0fe-eb37-4888-7e0be9c08593
    JobId               : 331
    CategoryId          : 1
    JobModuleId         : 6816
    BatchId             : 3406
    ModuleId            : 6242
    ModuleTypeId        : 1
    ModuleErrorCount    : 0
    ModuleRetryCount    : 0
    Status              : Not Started
    ModuleErrorSummary  : <Errors><Error EventTime="2013-02-10T19:58:38.7064180Z">Object reference not set to an instance o
                          f an object.</Error></Errors>
    ModuleTypeName      : Health Service Module
    ModuleName          : Process_SystemCenterSoftwareUpdateCube
    ModuleDescription   : Process_SystemCenterSoftwareUpdateCube
    JobName             : Process.SystemCenterSoftwareUpdateCube
    CategoryName        : CubeProcessing
    Description         : Process.SystemCenterSoftwareUpdateCube
    CreationTime        : 2/10/2013 7:58:35 PM
    NotToBePickedBefore : 2/10/2013 7:58:39 PM
    ModuleCreationTime  : 2/10/2013 7:58:35 PM
    ModuleModifiedTime  : 2/10/2013 7:58:39 PM
    ModuleStartTime     : 2/10/2013 7:58:35 PM
    ManagementGroup     : DW_Freeport_ServMgr_MG
    ManagementGroupId   : f61a61f2-e0fe-eb37-4888-7e0be9c08593
    JobId               : 334
    CategoryId          : 1
    JobModuleId         : 6822
    BatchId             : 3407
    ModuleId            : 6246
    ModuleTypeId        : 1
    ModuleErrorCount    : 0
    ModuleRetryCount    : 0
    Status              : Not Started
    ModuleErrorSummary  : <Errors><Error EventTime="2013-02-10T19:58:42.2943950Z">Object reference not set to an instance o
                          f an object.</Error></Errors>
    ModuleTypeName      : Health Service Module
    ModuleName          : Process_SystemCenterPowerManagementCube
    ModuleDescription   : Process_SystemCenterPowerManagementCube
    JobName             : Process.SystemCenterPowerManagementCube
    CategoryName        : CubeProcessing
    Description         : Process.SystemCenterPowerManagementCube
    CreationTime        : 2/10/2013 7:58:39 PM
    NotToBePickedBefore : 2/10/2013 7:58:42 PM
    ModuleCreationTime  : 2/10/2013 7:58:39 PM
    ModuleModifiedTime  : 2/10/2013 7:58:42 PM
    ModuleStartTime     : 2/10/2013 7:58:39 PM
    ManagementGroup     : DW_Freeport_ServMgr_MG
    ManagementGroupId   : f61a61f2-e0fe-eb37-4888-7e0be9c08593
    JobId               : 350
    CategoryId          : 1
    JobModuleId         : 6890
    BatchId             : 3410
    ModuleId            : 6299
    ModuleTypeId        : 1
    ModuleErrorCount    : 0
    ModuleRetryCount    : 0
    Status              : Not Started
    ModuleErrorSummary  : <Errors><Error EventTime="2013-02-10T19:58:45.8355723Z">Object reference not set to an instance o
                          f an object.</Error></Errors>
    ModuleTypeName      : Health Service Module
    ModuleName          : Process_SystemCenterWorkItemsCube
    ModuleDescription   : Process_SystemCenterWorkItemsCube
    JobName             : Process.SystemCenterWorkItemsCube
    CategoryName        : CubeProcessing
    Description         : Process.SystemCenterWorkItemsCube
    CreationTime        : 2/10/2013 7:58:42 PM
    NotToBePickedBefore : 2/10/2013 7:58:46 PM
    ModuleCreationTime  : 2/10/2013 7:58:42 PM
    ModuleModifiedTime  : 2/10/2013 7:58:46 PM
    ModuleStartTime     : 2/10/2013 7:58:42 PM
    ManagementGroup     : DW_Freeport_ServMgr_MG
    ManagementGroupId   : f61a61f2-e0fe-eb37-4888-7e0be9c08593
    JobId               : 352
    CategoryId          : 1
    JobModuleId         : 6892
    BatchId             : 3411
    ModuleId            : 6300
    ModuleTypeId        : 1
    ModuleErrorCount    : 0
    ModuleRetryCount    : 0
    Status              : Not Started
    ModuleErrorSummary  : <Errors><Error EventTime="2013-02-10T19:58:49.6887476Z">Object reference not set to an instance o
                          f an object.</Error></Errors>
    ModuleTypeName      : Health Service Module
    ModuleName          : Process_SystemCenterConfigItemCube
    ModuleDescription   : Process_SystemCenterConfigItemCube
    JobName             : Process.SystemCenterConfigItemCube
    CategoryName        : CubeProcessing
    Description         : Process.SystemCenterConfigItemCube
    CreationTime        : 2/10/2013 7:58:46 PM
    NotToBePickedBefore : 2/10/2013 7:58:50 PM
    ModuleCreationTime  : 2/10/2013 7:58:46 PM
    ModuleModifiedTime  : 2/10/2013 7:58:50 PM
    ModuleStartTime     : 2/10/2013 7:58:46 PM
    ManagementGroup     : DW_Freeport_ServMgr_MG
    ManagementGroupId   : f61a61f2-e0fe-eb37-4888-7e0be9c08593

  • Data Warehouse Infrastructure

    I have a requirement to build a Data Warehouse and Analytics / Reporting capability with the following requirements...
    Maximum of 1TB for Production Data + DR + Test/Dev Env.
    SSIS (up to 25 sources), SSAS (cubes, 5 concurrent users) and SSRS (2 concurrent users, max 500 reports).
    I need a Production, DR and Test/Dev environment.
    I have been told that I will require 12 servers, each having 4 cores and 12GB of storage (4 for Prod, 4 for DR and 4 for Test/Dev).
    To give you an idea of load, we plan to have 1 full-time ETL developer, 5 data analysts and 2 reporting analysts. We are quite a small business and don't have a particularly large amount of data.
    The model has SQL Server, SSIS, SSAS and SSRS on different servers across each environment.
    Any idea if this is overkill? I also have an estimate of 110 days for setting up the servers, installing the SQL Server software and general infrastructure design activity.

    Agree. Overkill. Big overkill.
    I would recommend production/DR/dev each have 2 servers. I'd put SSAS, SSRS and SSIS on one and the DB on the other.
    In production, SSAS/SSRS will be active during the daytime; SSIS will likely be active off hours. So putting all that on one box should be fine for sharing the load. The DB on a second box would be good since it will likely be busy during the daytime
    and night time. Four processors may be heavy depending on the types of queries and usage patterns. I suspect you can get by with 2 processor servers, but would recommend buying the 4 processor boxes for dev and production, get them configured and run
    some performance baselines before putting in the DR environment. Then, if you find the CPUs idling, you can always cut the DR environment to 2 processor boxes. Not sure it's worth the minor cost savings to save 2 processors on 2 boxes with that effort, but
    if you're looking to cut corners, you may find that a 2 processor per server DR environment is within your performance comfort zone.
    For the dev environment, one box may well handle it all, but I'd go for 2. On average, a Dev environment isn't all that busy, but when you need the horsepower, you need it. And since it's Development AND Test, you help yourself by having realistic production
    level performance on what you're testing. Four processors is fine, but max it out on memory.
    As for hard drives, be careful about configuration. You need the space on your DW server and maybe for the SSAS server depending on how the cubes are built (ROLAP/MOLAP). When you speak about amounts of data, be careful since you'll want a lot of indexes,
    and that can double the DB size for a DW. Your DW will also run faster if you have different filegroups for data/indexes/temp DB, but only if those different filegroups are on different physical media that work well in parallel. You can always get fancier
    with more filegroups to have different ones for staging tables, for segregating fact & dimension tables etc. But for this size DB, that's overkill as well.
    Mainly, I'd look at spending hardware $s on memory for the servers, but get less of them.
    Now... two questions...
    1) Can you clarify the disk space needs? How much total data space in one environment, without indexes? Based on that, add the same for indexes, add half as much (?) for TempDB and you have the core disk needs. Depending on how much it is,
    you can decide on RAID, filegroup configuration, etc. And if the disk space with indexes is small enough that it all fits in memory, then disk and filegroup configuration becomes inconsequential except for ETL loads.
    2) The 25 sources... can you clarify that? 25 source systems? Total of 25 source applications? Total of 25 tables? Curious, because I'm wondering about how long you'd keep 1 full time ETL developer busy.

  • Where to find best practices for tuning data warehouse ETL queries?

    Hi Everybody,
    Where can I find good educational material on tuning ETL procedures for a data warehouse environment?  Everything I've found on the web regarding query tuning seems to be geared only toward OLTP systems.  (For example, most of our ETL queries don't use a WHERE clause, so the vast majority of accesses are table scans and index scans, whereas most index tuning sites are striving for index seeks.)
    I have read Microsoft's "Best Practices for Data Warehousing with SQL Server 2008R2," but I was only able to glean a few helpful hints that don't also apply to OLTP systems:
    often better to recompile stored procedure query plans in order to eliminate variances introduced by parameter sniffing (i.e., better to use the right plan than to save a few seconds and use a cached plan SOMETIMES);
    partition tables that are larger than 50 GB;
    use minimal logging to load data precisely where you want it as fast as possible;
    often better to disable non-clustered indexes before inserting a large number of rows and then rebuild them immediately afterward (sometimes even for clustered indexes, but test first; a sketch of this pattern follows the list);
    rebuild statistics after every load of a table.
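    A hedged T-SQL sketch of that disable/load/rebuild pattern (the table, index, and staging names are made up; minimal logging further assumes the bulk-logged or simple recovery model):

    -- take the non-clustered index out of the picture during the load
    ALTER INDEX ix_fact_sales_customer ON dbo.fact_sales DISABLE;

    -- TABLOCK lets the insert qualify for minimal logging on a heap target
    INSERT INTO dbo.fact_sales WITH (TABLOCK) (date_sk, customer_sk, amount)
    SELECT date_sk, customer_sk, amount
    FROM staging.sales_extract;

    ALTER INDEX ix_fact_sales_customer ON dbo.fact_sales REBUILD;

    -- per the last hint above: refresh statistics after the load
    UPDATE STATISTICS dbo.fact_sales WITH FULLSCAN;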
    But I still feel like I'm missing some very crucial concepts for performant ETL development.
    BTW, our office uses SSIS, but only as a glorified stored-procedure execution manager, so I'm not looking for SSIS ETL best practices.  Except for a few packages that pull from source systems, the majority of our SSIS packages consist of numerous "Execute SQL" tasks.
    Thanks, and any best practices you could include here would be greatly appreciated.
    -Eric

    Online ETL solutions are really among the most challenging, and to do them efficiently you can read my blog posts on online DWH solutions, which show how to configure an online DWH solution for ETL using the MERGE command of SQL Server 2008, along with some important concepts relevant to any DWH solution such as indexing, de-normalization, etc.
    http://www.sqlserver-performance-tuning.com/apps/blog/show/12927061-data-warehousing-workshop-1-4-
    http://www.sqlserver-performance-tuning.com/apps/blog/show/12927103-data-warehousing-workshop-2-4-
    http://www.sqlserver-performance-tuning.com/apps/blog/show/12927173-data-warehousing-workshop-3-4-
    Kindly let me know if any further help is needed
    Shehap (DB Consultant/DB Architect) Think More deeply of DB Stress Stabilities
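    In outline, the MERGE-based load mentioned above looks something like this (all object names are invented for illustration):

    -- upsert a dimension from a staging extract in one statement
    MERGE dbo.dim_customer AS tgt
    USING staging.customer_extract AS src
       ON tgt.customer_id = src.customer_id
    WHEN MATCHED AND tgt.city <> src.city THEN
       UPDATE SET tgt.city = src.city
    WHEN NOT MATCHED BY TARGET THEN
       INSERT (customer_id, city)
       VALUES (src.customer_id, src.city);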

  • Oracle for a Data Warehouse & Data Mining Project

    Hello,
    I am working for a marketing company and we have a pretty big OLTP database, and my supervisor wants to make use of this data for decision making. The plan is to create a data mart (if not a warehouse) and use data mining tools on top of it.
    He does not want to buy one of those expensive tools, so I was wondering whether we could handle such a project just by downloading OWB and Darwin from the Oracle site. None of us are data warehouse specialists, so it will be very tough for us, but I would like to learn. Actually, I was looking for some example warehouse + mining environment implementations to get the main ideas. I will appreciate any suggestions and comments.
    Thank you

    Go to
    http://www.oracle.com/ip/analyze/warehouse/datamining/
    for white papers, demos, etc. as a beginning.
    Also, Oracle University offers a course on Oracle Data Mining.

  • Normalized (3NF) VS Denormalized(Star Schema) Data warehouse :

    What are the benefits of a normalized data warehouse (3NF) over a denormalized one (star schema)?
    If the DW is in 3NF, is it necessary to create a separate physical database containing several data marts (star schemas) with physical tables that feed the cube, or to create views (an SSAS data source view) on top of the 3NF warehouse in star-schema form to feed the cube?
    Please explain the pros and cons of 3NF and denormalized DWs.
    thanks in advance.
    Zaim Raza.

    Hi Zaim,
    Take a look at this diagram (not reproduced here).
    1) Normally, a 3NF schema is typical for the ODS layer, which is simply used to fetch data from the sources and to generalize, prepare, and cleanse it for the upcoming load into the data warehouse.
    2) When it comes to the DW layer (Data Warehouse), the data modeler's general challenge is to build a historical data silo.
    A star schema with slowly changing facts and slowly changing dimensions is only partially suitable.
    Data Vault and other similar specialized methods provide, in my opinion, wider possibilities and flexibility.
    3) A star schema is perfectly suitable for data marts. SQL Server 2008 and higher contain numerous query optimizer improvements to handle such workloads efficiently. SQL Server 2012 introduced columnstore indexes, which make it possible to create robust star-model data marts with SQL query performance comparable to MS OLAP.
    So, your choice is:
    1) Create a solid, consistent DW solution.
    2) Create separate data marts on top of the DW for specific business needs.
    3) Create the necessary indexes, PK and FK keys, and statistics (on the FKs in fact tables) to help the SQL optimizer as much as possible (a columnstore sketch follows below).
    4) Forget about the approach of defining an SSAS data source view on top of 3NF (or any other DWH modeling method), since that is the road to performance and maintenance issues in the future.
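    A hedged sketch of point 3 and the columnstore remark above (the object names are invented; note that in SQL Server 2012 a nonclustered columnstore index makes the table read-only until it is disabled, which suits load-then-query data marts):

    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_fact_sales
    ON dbo.fact_sales (date_sk, product_sk, customer_sk, sales_amount);

    -- a typical star-join aggregate that benefits from the columnstore
    SELECT d.calendar_year, SUM(f.sales_amount) AS total_sales
    FROM dbo.fact_sales AS f
    JOIN dbo.dim_date   AS d ON d.date_sk = f.date_sk
    GROUP BY d.calendar_year;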
