Data Warehouse using MSSQL - SSIS : Installation best practices

Hi All,
          I am working on building a data warehouse based on MSSQL 2008 R2. The requirement is to read source data from files, put it in a stage database, perform data cleansing etc., and then move the
data to the data warehouse db. Now the question is about the required number of physical servers, and which component of MSSQL (database engine, SSIS) should be installed on which server, based on best practices:
Store source files --> Stage database --> Data warehouse db
The data volume will be high (20-30k transactions per day) ... Please suggest.
Thank you
MSSQL.Arc

Microsoft documentation: "Use a Reference Architecture to Build An Optimal Warehouse
Microsoft SQL Server 2012 Fast Track is a reference architecture data warehouse solution giving you a step-by-step guide to build a balanced hardware configuration and the exact software setup.
Step-by-step instructions on what hardware to buy and how to put the server together.
Setup instructions on installing the software and all the specific settings to configure.
Pre-certified by Microsoft and industry hardware partners for the most optimal hardware and software configuration."
LINK:
https://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/reference-architecture.aspx
Kalman Toth Database & OLAP Architect
IPAD SELECT Query Video Tutorial 3.5 Hours
New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012

Similar Messages

  • "Installation best practices." Really?

    "Install Final Cut Pro X, Motion 5, or Compressor 4 on a new partition - The partition must be large enough to contain all the files required by the version of Mac OS X you are installing, the applications you install, and enough room for projects and media…"
    As an FCS3 user, if you were to purchase an OS Lion Mac, what would your "Installation best practices" be?  It seems the above recommendation is not taking into consideration FCS3's abrupt death, or my desire to continue to use it for a very long time.
    Wouldn't the best practice be to install FCS3 on a separate partition with an OS that you never, ever update?   Also, there doesn't appear to be any value added to FCS with Lion.  That's why I would be inclined to partition FCS3 with Snow Leopard -- but I'm really just guessing after being thrown off a cliff without a parachute.
    Partitioning… does this mean I'll need to restart my computer to use FCS?  What about my other "applications"? Will I be able to run Adobe Creative Suite off the other partition, or is the "best practice" to install a duplicate of every single application I own on the FCS partition?
    Note: This is not to say I'll never embrace FCX. But paying (with time & money) to be a beta tester just isn't gonna happen.  If it's as easy to use as claimed, I'm not falling behind, as has been suggested by some. I'm just taking a pass on the early adopter frustration.

    Okay, but are you not concerned with future OS updates that may render FCS3 useless?  Perhaps our needs are different, but I want and need FCS3 to continue to work in the future.
    That "best practices" link up at the top of this page is there for a reason, and it says "partition."  What it doesn't say is why, and that's really disappointing and concerning.  It's a little late in the game, but I would prefer Apple walk like a man and lay it on the line; the good, the bad, and the ugly.
    I'm glad to hear Lion is working okay for you!

  • Updating of data in APO DP - what is 'best practice'?

    I am using APO DP V5.
    I am using XI as an integration broker to receive data from external sources which is to be loaded into APO DP.
    There are of course different ways of doing this, for example:
    1. From XI, pass an XML message which then loads a delta queue in APO BW, loads an InfoCube, which is then used to load data into an APO planning area
    2. From XI, pass the data into BAPI parameters, which are then used to directly update data in an APO planning area via a BAPI call (eg BAPI PlanningBookAPS)
    What is the recommended 'best practice' here?
    Thanks,
    Bob Austin

    Hi Bob,
    I do not have experience in SCM 5.0 and XI but shall give my views from general experience.
    What kind of data you are loading into APO DP is the driver for choosing between the two ways you mention.
    Traditionally (older versions of SCM/APO, non-XI interfaces), sales history is captured in an InfoCube and then loaded into the DP Planning Area. The reason: you normally use this InfoCube for generation of CVCs, which are the Master Data in APO DP. Moreover, you will use this data to generate proportional factors for disaggregation of the statistically generated forecast from the aggregate level to the lowest level. So there is a requirement for loading historical data from external systems into APO DP via an InfoCube.
    The other kind of data would be market and sales forecasts. This is time-series data which needs to be loaded directly into the Planning Area for immediate Interactive Planning. In that case, rather than putting it in an InfoCube and then periodically loading the data from the InfoCube into the Planning Area, it is better to load the data directly into the Planning Area. There the BAPI will be a good option.
    Hope this gives some ideas to your query.
    Thanks,
    Somnath

  • Client on Server installation best practice

    Hi all,
    I wonder about this subject; I searched and found nothing relevant, so I ask here:
    Is there any best practice / state of the art when you have a client application installed on the same machine as the database?
    I know the client app uses the server binaries, but should I avoid that?
    Should I install an Oracle client home and configure the client app to use the client libraries?
    In 11g there is no changeperm.sh anymore; does that prove Oracle agrees to client apps using server libraries?
    Precision: I'm on AIX 6 (or 7) + Oracle 11g.
    The client app will be an ETL tool, which explains why it is running on the DB machine.

    GReboute wrote:
        EdStevens wrote:
            Given the premise "+*when*+ you have a client application installed on the same machine as the database", I'd say you are already violating "best practice".
        So I deduce from what you wrote that you're absolutely against coexisting client app and DB server, which I understand and usually agree with.
    Then you deduce incorrectly. I'm not saying there can't be a justifiable reason for having the app live on the same box, but as a general rule it should be avoided. It is generally not considered "best practice".
    GReboute wrote:
        But in my case, should I load or extract 100s of millions of rows, with GBs flowing through the network and possible disconnection issues, when I could have done it locally?
    Your potentially extenuating circumstances were not revealed until this architecture was questioned. We can only respond to what we see.
    The answer I'm seeking is a bit more elaborate than "shouldn't do that".
    By the way, CPU or Memory resources shouldn't be an issue, as we are running on a strong P780.

  • Can UCP(Utility Control Point) and MDW(Management Data Warehouse) use together?

    Hi,
    Can a server join an MDW and a UCP at the same time? It seems that after I enrolled an instance which had joined an MDW, the collection data was uploaded to the UCP data warehouse only.
    So can UCP use existing MDW data directly?
    thanks.

    Hi mark.gao,
    UCP system uses the Data Collector feature from SQL Server version 2008.
    If you’ve implemented the Management Data Warehouse feature in SQL Server 2008, you’ll either have to break that relationship and start pointing that data to your UCP instance, or you won’t be able to monitor it with this utility.
    When enrolling an instance in UCP that is already using MDW, the UCP enrollment routine will change all the collections to go to sysutility_mdw.
    We could change the collection set to send data, including UCP data, to our MDW database, but the UCP utility on the manager would not see the MDW database. So in my opinion, UCP cannot use the MDW data directly.
    Thanks,
    Sofiya Li
    TechNet Community Support

  • Can one build a data warehouse using SQL rather than Warehouse Builder?

    I would like to build a data warehouse purely using SQL statements. Where can I find the data warehouse extension of SQL statements?

    I am exploring the internal workings of Warehouse Builder.
    I have written a SQL script to generate sample data to be inserted into tables, then SQL scripts to do Extraction, Transformation and Loading using MERGE, GROUP BY CUBE, DECODE, etc.
    If anyone has experience of using plain SQL to perform ETL, would you share your experience here? Thanks.
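    As a point of comparison, the MERGE-style upsert at the heart of a SQL-only ETL can be sketched in a small, self-contained way. The demo below uses Python's bundled SQLite purely for illustration (table and column names are invented, and SQLite uses INSERT ... ON CONFLICT rather than MERGE, but the stage-to-warehouse logic is the same):

    ```python
    import sqlite3

    # Hypothetical schema: a staging table loaded from source files,
    # merged (upserted) into a small warehouse dimension table.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE stage_customer (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE dim_customer   (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO dim_customer VALUES (1, 'Acme', 'Oslo');
    INSERT INTO stage_customer VALUES
        (1, 'Acme', 'Bergen'),  -- changed row: city updated at source
        (2, 'Beta', 'Troms');   -- brand-new row
    """)

    # The MERGE-equivalent step: insert new keys, update changed ones.
    # (WHERE true disambiguates the ON CONFLICT clause after a SELECT source.)
    con.execute("""
    INSERT INTO dim_customer (id, name, city)
    SELECT id, name, city FROM stage_customer WHERE true
    ON CONFLICT(id) DO UPDATE SET name = excluded.name, city = excluded.city
    """)

    rows = con.execute("SELECT id, name, city FROM dim_customer ORDER BY id").fetchall()
    print(rows)  # [(1, 'Acme', 'Bergen'), (2, 'Beta', 'Troms')]
    ```

    In Oracle or T-SQL the same step would be a single MERGE statement with WHEN MATCHED / WHEN NOT MATCHED branches.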

  • Re-installation "best practices"?

    well..... Bridge has gone bad..... and I find that I need to reinstall the entire suite to install Bridge..... (feel free to fill in the blanks)
    Does a new installation uninstall before installing?  are there any best practices for a do-over?
    thanks

    sorry, it's CS5
    The problem I'm having is with Bridge, and unfortunately the recommendation is to uninstall first; but Bridge seems to be one of the components without an uninstaller or individual installer. So after reinstalling I seem to have the same problems. I posted here in the PS forum because it seems the installer is linked to the PS install.
    regards.

  • Regarding REFRESHING of Data in Data warehouse using DAC Incremental approa

    My client is planning to move from Discoverer to OBIA but before that we need some answers.
    1) My client needs the data to be refreshed every hour (incremental load using DAC) because they are using a lot of real-time data.
    We don't have much updated data (e.g. 10 invoices in an hour, plus some others). How much time does it usually take to refresh those tables in the data warehouse using DAC?
    2) While the table is getting refreshed can we use that table to generate a report? If yes, what is the state of data? Stale or incorrect(undefined)?
    3) How does refresh of Fin analytics work? Is it one module at a time or it treats all 3 modules (GL, AR and AP) as a single unit of refresh?
    I would really appreciate if I can get an answer for all the questions.
    Thank You,

    Here you go for answers:
    1) It shouldn't be much of a problem for such a small amount of data. It all depends on your execution plan in DAC, which can always be created anew and customized to load data for only those tables (star schema) -- approximately 15-20 mins, as it does many things apart from loading the tables.
    2) Reports in OBIEE will show the previous data, as the cache will be (should be) turned on. You will get the new data in reports after the refresh is complete and the cache is cleared using various methods (event polling preferred).
    3) Again, for Financial Analytics or any other module, you will have OOTB execution plans. But you can create your own plans and execute them. GL, AR and AP are also provided separately.
    Hope this answers your questions. You will get to know more while going through the Oracle docs, particularly for DAC.

  • Installation Best Practice

    Hi Guys
    I just bought the Master Collection upgrade from CS5 to CS 5.5 (as part of the deal for auto upgrade to CS6). Are there any best practices I should observe for installing the update to make sure it doesn't gum up or fall over or whatever (or should I just whack the disc in and let it run)???
    We're running Win7 SP1 64bit.
    Regards,
    Graham

    Also, as Steve mentioned above, you can turn off UAC and your anti-virus program, just to be sure that your security settings won't interrupt the installation.
    Here is a quick guide to Restart Windows in a modified mode | Windows 7, Vista -
    http://helpx.adobe.com/x-productkb/global/restart-windows-modified-mode-windows.html
    Enjoy!

  • Using XML with Flex - Best Practice Question

    Hi
    I am using an XML file as a dataProvider for my Flex application. My application is quite large and is being fed a lot of data, so the XML file I am using is also quite large.
    I have read some tutorials and looked through some online examples and am just after a little advice. My application is working, but I am not sure whether I have gone about setting up and using my data provider in the best possible (most efficient) way.
    My application consists of the main application (MXML) file and also additional AS files / components.
    I am setting up my connection to my XML file within my main application file using HTTPService:

        <mx:HTTPService
            id="myResults"
            url="http://localhost/myFlexDataProvider.xml"
            resultFormat="e4x"
            result="myResultHandler(event)" />

    and handling my results with the following function:

        public function myResultHandler(event:ResultEvent):void
        {
            myDataFeed = event.result as XML;
        }

    Within my application I am setting my variable values by first declaring them:

        public var fName:String;
        public var lName:String;
        public var postCode:String;
        public var telNum:int;

    and then giving them a value by "drilling" into the XML, e.g.:

        fName = myDataFeed.employeeDetails.contactDetails.firstName;
        lName = myDataFeed.employeeDetails.contactDetails.lastName;
        postCode = myDataFeed.employeeDetails.contactDetails.address.postcode;
        telNum = myDataFeed.employeeDetails.contactDetails.telNum;

    etc.
    For any of my external components (components in a different AS file), I am referencing their values using Application:

        import mx.core.Application;

    and setting the values / variables within the AS components as follows:

        public var fName:String;
        public var lName:String;

        fName = Application.application.myDataFeed.employeeDetails.contactDetails.firstName;
        lName = Application.application.myDataFeed.employeeDetails.contactDetails.lastName;

    As mentioned, this method seems to work; however, is it the best way to do it?
    - Connect to my XML file
    - Set up my application variables
    - Give my variables values from the XML file
    Bear in mind that in this particular application there are many variables that need to be set, and therefore a lot of lines of code just setting up and assigning variable values from my XML file.
    Could someone please advise me on this one?
    Thanks a lot,
    Jon.

    I don't see any problem with that.
    Your alternatives are to skip the instance variables and query the XML directly. If you use the values in a lot of places, then the variables will be easier to use and maintain.
    Also, instead of instance variables, you could put the values in an "associative array" (object/hashtable), or in a dictionary.
    Tracy
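    ActionScript 3 is an ECMAScript dialect, so the associative-array alternative Tracy mentions can be sketched in plain JavaScript. The object and field names below are illustrative, loosely following the poster's XML structure:

    ```javascript
    // Instead of declaring one typed instance variable per XML field,
    // keep the parsed values in an associative array (plain object) keyed by name.
    const contactDetails = {
      firstName: "Jon",
      lastName: "Smith",
      postcode: "AB1 2CD",
    };

    // One generic lookup replaces many hand-written variable assignments,
    // and returns null for fields the feed did not supply.
    function getField(details, name) {
      return name in details ? details[name] : null;
    }

    console.log(getField(contactDetails, "firstName")); // "Jon"
    console.log(getField(contactDetails, "phone"));     // null
    ```

    The trade-off is the one Tracy notes: direct properties are easier to refactor and type-check, while the associative array removes most of the boilerplate when there are many fields.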

  • ETL for Data warehouse - use view instead of transformations?

    When populating staging tables, is anyone else using this approach of a view, as opposed to transformations within your SSIS data flow?  I had not thought of this approach, but I suppose it achieves the same goal: getting the wanted schema for data flowing into the destination.  I suppose it would just be a matter of using the view as your source, as opposed to the underlying table(s) followed by transformations before the destination?

    Hi sb,
    I would say that it depends.  You want your load to be efficient, and you want your load to be simple and easy to enhance later.  Sometimes these goals can be conflicting, so you need to decide what's important for your implementation.
    Regarding efficiency, you will typically be better off with a view as the filtering, lookups etc will be done at source, so less data transferred to your staging area.  For example, the view might only ask for 12 of 25 columns in a source table, so
    you will be bringing over, perhaps, half the amount of data.  Another example, your view might join two tables at source, while another design option would bring over all of the larger table and perform a lookup (on the smaller table) for each record
    of the larger table.  This could be extremely inefficient if each lookup went back to source.
    Regarding easy enhancements, in the first example, if you bring over all 25 columns, you might find it easier to add one of the, as yet, unused 13 columns.  Regarding the second example above, with views, there is a risk that a new view will be created
    for new requirements, resulting in multiple views importing overlapping data.  You really only want to import each datum once, with no duplication.  Note; duplication is unlikely if the views are essentially one view per logical table in the source
    system.
    I've sat on the fence a bit answering this question, but it really does depend, and it is a big question.  What you need to do is understand the ramifications of the design you implement.  Having qualified my response, I very often use views to
    perform simple 1:1 manipulation of the source data.
    Hope that helps a little,
    Richard
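    Richard's efficiency point (column pruning and joins resolved at source via a view) can be shown with a small, self-contained sketch. The schema is hypothetical, and SQLite via Python stands in for the real source system purely for demonstration:

    ```python
    import sqlite3

    # Hypothetical source: a wide table plus a small lookup table. A view at the
    # source does the column pruning and the join, so the extract only moves the
    # columns the warehouse actually needs (3 of 5 here).
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE src_order  (id INTEGER, cust_id INTEGER, amount REAL,
                             internal_note TEXT, legacy_flag TEXT);
    CREATE TABLE src_region (cust_id INTEGER, region TEXT);
    INSERT INTO src_order  VALUES (1, 10, 99.5, 'n/a', 'x'), (2, 11, 12.0, 'n/a', 'x');
    INSERT INTO src_region VALUES (10, 'EMEA'), (11, 'APAC');

    -- The view is the extract contract: the join and filtering happen at source,
    -- so the data flow only sees the shape the staging table wants.
    CREATE VIEW v_stage_order AS
    SELECT o.id, o.amount, r.region
    FROM src_order o JOIN src_region r ON r.cust_id = o.cust_id;
    """)

    # The SSIS source component would simply read from the view.
    staged = con.execute("SELECT * FROM v_stage_order ORDER BY id").fetchall()
    print(staged)  # [(1, 99.5, 'EMEA'), (2, 12.0, 'APAC')]
    ```

    The alternative Richard describes, bringing over the whole wide table and doing a lookup per row in the data flow, moves every column and row across the network first.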

  • Using Delete Cascade a best practice?

    Hi,
    My current requirement needs to delete specific records from the parent and also from the child tables.
    So DELETE CASCADE would be helpful.
    I need to know whether it is good practice to use it? I hope it is.
    And I also think the performance will be better than deleting manually.
    Please throw me some light on this.
    Thank you.

    > However if you allow CASCADE DELETE, data in both child1 and child2 will be wiped out when deleting the parent key which may not be the intended behavior. Since it could cause unintended data loss, I would not use it if I am designing the table.
    We need to be realistic about CASCADE DELETE. How do you audit trail all the rows deleted in related tables?
    Assume you work for AdventureWorks Cycles and a manager is telling you: delete that old mountain bike from the product table. As soon as you do that, a month later another manager demands it back for some reporting.
    The point is don't ever DELETE data. Move it over to history/audit table so you can provide it if somebody asks for it again (guaranteed someone will).
    And my point: CASCADE DELETE should be part of a bigger db maintenance procedure. Therefore, I prefer stored procedure implementation with proper audit trail.
    Audit trail example:
    http://www.sqlusa.com/bestpractices2005/auditwithoutput/
    Kalman Toth Database & OLAP Architect
    SELECT Video Tutorials 4 Hours
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012
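    The archive-then-delete pattern described above can be sketched in a compact, runnable form. This is only an illustration using Python's sqlite3 with invented table names, not the SQL Server procedure from the linked article; the idea is that the archiving and the cascading delete happen in one transaction:

    ```python
    import sqlite3

    # Sketch of "move it to history, then delete" instead of a bare delete.
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
    con.executescript("""
    CREATE TABLE product    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_line (id INTEGER PRIMARY KEY,
                             product_id INTEGER REFERENCES product(id)
                                 ON DELETE CASCADE,
                             qty INTEGER);
    -- History tables mirror the live tables (empty copies of their shape).
    CREATE TABLE product_history    AS SELECT * FROM product    WHERE 0;
    CREATE TABLE order_line_history AS SELECT * FROM order_line WHERE 0;
    INSERT INTO product VALUES (1, 'Mountain Bike');
    INSERT INTO order_line VALUES (100, 1, 2);
    """)

    def delete_product_with_audit(con, product_id):
        """Copy parent and child rows to history, then let CASCADE fire."""
        with con:  # one transaction: archive and delete succeed or fail together
            con.execute("INSERT INTO product_history SELECT * FROM product "
                        "WHERE id = ?", (product_id,))
            con.execute("INSERT INTO order_line_history SELECT * FROM order_line "
                        "WHERE product_id = ?", (product_id,))
            con.execute("DELETE FROM product WHERE id = ?", (product_id,))

    delete_product_with_audit(con, 1)
    print(con.execute("SELECT COUNT(*) FROM order_line").fetchone()[0])  # 0
    print(con.execute("SELECT * FROM product_history").fetchall())
    # [(1, 'Mountain Bike')]
    ```

    When the second manager asks for the mountain bike back, the history tables still have it.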

  • Use of AC adapter - best practice?

    I just purchased a MacBook Pro a few days ago and the shop assistant told me it was best to fully charge the battery and then use it until it was completely dead and then fully recharge it at which point I should take out the AC adapter. He basically told me that I should never leave the AC adapter in when the computer is fully charged even when I am using it in an office for several hours at a time.
    I have read and executed the instructions regarding calibrating the battery and this makes sense to me, however I cannot find any articles supporting the statement that I should never leave the AC adapter in once the battery is fully charged.
    Could you please tell me if I should leave it in or out?

    Morten Twellmann wrote:
    I just purchased a MacBook Pro a few days ago and the shop assistant told me it was best to fully charge the battery and then use it until it was completely dead and then fully recharge it at which point I should take out the AC adapter. He basically told me that I should never leave the AC adapter in when the computer is fully charged even when I am using it in an office for several hours at a time.
    Complete and utter nonsense. In fact, except for when you do the battery calibration every few months, you are better off NOT deep cycling (ie, full charge/discharge) lithium ion batteries. By never using the computer with the AC adaptor in as you were told, you are forcing the computer to operate in a low energy consumption mode. The end result is lower processor/graphics performance. Using the computer on AC power, even when the battery is full, is not harmful to the battery.
    http://electronics.howstuffworks.com/lithium-ion-battery2.htm
    http://www.batteryuniversity.com/parttwo-34.htm
    Could you please tell me if I should leave it in or out?
    You can do whatever suits your needs/desires. The only thing to remember is that you should try to use the battery regularly: either run on battery power on a regular basis or, if this isn't always possible, perform the calibration as recommended by Apple.

  • Installation - Best Practices

    Howdy everybody,
    I want to know whether it is recommended to install Oracle Database (XE, Enterprise, Standard) using oracle-validated, using yum install (from the internet), or using packages from the DVD media?
    If DVD media, can you send me the commands necessary to install?
    For example, for Oracle XE 11gR2:
    rpm -Uvh libaio.1234.rpm
    rpm -Uvh unixodbc.1234.rpm
    What is the full command to install?
    And about Enterprise and Express: is it necessary to install all packages from the DVD for Oracle Express, or only for Enterprise?
    Thanks everybody!
    Att,
    Lucas
    Edited by: 1005247 on 18/05/2013 07:08

    1005247 wrote:
    all right... but there are many people who say it is not good to install from yum (internet), because if you install from the internet, many packages are not necessary for Oracle...
    what do you think?
    What do I think?
    http://www.youtube.com/watch?v=bufTna0WArc
    The 'oracle-validated' package is actually just a specification of packages that are needed to support oracle. It will not download anything that is not needed to support oracle. If you would rather do rpm package installs from the distribution media, and thus enter into 'package dependency hell' -- well, knock yourself out.
    1005247 wrote:
        And about oracle express, is it necessary to install all packages? Is it similar for Enterprise?
    What do the Installation Guides say?
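    For illustration, the two routes discussed above might look like this on Oracle Linux. This is a sketch, not a definitive procedure: it assumes the public yum repository is configured, and the exact preinstall package name depends on the OS release (oracle-validated on OL5, the 11gR2 preinstall package on OL6):

    ```shell
    # Route 1: let yum resolve everything. The oracle-validated package pulls in
    # exactly the RPM dependencies the database needs and applies kernel settings.
    yum install oracle-validated                        # Oracle Linux 5 name
    # yum install oracle-rdbms-server-11gR2-preinstall  # Oracle Linux 6 name

    # Route 2: install from the DVD media instead, resolving dependencies by
    # hand with rpm -Uvh, package by package -- the "dependency hell" the
    # answer above warns about.
    ```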

  • PI UDS - Installation Best practices

    Hi all:
    Per the installation guide the only requirement is to have the PI API installed on the same box as the UDS.
    I have the following questions on installing the UDS Framework & PI UDS. Appreciate your answers:
    1) Can this be installed on the xMII server with along with the PI API/SDK?
    2) Can this be installed on any machine with the PI API/SDK installed?
    3) What operating systems does it work with and Is a server class machine a requirement?
    4) If the only option is to install this on the PI server, are there memory or CPU requirements to run this?
    5) What is the expected network data packet size and frequency? I’m assuming this is a background task that continually runs.

    Srinivasan,
    1.) I would highly recommend the native connections vs OLEDB in every case where you can get to the data you require from the native xMII UDS. The only reason you would favor the OLEDB UDS is in cases where you cannot retrieve the data from the native xMII UDS.
    As for the instabilities, we feel that we have resolved all the issues in the upcoming release, but if you still find a situation where the xMII UDS fails, please send in a support case and the appropriate fixes will be done.
    The current known issues with the 4.0.2.5 xMII OLEDB UDS are as follows.
    - Unicode characters are not always correctly handled.
    - Administrative modes are not accessible.
    - Other smaller minor fixes.
    - The connection string dialog is not always populated and encoded correctly.
    2) Currently the new xMII UDSs are in Acceptance Testing and there is no access to the xMII OLEDB UDS. We are all trying very hard to get this through the process as quickly as possible, but with testing resource issues, it may still take some time.
    Martin.
