ETL Processing Performance and Best Practices

I have been tasked with enhancing an existing ETL process. The process dumps data from a flat file into staging tables and then processes the records from the staging tables into the permanent tables. The first step, extracting the data from the flat file into the staging tables, is done by BizTalk; no problems there. The second part, processing the records from the staging tables and updating/inserting the permanent tables, is done in .NET. I find this process inefficient and prone to deadlocks, because the code loads the data from the staging tables (using stored procedures), loops through each record in .NET, makes several subsequent calls to stored procedures to process the data, and then updates the record. I see a variety of problems here; above all, the process is very chatty with the database, which is a big red flag. I need some opinions from ETL experts so that I can convince my co-workers that this is not the best solution.
Anonymous

I'm not going to call myself an ETL expert, but you are right on the money that this is not an efficient way to work with the data. Indeed, very chatty. Once you have the data in SQL Server, keep it there. (Well, if you are interacting with another data source, it's a different game.)
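To make that concrete, here is a minimal set-based sketch in T-SQL, assuming hypothetical StagingOrders and Orders tables keyed on OrderID (names invented for illustration): instead of looping row by row in .NET, one MERGE handles the whole batch.

-- Upsert the entire staging batch in one set-based round trip
-- (table and column names are hypothetical).
MERGE dbo.Orders AS tgt
USING dbo.StagingOrders AS src
    ON tgt.OrderID = src.OrderID
WHEN MATCHED THEN
    UPDATE SET tgt.Quantity = src.Quantity,
               tgt.Amount   = src.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (OrderID, Quantity, Amount)
    VALUES (src.OrderID, src.Quantity, src.Amount);

One statement like this replaces thousands of round trips, lets the optimizer work on the whole set at once, and keeps transactions short, which also shrinks the window for deadlocks.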
Erland Sommarskog, SQL Server MVP, [email protected]

Similar Messages

  • Can anyone recommend tips and best practices for FrameMaker-to-RoboHelp migration?

    Hi. I'm planning a migration from FM (unstructured) to RH. I'd appreciate any tips and best practices for the migration process. (Note that at the moment I plan to import the FM documents into, not link to them from, RH.)
    For example, my current FM files are presently not optimally "chunked", so autoconverting FM file sections (based on, say, Header 1 paragraph layout) won't always result in an optimal topic set. I'm thinking of going through the FM docs and inserting dummy paragraphs with a tag something like "topic_break", placed in more appropriate locations than the existing headers. Then, during import to RH, I'd use the topic_break paragraph to demarcate the topics. Is this a good technique? Beyond paragraph-based import delineation, do you know of any guidelines for redrafting FM chapter file content into RH topics?
    Also, are there any considerations/gotchas in the areas of text review workflow, multiple authoring, etc. after the migration? (I've not managed an ongoing RH doc project before, so any advice would be greatly appreciated.)
    Thanks in advance!
    -Kurt
    BTW, the main reason for the migration: info is presently scattered across various (and way too many) PDF files. There's no global index. I'd like to make a RoboHelp HTML interface (probably WebHelp layout) so it can be a one-stop documentation shop for users.

    Jeff
    FM may produce better output for your requirements, but for many, what RH produces works just fine. My recent finding about Word converting images to JPG before import will mean a better experience for many.
    Once RH is set up, and it's not difficult, its printed documents will do the job for many. I would say try it and then judge.
    See www.grainge.org for RoboHelp and Authoring tips
    @petergrainge

  • FWSM interface monitoring and best practices documentation.

    Hello everyone
    I have a couple of questions regarding VLAN interface monitoring and best practices, specifically for this service module.
    I couldn't find a suggestion or guideline on how to define a VLAN interface on a management station. The FWSM's total throughput is 5.5 Gbps, and the interfaces are mapped to VLANs carried on trunks over 10 Gb EtherChannels. Is there a common practice, or past experience, for setting physical parameters on logical interfaces? The "show interface" command states the BW as unknown.
    Additionally, do any of you have a document addressing best practices for the FWSM? I have this for other platforms, and general recommendations based on newer ASA versions, but nothing related to the FWSM.
    Thanks a lot!
    Regards
    Guido

    Hi,
    If you are looking for more commands to check the throughput through the module:
    show firewall module <number> traffic
    Also, as this platform is end of life, you might have to check some of Cisco's older documentation for best practices.
    http://www.cisco.com/c/en/us/products/collateral/switches/catalyst-6500-series-switches/prod_white_paper0900aecd805457cc.html
    https://supportforums.cisco.com/discussion/11540181/ask-expertconfiguring-troubleshooting-best-practices-asa-fwsm-failover
    Thanks and Regards,
    Vibhor Amrodia

  • EP Naming Conventions and Best Practices

    Hi all
    Please provide me EP Naming Conventions and Best Practices documents
    Thanks
    Vijay

    Hi Vijay,
    For SAP Best Practices for Portal, read through these documents:
    [SAP Best Practices for Portal - doc 1 |http://help.sap.com/bp_epv170/EP_US/HTML/Portals_intro.htm]
    [SAP Best Practices for EP |http://www.sap.com/services/pdf/BWP_SAP_Best_Practices_for_Enterprise_Portals.pdf]
    And for naming conventions in EP, please go through these two links:
    [Naming Conventions in EP|naming standards]
    [EP Naming Conventions|https://websmp210.sap-ag.de/~sapidb/011000358700005875762004E]
    Hope this helps,
    Regards,
    Shailesh

  • EP Naming Conventions and Best Practices documents

    Hi all
    Please provide me EP Naming Conventions and Best Practices documents
    Thanks
    Vijay

    Hi,
    Check this:
    Best Practices in EP
    http://help.sap.com/saphelp_nw04/helpdata/en/43/6d9b6eaccc7101e10000000a1553f7/frameset.htm
    Regards,
    Praveen Gudapati

  • Adobe LiveCycle Process Management Overview and Best Practices

    To get familiar with the best practices of process management, watch this recording of a webinar hosted by Avoka Technologies.

  • High performance website, best practices?

    Hello all,
    I'm working on a system with a web service/Hibernate front end (Java code linking web pages to the database) which is expected to process up to 12,000 transactions per second with zero downtime. We're at the development/demonstration stage for phase 1 functionality, but I don't think there has been much of a planning stage to make sure those metrics can be reached. I've not worked on a system with this many transactions before, and I've always had downtime windows in which database and application patches could be applied. I've had a quick look into the technologies available for Oracle High Availability and, since we are using 11g with RAC, I know we have at least paid for them, even if we're not using them.
    There isn't a lot of programming logic in the system (no 1000-line packages accessing dozens of tables, in fact there are only about 20 tables) and there are very few updates. It's mostly inserts and small queries getting a piece of data for use in the front-end.
    What I'd like to know is the best practice development for this type of system. As far as I know, the only person on the team with authority and an opinion on technical architecture wants to use the database as a store of data and move all the logic into the front-end. The thinking behind this is
    1) it's easier to load balance or increase capacity in the front-end
    2) the database will be the bottleneck in the system so should have as little demand placed on it as possible
    3) PL/SQL packages cannot always be updated without downtime (I'm not sure if this is true or if it can be managed; the concern is that packages become invalid whilst the upgrade script is running), nor is it clear how updates in the front end could be managed any better, especially if they need to be coordinated with changes to tables
    4) reference tables can be cached in the front-end to cut down on data access
    Views please!

    Couple of thoughts
    - Zero downtime (or at least very close to it) is achievable, but there is a rapidly diminishing return on the cost of squeezing out the last few percent of uptime; if you can have the odd planned maintenance window, you can make your life a lot easier.
    - If you decide ahead of time that the database is going to be the bottleneck, then it probably will be!
    - I can understand where they are coming from with their thinking; the web tier will be easier to scale out, but eventually all that data still needs to get into the database. The database layer is where you need to start the design to get the most out of the platform. Can it handle 12,000 TPS? If it can't, then it doesn't matter how quickly your application layer can service those requests.
    - If this is mainly inserts, could they be queued in some sort of message queue? Allow the clients to get an instant (well, almost) 'Done' confirmation, with the database being eventually consistent? It very much depends on what this is being used for, of course, but it could help with both the performance (at least the 'perceived' performance) and the uptime requirement.
    - Caching fairly static data sounds like a good idea to me.
    Carl

  • Oracle EPM 11.1.2.3 Hardware Requirement and best practice

    Hello,
    Could anyone help me find the minimum hardware requirements for Oracle EPM 11.1.2.3 on Windows Server 2008 R2? And what is the best practice for getting optimum performance after the default configuration, i.e. which entries need to be modified based on the hardware resources (CPU and RAM) and the number of users accessing the Hyperion reports/files?
    Thanks,
    Yash

    Why would you want to know the minimum requirements? Surely it would be best to have optimal server specs. The nearest you are going to get is contained in the standard deployment guide - About Standard Deployment.
    That said, it is not possible to provide stats based on nothing; you would really need to undertake a technical design review/workshop, as there are many topics to cover before coming up with server information.
    Cheers
    John

  • Large heap sizes, GC tuning and best practices

    Hello,
    I've read in the best practices document that the recommended heap size (without JVM GC tuning) is 512 MB. It also indicates that GC tuning, object number/size, and hardware configuration play a significant role in determining the optimal heap size. My particular Coherence implementation contains a static data set that is fairly large in size (150-300 KB per entry). Our hardware platform has 16 GB of physical RAM available, and we want to dedicate at least 1 GB to the system and 512 MB to a proxy instance (localstorage=false) which our TCP*Extend clients will use to connect to the cache. This leaves us 14.5 GB available for our cache instances.
    We're trying to determine the proper balance of heap size vs. number of cache instances, and have ended up with the following configuration: 7 cache instances per node, each with a 2 GB heap and a high-units value of 1.5 GB (7 x 2 GB = 14 GB, which fits within the 14.5 GB available). Our testing has shown no substantial GC pauses with the Concurrent Mark Sweep GC algorithm, and we have also done testing with a heap fragmentation inducer (http://www.azulsystems.com/e2e/docs/Fragger.java), which likewise shows no significant pauses.
    The reason we opted for a larger heap was to cut down on the cluster communication and context-switching overhead, as well as the administration challenges that 28 separate JVM processes would create. Although our testing has shown successful results, my concern is that we're straying from the best-practices recommendations, and I'm wondering what others think about the configuration outlined above.
    Thanks,
    - Allen Bettilyon


  • Performance Tuning Best Practices/Recommendations

    We recently went live on an ECC 6.0 system. We have 3 application servers that are showing a lot of swaps in ST02.
    Our buffers were initially set based on the SAP Go-Live Analysis checks, but it is becoming apparent that we will need to enlarge some of them.
    Are there any tips and tricks I should be aware of when tuning the buffers?
    Does making them too big decrease performance?
    I just want to adjust the system to allow the best performance possible, so any recommendations or best practices would be appreciated.
    Thanks.

    Hi,
    Please increase the values of the parameters in small increments. If you set the parameters too large, memory is wasted, and this can result in paging if too much memory is taken from the operating system and allocated to SAP buffers.
    For example, if abap/buffersize is 500000, change this to 600000 or 650000. Then analyze the performance and adjust the parameters accordingly.
    Please check out http://help.sap.com/saphelp_nw04/helpdata/en/c4/3a6f4e505211d189550000e829fbbd/content.htm and all embedded links. The documentation provided there is fairly elaborate. Moreover, the thread mentioned by Prince Jose is a very good guideline as well.
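    For reference, buffer sizes like this are instance profile parameters (maintained via transaction RZ10), so the change above amounts to a single profile line, e.g.:
    abap/buffersize = 600000
    The new value only takes effect after the instance is restarted.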
    Best regards

  • Installation and best practices

    I saw this link being discussed in a thread about "Live Type," but I think it needs a thread of its own, so I'm going to begin it here.
    http://support.apple.com/kb/HT4722?viewlocale=en_US
    I have Motion 4 (and everything else with FCS 2, of course), and just purchased Motion 5 via the App Store. (I'm sure I'll be buying FCP X also at some point, but decided to hold off for now.)
    When I was reading the "Live Type" thread there was some discussion about Motion 5 overwriting Motion 4 projects, or something like that, so I started freaking out. I've opened both 5 and 4, but am keeping them closed until I understand what's going on.
    Since I purchased Motion 5 from the App Store, I'm just under the assumption that my Mac took care of everything correctly. I see that Motion 4 resides in the FCS folder and Motion 5 is a stand-alone in the Applications folder.
    So I guess my questions are these ...
    1) What's so important about having FCS 2009 on a separate drive? I have a couple of other internal drives with more than enough free space, so that isn't an issue for me. I just wonder why this is a "best practice." The two programs CAN share the same drive ... the link says so.
    2) I suppose I'll let 4 and 5 reside side by side for now. How do I make sure Motion 5 won't screw up my Motion 4 projects? (My hunch is that you can open an M4 project in M5 and do a "save as" ... this will create an M5 version and leave the M4 alone. Am I correct about that?) Maybe the answer to this is related to my first question.
    3) I want to make sure I'm not missing something by the words "startup disk." Although I have 3 drives in my Mac Pro, only one is a "startup disk" ... the other two are for storage. If I move everything from FCS to a different internal drive, does it make any difference that the destination drive is NOT a startup disk?
    **I'm going to separate this part out a bit because it may or may not be related to the previous questions.**
    I noticed that Motion 5 came with very little content and only a few templates, but I read in another thread that additional content can be downloaded free when I do an update. I also read in that thread that this free content is pretty much the same as the content that I have with Motion 4.
    1) If I download this additional content (which is basically the same as what's in Motion 4), will I just end up with a duplicate of all that material?
    2) Could this be part of the reason that Apple recommends that Motion 5 be on a separate drive ... so that the content and templates don't get mixed up?
    --Just a couple of months ago, I finally got around to cleaning out all the FCS content, throwing away duplicates and organizing things properly. If I have to go through this process again, I want to do it correctly the first time.

    When you install Motion 5 or FCP X, all your Final Cut Studio apps are moved into a folder called Final Cut Studio.  This is because you can't have two apps with the same name in the same folder.  I'm running them both on the same drive, no problems.
    Motion 5 does not automatically overwrite any Motion project files; that is hogwash. When you open a v.4 file in 5, it will ask if you want to convert the original to 5, or open a copy called Untitled and make it a v.5 project. Very simple. If you're super paranoid, duplicate the original Motion project file and open the copy in v.5 to be extra safe. Remember, once a project file is version 5, it can't be opened in previous versions.
    You can't launch both at the same time, duh.
    The system drive, or OS drive, is just that: the drive your operating system is installed on. All applications should be on that drive and NOT be moved to other drives, especially pro apps like these. Move them to a non-OS drive and you'll regret it. Trust me.
    Yes, run Software Update (Apple menu) and you'll get additional content for Motion 5 that v.4 doesn't have. It won't be any problem with space on your drive; that stuff takes up very little space.
    Apple recommends two different OS drives, or partitions, only to avoid an overwhelming flood of people screaming "What happened to my Final Cut Studio legacy apps?" and other such problems. Hey, they're put into a new folder, that's all. Breathe...
    If you're having excessive problems, your hardware may not be up to speed. CPU speed is needed, along with at least 8 GB of RAM (if not 12 or 16 for serious work), but your graphics card really needs to be up to speed. iMacs and MacBook Pros barely meet the requirements, but will work well. Mac Pros can take much more powerful graphics cards. Airs and Minis should be avoided like the plague.
    After checking the hardware, be sure to run Disk Utility to "repair" all drives. Then get the free app "Preference Manager" by Digital Rebellion (dot com) to safely trash your app's preference files, which resets them and can fix a lot of current bugs.

  • Tips and best practices for translating C into LabVIEW? SERIOUS newbie...

    I need to translate a C function into LabVIEW.  This will be my *first* LabVIEW project.  I've been reading some tutorials, and I'm still struggling to get my brain out of "C/C++ mode" and learn the LabVIEW paradigms.
    Structurally, the function that I need to translate gets called from a while-loop and performs a bunch of mathematical calculations. 
    The basic layout is something like this (this obviously isn't the actual code, it just illustrates the general flow control and techniques that it uses).
    #include <math.h>   /* for pow() */

    struct Params
    {
        // About 20 int and float parameters
        int   someParam;
        float someOtherParam;
        // ...
    };

    int CalculateMetrics(struct Params *pParams,
                         float input1, float input2 /* etc. */)
    {
        int   errorCode = 0;
        float metric1;
        float metric2;
        float metric3;

        // Do some math like:
        metric1 = input1 * (pParams->someParam - 5);
        metric2 = metric1 + (input2 / pParams->someOtherParam);
        // Tons more simple math
        // A couple of for-loops

        if (metric1 < metric2)
        {
            // manipulate metric1 somehow
        }
        else
        {
            // set some kind of error code
            errorCode = -1;   // placeholder; the actual code is elided here
        }

        if (!errorCode)
            metric3 = metric1 + pow(metric2, 3);
        // More math...
        // etc...

        // update some external global metrics variables
        return errorCode;
    }
    I'm still too green to understand whether or not a function like this can translate cleanly from C to LabVIEW, or whether the LabVIEW version will have significant structural differences. 
    Are there any general tips or "best practices" for this kind of task?
    Here are some more specific questions:
    Most of the LabVIEW examples that I've seen (at least at the beginner level) seem to rely heavily on using front panel controls to provide inputs to functions. How do I build a VI where the input arguments (input1, input2, etc.) come in as numbers and aren't tied to dials or buttons on the front panel?
    The structure of the C function seems to rely heavily on the use of stack variables like metric1 and metric2 in order to perform calculations.  It seems like creating temporary "stack" variables in LabVIEW is possible, but frowned upon.  Is it possible to keep this general structure in the LabVIEW VI without making the code a mess?
    Thanks guys!

    There's already a couple of good answers, but to add to #1:
    You're clearly looking for the equivalent of a typical C function. Any VI that doesn't require its front panel to be opened (user interaction) can be such a function.
    If the front panel is never opened, the controls are merely used to send data into the VI, much like (identical to) the parameters in a C function declaration. The indicators can/will be the return values.
    Choosing which controls and indicators send data in and out of a VI is almost too easy: click the icon of the front panel (top right), show the connector, and click which control/indicator goes where. Done. That's your function's declaration.
    Basically, one function is one VI, although you might want to split it even further; don't create 3k x 3k pixel diagrams.
    Depending on the amount of calculation done in your if-thens, they might become subVIs of their own.
    /Y
    LabVIEW 8.2 - 2014
    "Only dead fish swim downstream" - "My life for Kudos!" - "Dumb people repeat old mistakes - smart ones create new ones."
    G# - Free award winning reference based OOP for LV

  • APO - Aggregate Forecasting and Best Practices

    I am interested in options to save time and improve the accuracy of forecasts in APO Demand Planning. Can anyone recommend best practices, processes, and designs that they have seen work well?
    We currently forecast at the product level (detailed). We are considering changing that to the product family level. If you have done this, please reply.

    Hello Dan -
    Forecasting at the product level is very detailed (though it depends on the number of SKUs available and to be forecasted).
    Here on my project we have a sample size of about 5000 finished goods to start with, and forecasting at that minute a level won't help. I have defined a product-group level where I have linked all the similar FGs; that way, when you are working with the product group, you can be a little more assertive in your forecasting. After that you can use proportional factors to help allocate the necessary forecast down to the product level. This way you are happy, and management is happy (high-level reports), as they don't have to go through all the data that isn't even necessary for them to see.
    Hope this helps.
    Regards,
    Suresh Garg

  • Static NAT refresh and best practice with inside and DMZ

    I've been out of the firewall game for a while and have now been re-tasked with some configuration, both updating ASAs to 8.4 and making some new services available. So I've dug in to refresh my knowledge of NAT operation, and I have a question about best practice; I'd appreciate a sanity check.
    This is very basic; I apologize in advance. I just need the cobwebs dusted off.
    The scenario is this: if I have an SQL server on an inside network that a DMZ host needs to access, is it best to present the inside host (the SQL server in this example) via a static translation to the DMZ, or the DMZ host (the SQL client in this example) via a static translation to the inside?
    I think it's best to present the higher-security resource into the lower-security network. For example, when a service from the DMZ is made available to the outside/public, the real IP from the higher-security interface is mapped to the lower one.
    So I would think the same would apply to the inside/DMZ boundary, making 'static (inside,dmz)' the 'proper' method pre-8.3, and this for 8.3 and up:
    object network insideSQLIP
    host xx.xx.xx.xx
    nat (inside,dmz) static yy.yy.yy.yy
    Am I on the right track?

    Hello Rgnelson,
    It is not related to the security level of the zone; rather, it is about what the behavior should be. What I mean is, for
    nat (inside,dmz) static yy.yy.yy.yy
    - Any traffic hitting the translated address yy.yy.yy.yy on the DMZ zone will be redirected to the host xx.xx.xx.xx on the inside interface.
    - Traffic initiated from the real host xx.xx.xx.xx will be translated to yy.yy.yy.yy if the host accesses any resources on the DMZ interface.
    If you reverse it to (dmz,inside), the behavior is reversed as well; so if you need to translate an address from the DMZ interface going to the inside interface, you should use (dmz,inside).
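    For illustration, the reversed case would look like this (hypothetical object name and addresses, mirroring the syntax above); traffic hitting ww.ww.ww.ww on the inside would then be redirected to the DMZ host zz.zz.zz.zz:
    object network dmzSQLClient
    host zz.zz.zz.zz
    nat (dmz,inside) static ww.ww.ww.ww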
    For your case I would say what is common: since the server is in the INSIDE zone, you should configure
    object network insideSQLIP
    host xx.xx.xx.xx
    nat (inside,dmz) static yy.yy.yy.yy
    At this point, users in the DMZ zone will be able to access the server using the yy.yy.yy.yy IP address.
    HTH
    AMatahen

  • Not a question, but a suggestion on updating software and best practice (Adobe, we need to create stickies for the forums)

    Lots of you are hitting a brick wall when updating, and the end result is a non-recoverable project. In a production environment, with projects due, it's best never to update while in the middle of a project. Wait until you have a day or two of downtime, then test.
    As a best practice, get into the habit of saving your projects to a new name in incremental versions, i.e. "project_name_v001", v002, etc.
    Before you close a project, save it, then save it again to a new version. This way you'll always have two copies and will not lose the entire project. Most projects crash upon opening (at least in my experience).
    At the end of the day, copy your current project off to an external drive. I have a 1 TB USB 3 drive for this purpose, but you can just as easily save off just the PPro, AE and PS files to a stick. If the video corrupts, you can always re-ingest.
    Which leads to the next tip: never clear off your cards or wipe the tapes until the project is archived. It's always cheaper to buy more memory than to recoup lost hours of work, and your sanity.
    I've been doing this for over a decade, and the number of projects I've lost? Zero. Have I crashed? Oh, yeah. But I just open the previous version, save a new one, and resume the edit.

