Batch processing and parallelism

I have recently taken over a project: a batch application that processes a number of reports. For the most part, the application is solid in terms of what it needs to do. However, one of its goals is to achieve good parallelism when running on a multi-CPU system. The application does a large number of calculations for each report, and each report is broken down into a series of data units. The threading model allows, say, 5 report threads to run, with each report thread processing, say, 9 data units at a time. When the batch process executes on a 16-CPU Sun box running Solaris 8 and JDK 1.4.2, the application utilizes on average 1 to 2 CPUs, with some spikes to around 5 or 8 CPUs. Average CPU utilization hovers around 8% to 22%. Another oddity is that when the system is processing the calculations, and not reading from the database, CPU utilization drops rather than increases. So the goal of good parallelism is not being met right now.
There is a database involved in the app, and one of the things that concerns me is that the DAOs are implemented oddly. For one thing, these DAOs are implemented either as singletons or as classes with all static methods. Some of these DAOs also have a number of synchronized methods. Each of the worker threads that processes a piece of the report data makes calls to many of these static and single-instance DAOs. Furthermore, there is what I'll call a "master DAO" that decides what work to process next and writes the status of completed work. This master DAO does not handle writing the results of the data processing. When each data unit completes, the master DAO is called to update the status of the data unit and get the next group of data units to process for this report. This master DAO is completely static, and every method is synchronized. Additionally, there are some classes that perform data calculations that are also implemented as singletons, and their accessor methods are synchronized.
My gut tells me that having each thread call a singleton, or a series of static methods, is not going to help you gain good parallelism. Being new to parallel systems, I am not sure I am right to even look there. Additionally, if my gut is right, I don't quite know how to articulate the reasons why this design will hinder parallelism. I am hoping that anyone with experience in parallel system design in Java can lend some pointers here. I hope I have been clear while trying not to reveal too many of the finer details of the application :)
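To make the concern concrete, here is a tiny sketch of the pattern I am describing; the names are hypothetical and this is modern Java rather than the actual 1.4-era application code. Eight worker threads call a completely static, fully synchronized "master DAO", and a counter records how many of them are ever inside it at once:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MasterDaoContention {
    // Hypothetical stand-in for the "master DAO": completely static,
    // every method synchronized on the class object.
    static class MasterDao {
        static final AtomicInteger active = new AtomicInteger();
        static final AtomicInteger maxActive = new AtomicInteger();

        static synchronized int nextDataUnit() {
            int now = active.incrementAndGet();
            maxActive.accumulateAndGet(now, Math::max); // record peak concurrency
            try { Thread.sleep(5); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            active.decrementAndGet();
            return 42; // pretend this is the next unit of work
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // 8 "worker threads"
        for (int i = 0; i < 40; i++) {
            pool.execute(() -> MasterDao.nextDataUnit());
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println("max concurrent callers = " + MasterDao.maxActive.get());
    }
}
```

However many threads the pool has, the printed maximum is 1: the class object acts as a single global lock, so the workers take turns instead of running in parallel.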

The description of the DAOs in your post suggests to me that what you are looking at may actually be good for parallel processing. It could also be an attempt that didn't come off completely.
You suggest that these synchronized methods do not promote parallelism. That is true, but you have to consider what you hope to achieve from parallelism. If you have 8 threads all running the same query at the same time, what have you gained? More strain on the DB and the possibility of inconsistencies in the data.
For example:
Scenario 1:
Say you have a DAO retrieval that is synchronized. The query takes 20 seconds (for the sake of the example). Thread A comes in and starts the retrieval. Thread B comes in and requests the same data 10 seconds later. It blocks because the method is synchronized. When Thread A's query finishes, the same data is given to Thread B almost instantly.
Scenario 2:
The method that does the retrieval is not synchronized. When Thread B calls the method, it starts a new 20-second query against the DB.
Which one gets Thread B the data faster while using less resources?
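Scenario 1 only pays off, of course, if the DAO holds on to the result so the second caller can reuse it. A minimal sketch of that behaviour, with illustrative names, a plain in-memory map standing in for whatever the real DAOs keep, and a 50 ms sleep standing in for the 20-second query:

```java
import java.util.HashMap;
import java.util.Map;

public class CachedDao {
    private final Map<String, String> cache = new HashMap<>();
    private int queriesRun = 0;

    public synchronized String fetch(String key) {
        String cached = cache.get(key);
        if (cached != null) return cached;   // Thread B lands here after A finishes
        queriesRun++;
        String result = slowQuery(key);      // the "20-second query" of the example
        cache.put(key, result);
        return result;
    }

    public synchronized int queriesRun() { return queriesRun; }

    private String slowQuery(String key) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "rows-for-" + key;
    }

    public static void main(String[] args) throws InterruptedException {
        CachedDao dao = new CachedDao();
        Thread a = new Thread(() -> dao.fetch("report-1"));
        Thread b = new Thread(() -> dao.fetch("report-1"));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("queries actually run = " + dao.queriesRun());
    }
}
```

Both threads ask for the same key, but only one query ever runs; the second caller blocks on the lock and then returns the cached rows almost instantly.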
The point is that it sounds like you have a bunch of queries whose results are being used by different reports. It may be that the original authors set it up to fire off a bunch of queries and then start the threads that build the reports. Obviously the threads cannot create the reports unless the data is there, so the synchronization makes them wait for it. When the data gets back, the report thread can continue on to get the next piece of data it needs; if that isn't back, it waits there.
This is actually an effective way to manage parallelism. What you may be seeing is that the critical path of data retrieval must complete before the reports can be generated. The best you can do is retrieve the data in parallel and let the report writers run in parallel once the data they need is retrieved.
I think this is what matfud suggested above.

Similar Messages

  • Batch Processing and Putting Two files together?

    Hello,
    I'm trying to find out if there is a way, in Photoshop, to automate placing a logo and border from another file into a set of photos. Basically, I have a folder of, let's say, 4x6 images, and I have a file that has two layers: a thin transparent border layer and a layer housing the logo. I would like to find out if it's possible to automate the process so I can batch a lot of files to put this file (or the two layers) onto the original image, then save, close and go on to the next file. Any ideas how to accomplish this? Thanks!
    Regards,
    Dave

    Here is a simple script I made a while back that places one of two different logo files on the image, depending on whether the image is upright or horizontal in orientation.
    All you need to do is put your two logo files in a folder and tell the script which folder they are in. After that, when you run the script, it will place the appropriate logo file onto your image depending on its orientation. I used "C:\\MyLogoA.tif" and "C:\\MyLogoB.tif" for this script.
    You can run this script from a batch process.
    var doc = app.activeDocument; // This defines the active document
    var width = doc.width.value; // This is the width of the original image
    var height = doc.height.value; // This is the height of the original image
    // Call the placeLogo function with the logo that matches the orientation
    if (width > height) {
        placeLogo("C:\\MyLogoA.tif");
    } else {
        placeLogo("C:\\MyLogoB.tif");
    }
    // This is the placeLogo function
    function placeLogo(path) {
        // =======================================================
        var id35 = charIDToTypeID( "Plc " );
        var desc8 = new ActionDescriptor();
        var id36 = charIDToTypeID( "null" );
        desc8.putPath( id36, new File( path ) );
        var id37 = charIDToTypeID( "FTcs" );
        var id38 = charIDToTypeID( "QCSt" );
        var id39 = charIDToTypeID( "Qcsa" );
        desc8.putEnumerated( id37, id38, id39 );
        var id40 = charIDToTypeID( "Ofst" );
        var desc9 = new ActionDescriptor();
        var id41 = charIDToTypeID( "Hrzn" );
        var id42 = charIDToTypeID( "#Pxl" );
        desc9.putUnitDouble( id41, id42, 0.000000 );
        var id43 = charIDToTypeID( "Vrtc" );
        var id44 = charIDToTypeID( "#Pxl" );
        desc9.putUnitDouble( id43, id44, 0.000000 );
        var id45 = charIDToTypeID( "Ofst" );
        desc8.putObject( id40, id45, desc9 );
        executeAction( id35, desc8, DialogModes.NO );
        // =======================================================
    }

  • Batch processing and replication

    Oracle 11gr2 (11.2.0.3) Linux x86_64
    I wanted to know if anyone has come up with a solution for replicating batch-process data. Oracle recommends in the documentation (as a best practice) not to replicate batch-processing data through Streams, but rather to run the batch process on the source and then on the destination database. If we cannot do that, what are our options?
    Thanks all.

    Anyone have any ideas/thoughts?

  • Batch processing and rendering multiple clips in SpeedGrade CC?

    I'm new to SpeedGrade CC, just watched 2 hrs of Lynda training, and I'm just about ready to go. Before people jump on my question, let me walk through what my intended use will be.
    Unlike most of the content/workflow discussed in the training, I'm not color grading a sequence of clips stitched together in a timeline, but multiple clips that have been pre-edited to length, to which I want to apply the same color correction. This will only be done to small groups of clips, maybe 4-5 at a time, but since I'm all about efficiency, I wanted to ask what the best workflow for doing this is.
    Let's assume that I've taken one of the clips and adjusted everything natively in Sg (no Dynamic link from Pr). I like where I ended up with the settings so I saved a .look preset file.
    So what is the next best way to handle applying these settings to the other files? Creating multiple, separate Sg projects doesn't seem like the efficient way, and having to cue up each successively for rendering is equally slow. In the lessons the instructor alluded to working with and processing "dailies", which I also assume would be achieved through a batch process, but that isn't covered.
    I appreciate the advice!
    Steve

    Interesting ... process ... you have there. Hmmm. I can't think of any way you could work in Sg that isn't on a timeline. Whether made in PrPro or there in Sg (native) ... it's a video editing program, and that's done on a timeline. Plus, the way both PrPro and Sg are designed, you MUST define and name a project before you can start to work.
    Now, other than where the working files for the project will be kept, you don't really have to fill out the rest of the forms in PrPro. After you give your project a name and say where its files will be kept, you can simply skip the rest, and when you create a new sequence & drop a clip onto it, the sequence settings will be set to match your footage.
    Now ... do you have all one type footage (codec, frame size & rate) or different kinds, say some 1080p-24fps, some 720i-60fps, some 460p-29.976fps, that sort of thing?
    You know, what I'm thinking ... might actually be the easiest. Create a project in PrPro ... and a new sequence for each type of footage. Use the media browser panel to import all your footage into the project panel ... drag & drop a few similar clips to a sequence, then DL that over to Sg (takes a couple seconds) to grade/look 'em. Save 'em back to PrPro, then render that sequence out. Then when you know you've got a good render, either delete the clips from that timeline & re-use it, or create a new one. Do your next group. Rinse and repeat, so to speak.
    I take it you've no reason to save the sequences of graded clips past rendering them, so you should be able to use just the one project and import folders as necessary, removing them as you will. You'll spend hardly any time with the "project" details, but the programs will be happy.
    Again, as noted above you can either copy a grade to other clips on a sequence or put an "adjustment layer" over the clips of a sequence in PrPro (project panel: new item -> adjustment layer) and then grade that ... it will automatically be applied to all clips under it.
    And before you ask again, there isn't any way to work a single clip without it being a "project" with a timeline. These aren't photoshop, where you can open a single image.
    Neil

  • Batch processing and maplisteners -detecting when the maplisteners complete

    Hi
    I have a scenario.
    I get data from data source 1 and load it into a cache (say, the datasource cache). On this cache I have set a MapListener that does some transformation and puts the data into another cache. I want to start another process when the data transformation is complete.
    Since I have not implemented a synchronous listener, I have no way to know when the transformation is complete (as we may not know when all the MapListener threads for the datasource cache have completed).
    Is there a way to find out when all the transformation is complete? (Please note that N records/objects in a datasource may form one transformed record, hence counting may not be a good idea.)
    It is possible to update a flag in each record in the datasource cache and have another thread check whether all records are transformed before starting the process. But I would like to hear from your experience if you have any other, better solution that I can leverage from Coherence itself. (In other words, if I sense some inactivity in the transformed cache, I can safely assume that the transformation process is over.) Views welcome!
    regards
    Ganesan

    Hi Ganesan,
    Why don't you fire off some events from the transformation threads when they finish?
    You should be able to know how many transformation operations were to be done. When you get that many events that they completed, you are done.
    Best regards,
    Robert
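Robert's counting approach can be sketched with a CountDownLatch: if you know how many transformation operations there are, count each completion down and block until the latch reaches zero. The thread pool and the transform body below are placeholders for the listener's work, not the Coherence MapListener API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TransformBarrier {
    // Runs 'expected' transformations on a small pool and blocks until all are done.
    static boolean runAll(int expected) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(expected);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < expected; i++) {
            final int unit = i;
            pool.execute(() -> {
                transform(unit);   // stands in for the listener's transformation work
                done.countDown();  // the completion "event"
            });
        }
        boolean finished = done.await(30, TimeUnit.SECONDS); // next phase starts after this
        pool.shutdown();
        return finished;
    }

    private static void transform(int unit) { /* placeholder for the real work */ }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(20) ? "all transformations complete" : "timed out");
    }
}
```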

  • How to batch processing and renaming of jpg's

    Hello,
    I am sort of new to PS CS5 Extended, and I am trying to automate a bunch of JPG files by saving them as high quality into another folder, then adding the suffix _hr.jpg. Is this possible?
    Scenario:
    Source Folder: Original
    Original File Name: DSC_xxxx.jpg
    Target Folder: hr
    Save As: DSC_xxxx copy_hr.jpg
    Quality: 12
    Thanks for your help,
    G

    Yes, use Bridge or Camera Raw.
    Benjamin

  • Best practices for batch processing without SSIS

    Hi,
    The gist of this question is in general how should a C# client insert/update batches records using stored procedures. The ideas I can think of are:
    1) create 1 SP with a parameter of type XML, and pass say 100 records at a time, on 1 thread.  The SP reads the XML as a table and does a single INSERT.
    2) create 1 SP with many parameters that inserts 1 record.  I can either build a big block of EXEC statements for say 100 records at a time, or call the SP one at a time, on 1 thread.  Obviously this seems the slowest.
    3) Parallel processing version of either of the above: Pass 100 records at a time via XML parameter, big block of EXEC statements, or 1 at a time, and use PLINQ to make multiple connections to the database.
    The records will be fairly wide, substantial records.
    Which scenario is likely to be fastest and avoid lock contention?
    (We are doing batch processing and there is not a current SSIS infrastructure, so it's manual: fetch data, call web services, update batches.  I need a batch strategy that doesn't involve SSIS - yet).
    Thanks.

    The "streaming" option you mention in your linked thread sounds interesting, is that a way to input millions of rows at once?  Are they not staged into the tempdb?
    The entire TVP is stored in tempdb before the query/proc is executed.  The advantage of the streaming method is that it eliminates the need to load the entire TVP into memory on either the client or the server.  The rowset is streamed to the server, and SQL Server uses the insert bulk method to store it in tempdb.  Below is an example C# console app that streams 10M rows as a TVP.
    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Collections;
    using System.Collections.Generic;
    using Microsoft.SqlServer.Server;

    namespace ConsoleApplication1
    {
        class Program
        {
            static string connectionString = @"Data Source=.;Initial Catalog=MyDatabase;Integrated Security=SSPI;";

            static void Main(string[] args)
            {
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand("dbo.usp_tvp_test", connection))
                {
                    command.Parameters.Add("@tvp", SqlDbType.Structured).Value = new Class1();
                    command.CommandType = CommandType.StoredProcedure;
                    connection.Open();
                    command.ExecuteNonQuery();
                    connection.Close();
                }
            }
        }

        // Streams rows one at a time instead of materializing the whole TVP in client memory
        class Class1 : IEnumerable<SqlDataRecord>
        {
            private SqlMetaData[] metaData = new SqlMetaData[1] { new SqlMetaData("col1", System.Data.SqlDbType.Int) };

            public IEnumerator<SqlDataRecord> GetEnumerator()
            {
                for (int i = 0; i < 10000000; ++i)
                {
                    var record = new SqlDataRecord(metaData);
                    record.SetInt32(0, i);
                    yield return record;
                }
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                throw new NotImplementedException();
            }
        }
    }

    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

  • Batch process to add Javascript to PDF files

    Hi All,
    I have written a small piece of Javascript for my PDF files. The idea is to add a date stamp to each page of the document before printing. To do this, I have added the following code to the "Document Will Print" action:
    for (var pageNumber = 0; pageNumber < this.numPages; pageNumber++) {
        var dateStamp = this.addField("Date", "text", pageNumber, [700, 10, 500, 40]);
        dateStamp.textSize = 8;
        dateStamp.value = "Date Printed: " + util.printd("dd/mmm/yyyy", new Date());
    }
    My question is this: Does anyone know of a way to batch process a whole directory (of around 600 PDF's) to insert my Javascript into the "Document Will Print" action of each file?
    Many thanks for any information you may have.
    Kind regards,
    Aaron

    > Can I just confirm a few things please? Firstly, should I be going into "Batch Sequences" -> "New Sequence" and selecting "Execute JavaScript" as my sequence type?
    Yes, you are creating new batch sequence that will use JavaScript.
    > My second question is, how can I insert my body of script into the variable "cScript"? I have quotation marks and other symbols that I imagine I will have to escape if I wish to do this?
    You can either use different quotation marks or use the JavaScript escape character '\' to insert quotation marks.
    Your WillPrint code will only work in the full version of Acrobat and not Reader, because Reader will not allow the addition of fields. Also, each time you print you will be creating duplicate copies of the field. So it might be better to add the form field only in the batch process, and then just add the script that populates the date field in the WillPrint action.
    // add the form field to each page of the PDF
    for (var pageNumber = 0; pageNumber < this.numPages; pageNumber++) {
        var dateStamp = this.addField("Date", "text", pageNumber, [700, 10, 500, 40]);
        dateStamp.textSize = 8;
    }
    // populate the field at print time; the action script runs in the document
    // context, so look the field up with getField rather than a local variable
    this.setAction("WillPrint", "this.getField(\"Date\").value = \"Date Printed: \" + util.printd(\"dd/mmm/yyyy\", new Date());");

  • Please Help, Issues when batch processing Multi Frame images in Fireworks

    I hope someone will know how to do this and tell me where I am going wrong.
    Objective 1
    I have a large number of images that all have the exact same number of frames (4) and are all exactly the same size, and I want them resized, cropped and watermarked (by 'watermarked' I mean having my website's logo pasted in a specific location on each frame; I have my website's logo as a separate .png file which I copy from).
    Current Process
    I create a command which crops the image, then pastes my company's url/logo onto the first frame, moves it to the correct location on the frame, then copies it again and pastes it onto each of the other frames; the command then resizes the image to the exact proportions I want.
    I start a batch process and use my command, making sure that I also export from the .png to animated GIF (and I edit the settings to make it a 256-color animated GIF).
    Error 1
    The process described above resizes and crops my images; however, it does NOT put the watermark in the correct place. It seems to put it in the right place on the first frame and then in a different place on all the following frames, thus giving the effect of the watermark jumping around, which is obviously NOT what I want.
    Question 1
    Please let me know what process I should be following to correct the Error 1 above.
    Objective & Question 2
    I want to do exactly the same thing as above, but this time the files have a varying number of frames (from 2 to 45 frames, or 'instances' as CS4 seems to call them). Is there a way to paste my logo in the exact same location on ALL frames?
    Other information
    I have tried the WHXY Paste extension and I cannot see how to use it to solve the issue. I have also tried 'Paste in Place', however I cannot see how to use the Paste In Place extension as it is not appearing in my list of commands.
    Many thanks in advance for your help.
    Andy

    Andy, you could start a batch process which will do most of the things you are asking for. The batch process can be done using Fireworks. The right way to start is going to Archivo / Procesar por Lotes (File / Batch Process) and then following all the different options this tool has to offer.  Yes, I know I gave you the names in Spanish, but that is the way I use Adobe Fireworks.
    I hope this was useful for you!
    Best regards,
    Frank Meier | Curso Arreglos Florales

  • Batch processing with an image overlay

    Is it possible to do batch processing with an image overlay in FW CS4?
    I'm trying to resize several hundred images and place rounded corners on them.
    If this cannot be done in FW, does anyone know of another program that could accomplish this?

    Hi Marje,
    It's possible to include both resizing and image overlay in a custom Fireworks command that can be used in batch processing.  To get started, you could check out this tutorial that deals with the first step.
    That article describes how to perform image resize and overlay (in that case, a watermark), and then how to record the steps and turn them into a custom command that can be later used in batch processing.
    Once you saved the custom command, click File >> Batch Process, and follow the steps below:
    In the first window, select the images you want to process.
    On the next screen, open the Commands dropdown menu and select the custom command you created (it'll probably be on the bottom of the list), and click the Add button to include it in the batch process list.
    Finally, on the next screen select the location of the processed files, and optionally save the batch script for later use.
    Good luck!

  • SAP R/3 vs XI Batch Processing

    We are planning to move some of our existing interfaces between legacy systems & R/3 to be processed through XI. We would like to do this in such a manner that the business doesn't get impacted much and also keeping the project costs to minimum.
    Current interfaces use batch processing in R/3 with 1000's of records sitting in a file ftp'd from legacy system and a pre-edit ABAP program will validate the data and posts transactions in R/3 using BDC sessions. Errors will be reported through some custom application built in ABAP. Users will process the BDC session and correct the errors to post the documents. In some cases we use BAPI's but when the BAPI errors out a BDC session is created for processing the errors.
    We are on XI3.0 SP12 & R/3 4.6B
    Can XI do all the stuff that R/3 does today in terms of batch processing, error correction by business users, etc.? What are the limitations, and are there any workarounds to overcome them?

    XI is not meant for that and cannot do that. XI is an integration engine and does not replace R/3 work.
    What it means: XI can receive data and modify the data in mapping and deliver it to another system and invoke a process in case of R/3.
    XI has several adapters for other SAP systems:
    1. IDoc
    2. RFC
    3. XI Proxy
    4. File Adapter (I don't think you want to create a file again, but one of my previous clients used this option)
    I guess you know about the first 2.
    The third one is, when you receive the data from source system, modify it according to the requirements and pass it to an ABAP Class where you can write your own code.
    Let me simplify this for your process.
    When a file is received, XI can pick up the file and send it to the R/3 system using an XI proxy (in internal tables or structures), which will execute a class. Within that class you can write code to pass that data to your ABAP program with SUBMIT PROGRAM. This is just an example.
    This cannot replace your BDC process, but it can be made to do the same thing.
    regards
    Shravan

  • Bridge - Batch Processing has stopped working...help

    I am using Photoshop CS2 for Windows, and have been for years. I've worked with batch processing and Web Gallery all the time... for years.
    Batch Processing and Web Gallery stopped working (along with other Photoshop features). I upgraded to Windows 7, and Photoshop/Bridge/InDesign etc. work once again.
    Batch Processing and Web Gallery still don't work. When I select items in Bridge and click "Tools", "Photoshop", "Web Photo Gallery", it clicks over to Photoshop as if the process is about to begin. But then Photoshop just sits there doing nothing. No action menu pops up if I select "Batch"; no "Web Photo Gallery" dialog box initiates if I select the gallery.
    Any assistance that can save me the $1000 for a CS upgrade will be greatly appreciated.
    Thank you for your time.

    Look in edit/preferences/scripts and make sure there is a checkmark for both Bridge and Photoshop.  Without that they can't talk.

  • Acrobat Pro X batch processing different in Win 7

    Greetings,
    My company just completed moving everyone from Win XP to Win 7 (yes, I know, but better late than never). I regularily use Acrobat Pro X to batch process and password protect large numbers of a variety of documents (Word, Publisher, Excel and PowerPoint 2-up printouts.)
    Under Windows XP the process was pretty straight forward:
    1) Open Acrobat Pro
    2) Select "Batch Process": File -> Create -> Batch Create Multiple Files... (a window appears).
    3) Drag a large group of documents into the window (you can drag over 50 docs, it doesn't matter), then begin the process (which automatically walks through all the documents and PDFs them.)
    4) When the process is complete, open up the Action Wizard File -> Action Wizard -> Select the appropriate action
    5) A window appears, drag the PDFs into the window and go. All PDFs quickly get the security applied.
    Done
    With Windows 7 the functionality is different (I really don't know if it is the operating system change or some type of policy change that I am unaware of). The process is far slower because I literally have to PDF and Apply Security to each document one at a time. The process goes like this:
    Under Windows 7:
    1) Select a group of documents, but NO MORE THAN 15.
    2) Drag the documents over the "Acrobat Pro X" icon and launch them this way. Each document created a "temporary PDF". Multiple windows open up, stacked on top of each other.
    3) Go to each window individually and first save the file (that way the file name is preserved), then go to the Action Wizard and apply the security. Then close the PDF.
    4) Repeat this process for each open window (document).
    5) Repeat as necessary until you have the several hundred documents processed.
    Literally this is a "hands-on" process for each document. Is there a better way? Am I missing something in the Acrobat or Windows 7 settings?
    If I try to batch process the old way under Windows 7 I get a series of error messages for each document. (I can't even get to the action wizard process.)
    Any suggestions?
    Is there a third party app that will work without having to administer it so much?
    Thank you,
    TPK

    Hi Test Screen Name,
    While reproducing the problem I realized I was in error as to how far in the sequence the problem occurred. I actually do get as far as batch creating PDFs. The only difference there is that I can no longer drag and drop files into the batch create window; I have to use the "Add Files..." command in the upper left of the batch create window.
    So the application batch creates the files. Afterward, I use the Action Wizard to batch "Password Protect" the files. It is during this command run that the error occurs. (Note: I am trying to save over the old files by having them saved to the same directory under the same name, just like I used to be able to do.) The error I get is:
    Action Completed.
    Saved to: \\HOME\path-to-original-files\
    Warning/Errors
    The file may be read-only, or another user may have it open. Please save the document in a different
    Document: Name of document1.pdf
    Output: Name of document1.pdf
    The file may be read-only, or another user may have it open. Please save the document in a different
    Document: Name of document2.pdf
    Output: Name of document2.pdf
    The error message loops through all the documents. I don't have the documents open or in use. By default they shouldn't be read-only. None of this occurred when I previously used the application with Windows XP.
    I have not yet tried saving them to a different directory. I will try that later today. (I didn't want to have a lot of versions of the same documents, it tends to be confusing.)
    Thank you for your reply,
    TPK

  • Architectural design for FTP batch processing

    Hello gurus,
    I would like your help in determining the design for the following.
    We receive several HL7 messages as text files copied to a shared network folder. These files are created in several different folders depending on the region and message type. We need to come up with a B2B process to read all the files from the network folder using FTP (batch process), translate if needed (depending on the scenario), and transfer the files over to other destination folders on the network (using FTP).
    For this, we can create TPs with a Generic FTP channel, and this works without any issues. But doing it this way, we need to create a TP for each and every type of message, which reads the files from its own specified directory location on the network based on the polling interval.
    My question is, instead of creating TPs for each and every type of file, is there a way by which I can write a common web service that reads the source files from the network and based on the type of the file route to the proper destination folders. If it is possible, I would like to know the architecture for accomplishing this task.
    I really appreciate your kind help on this.
    Thanks and regards,
    Raghu

    Hi Raghu,
    Is it a B2B communication scenario?
    "By doing this way, we need to create TP for each and every type of message which reads the files from its own specified directory location on the network based on the polling interval."
    Why can't you have only one TP with multiple documents, channels and agreements?
    "My question is, instead of creating TPs for each and every type of file, is there a way by which I can write a common web service that reads the source files from the network and based on the type of the file route to the proper destination folders. If it is possible, I would like to know the architecture for accomplishing this task."
    It depends on your use case and the products you want to use. You can very well use the FTP adapter with BPEL and poll for files. Use a DVM in the composite to figure out the destination and send it there. You may use OSB if it is a typical routing case with heavy load where performance is a concern. You may use B2B as well here. So ultimately you need to figure out what you want and what tools you want to use.
    Regards,
    Anuj
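    Independent of the product choice, the core routing step the reply describes (inspect the file type, then move the file to a mapped destination folder) can be sketched in plain Java. This is only a minimal sketch; the extension-to-destination map and all paths are hypothetical placeholders, not part of any adapter API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

// Minimal sketch: route files to destination folders by file extension.
// The route map (extension -> destination directory) is illustrative.
class FileTypeRouter {
    private final Map<String, Path> routes;

    FileTypeRouter(Map<String, Path> routes) {
        this.routes = routes;
    }

    /** Returns the mapped destination for a file, or null when no route matches. */
    Path resolveDestination(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0) return null;
        return routes.get(name.substring(dot + 1).toLowerCase());
    }

    /** Moves every routable file in sourceDir to its mapped destination folder. */
    void routeAll(Path sourceDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            for (Path file : files) {
                Path dest = resolveDestination(file);
                if (dest != null) {
                    Files.createDirectories(dest);
                    Files.move(file, dest.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

    A scheduler (or the FTP adapter's polling) would call `routeAll` on each polling interval; unmatched files are simply left in place for a later pass or an error handler.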

  • OO Batch Model and optimised Java for batch???

    Hi All,
    I'm looking to see if there is any literature on OO models for batch processing and on optimising batch Java.
    Thoughts & comments welcome.........
    I have an existing batch process running on a mainframe which is very successful. We would like to leverage this by building a similar batch process that can run 'anywhere', so the likely options are Java/Unix.
    There are many patterns/models etc. for OO-based GUI / interactive processes, but very few (that I have found) for batch.
    I have worked mainly with mainframe batch and online applications, and I come with the baggage that any activity that can be processed in batch should be, to avoid overloading the online container (CICS region, web server, etc.).
    I believe that this continues to be true, and the particular data we are processing also benefits from the efficiency of batching the data together for eventual storage on tape.
    In view of not finding any literature (which I doubt is really the case), it seems that the problem is the same, so the solution is probably also similar.
    In the procedural solution, a Jackson (or similar) structure would have been designed, which the procedures built into the code would then reflect.
    I expect that if classes were defined instead of procedures, certainly at the higher level, then the design would still be OK.
    (So at the higher level you have a 'main' class, which instantiates a 'read' IO object, a processing object which handles the actual processing activity, and a 'write' IO object.)
    The level at which you would combine procedures together, or further split them out, would then be the main point of discussion.
    (However, I am open to the above suggestion being completely wrong.)
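    The higher-level design sketched above (a 'main' class wiring together a read IO object, a processing object, and a write IO object) might look like this in Java; all interface and class names are illustrative, not taken from any particular framework:

```java
import java.util.Iterator;

// Illustrative read -> process -> write batch skeleton. The record type
// (String here) and the three roles are placeholders for the real design.
interface RecordReader { Iterator<String> records(); }
interface RecordProcessor { String process(String record); }
interface RecordWriter { void write(String record); }

class BatchJob {
    private final RecordReader reader;
    private final RecordProcessor processor;
    private final RecordWriter writer;

    BatchJob(RecordReader r, RecordProcessor p, RecordWriter w) {
        this.reader = r;
        this.processor = p;
        this.writer = w;
    }

    /** Drives the classic batch loop until the input is exhausted. */
    void run() {
        Iterator<String> it = reader.records();
        while (it.hasNext()) {
            writer.write(processor.process(it.next()));
        }
    }
}
```

    The discussion point the poster raises (how far to combine or split the roles) then becomes a question of how many such interfaces the design defines and who composes them.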
    Then there is efficient configuration when processing........
    When running on the mainframe, the code is loaded once and the memory for all the working storage structures is created up front. During actual processing there is no instantiating of classes, no running of the garbage collector, etc. I re-use the same memory for each new record read in / processed / written out, and the code is normally loaded once when first called and re-used until all records are processed.
    Is there any way that I can replicate this within Java, either in its own JVM or running in a container such as WebSphere? When processing the volume of data that we do (20 million DB entries plus 40 GB of document data on average), anything not optimised is costing money and available processing time.
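    The mainframe-style reuse of working storage can be approximated in Java by allocating mutable buffers once and refilling them per record, so the steady-state loop allocates nothing and gives the garbage collector little to do. A minimal sketch, assuming hypothetical fixed-width records (the record length is an example value, not from the original post):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of mainframe-style buffer reuse: one byte[] allocated up front
// and refilled for every record, so no per-record allocation occurs.
class ReusableRecordLoop {
    static final int RECORD_LENGTH = 128; // illustrative fixed-width record size

    /** Reads fixed-width records, transforms them in place, writes them out.
        Returns the number of complete records processed. */
    static int process(InputStream in, OutputStream out) throws IOException {
        byte[] record = new byte[RECORD_LENGTH]; // allocated once, reused
        int count = 0;
        while (in.readNBytes(record, 0, RECORD_LENGTH) == RECORD_LENGTH) {
            // Transform in place (toy example: uppercase ASCII letters).
            for (int i = 0; i < RECORD_LENGTH; i++) {
                if (record[i] >= 'a' && record[i] <= 'z') record[i] -= 32;
            }
            out.write(record);
            count++;
        }
        return count;
    }
}
```

    This does not stop the JVM from JIT-compiling or collecting elsewhere, but it keeps the hot loop itself allocation-free, which is the closest Java analogue to re-using the same working storage for every record.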

    > I suspect that batching is underused rather than overused.
    Can you elaborate on that? What kind of conditions would you advocate batching for?
    > Running daily, monthly, etc. reports. Or something that feeds those.
    I don't disagree, I just don't even see this as 'batching'. Batching to me is when you take something that could be done incrementally and purposely do it in large groups at set times or time periods. If you have a daily report and you do it daily, you're just doing the most obvious approach. It might not even be the most efficient.
    >> I have some experience working with batch Java applications running in Unix. And I can tell you that they did not improve anything.
    > I suspect I would agree with that. I am not advocating that the batching be done in Java. Just the idea that an 'incremental' process that requires moving data versus a 'batch' process that doesn't isn't something that I would normally consider a good idea.
    I think we are thinking about different things. I'm really just talking about incremental or real-time vs. batching.
    >> They were actually the source of many of our issues. They added arbitrary time lags during times of low volumes, sometimes adding 30 minutes or more to the processing of a transaction as it waited for the next scheduled batch. They also made our backlogs worse in times of high volumes because the incoming data flow was uneven; we would often get big batches of data from partner systems (more batching, gotta love it) that hit us when our batch process was sleeping. 5, 10, 15 minutes would pass where the server ran at 10% capacity while huge backlogs were piling up. It didn't make anything better. It was just causing idling.
    > What were the timeliness requirements for the processing? Did it need to be completed by 2am in the morning? Or could it really just have been completed on demand?
    It was B2B transactions; ASAP was the time requirement. I guess the upper limit was 6 hours or so. But batching didn't really decrease the processing time per transaction anyway, and the server was never dedicated to the batch or anything, so there were still context switches.
    > It just seems to me that in Java, with all the nice threading we have access to, the server should never be idle, and if you cannot handle your volume you are better off adding more servers, not attempting to batch things.
    I have created applications that were intended to run 'batch' jobs which could be spread across servers. Those particular processes had to finish within a very narrow time span as well, about two hours as I recall. There was incremental as well as batch functionality that needed to be run for this. The batch functionality ran on the database. The incremental part took the batched results and handled them incrementally.
    That said, management was never willing to dedicate more than one server to the processing, so I guess it wasn't that important to them.
    > I have seen apps that claimed they were 'fast' because they did all of the incremental processing outside of the database. The design required moving, literally, the entire database over the network to other servers, which would then process it. Processing it in the database would have taken orders of magnitude less time. And that was time-sensitive data. I can't remember if that app allowed for multiple boxes to do the processing. I do know that the people working on it could never figure out the bottleneck (it was scaling to something like 12 hours a day, which was not acceptable).
    I guess I don't see doing it on the DB as implying batching. We use triggers to drive processes in Java, COBOL, whatever.
