Processing in chunks

Hi,
I have a table with 3 million rows and I want to extract them in 100,000-row chunks. There is a PK ID on the table, so I can get the min and max PK ID, and I can also use MOD to give me every 100,000th PK ID, like this:
select reference from (
    select reference, rownum as rownumber from (
        select reference from prop_cand_gen order by reference
    )
) where MOD(rownumber - 1, 100000) = 0
order by reference
So now I need to come up with a query to extract the data based on the PK ID ranges, so something along the lines of:
select data1, data2 from table where reference between 1 and 100000
select data1, data2 from table where reference between 100001 and 200000
right up to max(reference).
I could write the 30 queries manually, inputting the results from the MOD query, but I am after a more dynamic way to put this through a loop and generate the queries from min to max in chunks of 100,000.
Any ideas welcome :)

Hi,
Welcome to the forum!
Think carefully about why you need to divide this into chunks.
If you have a good reason, you can write a query that will generate the 30 (or however many) SQL statements for you. For example:
WITH    got_r_num    AS
(
    SELECT  rfrnc          -- REFERENCE isn't a very good column name
    ,       ROW_NUMBER () OVER (ORDER BY rfrnc)     AS r_num
    FROM    prop_cand_gen
)
SELECT     'SELECT data1, data2 FROM table_x WHERE rfrnc >= '
        || rfrnc
        || CASE
               WHEN  LEAD (rfrnc) OVER (ORDER BY rfrnc) IS NOT NULL
               THEN  ' AND rfrnc < ' || LEAD (rfrnc) OVER (ORDER BY rfrnc)
           END
        || ';'
FROM      got_r_num
WHERE     MOD (r_num, 100000) = 1
ORDER BY  r_num
;
(Used >= and < instead of BETWEEN.)
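If you would rather drive the whole extraction from a single PL/SQL block instead of generating 30 statements, something along these lines may also work. This is only a sketch: it assumes data1 and data2 live on the same table as the reference column (prop_cand_gen, per the original post), that reference is numeric, and that value-based ranges of 100,000 are acceptable even if each range does not contain exactly 100,000 rows; the inner loop body is a placeholder for whatever the extract actually has to do.

DECLARE
    c_chunk_size   CONSTANT PLS_INTEGER := 100000;
    l_min          prop_cand_gen.reference%TYPE;
    l_max          prop_cand_gen.reference%TYPE;
BEGIN
    SELECT MIN (reference), MAX (reference)
    INTO   l_min, l_max
    FROM   prop_cand_gen;

    FOR i IN 0 .. TRUNC ((l_max - l_min) / c_chunk_size)
    LOOP
        -- one pass per 100,000-wide reference range
        FOR r IN (SELECT data1, data2
                  FROM   prop_cand_gen
                  WHERE  reference >= l_min + (i * c_chunk_size)
                  AND    reference <  l_min + ((i + 1) * c_chunk_size))
        LOOP
            NULL;   -- placeholder: write r.data1, r.data2 wherever they need to go
        END LOOP;
    END LOOP;
END;
/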

Similar Messages

  • Processing file in chunks

    I have a file with the structure Header - Transaction Records (100K) - Trailer, i.e. about 100K transaction records.
    I want to process this file through a file-XI-file scenario. I am using the "Recordsets per Message" parameter
    (5000 records per chunk) in the sender file communication channel to process it in chunks, but I don't see any difference in how XI processes it. The file is processed the same way as before, without the "Recordsets per Message" parameter set. How can I verify that this parameter has taken effect and that system performance has improved?
    If I have to write an operating system command/script to split the file, how would I write it for the file structure I am
    talking about above?
    Thanks,
    Kumar

    Hi Palnati
    This works only for FCC and NFS; otherwise it has no effect.
    I replied to your question in the other thread:
    handling huge files
    Thanks
    Gaurav

  • How do I fill by chunks a clob inside a table while XML is generated?

    Hello everybody:
    I need to generate a single XML file from around 50 million rows, where the XML output includes not only row data but also default values for other elements and attributes (not from the database, but from strings and types with default values). It will be generated as a monthly task, so the process will repeat over time. The current approach is a batch process based on Spring Batch, using a single JDBC query that joins a table and a materialized view (both are data warehouse tables, so the data is not expected to change, and they are single, non-partitioned tables). From my point of view, that approach consumes database, network, disk and processing resources, including the cost of making the join, sending the data over the network, creating objects, validating, and writing the file, and it ends up wasting too much time generating the XML file. What that approach does promise is restarting from the last execution point (savepoints) and processing by chunks of data (ResultSets). That process is currently in development.
    Since Oracle has XML capabilities, another approach may be to delegate the complete generation of the file to the database (the version in use here is 10g Release 2). My current approach is to generate a CLOB that will hold all the XML output and store it in a table (my first idea was to write directly to a disk file, which unfortunately isn't possible because of internal policies). Given the limits on memory, processing and disk space, I need to append each row-as-XML to the CLOB as soon as it is rendered, and either put the CLOB into the column as soon as possible, or put the CLOB into the column first and append to it as the data is generated (the preferred approach). Also, since it is a very large operation, I need to record a savepoint for each chunk of data processed (every 20,000 rows, for example). One way I think it may be done is: create a temporary CLOB, fill it while keeping track of the rowid, use that rowid as the savepoint (no other CRUD operations are expected on the table or the materialized view), and later append that CLOB to the CLOB already stored in the table (is there some way to append one CLOB to another one that is in a table?). Given all those points, how do I manage the process to achieve these goals with a PL/SQL stored procedure and built-in functions?
    Update: I did a small pseudocode where I describe my current approach:
    Defining a cursor;
    Creating a segment clob;
    Creating a temporary clob;
    Defining a chunk size;
    Defining a temp rowid variable;
    Defining a control number to be 0;
    While cursor
         getting rowid and generating next row as xml into a temporary clob (XMLSerialize):
         increment the control number;
         if control number equals 1
    write rowid variable to a control table;
         append the temporary clob to a segment clob;
         if control number equals the defined chunk size:
              append the segment clob to the clob inside a control table;
              restart the control number to be 0;
         else
              continue;
    At this point, I'm unsure how to append the segment clob directly to the clob inside the control table (I suspect that holding that clob in a PL/SQL variable would use too much memory and swap).
    Any help will be appreciated. Thanks in advance.
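    On the narrow question of appending one CLOB to a CLOB stored in a table: DBMS_LOB.APPEND can do that in place if you first lock the row and fetch the LOB locator FOR UPDATE. The sketch below only illustrates that call pattern; the control table name (xml_control_tab), its columns, and the sample buffer are made up, and the real procedure would build the segment from the XMLSerialize output inside the cursor loop.
    DECLARE
        l_buf      VARCHAR2(100) := '<row>...</row>';   -- stand-in for one serialized row
        l_segment  CLOB;                                -- the in-memory segment being built
        l_stored   CLOB;                                -- locator of the CLOB column in the control table
    BEGIN
        DBMS_LOB.CREATETEMPORARY (l_segment, TRUE);
        DBMS_LOB.WRITEAPPEND (l_segment, LENGTH (l_buf), l_buf);
        -- lock the target row and fetch a writable locator
        SELECT xml_clob
        INTO   l_stored
        FROM   xml_control_tab        -- hypothetical control table
        WHERE  control_id = 1
        FOR UPDATE;
        -- append the segment to the stored CLOB in place, then release the lock
        DBMS_LOB.APPEND (l_stored, l_segment);
        COMMIT;
        DBMS_LOB.FREETEMPORARY (l_segment);
    END;
    /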

    Well, of course it's looping. That's what the "while" statement does. What you need is more like this:
    if (tableNumberJTextField.getText().equals("")){
      JOptionPane.showMessageDialog(null, "Table Number cannot be empty","Error", JOptionPane.ERROR_MESSAGE);
      tableNumberJTextField.requestFocusInWindow();
    }
    and you want this code to be called every time tableNumberJTextField loses focus.

  • Loading, processing and transforming Large XML Files

    Hi all,
    I realize this may have been asked before, but searching the history of the forum isn't easy, considering it's not always a safe bet which words to use on the search.
    Here's the situation. We're trying to load and manipulate large XML files of up to 100MB in size.
    The difference between our case and other related issues posted is that the XML isn't big because it has a largely branched tree of data, but rather because it includes large base64-encoded files in the XML itself. The size of the 'clean' XML is relatively small (a few hundred bytes to some kilobytes).
    We had to deal with transferring the xml to our application using a webservice, loading the xml to memory in order to read values from it, and now we also need to transform the xml to a different format.
    We solved the webservice issue using XFire.
    We solved the loading of the xml using JAXB. Nevertheless, we use string manipulations to 'cut' the xml before we load it to memory - otherwise we get OutOfMemory errors. We don't need to load the whole XML to memory, but I really hate this solution because of the 'unorthodox' manipulation of the xml (i.e. the cutting of it).
    Now we need to deal with the transformation of those XMLs, but obviously we can't cut them down this time. We have little experience writing XSL, and no experience using Java to apply the XSL files. We're looking for suggestions on how to do it most efficiently.
    The biggest problem we encounter is the OutOfMemory errors.
    So I ask several questions in one post:
    1. Is there a better way to transfer the large files using a webservice?
    2. Is there a better way to load and manipulate the large XML files?
    3. What's the best way for us to transform those large XMLs?
    4. Are we missing something in terms of memory management? Is there a better way to control it? We really are struggling there.
    I assume this is an important piece of information: We currently use JDK 1.4.2, and cannot upgrade to 1.5.
    Thanks for the help.

    I think there may be a way to do it.
    First, for low RAM needs, nothing beats SAX as the first processor of the data. With SAX, you control the memory use, since SAX only processes one "chunk" of the file at a time. You supply a class with methods named startElement, endElement, and characters. It calls the startElement method when it finds a new element. It calls the characters method when it wants to pass you some or all of the text between the start and end tags. It calls endElement to signal that passing characters is over, and to let you get ready for the next element. So, if your characters method did nothing with the base-64 data, you could see the XML go by with low memory needs.
    Since we know in your case that the characters will process large chunks of data, you can expect many calls as SAX calls your code. The only workable solution is to use a StringBuffer to accumulate the data. When endElement is called, you can decode the base-64 data and keep it somewhere. The most efficient way to do this is to have one StringBuffer for the class handling the SAX calls. Instantiate it with a big enough size to hold the largest of your binary data streams. In startElement, you can set the length of the StringBuffer to zero and reuse it over and over.
    You did not say what you wanted to do with the XML data once you have processed it. SAX is nice from a memory perspective, but it makes you do all the work of storing the data. Unless you build a structured set of classes "on the fly" nothing is kept. There is a way to pass the output of one SAX pass into a DOM processor (without the binary data, in this case) and then you would wind up with a nice tree object with the rest of your data and a group of binary data objects. I've never done the SAX/DOM combo, but it is called a SAXFilter, and you should be able to google an example.
    So, the bottom line is that it is very possible to do what you want, but it will take some careful design on your part.
    Dave Patterson

  • Downloading a report in excel format : getBlob WIS 30270 Error

    Hello experts,
    I 'm working in "Business Objects XI R2 SP2", the server is a "windows 2003 server".
    I have a big report that sometimes returns this error message when I try to download it in Excel format:
    Error message: "An internal error occured while calling the getblob api error wis30270".
    The number of lines in the report is 27800, the number of columns is 62, the file is 27mb.
    If I push the drill button then I can download it without errors, but this is not a good solution for the final users...
    Best regards
    Camillo

    Hi Camillo,
    Following information might help you to resolve the issue.
    1. Business Objects Enterprise provides thresholds to protect the Web Intelligence Report Server and the web application servers from processing large chunks of binary and character data, to avoid crashing.
    Maximum Binary File Size: maximum size of a binary file (for example: PDF, XLS, ...) that can be generated by the Report Server. If a binary file generated from a Web Intelligence document is greater than this limit, the generation is stopped to protect the server and an error is returned. Increase this limit if the server has to generate large binary files (for example, if InfoView users view large documents in *.pdf format).
    Maximum Character File Size: Maximum size of a text stream (for example: HTML, XML) that will be transferred to the application server. If a text stream sent to the application server is greater than this limit, the generation is stopped to protect the server and an error is returned.
    In CMC, under Servers>WebIntelligence Report server, you can possibly set both the above values to 50MB, but you have to ensure that the application server can handle the increased load multiplied by the number of simultaneous connections. If not, it can crash and block all access to applications.
    2. Ensure that the Web Application servers have enough memory available to process incoming requests. Usually the -Xmx value for java process is set at 256Mb, which is recommended to be at least 1024Mb or more depending on your Export requirements and application server limitations.
    Regards,
    Sarbhjeet Kaur

  • Transaction Time out in BPEL

    Hi All,
    About my issue:
    A legacy system has PL/SQL procedures written to process huge chunks of data (a kind of adapter that processes more than 2000 files; each invocation updates some tables and takes around 6 hours to complete), and we are trying to invoke those PL/SQL procedures to integrate the system with SOA (11g).
    We have developed a BPEL process that communicates with the DB through a DB adapter (which invokes the required PL/SQL procedures).
    All the BPELs used are asynchronous processes. I wanted to know how the DB adapter works: is it good practice to make the BPEL wait for the PL/SQL procedure to return (even though the BPEL process is an asynchronous one)?
    Please advise me on this.
    Cheers
    Vamsi

    Vamsi,
    I believe configuring a Pick activity will not help in this case. If you are calling a DB adapter from a BPEL process, then the call between the invoke and the DB adapter is synchronous. So try increasing the transaction timeout period through the admin console and let me know the results.
    Another option is to remove all activities and services from the main BPEL after the invoke activity (of the DB adapter) and create another BPEL process containing the removed activities. Create a concurrent program to invoke the PL/SQL procedure, and at the end of the procedure call the endpoint URL of the second BPEL and pass the out variables to it.
    Hope this helps
    Thanks

  • ALBPM - Batch events and Load Balancing

    Hi,
    We are planning to design a BPM solution for one of our current applications. The requirement is that the BPM solution should be able to start process instances in batch mode. We receive about 25,000 to 30,000 events in a batch file and we need to start one BPM process instance for each record. Currently we are evaluating ALBPM, but we are trying to figure out the best approach. I would also like to know the better options for configuring load-balanced instances of BPM for this scenario. I could not find any documents from BEA that address this. Has anybody tried this or come across this situation? Any documents/examples that can answer or suggest options for batch mode? Thanks in advance!

    First of all, load balancing applies at various levels.
    1) load balancing at the horizontal level is only achieved by creating a WLS cluster and deploying the BPM engine in the cluster.
    2) load balancing at the vertical level can be achieved by creating many BPM engines and deploying processes on each Engine, this will distribute instances between the engines.
    If you need to create 25k instances of the same process in a batch, there is no problem; it will queue the executions and dispatch them to the available threads.
    A good approach would be to create instances in smaller chunks, to do this you can create a "batch process" that can get the file, split it in smaller chunks and process one chunk at a time. An important thing to consider when working with so many instances is that the transactions tend to get bigger and longer, so if you have a huge chunk, you can get a transaction timeout or a DB exception because of redo logs sizes.
    Hope this helps!
    MAriano Benitez
    Join us at BEAParticipate, May 6-9 2007 | Atlanta, Georgia

  • Restrict data flow to MRP_ATP_DETAILS_TEMP

    Hi,
    We have a custom program which pulls the details of item availability for SOs through ATP rules. We are using the API MRP_ATP_PUB.CALL_ATP to pull the ATP details for the items.
    But every time this program is run, it inserts about 70 million rows into the MRP_ATP_DETAILS_TEMP table. Then we have to run the Purge ATP Temp Tables program to truncate the temp table. As per our DBA this is a risk for the DB storage space, so we have stopped running this program for now.
    Please answer my 2 questions below -
    1) Is it really a risk if the table reaches 70 million rows? Up to what maximum number of rows is it safe?
    2) Is there a way to stop the data insert into the ATP temp tables when the API is called? Is there any profile option or debug level responsible for this? We don't want data to be inserted into the temp table.
    Regards,
    Samir Kumar Das

    1) Is it really a risk if the table reaches 70 million rows? Up to what maximum number of rows is it safe?
    There is no magic number. It depends on the size of your database. 70m is huge for a small company but peanuts for, say, Boeing.
    2) Is there a way to stop the data insert into the ATP temp tables when the API is called? Any profile option or debug level responsible for this? We don't want data to be inserted into the temp table.
    Oracle uses this table to calculate ATP results. So if you use the API, it will insert records.
    Having said that, 70m for ATP check is excessive.
    Are you doing an ATP check for ALL open sales orders?
    If it is just one or even a hundred order lines, 70m seems too much. You need to raise an SR with Oracle.
    If you are doing it for all open lines (and there are thousands and thousands) here are a few things you can try
    1) Find out how many sales order lines the custom program processes. See if you can reduce that number by adding more conditions (such as ignoring orders more than a year old, or orders more than 2 months in the future, etc.)
    2) Try to process in chunks. Instead of calling the purge AFTER your custom program finishes, try calling it from your custom program every time, say, x order lines have been processed (a rough sketch follows below).
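    As a rough illustration of point 2 only: keep a counter in the custom program and purge every N lines. The procedure names below (process_atp_for_line, purge_atp_temp) are hypothetical wrappers for the real MRP_ATP_PUB.CALL_ATP call and the purge step, and the cursor is deliberately simplified.
    DECLARE
        c_purge_every  CONSTANT PLS_INTEGER := 1000;   -- tune to taste
        l_processed    PLS_INTEGER := 0;
    BEGIN
        FOR r IN (SELECT line_id
                  FROM   oe_order_lines_all
                  WHERE  open_flag = 'Y')            -- add the extra filters from point 1 here
        LOOP
            process_atp_for_line (r.line_id);        -- hypothetical wrapper around MRP_ATP_PUB.CALL_ATP
            l_processed := l_processed + 1;
            IF MOD (l_processed, c_purge_every) = 0 THEN
                purge_atp_temp;                      -- hypothetical wrapper around the purge step
            END IF;
        END LOOP;
    END;
    /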
    Sandeep Gandhi

  • Dashboards ? Ask me how... but first ask ourself Why you need a Dashboard..

    <b><i>"A dashboard is a visual display of the most important information needed to achieve one or more objectives which fits entirely on a single computer screen so it can be monitored at a glance."</i></b> - Stephen Few
    <p>
    That is a good way to describe the purpose of a Dashboard, but not enough to understand the whole concept.
    <p>
    Few phenomena characterize our time more uniquely and powerfully than the rapid rise and influence of information technologies. These technologies have unleashed a tsunami of data that rolls over and flattens us in its wake. Taming this beast has become a primary goal of the informtion industry. One tool that has emerged from this effort in recent years if the information dashboard. This single-screen display of the most important information people need to do a job, presented in a way that allows them to monitor what's going on in an instant, is a powerful new medium of communication.
    <p>
    Most information dashbarods that are used in business today fall far short potential. The root of the problem is not technology (only ;)... ) but poor visual design. To serve their purpose and fulfill their potential, dashboards must display a dense array of information in a small amount of space in a manner that communicates clearly and immediately. This requires design that taps into and leverages the power of visual perception to send and process large chunks of information rapidly.
    <p>
    We can learn a lot with the Human Being Mr. Steve Jobs from Apple, in one of the biography about this expert I found a pretty new and interesting perspective about DESIGN:
    <p>
    <b><i>"Design is not just what it looks like and feels like. Design is how it works."</i></b> - Steve Jobs
    <p>
    <b><i>"In most people's vocabularies, design means veneer. It's interior decorating. It's the fabric of the curtains of the sofa. But to me, nothing could be further from the meaning of design. Design is the fundamental soul of a human-made creation that ends up expressing itself in successive outer layers of the product or service."</i></b>- Steve Jobs
    <p>
    Everybody on a leadership position wants to have a beautiful (with colors and animations) Dashboard to show and talk about it to his neighbor in the next door (office) and the beauty of a Dashboard goes beyond.. it must touch your visual sense and provoke your ability to see exactly what you are looking for... to solve your problem or even to provoke the insight before it happens.
    <p>
    Any thoughts ? I've attached some examples and you can take your own conclusion...
    <p>
    I would recommend a good literature for those that are interested in: INFORMATION DASHBOARD DESIGN - The Effective Visual Communication of Data - Stephen Few.
    <p>
    Amazon:
    <a href="http://www.amazon.com/Information-Dashboard-Design-Effective-Communication/dp/0596100167/ref=sr_1_1?ie=UTF8&s=books&qid=1248497014&sr=8-1">http://www.amazon.com/Information-Dashboard-Design-Effective-Communication/dp/0596100167/ref=sr_1_1?ie=UTF8&s=books&qid=1248497014&sr=8-1</a>

    <b><i>"A dashboard is a visual display of the most important information needed to achieve one or more objectives which fits entirely on a single computer screen so it can be monitored at a glance."</i></b> - Stephen Few
    <p>
    That is a good way to describe the purpose of a Dashboard, but not enough to understand the whole concept.
    <p>
    Few phenomena characterize our time more uniquely and powerfully than the rapid rise and influence of information technologies. These technologies have unleashed a tsunami of data that rolls over and flattens us in its wake. Taming this beast has become a primary goal of the informtion industry. One tool that has emerged from this effort in recent years if the information dashboard. This single-screen display of the most important information people need to do a job, presented in a way that allows them to monitor what's going on in an instant, is a powerful new medium of communication.
    <p>
    Most information dashbarods that are used in business today fall far short potential. The root of the problem is not technology (only ;)... ) but poor visual design. To serve their purpose and fulfill their potential, dashboards must display a dense array of information in a small amount of space in a manner that communicates clearly and immediately. This requires design that taps into and leverages the power of visual perception to send and process large chunks of information rapidly.
    <p>
    We can learn a lot with the Human Being Mr. Steve Jobs from Apple, in one of the biography about this expert I found a pretty new and interesting perspective about DESIGN:
    <p>
    <b><i>"Design is not just what it looks like and feels like. Design is how it works."</i></b> - Steve Jobs
    <p>
    <b><i>"In most people's vocabularies, design means veneer. It's interior decorating. It's the fabric of the curtains of the sofa. But to me, nothing could be further from the meaning of design. Design is the fundamental soul of a human-made creation that ends up expressing itself in successive outer layers of the product or service."</i></b>- Steve Jobs
    <p>
    Everybody on a leadership position wants to have a beautiful (with colors and animations) Dashboard to show and talk about it to his neighbor in the next door (office) and the beauty of a Dashboard goes beyond.. it must touch your visual sense and provoke your ability to see exactly what you are looking for... to solve your problem or even to provoke the insight before it happens.
    <p>
    Any thoughts ? I've attached some examples and you can take your own conclusion...
    <p>
    I would recommend a good literature for those that are interested in: INFORMATION DASHBOARD DESIGN - The Effective Visual Communication of Data - Stephen Few.
    <p>
    Amazon:
    <a href="http://www.amazon.com/Information-Dashboard-Design-Effective-Communication/dp/0596100167/ref=sr_1_1?ie=UTF8&s=books&qid=1248497014&sr=8-1">http://www.amazon.com/Information-Dashboard-Design-Effective-Communication/dp/0596100167/ref=sr_1_1?ie=UTF8&s=books&qid=1248497014&sr=8-1</a>

  • DBMS_PARALLEL_EXECUTE multiple threads taking more time than single thread

    I am trying to insert 10 million records from source table to target table.
    Number of chunks = 100
    There are two scenarios:
    dbms_parallel_execute(..... parallel_level => 1) -- for single thread
    dbms_parallel_execute(..... parallel_level => 10) -- for 10 threads
    I observe that the average time taken by the 10 threads to process each chunk is 10 times the average time taken in the single-thread case.
    Ideally it should be the same, which would reduce the overall time taken by a factor of 10 (due to the 10 threads).
    Because of the behavior mentioned above, the time taken is the same in both cases.
    It would be great if anybody can explain me the reason behind such behavior.
    Thanks in advance

    Source Table = TEST_SOURCE
    Target Table = TEST_TARGET
    Both tables have 100 columns
    Below is the code:
    DECLARE
    l_task VARCHAR2(30) := 'test_task_F';
    l_sql_stmt VARCHAR2(32767);
    l_try NUMBER;
    l_stmt VARCHAR2(32767);
    l_status NUMBER;
    BEGIN
    l_stmt := 'select dbms_rowid.rowid_create( 1, data_object_id, lo_fno, lo_block, 0 ) min_rid,
                                       dbms_rowid.rowid_create( 1, data_object_id, hi_fno, hi_block, 10000 ) max_rid
                                       from (
                                       select distinct grp,
                                  first_value(relative_fno)
                                  over (partition by grp order by relative_fno, block_id
                                  rows between unbounded preceding and unbounded following) lo_fno,
                                  first_value(block_id )
                                  over (partition by grp order by relative_fno, block_id
                                  rows between unbounded preceding and unbounded following) lo_block,
                                  last_value(relative_fno)
                                  over (partition by grp order by relative_fno, block_id
                                  rows between unbounded preceding and unbounded following) hi_fno,
                                  last_value(block_id+blocks-1)
                                  over (partition by grp order by relative_fno, block_id
                                  rows between unbounded preceding and unbounded following) hi_block,
                                  sum(blocks) over (partition by grp) sum_blocks
                                  from (
                                  select relative_fno,
                                  block_id,
                                  blocks,
                                  trunc( (sum(blocks) over (order by relative_fno, block_id)-0.01) / (sum(blocks) over ()/100) ) grp
                                  from dba_extents
                                  where segment_name = upper(''TEST_REGION_SOURCE'')
                                  and owner = ''FUSION'' order by block_id
                                  ) ),
                             (select data_object_id from user_objects where object_name = upper(''TEST_REGION_SOURCE'') )';
    DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);
    DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
    sql_stmt => l_stmt,
    by_rowid => true);
    l_sql_stmt := 'insert into FUSION.TEST_REGION_TARGET(REGION_ID,REGION1,REGION2,REGION3,REGION4,
                             ...., REGION99)
                             SELECT REGION_ID,REGION1,REGION2,REGION3,REGION4,
                             .....,REGION99
                             from FUSION.TEST_REGION_SOURCE WHERE (1=1) AND rowid BETWEEN :start_id AND :end_id ';
    DBMS_PARALLEL_EXECUTE.run_task(task_name => l_task,
    sql_stmt => l_sql_stmt,
    language_flag => DBMS_SQL.NATIVE,
    parallel_level => 10);
    -- If there is error, RESUME it for at most 2 times.
    l_try := 0;
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
    WHILE(l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
    Loop
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
    END LOOP;
    DBMS_PARALLEL_EXECUTE.drop_task(l_task);
    END;
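    To check whether the ten jobs really ran concurrently or ended up serialising, and how long each chunk actually took, the chunk view can be queried after the run; this sketch assumes the 11gR2 USER_PARALLEL_EXECUTE_CHUNKS dictionary view and the task name used above:
    SELECT chunk_id,
           status,
           job_name,
           start_ts,
           end_ts,
           end_ts - start_ts AS elapsed
    FROM   user_parallel_execute_chunks
    WHERE  task_name = 'test_task_F'
    ORDER  BY start_ts;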

  • WatchService and SwingWorker: how to do it correctly?

    cross-posted to SOF:
    http://stackoverflow.com/questions/7784909/watchservice-and-swingworker-how-to-do-it-correctly
    For maximum feedback (though many regulars roam everywhere :-), here's a copy
    WatchService sounded like an exciting idea ... unfortunately it seems to be as low-level as warned in the tutorial/api plus doesn't really fit into the Swing event model (or I'm missing something obvious, a not-zero probability ;-)
    Taking the code from WatchDir (simplified to handle a single directory only), I basically ended up with the following:
    extend SwingWorker
    do the registration stuff in the constructor
    put the endless loop waiting for a key in doInBackground
    publish each WatchEvent when retrieved via key.pollEvents()
    process the chunks by firing propertyChangeEvents with the deleted/created files as newValue
    @SuppressWarnings("unchecked")
    public class FileWorker extends SwingWorker<Void, WatchEvent<Path>> {
        public static final String DELETED = "deletedFile";
        public static final String CREATED = "createdFile";
        private Path directory;
        private WatchService watcher;
        public FileWorker(File file) throws IOException {
            directory = file.toPath();
            watcher = FileSystems.getDefault().newWatchService();
            directory.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
        @Override
        protected Void doInBackground() throws Exception {
            for (;;) {
                // wait for key to be signalled
                WatchKey key;
                try {
                    key = watcher.take();
                } catch (InterruptedException x) {
                    return null;
                for (WatchEvent<?> event : key.pollEvents()) {
                    WatchEvent.Kind<?> kind = event.kind();
                    // TBD - provide example of how OVERFLOW event is handled
                    if (kind == OVERFLOW) {
                        continue;
                    publish((WatchEvent<Path>) event);
                // reset key return if directory no longer accessible
                boolean valid = key.reset();
                if (!valid) {
                    break;
            return null;
        @Override
        protected void process(List<WatchEvent<Path>> chunks) {
            super.process(chunks);
            for (WatchEvent<Path> event : chunks) {
                WatchEvent.Kind<?> kind = event.kind();
                Path name = event.context();
                Path child = directory.resolve(name);
                File file = child.toFile();
                if (StandardWatchEventKinds.ENTRY_DELETE == kind) {
                    firePropertyChange(DELETED, null, file);
                } else if (StandardWatchEventKinds.ENTRY_CREATE == kind) {
                    firePropertyChange(CREATED, null, file);
    }The basic idea is to make using code blissfully un-aware of the slimy details: it listens to the property changes and f.i. updates arbitrary models as appropriate:
    String testDir = "D:\\scans\\library";
    File directory = new File(testDir);
    final DefaultListModel<File> model = new DefaultListModel<File>();
    for (File file : directory.listFiles()) {
        model.addElement(file);
    }
    final FileWorker worker = new FileWorker(directory);
    PropertyChangeListener l = new PropertyChangeListener() {
        @Override
        public void propertyChange(PropertyChangeEvent evt) {
            if (FileWorker.DELETED == evt.getPropertyName()) {
                model.removeElement(evt.getNewValue());
            } else if (FileWorker.CREATED == evt.getPropertyName()) {
                model.addElement((File) evt.getNewValue());
            }
        }
    };
    worker.addPropertyChangeListener(l);
    JXList list = new JXList(model);
    Seems to work, but I feel uncomfortable:
    Outing myself as the thread agnostic I am: all the example snippets I have seen so far block the waiting thread by using watcher.take(). Why do they do it? I would expect at least some to use watcher.poll() and sleep a bit.
    the SwingWorker publish method doesn't quite seem to fit: for now it's okay, as I'm watching one directory only (didn't want to gallop too far in the wrong direction :) When trying to watch several directories (as in the original WatchDir example) there are several keys, and each WatchEvent is relative to one of them. To resolve the path, I would need both the event and the key - but can pass on only one. Most probably I got the distribution of logic wrong, though.
    Feedback (here or there, will take all :-) highly welcome!
    Cheers
    Jeanette

    finally settled on a version that's good enough (for now, at least), published over at SOF, copied here:
    Actually, @Eels's comment didn't stop knocking in the back of my head - and finally registered: it's the way to go, but there is no need for any "artificial" struct, because we already have the perfect candidate - it's the PropertyChangeEvent itself :-)
    Taking the overall process description from my question, the first three bullets remain the same
    - same: extend SwingWorker
    - same: do the registration stuff in the constructor
    - same: put the endless loop waiting for a key in doInBackground
    - changed: create the appropriate PropertyChangeEvent from each WatchEvent when retrieved via key.pollEvents and publish the PropertyChangeEvent
    - changed: fire the previously created event in process(chunks)
    @SuppressWarnings("unchecked")
    public class FileWorker extends SwingWorker<Void, PropertyChangeEvent> {
        public static final String FILE_DELETED = StandardWatchEventKinds.ENTRY_DELETE.name();
        public static final String FILE_CREATED = StandardWatchEventKinds.ENTRY_CREATE.name();
        public static final String FILE_MODIFIED = StandardWatchEventKinds.ENTRY_MODIFY.name();
        // will change to a map of key/directories, just as the tutorial example
        private Path directory;
        private WatchService watcher;
        public FileWorker(File file) throws IOException {
            directory = file.toPath();
            watcher = FileSystems.getDefault().newWatchService();
            directory.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
        @Override
        protected Void doInBackground() throws Exception {
            for (;;) {
                // wait for key to be signalled
                WatchKey key;
                try {
                    key = watcher.take();
                } catch (InterruptedException x) {
                    return null;
                for (WatchEvent<?> event : key.pollEvents()) {
                    WatchEvent.Kind<?> kind = event.kind();
                    // TBD - provide example of how OVERFLOW event is handled
                    if (kind == OVERFLOW) {
                        continue;
                    publish(createChangeEvent((WatchEvent<Path>) event, key));
                // reset key return if directory no longer accessible
                boolean valid = key.reset();
                if (!valid) {
                    break;
            return null;
         * Creates and returns the change notification. This method is called from the
         * worker thread while looping through the events as received from the Watchkey.
         * @param event
         * @param key
        protected PropertyChangeEvent createChangeEvent(WatchEvent<Path> event, WatchKey key) {
            Path name = event.context();
            // evolve into looking up the directoy from the key/directory map
            Path child = directory.resolve(name);
            PropertyChangeEvent e = new PropertyChangeEvent(this, event.kind().name(), null, child.toFile());
            return e;
        @Override
        protected void process(List<PropertyChangeEvent> chunks) {
            super.process(chunks);
            for (PropertyChangeEvent event : chunks) {
                getPropertyChangeSupport().firePropertyChange(event);
    }Feedback still highly welcome, of course, especially if there's something wrong :-)
    Thanks
    Jeanette

  • OSB 11g - Maximum file size OSB can handle

    Hi,
    What is the maximum size (of a file) which OSB can handle, when then file is polled using FTP, and then use XQuery transformation on it?
    Thanks,
    Sanjay

    Processing large files in OSB will be limited by the constraints on both Heap Memory and CPU utilization.
    To load larger objects in memory you will need enough heap size. If your heap size is not enough, you will start getting out-of-memory errors and the WebLogic server might crash. You can either add more memory to the server (which itself has a limit) or you can use content streaming. When you use content streaming, the whole object is not loaded into a memory buffer, so you will be able to consume large messages. But content streaming has certain limitations, precisely because the whole object is not loaded in memory. You can find these limitations at the following link and decide whether you can implement your use case within them.
    http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/userguide/context.html#wp1110513
    If not, then the best possible option would be to break the file into smaller chunks and then use OSB to read and process those chunks.
    Another limitation is CPU utilization. If you are not transforming the large content, CPU should not be an issue. But if you need to transform it before passing the payload to the backend system, then it becomes complicated. Transformations (both XQuery and XSLT) take a lot of CPU cycles, and if you are transforming huge payloads they will consume 100% CPU, resulting in overall deterioration of performance. Like Anuj mentioned, it is not recommended to have complex transformations and transformations of large XMLs within OSB. It's best to move such transformations out to other systems. But it is also not recommended to push many transformations down to the various backend systems. If your architecture involves many flows that have transformations on large objects, you should think about spending on an XML appliance meant for transformations. These products are purpose-built for complex XML processing with little impact on overall CPU utilization; they use hardware acceleration to achieve this. You can contact the Oracle sales team or Oracle Support to get more information about this.
    Regards
    Abhi

  • What is DBMS_PARALLEL_EXECUTE doing in the background

    What is the best way to see all the actual SQL's that are being executed in the background when a package is executed?
    For an example, I am interested in knowing what DBMS_PARALLEL_EXECUTE package is doing in the background. I've read what the procedures do and I understand the functionality. But I'd like to know how it does it. I wanted to know what create_chunks_by_number_col is doing in the background.

    970021 wrote:
    What is the best way to see all the actual SQL's that are being executed in the background when a package is executed?
    For an example, I am interested in knowing what DBMS_PARALLEL_EXECUTE package is doing in the background. I've read what the procedures do and I understand the functionality. But I'd like to know how it does it. I wanted to know what create_chunks_by_number_col is doing in the background.
    OK - I'm confused.
    You said you 'read what the procedures do' but the doc explains pretty clearly (IMHO) exactly how it creates the chunks.
    http://docs.oracle.com/cd/E11882_01/appdev.112/e16760/d_parallel_ex.htm#CHDHFCDJ
    CREATE_CHUNKS_BY_NUMBER_COL Procedure
    This procedure chunks the table (associated with the specified task) by the specified column. The specified column must be a NUMBER column. This procedure takes the MIN and MAX value of the column, and then divide the range evenly according to chunk_size. The chunks are:
    START_ID                              END_ID
    min_id_val                            min_id_val+1*chunk_size-1
    min_id_val+1*chunk_size               min_id_val+2*chunk_size-1
    min_id_val+i*chunk_size               max_id_val
    So I am at a loss to know how that particular example is of any value to you.
    That package creates a list of START_ID and END_ID values, one pair of values for each 'chunk'. It then starts a parallel process for each chunk that queries the table using a where clause that is basically just this:
    WHERE userColumn BETWEEN :start_id AND :end_id
    The RUN_TASK Procedure explains part of that
    RUN_TASK Procedure
    This procedure executes the specified statement (sql_stmt) on the chunks in parallel. It commits after processing each chunk. The specified statement must have two placeholders called start_id, and end_id respectively, which represent the range of the chunk to be processed. The types of the placeholder must be rowid where ROWID based chunking was used, or NUMBER where number based chunking was used. The specified statement should not commit unless it is idempotent.
    The SQL statement is executed as the current user.
    Examples
    Suppose the chunk table contains the following chunk ranges:
    START_ID                              END_ID
    1                                     10
    11                                    20
    21                                    30
    And the specified SQL statement is:
    UPDATE employees
          SET salary = salary + 10
          WHERE e.employee_id  BETWEEN :start_id AND :end_id
    This procedure executes the following statements in parallel:
    UPDATE employees
          SET salary = salary + 10  WHERE employee_id BETWEEN 1 and 10;
          COMMIT;
    UPDATE employees
          SET salary = salary + 10  WHERE employee_id BETWEEN 11 and 20;
          COMMIT;
    UPDATE employees
          SET salary = salary + 10  WHERE employee_id BETWEEN 21 and 30;
          COMMIT;
    You could just as easily write those queries yourself for chunking by number. But you couldn't execute them in parallel unless you created a scheduler job.
    So like the doc says, Oracle is just:
    1. getting the MIN/MAX of the column
    2. creating a process for each entry in the 'chunk table'
    3. executing those processes in parallel
    4. committing each process individually
    5. maintaining status for you.
    I'm not sure what you would expect to see on the backend for an example like that.
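    One low-effort way to watch what it actually generated, without tracing anything, is to query the task and chunk dictionary views it maintains. This sketch assumes the 11gR2 USER_PARALLEL_EXECUTE_TASKS and USER_PARALLEL_EXECUTE_CHUNKS views; 'MY_TASK' is a placeholder task name.
    -- the task itself
    SELECT task_name, chunk_type, status
    FROM   user_parallel_execute_tasks;
    -- the START_ID/END_ID (or rowid) pairs that CREATE_CHUNKS_BY_NUMBER_COL produced,
    -- plus the scheduler job that picked up each chunk
    SELECT chunk_id, start_id, end_id, job_name, status
    FROM   user_parallel_execute_chunks
    WHERE  task_name = 'MY_TASK'
    ORDER  BY chunk_id;
    Beyond that, the chunk SQL itself shows up in V$SQL like any other statement, since each chunk is just the RUN_TASK statement executed by a scheduler job with its own :start_id/:end_id binds.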

  • How to run CFLOOP in a batch of 100

    I need to loop over some logic for each record I query from a table once a day.
    Each day our dept. may receive a different number of records. Today we may get 100 new records, tomorrow maybe 5000, and the next day again maybe none.
    My application crashes whenever we get 5000 or more records, so I plan to loop in batches of 100. For example, if today I get 5000 records, I want to run my app 50 times, each pass processing 100 records. Is there a way to do this?

    The solution depends on just what is really causing your CF script to fail - can you give us information on the error you are receiving from CF?  Are you running out of memory?  If so, is it because the resultset from the query is so large, or is it because there is code in the CFLOOP that is causing memory usage to grow (creating variables, adding to a structure, displaying data to the output stream, etc)?  Are you sure that CF is crashing, and not your browser?
    If the problem is the overall memory footprint due to the size of the query, then one possible way to solve this is to create a var at the top of the script (variables.recordsToProcess), give it a value of around 100 to start with, and then write a query just to get the total record count (SELECT COUNT(*)). Now you can compute the number of passes needed to process all of the data, and can wrap most of your current code in a CFLOOP where the SQL query you are using now has been modified to return only one chunk of data (TOP #variables.recordsToProcess#), and the current CFLOOP only processes that chunk of data (TO="#variables.recordsToProcess#").
    If the problem is due to an expanding memory footprint that you cannot get around, another conceptually similar way to solve this is to pass a URL var to the script (url.recordsToProcess) that you can then use in the SQL query (TOP #url.recordsToProcess#) and in the CFLOOP (TO="#url.recordsToProcess#"). Then write another script that does a query just to get the total record count, and have it loop over a CFHTTP call (that passes the URL var as a query param) to your current script the appropriate number of times. If the reason for the memory expansion is that you are writing a lot of data to the output stream, then as you execute each CFHTTP you can append its output to a file; that way you don't have all of the data sitting in CF memory at any one time. If you need to make the output stream accessible to the browser, then at the end of the script that is doing the cfhttp calls, just put a link to the file that the user can click on - or you could put in a CFCONTENT tag that points to the file.
    You're not using CFDUMP in there to output data, are you?  If so, then get rid of it and your problems will probably go away.  Just spit out an HTML table with your data - you won't have all of the JS crappola that CFDUMP puts in there.
    -reed

  • How to Load 100 Million Rows in Partioned Table

    Dear all,
    I am working on a VLDB application.
    I have a table with 5 columns,
    for example A, B, C, D, DATE_TIME.
    I created a range (daily) partitioned table on column DATE_TIME,
    as well as a number of indexes, for example:
    an index on A
    a composite index on DATE_TIME, B, C
    Requirement
    I need to load approximately 100 million records into this table every day (it will be loaded via SQL*Loader or from a temp table: INSERT INTO orig SELECT * FROM temp).
    Question
    The table is indexed, so I am not able to use the SQL*Loader feature DIRECT=TRUE.
    So let me know the best available way to load the data into this table.
    Note --> Please remember I can't drop and recreate the indexes daily due to the huge data volume.

    Actually this is a simpler issue than you seem to think it is.
    Q. What is the most expensive and slow operation on a database server?
    A. I/O. The more I/O, the more latency there is, the longer the wait times are, the bigger the response times are, etc.
    So how do you deal with VLTs? By minimizing I/O. For example, using direct loads/inserts (see the SQL APPEND hint) means less I/O, as we are only using empty data blocks. Doing one pass through the data (e.g. applying transformations as part of the INSERT and not afterwards via UPDATEs) means less I/O. Applying proper filter criteria. Etc.
    Okay, what do you do when you cannot minimize I/O any more? In that case, you need to look at processing that I/O volume in parallel. Instead of serially reading and writing 100 million rows, you (for example) use 10 processes that each read and write 10 million rows. I/O bandwidth is there to be used. It is extremely unlikely that a single process can fully utilise the available I/O bandwidth. So use more processes, each processing a chunk of data, to use more of that available I/O bandwidth.
    Lastly, think DDL before DML when dealing with VLTs. For example, a CTAS to create a new data set and then a partition exchange to make that new data set part of the destination table is a lot faster than deleting that partition's data directly and then running an INSERT to refresh that partition's data.
    That in a nutshell is about it - think I/O and think of ways to use it as effectively as possible. With VLTs and VLDBs one cannot afford to waste I/O.
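    A minimal sketch of that DDL-before-DML idea for one daily range partition might look like the following. All names here are placeholders, and the staging table's indexes must match the destination table's local index definitions for INCLUDING INDEXES to work.
    -- 1. Build the day's data with a direct-path CTAS (or INSERT /*+ APPEND */ into a staging table)
    CREATE TABLE stage_20120702 NOLOGGING AS
    SELECT a, b, c, d, date_time
    FROM   temp_load
    WHERE  date_time >= DATE '2012-07-02'
    AND    date_time <  DATE '2012-07-03';
    -- 2. Index the staging table to match the destination table's local indexes
    CREATE INDEX stage_20120702_i1 ON stage_20120702 (a) NOLOGGING;
    CREATE INDEX stage_20120702_i2 ON stage_20120702 (date_time, b, c) NOLOGGING;
    -- 3. Swap it in: a dictionary-only operation, no row movement
    ALTER TABLE big_partitioned_table
      EXCHANGE PARTITION p_20120702 WITH TABLE stage_20120702
      INCLUDING INDEXES WITHOUT VALIDATION;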
