Bulk-loading performance

I'm loading Twitter stream data into JE. There are about 2 million items daily, each about 1 KB. I have a user class and a twit (status) class, and for each twit I update its user; I also have secondaries on twits for replies, and I use the DPL. In fact this is all in Scala, but it works with JE just fine, as it should.

Since each twit insertion updates its user (e.g., incrementing the user's total twit count), I originally had a transaction for each UserTwit insertion and several threads doing the inserts, similar to the architecture I first developed for PostgreSQL. However, that was too slow. So I switched to a single thread, no transactions, and deferred write. Here's what happens with that: the loading runs very quickly through all the twits, in about 10-20 minutes, and then spends about 1-2 hours in store.sync; store.close; env.sync; env.close.

Do I need to sync both if I have only one DPL store and nothing else in this environment, and do I lose any extra time with the two syncs? Should I do anything special to stop the checkpointer or cleaner threads?
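For concreteness, here is a minimal sketch (in Java; the actual project is Scala, and the path, store name, and sizes are only illustrative) of the deferred-write setup and the sync/close sequence described above:

import java.io.File;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.persist.EntityStore;
import com.sleepycat.persist.StoreConfig;

public class DeferredWriteLoad {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setTransactional(false);                 // no transactions during the bulk load
        envConfig.setCacheSize(50L * 1024 * 1024 * 1024);  // explicit 50 GB cache

        Environment env = new Environment(new File("/data/je"), envConfig);

        StoreConfig storeConfig = new StoreConfig();
        storeConfig.setAllowCreate(true);
        storeConfig.setDeferredWrite(true);                // buffer writes, flush explicitly later

        EntityStore store = new EntityStore(env, "twitter", storeConfig);

        // ... single-threaded put phase: insert twits, update users ...

        // The slow phase in question: flush the deferred writes and close everything.
        store.sync();   // flushes and fsyncs this store's dirty records
        store.close();
        env.sync();     // checkpoints the environment
        env.close();
    }
}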
I already have 2,000+ small 10 MB .jdb files, and wonder how I can agglomerate them into, say, 1 GB files each, since that is roughly how much the database grows daily.
Overall, the PostgreSQL performance is about 2-4 hours per bulk load, similar to BDB JE. I implemented exactly the same loading logic against both the PG and BDB backends, and hoped that BDB would be faster, but so far it is not faster by an order of magnitude... And this is given that PG doesn't use a RAM cache, while with JE I specify a cache size of 50 GB explicitly; it takes about 15 GB of RAM while quickly going through the put phase, before hanging for an hour or two in sync.
The project, tfitter, is open source and is available on GitHub:
http://github.com/alexy/tfitter/tree/master
I use certain tricks to convert the Java classes to and from Scala's, but all the time is spent in sync, so this is a JE question --
I'd appreciate any recommendations to make it faster with the JE.
Cheers,
Alexy

Alexy,
A few of us were talking about your question and had some more options to add. Without more detailed data, such as the stats obtained from Environment.getStats() or the thread dumps that Charles and Gordon (gojomo) suggested, our suggestions are a bit hypothetical.
Gordon's point about GC options and Charlie's suggestion of je.checkpointer.highPriority are CPU oriented. Charlie's point about EntityStore.sync vs Environment.sync is also in that category. You should try those suggestions, because they will certainly reduce the workload somewhat. (If you need to sync essentially everything in an environment, it is less overhead to call Environment.sync, but if only some of the entity stores need syncing, it is more worthwhile to call EntityStore.sync.)
However, your last post implied that you are more I/O bound during the sync phase. In particular, are you finding that you have a small number of on-disk files before the call to sync, and a great many afterwards? In that case, the sync is dumping out the bulk of the modified objects at that time, and it may be useful to change the .jdb file size during this phase by setting je.log.fileMax through EnvironmentConfig.setConfigParam().
JE issues an fsync at the boundary of each .jdb file, so increasing the .jdb file size dramatically can reduce the number of fsyncs and improve your write throughput. As a smaller, secondary benefit, JE stores some metadata on a per-file basis, and increasing the file size can reduce that overhead, though generally that is a minor issue. You can see the number of fsyncs issued through Environment.getStats().
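As a rough sketch (the 1 GB value and environment path are just examples, and whether either knob helps depends on the workload), the two parameters mentioned in this thread can be set like this before opening the environment:

import java.io.File;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.StatsConfig;

public class JeTuning {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        // Grow .jdb files to ~1 GB so far fewer per-file fsyncs occur during the sync phase.
        envConfig.setConfigParam("je.log.fileMax", String.valueOf(1024L * 1024 * 1024));
        // Checkpointer suggestion from earlier in the thread.
        envConfig.setConfigParam("je.checkpointer.highPriority", "true");

        Environment env = new Environment(new File("/data/je"), envConfig);

        // ... run the bulk load here ...

        // Dump the stats (fsync counts, cache misses, etc.) to see where the time goes.
        System.out.println(env.getStats(new StatsConfig()));

        env.sync();
        env.close();
    }
}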
There are issues to be careful about when changing the .jdb file size. The file is the unit of log cleaning. Increasing the log file size can make later log cleaning expensive if that data becomes obsolete later. If the data is immutable, that is not a concern.
Enabling the disk write cache can also help during the write phase.
Again, send us any stats or thread dumps that you generate during the sync phase.
Linda

Similar Messages

  • Critical performance problem upon bulk load of groups

    All (including product development),
    I think there are missing indexes on wwsec_flat$ and wwsec_sys_priv$. Anyway, I'd like assistance with properly fixing the critical performance problems I see. Read on...
    During and after a bulk load of about 500 users and groups from an external database, it becomes evident that there's a performance problem somewhere. Many of the calls to wwsec_api.addGroupToList took several minutes to finish. Afterwards the machine went to 100% CPU just from logging in with the portal30 user (which happens to be the group owner for all the groups).
    Running SQL trace points in the direction of the following SQL statement:
    SELECT ID,PARENT_ID,NAME,TITLE_ID,TITLEIMAGE_ID,ROLLOVERIMAGE_ID,
    DESCRIPTION_ID,LAYOUT_ID,STYLE_ID,PAGE_TYPE,CREATED_BY,CREATED_ON,
    LAST_MODIFIED_BY,LAST_MODIFIED_ON,PUBLISHED_ON,HAS_BANNER,HAS_FOOTER,
    EXPOSURE,SHOW_CHILDREN,IS_PUBLIC,INHERIT_PRIV,IS_READY,EXECUTE_MODE,
    CACHE_MODE,CACHE_EXPIRES,TEMPLATE FROM
    WWPOB_PAGE$ WHERE ID = :b1
    I checked the existing indexes, and see that the following ones are missing (I'm about to test with these, but have not yet done so):
    CREATE UNIQUE INDEX "PORTAL30"."WWSEC_FLAT_IX_GROUP_ID"
    ON "PORTAL30"."WWSEC_FLAT$"("GROUP_ID")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 160K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING
    CREATE UNIQUE INDEX "PORTAL30"."WWSEC_FLAT_IX_PERSON_ID"
    ON "PORTAL30"."WWSEC_FLAT$"("PERSON_ID")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 160K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING
    CREATE UNIQUE INDEX "PORTAL30"."WWSEC_SYS_PRIV_IX_PATCH1"
    ON "PORTAL30"."WWSEC_SYS_PRIV$"("OWNER", "GRANTEE_GROUP_ID",
    "GRANTEE_TYPE", "OWNER", "NAME", "OBJECT_TYPE_NAME")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 80K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING
    Note that when I deleted the newly inserted groups, the CPU consumption immediately went down from 100% to some 2-3%.
    This behaviour has been observed on a Sun Solaris system, but I think it's the same on NT (I have observed it during the bulk load on my NT laptop, but so far have not had the time to test further).
    Also note: In the call to addGroupToList, I set owner to true for all groups.
    Also note: During loading of the groups, I logged a few errors, all of the same type ("PORTAL30.WWSEC_API", line 2075), as follows:
    Error: Problem calling addGroupToList for child group 'Marketing' (8030), list 'NO_OSL_Usenet' (8017). Reason: java.sql.SQLException: ORA-06510: PL/SQL: unhandled user-defined exception ORA-06512: at "PORTAL30.WWSEC_API", line 2075
    Please help. If you like, I can supply the tables and the Java program that I use. It's fully reproducible.
    Thanks,
    Erik Hagen (you may call me on +47 90631013)

    YES!
    I have now tested with the missing indexes added. It seems the call to addGroupToList takes just as long as before, but the result is much better: WITH THE INDEXES DEFINED, THERE IS NO LONGER A PERFORMANCE PROBLEM!! The index definitions that I used are listed below (I added these to the ones that are there in Portal 3.0.8, but I guess some of those could have been deleted).
    About the info at http://technet.oracle.com:89/ubb/Forum70/HTML/000894.html: Yes! Thanks! Very interesting, and I guess you found the cause of the error messages and maybe also of the performance problem during bulk load (I'll look into it as soon as possible and report what I find).
    Note: I have made a pretty foolproof and automated installation script (or actually, it's part of my Java program), that will let anybody interested recreate the problem. Mail your interest to [email protected].
    ============================================
    CREATE INDEX "PORTAL30"."LDAP_WWSEC_PERS_IX1"
    ON "PORTAL30"."WWSEC_PERSON$"("MANAGER")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 32K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX PORTAL30.LDAP_WWSEC_PERS_IX2
    ON PORTAL30.WWSEC_PERSON$("ORGANIZATION")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 32K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX PORTAL30.LDAP_WWSEC_PERS_PK
    ON PORTAL30.WWSEC_PERSON$("ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 32K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX PORTAL30.LDAP_WWSEC_PERS_UK
    ON PORTAL30.WWSEC_PERSON$("USER_NAME")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 32K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX PORTAL30.LDAP_WWWSEC_FLAT_UK
    ON PORTAL30.WWSEC_FLAT$("GROUP_ID", "PERSON_ID",
    "SPONSORING_MEMBER_ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 0 FREELISTS 1);
    CREATE INDEX PORTAL30.LDAP_WWWSEC_FLAT_PK
    ON PORTAL30.WWSEC_FLAT$("ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 0 FREELISTS 1);
    CREATE INDEX PORTAL30.LDAP_WWWSEC_FLAT_IX5
    ON PORTAL30.WWSEC_FLAT$("GROUP_ID", "PERSON_ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 0 FREELISTS 1);
    CREATE INDEX PORTAL30.LDAP_WWWSEC_FLAT_IX4
    ON PORTAL30.WWSEC_FLAT$("SPONSORING_MEMBER_ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 0 FREELISTS 1);
    CREATE INDEX PORTAL30.LDAP_WWWSEC_FLAT_IX3
    ON PORTAL30.WWSEC_FLAT$("GROUP_ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 0 FREELISTS 1);
    CREATE INDEX PORTAL30.LDAP_WWWSEC_FLAT_IX2
    ON PORTAL30.WWSEC_FLAT$("PERSON_ID")
    TABLESPACE PORTAL PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 256K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 0 FREELISTS 1);
    CREATE INDEX "PORTAL30"."LDAP_WWSEC_SYSP_IX1"
    ON "PORTAL30"."WWSEC_SYS_PRIV$"("GRANTEE_GROUP_ID")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 56K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX "PORTAL30"."LDAP_WWSEC_SYSP_IX2"
    ON "PORTAL30"."WWSEC_SYS_PRIV$"("GRANTEE_USER_ID")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 56K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX "PORTAL30"."LDAP_WWSEC_SYSP_IX3"
    ON "PORTAL30"."WWSEC_SYS_PRIV$"("OBJECT_TYPE_NAME", "NAME")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 56K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX "PORTAL30"."LDAP_WWSEC_SYSP_PK"
    ON "PORTAL30"."WWSEC_SYS_PRIV$"("ID")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 56K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    CREATE INDEX "PORTAL30"."LDAP_WWSEC_SYSP_UK"
    ON "PORTAL30"."WWSEC_SYS_PRIV$"("OBJECT_TYPE_NAME",
    "NAME", "OWNER", "GRANTEE_TYPE", "GRANTEE_GROUP_ID",
    "GRANTEE_USER_ID")
    TABLESPACE "PORTAL" PCTFREE 10 INITRANS 2 MAXTRANS 255
    STORAGE ( INITIAL 32K NEXT 88K MINEXTENTS 1 MAXEXTENTS 4096
    PCTINCREASE 1 FREELISTS 1)
    LOGGING;
    ==================================
    Thanks,
    Erik Hagen

  • How to improve performance for Azure Table Storage bulk loads

    Hello all,
    Would appreciate your help as we are facing a challenge.
    We are trying to bulk load Azure Table Storage. We have a file that contains nearly 2 million rows.
    We would need to reach a point where we can bulk load 100,000-150,000 entries per minute. Currently, it takes more than 10 hours to process the file.
    We have tried Parallel.ForEach but it doesn't help. Today I discovered partitioning in PLINQ. Would that be the way to go?
    Any ideas? I have spent nearly two days trying to optimize this using PLINQ, but I am still not sure what the best thing to do is.
    Kindly, note that we shouldn't be using SQL/Azure SQL for this.
    I would really appreciate your help.
    Thanks

    I'd think you're just pooling the parallel connections to Azure if you do it on one system. You'd also have a bottleneck of round-trip time from you, through the internet, to Azure and back again.
    You could speed it up by moving the data file to the cloud and processing it with a cloud Worker Role. That way you'd be in the datacenter (which is a much faster, more optimized network).
    Or, if that's not fast enough: if you can split the data so multiple Worker Roles can each process part of the file, you can use the VMs' scale to put enough machines on it that it gets done quickly (a rough sketch of that split follows below).
    Darin R.
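    For what it's worth, here is a hypothetical sketch (in Java rather than the C#/PLINQ of the original question) of the split-the-file-among-parallel-workers idea; the uploadBatch() method is a made-up stand-in for whatever table-storage client call you use, and reading the whole file into memory is a simplification.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class PartitionedLoader {
        static final int BATCH = 100;    // entries per upload call
        static final int WORKERS = 8;    // one per thread or worker role

        public static void main(String[] args) throws Exception {
            List<String> rows = Files.readAllLines(Paths.get(args[0]));
            ExecutorService pool = Executors.newFixedThreadPool(WORKERS);
            int chunk = (rows.size() + WORKERS - 1) / WORKERS;
            for (int w = 0; w < WORKERS; w++) {
                int from = Math.min(rows.size(), w * chunk);
                int to = Math.min(rows.size(), (w + 1) * chunk);
                final List<String> part = rows.subList(from, to);
                pool.submit(() -> {
                    for (int i = 0; i < part.size(); i += BATCH) {
                        uploadBatch(part.subList(i, Math.min(part.size(), i + BATCH)));
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);
        }

        // Placeholder: in Azure Table Storage a single batch must share one partition key.
        static void uploadBatch(List<String> batch) {
            // call the storage SDK of your choice here
        }
    }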

  • Unable to perform bulk load in BODS 3.2

    Hi
    We have upgraded our development server from BODS 3.0 to BODS 3.2. There is a dataflow wherein the job uses the bulk load option. The job gives warnings at that dataflow, all the data is shown as warnings in the log, and no data is loaded to the target table. We have also recently migrated from SQL Server 2005 to SQL Server 2008. Could someone let me know why the bulk load option is not working in BODS 3.2?
    Kind Regards,
    Mahesh

    Hi,
    I want to upgrade SQL Server 2005 to SQL Server 2008 with BODS 4.0.
    I want to know the recommendations for doing this.
    - How do we use SQL Server 2008 with BODS?
    - What is the performance like on SQL Server 2008?
    - What are the things to evaluate?
    - Is it necessary to migrate with backup/restore mode?
    - What are the steps of the migration?
    - Can we merge the disabled in BODS?

  • Bulk loading BLOBs using PL/SQL - is it possible?

    Hi -
    Does anyone have a good reference article or example of how I can bulk load BLOBs (videos, images, audio, office docs/pdf) into the database using PL/SQL?
    Every example I've ever seen in PL/SQL for loading BLOBs does a commit after each file loaded... which doesn't seem very scalable.
    Can we pass an array of BLOBs from the application into PL/SQL, loop through that array, and then issue a commit after the loop terminates?
    Any advice or help is appreciated. Thanks
    LJ

    It is easy enough to modify the example to commit every N files. If you are loading large amounts of media, I think you will find that the time to load the media is far greater than the time spent in SQL statements doing inserts or retrievals. Thus, I would not expect to see any significant benefit from changing the example to use PL/SQL collection types in order to do bulk row operations.
    If your goal is high performance bulk load of binary content then I would suggest that you look to use Sqlldr. A PL/SQL program loading from BFILEs is limited to loading files that are accessible from the database server file system. Sqlldr can do this but it can also load data from a remote client. Sqlldr has parameters to control batching of operations.
    See section 7.3 of the Oracle Multimedia DICOM Developer's Guide for the example Loading DICOM Content Using the SQL*Loader Utility. You will need to adapt this example to the other Multimedia objects (ORDImage, ORDAudio .. etc) but the basic concepts are the same.
    Once the binary content is loaded into the database, you will need to write a program to loop over the new content and initialize the Multimedia objects (extract attributes). The example in section 7.3 contains a sample program that does this for the ORDDicom object.
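    As a side note, if the loading client happens to be Java rather than PL/SQL, the "commit every N files" idea above looks roughly like this with plain JDBC; the connection string, table name, columns, and media directory are all made up for illustration, and this is only a sketch, not the Multimedia-aware approach described in the reply.

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;
    import java.util.stream.Collectors;

    public class BlobBatchLoader {
        public static void main(String[] args) throws Exception {
            final int COMMIT_EVERY = 50;   // commit once per 50 files, not per file
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/orcl", "scott", "tiger")) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO media_files (file_name, content) VALUES (?, ?)")) {
                    List<Path> files = Files.list(Path.of("/data/media"))
                                            .collect(Collectors.toList());
                    int n = 0;
                    for (Path f : files) {
                        try (InputStream in = Files.newInputStream(f)) {
                            ps.setString(1, f.getFileName().toString());
                            ps.setBinaryStream(2, in);   // streams the file into the BLOB column
                            ps.executeUpdate();
                        }
                        if (++n % COMMIT_EVERY == 0) {
                            conn.commit();
                        }
                    }
                    conn.commit();                       // commit the final partial batch
                }
            }
        }
    }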

  • PL/SQL Bulk Loading

    Hello,
    I have one question regarding bulk loading. I have done a lot of bulk loading.
    But my requirement is to call a function which does some DML operations and returns a ref key so that I can insert into the fact table.
    I can't use a DML function in a select statement (that gives an error). The other way is to use an autonomous transaction, which I tried and got working, but performance is very slow.
    How do I call this function inside the bulk loading process?
    Help!!
    xx_f is a function which uses an autonomous transaction.
    See my sample code:
    declare
      cursor c1 is select a, b, c from xx;
      type l_a is table of xx.a%type;
      type l_b is table of xx.b%type;
      type l_c is table of xx.c%type;
      v_a l_a;
      v_b l_b;
      v_c l_c;
    begin
      open c1;
      loop
        fetch c1 bulk collect into v_a, v_b, v_c limit 1000;
        exit when v_a.count = 0;  -- checking %notfound here would skip the last partial batch
        forall i in 1..v_a.count
          insert into xxyy (a, b, c)
          values (xx_f(v_a(i)), xx_f(v_b(i)), xx_f(v_c(i)));
        commit;
      end loop;
      close c1;
    end;
    I just want to call the xx_f function without an autonomous transaction, but with bulk loading. Please let me know if you need more details.
    Thanks
    yreddyr

    Can you show the code for xx_f? Does it do DML, or just transformations on the columns?
    Depending on what it does, an alternative could be something like:
    DECLARE
       CURSOR c1 IS
          SELECT xx_f(a), xx_f(b), xx_f(c) FROM xx;
       TYPE l_a IS TABLE OF whatever xx_f returns;
       TYPE l_b IS TABLE OF whatever xx_f returns;
       TYPE l_c IS TABLE OF whatever xx_f returns;
       v_a l_a;
       v_b l_b;
       v_c l_c;
    BEGIN
       OPEN c1;
       LOOP
          FETCH c1 BULK COLLECT INTO v_a, v_b, v_c LIMIT 1000;
          BEGIN
             FORALL i IN 1..v_a.COUNT
                INSERT INTO xxyy (a, b, c)
                VALUES (v_a(i), v_b(i), v_c(i));
          END;
          EXIT WHEN c1%NOTFOUND;
       END LOOP;
       CLOSE c1;
    END;
    John

  • Bulk load in OIM 11g enabled with LDAP sync

    Has anyone performed a bulk load of more than 100,000 users using the bulk load utility in OIM 11g?
    The challenge here is that we have an OIM 11.1.1.5.0 environment enabled with LDAP sync.
    We are trying to figure out some performance factors and the best way to achieve our requirement.
    1. Have you performed any timings around use of the Bulk Load tool? Any idea how long it will take to LDAP-sync more than 100,000 users into OID? What problems could we encounter during this flow?
    2. Is it possible to migrate users into another environment and then swap that database in as the OIM database? Also, is there any effective way to load into OID directly?
    3. We also have a custom Scheduled Task to modify a couple of user attributes (using the update API) from a flat file. Have you tried such a scenario after the bulk load, and did you face any problems while doing so?
    Thanks
    DK

    To update a UDF you must assign a Copy Value adapter in Lookup.USR_PROCESS_TRIGGERS (Design Console / Lookup Definition),
    e.g.
    CODE --------------------------DECODE
    USR_UDF_MYATTR1----- Change MYATTR1
    USR_UDF_MYATTR2----- Change MYATTR2
    Edited by: Lighting Cui on 2011-8-3 12:25 AM

  • Retry "Bulk Load Post Process" batch

    Hi,
    First question: what is the actual use of the scheduled task "Bulk Load Post Process"? If I am not sending out email notifications, not LDAP syncing, and not generating passwords, do I still need to run this task after performing a bulk load through the utility?
    Also, I ran this task, and now there are some batches which are in the "READY FOR PROCESSING" state. How do I re-run these batches?
    Thanks,
    Vishal

    The scheduled task carries out post-processing activities on the users imported through the bulk load utility.

  • Does setting the primary key deferred help in bulk loading?

    Hi,
    Does anyone know whether setting the primary key constraint as deferred helps bulk loading in terms of performance? I do not want to disable the index, because when users query the existing records in the table, that would affect the search queries.
    Thank You...

    In the Oracle 8.0 documentation, when deferred constraints were introduced, Oracle stated that deferring testing of the PK constraint until commit time was more efficient than testing the constraint at the time of each insert.
    I have never tested this assertion.
    In order to create a deferred PK constraint the index used to support the PK must be created as non-unique.
    HTH -- Mark D Powell --

  • Bulk loading in 11.1.0.6

    Hi,
    I'm using bulk load to load about 200 million triples into one model in 11.1.0.6. The data is split into about 60 files with around 3 million triples in each file. I have a script file which has
    host sqlldr ...FILE1;
    exec sem_apis.bulk_load_from_staging_table(...);
    host sqlldr ...FILE2;
    exec sem_apis.bulk_load_from_staging_table(...);
    for every file to load.
    When I run the script from the command line, it looks like the time needed for loading grows as more files are loaded. The first file took about 8 min to load, the second file took about 25 min... It's now taking two and a half hours to load one file, after having loaded 14 files.
    Is index rebuilding causing this behavior? If that's the case, is there any way to turn off the index during bulk loading? If index rebuilding is not the cause, what other parameters can we adjust to speed up the bulk loading?
    Thanks,
    Weihua

    Bulk-append is slower than bulk-load because of incremental index maintenance. The uniqueness constraint enforcing index cannot be disabled. I'd suggest moving to 11.1.0.7 and then installing patch 7600122 to be able to make use of enhanced bulk-append that performs much better than in 11.1.0.6.
    The best way to load 200 million rows in 11.1.0.6 would be to load into an empty RDF model via a single bulk-load. You can do it as follows (assuming the filenames are f1.nt thru f60.nt):
    - [create a named pipe] mkfifo named_pipe.nt
    - cat f*.nt > named_pipe.nt
    on a different window:
    - run sqlldr with named_pipe.nt as the data file to load all 200 million rows into a staging table (you could create staging table with COMPRESS option to keep the size down)
    - next, run exec sem_apis.bulk_load_from_staging_table(...);
    (I'd also suggest use of COMPRESS for the application table.)

  • Bulk Load API Status

    Hi,
    I'm using Oracle Endeca 2.3.
    I encountered a problem in Data Integrator: some batches of records were missing in the front end, yet when I checked the status of the graph, it showed "Graph executed successfully".
    So I connected the bulk loader to a "Universal data writer" to see the data domain status of the bulk load.
    I've listed the results below; however, I'm not able to interpret the information from this status, and I've looked in the documentation but found nothing useful.
    0|10000|0|In progress
    0|11556|0|In progress
    0|20000|0|In progress
    0|30000|0|In progress
    0|39891|0|In progress
    0|39891|0|In progress
    0|39891|0|In progress
    0|39891|0|In progress
    0|39891|0|In progress
    0|39891|0|In progress
    40009|-9|0|In progress
    40009|9991|0|In progress
    40009|19991|0|In progress
    40009|20846|0|In progress
    Could anyone enlighten me about this status?
    Also, since these messages are part of the "Post load" phase, I'm wondering why it is still showing "In progress".
    Cheers,
    Khurshid

    I assume there was nothing of note in the dgraph.log?
    The other option is to see what happens when you either:
    A) filter your data down to the records that are missing prior to the load and see what happens
    Or
    B) use the regular data ingest API rather than the bulk one.
    Option B will definitely perform much worse on 2.3, so it may not be feasible.
    The other thing to check is that your record spec is truly unique. The only time I can remember seeing an issue like this was when loading a record and then loading a different record with the same spec value: the first record would get in and then be overwritten by the second, making it seem like the first record was dropped. Figured it would be worth checking.
    Patrick Rafferty
    Branchbird

  • Bulk load issue

    Hi there
    Just wanted to know whether it is a bug or a feature: if a column store table has non-capital letters in its name, bulk load does not work and performance is ruined.
    Mike

    It looks like we're having performance issues here, and bulk load is failing because of the connection method being used with Sybase RS. If we use standard ODBC then everything is as it should be, but as soon as we switch to the .NET world then nothing happens; single inserts/updates are OK.
    So, we have an application written in mixed J2EE/.NET and we use the HANA appliance as the host for tables, procedures and views.
    This issue has been sent to support; I will update as soon as I get something from them.

  • Bulk Load into SAP system from external application

    Hi,
    Is there a way to perform a bulk load of data into a SAP system from an external application?
    Thanks
    Simon

    Hello,
    My external application is a C program and I think I want to use IDocs and RFC to communicate with the SAP system.
    Simon

  • Bulk Loading vs Ingest Record Adding(Updating)

    Hi
    We are integrating Oracle Endeca Server with our application. We already have set of interfaces which extracts record definitions and records from our database.
    I created the necessary classes to create record definitions in the Oracle Endeca DataSource, but I have difficulties selecting the right approach for loading records into the DataSource. I wonder whether it is better to use Ingest or Bulk Loading.
    We have constantly changing data in our application, and from time to time we have to update existing records in DataRecords. From the point of view of our API there is no difference between adding new records and updating existing ones. As I learned from the documentation, Oracle Endeca Server updates multivalued properties by adding new data to them and ignores changes to single-valued properties. In our case we need to replace existing values. I figured out that I will need to use addAssignments and deleteRecords in the ingestRecords operation; in that case the existing record will be deleted and a new one created. However, what happens if the record does not exist? I guess deleteRecords will fail, but will the engine still execute addAssignments?
    Considering usage of Bulk Loading I also have some questions.
    What will Bulk Loading do if the record it's trying to insert already exists?
    If I have an attribute definition of type boolean, could I use
    // entryCur.getKey() is a String
    // entryCur.getValue() is a String
    assignmentBuilder.setName(entryCur.getKey()).setDataType(Data.Assignment.DataType.STRING).setStringValue(entryCur.getValue()) instead of setting it to BOOLEAN?
    I guess Builder.set<Type>Value allows setting up multivalued attributes with multiple calls?
    Thanks
    Eugene.

    Hi Eugene,
    I am an Endeca developer working on ingest. The situation you describe is perfectly suited to bulk ingest; it is exactly this use pattern that bulk ingest was designed for. If the record being put duplicates the spec of an already-existing record, the record currently in the data store is replaced by the incoming record.
    If you still wish to use the data ingest web service (DIWS), that is still an option. The pattern you describe is correct. If you attempt to delete a record that does not exist, Endeca Server will do nothing; again, this is the use case it is designed for.
    Hope this helps,
    Dave

  • Slow bulk load

    I have an IOT table and am trying to bulk load 500M rows into it. Using sqlldr it looks like it will take a week, whereas I was able to load MySQL in 3 hours. If anyone has any helpful advice to speed up the load, it would be greatly appreciated.
    My control file contains
    OPTIONS (DIRECT=TRUE, ERRORS=5000000)
    UNRECOVERABLE LOAD DATA
    APPEND
    into table applications
    fields terminated by "|"
    TRAILING NULLCOLS
    ( list of my 37 columns... )
    And I run the command
    sqlldr mydatabase control=loader.ctl skip_index_maintenance=true log=log.out data=mydata.dat
    I tried using SORTED INDEXES in the control file; however, sqlldr would complain that records were not in sorted order, though they are. I read that the error is caused by crossing extents. Should I just set the table to have a 200 GB extent?

    The following may be of assistance:
    HW enqueue contention with LOB
    As a workaround, is it possible to reduce the number of concurrent updates to the LOBs (say, reduce 8 to 4)? Also, with a PCTVERSION of 0 there is a possibility you may get a "snapshot too old" error unless you've set RETENTION to some value (it really depends on the workload).
    You've probably seen this, although it's not much help for your problem:
    LOB Performance Guidelines
