Query performance problem - events 2505-read cache and 2510-write cache

Hi,
I am experiencing severe performance problems with a query, specifically with events 2505 (Read Cache) and 2510 (Write Cache), which went up to 11000 seconds on some executions. Data Manager (400 s), OLAP data selection (90 s) and OLAP user exit (250 s) are the other events with noticeable times. All other events are very quick.
The query settings (RSRT) are:
persistent cache across each application server -> cluster table,
update cache in delta process is checked -> grouped on InfoProvider type,
use cache despite virtual characteristics/key figures is checked (one InfoCube has 1 virtual key figure which should have a static result for a given day)
=> Do you know how I can get more details than what is in 0TCT_C02 to break down the time of the read and write cache events, or do you have any recommendations?
I have checked, and no data loads were in progress on the InfoProviders and no master data loads (change run) were running. Overall system performance was acceptable for other queries.
Thanks

Similar Messages

  • LOCAL OLAP CACHE AND GLOBAL OLAP CACHE

    What is local OLAP cache and what is global OLAP cache?
    What is the difference between them? Can you explain a scenario, please?
    Will reward with points.
    thanks in advance

    Hello GURU
    Local cache is specific to a user; before BW 3.0 only the local cache was available. If a user runs a query, the data comes into the cache from the InfoProvider, and the next time the same query will not go to the database but will instead fetch the data from cache memory. This cache is used only for that particular user; if another user tries the same query, it will not pick up data from the cache.
    From BW 3.0 onward we have the global cache, which means several users can access the same cache for the same query or for related data held in the cache.
    Thanks
    Tripple k

  • Difference between Presentation Server Cache and BI Server Cache

    Hello Experts,
    What is the difference between the Presentation Server cache and the BI Server cache?
    Thanks,
    S Gouda

    Hello,
    Okay, so what exactly do you want to do with caching at the BI Server and the Presentation Server?
    An nQSXXXX.tmp file is a temporary cache file maintained by the BI Server for a user's analysis request; it is essentially data shared between the OBI Server and the OBI Presentation Server. This is referred to as the 'Cursor Cache', and it can be managed by going through Administration > Manage Sessions > Clear Cursor Cache. These .tmp files are not related to the BI Server cache.
    By default, the BI Server cache is stored in [middleware_home]/instances/instance1/bifoundation/OracleBIServerComponent/coreapplication_obis1/cache and stored as NQSxxxxx.tbl files.
    Caching occurs by default at the subrequest level, which results in multiple cache entries for some SQL statements. Caching subrequests improves performance and the cache hit ratio, especially for queries that combine real-time and historical data.
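    As a quick illustration (my own sketch, not part of this reply): the BI Server cache can also be purged programmatically by issuing the standard cache-management ODBC procedures through a client such as nqcmd. Treat the procedure names below as assumptions to verify against the documentation for your OBIEE version:
    -- Purge the entire BI Server query cache
    Call SAPurgeAllCache();
    -- Purge only the cache entries that reference a specific physical table
    Call SAPurgeCacheByTable('DatabaseName', 'CatalogName', 'SchemaName', 'TableName');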
    Below are some useful links for cache management in OBIEE 11g.
    http://oraclebisolutions.blogspot.com/2013/02/obiee-11g-obi-server-and-presentation.html
    http://drazda.blogspot.com/2012/10/obiee-11g-cache-management.html
    http://allaboutobiee.blogspot.in/2012/03/cache-management-purging-cache.html
    Please mark if this helps. Otherwise, post the exact questions you have about this.
    Thanks,
    SVS

  • Performance Problems on Faces Navigation Diagram and Hyperthreading query

    Am I the only one having performance problems when dealing with Faces-Config diagrams of about 35 JSPs displayed on the sheet, using JDeveloper 10.1.3? It's taking my workstation about a full minute and a half to update the name of an arrow. The most stressed component during this task seems to be the CPU.
    And just another question: has anybody investigated how the performance of JDeveloper is affected by enabling or disabling hyperthreading? In my case CPU usage only manages to reach 50%. I'm tempted to switch HT off to let JDeveloper use all the CPU power, if that would help.

    Hello Diego,
    you mentioned that you compared a BEx Query with the Web Intelligence report. Could you provide more details here?
    - What are the elements in the rows, columns and free characteristics of the BEx Query?
    - Was the query executed as designed in the BEx Query Designer with BEx Web Reporting?
    - What are the elements in the Web Intelligence query panel?
    thanks
    Ingo

  • Fuzzy searching and concatenated datastore query performance problems.

    I am using the concatenated datastore and indexing two columns.
    The query I am executing includes an exact match on one column and a fuzzy match on the second column.
    When I execute the query, performance should improve as the exact-match column is set to return fewer values.
    This is the case when we execute an exact match search on both columns.
    However, when one column is an exact match and the second column is a fuzzy match this is not true.
    Is this normal processing, and if so why? Is this a bug?
    If you need more information please let me know.
    We are under a deadline and this is our final road block.
    TIA
    Colleen GEislinger

    I see that you have posted the message in the Oracle text forum, good! You should get a better, more timely answer there.
    Larry

  • Performance Problems with "For all Entries" and a big internal table

    We have big Performance Problems with following Statement:
    SELECT * FROM zeedmt_zmon INTO TABLE gt_zmon_help
      FOR ALL ENTRIES IN gt_zmon_help
        WHERE
        status = 'IAI200' AND
        logdat IN gs_dat AND
        ztrack = gt_zmon_help-ztrack.
    In the internal table gt_zmon_help are over 1000000 entries.
    Anyone an Idea how to improve the Performance?
    Thank you!

    >
    Matthias Weisensel wrote:
    > We have big Performance Problems with following Statement:
    >
    >  
    SELECT * FROM zeedmt_zmon INTO TABLE gt_zmon_help
    >   FOR ALL ENTRIES IN gt_zmon_help
    >     WHERE
    >     status = 'IAI200' AND
    >     logdat IN gs_dat AND
    >     ztrack = gt_zmon_help-ztrack.
    >
    > In the internal table gt_zmon_help are over 1000000 entries.
    > Anyone an Idea how to improve the Performance?
    >
    > Thank you!
    You can't expect miracles.  With over a million entries in your itab any select is going to take a bit of time. Do you really need all these records in the itab?  How many records is the select bringing back?  I'm assuming that you have got and are using indexes on your ZEEDMT_ZMON table. 
    In this situation, I'd first of all try to think of another way of running the query and restricting the amount of data, but if this were not possible I'd just run it in the background and accept that it is going to take a long time.
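    To illustrate the point above (my own sketch, not from the original reply): the database interface typically splits a FOR ALL ENTRIES selection into many smaller statements, each covering one block of internal-table rows as an IN list or OR chain (the block size is controlled by profile parameters such as rsdb/max_blocking_factor). With over a million rows in gt_zmon_help, something like the statement below is executed many thousands of times, which is why restricting the internal table first helps so much. The ztrack values and the logdat range are placeholders invented for the example:
    -- One of the many statements generated for a FOR ALL ENTRIES block
    -- (a blocking factor of 5 is assumed; the literal values are placeholders)
    SELECT *
      FROM zeedmt_zmon
     WHERE status = 'IAI200'
       AND logdat BETWEEN '20070101' AND '20070131'
       AND ztrack IN ('TRK0001', 'TRK0002', 'TRK0003', 'TRK0004', 'TRK0005');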

  • Query performance problem

    I am having performance problems executing a query.
    System:
    Windows 2003 EE
    Oracle 9i version 9.2.0.6
    DETAIL table with 120Million rows partitioned in 19 partitions by SD_DATEKEY field
    We are trying to retrieve the info from an account (SD_KEY) ordered by date (SD_DATEKEY). This account has about 7000 rows and it takes about 1 minute to return the first 100 rows ordered by SD_DATEKEY. This time should be around 5 seconds to be acceptable.
    There is a partitioned index by SD_KEY and SD_DATEKEY.
    This is the query:
    SELECT * FROM DETAIL WHERE SD_KEY = 'xxxxxxxx' AND ROWNUM < 101 ORDER BY SD_DATEKEY
    The problem is that all 7000 rows are read prior to being ordered. I think it should not be necessary for the optimizer to access all the partitions and read all the rows, because only the first 100 are needed and the partitions are bounded by SD_DATEKEY.
    Any ideas to accelerate this query? I know that including a WHERE clause on SD_DATEKEY would improve the performance, but I need the first 100 rows and I don't know the date with which to limit the query.
    Does anybody know whether this is a normal response time for this query, or whether it can be improved?
    Thanks to all in advance for your help.

    Thanks to all for the replies.
    - We have computed statistics, and there is no change in the response time.
    - We are discussing restricting the query to some partitions, but for the moment this is not the best solution because we don't know where the latest 100 rows are.
    - The query from Maurice had the same response time (more or less)
    select * from
    (SELECT * FROM DETAIL WHERE SD_KEY = 'xxxxxxxx' ORDER BY SD_DATEKEY)
    where ROWNUM < 101
    - We have a local index on SD_DATEKEY. Do we need another one on SD_KEY? Should it be created as BITMAP?
    I can't immediately test your suggestions because this is a problem at one of our customers. In our test system (which has only 10 million records) the indexes accelerate the query, but this is not the case in the customer system. I think the problem is the total number of records in the table.
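    One approach worth testing (a sketch based on this discussion, not something verified against your data): keep the ORDER BY inside an inline view, as in Maurice's query, because ROWNUM is assigned before ORDER BY is applied, and give the optimizer an index whose leading columns match both the filter and the sort so it can stop after 100 rows. The index can be range-scanned in either direction, so the descending variant below (for the latest rows) works as well as the original ascending form. A bitmap index is generally not suited to a high-cardinality column like SD_KEY. The index name is illustrative, and whether a global index is acceptable depends on your partition-maintenance requirements:
    -- Composite index covering the filter column and the sort column
    CREATE INDEX detail_sdkey_sddatekey_ix ON detail (sd_key, sd_datekey);
    -- Top-N form: ROWNUM is applied to the already-ordered inline view
    SELECT *
      FROM (SELECT *
              FROM detail
             WHERE sd_key = 'xxxxxxxx'
             ORDER BY sd_datekey DESC)
     WHERE ROWNUM < 101;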

  • Query Performance Problem!! Oracle 25 minutes || SQLServer 3 minutes

    Hi all,
    I'm having a performance problem with the query below. It runs in 3 minutes on SQL Server and 25 minutes on Oracle.
    SELECT
    CASE WHEN (GROUPING(a.estado) = 1) THEN 'TOTAL'
    ELSE ISNULL(a.estado, 'UNKNOWN')
    END AS estado,
    CASE WHEN (GROUPING(m.id_plano) = 1) THEN 'GERAL'
    ELSE ISNULL(m.id_plano, 'UNKNOWN')
    END AS id_plano,
    sum(m.valor_2s_parcelas) valor_2s_parcelas,
    convert(decimal(15,2),convert(int,sum(convert(int,(m.valor_2s_parcelas+.0000000001)*100)*
    isnull(e.percentual,0.0))/100.0+.0000000001))/100 BB_Educar
    FROM
    movimento_dco m ,
    evento_plano e,
    agencia_tb a
    WHERE
    m.id_plano = e.id_plano
    AND m.agencia *= a.prefixo
    --AND  m.id_plano LIKE     'pm60%'
    AND m.data_pagamento >= '20070501'
    AND m.data_pagamento <= '20070531'
    AND m.codigo_retorno = '00'
    AND m.id_parcela > 1
    AND m.valor_2s_parcelas > 0.
    AND e.id_evento = 'BB-Educar'
    AND a.banco_id = '001'
    AND a.ordem = '00'
    group by m.id_plano, a.estado WITH ROLLUP
    order by a.estado, m.id_plano DESC
    Can anyone help me with this query?

    What version of Oracle, what version of SQL Server? Are the tables the same exact size? Are they both indexed the same? Are you running on the same or similar hardware? Are the Oracle parameters similar, such as the SGA size and PGA_AGGREGATE_TARGET? Did you gather statistics in Oracle?
    Did you compare execution plans in SQL Server vs. Oracle to see whether SQL Server's execution plan is superior to the one Oracle is trying to use (most likely stale statistics)?
    There are many variables and we need more information than just the query : ).
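    As a side note (my own sketch, not part of this reply): the query as posted uses T-SQL constructs (the *= outer-join operator, ISNULL, CONVERT, WITH ROLLUP) that Oracle either rejects or handles differently, so the two engines may not even be running comparable statements. A rough ANSI-join skeleton in Oracle syntax is shown below; the rounding arithmetic is deliberately simplified to ROUND and would need to be checked against the original logic, and data_pagamento is assumed to be comparable as a string exactly as in the original:
    SELECT CASE WHEN GROUPING(a.estado) = 1 THEN 'TOTAL'
                ELSE NVL(a.estado, 'UNKNOWN') END AS estado,
           CASE WHEN GROUPING(m.id_plano) = 1 THEN 'GERAL'
                ELSE NVL(m.id_plano, 'UNKNOWN') END AS id_plano,
           SUM(m.valor_2s_parcelas) AS valor_2s_parcelas,
           ROUND(SUM(m.valor_2s_parcelas * NVL(e.percentual, 0)), 2) AS bb_educar
      FROM movimento_dco m
      JOIN evento_plano e
        ON e.id_plano = m.id_plano
       AND e.id_evento = 'BB-Educar'
      LEFT JOIN agencia_tb a
        ON a.prefixo = m.agencia
       AND a.banco_id = '001'
       AND a.ordem = '00'
     WHERE m.data_pagamento >= '20070501'
       AND m.data_pagamento <= '20070531'
       AND m.codigo_retorno = '00'
       AND m.id_parcela > 1
       AND m.valor_2s_parcelas > 0
     GROUP BY ROLLUP (m.id_plano, a.estado)
     ORDER BY a.estado, m.id_plano DESC;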

  • Performance problem with Integration with COGNOS and Bex

    Hi Gems
    I have a performance problem with some of my queries when integrating with COGNOS.
    My query is simple; it gets the data for the date interval:
    From date: 20070101
    To date: 20070829
    When executing the query in BEx it takes 2 minutes, but when it is executed in COGNOS it takes almost 10 minutes or more.
    Is there anywhere we can debug the report to see how the data is being sent to COGNOS, for example by debugging the OLE DB layer?
    And how can we increase the performance of the query in COGNOS?
    Thanks in Advance
    Regards
    AK

    Hi,
    Please check the following CA Unicenter config files on the SunMC server:
    - Is the Event Adapter (ea-start) running? Without this daemon no event forwarding is done to CA Unicenter, nor does discovery from CA Unicenter work.
    How to debug:
    - run ea-start in debug mode:
    # /opt/SUNWsymon/SunMC-TNG/sbin/ea-start -d9
    - check if the Event Adaptor has been set up:
    # /var/opt/SUNWsymon/SunMC-TNG/cfg_sunmctotng
    - check the CA log file
    # /var/opt/SUNWsymon/SunMC-TNG/SunMCToTngAdaptorMain.log
    Once that is all fine, check this site; it explains how to discover a SunMC agent from CA Unicenter.
    http://docs.sun.com/app/docs/doc/817-1101/6mgrtmkao?a=view#tngtrouble-6
    Kind Regards

  • Performance problem between Oracle.DataAccess v1 and v2

    Hi, I have a serious performance problem with OracleDataReader when I use the GetValues method.
    My server is Oracle 9.2.0.7, and I use ODAC v10.2.0.221.
    I created a dummy table for the benchmark:
    create table test (a varchar2(50), b number)
    begin
    for i in 1..62359 loop
    insert into test values ('Values ' || i, i);
    end loop;
    commit;
    end;
    I use the same code to benchmark Framework v1 and Framework v2.
    Code :
    try {
        OracleConnection c = new OracleConnection("user id=saturne_dbo;password=***;data source=satedfx;");
        c.Open();
        go(c);
        c.Close();
    }
    catch (Exception ex) {
        MessageBox.Show(ex.Message);
    }
    private void go(IDbConnection c) {
        IDbCommand cmd = c.CreateCommand();
        cmd.CommandText = "select * from test";
        cmd.CommandType = CommandType.Text;
        DateTime dt = DateTime.Now;
        IDataReader reader = cmd.ExecuteReader();
        int count = 0;
        // Read every row and materialize all column values via GetValues()
        while (reader.Read()) {
            object[] fields = new object[reader.FieldCount];
            reader.GetValues(fields);
            count++;
        }
        reader.Close();
        TimeSpan eps = DateTime.Now - dt;
        MessageBox.Show("Time " + count + " : " + eps.TotalSeconds);
    }
    The results are:
    Framework v1 with OracleDataAccess 1.10.2.2.20: "Time 62359 : 0.5"
    Framework v2 with OracleDataAccess 2.10.2.2.20: "Time 62359 : 3.57" (a factor of 6!)
    I notice the same problem with the OleDb provider and the Microsoft Oracle Client provider.
    It's a serious problem for my production server; the calculation time explodes...
    What is the explanation?
    Do you know of a solution?

    Can you please try out the following:
    1. Create a .NET 1.x DLL with your benchmark code. This will obviously use ODP.NET for .NET 1.x.
    2. Call this assembly routine from a .NET 1.x executable and note the results.
    3. Now call this assembly routine from a .NET 2.0 executable and note the results.
    The idea is to always use "ODP.NET for .NET 1.x" even in .NET 2.0 runtime. This will tell us whether the performance degradation is a runtime issue.

  • Spaces in URLs problem w/ new Reader X and XI updates

    Can anyone else confirm that with the new Reader X and XI updates, spaces in URLs get encoded as %2520 instead of %20 causing errors when the links are followed?  This  was/is not a problem with the earlier version of X nor with other pdf viewers.  I believe that this is a bug.

    I am having the same problem. Details ....
    I have created two PDFs in the same folder (I used Power Point to create the contents, and saved as PDF). I want to be able to click a link in the first PDF and cause the second PDF to be opened.
    Everything works fine if there are no blank spaces in the filename of the second PDF (i.e., the target of the link in the first PDF), and no blank spaces in any part of the full path to the second PDF.
    The problem occurs if there are blank spaces in the second PDF's filenames or full path.
    If there are blank spaces (in either filename or path), then when I open the first PDF and hover over the link to the second (target) .... I can see that FOR THE PATH, each blank space has been converted to a %20. FOR THE FILENAME of the target PDF, the blank space shows as a blank space (i.e., not shown as a %20).
    When I click the link in the first PDF, my default browser (Firefox 19.0.2) opens and displays a dialog box tells me that it cannot find the file in question (i.e., the second PDF).
    The file name in the dialog box has each blank space (in both filename and path) converted to a %20 .... HOWEVER, the contents of the browser's address bar has each blank space converted to a %2520.
    It is that address of the second PDF shown in the browser's address bar (with each blank replaced by a %2520) that cannot be found. If I manually alter the contents of the address bar, and remove each 25 (so that each blank is replaced only by a %20) and then hit the Enter key, the second PDF is successfully displayed in the browser.
    Firefox now has its own PDF viewer. However, I disabled the internal Firefox PDF viewer and set the default PDF viewer to the Adobe viewer. The problem did not go away, so I do not think the fault is in the Firefox PDF viewer.
    The previous post stated that Adobe has acknowledged this as a bug. Any word on when/if the bug will be fixed?
    The only work-around I have found is to use filenames and paths that have no blanks. But if one is creating PDFs for customers, then while one might be able to use filenames with no blanks, one cannot ensure that customers will not place the PDFs into folders that have blanks in their paths.
    Any other suggestions while we wait for a fix?
    EDIT: I found a similar thread here: http://forums.adobe.com/thread/1053575?tstart=0

  • Hibernate's dual-layer cache and TopLink's caching strategy

    Dear members,
    I understand that caching between hibernate and toplink is implemented (or utilized) differently. Hibernate seems to have 'dual-layer caching' (which may imply they have two layers of cache) whereas TopLink has session cache and shared cache. The way I see it, they seem to be aiming for the same thing. Are there any differences between (obviously there are, only that I do not know them) those two caching architectures, and how different are they?
    Howard

    Yes there are differences :) For details check out
    TopLink vs Hibernate... revisited... again :)
    and
    Indirection - how are references resolved after session has been closed?

  • Acrobat Reader 9 and Acrobat Writer 7

    Good morning
    I have a nice issue with these two Adobe products.
    We have two products installed on our PCs: Adobe Reader 9.x and Adobe Writer 7.
    We have developed a SAP procedure that loads a PDF form in which users can modify some data.
    The procedure needs Adobe 8.1.x (or higher), but the presence of Reader 9 and Writer 7 generates an error when loading the form.
    Is it possible to unregister the Adobe 7 module loaded in IE and re-load the Adobe Reader 9 ActiveX modules?
    Best regards
    Nicola G.

    Thanks for the answer. I've found a little workaround... The only way is to
    uninstall Adobe Reader 9.x and re-install it, so the program can
    re-register the AR components in IE. This is because AA7 locks some keys and they cannot be unregistered manually!

  • Write-Behind Caching and Limited Internal Cache Size

    Let's say I have a write-behind cache and configure its internal cache to be of a fixed limited size, e.g. 10000 units. What would happen if more than 10000 units are added to the write-behind cache within the write-delay period? Would my CacheStore's storeAll() get all of the added values or would some of the values be missed because of the internal cache size limitation?

    Hi Denis,
    > If an entry is removed while it is still in the write-behind queue, it will be removed from the queue and CacheStore.store(oKey, oValue) will be invoked immediately.
    >
    > Regards,
    > Dimitri
    Dimitri,
    Just to confirm that I understand it right: if there is a queued update to a key which is then remove()-ed from the cache, the following happens:
    First, CacheStore.store(key, queuedUpdateValue) is invoked.
    Afterwards, CacheStore.erase(key) is invoked.
    Both synchronously to the remove() call.
    I expected that only erase would be invoked.
    BR,
    Robert

  • BDB read performance problem: lock contention between GC and VM threads

    Problem: BDB read performance is really bad when the size of the BDB crosses 20GB. Once the database crosses 20GB or so, it takes more than one hour to read/delete/add 200K keys.
    Of these 200K keys, about 15-30K are new; this number should eventually come down, and after a point there should not be any new keys at all.
    Application:
    Transactional Data Store application. Single-threaded process that is trying to read one key's data, delete the data and add new data. The keys are really small (20 bytes) and the data is large (grows from 1KB to 100KB).
    On one machine, I have a total of 3 processes running, with each process accessing its own BDB on a separate RAID 1+0 drive. So, as I see it, there should really be no disk I/O wait slowing down the reads.
    After a point (past 20GB), there are about 4-5 million keys in my BDB and the data associated with each key can be anywhere between 1KB and 100KB. Eventually every key will have 100KB of data associated with it.
    Hardware:
    16 core Intel Xeon, 96GB of RAM, 8 drive, running 2.6.18-194.26.1.0.1.el5 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
    BDB config: BTREE
    bdb version: 4.8.30
    bdb cache size: 4GB
    bdb page size: experimented with 8KB, 64KB.
    3 processes, each process accesses its own BDB on a separate RAIDed(1+0) drive.
    envConfig.setAllowCreate(true);
    envConfig.setTxnNoSync(ourConfig.asynchronous);
    envConfig.setThreaded(true);
    envConfig.setInitializeLocking(true);
    envConfig.setLockDetectMode(LockDetectMode.DEFAULT);
    When writing to BDB (asynchronous transactions):
    TransactionConfig tc = new TransactionConfig();
    tc.setNoSync(true);
    When reading from BDB (Allow reading from Uncommitted pages):
    CursorConfig cc = new CursorConfig();
    cc.setReadUncommitted(true);
    BDB stats: BDB size 49GB
    $ db_stat -m
    3GB 928MB Total cache size
    1 Number of caches
    1 Maximum number of caches
    3GB 928MB Pool individual cache size
    0 Maximum memory-mapped file size
    0 Maximum open file descriptors
    0 Maximum sequential buffer writes
    0 Sleep after writing maximum sequential buffers
    0 Requested pages mapped into the process' address space
    2127M Requested pages found in the cache (97%)
    57M Requested pages not found in the cache (57565917)
    6371509 Pages created in the cache
    57M Pages read into the cache (57565917)
    75M Pages written from the cache to the backing file (75763673)
    60M Clean pages forced from the cache (60775446)
    2661382 Dirty pages forced from the cache
    0 Dirty pages written by trickle-sync thread
    500593 Current total page count
    500593 Current clean page count
    0 Current dirty page count
    524287 Number of hash buckets used for page location
    4096 Assumed page size used
    2248M Total number of times hash chains searched for a page (2248788999)
    9 The longest hash chain searched for a page
    2669M Total number of hash chain entries checked for page (2669310818)
    0 The number of hash bucket locks that required waiting (0%)
    0 The maximum number of times any hash bucket lock was waited for (0%)
    0 The number of region locks that required waiting (0%)
    0 The number of buffers frozen
    0 The number of buffers thawed
    0 The number of frozen buffers freed
    63M The number of page allocations (63937431)
    181M The number of hash buckets examined during allocations (181211477)
    16 The maximum number of hash buckets examined for an allocation
    63M The number of pages examined during allocations (63436828)
    1 The max number of pages examined for an allocation
    0 Threads waited on page I/O
    0 The number of times a sync is interrupted
    Pool File: lastPoints
    8192 Page size
    0 Requested pages mapped into the process' address space
    2127M Requested pages found in the cache (97%)
    57M Requested pages not found in the cache (57565917)
    6371509 Pages created in the cache
    57M Pages read into the cache (57565917)
    75M Pages written from the cache to the backing file (75763673)
    $ db_stat -l
    0x40988 Log magic number
    16 Log version number
    31KB 256B Log record cache size
    0 Log file mode
    10Mb Current log file size
    856M Records entered into the log (856697337)
    941GB 371MB 67KB 112B Log bytes written
    2GB 262MB 998KB 478B Log bytes written since last checkpoint
    31M Total log file I/O writes (31624157)
    31M Total log file I/O writes due to overflow (31527047)
    97136 Total log file flushes
    686 Total log file I/O reads
    96414 Current log file number
    4482953 Current log file offset
    96414 On-disk log file number
    4482862 On-disk log file offset
    1 Maximum commits in a log flush
    1 Minimum commits in a log flush
    160KB Log region size
    195 The number of region locks that required waiting (0%)
    $ db_stat -c
    7 Last allocated locker ID
    0x7fffffff Current maximum unused locker ID
    9 Number of lock modes
    2000 Maximum number of locks possible
    2000 Maximum number of lockers possible
    2000 Maximum number of lock objects possible
    160 Number of lock object partitions
    0 Number of current locks
    1218 Maximum number of locks at any one time
    5 Maximum number of locks in any one bucket
    0 Maximum number of locks stolen by for an empty partition
    0 Maximum number of locks stolen for any one partition
    0 Number of current lockers
    8 Maximum number of lockers at any one time
    0 Number of current lock objects
    1218 Maximum number of lock objects at any one time
    5 Maximum number of lock objects in any one bucket
    0 Maximum number of objects stolen by for an empty partition
    0 Maximum number of objects stolen for any one partition
    400M Total number of locks requested (400062331)
    400M Total number of locks released (400062331)
    0 Total number of locks upgraded
    1 Total number of locks downgraded
    0 Lock requests not available due to conflicts, for which we waited
    0 Lock requests not available due to conflicts, for which we did not wait
    0 Number of deadlocks
    0 Lock timeout value
    0 Number of locks that have timed out
    0 Transaction timeout value
    0 Number of transactions that have timed out
    1MB 544KB The size of the lock region
    0 The number of partition locks that required waiting (0%)
    0 The maximum number of times any partition lock was waited for (0%)
    0 The number of object queue operations that required waiting (0%)
    0 The number of locker allocations that required waiting (0%)
    0 The number of region locks that required waiting (0%)
    5 Maximum hash bucket length
    $ db_stat -CA
    Default locking region information:
    7 Last allocated locker ID
    0x7fffffff Current maximum unused locker ID
    9 Number of lock modes
    2000 Maximum number of locks possible
    2000 Maximum number of lockers possible
    2000 Maximum number of lock objects possible
    160 Number of lock object partitions
    0 Number of current locks
    1218 Maximum number of locks at any one time
    5 Maximum number of locks in any one bucket
    0 Maximum number of locks stolen by for an empty partition
    0 Maximum number of locks stolen for any one partition
    0 Number of current lockers
    8 Maximum number of lockers at any one time
    0 Number of current lock objects
    1218 Maximum number of lock objects at any one time
    5 Maximum number of lock objects in any one bucket
    0 Maximum number of objects stolen by for an empty partition
    0 Maximum number of objects stolen for any one partition
    400M Total number of locks requested (400062331)
    400M Total number of locks released (400062331)
    0 Total number of locks upgraded
    1 Total number of locks downgraded
    0 Lock requests not available due to conflicts, for which we waited
    0 Lock requests not available due to conflicts, for which we did not wait
    0 Number of deadlocks
    0 Lock timeout value
    0 Number of locks that have timed out
    0 Transaction timeout value
    0 Number of transactions that have timed out
    1MB 544KB The size of the lock region
    0 The number of partition locks that required waiting (0%)
    0 The maximum number of times any partition lock was waited for (0%)
    0 The number of object queue operations that required waiting (0%)
    0 The number of locker allocations that required waiting (0%)
    0 The number of region locks that required waiting (0%)
    5 Maximum hash bucket length
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Lock REGINFO information:
    Lock Region type
    5 Region ID
    __db.005 Region name
    0x2accda678000 Region address
    0x2accda678138 Region primary address
    0 Region maximum allocation
    0 Region allocated
    Region allocations: 6006 allocations, 0 failures, 0 frees, 1 longest
    Allocations by power-of-two sizes:
    1KB 6002
    2KB 0
    4KB 0
    8KB 0
    16KB 1
    32KB 0
    64KB 2
    128KB 0
    256KB 1
    512KB 0
    1024KB 0
    REGION_JOIN_OK Region flags
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Lock region parameters:
    524317 Lock region region mutex [0/9 0% 5091/47054587432128]
    2053 locker table size
    2053 object table size
    944 obj_off
    226120 locker_off
    0 need_dd
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Lock conflict matrix:
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Locks grouped by lockers:
    Locker Mode Count Status ----------------- Object ---------------
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Locks grouped by object:
    Locker Mode Count Status ----------------- Object ---------------
    Diagnosis:
    I'm seeing way too much lock contention on the Java Garbage Collector threads and also on the VM thread when I strace my Java process, and I don't understand the behavior.
    We are spending more than 95% of the time trying to acquire locks and I don't know what these locks are. Any info here would help.
    Earlier I thought the overflow pages were the problem, as the 100KB data size was exceeding all overflow page limits, so I implemented a duplicate-keys scheme, chunking my data to fit within the overflow page limits.
    Now I don't see any overflow pages in my system, but I still see bad BDB read performance.
    $ strace -c -f -p 5642 --->(607 times the lock timed out, errors)
    Process 5642 attached with 45 threads - interrupt to quit
    % time     seconds  usecs/call     calls    errors syscall
    98.19    7.670403        2257      3398       607 futex
     0.84    0.065886           8      8423           pread
     0.69    0.053980        4498        12           fdatasync
     0.22    0.017094           5      3778           pwrite
     0.05    0.004107           5       808           sched_yield
     0.00    0.000120          10        12           read
     0.00    0.000110           9        12           open
     0.00    0.000089           7        12           close
     0.00    0.000025           0      1431           clock_gettime
     0.00    0.000000           0        46           write
     0.00    0.000000           0         1         1 stat
     0.00    0.000000           0        12           lseek
     0.00    0.000000           0        26           mmap
     0.00    0.000000           0        88           mprotect
     0.00    0.000000           0        24           fcntl
    100.00    7.811814                 18083       608 total
    The above stats show that there is too much time spent locking (futex calls) and I don't understand that because
    the application is really single-threaded. I have turned on asynchronous transactions so the writes might be
    flushed asynchronously in the background but spending that much time locking and timing out seems wrong.
    So, there is possibly something I'm not setting or something weird with the way JVM is behaving on my box.
    I grep-ed for futex calls in one of my strace log snippets and I see that there is a VM thread that grabbed the mutex the maximum number of times (223), followed by the Garbage Collector threads. The following are the lock counts and thread PIDs
    within the process:
    These are the 10 GC threads (each thread has grabbed the lock about 85 times on average):
      86 [8538]
      85 [8539]
      91 [8540]
      91 [8541]
      92 [8542]
      87 [8543]
      90 [8544]
      96 [8545]
      87 [8546]
      97 [8547]
      96 [8548]
      91 [8549]
      91 [8550]
      80 [8552]
    VM Periodic Task Thread" prio=10 tid=0x00002aaaf4065000 nid=0x2180 waiting on condition (Main problem??)
     223 [8576] ==> grabbing a lock 223 times -- not sure why this is happening…
    "pool-2-thread-1" prio=10 tid=0x00002aaaf44b7000 nid=0x21c8 runnable [0x0000000042aa8000] -- main worker thread
       34 [8648] (main thread grabs futex only 34 times when compared to all the other threads)
    The load average seems OK, though my system thinks it has very little memory left, and I think that
    is because it is using up a lot of memory for the file system cache?
    top - 23:52:00 up 6 days, 8:41, 1 user, load average: 3.28, 3.40, 3.44
    Tasks: 229 total, 1 running, 228 sleeping, 0 stopped, 0 zombie
    Cpu(s): 3.2%us, 0.9%sy, 0.0%ni, 87.5%id, 8.3%wa, 0.0%hi, 0.1%si, 0.0%st
    Mem: 98999820k total, 98745988k used, 253832k free, 530372k buffers
    Swap: 18481144k total, 1304k used, 18479840k free, 89854800k cached
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    8424 rchitta 16 0 7053m 6.2g 4.4g S 18.3 6.5 401:01.88 java
    8422 rchitta 15 0 7011m 6.1g 4.4g S 14.6 6.5 528:06.92 java
    8423 rchitta 15 0 6989m 6.1g 4.4g S 5.7 6.5 615:28.21 java
    $ java -version
    java version "1.6.0_21"
    Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
    Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
    Maybe I should make my application a Concurrent Data Store app as there is really only one thread doing the writes and reads. But I would like
    to understand why my process is spending so much time in locking.
    Can I try any other options? How do I prevent such heavy locking from happening? Has anyone seen this kind of behavior? Maybe this is
    all normal. I'm pretty new to using BDB.
    If there is a way to disable locking, that would also work, as there is only one thread that is really doing all the work.
    Should I disable the file system cache? One thing is that my application does not utilize the cache very well: once I visit a key, I don't visit that
    key again for a very long time, so it is very possible that the key has to be read again from disk.
    It is possible that I'm thinking about this completely wrong, focusing too much on locking behavior, and the problem is elsewhere.
    Any thoughts/suggestions etc are welcome. Your help on this is much appreciated.
    Thanks,
    Rama

    Hi,
    Looks like you're using BDB, not BDB JE, and this is the BDB JE forum. Could you please repost here?:
    Berkeley DB
    Thanks,
    mark
