Environment DB_PRIVATE flag

Hello
I've got a question: Is it possible to open several process-private environments that would share their regions, i.e. have the same effect as opening the environment in some directory, but without creating any files?
And one more question: Do I have to enable locking if I have concurrent access to the environment, but not to the underlying databases? That is, is access to the memory pool and other environment regions already synchronized without specifying DB_INIT_LOCK?
Thanks, Giorgi.
Edited by: _Hitman47 on Dec 10, 2010 1:53 AM

Yeah, I've read the documentation. My question was the following: Can I open several private environment handles per process (one per thread) that would use the same memory pool? I can make a private environment handle thread-safe by specifying the DB_THREAD flag and use that single handle in all threads, but the performance overhead of using DB_THREAD is quite big, about 10%.
My idea is to use one environment handle per thread, with only DB_THREAD and DB_PRIVATE flags specified, but those environment handles MUST use the same memory pool.
In my application, each thread opens a set of databases, does some operations on them, and closes them. My application guarantees that no database will be opened by more than one thread at the same moment in time.
The problems begin when one thread opens a database in its own private environment that has previously been opened (and closed!) by another thread. It seems that closing a database in an environment still leaves some pages unflushed to disk, even after calling the DB_ENV->memp_sync() function. So when another thread tries to load that database, it is not consistent and errors start to appear. If I use one single private environment handle for all threads, or open non-private (file-backed) environment handles in each thread, the application works as it should.
I hope I expressed myself clearly enough.
Thanks, Giorgi.

Similar Messages

  • Environment open flags and multithreading

    Do I actually need to create an environment explicitly if I want to read only from a database in a multithreaded program (no other process, no write operations)? I don't need DB_INIT_LOCK, as no thread is writing to the database. And I don't need DB_INIT_MPOOL, as no other process is using the very same database. As the database is read-only, the flags DB_INIT_LOG, DB_INIT_TXN and DB_RECOVER don't provide any benefit either. Is this all correct? And I don't need DB_THREAD either (which I found in the documentation of DbEnv)?
    Boris

    LaurenFoutz wrote:
    You are correct. Using XmlManager without an environment will automatically create an environment that can support multithreaded access (but not multiprocess). So for your multithreaded single-process read-only application an XmlManager without an explicit environment should be sufficient.

    I started to test some code. Unfortunately it always crashes if I use more than one thread. I get different error messages when it crashes; here are a few:
    BDB XML: page 0: illegal page type or format
    BDB XML: PANIC: Invalid argument
    BDB XML: assert failure: ..\..\db-4.6.21\db\db_cam.c/92: "F_ISSET(dbc, DBC_ACTIVE)"
    BDB XML: PANIC: fatal region error detected; run recovery
    BDB XML: assert failure: ..\..\db-4.6.21\mp\mp_alloc.c/564: "(bhp == first_bhp)? priority == last_priority : priority >= last_priority"
    BDB XML: test.bdbxml: more pages returned than retrieved
    BDB XML: PANIC: Permission denied
    BDB XML: PANIC: fatal region error detected; run recovery
    I still have to debug the code to find out if I am doing anything wrong. But are there any known issues, or would it make sense to try some flags and maybe create the environment explicitly? Currently I only use DB_RDONLY to open the container.
    Boris
    PS: I just saw that the API reference contains a flag DB_THREAD. If I open the container with DB_THREAD, though, I get another error message:
    BDB XML: environment not created using DB_THREAD
    I'm a bit confused now: Either DB_THREAD is not required for thread-safe access or an environment must be created explicitly in order to use DB_THREAD?
    Edited by: Boris Schaeling on Jul 31, 2009 2:11 AM

  • Why environment created failed ?

    I am running BDB on VxWorks 5.5.1, using the V4.75 C API.
    I create the environment with the following flags:
         open_flag = DB_CREATE | DB_RECOVER | DB_INIT_LOCK
         | DB_INIT_LOG | DB_INIT_TXN | DB_INIT_MPOOL | DB_THREAD;
    The open fails with error code 0xc80003. If I remove DB_RECOVER from open_flag, the environment is created successfully, so I don't know what happened.
    I traced the source code and found that if I "#undef HAVE_REPLICATION" in db_config.h, everything seems to be OK. But when I reboot the OS, I get the error "/flash/soe/log.0000000017: log file unreadable", even though I do set:
         p_env->log_set_config(p_env, DB_LOG_AUTO_REMOVE, 1);
    Before the reboot, only "log.0000000015" and "log.0000000016" were left, so "log.0000000017" had never been created. Why does the error say "log.0000000017 unreadable"?
    What is wrong? Can anyone give some advice?
    Thanks

    Thanks Debsubhra Roy:
    1. You mention: "You may have some privilege issues on the system with the OS you are using."
    Can you explain in more detail? I am on VxWorks 5.5.1 and the file system is YAFFS.
    2. I'm not using replication or in-memory logs, but I create the environment with these flags:
    DB_USE_ENVIRON | DB_RECOVER | DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_TXN | DB_INIT_MPOOL | DB_THREAD | DB_PRIVATE;
    Oh, yes, I set "#undef HAVE_REPLICATION" in "db_config.h". When I recompiled all the sources, the build failed with "__log_rep_split function not found", so in "log_get.c", around line 1326 before __log_rep_split, I deleted the "#ifdef HAVE_REPLICATION" guard, and then it compiled. Maybe something is wrong there.
    Is there anything wrong with the flags? By the way, I am multithreaded, not multiprocess.
    In "db_config.h" I also "#undef HAVE_FTRUNCATE"; I'm not using POSIX ftruncate.
    3. If I delete that code, will something go wrong when a log switch happens?
    4. Sorry for my carelessness. I tested again: it is not the environment open that fails, but creating a database in the environment. Just the following:
    ret = db_create(&p_db, p_env, 0);
    ret is nonzero: "DB_RUNRECOVERY: Fatal error, run database recovery". The full output is:
    /flash/soe/work/log.0000000002: log file unreadable: errno = 0xc80003
    PANIC: errno = 0xc80003
    PANIC: fatal region error detected; run recovery
    PANIC: fatal region error detected; run recovery
    I know errno = 0xc80003 is FILE NOT FOUND in the YAFFS file system.
    So the question should be: why does creating the database fail? Sorry.
    5. I found some similar questions about BDB on VxWorks via Google:
    Re: Berkeley DB High Availability Work Error (VxWorks)
    Re: BDB vxworks 6.6 kernel port error
    All the failures happen right after an OS reboot. I don't know whether VxWorks has something against BDB. I appreciate your help.
    Edited by: being on 2009-3-6 7:28 PM
    Edited by: being on 2009-3-6 7:50 PM
    Edited by: being on 2009-3-6 10:19 PM
    Edited by: being on 2009-3-6 11:21 PM

  • Opening a db environment and preventing corruption?

    Greetings,
    Sorry for the long post, but I’m having some problems and I’m not sure where to even begin. I also have a deadline looming and many people looking to me for fixes.
    I’m working on a project where we’re trying to use BDB for looking up data (read-only tables). Things run fine when testing one process, but in production there will need to be 20 or more processes, and we cannot keep any of them running very long due to environment file corruption. Also, the processes have some known memory leaks and crash occasionally; this problem cannot be corrected at this time and must be worked around (currently the processes are all terminated and restarted nightly).
    1. How should the environment be set up / opened per process? Currently each process has code that calculates the environment cache size (based on calculations from the docs) and opens the databases (currently 15 .db files). Is this the correct method? Should every process perform all the environment settings, or should an existing environment be checked for first somehow? Every process has code that calls:
    db_env_create()
    set_cachesize(<size calculated to be large enough for all .db files>)
    open(DB_CREATE | DB_THREAD | DB_INIT_MPOOL)
    Each database is opened with DB_RDONLY | DB_THREAD options.
    2. The environment cache size is calculated to be large enough to hold all the .db files in memory (currently about 1.08G), will that RAM be shared between processes, or does each process need the physical RAM available?
    3. I have noticed that processes compiled with debugging symbols modify the environment such that when you try to run a program compiled with optimizations (and no debugging symbols), the program will hang trying to open the environment. Is this to be expected?
    4. Once the environment files are created (currently any process that starts will create the __db.* files if they do not exist), is it possible to mark them read-only in an attempt to prevent corruption? We are only using the database files for read-only access.
    Once the environment files are corrupt, there are generally two types of problems we encounter. One is when a process is trying to start and fails because the environment or a database cannot be opened:
    In bdbdb_open(), DB->open(bdb_ru_rule.db) failed: Not enough space
    unable to allocate memory for mutex; resize mutex region
    The other problem is that a currently running process will just hang. This is a stack trace:
    (dbx) where
    current thread: t@1
    [1] ___lwp_mutex_timedlock(0xf5a36028, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfe241314
    [2] __db_pthread_mutex_lock(0x10bd08, 0x2d798, 0x2118000, 0x1c8, 0x10c190, 0x0), at 0xfdd11fac
    [3] __db_tas_mutex_lock(0xf6de0000, 0x2d798, 0xf5a36050, 0x4000, 0xafc00000, 0xb02806c0), at 0xfdd11d14
    [4] __memp_fget(0x0, 0xffbfd4d0, 0x0, 0x34036000, 0xced56bc0, 0x0), at 0xfde3f7ec
    [5] __bam_get_root(0x10ccb8, 0x0, 0x1, 0x0, 0xffbfd5b8, 0x0), at 0xfdd2c318
    [6] __bam_search(0x10ccb8, 0x0, 0x10ce24, 0x10ce24, 0x1, 0x0), at 0xfdd2c8c4
    [7] __bamc_search(0x10ccb8, 0x0, 0xffbfd8e8, 0x1a, 0xffbfd6a4, 0x13), at 0xfdd1a420
    [8] __bamc_get(0x10ccb8, 0xffbfd8e8, 0xffbfd8cc, 0x1a, 0xffbfd758, 0xfdd15b08), at 0xfdd16104
    [9] __dbc_get(0x10ccb8, 0xffbfd8e8, 0xffbfd8cc, 0x1a, 0x14, 0x0), at 0xfdde145c
    [10] __db_get(0x10c900, 0x50000000, 0xffbfd8e8, 0xffbfd8cc, 0x1a, 0x10c938), at 0xfddee648
    [11] __db_get_pp(0x10bd08, 0x0, 0xffbfd8e8, 0xffbfd8cc, 0x8000000, 0x10c880), at 0xfddee38c
    =>[12] bdb_lookup(i_bdb = 1, vp_key = 0xffbfd9a0, vp_data = 0xffbfd98c), line 334 in "bdb_base.c"
    Any insights into these problems would be greatly appreciated!
    Thanks,
    Matthew

    Hi Matthew,
    I will try to answer your questions.
    1. How should the environment be set up / opened per process?
    As documented in the flags section here:
    http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/env_open.html
    The process that creates the environment should open it with all of the configuration settings. Then each subsequent environment open should be done with no flags set.
    Is there a way to detect if the environment is already opened so the first process that starts can open the environment with flags?
    What about the settings done with function calls between env->create() and env->open(), like setting the cache size? Should those be skipped as well if another process already has the environment open?
    What if the __db* files already exist, then does the first process still need to open the environment with flags set, or are the environment's settings stored in the __db* files directly?
    3. I have noticed that processes compiled with debugging symbols modify the environment such that when you try to run a program compiled with optimizations (and no debugging symbols), the program will hang trying to open the environment. Is this to be expected?
    There is a slight distinction here. There are two different flags (called slightly different things depending on your platform). Environments are not compatible if they have different diagnostic settings: diagnostic mode is --enable-diagnostic when using configure, or the DIAGNOSTIC preprocessor define in a Visual Studio project.
    You can do a debug build without diagnostics enabled. On systems using configure, just build without --enable-diagnostic. On Windows, alter the Debug project build files to remove the DIAGNOSTIC preprocessor define.
    Sorry for not being more clear, I was referring to the application I'm writing, not compiling the BDB. For example, if I compile with -g (Solaris CC) and run the process, then recompile with -xO4 (and remove -g), then the process will not be able to open the environment and will either terminate with an error or hang in the mutex lock.
    5. In bdbdb_open(), DB->open(bdb_ru_rule.db) failed: Not enough space
    unable to allocate memory for mutex; resize mutex region
    If you are encountering this problem, you should read the reference guide section on configuring locking:
    http://www.oracle.com/technology/documentation/berkeley-db/db/ref/lock/max.html
    6. [2] __db_pthread_mutex_lock(0x10bd08, 0x2d798, 0x2118000, 0x1c8, 0x10c190, 0x0), at 0xfdd11fac
    Are you running deadlock detection? Is it possible that a process has died while holding locks open on the database?
    I currently do not have deadlock detection enabled for two reasons:
    1. All the databases are opened as read-only, so why is any locking happening? In the C API docs for env->open(), the DB_INIT_LOCK description specifically says: "If all threads are accessing the database(s) read-only, locking is unnecessary."
    2. When I find processes that have failed waiting on some mutex, running db_stat does not indicate that any processes are deadlocked and running db_deadlock hangs.
    Thanks,
    Matthew

  • Most simple code always recovers the environment

    Hi,
    The code below recovers the environment on every run. This happens if I specify DB_REGISTER without DB_RECOVER.
    Version: 4.6.21
    #include <stdio.h>
    #include <db.h>
    #ifdef WIN32
    #define DIRECTORY_ENVIRONMENT "C:\\Temp"
    #else
    #define DIRECTORY_ENVIRONMENT "/tmp"
    #endif
    static DB_ENV *environment = NULL;
    static int connect( int flags )
    {
        db_env_create( &environment, 0 );
        environment->set_errfile( environment, stderr );
        return environment->open( environment, DIRECTORY_ENVIRONMENT, flags, 0644 );
    }
    int main( int argc, char *argv[] )
    {
        int status = connect( DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_TXN | DB_REGISTER );
        if( status == DB_RUNRECOVERY )
        {
            printf("Run recovery...\n");
            /* a failed open leaves the handle unusable; discard it first */
            environment->close( environment, 0 );
            status = connect( DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_TXN | DB_RECOVER ); // "DB_REGISTER" fails
        }
        status = environment->close( environment, 0 );
        return 0;
    }

    Hello,
    Please let me know if I am misunderstanding your question.
    Using DB_REGISTER will check to see if recovery needs to be performed
    before opening the database environment. If recovery needs to be
    performed for any reason (including the initial use of the DB_REGISTER flag),
    and DB_RECOVER is also specified, recovery will be performed and the
    open will proceed normally. If recovery needs to be performed and DB_RECOVER
    is not specified, DB_RUNRECOVERY will be returned. If recovery does not need
    to be performed, the DB_RECOVER flag will be ignored.
    Thanks,
    Sandra

  • Need help with Berkeley XML DB Performance

    We need help maximizing the performance of our use of Berkeley XML DB. I am filling in most of the 29-part questionnaire provided by Oracle's BDB team.
    Berkeley DB XML Performance Questionnaire
    1. Describe the Performance area that you are measuring? What is the
    current performance? What are your performance goals you hope to
    achieve?
    We are measuring the performance while loading a document during
    web application startup. It is currently taking 10-12 seconds when
    only one user is on the system. We are trying to do some testing to
    get the load time when several users are on the system.
    We would like the load time to be 5 seconds or less.
    2. What Berkeley DB XML Version? Any optional configuration flags
    specified? Are you running with any special patches? Please specify?
    dbxml 2.4.13. No special patches.
    3. What Berkeley DB Version? Any optional configuration flags
    specified? Are you running with any special patches? Please Specify.
    bdb 4.6.21. No special patches.
    4. Processor name, speed and chipset?
    Intel Xeon CPU 5150 2.66GHz
    5. Operating System and Version?
    Red Hat Enterprise Linux Release 4 Update 6
    6. Disk Drive Type and speed?
    Don't have that information
    7. File System Type? (such as EXT2, NTFS, Reiser)
    EXT3
    8. Physical Memory Available?
    4GB
    9. Are you using Replication (HA) with Berkeley DB XML? If so, please
    describe the network you are using, and the number of replicas.
    No
    10. Are you using a Remote Filesystem (NFS) ? If so, for which
    Berkeley DB XML/DB files?
    No
    11. What type of mutexes do you have configured? Did you specify
    --with-mutex=? Specify what you find in your config.log; search
    for db_cv_mutex.
    None. Did not specify --with-mutex during bdb compilation
    12. Which API are you using (C++, Java, Perl, PHP, Python, other) ?
    Which compiler and version?
    Java 1.5
    13. If you are using an Application Server or Web Server, please
    provide the name and version?
    Oracle Application Server 10.1.3.4.0
    14. Please provide your exact Environment Configuration Flags (include
    anything specified in your DB_CONFIG file)
    Default.
    15. Please provide your Container Configuration Flags?
    final EnvironmentConfig envConf = new EnvironmentConfig();
    envConf.setAllowCreate(true); // If the environment does not
    // exist, create it.
    envConf.setInitializeCache(true); // Turn on the shared memory
    // region.
    envConf.setInitializeLocking(true); // Turn on the locking subsystem.
    envConf.setInitializeLogging(true); // Turn on the logging subsystem.
    envConf.setTransactional(true); // Turn on the transactional
    // subsystem.
    envConf.setLockDetectMode(LockDetectMode.MINWRITE);
    envConf.setThreaded(true);
    envConf.setErrorStream(System.err);
    envConf.setCacheSize(1024*1024*64);
    envConf.setMaxLockers(2000);
    envConf.setMaxLocks(2000);
    envConf.setMaxLockObjects(2000);
    envConf.setTxnMaxActive(200);
    envConf.setTxnWriteNoSync(true);
    envConf.setMaxMutexes(40000);
    16. How many XML Containers do you have? For each one please specify:
    One.
    1. The Container Configuration Flags
              XmlContainerConfig xmlContainerConfig = new XmlContainerConfig();
              xmlContainerConfig.setTransactional(true);
    xmlContainerConfig.setIndexNodes(true);
    xmlContainerConfig.setReadUncommitted(true);
    2. How many documents?
    Every time the user logs in, the current XML document is loaded from
    an Oracle database table and put into the Berkeley XML DB.
    The documents get deleted from XML DB when the Oracle application
    server container is stopped.
    The number of documents starts at zero initially and grows with
    every login.
    3. What type (node or wholedoc)?
    Node
    4. Please indicate the minimum, maximum and average size of
    documents?
    The minimum is about 2MB and the maximum could be 20MB. The average
    is mostly about 5MB.
    5. Are you using document data? If so please describe how?
    We are using document data only to save changes made
    to the application data in a web application. The final save goes
    to the relational database. Berkeley XML DB is just used to store
    temporary data since going to the relational database for each change
    will cause severe performance issues.
    17. Please describe the shape of one of your typical documents? Please
    do this by sending us a skeleton XML document.
    Due to the sensitive nature of the data, I can provide XML schema instead.
    18. What is the rate of document insertion/update required or
    expected? Are you doing partial node updates (via XmlModify) or
    replacing the document?
    The document is inserted during user login. Any change made to the application
    data grid or other data components gets saved in Berkeley DB. We also have
    an automatic save every two minutes. The final save from the application
    gets saved in a relational database.
    19. What is the query rate required/expected?
    Users will not be entering data rapidly. There will be lot of think time
    before the users enter/modify data in the web application. This is a pilot
    project but when we go live with this application, we will expect 25 users
    at the same time.
    20. XQuery -- supply some sample queries
    1. Please provide the Query Plan
    2. Are you using DBXML_INDEX_NODES?
    Yes.
    3. Display the indices you have defined for the specific query.
         XmlIndexSpecification spec = container.getIndexSpecification();
         // ids
         spec.addIndex("", "id", XmlIndexSpecification.PATH_NODE | XmlIndexSpecification.NODE_ATTRIBUTE | XmlIndexSpecification.KEY_EQUALITY, XmlValue.STRING);
         spec.addIndex("", "idref", XmlIndexSpecification.PATH_NODE | XmlIndexSpecification.NODE_ATTRIBUTE | XmlIndexSpecification.KEY_EQUALITY, XmlValue.STRING);
         // index to cover AttributeValue/Description
         spec.addIndex("", "Description", XmlIndexSpecification.PATH_EDGE | XmlIndexSpecification.NODE_ELEMENT | XmlIndexSpecification.KEY_SUBSTRING, XmlValue.STRING);
         // cover AttributeValue/@value
         spec.addIndex("", "value", XmlIndexSpecification.PATH_EDGE | XmlIndexSpecification.NODE_ATTRIBUTE | XmlIndexSpecification.KEY_EQUALITY, XmlValue.STRING);
         // item attribute values
         spec.addIndex("", "type", XmlIndexSpecification.PATH_EDGE | XmlIndexSpecification.NODE_ATTRIBUTE | XmlIndexSpecification.KEY_EQUALITY, XmlValue.STRING);
         // default index
         spec.addDefaultIndex(XmlIndexSpecification.PATH_NODE | XmlIndexSpecification.NODE_ELEMENT | XmlIndexSpecification.KEY_EQUALITY, XmlValue.STRING);
         spec.addDefaultIndex(XmlIndexSpecification.PATH_NODE | XmlIndexSpecification.NODE_ATTRIBUTE | XmlIndexSpecification.KEY_EQUALITY, XmlValue.STRING);
         // save the spec to the container
         XmlUpdateContext uc = xmlManager.createUpdateContext();
         container.setIndexSpecification(spec, uc);
    4. If this is a large query, please consider sending a smaller
    query (and query plan) that demonstrates the problem.
    21. Are you running with Transactions? If so please provide any
    transactions flags you specify with any API calls.
    Yes. READ_UNCOMMITTED in some and READ_COMMITTED in other transactions.
    22. If your application is transactional, are your log files stored on
    the same disk as your containers/databases?
    Yes.
    23. Do you use AUTO_COMMIT?
         No.
    24. Please list any non-transactional operations performed?
    No.
    25. How many threads of control are running? How many threads in read
    only mode? How many threads are updating?
    We use Berkeley XML DB within the context of a Struts web application.
    Each user logged into the web application will be running a bdb transaction
    within the context of a Struts action thread.
    26. Please include a paragraph describing the performance measurements
    you have made. Please specifically list any Berkeley DB operations
    where the performance is currently insufficient.
    We are clocking 10-12 seconds to load a document from bdb when
    five users are on the system.
    getContainer().getDocument(documentName);
    27. What performance level do you hope to achieve?
    We would like to get less than 5 seconds when 25 users are on the system.
    28. Please send us the output of the following db_stat utility commands
    after your application has been running under "normal" load for some
    period of time:
    % db_stat -h database environment -c
    % db_stat -h database environment -l
    % db_stat -h database environment -m
    % db_stat -h database environment -r
    % db_stat -h database environment -t
    (These commands require the db_stat utility access a shared database
    environment. If your application has a private environment, please
    remove the DB_PRIVATE flag used when the environment is created, so
    you can obtain these measurements. If removing the DB_PRIVATE flag
    is not possible, let us know and we can discuss alternatives with
    you.)
    If your application has periods of "good" and "bad" performance,
    please run the above list of commands several times, during both
    good and bad periods, and additionally specify the -Z flags (so
    the output of each command isn't cumulative).
    When possible, please run basic system performance reporting tools
    during the time you are measuring the application's performance.
    For example, on UNIX systems, the vmstat and iostat utilities are
    good choices.
    Will give this information soon.
    29. Are there any other significant applications running on this
    system? Are you using Berkeley DB outside of Berkeley DB XML?
    Please describe the application?
    No to the first two questions.
    The web application is an online review of test questions. The users
    login and then review the items one by one. The relational database
    holds the data in xml. During application load, the application
    retrieves the xml and then saves it to bdb. While the user
    is making changes to the data in the application, it writes those
    changes to bdb. Finally when the user hits the SAVE button, the data
    gets saved to the relational database. We also have an automatic save
    every two minutes, which saves the bdb xml data to the relational
    database.
    Thanks,
    Madhav
    [email protected]

    Could it be that you simply do not have set up indexes to support your query? If so, you could do some basic testing using the dbxml shell:
    milu@colinux:~/xpg > dbxml -h ~/dbenv
    Joined existing environment
    dbxml> setverbose 7 2
    dbxml> open tv.dbxml
    dbxml> listIndexes
    dbxml> query     { collection()[//@date-tip]/*[@chID = ('ard','zdf')] (: example :) }
    dbxml> queryplan { collection()[//@date-tip]/*[@chID = ('ard','zdf')] (: example :) }
    Verbosity will make the engine display some (rather cryptic) information on index usage. I can't remember where the output is explained; my feeling is that "V(...)" means the index is being used (which is good), but that observation may not be accurate. Note that some details in the setVerbose command could differ, as I'm using 2.4.16 while you're using 2.4.13.
    Also, take a look at the query plan. You can post it here and some people will be able to diagnose it.
    Michael Ludwig

  • Extremely Poor Read Performance

    Hey guys,
    For a work project, I have been instructed to use a Berkeley DB as our data storage mechanism. Admittedly, I know little about BDB, but I've been learning more in the past day as I am reading up on it. I'm hoping, though, that even if no one can help me with the problem I am having, they can at least tell me if what I am seeing is typical/expected, or definitely wrong.
    Here's what I got:
    - Parent table A - Has 0 or 1 key for table B, and 0 or 1 key for table C
    - Table B
    - Table C
    For purpose of discussion, let's ignore table C as it is logically the same as Table B.
    Table B has 25 million rows, keyed by a 34-36 digit string, and a payload of 500-1000 bytes.
    Table A has 26 million rows, 25 million of which reference the 25 million rows in Table B.
    My question is not on the merits of why the data is structured the way it is, but rather about the performance I am seeing, so please refrain from questions such as "why is your data structured that way - can you structure it another way?" I know I can do that - again I just want to know what other people are experiencing for performance.
    Anyway, what's happening is this - my program runs a cursor on Table A to get all records. As it gets each record in Table A, it retrieves the referenced records in Table B. So, the cursor on table A represents sequential key access. The subsequent retrievals from Table B represent "random" retrievals - i.e. the key may be anywhere in the index, and is not at all related to the previous retrieval.
    Cruising the cursor on Table A, I see performance of about 100,000 records per 2 seconds. However, when I add in the retrievals from Table B, performance drops all the way down to 100,000 records per 1000 seconds, or put better, 100 per second. At this rate, it will take nearly 70 hours to traverse my entire data set.
    My question is, am I simply running into a fundamental hardware issue in that I am doing random retrievals from Table B, and I cannot expect to see better performance than 100 retrievals per second due to all of the disk reads? Being that the DB is 20 GB in size, I cannot cache the entire table in memory, so does that mean that reading the data in this fashion is simply not feasible?
    If it isn't feasible, does anyone have a suggestion on a different way to read the data, without changing the table relationship as it currently stands? Considering Table B has a reverse reference to table A, I've considered putting a secondary index on table B so that instead of doing random seeks into table B, I can run a cursor on the secondary index of table B at the same time I run the cursor on table A. Then, for each record in table A that has a reference to table B, the first record in the cursor for table B should be the one I need. However, reading about secondary indexes, it looks like all a secondary index does is give a reference to the key to the table. Thus, my concern is that running a cursor on the secondary index of table B will really be no different than randomly selecting the records from table B, as it will be doing just that in the background anyway. Am I understanding this correctly, or would a secondary index really help in my case?
    Thank you for your time.
    -Brett

    Hi Brett,
    Is the sorting order the same between the two databases, A and B? That is, are the keys ordered in the same way? For example, does key N in database A refer to key N in database B?
    I would guess not, because you mention the "randomness" in retrieving from B when doing the cursor sequential traversal of A, and the 34-36 digit keys in B are probably randomly generated.
    With B as a secondary database, associated to A as the primary database, it would make sense to iterate with a cursor on secondary database B, if you expect the same ordering of keys as in A (as mentioned at the beginning of this post). For example, you would use DBcursor->get() to iterate in the secondary database B, or DBcursor->pget() if you also want to retrieve the key from the primary database A.
    Basically secondary indexes allow for accessing records in a database (primary db) based on a different piece of information other than the primary key:
    Secondary indexes
    So, when you iterate with a cursor in B you would retrieve the data from A (and in addition the key from A) in the order given by the keys (secondary keys) in B.
    However, a secondary database does not seem feasible to me in your case. You seem to have about 1 million records in primary db A for which you would not have to generate a secondary key, so you would have to return DB_DONOTINDEX from the secondary callback: DB->associate()
    (it may be difficult to account exactly for the records in A for which you do not want to generate secondary keys)
    Also, the secondary key, the 34-36 digit string, would have to somehow be derived from the primary key and data in A.
    If the ordering is not similar (in the sense explained at the beginning of the post) between A and B, then the secondary index does not add much value, other than simplifying retrieval from A in queries where the query criteria involve the 34-36 digit string.
    Back to your current way of structuring data, there are some suggestions that could improve retrieval times:
    - try using the latest Berkeley DB release, 5.1.19: Berkeley DB Release History
    - try configuring a page size for the databases A and B equal to that of the filesystem's block size: Selecting a page size
    - try to avoid the creation of overflow items and pages by properly sizing the page size -- you can inspect database statistics using db_stat -d: db_stat
    - try increasing the cache size to a larger value: Selecting a cache size
    - if there's a single process accessing the environment, try to back the environment shared region files in per-process private memory (using DB_PRIVATE flag in the DB_ENV->open() call);
    - try performing read operations outside of transactions, that is, do not use transactional cursors.
    For reference, review these sections in the Berkeley DB Reference Guide:
    Access method tuning
    Transaction tuning
    Regards,
    Andrei

  • Can i create DB in shared mem, rather than backed by a file

    Hi
    I'm using db 4.2.6 version and trying to create a DB in shared mem rather than a file. How do I achieve that?
    While I look at documentation here:
    http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/env_region.html
    ==========
    If the DB_SYSTEM_MEM flag is specified to DB->open(), shared regions are created in system memory rather than files. This is an alternative mechanism for sharing the Berkeley DB environment among multiple processes and multiple threads within processes.
    ===========
    But DB->open() rejects DB_SYSTEM_MEM flat out, and the DB->open() documentation doesn't even mention the flag:
    http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/dbopen.html
    I could specify DB_SYSTEM_MEM for DBENV.
    Here is my code. Also, DBENV->open() won't let me specify DB_PRIVATE.
    flags = DB_INIT_MPOOL | DB_CREATE | DB_SYSTEM_MEM | DB_INIT_TXN | DB_RECOVER | DB_INIT_LOG | DB_LOG_AUTOREMOVE;
    dbenv->open(dbenv, dbhome, flags, 0);
    flags = DB_CREATE | DB_EXCL | DB_AUTO_COMMIT;
    dbp->open(dbp, NULL, dbfile, NULL, DB_BTREE, flags, 0);
    If I create the db as above, where is it getting created? I don't see it here:
    # ./bdbperf -t -i 10000 -k 32 -v 128;ls -la dbtest
    create log/transaction
    creating in system mem
    inserted 10000 records. errs 0
    logs removed
    inserting 10000 records took 0s.290ms (29 micros/op)
    done. 0s.290ms
    total 200
    drwxr-xr-x 2 root root 80 Jul 17 03:04 .
    drwxrwxrwx 7 root root 1180 Jul 17 03:04 ..
    -rw-r----- 1 root root 8 Jul 17 03:04 __db.001
    -rw-r----- 1 root root 190424 Jul 17 03:04 log.0000000011
    However, if I specify a filename arg in dbp->open(), I do see a file with that name in my env dir 'dbtest'. db_stat on such file seems to be good.
    # ./lt-db_stat -d dbfile -h dbtest
    Sat Jul 17 02:38:01 2010 Local time
    53162 Btree magic number
    9 Btree version number
    Little-endian Byte order
    Flags
    2 Minimum keys per-page
    1024 Underlying database page size
    239 Overflow key/data size
    4 Number of levels in the tree
    10000 Number of unique keys in the tree
    10000 Number of data items in the tree
    171 Number of tree internal pages
    79122 Number of bytes free in tree internal pages (54% ff)
    1822 Number of tree leaf pages
    738356 Number of bytes free in tree leaf pages (60% ff)
    0 Number of tree duplicate pages
    0 Number of bytes free in tree duplicate pages (0% ff)
    0 Number of tree overflow pages
    0 Number of bytes free in tree overflow pages (0% ff)
    0 Number of empty pages
    0 Number of pages on the free list
    So:
    1. how do I create DB in shared-mem area and
    2. how do I get stats for such DB?
    Appreciate any help.

    Hi,
    user3143985 wrote:
    I'm using db 4.2.6 version and trying to create a DB in shared mem rather than a file. How do I achieve that?
    There must be a typo, as there is no 4.2.6 release.
    user3143985 wrote:
    While I look at documentation here:
    http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/env_region.html
    When reading the documentation, I would suggest using the documentation from the release package you downloaded. The online documentation is accurate only for the latest BDB release.
    user3143985 wrote:
    ==========
    If the DB_SYSTEM_MEM flag is specified to DB->open(), shared regions are created in system memory rather than files. This is an alternative mechanism for sharing the Berkeley DB environment among multiple processes and multiple threads within processes.
    ===========
    But DB->open() rejects DB_SYSTEM_MEM flat out. db->open() doesn't even talk about flag:
    http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/dbopen.html
    That's a documentation error; it should refer to the DB_ENV->open() method. Thanks for pointing it out! As you can see, DB_SYSTEM_MEM is linked to the DB_ENV->open() method: http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/envopen.html#envopen_DB_SYSTEM_MEM
    user3143985 wrote:
    1. how do I create DB in shared-mem area and
    This page should clarify this question and what a memory-only configuration looks like: http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/program_ram.html
    user3143985 wrote:
    2. how do I get stats for such DB?
    If you want to allocate region memory from the heap instead of from memory backed by the filesystem or system shared memory, you'll have to specify DB_PRIVATE ( http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/envopen.html#envopen_DB_PRIVATE ). DB_PRIVATE should not be specified if more than a single process is accessing the environment, because that is likely to cause database corruption and unpredictable behavior. For example, if both a server application and the Berkeley DB utilities (for example, db_archive, db_checkpoint or db_stat) are expected to access the environment, the DB_PRIVATE flag should not be specified.
    Bogdan Coman

  • Failed to open more than one DBEnv in the same directory

    I have a server program with replication enabled, which opens the DBEnv with the following flags:
    DB_THREAD | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER | DB_CREATE | DB_PRIVATE
    Since DB_PRIVATE is set, does that mean other processes cannot read the data while the server is running?
    Edited by: [email protected] on 2009-3-9 12:58 AM

    Hi Alexi,
    When you set the DB_PRIVATE flag on one process's environment, other processes can't join that environment, so they can't read data from it.
    However, since you are using replication, you can still share data with other processes via replication. That is, make another process P part of the replication group, so that data is replicated to P's own environment and P can read it (P may not be able to update data, because updates only happen on the master).

  • Addindex get errors

    Hi,
    When I tried to addindex to a container, I get some errors below:
    page 0: illegal page type or format
    PANIC: Invalid argument
    PANIC: fatal region error detected; run recovery
    */root/dbxml/addindex.sh:7: addIndex failed, Error: DB_RUNRECOVERY: Fatal error, run database recovery*
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    PANIC: fatal region error detected; run recovery
    Container  - DB error during database close: -30974
    File handles still open at environment close
    Open file handle: 57909d417e7542228b3cd3ad1ff9546a.bdbxml
    PANIC: fatal region error detected; run recovery
    What I did to add index:
    (1) stop my application. My application open database using DB_PRIVATE flag.
    (2) run addindex from dbxml shell without using transaction. (even if I use transaction, I got similar error.)
    I am wondering why this error happens. Can I use the dbxml shell to add an index if the container is opened with DB_PRIVATE?
    Thanks.
    -yuan
    Edited by: user5159213 on Mar 23, 2009 3:37 PM

    Yuan,
    Question...which user owns the log and container files (ls -l)? Is it the same as the user who is running the dbxml command line? If not, does he have permission to write to those files and the directory?
    Just a thought that it could be a permissions issue...I think I had similar problems a few days back.
    --kevin                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

  • Concurrency with multiprocess application

    Hi,
    I have a serious problem here. I am supposed to design a multi-process application in C++ that interacts with Berkeley DB. I have two binaries: one updates the database, while the other has to read it. The Db object is opened only once, during the start-up of the reading binary, so when I read at any time in the life cycle of the application, I expect to see the latest records. But it is not reading the latest records: I get only the records that were present during the db open operation. Both binaries have different environment objects pointing to the same directory, and different Db objects opening the same database file.
    The documentation speaks only about concurrency in a multi-threaded application; what about a multi-process application? Can someone help me with this?
    Nandish

    Hi there,
    Thank you for posting the processes skeleton (it had been better if you had posted the env and db flags). I will try to explain from both, CDS and TDS, point of view ( http://www.oracle.com/technology/documentation/berkeley-db/db/ref/intro/products.html )
    First, you should not use DB_PRIVATE flag on your environment. If you are using it, the env may only be accessed by a single process, multithreaded or not.
    DB_PRIVATE: http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/env_open.html#DB_PRIVATE
    If you want to share database/databases that change at all, you must first share the environment. Otherwise, the processes wouldn't all see the same pages in the mpool and in case that more than one wants to modify the database/database, they wouldn't block each other from concurrent modifications.
    A database environment is a way to share resources among a set of databases, most importantly for your application, the database cache. An environment may be shared by any number of processes, as well as by any number of threads within those processes, but you can't share anonymous databases between processes since they are identifiable by their DB handle, which can't be copied between processes.
    All processes sharing the environment must use registration. If registration is not uniformly used across all participating processes, then you can see inconsistent results in terms of your application's ability to recognize that recovery must be run.
    DB_REGISTER: http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/env_open.html#DB_REGISTER.
    It is possible for an environment to include resources from other directories on the system, and applications often choose to distribute resources to other directories or disks for performance or other reasons. However, by default, the databases, shared regions (the locking, logging, memory pool, and transaction shared memory areas) and log files will be stored in a single directory hierarchy.
    When region files are backed by system memory, DB creates a single file in the environment's home directory. This file contains information necessary to identify the system shared memory in use by the environment. By creating this file, DB enables multiple processes to share the environment.
    Shared memory regions: http://www.oracle.com/technology/documentation/berkeley-db/db/ref/env/region.html
    Architecting Transactional Data Store applications: http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/app.html
    Bogdan Coman

  • Speed of building database

    We are building a hash database. Our data consist of a string key, about 20 characters long, and a value that is a number representing the frequency of that string in a text collection. The number value is inserted as a string.
    The problem is that the time it takes to insert new records in the db increases as we add more records. I have read many of the discussions on this forum and I found many similar situations, but none that actually appear to solve the problem.
    We process records as they occur in the text collection. We programmatically compute a bunch of records and then want to insert them as quickly as possible. We anticipate several tens of millions (perhaps more than 100 million) of records. Some of these records are duplicates of data already in the db. In this case, we want to increment the count in the db.
    Using default parameters in a test program, and a 512 MB cache, the first inserts (consisting of, say 10,000 records) take less than a second. By the time we reach 2 million records in the database, each insert takes 5 seconds, by the time we hit 5 million, it takes about 9 seconds.
    In our full program, running on Ubuntu, we have tried different page sizes and fillfactors, but we have not come up with a combination that is really any better. Each batch is about a million records long and the delays are substantial with running times of many days.
    My main question is how can we get this task done in a reasonable amount of time?
    Secondary question is why do we get overflow pages when our data are so small? No key should be more than about 20-30 bytes and no value is likely to be anywhere near that large.
    Thanks very much.
    Herb
    Here is some example information:
    db4.4_stat -m -h en.olm
    512MB Total cache size
    1 Number of caches
    512MB Pool individual cache size
    0 Maximum memory-mapped file size
    0 Maximum open file descriptors
    0 Maximum sequential buffer writes
    0 Sleep after writing maximum sequential buffers
    0 Requested pages mapped into the process' address space
    148M Requested pages found in the cache (99%)
    868339 Requested pages not found in the cache
    164263 Pages created in the cache
    841144 Pages read into the cache
    702786 Pages written from the cache to the backing file
    734519 Clean pages forced from the cache
    141381 Dirty pages forced from the cache
    0 Dirty pages written by trickle-sync thread
    129487 Current total page count
    129467 Current clean page count
    20 Current dirty page count
    65537 Number of hash buckets used for page location
    149M Total number of times hash chains searched for a page
    (149919989)
    4 The longest hash chain searched for a page
    288M Total number of hash chain entries checked for page (288693411)
    0 The number of hash bucket locks that required waiting (0%)
    0 The maximum number of times any hash bucket lock was waited for
    0 The number of region locks that required waiting (0%)
    1005422 The number of page allocations
    1780885 The number of hash buckets examined during allocations
    14 The maximum number of hash buckets examined for an allocation
    875900 The number of pages examined during allocations
    1 The max number of pages examined for an allocation
    Pool File: cofreq
    4096 Page size
    0 Requested pages mapped into the process' address space
    54M Requested pages found in the cache (99%)
    396357 Requested pages not found in the cache
    44138 Pages created in the cache
    370755 Pages read into the cache
    354934 Pages written from the cache to the backing file
    Pool File: olm
    4096 Page size
    0 Requested pages mapped into the process' address space
    94M Requested pages found in the cache (99%)
    471878 Requested pages not found in the cache
    120004 Pages created in the cache
    470364 Pages read into the cache
    347714 Pages written from the cache to the backing file

    Hi Herb,
    Thank you for the clarification.
    Are you suggesting that we do allow duplicates and then during retrieval, just use the duplicate with the largest number to get our count?
    No, I'm sorry, I misunderstood that part from your first update. I don't think that sorting the records will help.
    Here are some things that I suspect and some that you'll have to try.
    Maybe the default hashing function doesn't perform well on your keys, so you might want to try to define your own hash function:
    C http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/db_set_h_hash.html
    C++ http://www.oracle.com/technology/documentation/berkeley-db/db/api_cxx/db_set_h_hash.html
    An efficient hash function will distribute keys equally across the database pages. You can run the "db_stat" utility to decide if the hashing function performs well, by comparing the number of hash buckets and the number of keys:
    http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/h_hash.html
    Knowing in advance the expected number of keys that you will store can help you accurately configure the number of buckets that the database will require to store your keys. You can call the DB->set_h_nelem method to set an estimate of the final size of the hash table:
    C http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/db_set_h_nelem.html
    C++ http://www.oracle.com/technology/documentation/berkeley-db/db/api_cxx/db_set_h_nelem.html
    Open your environment with the DB_PRIVATE flag, so that the environment shared regions are created in per-process heap memory (this is safe since you only have one process accessing the environment). If you want to access the environment at the same time with one of the utilities, then don't specify this flag; for testing purposes, though, you should try setting it.
    Consider calling DbEnv::memp_trickle at a regular time interval (or run it in a separate thread). This will ensure that a specified percentage of the pages in the shared memory pool are clean, by writing dirty pages to their backing files. You may want to try a value between 5% and 25%.
    C http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/memp_trickle.html
    C++ http://www.oracle.com/technology/documentation/berkeley-db/db/api_cxx/memp_trickle.html
    If you are not using transactions because you don't need ACID semantics, locking, or logging, why not try running a test without an environment at all?
    Since the keys vary in size, an extra step you could consider would be to use separate databases for the different types of keys.
    Regards,
    Bogdan Coman

  • Getting realistic performance expectations.

    I am running tests to see if I can use the Oracle Berkeley XML database as a backend to a web application but am running into query response performance limitations. As per the suggestions for performance related questions, I have pulled together answers to the series of questions that need to be addressed, and they are given below. The basic issue at stake, however, is am I being realistic about what I can expect to achieve with the database?
    Regards
    Geoff Shuetrim
    Oracle Berkeley DB XML database performance.
    Berkeley DB XML Performance Questionnaire
    1. Describe the Performance area that you are measuring? What is the
    current performance? What are your performance goals you hope to
    achieve?
    I am using the database as a back end to a web application that is expected
    to field a large number of concurrent queries.
    The database scale is described below.
    Current performance involves responses to simple queries that involve 1-2
    minute turn around (this improves after a few similar queries have been run,
    presumably because of caching, but not to a point that is acceptable for
    web applications).
    Desired performance is for queries to execute in milliseconds rather than
    minutes.
    2. What Berkeley DB XML Version? Any optional configuration flags
    specified? Are you running with any special patches? Please specify?
    Berkeley DB XML Version: 2.4.16.1
    Configuration flags: enable-java -b 64 prefix=/usr/local/BerkeleyDBXML-2.4.16
    No special patches have been applied.
    3. What Berkeley DB Version? Any optional configuration flags
    specified? Are you running with any special patches? Please Specify.
    Berkeley DB Version? 4.6.21
    Configuration flags: None. The Berkeley DB was built and installed as part of the
    Oracle Berkeley XML database build and installation process.
    No special patches have been applied.
    4. Processor name, speed and chipset?
    Intel Core 2 CPU 6400 @ 2.13 GHz (1066 FSB) (4MB Cache)
    5. Operating System and Version?
    Ubuntu Linux 8.04 (Hardy) with the 2.6.24-23 generic kernel.
    6. Disk Drive Type and speed?
    300 GB 7200RPM hard drive.
    7. File System Type? (such as EXT2, NTFS, Reiser)
    EXT3
    8. Physical Memory Available?
    Memory: 3.8GB DDR2 SDRAM
    9. Are you using Replication (HA) with Berkeley DB XML? If so, please
    describe the network you are using, and the number of Replicas.
    No.
    10. Are you using a Remote Filesystem (NFS) ? If so, for which
    Berkeley DB XML/DB files?
    No.
    11. What type of mutexes do you have configured? Did you specify
    –with-mutex=? Specify what you find in your config.log; search
    for db_cv_mutex?
    I did not specify -with-mutex when building the database.
    config.log indicates:
    db_cv_mutex=POSIX/pthreads/library/x86_64/gcc-assembly
    12. Which API are you using (C++, Java, Perl, PHP, Python, other) ?
    Which compiler and version?
    I am using the Java API.
    I am using the gcc 4.2.4 compiler.
    I am using the g++ 4.2.4 compiler.
    13. If you are using an Application Server or Web Server, please
    provide the name and version?
    I am using the Tomcat 5.5 application server.
    It is not using the Apache Portable Runtime library.
    It is being run using a 64 bit version of the Sun Java 1.5 JRE.
    14. Please provide your exact Environment Configuration Flags (include
    anything specified in you DB_CONFIG file)
    I do not have a DB_CONFIG file in the database home directory.
    My environment configuration is as follows:
    Threaded = true
    AllowCreate = true
    InitializeLocking = true
    ErrorStream = System.err
    InitializeCache = true
    Cache Size = 1024 * 1024 * 500
    InitializeLogging = true
    Transactional = false
    TrickleCacheWrite = 20
    15. Please provide your Container Configuration Flags?
    My container configuration is done using the Java API.
    The container creation code is:
    XmlContainerConfig containerConfig = new XmlContainerConfig();
    containerConfig.setStatisticsEnabled(true);
    XmlContainer container = xmlManager.createContainer("container", containerConfig);
    I am guessing that this means that the only flag I have set is the one
    that enables recording of statistics to use in query optimization.
    I have no other container configuration information to provide.
    16. How many XML Containers do you have?
    I have one XML container.
    The container has 2,729,465 documents.
    The container is a node container rather than a wholedoc container.
    Minimum document size is around 1Kb.
    Maximum document size is around 50Kb.
    Average document size is around 2Kb.
    I am using document data as part of the XQueries being run. For
    example, I condition query results upon the values of attributes
    and elements in the stored documents.
    The database has the following indexes:
    xmlIndexSpecification = dataContainer.getIndexSpecification();
    xmlIndexSpecification.replaceDefaultIndex("node-element-presence");
    xmlIndexSpecification.addIndex(Constants.XBRLAPINamespace,"fragment","node-element-presence");
    xmlIndexSpecification.addIndex(Constants.XBRLAPINamespace,"data","node-element-presence");
    xmlIndexSpecification.addIndex(Constants.XBRLAPINamespace,"xptr","node-element-presence");
    xmlIndexSpecification.addIndex("","stub","node-attribute-presence");
    xmlIndexSpecification.addIndex("","index", "unique-node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XBRL21LinkNamespace,"label","node-element-substring-string");
    xmlIndexSpecification.addIndex(Constants.GenericLabelNamespace,"label","node-element-substring-string");
    xmlIndexSpecification.addIndex("","name","node-attribute-substring-string");
    xmlIndexSpecification.addIndex("","parentIndex", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","uri", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","type", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","targetDocumentURI", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","targetPointerValue", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","absoluteHref", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","id","node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","value", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","arcroleURI", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","roleURI", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","name", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","targetNamespace", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","contextRef", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","unitRef", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","scheme", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex("","value", "node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XBRL21Namespace,"identifier", "node-element-equality-string");           
    xmlIndexSpecification.addIndex(Constants.XMLNamespace,"lang","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"label","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"from","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"to","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"type","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"arcrole","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"role","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XLinkNamespace,"label","node-attribute-equality-string");
    xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"language","node-element-presence");
    xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"code","node-element-equality-string");
    xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"value","node-element-equality-string");
    xmlIndexSpecification.addIndex(Constants.XBRLAPILanguagesNamespace,"encoding","node-element-equality-string");
    17. Please describe the shape of one of your typical documents? Please
    do this by sending us a skeleton XML document.
    The following provides the basic information about the shape of all documents
    in the data store.
    <ns:fragment xmlns:ns="..." attrs...(about 20 of them)>
      <ns:data>
        Single element that varies from document to document but that
        is rarely more than a few small elements in size and (in some cases)
        a lengthy section of string content for the single element.
      </ns:data>
    </ns:fragment>
    18. What is the rate of document insertion/update required or
    expected? Are you doing partial node updates (via XmlModify) or
    replacing the document?
    Document insertion rates are not a first order performance criteria.
    I do no document modifications using XmlModify.
    When doing updates I replace the original document.
    19. What is the query rate required/expected?
    Not sure how to provide metrics for this, but a single web page being generated can involve hundreds of queries, each of which should be trivial to execute given the indexing strategy in use.
    20. XQuery -- supply some sample queries
    1. Please provide the Query Plan
    2. Are you using DBXML_INDEX_NODES?
              I am using DBXML_INDEX_NODES by default because I
              am using a node container rather than a whole document
              container.
    3. Display the indices you have defined for the specific query.
    4. If this is a large query, please consider sending a smaller
    query (and query plan) that demonstrates the problem.
    Example queries.
    1. collection('browser')/*[@parentIndex='none']
    <XQuery>
      <QueryPlanToAST>
        <LevelFilterQP>
          <StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
            <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="parentIndex" value="none"/>
          </StepQP>
        </LevelFilterQP>
      </QueryPlanToAST>
    </XQuery>
    2. collection('browser')/*[@stub]
    <XQuery>
      <QueryPlanToAST>
        <LevelFilterQP>
          <StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
            <PresenceQP container="browser" index="node-attribute-presence-none" operation="eq" child="stub"/>
          </StepQP>
        </LevelFilterQP>
      </QueryPlanToAST>
    </XQuery>
    3. qplan "collection('browser')/*[@type='org.xbrlapi.impl.ConceptImpl' or @parentIndex='asdfv_3']"
    <XQuery>
      <QueryPlanToAST>
        <LevelFilterQP>
          <StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
            <UnionQP>
              <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="type" value="org.xbrlapi.impl.ConceptImpl"/>
              <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="parentIndex" value="asdfv_3"/>
            </UnionQP>
          </StepQP>
        </LevelFilterQP>
      </QueryPlanToAST>
    </XQuery>
    4.
    setnamespace xlink http://www.w3.org/1999/xlink
    qplan "collection('browser')/*[@uri='http://www.xbrlapi.org/my/uri' and */*[@xlink:type='resource' and @xlink:label='description']]"
    <XQuery>
      <QueryPlanToAST>
        <LevelFilterQP>
          <NodePredicateFilterQP uri="" name="#tmp8">
            <StepQP axis="parent-of-child" uri="*" name="*" nodeType="element">
              <StepQP axis="parent-of-child" uri="*" name="*" nodeType="element">
                <NodePredicateFilterQP uri="" name="#tmp1">
                  <StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
                    <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="label:http://www.w3.org/1999/xlink"
                    value="description"/>
                  </StepQP>
                  <AttributeJoinQP>
                    <VariableQP name="#tmp1"/>
                    <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="type:http://www.w3.org/1999/xlink"
                    value="resource"/>
                  </AttributeJoinQP>
                </NodePredicateFilterQP>
              </StepQP>
            </StepQP>
            <AttributeJoinQP>
              <VariableQP name="#tmp8"/>
              <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="uri" value="http://www.xbrlapi.org/my/uri"/>
            </AttributeJoinQP>
          </NodePredicateFilterQP>
        </LevelFilterQP>
      </QueryPlanToAST>
    </XQuery>
    21. Are you running with Transactions? If so please provide any
    transactions flags you specify with any API calls.
    I am not running with transactions.
    22. If your application is transactional, are your log files stored on
    the same disk as your containers/databases?
    The log files are stored on the same disk as the container.
    23. Do you use AUTO_COMMIT?
    Yes. I think that it is a default feature of the DocumentConfig that
    I am using.
    24. Please list any non-transactional operations performed?
    I do document insertions and I do XQueries.
    25. How many threads of control are running? How many threads in read
    only mode? How many threads are updating?
    One thread is updating. Right now one thread is running queries. I am
    not yet testing the web application with concurrent users given the
    performance issues faced with a single user.
    26. Please include a paragraph describing the performance measurements
    you have made. Please specifically list any Berkeley DB operations
    where the performance is currently insufficient.
    I have loaded approximately 7 GB data into the container and then tried
    to run the web application using that data. This involves running a broad
    range of very simple queries, all of which are expected to be supported
    by indexes to ensure that they do not require XML document traversal activity.
    Querying performance is insufficient, with even the most basic queries
    taking several minutes to complete.
    27. What performance level do you hope to achieve?
    I hope to be able to run a web application that simultaneously handles
    page requests from hundreds of users, each of which involves a large
    number of database queries.
    28. Please send us the output of the following db_stat utility commands
    after your application has been running under "normal" load for some
    period of time:
    % db_stat -h database environment -c
    1038     Last allocated locker ID
    0x7fffffff     Current maximum unused locker ID
    9     Number of lock modes
    1000     Maximum number of locks possible
    1000     Maximum number of lockers possible
    1000     Maximum number of lock objects possible
    155     Number of current locks
    157     Maximum number of locks at any one time
    200     Number of current lockers
    200     Maximum number of lockers at any one time
    13     Number of current lock objects
    17     Maximum number of lock objects at any one time
    1566M     Total number of locks requested (1566626558)
    1566M     Total number of locks released (1566626403)
    0     Total number of locks upgraded
    852     Total number of locks downgraded
    3     Lock requests not available due to conflicts, for which we waited
    0     Lock requests not available due to conflicts, for which we did not wait
    0     Number of deadlocks
    0     Lock timeout value
    0     Number of locks that have timed out
    0     Transaction timeout value
    0     Number of transactions that have timed out
    712KB     The size of the lock region
    21807     The number of region locks that required waiting (0%)
    % db_stat -h database environment -l
    0x40988     Log magic number
    13     Log version number
    31KB 256B     Log record cache size
    0     Log file mode
    10Mb     Current log file size
    0     Records entered into the log
    28B     Log bytes written
    28B     Log bytes written since last checkpoint
    1     Total log file I/O writes
    0     Total log file I/O writes due to overflow
    1     Total log file flushes
    0     Total log file I/O reads
    1     Current log file number
    28     Current log file offset
    1     On-disk log file number
    28     On-disk log file offset
    1     Maximum commits in a log flush
    0     Minimum commits in a log flush
    96KB     Log region size
    0     The number of region locks that required waiting (0%)
    % db_stat -h database environment -m
    500MB     Total cache size
    1     Number of caches
    1     Maximum number of caches
    500MB     Pool individual cache size
    0     Maximum memory-mapped file size
    0     Maximum open file descriptors
    0     Maximum sequential buffer writes
    0     Sleep after writing maximum sequential buffers
    0     Requested pages mapped into the process' address space
    1749M     Requested pages found in the cache (99%)
    722001     Requested pages not found in the cache
    911092     Pages created in the cache
    722000     Pages read into the cache
    4175142     Pages written from the cache to the backing file
    1550811     Clean pages forced from the cache
    19568     Dirty pages forced from the cache
    3     Dirty pages written by trickle-sync thread
    62571     Current total page count
    62571     Current clean page count
    0     Current dirty page count
    65537     Number of hash buckets used for page location
    1751M     Total number of times hash chains searched for a page (1751388600)
    8     The longest hash chain searched for a page
    3126M     Total number of hash chain entries checked for page (3126038333)
    4535     The number of hash bucket locks that required waiting (0%)
    278     The maximum number of times any hash bucket lock was waited for (0%)
    1     The number of region locks that required waiting (0%)
    0     The number of buffers frozen
    0     The number of buffers thawed
    0     The number of frozen buffers freed
    1633189     The number of page allocations
    4301013     The number of hash buckets examined during allocations
    259     The maximum number of hash buckets examined for an allocation
    1570522     The number of pages examined during allocations
    1     The max number of pages examined for an allocation
    184     Threads waited on page I/O
    Pool File: browser
    8192     Page size
    0     Requested pages mapped into the process' address space
    1749M     Requested pages found in the cache (99%)
    722001     Requested pages not found in the cache
    911092     Pages created in the cache
    722000     Pages read into the cache
    4175142     Pages written from the cache to the backing file
    % db_stat -h database environment -r
    Not applicable.
    % db_stat -h database environment -t
    Not applicable.
    vmstat
    r b swpd free buff cache si so bi bo in cs us sy id wa
    1 4 40332 773112 27196 1448196 0 0 173 239 64 1365 19 4 72 5
    iostat
    Linux 2.6.24-23-generic (dell)      06/02/09
    avg-cpu: %user %nice %system %iowait %steal %idle
    18.37 0.01 3.75 5.67 0.00 72.20
    Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
    sda 72.77 794.79 1048.35 5376284 7091504
    29. Are there any other significant applications running on this
    system? Are you using Berkeley DB outside of Berkeley DB XML?
    Please describe the application?
    No other significant applications are running on the system.
    I am not using Berkeley DB outside of Berkeley DB XML.
    The application is a web application that organises the data in
    the stored documents into hypercubes that users can slice/dice and analyse.
    Edited by: Geoff Shuetrim on Feb 7, 2009 2:23 PM to correct the appearance of the query plans.

    Hi Geoff,
    Thanks for filling out the performance questionnaire. Unfortunately the forum software seems to have destroyed some of your queries - you might want to use \[code\] and \[/code\] tags to mark up your queries and query plans next time.
    Geoff Shuetrim wrote:
    Current performance involves responses to simple queries that involve 1-2
    minute turn around (this improves after a few similar queries have been run,
    presumably because of caching, but not to a point that is acceptable for
    web applications).
    Desired performance is for queries to execute in milliseconds rather than
    minutes.
    I think that this is a reasonable expectation in most cases.
    14. Please provide your exact Environment Configuration Flags (include
    anything specified in your DB_CONFIG file)
    I do not have a DB_CONFIG file in the database home directory.
    My environment configuration is as follows:
    Threaded = true
    AllowCreate = true
    InitializeLocking = true
    ErrorStream = System.err
    InitializeCache = true
    Cache Size = 1024 * 1024 * 500
    InitializeLogging = true
    Transactional = false
    TrickleCacheWrite = 20
    If you are performing concurrent reads and writes, you need to enable transactions in both the environment and the container.
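    As a rough analogy for why this matters (a toy Python sketch, not the Berkeley DB XML API): a transactional update publishes its changes atomically, so concurrent readers never observe a half-applied write.

    ```python
    import copy

    class SimpleTxn:
        """Toy copy-on-write transaction: mutate a shadow copy and publish
        it atomically on commit, so readers never see a half-applied update.
        This is only an illustration of the idea, not how Berkeley DB
        implements transactions internally."""
        def __init__(self, store):
            self._store = store
            self._shadow = copy.deepcopy(store.data)

        def put(self, key, value):
            # Changes go to the private shadow copy only.
            self._shadow[key] = value

        def commit(self):
            # Publish all changes with a single reference swap.
            self._store.data = self._shadow

    class Store:
        def __init__(self):
            self.data = {}

    store = Store()
    txn = SimpleTxn(store)
    txn.put("doc1", "<fragment/>")
    before = dict(store.data)   # readers still see the old state
    txn.commit()
    after = dict(store.data)    # now the whole update is visible at once
    ```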
    Example queries.
    1. collection('browser')/*[@parentIndex='none']
    <XQuery>
    <QueryPlanToAST>
    <LevelFilterQP>
    <StepQP axis="parent-of-attribute" uri="*" name="*" nodeType="element">
    <ValueQP container="browser" index="node-attribute-equality-string" operation="eq" child="parentIndex" value="none"/>
    </StepQP>
    </LevelFilterQP>
    </QueryPlanToAST>
    </XQuery>
    I have a few initial observations about this query:
    1) It looks like it could return a lot of results - a query that returns a lot of results will always be slow. If you only want a subset of the results, use lazy evaluation, or put an explicit call to the subsequence() function in the query.
    2) An explicit element name with an index on it often performs faster than a "*" step. I think you'll get faster query execution if you specify the document element name rather than "*", and then add a "node-element-presence" index on it.
    3) Generally descendant axis is faster than child axis. If you just need the document rather than the document (root) element, you might find that this query is a little faster (any document with a "parentIndex" attribute whose value is "none"):
    collection()[descendant::*/@parentIndex='none']
    Similar observations apply to the other queries you posted.
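    To illustrate the lazy-evaluation point with a toy sketch (plain Python generators, not the DB XML lazy-results API): an eager query pays for the whole result set up front, while a lazy one only does as much work as the caller consumes.

    ```python
    from itertools import islice

    def run_query_eager(n_results):
        # Materialize every result before returning: the cost grows with
        # the full result set even if the caller only wants the first page.
        return [f"doc-{i}" for i in range(n_results)]

    def run_query_lazy(n_results):
        # Yield results on demand: the caller controls how much work is done.
        for i in range(n_results):
            yield f"doc-{i}"

    # First 10 results of a notionally huge result set; the lazy version
    # never touches results beyond the ones actually consumed.
    first_page = list(islice(run_query_lazy(10_000_000), 10))
    ```

    The subsequence() call in XQuery plays the same role as the islice above: it bounds how much of the result set is ever produced.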
    Get back to me if you're still having problems with specific queries.
    John

  • How do I remove a DB from shared memory in Solaris 10?

    I'm having trouble removing  an in-memory database placed in shared memory. I set SHM key and cache size, and then open an environment with flags: DB_CREATE | DB_SYSTEM_MEM | DB_INIT_MPOOL | DB_INIT_LOG | DB_INIT_LOCK | DB_INIT_TXN. I also set the flag DB_TXN_NOSYNC on the DbEnv. At the end, after closing all Db and DbEnv handles, I create a new DbEnv instance and call DbEnv::remove. That's when things get weird.
    If I have the force flag set to 0, then it throws an exception saying "DbEnv::remove: Device busy". The shared memory segments do not get removed in this case (checked with `ipcs -bom`).
    When the force flag is set to a non-zero value, the shared memory is released, but the program crashes saying "Db::close: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery".
    What am I doing wrong?

    This is curious, since a simple program similar to what is described is known to work. I've modified the standard C++ sample program examples/cxx/EnvExample.cpp to use an in-memory database, DB_SYSTEM_MEM, and DB_TXN_NOSYNC. The "Device busy" symptom occurs if the close of the environment handle is bypassed. I have not been able to reproduce the DB_RUNRECOVERY error.
    How does the program's use of Berkeley DB differ from what is provided in EnvExample.cpp?
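    To illustrate the handle-lifetime rule in the abstract (a toy Python model, nothing to do with the actual Berkeley DB API): an environment with any handle still open cannot be removed cleanly, which is exactly the "Device busy" symptom when a close is bypassed.

    ```python
    class ToyEnv:
        """Toy model of handle lifetime: removal fails while any handle
        is still open, mirroring the 'Device busy' symptom seen when the
        environment close is bypassed. Illustrative only."""
        def __init__(self):
            self.open_handles = 0

        def open_handle(self):
            self.open_handles += 1

        def close_handle(self):
            self.open_handles -= 1

        def remove(self, force=False):
            if self.open_handles > 0 and not force:
                raise RuntimeError("Device busy")
            self.open_handles = 0
            return True

    env = ToyEnv()
    env.open_handle()
    try:
        env.remove()                 # a handle is still open -> busy
        busy_error = False
    except RuntimeError:
        busy_error = True
    env.close_handle()
    removed_cleanly = env.remove()   # all handles closed -> clean removal
    ```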
    Is it possible to send me the relevant portions of it?
    Regards,
    Charles Koester
    Oracle Berkeley DB

  • Contracts - 'Document still being processed in the background; try again la

    Hello,
    When trying to make adjustments to multiple contracts we are receiving the error message 'Document is still being processed in the background; try again later.'  We have tried again later to no effect.
    Ultimately, this means that these contracts (30 altogether) cannot be altered in any way, i.e. held, deleted, released, or approved; they just sit there.
    This also has implications when raising another contract to replace these contracts. For example, if you want to use the same vendor, this will cause great confusion for our users when trying to select the correct contract to spend money against when raising purchase orders. Often the result is the user selecting the wrong contract.
    Have you any suggestions on why this error message is appearing?
    Many Thanks,
    Sarah

    1465740 - SRM document is stuck in the approval process
    Symptom
    Workflow hangs.
    Workflow gets stuck.
    SRM document in status waiting.
    Approval process cannot continue.
    SRM document frozen in status 'awaiting approval'.
    BBP_PD 443 Document is still being processed in the background; Try again later
    BBP_ICON_TEXTS 049 No workflow started. Application error occurred                 
    Environment
    SAP Supplier Relationship Management
    Reproducing the Issue
    Create or change a document in SRM.
    Cause
    Not possible to identify the root cause without deeper analysis.
    Resolution
    Find the object guid of your SRM document via transaction BBP_PD.
    Then display the relevant workflow instance via transaction SWI6 using the following steps:
    Select BOR object type.
    Enter your object type.
    Enter your guid (or document number in the case of shopping cart) in the Key field.
    Select variant All Instances.
    Select selection period All.
    Hit execute.
    Click on icon 'Display workflow log'.
    Click on icon 'List with technical details'.
    Click on icon 'Print Log'.
    Record the date and time of the very last step in the workflow log.
    Using the date and time of the last workflow step, here are some transactions that can be used to investigate the root cause of the problem.
    SM58 - check for stuck remote function calls
    ST22 - check for system dumps
    SM13 - check for stuck update requests
    SM21 - check in the system log for anything unusual
    SWUD - run a consistency check on the workflow
    Enter the task number and hit enter e.g. 14000044
    Choose 'Test Environment' and flag 'Including parts'
    Execute
    Choose 'Consistency check for components' and flag 'Including subworkflows'
    Execute
    Red lights indicate an error in the definition.
    Header Data
    Released on  05.05.2010 09:53:08 
    Release status  Released to Customer 
    Component  SRM-EBP-WFL Workflow 
    Priority  Normal 
    Category  Problem 
