dbe->open and db->put returning ENOMEM

I am running into a problem with bdb 4.7.25 returning ENOMEM on db->put and dbe->open at apparently random times. There are four different environments, two of which are entirely in memory including the logs, and the other two are disk backed with in memory logs. At any one time there are at least 4 processes sharing these environments doing reads and writes. They are all using transactions for atomicity and concurrency. I am not worried about durability; if the application crashes it is acceptable for the databases to be recreated. The in memory dbs have secondary indexes.
The application has a mechanism for recycling these processes: it sends a signal that makes them either exit gracefully (all dbs and dbes are closed) or, depending on the process's function, close and reopen the environment and dbs. The ENOMEMs seem to show up only when this recycle happens. For debugging purposes the recycle interval was set to 10 seconds, and this causes many ENOMEMs to begin popping up. But they do not show up on every dbe->open/db->put call, just some. Having a process exit gracefully when it detects an ENOMEM seems to clear up the problem until another recycle triggers it again.
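To make the recycle mechanism concrete, here is a minimal sketch of the pattern in C. It is not our actual code: the choice of SIGTERM and the names recycle_requested, install_recycle_handler and check_bdb_result are made up for illustration. The idea is that both the signal and an ENOMEM return code set the same flag, and the main loop closes all dbs and dbes and exits once the flag is set.

```c
#include <errno.h>
#include <signal.h>

/* Set by the signal handler, or when a BDB call reports ENOMEM.
 * The main loop is expected to poll this and shut down gracefully. */
static volatile sig_atomic_t recycle_requested = 0;

/* Signal handler: only sets the flag, which is async-signal-safe. */
static void on_recycle_signal(int signo)
{
    (void)signo;
    recycle_requested = 1;
}

/* Install the handler. SIGTERM is an assumption made for this sketch;
 * the real application may use a different signal. Returns 0 on success. */
static int install_recycle_handler(void)
{
    return signal(SIGTERM, on_recycle_signal) == SIG_ERR ? -1 : 0;
}

/* Pass the return code of dbe->open or db->put through this helper.
 * It requests a recycle on ENOMEM and hands the code back unchanged. */
static int check_bdb_result(int ret)
{
    if (ret == ENOMEM)
        recycle_requested = 1;
    return ret;
}
```

A caller would wrap each call, for example check_bdb_result(db->put(db, txn, &key, &data, 0)), and test recycle_requested at the top of its work loop.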
I have serialized the DBE and DB open and close calls with locks. A previous bug required this, as concurrent dbe and/or db open calls were causing problems. But it is possible that one process might be doing a read/write while another process is opening the same environment.
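For reference, the serialization is roughly of the following shape. This is only a sketch under assumptions: it uses an flock() advisory lock on a lock file, and the names with_open_close_lock and example_open_step are hypothetical, not taken from the application. Note that it only excludes other open/close sequences; a process that is in the middle of a read or write does not take this lock.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Run fn(arg) while holding an exclusive advisory lock on lock_path.
 * Returns fn's result, or -1 if the lock file could not be opened
 * or locked. */
static int with_open_close_lock(const char *lock_path,
                                int (*fn)(void *), void *arg)
{
    int fd = open(lock_path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Block until no other process holds the lock; retry if a signal
     * (such as the recycle signal) interrupts the wait. */
    while (flock(fd, LOCK_EX) != 0) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }

    int ret = fn(arg);      /* e.g. the dbe->open / db->open sequence */

    flock(fd, LOCK_UN);
    close(fd);
    return ret;
}

/* Placeholder for the real open or close sequence: it just counts
 * how many times it was run. */
static int example_open_step(void *arg)
{
    int *calls = arg;
    (*calls)++;
    return 0;
}
```

In the real code the callback would perform the dbe->open and db->open calls (or the matching closes) and return their result.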
The ENOMEMs have occurred in both the full in memory dbes as well as the disk backed dbes.
DBE Configuration:
DB_INIT_TXN | DB_INIT_LOCK | DB_INIT_MPOOL | DB_CREATE | DB_TXN_NOSYNC | DB_SYSTEM_MEM
dbe->log_set_config( dbe, DB_LOG_IN_MEMORY, 1 );
DB Configuration:
DB_AUTO_COMMIT | ( DB_CREATE but only at initial startup of the application, the recycling processes don’t set this )
mpf->set_flags(mpf, DB_MPOOL_NOFILE, 1);
I am doing one tricky thing with the db:
db->flags |= DB_AM_NOT_DURABLE
This is an internal flag that I found digging around the source. Without this our disk backed dbs were triggering “file unknown has LSN 3/128061, past end of log at 1/28” errors when the shared memory was cleared between application stops/starts (i.e. reboots) due to the logs being in memory. If there is a better way of handling this I’d love to know.
Also, I occasionally see what seems to be an internal deadlock where all processes are sitting on a futex() call, as seen by strace.
Has anyone seen this behavior before?

Oh, I also forgot to mention that BDB does not log an error through our msg callback when this happens. Using dbe->set_verbose() doesn't give us any clues either as to what is happening when ENOMEM is returned.
