BDB v5.0.73 - EnvironmentFailureException: (JE 5.0.73) JAVA_ERROR
Hi there!
I'm using Berkeley DB Java Edition as a caching layer between two applications (the backend application is quite slow, so we cache some requests). We cache XML data (a single request's XML can be ~4 MB): if an entry already exists we update it, otherwise we insert a new one. There can be many concurrent hits at the same time.
I tested with JMeter: with 10 threads everything works fine, but if I increase to 20 threads the following error occurs:
2013-05-14 15:31:15,914 [ERROR] CacheImpl - error occured while trying to get data from cache.
com.sleepycat.je.EnvironmentFailureException: (JE 5.0.73) JAVA_ERROR: Java Error occurred, recovery may not be possible. fetchTarget of 0x11/0x1d1d parent IN=8 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x1b/0x4cd lastLoggedVersion=0x1b/0x4cd parent.getDirty()=false state=0 [this fetchTarget context is repeated 12 times in the original message]
    at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1507)
    at com.sleepycat.je.Environment.checkEnv(Environment.java:2185)
    at com.sleepycat.je.Environment.beginTransactionInternal(Environment.java:1313)
    at com.sleepycat.je.Environment.beginTransaction(Environment.java:1284)
    at com.ebcont.redbull.bullchecker.cache.impl.CacheImpl.get(CacheImpl.java:157)
    at com.ebcont.redbull.bullchecker.handler.EndpointHandler.doPerform(EndpointHandler.java:132)
    at com.ebcont.redbull.bullchecker.WSCacheEndpointServlet.doPost(WSCacheEndpointServlet.java:86)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:637)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
(The same exception and stack trace are logged again at 15:31:15,939.)
After restarting the server, I get the following error while trying to get data from the cache:
java.lang.OutOfMemoryError: Java heap space
    at com.sleepycat.je.log.LogUtils.readBytesNoLength(LogUtils.java:365)
    at com.sleepycat.je.tree.LN.readFromLog(LN.java:786)
    at com.sleepycat.je.log.entry.LNLogEntry.readBaseLNEntry(LNLogEntry.java:196)
    at com.sleepycat.je.log.entry.LNLogEntry.readEntry(LNLogEntry.java:130)
    at com.sleepycat.je.log.LogManager.getLogEntryFromLogSource(LogManager.java:1008)
    at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:848)
    at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:809)
    at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1412)
    at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1251)
    at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2261)
    at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1466)
    at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1593)
    at com.sleepycat.je.Cursor.retrieveNextAllowPhantoms(Cursor.java:2924)
    at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:2801)
    at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:2775)
    at com.sleepycat.je.Cursor.getNextNoDup(Cursor.java:1244)
    at com.ebcont.redbull.bullchecker.cache.impl.BDBCacheImpl.getStoredKeys(BDBCacheImpl.java:244)
    at com.ebcont.redbull.bullchecker.CacheStatisticServlet.doPost(CacheStatisticServlet.java:108)
    at com.ebcont.redbull.bullchecker.CacheStatisticServlet.doGet(CacheStatisticServlet.java:74)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
My BDB configuration:
environmentConfig.setReadOnly(false);
databaseConfig.setReadOnly(false);
environmentConfig.setAllowCreate(true);
databaseConfig.setAllowCreate(true);
environmentConfig.setTransactional(true);
databaseConfig.setTransactional(true);
environmentConfig.setCachePercent(60);
environmentConfig.setLockTimeout(2000, TimeUnit.MILLISECONDS);
environmentConfig.setCacheMode(CacheMode.DEFAULT);
Environment path: C:/tmp/berkeleydb
Tomcat JVM parameters: initial memory pool 1024 MB, maximum memory pool 2048 MB
Server: Windows Server 2008, 8 GB RAM
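For reference, the write path described above ("update if the entry exists, insert otherwise") can be sketched against the JE API with the transactional settings shown. This is a hedged sketch, not the actual CacheImpl; the class and method names are invented. The key point is that every transaction is completed (commit on success, abort otherwise), so its locks and memory are released even under concurrent load:

```java
import java.io.File;
import com.sleepycat.je.*;

public class XmlCacheSketch {
    private final Environment env;
    private final Database db;

    public XmlCacheSketch(File home) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setTransactional(true);
        env = new Environment(home, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setTransactional(true);
        db = env.openDatabase(null, "xmlCache", dbConfig);
    }

    /** Insert or update one cache entry inside its own short transaction. */
    public void putXml(String requestKey, byte[] xmlBytes) {
        Transaction txn = env.beginTransaction(null, null);
        boolean committed = false;
        try {
            // In a no-duplicates database, put() inserts the key if absent
            // and overwrites the existing record if present, so no separate
            // exists-check is needed.
            db.put(txn,
                   new DatabaseEntry(requestKey.getBytes()),
                   new DatabaseEntry(xmlBytes));
            txn.commit();
            committed = true;
        } finally {
            if (!committed) {
                txn.abort();   // never leave a transaction dangling
            }
        }
    }
}
```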
Hi,
The stack trace shows an OutOfMemoryError (OOME) caused by running out of heap space.
Could you give the exact Java version you are using, the OS, and the JVM options, in particular the max heap size (-Xmx)?
Also, what JE cache size do you use? (If you do not set either MAX_MEMORY or MAX_MEMORY_PERCENT, the JE cache size defaults to 60% of the JVM max heap size.)
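To see what that percentage means for this setup (illustrative arithmetic only, not JE API code): with -Xmx2048m and setCachePercent(60), roughly 1.2 GB of the heap may be claimed by the JE cache, leaving well under 1 GB for everything else, including twenty concurrent ~4 MB XML payloads and Tomcat's own buffers:

```java
// Illustrative arithmetic: heap the JE cache may claim with
// setCachePercent(60) under -Xmx2048m, and what remains for the app.
public class CacheBudget {
    static long jeCacheBytes(long maxHeapBytes, int cachePercent) {
        return maxHeapBytes * cachePercent / 100;
    }

    public static void main(String[] args) {
        long maxHeap = 2048L * 1024 * 1024;            // -Xmx2048m
        long jeCache = jeCacheBytes(maxHeap, 60);      // setCachePercent(60)
        long leftover = maxHeap - jeCache;
        System.out.println(jeCache / (1024 * 1024));   // prints 1228
        System.out.println(leftover / (1024 * 1024));  // prints 819
    }
}
```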
You should look into the way you are using transactions, cursors, etc. You may be using long-running transactions that accumulate a large number of locks, or you may be opening more and more transactions without completing them (by aborting or committing). Is either the case for your application? You can check the lock and transaction statistics using Environment.getStats() and Environment.getTransactionStats(), respectively.
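A minimal sketch of that statistics check (method names are from the JE 5 API as I recall them; adjust for your version). A steadily growing active-transaction count, or cache usage that climbs toward the configured limit and stays there, points at transactions that are never committed or aborted:

```java
import com.sleepycat.je.*;

public class CacheDiagnostics {
    public static void dump(Environment env) {
        StatsConfig cfg = new StatsConfig();
        cfg.setClear(false);   // read the counters without resetting them

        EnvironmentStats envStats = env.getStats(cfg);
        TransactionStats txnStats = env.getTransactionStats(cfg);

        System.out.println("JE cache bytes in use:   " + envStats.getCacheTotalBytes());
        System.out.println("Active transactions:     " + txnStats.getNActive());
        System.out.println("Begun/committed/aborted: "
                + txnStats.getNBegins() + "/"
                + txnStats.getNCommits() + "/"
                + txnStats.getNAborts());
    }
}
```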
Aside from properly ending transactions and closing cursors, you should also examine your cache statistics to understand the memory profile. See the following documentation sections:
http://docs.oracle.com/cd/E17277_02/html/GettingStartedGuide/cachesize.html
http://www.oracle.com/technetwork/database/berkeleydb/je-faq-096044.html#HowcanIestimatemyapplicationsoptimalcachesize
http://www.oracle.com/technetwork/database/berkeleydb/je-faq-096044.html#WhyshouldtheJEcachebelargeenoughtoholdtheBtreeinternalnodes
Regards,
Andrei
Similar Messages
-
LOG_FILE_NOT_FOUND bug possible in current BDB JE?
I've seen references to the LOG_FILE_NOT_FOUND bug in older BDB JE versions (4.x and 5.x <= 5.0.34); however, I seem to be hitting something similar with 5.0.48.
I have a non-transactional, deferred-write DB that seems to have gotten itself into an inconsistent state. It was fine loading several million records, but after ~8 hours of operation, bailed out with:
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.55) /tmp/data/index fetchTarget of 0x9f1/0x24d34eb parent IN=44832 IN class=com.sleepycat.je.tree.BIN lastFullVersion=0xdcf/0x5a96c91 lastLoggedVersion=0xdcf/0x5a96c91 parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1429)
at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1251)
at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2229)
at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1434)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2716)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
at com.sleepycat.je.Cursor.search(Cursor.java:2397)
at com.sleepycat.je.Database.get(Database.java:1042)
at com.xxxx.db.BDBCalendarStorageBackend.indexCalendar(BDBCalendarStorageBackend.java:95)
at com.xxxx.indexer.TicketIndexer.indexDeltaLogs(TicketIndexer.java:201)
at com.xxxx.indexer.DeltaLogLoader.run(DeltaLogLoader.java:87)
Caused by: java.io.FileNotFoundException: /tmp/data/index/000009f1.jdb (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:101)
at com.sleepycat.je.log.FileManager$6.<init>(FileManager.java:1282)
at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:1281)
at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:1147)
at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1102)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:808)
at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:772)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1412)
... 11 more
Subsequent opens/use on the DB pretty much instantly yield the same error. I tried upgrading to 5.0.55 (hence the ver in the output above) but still get the same error.
As a recovery attempt, I used DbDump to try to dump the DB, but it failed with a similar error. Enabling salvage mode let me successfully dump it; however, reloading it into a clean environment by programmatically running DbLoad.load() (so I can set up my env) caused the following error (after about 30% of the DB had been restored):
Exception in thread "main" com.sleepycat.je.EnvironmentFailureException: (JE 5.0.55) Node 11991 should have been split before calling insertEntry UNEXPECTED_STATE: Unexpected internal state, may have side effects. fetchTarget of 0x25/0x155a822 parent IN=2286 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x3e/0x118d8f6 lastLoggedVersion=0x3e/0x118d8f6 parent.getDirty()=false state=0
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:376)
at com.sleepycat.je.tree.IN.insertEntry1(IN.java:2326)
at com.sleepycat.je.tree.IN.insertEntry(IN.java:2296)
at com.sleepycat.je.tree.BINDelta.reconstituteBIN(BINDelta.java:216)
at com.sleepycat.je.tree.BINDelta.reconstituteBIN(BINDelta.java:144)
at com.sleepycat.je.log.entry.BINDeltaLogEntry.getIN(BINDeltaLogEntry.java:53)
at com.sleepycat.je.log.entry.BINDeltaLogEntry.getResolvedItem(BINDeltaLogEntry.java:43)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1422)
at com.sleepycat.je.tree.Tree.searchSubTreeUntilSplit(Tree.java:1786)
at com.sleepycat.je.tree.Tree.searchSubTreeSplitsAllowed(Tree.java:1729)
at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1296)
at com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:2205)
at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:834)
at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:779)
at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2243)
at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2200)
at com.sleepycat.je.Cursor.putNotify(Cursor.java:2117)
at com.sleepycat.je.Cursor.putNoDups(Cursor.java:2052)
at com.sleepycat.je.Cursor.putInternal(Cursor.java:2020)
at com.sleepycat.je.Database.putInternal(Database.java:1324)
at com.sleepycat.je.Database.put(Database.java:1194)
at com.sleepycat.je.util.DbLoad.loadData(DbLoad.java:544)
at com.sleepycat.je.util.DbLoad.load(DbLoad.java:414)
at com.xxxx.db.BDBCalendarStorageBackend.loadBDBDump(BDBCalendarStorageBackend.java:254)
at com.xxxx.cli.BDBTool.run(BDBTool.java:49)
at com.xxxx.cli.AbstractBaseCommand.execute(AbstractBaseCommand.java:114)
at com.xxxx.cli.BDBTool.main(BDBTool.java:69)
The only other slightly exotic thing I'm using is a custom partial BTree comparator; however, it quite happily loaded/updated tens of millions of records for hours before the FileNotFound error cropped up, so it seems unlikely to be the cause.
Any ideas?
Thanks in advance,
fb.
Thanks heaps to Mark for working through this with me.
You're welcome. Thanks for following up and explaining it for the benefit of others. And I'm very glad it wasn't a JE bug!
My solution is to switch to using a secondary database for providing differentiated "uniqueness" vs "ordering".
An index for uniqueness may be a good solution. But as you said in email, it adds significant overhead (memory and disk). This overhead can be minimized by keeping your keys (primary and secondary) as small as possible, and enabling key prefixing.
I'd also like to point out that adding a secondary isn't always the best choice. For example, if the number of keys with the same C1 value is fairly small, another way of checking for uniqueness (when inserting) is to iterate over them, looking for a match on C1:C3. The cost of this iteration may be less than the cost of maintaining a uniqueness index. To make this work, you'll have to use Serializable isolation during the iteration, to prevent another thread from inserting a key in that range.
If you're pushing the performance limits of your hardware, it may be worth trying more than one such approach and comparing the performance. If performance is not a big concern, then the additional index is the simplest approach to get right.
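The iterate-instead-of-index check Mark describes might look roughly like this. It is a sketch against the JE API; the C1/C3 key layout, the string encoding of keys, and the class name are all hypothetical, not from the original post:

```java
import java.nio.charset.StandardCharsets;
import com.sleepycat.je.*;

public class UniquenessScan {
    /** Returns true if some key in the C1 prefix range equals "C1:C3". */
    public static boolean exists(Environment env, Database db, String c1, String c3) {
        TransactionConfig txnCfg = new TransactionConfig();
        // Serializable isolation prevents another thread from inserting a
        // key into the scanned range ("phantom") while we are checking it.
        txnCfg.setSerializableIsolation(true);
        Transaction txn = env.beginTransaction(null, txnCfg);
        Cursor cursor = db.openCursor(txn, null);
        try {
            DatabaseEntry key = new DatabaseEntry(c1.getBytes(StandardCharsets.UTF_8));
            DatabaseEntry data = new DatabaseEntry();
            String wanted = c1 + ":" + c3;
            // Position at the first key >= c1, then walk the prefix range.
            OperationStatus status = cursor.getSearchKeyRange(key, data, LockMode.DEFAULT);
            while (status == OperationStatus.SUCCESS) {
                String k = new String(key.getData(), StandardCharsets.UTF_8);
                if (!k.startsWith(c1)) {
                    break;                  // left the C1 range
                }
                if (k.equals(wanted)) {
                    return true;            // duplicate found
                }
                status = cursor.getNext(key, data, LockMode.DEFAULT);
            }
            return false;
        } finally {
            cursor.close();
            txn.commit();                   // releases the range locks
        }
    }
}
```

As Mark notes, this only pays off when the number of keys sharing the C1 prefix is small; otherwise the per-insert scan (and the range locks it holds) costs more than maintaining a secondary index.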
--mark
-
EnvironmentFailureException thrown while recovering the database!
While recovering the database, an EnvironmentFailureException with LOG_FILE_NOT_FOUND was thrown. Some data was recovered before the exception, but the remaining data cannot be recovered.
I upgraded JE to 4.1.7, but the data still cannot be recovered!
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /home/admin/shopcenter/cdncleaner fetchTarget of 0x64/0x3b8f73 parent IN=8811763 IN class=com.sleepycat.je.tree.IN lastFullVersion=0xffffffff/0xffffffff parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1241)
at com.sleepycat.je.tree.Tree.searchSubTreeInternal(Tree.java:1858)
at com.sleepycat.je.tree.Tree.searchSubTree(Tree.java:1682)
at com.sleepycat.je.tree.Tree.search(Tree.java:1548)
at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2054)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2088)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2058)
at com.sleepycat.je.Cursor.search(Cursor.java:1926)
at com.sleepycat.je.Cursor.getSearchKey(Cursor.java:1351)
at com.sleepycat.util.keyrange.RangeCursor.doGetSearchKey(RangeCursor.java:966)
at com.sleepycat.util.keyrange.RangeCursor.getSearchKey(RangeCursor.java:593)
at com.sleepycat.collections.DataCursor.doGetSearchKey(DataCursor.java:571)
at com.sleepycat.collections.DataCursor.initForPut(DataCursor.java:812)
at com.sleepycat.collections.DataCursor.put(DataCursor.java:752)
at com.sleepycat.collections.StoredContainer.putKeyValue(StoredContainer.java:322)
at com.sleepycat.collections.StoredMap.put(StoredMap.java:280)
at com.taobao.shopservice.picture.core.util.BdbStoredQueueImpl.offer(BdbStoredQueueImpl.java:118)
at com.taobao.shopservice.picture.core.service.CdnClearServiceImpl.clearCdnCache(CdnClearServiceImpl.java:45)
at sun.reflect.GeneratedMethodAccessor484.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at com.taobao.shopservice.common.monitor.ProfileInterceptor.invoke(ProfileInterceptor.java:26)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy74.clearCdnCache(Unknown Source)
at com.taobao.shopservice.picture.core.service.PictureWriteServiceImpl.movePicturesToRecycleBin(PictureWriteServiceImpl.java:302)
at com.taobao.shopservice.picture.core.service.PictureWriteServiceImpl.deletePictures(PictureWriteServiceImpl.java:207)
at sun.reflect.GeneratedMethodAccessor483.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at com.taobao.shopservice.common.monitor.ProfileInterceptor.invoke(ProfileInterceptor.java:26)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy77.deletePictures(Unknown Source)
at sun.reflect.GeneratedMethodAccessor482.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest0(ProviderProcessor.java:222)
at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:174)
at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:41)
at com.taobao.remoting.impl.DefaultMsgListener$1ProcessorExecuteTask.run(DefaultMsgListener.java:131)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: /home/admin/shopcenter/cdncleaner/00000064.jdb (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
at com.sleepycat.je.log.FileManager$1.<init>(FileManager.java:993)
at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:992)
at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:888)
at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1073)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:779)
at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:743)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1225)
... 49 more
2011-03-24 00:00:27,967 INFO [org.quartz.core.JobRunShell] Job DEFAULT.cdnCleanerJobDetail threw a JobExecutionException:
org.quartz.JobExecutionException: Invocation of method 'clearCdn' on target class [class com.taobao.shopservice.picture.core.job.clearcdn.CdnCleaner] failed [See nested exception: com.sleepycat.je.EnvironmentFailureException: (JE 4.0.92) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /home/admin/shopcenter/cdncleaner fetchTarget of 0x64/0x3b8f73 parent IN=8811763 IN class=com.sleepycat.je.tree.IN lastFullVersion=0xffffffff/0xffffffff parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.]
at sun.reflect.GeneratedConstructorAccessor102.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:85)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:283)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
* Nested Exception (Underlying Cause) ---------------
com.sleepycat.je.EnvironmentFailureException: (JE 4.0.92) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /home/admin/shopcenter/cdncleaner fetchTarget of 0x64/0x3b8f73 parent IN=8811763 IN class=com.sleepycat.je.tree.IN lastFullVersion=0xffffffff/0xffffffff parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:197)
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1403)
at com.sleepycat.je.Database.checkEnv(Database.java:1772)
at com.sleepycat.je.Database.openCursor(Database.java:619)
at com.sleepycat.collections.CurrentTransaction.openCursor(CurrentTransaction.java:416)
at com.sleepycat.collections.MyRangeCursor.openCursor(MyRangeCursor.java:54)
at com.sleepycat.collections.MyRangeCursor.<init>(MyRangeCursor.java:30)
at com.sleepycat.collections.DataCursor.init(DataCursor.java:171)
at com.sleepycat.collections.DataCursor.<init>(DataCursor.java:59)
at com.sleepycat.collections.StoredContainer.getValue(StoredContainer.java:301)
at com.sleepycat.collections.StoredMap.get(StoredMap.java:241)
at com.taobao.shopservice.picture.core.util.BdbStoredQueueImpl.peek(BdbStoredQueueImpl.java:131)
at com.taobao.shopservice.picture.core.util.BdbStoredQueueImpl.poll(BdbStoredQueueImpl.java:169)
at com.taobao.shopservice.picture.core.job.clearcdn.CdnCleaner.clearCdn(CdnCleaner.java:194)
at sun.reflect.GeneratedMethodAccessor641.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:283)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:272)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /home/admin/shopcenter/cdncleaner fetchTarget of 0x64/0x3b8f73 parent IN=8811763 IN class=com.sleepycat.je.tree.IN lastFullVersion=0xffffffff/0xffffffff parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1241)
at com.sleepycat.je.tree.Tree.searchSubTreeInternal(Tree.java:1858)
at com.sleepycat.je.tree.Tree.searchSubTree(Tree.java:1682)
at com.sleepycat.je.tree.Tree.search(Tree.java:1548)
at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2054)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2088)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2058)
at com.sleepycat.je.Cursor.search(Cursor.java:1926)
at com.sleepycat.je.Cursor.getSearchKey(Cursor.java:1351)
at com.sleepycat.util.keyrange.RangeCursor.doGetSearchKey(RangeCursor.java:966)
at com.sleepycat.util.keyrange.RangeCursor.getSearchKey(RangeCursor.java:593)
at com.sleepycat.collections.DataCursor.doGetSearchKey(DataCursor.java:571)
at com.sleepycat.collections.DataCursor.initForPut(DataCursor.java:812)
at com.sleepycat.collections.DataCursor.put(DataCursor.java:752)
at com.sleepycat.collections.StoredContainer.putKeyValue(StoredContainer.java:322)
at com.sleepycat.collections.StoredMap.put(StoredMap.java:280)
at com.taobao.shopservice.picture.core.util.BdbStoredQueueImpl.offer(BdbStoredQueueImpl.java:118)
at com.taobao.shopservice.picture.core.service.CdnClearServiceImpl.clearCdnCache(CdnClearServiceImpl.java:45)
at sun.reflect.GeneratedMethodAccessor484.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at com.taobao.shopservice.common.monitor.ProfileInterceptor.invoke(ProfileInterceptor.java:26)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy74.clearCdnCache(Unknown Source)
at com.taobao.shopservice.picture.core.service.PictureWriteServiceImpl.movePicturesToRecycleBin(PictureWriteServiceImpl.java:302)
at com.taobao.shopservice.picture.core.service.PictureWriteServiceImpl.deletePictures(PictureWriteServiceImpl.java:207)
at sun.reflect.GeneratedMethodAccessor483.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
at com.taobao.shopservice.common.monitor.ProfileInterceptor.invoke(ProfileInterceptor.java:26)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at $Proxy77.deletePictures(Unknown Source)
at sun.reflect.GeneratedMethodAccessor482.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest0(ProviderProcessor.java:222)
at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:174)
at com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:41)
at com.taobao.remoting.impl.DefaultMsgListener$1ProcessorExecuteTask.run(DefaultMsgListener.java:131)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: /home/admin/shopcenter/cdncleaner/00000064.jdb (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
at com.sleepycat.je.log.FileManager$1.<init>(FileManager.java:993)
at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:992)
at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:888)
at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1073)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:779)
at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:743)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1225)
... 49 more
I mean that I can open the database and read some data from it (data that was stored before the database was closed), and then the exception is thrown.
here is exception stack with JE 4.1.7:
com.sleepycat.je.EnvironmentFailureException: (JE 4.1.7) F:\job fetchTarget of 0x64/0x4735f7 parent IN=7847269 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x66/0x927f09 parent.getDirty()=false state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1337)
at com.sleepycat.je.tree.IN.fetchTargetWithExclusiveLatch(IN.java:1278)
at com.sleepycat.je.tree.Tree.getNextBinInternal(Tree.java:1358)
at com.sleepycat.je.tree.Tree.getPrevBin(Tree.java:1240)
at com.sleepycat.je.dbi.CursorImpl.getNextWithKeyChangeStatus(CursorImpl.java:1754)
at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1617)
at com.sleepycat.je.Cursor.retrieveNextAllowPhantoms(Cursor.java:2488)
at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:2304)
at com.sleepycat.je.Cursor.getPrev(Cursor.java:1190)
at com.ppsoft.bdb.test.Main.main(Main.java:52)
Caused by: java.io.FileNotFoundException: F:\job\00000064.jdb (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
at com.sleepycat.je.log.FileManager$1.<init>(FileManager.java:995)
at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:994)
at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:890)
at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1074)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:778)
at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:742)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1320)
... 9 more -
EnvironmentFailureException question
Hi,
I'm relatively new to BDB... I've (part-)written a distributed crawler which uses BDB to store a persistent on-disk queue of URIs. This is the setup.
/* Open a transactional Berkeley DB engine environment. */
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);
env = new Environment(envDir, envConfig);
/* Open a transactional entity store. */
StoreConfig storeConfig = new StoreConfig();
storeConfig.setAllowCreate(true);
storeConfig.setTransactional(true);
store = new EntityStore(env, this.getClass().getSimpleName(), storeConfig);
/* Primary index of the queue. */
urlIndex = store.getPrimaryIndex(String.class, URLObject.class);
/* Secondary index of the queue. */
countIndex = store.getSecondaryIndex(urlIndex, Integer.class, "count");
The machine is running Java 1.6.0_12 on Debian 5.0.4. The environment directory is on a local partition (in fact, everything is pretty much local as far as BDB can see).
This setup seems to work quite well. However, after about 30 hours of crawling, one server (of eight) dies with the following exception:
+<DaemonThread name="Cleaner-1"/> caught exception: com.sleepycat.je.EnvironmentFailureException: (JE 4.0.71) /data/webdb/may10/crawl/q java.io.IOException: Input/output error LOG_READ: IOException on read, log is likely invalid. Environment is invalid and must be closed.+
com.sleepycat.je.EnvironmentFailureException: (JE 4.0.71) /data/webdb/may10/crawl/q java.io.IOException: Input/output error LOG_READ: IOException on read, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1516)
at com.sleepycat.je.log.FileReader$ReadWindow.fillFromFile(FileReader.java:1116)
at com.sleepycat.je.log.FileReader$ReadWindow.fillNext(FileReader.java:1074)
at com.sleepycat.je.log.FileReader.readData(FileReader.java:759)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:315)
at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:396)
at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:236)
at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:141)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:161)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Input/output error
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
at com.sleepycat.je.log.FileManager.readFromFileInternal(FileManager.java:1551)
at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1506)
+... 9 more+
Exiting
Followed by a plethora of similar exceptions for different lookup threads:
Exception in thread "LookupThread-XXX" ...
Also, in the je.info.0 file in the environment directory, I found the following:
+SEVERE [data/webdb/may10/crawl/q]Halted log file reading at file 0x997 offset 0x1edcb offset(decimal)=126411 prev=0x1ed84:+
entry=BINDeltatype=22,version=7)
prev=0x1ed84
size=886
Next entry should be at 0x1f14f
I'm generally at a loss as to why this happened. There's no obvious cause (such as running out of disk space), and it seems more that the index is corrupted. At the time of the exception, there is 1.5GB in the environment directory (~150 x 9.6M *.jdb files).
I cannot really reproduce the error/create a test-case, so I don't know if updating JE would help -- I'd like to have as much information on the bug as possible, and have implemented as sure a fix as possible before I try restarting the crawl.
Any help/thoughts greatly appreciated.

Aidan,
From the error message and the je.info logging you showed us, the complaint seems to be about the file /data/webdb/may10/crawl/q/00000997.jdb. But unfortunately, the Java IOException is not telling you much.
What's happened is that our daemon thread, the log cleaner, was going along and hit an IOException when reading that file. We just wrap the IOException and send it back up, and in this case the exception is particularly uninformative ("Input/output error"). Since we toString the exception, I think there's really no further message available. We've noticed that JDKs on different platforms can have more or less informative IOExceptions -- perhaps it depends on what's available underneath.
The string "LOG_READ: IOException on read, log is likely invalid. Environment is invalid and must be closed." is a wrapper from JE, and is potentially a bit alarmist. We take the conservative approach that if anything unexpected goes wrong with a read, we don't want to return bad data, so we shut down the whole environment. Other threads will note the invalidation of the environment and will also shut down. Though it can certainly always be a JE bug, in this case it seems more like some kind of underlying, transient system issue.
JE does detect when it is able to do a read but thinks the data is bad. In that case, you get a checksum exception. This is different, in that something killed off the read operation itself. Could something have sent an interrupt to the process? Though in those cases, we usually see an InterruptedException.
One thing you can do is use the com.sleepycat.je.util.DbPrintLog utility to read just the afflicted file. Despite the JE suggestion of an invalid environment, this smells a little more like a transient platform I/O problem. You could run "java -jar je.jar DbPrintLog -h /data/webdb/may10/crawl/q -s 0x997 > temp.xml" and see whether that file can be read. If it is successful, the utility will dump the contents of the log and you'll see a complete XML file, with no exception at the end. If it can be read, that would reinforce the likelihood that some transient I/O incident occurred, and you may well be able to restart the environment with no problem.
Hope that gives you a start,
Linda -
Hello!
I've upgraded my systems to use 6.0.11 instead of 5.0.97. Everything works fine, storage is 20+ percent more compact and so on, BUT.
1) Under constant load on the production server, after every 25-40 minutes of uptime, the
> (JE 6.0.11) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 6.0.11) /usr/local/ae3/private/data/bdbj-lcl java.lang.AssertionError UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
> com.sleepycat.je.EnvironmentFailureException
> Environment invalid because of previous exception: (JE 6.0.11) /usr/local/ae3/private/data/bdbj-lcl java.lang.AssertionError UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
> com.sleepycat.je.EnvironmentFailureException
> null
> java.lang.AssertionError
: com.sleepycat.je.evictor.LRUEvictor.processTarget(LRUEvictor.java:2014)
: com.sleepycat.je.evictor.LRUEvictor.findParentAndRetry(LRUEvictor.java:2182)
: com.sleepycat.je.evictor.LRUEvictor.processTarget(LRUEvictor.java:2019)
: com.sleepycat.je.evictor.LRUEvictor.evictBatch(LRUEvictor.java:1689)
: com.sleepycat.je.evictor.LRUEvictor.doEvict(LRUEvictor.java:1538)
: com.sleepycat.je.evictor.Evictor$BackgroundEvictTask.run(Evictor.java:739)
: java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
: java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
: java.lang.Thread.run(Thread.java:724)
error appears and floods everything, I have to restart the instance and then it works another 30 minutes approx.
The same thing was happening on all servers, even lightly loaded ones, during the log file upgrade to the new format, though the timing was random. None of the environments bigger than 1G were upgraded 'in one go' - I had to restart several times because of the same error.
2) I had to reduce the number of cleaner threads to 1 (config.setConfigParam( EnvironmentConfig.CLEANER_THREADS, "1" )) - otherwise it was not starting AT ALL; on every database instance it was failing with words like "expect BEING_CLEANED but CLEANED".
I am using the low-level BDB JE API with cursors and byte arrays. No secondary databases. No DPL. Is anyone else experiencing anything like this?

The 2nd issue, regarding "I had to reduce the number of cleaner threads to 1 (config.setConfigParam( EnvironmentConfig.CLEANER_THREADS, "1" )) - otherwise it was not starting AT ALL; on every database instance it was failing with words like 'expect BEING_CLEANED but CLEANED'", is 100% reproducible.
The main issue with com.sleepycat.je.evictor.LRUEvictor is happening on one production server currently (57G data, 3/4 of that rotates per week (new data added, old data cleaned)), but it was happening on other servers while upgrading from previous log format.
My workaround was:
if (environment != null && !environment.isValid()) {
    WorkerBdbj.LOG.event( "BDBJ-WORKER:FAILURE:FATAL",
            "Environment is invalid!",
            Convert.Throwable.toText( new IllegalStateException( "this:" + this + ", env:" + environment ) ) );
    try {
        environment.close();
    } catch (final Throwable t) {
        // ignore
    }
    Runtime.getRuntime().exit( -37 );
}
Thanks for the -da:com.sleepycat.je.evictor.LRUEvictor hint; anyway, I wouldn't have been able to assert it is safe to do so without your reply!
Several times I noticed different stack traces (normally it is like the one in the initial post).
> (JE 6.0.11) JAVA_ERROR: Java Error occurred, recovery may not be possible.
> com.sleepycat.je.EnvironmentFailureException
> null
> java.lang.AssertionError
: com.sleepycat.je.evictor.LRUEvictor.processTarget(LRUEvictor.java:2014)
: com.sleepycat.je.evictor.LRUEvictor.findParentAndRetry(LRUEvictor.java:2182)
: com.sleepycat.je.evictor.LRUEvictor.processTarget(LRUEvictor.java:2019)
: com.sleepycat.je.evictor.LRUEvictor.evictBatch(LRUEvictor.java:1689)
: com.sleepycat.je.evictor.LRUEvictor.doEvict(LRUEvictor.java:1538)
: com.sleepycat.je.evictor.Evictor.doCriticalEviction(Evictor.java:469)
: com.sleepycat.je.dbi.EnvironmentImpl.criticalEviction(EnvironmentImpl.java:2726)
: com.sleepycat.je.dbi.CursorImpl.criticalEviction(CursorImpl.java:624)
: com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:4217)
: com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:4237)
: com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2795)
: com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2647)
: com.sleepycat.je.Cursor.search(Cursor.java:2594)
: com.sleepycat.je.Cursor.search(Cursor.java:2579)
: com.sleepycat.je.Cursor.getSearchKey(Cursor.java:1698)
> (JE 6.0.11) JAVA_ERROR: Java Error occurred, recovery may not be possible.
> com.sleepycat.je.EnvironmentFailureException
> null
> java.lang.AssertionError
: com.sleepycat.je.evictor.LRUEvictor.processTarget(LRUEvictor.java:2014)
: com.sleepycat.je.evictor.LRUEvictor.findParentAndRetry(LRUEvictor.java:2182)
: com.sleepycat.je.evictor.LRUEvictor.processTarget(LRUEvictor.java:2019)
: com.sleepycat.je.evictor.LRUEvictor.evictBatch(LRUEvictor.java:1689)
: com.sleepycat.je.evictor.LRUEvictor.doEvict(LRUEvictor.java:1538)
: com.sleepycat.je.evictor.Evictor.doCriticalEviction(Evictor.java:469)
: com.sleepycat.je.dbi.EnvironmentImpl.criticalEviction(EnvironmentImpl.java:2726)
: com.sleepycat.je.dbi.CursorImpl.criticalEviction(CursorImpl.java:624)
: com.sleepycat.je.dbi.CursorImpl.close(CursorImpl.java:583)
: com.sleepycat.je.Cursor.endMoveCursor(Cursor.java:4269)
: com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2811)
: com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2647)
: com.sleepycat.je.Cursor.search(Cursor.java:2594)
: com.sleepycat.je.Cursor.search(Cursor.java:2579)
: com.sleepycat.je.Cursor.getSearchKeyRange(Cursor.java:1757) -
EnvironmentFailureException on opening EntityStore
Adding a new secondary key field to an entity class made it impossible to open the EntityStore:
com.sleepycat.je.EnvironmentFailureException: (JE 4.1.6) UNEXPECTED_STATE: Unexpected internal state, may have side effects.
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:347)
at com.sleepycat.compat.DbCompat.unexpectedState(DbCompat.java:507)
at com.sleepycat.persist.impl.EnhancedAccessor.newInstance(EnhancedAccessor.java:104)
at com.sleepycat.persist.impl.ComplexFormat.checkNewSecKeyInitializer(ComplexFormat.java:475)
at com.sleepycat.persist.impl.ComplexFormat.initialize(ComplexFormat.java:451)
at com.sleepycat.persist.impl.Format.initializeIfNeeded(Format.java:542)
at com.sleepycat.persist.impl.ComplexFormat.initialize(ComplexFormat.java:334)
at com.sleepycat.persist.impl.Format.initializeIfNeeded(Format.java:542)
at com.sleepycat.persist.impl.ComplexFormat.initialize(ComplexFormat.java:334)
at com.sleepycat.persist.impl.Format.initializeIfNeeded(Format.java:542)
at com.sleepycat.persist.impl.PersistCatalog.init(PersistCatalog.java:454)
at com.sleepycat.persist.impl.PersistCatalog.<init>(PersistCatalog.java:221)
at com.sleepycat.persist.impl.Store.<init>(Store.java:186)
at com.sleepycat.persist.EntityStore.<init>(EntityStore.java:185)
The entity store opens fine if there are no format changes at all, or if there is no @SecondaryKey annotation on the new field. Here is my entity class:
@Entity
public abstract class AbstractMessageEntity implements MessageEntity {
@PrimaryKey
private Long id;
and after adding new secondary key:
@Entity( version = 3 )
public abstract class AbstractMessageEntity implements MessageEntity {
@PrimaryKey
private Long id;
@SecondaryKey( relate = Relationship.MANY_TO_ONE )
private Long executionTime;
AbstractMessageEntity has 3 persistent subclasses, but they were not changed other than increasing their version from 0 to 3.
I would greatly appreciate any workaround for this problem!

Hi,
According to the class information you gave, I assume that you have an abstract class, AbstractMessageEntity, and three subclasses, called MessageEntity1, MessageEntity2, and MessageEntity3. All you want to store are those three subclasses, right?
Before addressing your problem, I want to point out that the annotation on AbstractMessageEntity should be changed from @Entity to @Persistent, because in the DPL you cannot define an entity subclass of another entity class. This is just a reminder, and is not related to your current errors.
Now, back to your error. Actually, I cannot reproduce it. Below is the code I wrote to try to reproduce it:
import static com.sleepycat.persist.model.Relationship.MANY_TO_ONE;
import java.io.File;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.persist.EntityStore;
import com.sleepycat.persist.PrimaryIndex;
import com.sleepycat.persist.StoreConfig;
import com.sleepycat.persist.model.AnnotationModel;
import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.EntityModel;
import com.sleepycat.persist.model.Persistent;
import com.sleepycat.persist.model.PrimaryKey;
import com.sleepycat.persist.model.SecondaryKey;
public class AddNewSecKeyTest {
    private Environment env;
    private EntityStore store;
    private PrimaryIndex<Long, MessageEntity> primary;

    public static void main(String args[]) {
        AddNewSecKeyTest epc = new AddNewSecKeyTest();
        epc.open();
        epc.writeData();
        epc.close();
    }

    private void writeData() {
        primary.put(null, new MessageEntity(1));
    }

    private void getData() {
        AbstractMessageEntity data = primary.get(1L);
    }

    private void close() {
        store.close();
        store = null;
        env.close();
        env = null;
    }

    private void open() {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        File envHome = new File("./");
        env = new Environment(envHome, envConfig);

        EntityModel model = new AnnotationModel();
        StoreConfig config = new StoreConfig();
        config.setAllowCreate(envConfig.getAllowCreate());
        config.setTransactional(envConfig.getTransactional());
        config.setModel(model);
        store = new EntityStore(env, "test", config);
        primary = store.getPrimaryIndex(Long.class, MessageEntity.class);
    }

    @Persistent(version = 3)
    static public abstract class AbstractMessageEntity {
        @PrimaryKey
        private Long id;

        @SecondaryKey(relate = MANY_TO_ONE)
        private Long executionTime;

        AbstractMessageEntity(Long i) {
            this.id = i;
        }

        private AbstractMessageEntity() {}
    }

    @Entity(version = 3)
    static public class MessageEntity extends AbstractMessageEntity {
        private int f1;

        MessageEntity(int i) {
            super(Long.valueOf(i));
            this.f1 = i;
        }

        private MessageEntity() {}
    }
}
However, the above code runs successfully on my sandbox (with JE 4.1.6). I don't know how much difference there is between my code and yours (I mean the class hierarchy). So please post your class hierarchy relative to my code, and also the code showing how you serialize your class.
Thanks.
Eric Wang
BDB JE Team -
[bdb bug] repeatedly opening and closing a db may cause a memory leak
my test code is very simple:
char *filename = "xxx.db";
char *dbname = "xxx";

for (;;) {
    DB *dbp;
    DB_TXN *txnp;

    db_create(&dbp, dbenvp, 0);
    dbenvp->txn_begin(dbenvp, NULL, &txnp, 0);
    ret = dbp->open(dbp, txnp, filename, dbname, DB_BTREE, DB_CREATE, 0);
    if (ret != 0) {
        printf("failed to open db: %s\n", db_strerror(ret));
        return 0;
    }
    txnp->commit(txnp, 0);
    dbp->close(dbp, DB_NOSYNC);
}
I ran my test program for a long time, opening and closing the db repeatedly, then used the ps command and found that the RSS increases slowly:
ps -va
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
1986 pts/0 S 0:00 466 588 4999 980 0.3 -bash
2615 pts/0 R 0:01 588 2 5141 2500 0.9 ./test
after a few minutes:
ps -va
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
1986 pts/0 S 0:00 473 588 4999 976 0.3 -bash
2615 pts/0 R 30:02 689 2 156561 117892 46.2 ./test
I had read BDB's source code before, so I tried to debug this for about a week and found something that looks like a bug:
If you open a db with both a filename and a dbname, BDB will open one db handle for the master db and one for the subdb. Both handles get a file id from an internal API called __dbreg_get_id, but only the subdb's id is returned to BDB's log region by calling __dbreg_pop_id. This leads to an id leak when the db is opened and closed repeatedly; as a result, __dbreg_add_dbentry calls realloc repeatedly to enlarge the dbentry area, which seems to be the reason for the RSS increase.
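To make the suspected mechanism concrete, here is a toy model of the behavior described above (illustrative Java, not BDB source; DbregModel and its method names are invented for this sketch): an ID registry whose backing table doubles as needed. If one of the two IDs taken per open is never recycled, the table can only grow, mirroring the RSS growth the poster observed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (NOT BDB code): a registry that hands out slot IDs for open
// handles, growing its backing table when a fresh ID exceeds the capacity.
public class DbregModel {
    private Object[] entries = new Object[4];
    private int nextId = 0;
    private final Deque<Integer> freeIds = new ArrayDeque<Integer>();

    public int openHandle(Object handle) {
        // Reuse a recycled ID if one exists; otherwise take a fresh one.
        int id = freeIds.isEmpty() ? nextId++ : freeIds.pop();
        if (id >= entries.length) {
            // Analogous to __dbreg_add_dbentry enlarging the dbentry area.
            Object[] bigger = new Object[entries.length * 2];
            System.arraycopy(entries, 0, bigger, 0, entries.length);
            entries = bigger;
        }
        entries[id] = handle;
        return id;
    }

    // The suspected bug: only the subdb's ID is recycled on close, so for
    // the master handle the equivalent of __dbreg_pop_id is never called.
    public void closeHandle(int id, boolean recycle) {
        entries[id] = null;
        if (recycle) {
            freeIds.push(id);
        }
    }

    public int tableSize() {
        return entries.length;
    }
}
```

Running an open/close loop with recycling keeps the table at its initial size, while the same loop without recycling forces the table to grow without bound, which matches the realloc-driven RSS growth described in the report.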
Is this not a bug?
Sorry for my poor English :)
Edited by: user9222236 on 2010-2-25 10:38 PM

I have tested my program using Oracle Berkeley DB releases 4.8.26 and 4.7.25 on Red Hat 9.0 (kernel 2.4.20-8smp on an i686) and AIX version 5.
The problem is easy to reproduce by calling the open method of the db handle with both filename and dbname specified, and then calling the close method.
My program is very simple:
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include "db.h"
int main(int argc, char *argv[])
{
    int ret, count;
    DB_ENV *dbenvp;
    char *filename = "test.dbf";
    char *dbname = "test";

    db_env_create(&dbenvp, 0);
    dbenvp->open(dbenvp, "/home/bdb/code/test/env",
                 DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_TXN | DB_INIT_MPOOL, 0);

    for (count = 0; count < 10000000; count++) {
        DB *dbp;
        DB_TXN *txnp;

        db_create(&dbp, dbenvp, 0);
        dbenvp->txn_begin(dbenvp, NULL, &txnp, 0);
        ret = dbp->open(dbp, txnp, filename, dbname, DB_BTREE, DB_CREATE, 0);
        if (ret != 0) {
            printf("failed to open db: %s\n", db_strerror(ret));
            return 0;
        }
        txnp->commit(txnp, 0);
        dbp->close(dbp, DB_NOSYNC);
    }

    dbenvp->close(dbenvp, 0);
    return 0;
}
DB_CONFIG is like below:
set_cachesize 0 20000 0
set_flags db_auto_commit
set_flags db_txn_nosync
set_flags db_log_inmemory
set_lk_detect db_lock_minlocks
Edited by: user9222236 on 2010-2-28 5:42 PM
Edited by: user9222236 on 2010-2-28 5:45 PM -
How feasible would it be to DIY BDB JE encryption
Hello All,
I'm aware that BDB JE won't be supporting encryption.
However, if I wanted to be bold/foolish enough to implement encryption myself for my project, what would the options be? I have encryption code (http://www.jasypt.org/). I have a small BDB JE database of less than a megabyte and plenty of RAM.
Our client wants to host an application that deals with healthcare records with a 3rd-party host and comply with HIPAA and departmental security and encryption policies. A hosting provider is handling the operating system, and I cannot, in good faith, promise the client that their hosting provider wouldn't screw up an installation of the BDB native/C database.
I see 2 options for this:
1. Encrypting the payload and leaving the PK indexes unencrypted. This complies with regulations, but removes the query benefits of using BDB (we wouldn't be able to index confidential fields). This also makes the people we answer to nervous. I'd rather not do it this way.
2. Doing all database operations in memory and manually saving, in encrypted form, to disk periodically as well as on shutdown. The app would decrypt from file on startup. I'd be interested in pursuing this if it is the best option.
So I'll ask:
1. Is there a strategy I didn't think of that would encrypt the database more reliably? Is there an API in the DB that I didn't think of with which I could easily ensure encryption?
2. Can the DB be run from memory? I assume it'd perform quite well. Would its memory usage be reasonable? (I have < 1MB of data and 0.75 GB of RAM for a small JEE app.)
3. If the DB can be run from memory, is there any reason why that would be a terrible idea? ...beyond the 2 obvious concerns of the app shutting down without writing to disk, and storing more data than I have RAM to allocate (I have a half gig of RAM, after app startup, to store 500k or so of data).
Any strategic guidance would be greatly appreciated. I can implement the app in BDB 3.x or 4.x beta.
Thanks,
Steven
PS: I realize that in the grand scheme, my strategy is flawed from the beginning....dealing with super-secret data on a 3rd party host, who I assume is barely competent, with a very low budget, but in this economy, we're happy to be employed. :) If this is just too much of a square peg in a round hole problem, I'll just use serialization and the collections API for storage and encrypt manually in the same way described above.
An embedded, encrypted BDB JE install would be a great problem to solve as I like working with BDB JE much more than the BDB C version or a JPA + RDBMS solution, but am working with patient data for my next few projects.
Edited by: JavaGeek_Boston on Oct 2, 2009 1:59 PM

My confusion about "I'll just use serialization and the collections API for storage and encrypt manually in the same way" is that I thought you meant the JE collections API. You meant the built-in Java collections in java.util.
I don't think I have anything to add about the pure in-memory transitional approach you described.
No, there are no built-in APIs that would work to implement full encryption in JE. You can either encrypt each data record (the payload, as you're calling it), or encrypt all keys and data records individually. If you encrypt keys, then of course you lose sorting. If you decide you want to do this, I can make suggestions about how to do it with bindings.
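The record-level approach Mark mentions can be sketched in plain Java. This is an illustrative sketch only, not JE code: the PayloadCrypto class and its method names are invented here, key management is out of scope, and the wiring into a com.sleepycat.bind.EntryBinding is omitted.

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: encrypts a record payload with AES before it is
// handed to the store, and decrypts it on the way back out.
public class PayloadCrypto {
    private final SecretKeySpec key;

    public PayloadCrypto(byte[] rawKey) {
        // rawKey must be a valid AES key length (16, 24, or 32 bytes).
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    // Prepend a random IV so each ciphertext is self-contained.
    public byte[] encrypt(byte[] plain) throws Exception {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = new byte[16 + ct.length];
        System.arraycopy(iv, 0, out, 0, 16);
        System.arraycopy(ct, 0, out, 16, ct.length);
        return out;
    }

    public byte[] decrypt(byte[] stored) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(stored, 0, 16));
        return c.doFinal(stored, 16, stored.length - 16);
    }
}
```

A custom binding would call encrypt() in objectToEntry() and decrypt() in entryToObject(), leaving the primary key in the clear so Btree ordering and lookups still work, which is exactly the trade-off described as option 1 above.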
For DIY JE encryption, you would have to change the JE implementation. I suggest an off-line email discussion with Charles and myself if you want to explore that option.
--mark -
How to let two programs access the same BDB data
I want to use two programs to access data in one BDB database, but I get this error:
multiple databases specified but not supported by file
db open failed: Invalid argument
I do not know how to deal with it. Can anyone help? Thank you in advance.

Hello,
Can you clarify a bit more as to what you are doing?
A message like:
multiple databases specified but not supported by file
open: Invalid argument
is expected when you are incorrectly creating multiple databases
within a single physical file.
The details are at:
http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am/opensub.html
http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/db_open.html
It is possible to contain multiple databases in a single physical file
but the application has to follow some steps in order to do that.
For example, you cannot open a second database in a file that was not
initially created using a database name. Doing so will generate
the error you posted.
Thanks,
Sandra -
Are there any plans to migrate the BDB project to Visual Studio 2005?
The problem is when BDB is linked into an application built with VS8, deleting some BDB objects results in mixing memory managers and application crashes.
That is, if BDB is built with VS6, the BDB DLL will use VS6's new and delete operators. If the application that uses BDB is built with VS8 its objects will be allocated using VS8's new and delete:
DbSequence *seq = new DbSequence(db, 0);
delete seq;
The first line calls the application's operator new and then the DbSequence constructor. The second line, on the other hand, calls the destructor, which in turn calls operator delete (Microsoft's "scalar deleting destructor"). The destructor resides in the BDB DLL, so the delete call is made in the context of the DLL, trying to delete a memory block that was allocated by the executable.

Thank you for the reply, Michael.
I finally have been able to work around all inter-DLL dependencies between the VS6 and VS8 builds. You may want to mention in the README that if the C++ interface is used, the application is expected to be built with the same compiler used to build BDB, even if custom callbacks for malloc, realloc, and free are used in the code.
The BDB project builds partially in VS2005. A few subprojects fail by default, and the db_dll project requires a couple of small fixes. For example, advapi32.lib is missing from the list of input libraries in the db_dll project (needed for SetSecurityDescriptorDacl and InitializeSecurityDescriptor). The resulting DLL seems to work fine.
Also, applications that build with 32-bit time_t will fail to link. It's easy to fix, but I wonder if this will have any effect on the format of the database.
Andre -
Poor performance of the BDB cache
I'm experiencing incredibly poor performance of the BDB cache and wanted to share my experience, in case anybody has any suggestions.
Overview
Stone Steps maintains a fork of a web log analysis tool, The Webalizer (http://www.stonesteps.ca/projects/webalizer/). One of the problems with the Webalizer is that it maintains all data (i.e. URLs, search strings, IP addresses, etc.) in memory, which puts a cap on the maximum size of the data set that can be analyzed. Naturally, BDB was picked as the fastest database to maintain the analyzed data set on disk and produce reports by querying the database. Unfortunately, once the database grows beyond the cache size, overall performance goes down the drain.
Note that the version of SSW available for download does not support BDB in the way described below. I can make the source available for you, however, if you find your own large log files to analyze.
The Database
Stone Steps Webalizer (SSW) is a command-line utility and needs to preserve all intermediate data for the month on disk. The original approach was to use a plain-text file (webalizer.current, for those who know anything about SSW). The BDB database that replaced this plain text file consists of the following databases:
sequences (maintains record IDs for all other tables)
urls - primary database containing URL data: record ID (key), the URL itself, and grouped data such as number of hits, transfer size, etc.
urls.values - secondary database that contains a hash of the URL (key) and the record ID linking it to the primary database; this database is used for value lookups.
urls.hits - secondary database that contains the number of hits for each URL (key) and the record ID to link it to the primary database; this database is used to order URLs in the report by the number of hits.
The remaining databases are listed here just to indicate the database structure. They are the same in nature as the two described above. The legend is as follows: (s) indicates a secondary database, (p) a primary database, and (sf) a filtered secondary database (using DB_DONOTINDEX).
urls.xfer (s), urls.entry (s), urls.exit (s), urls.groups.hits (sf), urls.groups.xfer (sf)
hosts (p), hosts.values (s), hosts.hits (s), hosts.xfer (s), hosts.groups.hits (sf), hosts.groups.xfer (sf)
downloads (p), downloads.values (s), downloads.xfer (s)
agents (p), agents.values (s), agents.hits (s), agents.visits (s), agents.groups.visits (sf)
referrers (p), referrers.values (s), referrers.hits (s), referrers.groups.hits (sf)
search (p), search.values (s), search.hits (s)
users (p), users.values (s), users.hits (s), users.groups.hits (sf)
errors (p), errors.values (s), errors.hits (s)
dhosts (p), dhosts.values (s)
statuscodes (HTTP status codes)
totals.daily (31 days)
totals.hourly (24 hours)
totals (one record)
countries (a couple of hundred countries)
system (one record)
visits.active (active visits - variable length)
downloads.active (active downloads - variable length)
All these databases (49 of them) are maintained in a single file. Maintaining a single database file is a requirement, so that the entire database for the month can be renamed, backed up and used to produce reports on demand.
Database Size
One of the sample Squid logs I received from a user contains 4.4M records and is about 800MB in size. The resulting database is 625MB in size. Note that there is no duplication of text data - only nodes and such values as hits and transfer sizes are duplicated. Each record also contains some small overhead (record version for upgrades, etc).
Here are the sizes of the URL databases (other URL secondary databases are similar to urls.hits described below):
urls (p):
8192 Underlying database page size
2031 Overflow key/data size
1471636 Number of unique keys in the tree
1471636 Number of data items in the tree
193 Number of tree internal pages
577738 Number of bytes free in tree internal pages (63% ff)
55312 Number of tree leaf pages
145M Number of bytes free in tree leaf pages (67% ff)
2620 Number of tree overflow pages
16M Number of bytes free in tree overflow pages (25% ff)
urls.hits (s):
8192 Underlying database page size
2031 Overflow key/data size
2 Number of levels in the tree
823 Number of unique keys in the tree
1471636 Number of data items in the tree
31 Number of tree internal pages
201970 Number of bytes free in tree internal pages (20% ff)
45 Number of tree leaf pages
243550 Number of bytes free in tree leaf pages (33% ff)
2814 Number of tree duplicate pages
8360024 Number of bytes free in tree duplicate pages (63% ff)
0 Number of tree overflow pages
The Testbed
I'm running all these tests using the latest BDB (v4.6), built from source, on a Win2K3 server (release build). The test machine is a 1.7GHz P4 with 1GB of RAM and an IDE hard drive. Not the fastest machine, but it was able to handle a log file like the one described above at a speed of 20K records/sec.
BDB is configured in a single file in a BDB environment, using private memory, since only one process ever has access to the database.
I ran a performance monitor while running SSW, capturing private bytes, disk read/write I/O, system cache size, etc.
I also used a code profiler to analyze SSW and BDB performance.
The Problem
Small log files, such as 100MB, can be processed in no time - BDB handles them really well. However, once the entire BDB cache is filled up, the machine goes into some weird state and can sit in this state for hours and hours before completing the analysis.
Another problem is that traversing large primary or secondary databases is a really slow and painful process. It is really not that much data!
Overall, the 20K rec/sec quoted above drops to 2K rec/sec. And that happens after most of the analysis has been done, while just trying to save the database.
The Tests
SSW runs in two modes, memory mode and database mode. In memory mode, all data is kept in memory in SSW's own hash tables and then saved to BDB at the end of each run.
In memory mode, the entire BDB is dumped to disk at the end of the run. At first, it runs fairly fast, until the BDB cache is filled up. Then writing (disk I/O) goes at a snail's pace, at about 3.5MB/sec, even though this disk can write at about 12-15MB/sec.
Another problem is that the OS cache gets filled up, chewing through all available memory long before completion. In order to deal with this problem, I disabled the system cache using the DB_DIRECT_DB/LOG options. I could see the OS cache left alone, but once the BDB cache was filled up, processing speed all but stopped.
Then I flipped options and used the DB_DSYNC_DB/LOG options to disable OS disk buffering. This improved overall performance: even though the OS cache was filling up, it was also being flushed, and SSW eventually finished processing this log at 2K rec/sec. At least it finished, though - other combinations of these options led to never-ending tests.
In the database mode, stale data is put into BDB after processing every N records (e.g. 300K rec). In this mode, BDB behaves similarly - until the cache is filled up, the performance is somewhat decent, but then the story repeats.
Some of the other things I tried/observed:
* I tried to experiment with the trickle option. In all honesty, I hoped that this would be the solution to my problems - trickle some, make sure it's on disk and then continue. Well, trickling was pretty much useless and didn't make any positive impact.
* I disabled threading support, which gave me some performance boost during regular value lookups throughout the test run, but it didn't help either.
* I experimented with page sizes, ranging from the default 8K to 64K. Using large pages helped a bit, but as soon as the BDB cache filled up, the story repeated.
* The Db.put method, which was called 73,557 times while profiling the database save at the end, took 281 seconds. Interestingly enough, this method called the Win32 ReadFile function 20,000 times, taking 258 seconds. The majority of the Db.put time was wasted on looking up the records being updated! These lookups seem to be the true problem here.
* I tried libHoard - it usually provides better performance, even in a single-threaded process, but libHoard didn't help much in this case.
I have been able to improve processing speed up to 6-8 times with these two techniques:
1. A separate trickle thread was created that would periodically call DbEnv::memp_trickle. This works especially well on multicore machines, but also speeds things up a bit on single-CPU boxes. This alone improved speed from 2K rec/sec to about 4K rec/sec.
2. Maintaining multiple secondary databases in real time proved to be the bottleneck. The code was changed to create the secondary databases at the end of the run (calling Db::associate with the DB_CREATE flag), right before the reports that use them are generated. This improved speed from 4K rec/sec to 14K rec/sec.
Hello Stone,
I am facing a similar problem, and I too hope to resolve it with memp_trickle. I have these queries:
1. What percentage of clean pages did you specify?
2. At what interval was your thread calling memp_trickle?
This would give me a rough idea of how to tune my app. I would really appreciate it if you can answer these queries.
Regards,
Nishith. -
The chapter about php_db4 in the BDB documentation needs to be updated
Hi all,
I think the chapter about php_db4 in the BerkeleyDB documentation should be updated.
The first sentence, "A PHP 4 extension for this release of Berkeley DB...", gave me the impression that the extension can ONLY be run with PHP 4. I've since learned that DBXML's PHP extension can run with PHP 5, so BDB's extension should work with PHP 5 too. (Perhaps I'm not clever enough, but it did give me the wrong idea.)
The documentation is too brief to let everyone know how to solve problems in the compile process (and such subjects are hard to find through search engines).
I had compiled BDB 4.5 on my Fedora Core 6. When I built php_db4, it could not finish normally (it failed with errors). When I added CPPFLAGS=-DHAVE_CXX_STDHEADERS, the problem was solved - this should be written somewhere in the documentation, or at least in the INSTALL file, right?
In the INSTALL file, it says "PHP can itself be forced to link against libpthread either by manually editing its build files (which some distributions do), or by building it with --with-experimental-zts". But I can't find this option in the configure script of PHP 5.2.1. Only an option '--enable-maintainer-zts' can be found, and it is noted as 'for code maintainers only' (so it should not be enabled by less experienced end users, should it?). PHP 5.2.1's TSRM is pthreads-enabled by default, and I don't know whether I should follow the note about pthreads or not.
Anyway, I can use the native API of BerkeleyDB in my PHP code now. Thanks to the developers! I hope the documentation can be updated with more directives, so new users of php_db4 can use it more smoothly :-)
nikkap wrote:
I haven't updated to the latest software because I didn't want my google maps to change to the new version they made with poorer directions.
To each their own, but you can always install the Google Maps app.
nikkap wrote:
I also didn't want to lose battery charge quicker and didn't want certain functions to cease working like wifi which I've heard has been a problem. It just seems that since my phone is so 'old' that updating it could cause more harm than good.
Nothing is further from the truth. Installing updates does not cause hardware issues or hardware to stop working. Anyone who tells you otherwise is completely ignorant.
I (and millions of others) have installed iOS updates with little or no adverse effects. The vast majority of issues that arise during an iOS update can be rectified with basic troubleshooting from the User's Guide.
There is no legitimate reason not to update iOS, which adds new features and fixes security issues.
nikkap wrote:
My other problem is that someone proxied the call to AT&T to obtain the unlock, so I don't know exactly what happened - whether they actually agreed to unlock, lied instead, or did something else. And I don't have the account information for AT&T since it's a group account, so I'm SOL.
Well, there you go. You need to contact AT&T or get the correct information and go to AT&T's website and request the unlock.
There are no codes, as you've already found out.
Once the unlock has been approved, AT&T will email you to let you know it has been processed and that the next step is to restore the device. -
Segfault in __lock_get_internal using BDB 4.7.25
Hi,
I am having trouble finding the root cause of a segfault. The program generating the fault uses both the bdb and repmgr APIs; the segfault happens in a bdb call.
Here is a quick run-down of the problem. My test is set up with two nodes. The master node is started first, then queried by a client program. Then a client node is started. It replicates the database successfully, then is queried by the same client program. Each node is asked to perform two database gets; the first completes, but the second causes the segfault, and only on the client node.
Each node is configured the same, except that the client node closes and re-opens the database after the synchronization is done.
I would appreciate any insight into what could be causing my problem; I've noted the segfault occurs during a lock acquisition. The program is multi-threaded, but I enable the database handle to be thread-safe.
I've included an example of the API calls made to setup each environment, a backtrace from the client corefile, and the verbose output from both nodes during the run.
h5. Node Configuration Example
int master_port = 10001;
int client_port = 10002;
DB_ENV *env;
DB *db;
int env_flags = DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | DB_INIT_REP | DB_RECOVER | DB_THREAD;
int db_flags = DB_CREATE | DB_AUTO_COMMIT | DB_THREAD;
db_env_create(&env, 0);
env->set_lk_detect(env, DB_LOCK_DEFAULT);
if (master)
    env->repmgr_set_local_site(env, "localhost", master_port, 0);
else
    env->repmgr_set_local_site(env, "localhost", client_port, 0);
/*
 * The DB_REPMGR_PEER flag seems useless in this example, but the actual
 * design allows a client to peer with another client.
 */
if (master)
    env->repmgr_add_remote_site(env, "localhost", 0, NULL, DB_REPMGR_PEER);
else
    env->repmgr_add_remote_site(env, "localhost", master_port, NULL, DB_REPMGR_PEER);
if (master)
    env->open(env, "/tmp/dbs_m", env_flags, 0);
else
    env->open(env, "/tmp/dbs_c", env_flags, 0);
db_create(&db, env, 0);
db->open(db, NULL, "DB", NULL, DB_BTREE, db_flags, 0);
env->repmgr_start(env, 3, DB_REP_ELECTION);
h5. GDB backtrace
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Core was generated by `./dbserver/dbserver bootstrap=localhost:24050 address=localhost:17000 -'.
Program terminated with signal 11, Segmentation fault.
[New process 685]
#0 0x0814239f in __lock_get_internal (lt=0x40140868, sh_locker=0x4032d508, flags=0, obj=0x81f7108, lock_mode=DB_LOCK_READ,
timeout=0, lock=0xbe7ff5bc) at ../dist/../lock/lock.c:586
586 OBJECT_LOCK(lt, region, obj, lock->ndx);
(gdb) bt full
#0 0x0814239f in __lock_get_internal (lt=0x40140868, sh_locker=0x4032d508, flags=0, obj=0x81f7108, lock_mode=DB_LOCK_READ,
timeout=0, lock=0xbe7ff5bc) at ../dist/../lock/lock.c:586
newl = (struct __db_lock *) 0x0
lp = (struct __db_lock *) 0x40142e48
env = (ENV *) 0x40140860
sh_obj = (DB_LOCKOBJ *) 0x0
region = (DB_LOCKREGION *) 0x40140880
ip = (DB_THREAD_INFO *) 0x40142e48
ndx = 3196058196
part_id = 1074222655
did_abort = 1073875436
ihold = 0
grant_dirty = 1075064392
no_dd = 0
ret = 0
t_ret = 1073875436
holder = 1075064392
sh_off = 0
action = 3196056724
#1 0x08141da2 in __lock_get (env=0x401407f0, locker=0x4032d508, flags=0, obj=0x81f7108, lock_mode=DB_LOCK_READ,
lock=0xbe7ff5bc) at ../dist/../lock/lock.c:456
lt = (DB_LOCKTAB *) 0x40140868
ret = 0
#2 0x08181674 in __db_lget (dbc=0x81f7080, action=0, pgno=1075054832, mode=DB_LOCK_READ, lkflags=0, lockp=0xbe7ff5bc)
at ../dist/../db/db_meta.c:1035
dbp = (DB *) 0x401407e0
couple = {{op = DB_LOCK_DUMP, mode = DB_LOCK_NG, timeout = 3196058052, obj = 0x400546b8, lock = {off = 32, ndx = 0,
gen = 0, mode = DB_LOCK_NG}}, {op = 136380459, mode = 3196057632, timeout = 3196057624, obj = 0xbe7ff9c4, lock = {
off = 3196057576, ndx = 0, gen = 1073916640, mode = DB_LOCK_NG}}, {op = 1073875800, mode = 3196057972, timeout = 0,
obj = 0x0, lock = {off = 35, ndx = 66195, gen = 3196057576, mode = 43}}}
reqp = (DB_LOCKREQ *) 0x0
txn = (DB_TXN *) 0x0
env = (ENV *) 0x401407f0
has_timeout = 0
i = 0
ret = -1
#3 0x080d8f9e in __bam_get_root (dbc=0x81f7080, pg=1075054832, slevel=1, flags=1409, stack=0xbe7ff6a8)
at ../dist/../btree/bt_search.c:94
cp = (BTREE_CURSOR *) 0x8259248
dbp = (DB *) 0x401407e0
lock = {off = 1073709056, ndx = 510075, gen = 2260372568, mode = 3758112764}
mpf = (DB_MPOOLFILE *) 0x401407f8
h = (PAGE *) 0x0
lock_mode = DB_LOCK_READ
ret = 89980928
t_ret = 134764095
#4 0x080d9407 in __bam_search (dbc=0x81f7080, root_pgno=1075054832, key=0xbe7ffa6c, flags=1409, slevel=1, recnop=0x0,
---Type <return> to continue, or q <return> to quit---
exactp=0xbe7ff8b0) at ../dist/../btree/bt_search.c:203
t = (BTREE *) 0x401408f8
cp = (BTREE_CURSOR *) 0x8259248
dbp = (DB *) 0x401407e0
lock = {off = 0, ndx = 0, gen = 0, mode = DB_LOCK_NG}
mpf = (DB_MPOOLFILE *) 0x401407f8
env = (ENV *) 0x401407f0
h = (PAGE *) 0x0
base = 0
i = 0
indx = 0
inp = (db_indx_t *) 0x0
lim = 0
lock_mode = DB_LOCK_NG
pg = 0
recno = 0
adjust = 0
cmp = 0
deloffset = 0
ret = 0
set_stack = 0
stack = 0
t_ret = 0
func = (int (*)(DB *, const DBT *, const DBT *)) 0
#5 0x0819b1d1 in __bamc_search (dbc=0x81f7080, root_pgno=0, key=0xbe7ffa6c, flags=26, exactp=0xbe7ff8b0)
at ../dist/../btree/bt_cursor.c:2501
t = (BTREE *) 0x401408f8
cp = (BTREE_CURSOR *) 0x8259248
dbp = (DB *) 0x401407e0
h = (PAGE *) 0x0
indx = 0
inp = (db_indx_t *) 0x0
bt_lpgno = 0
recno = 0
sflags = 1409
cmp = 0
ret = 0
t_ret = 0
#6 0x08196ff7 in __bamc_get (dbc=0x81f7080, key=0xbe7ffa6c, data=0xbe7ffa50, flags=26, pgnop=0xbe7ff93c)
at ../dist/../btree/bt_cursor.c:970
cp = (BTREE_CURSOR *) 0x8259248
dbp = (DB *) 0x401407e0
mpf = (DB_MPOOLFILE *) 0x401407f8
orig_pgno = 0
orig_indx = 0
exact = 1075236764
newopd = 1
---Type <return> to continue, or q <return> to quit---
ret = 136272648
#7 0x0816f6fc in __dbc_get (dbc_arg=0x81f7080, key=0xbe7ffa6c, data=0xbe7ffa50, flags=26) at ../dist/../db/db_cam.c:700
dbp = (DB *) 0x401407e0
dbc = (DBC *) 0x0
dbc_n = (DBC *) 0x81f7080
opd = (DBC *) 0x0
cp = (DBC_INTERNAL *) 0x8259248
cp_n = (DBC_INTERNAL *) 0x0
mpf = (DB_MPOOLFILE *) 0x401407f8
env = (ENV *) 0x401407f0
pgno = 0
indx_off = 0
multi = 0
orig_ulen = 0
tmp_flags = 0
tmp_read_uncommitted = 0
tmp_rmw = 0
type = 64 '@'
key_small = 0
ret = 136268720
t_ret = -1098909244
#8 0x0817a1ac in __db_get (dbp=0x8258bb0, ip=0x0, txn=0x0, key=0xbe7ffa6c, data=0xbe7ffa50, flags=26)
at ../dist/../db/db_iface.c:760
dbc = (DBC *) 0x81f7080
mode = 0
ret = 0
t_ret = 1075208764
#9 0x08179f6c in __db_get_pp (dbp=0x8258bb0, txn=0x0, key=0xbe7ffa6c, data=0xbe7ffa50, flags=0)
at ../dist/../db/db_iface.c:684
ip = (DB_THREAD_INFO *) 0x0
env = (ENV *) 0x81f4bb0
mode = 0
handle_check = 1
ignore_lease = 0
ret = 0
t_ret = 1073880126
txn_local = 0
#10 0x0804c7a8 in _get (database=0x81f37a8, txn=0x0, query=0x821d1a0, callName=0x81cc1b7 "GET") at ../dbserver/database.c:503
k = {data = 0x81f67e8, size = 22, ulen = 22, dlen = 0, doff = 0, app_data = 0x0, flags = 0}
v = {data = 0x821d2a0, size = 255, ulen = 255, dlen = 0, doff = 0, app_data = 0x0, flags = 256}
err = 136263592
__PRETTY_FUNCTION__ = "_get"
#11 0x0804c8f0 in get (database=0x81f37a8, txn_id=3, query=0x821d1a0) at ../dbserver/database.c:643
txn = (DB_TXN *) 0x416a7db4
#12 0x08053f1d in workerThreadMain (threadArg=0x7c87b) at ../dbserver/server.c:433
type = ISProtocol_IDENTIFYMASTER
class = <value optimized out>
---Type <return> to continue, or q <return> to quit---
s = {context = 0x8211930, protocol = 0x8211980, socketToClient = 3, query = 0x821d1a0, deleteClientSocket = ISFalse,
abortActiveTxn = ISFalse}
__PRETTY_FUNCTION__ = "workerThreadMain"
#13 0x4001d0ba in pthread_start_thread () from /lib/libpthread.so.0
No symbol table info available.
#14 0x400fad6a in clone () from /lib/libc.so.6
No symbol table info available.
h5. Verbose Master Node Log
REP_UNDEF: rep_start: Found old version log 14
CLIENT: db rep_send_message: msgv = 5 logv 14 gen = 0 eid -1, type newclient, LSN [0][0] nogroup nobuf
CLIENT: starting election thread
CLIENT: elect thread to do: 0
CLIENT: repmgr elect: opcode 0, finished 0, master -2
CLIENT: elect thread to do: 1
CLIENT: Start election nsites 1, ack 1, priority 100
CLIENT: Tallying VOTE1[0] (2147483647, 1)
CLIENT: Beginning an election
CLIENT: db rep_send_message: msgv = 5 logv 14 gen = 0 eid -1, type vote1, LSN [1][8702] nogroup nobuf
CLIENT: Tallying VOTE2[0] (2147483647, 1)
CLIENT: Counted my vote 1
CLIENT: Skipping phase2 wait: already got 1 votes
CLIENT: Got enough votes to win; election done; winner is 2147483647, gen 0
CLIENT: Election finished in 0.039845000 sec
CLIENT: Election done; egen 2
CLIENT: Ended election with 0, sites 0, egen 2, flags 0x200a01
CLIENT: Election done; egen 2
CLIENT: New master gen 2, egen 3
MASTER: rep_start: Old log version was 14
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type newmaster, LSN [1][8702] nobuf
MASTER: restore_prep: No prepares. Skip.
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][8702]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][8785]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][8821]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][8904] perm
MASTER: rep_send_function returned: -30975
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][8948]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9034]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9115]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9202]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9287] flush perm
MASTER: rep_send_function returned: -30975
MASTER: election thread is exiting
MASTER: accepted a new connection
MASTER: handshake introduces unknown site localhost:10002
MASTER: EID 0 is assigned for site localhost:10002
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 0 eid 0, type newclient, LSN [0][0] nogroup
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type newsite, LSN [0][0] nobuf
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type newmaster, LSN [1][9367] nobuf
MASTER: NEWSITE info from site localhost:10002 was already known
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 0 eid 0, type master_req, LSN [0][0] nogroup
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type newmaster, LSN [1][9367] nobuf
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type verify_req, LSN [1][8658]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type verify, LSN [1][8658] nobuf
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type update_req, LSN [0][0]
MASTER: Walk_dir: Getting info for dir: db
MASTER: Walk_dir: Dir db has 10 files
MASTER: Walk_dir: File 0 name: __db.001
MASTER: Walk_dir: File 1 name: __db.002
MASTER: Walk_dir: File 2 name: __db.rep.gen
MASTER: Walk_dir: File 3 name: __db.rep.egen
MASTER: Walk_dir: File 4 name: __db.003
MASTER: Walk_dir: File 5 name: __db.004
MASTER: Walk_dir: File 6 name: __db.005
MASTER: Walk_dir: File 7 name: __db.006
MASTER: Walk_dir: File 8 name: log.0000000001
MASTER: Walk_dir: File 9 name: ROUTER
MASTER: Walk_dir: File 0 (of 1) ROUTER at 0x40356018: pgsize 4096, max_pgno 1
MASTER: Walk_dir: Getting info for in-memory named files
MASTER: Walk_dir: Dir INMEM has 0 files
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type update, LSN [1][9367] nobuf
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type page_req, LSN [0][0]
MASTER: page_req: file 0 page 0 to 1
MASTER: page_req: Open 0 via mpf_open
MASTER: sendpages: file 0 page 0 to 1
MASTER: sendpages: 0, page lsn [0][1]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type page, LSN [1][9367] nobuf resend
MASTER: sendpages: 0, lsn [1][9367]
MASTER: sendpages: 1, page lsn [1][9202]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type page, LSN [1][9367] nobuf resend
MASTER: sendpages: 1, lsn [1][9367]
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log_req, LSN [1][28]
MASTER: [1][28]: LOG_REQ max lsn: [1][9367]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][28] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][91] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][4266] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8441] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8535] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8575] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8658] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8702] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8785] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8821] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8904] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8948] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9034] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9115] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9202] nobuf resend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9287] nobuf resend
MASTER: db rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type all_req, LSN [1][9287]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9287] nobuf resend logend
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9367]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9469]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9548] flush perm
MASTER: will await acknowledgement: need 1
MASTER: rep_send_function returned: 110
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9628]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9696]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9785] flush perm
MASTER: will await acknowledgement: need 1
MASTER: got ack [1][9548](2) from site localhost:10002
MASTER: got ack [1][9785](2) from site localhost:10002
MASTER: got ack [1][9287](2) from site localhost:10002
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type start_sync, LSN [1][9785] nobuf
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9865]
MASTER: db rep_send_message: msgv = 5 logv 14 gen = 2 eid -1, type log, LSN [1][9948] flush perm
MASTER: will await acknowledgement: need 1
MASTER: got ack [1][9948](2) from site localhost:10002
EOF on connection from site localhost:10002
h5. Verbose Client Node Log
REP_UNDEF: EID 0 is assigned for site localhost:10001
REP_UNDEF: rep_start: Found old version log 14
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 0 eid -1, type newclient, LSN [0][0] nogroup nobuf
CLIENT: starting election thread
CLIENT: elect thread to do: 0
CLIENT: repmgr elect: opcode 0, finished 0, master -2
CLIENT: init connection to site localhost:10001 with result 115
CLIENT: handshake from connection to localhost:10001
CLIENT: handshake with no known master to wake election thread
CLIENT: reusing existing elect thread
CLIENT: repmgr elect: opcode 3, finished 0, master -2
CLIENT: elect thread to do: 3
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 0 eid -1, type newclient, LSN [0][0] nogroup nobuf
CLIENT: repmgr elect: opcode 0, finished 0, master -2
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type newsite, LSN [0][0]
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 0 eid -1, type master_req, LSN [0][0] nogroup nobuf
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type newmaster, LSN [1][9367]
CLIENT: Election done; egen 1
CLIENT: Updating gen from 0 to 2 from master 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type newmaster, LSN [1][9367]
CLIENT: egen: 3. rep version 5
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type verify_req, LSN [1][8658] any nobuf
CLIENT: sending request to peer
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type verify, LSN [1][8658]
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type update_req, LSN [0][0] nobuf
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type update, LSN [1][9367]
CLIENT: Update setup for 1 files.
CLIENT: Update setup: First LSN [1][28].
CLIENT: Update setup: Last LSN [1][9367]
CLIENT: Walk_dir: Getting info for dir: db2
CLIENT: Walk_dir: Dir db2 has 11 files
CLIENT: Walk_dir: File 0 name: __db.001
CLIENT: Walk_dir: File 1 name: __db.002
CLIENT: Walk_dir: File 2 name: __db.rep.gen
CLIENT: Walk_dir: File 3 name: __db.rep.egen
CLIENT: Walk_dir: File 4 name: __db.003
CLIENT: Walk_dir: File 5 name: __db.004
CLIENT: Walk_dir: File 6 name: __db.005
CLIENT: Walk_dir: File 7 name: __db.006
CLIENT: Walk_dir: File 8 name: log.0000000001
CLIENT: Walk_dir: File 9 name: ROUTER
CLIENT: Walk_dir: File 0 (of 1) ROUTER at 0x40356018: pgsize 4096, max_pgno 1
CLIENT: Walk_dir: File 10 name: __db.rep.db
CLIENT: Walk_dir: Getting info for in-memory named files
CLIENT: Walk_dir: Dir INMEM has 0 files
CLIENT: Next file 0: pgsize 4096, maxpg 1
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type page_req, LSN [0][0] any nobuf
CLIENT: sending request to peer
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type page, LSN [1][9367] resend
CLIENT: PAGE: Received page 0 from file 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type page, LSN [1][9367] resend
CLIENT: PAGE: Write page 0 into mpool
CLIENT: rep_write_page: Calling fop_create for ROUTER
CLIENT: PAGE_GAP: pgno 0, max_pg 1 ready 0, waiting 0 max_wait 0
CLIENT: FILEDONE: have 1 pages. Need 2.
CLIENT: PAGE: Received page 1 from file 0
CLIENT: PAGE: Write page 1 into mpool
CLIENT: PAGE_GAP: pgno 1, max_pg 1 ready 1, waiting 0 max_wait 0
CLIENT: FILEDONE: have 2 pages. Need 2.
CLIENT: NEXTFILE: have 1 files. RECOVER_LOG now
CLIENT: NEXTFILE: LOG_REQ from LSN [1][28] to [1][9367]
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type log_req, LSN [1][28] any nobuf
CLIENT: sending request to peer
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][28] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][91] resend
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][4266] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8441] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8535] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8575] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8658] resend
CLIENT: Returning NOTPERM [1][8658], cmp = 1
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8702] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8785] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8821] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8904] resend
CLIENT: Returning NOTPERM [1][8904], cmp = 1
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][8948] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9034] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9115] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9202] resend
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9287] resend
CLIENT: Returning NOTPERM [1][9287], cmp = 1
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: Returning LOGREADY up to [1][9287], cmp = 0
CLIENT: Election done; egen 3
Recovery starting from [1][28]
Recovery complete at Fri Jul 31 10:11:33 2009
Maximum transaction ID 80000002 Recovery checkpoint [0][0]
CLIENT: db2 rep_send_message: msgv = 5 logv 14 gen = 2 eid 0, type all_req, LSN [1][9287] any nobuf
CLIENT: sending request to peer
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9287] resend logend
CLIENT: Start-up is done [1][9287]
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9367]
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9469]
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9548] flush
CLIENT: rep_apply: Set apply_th 1
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9628]
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: Returning ISPERM [1][9548], cmp = 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9696]
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9785] flush
CLIENT: Returning NOTPERM [1][9785], cmp = 1
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: Returning ISPERM [1][9785], cmp = 0
CLIENT: Returning ISPERM [1][9287], cmp = -1
CLIENT: election thread is exiting
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type start_sync, LSN [1][9785]
CLIENT: ALIVE: Completed sync [1][9785]
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9865]
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: db2 rep_process_message: msgv = 5 logv 14 gen = 2 eid 0, type log, LSN [1][9948] flush
CLIENT: rep_apply: Set apply_th 1
CLIENT: rep_apply: Decrement apply_th 0
CLIENT: Returning ISPERM [1][9948], cmp = 0
Regards,
Chris

I was able to track this issue down to a usage error. I was calling a DB API from within a callback, which violates the API's re-entrancy assumptions.
-
How to read the BDB log ... and other questions
I am using the BDB database backend in the application OpenLDAP. When the BDB database is established there, it creates, in addition to the data storage files, a log file (log.0000000001). "file log.0000000001" reports it to be a binary file. How does one read that log?
I asked this question on the OpenLDAP forum and was advised that it can be read using tools provided by Oracle to support Berkeley DB, and was further advised to go to the Oracle Berkeley DB site. Well, I have done that and looked around for evidence of any such "tools", but have found nothing.
I was also advised there (in the OpenLDAP forum) that keeping that log file in the same directory as the data files is not a good idea, and that it should be on a different spindle for performance purposes. I have looked at the BDB reference manual online but find no configuration options to move that log file to a different location.
Help? Thanks.

Hi Robert,
The information about setting log directories can be found here:
http://www.oracle.com/technology/documentation/berkeley-db/db/api_c/env_set_lg_dir.html
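As a sketch of how that setting is applied in the C API (the directory paths here are placeholders, not defaults), note that the log directory must be configured before the environment is opened:

```c
#include <stdio.h>
#include <stdlib.h>
#include <db.h>

int main(void) {
    DB_ENV *dbenv;
    int ret;

    if ((ret = db_env_create(&dbenv, 0)) != 0) {
        fprintf(stderr, "db_env_create: %s\n", db_strerror(ret));
        return EXIT_FAILURE;
    }

    /* Put the log files on a different spindle than the data files.
     * This must be called before DB_ENV->open(). */
    if ((ret = dbenv->set_lg_dir(dbenv, "/logs/bdb")) != 0) {
        fprintf(stderr, "set_lg_dir: %s\n", db_strerror(ret));
        dbenv->close(dbenv, 0);
        return EXIT_FAILURE;
    }

    if ((ret = dbenv->open(dbenv, "/data/bdb",
            DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOG |
            DB_INIT_LOCK | DB_INIT_TXN, 0)) != 0) {
        fprintf(stderr, "DB_ENV->open: %s\n", db_strerror(ret));
        dbenv->close(dbenv, 0);
        return EXIT_FAILURE;
    }

    dbenv->close(dbenv, 0);
    return EXIT_SUCCESS;
}
```

Since OpenLDAP opens the environment itself, the more practical route is usually a DB_CONFIG file placed in the environment home directory containing the line `set_lg_dir /logs/bdb`; Berkeley DB reads that file when the environment is opened, so no code change is needed.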
General information about log files that you may want to read about:
http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_txn/C/index.html
You can use db_printlog to display the log files:
http://www.oracle.com/technology/documentation/berkeley-db/db/utility/db_printlog.html
The above link will also point you to a place to review the output. The db_printlog utility should be installed as part of your distribution.
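For example (the environment path is a placeholder for your OpenLDAP database directory), a typical invocation runs against the environment home directory:

```shell
# Dump the transaction log records in human-readable form.
# -h names the database environment home directory.
db_printlog -h /var/lib/ldap

# Restrict the dump to a range of LSNs (-b begin, -e end, as file/offset):
db_printlog -h /var/lib/ldap -b 1/28 -e 1/9367
```

The output is one record per log entry, showing the LSN, record type, and the transaction it belongs to.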
Ron
-
How to load a BDB file into memory?
The entire BDB database needs to reside in memory for performance reasons, it needs to be in memory all the time, not paged in on demand. The physical memory and virtual process address space are large enough to hold this file. How can I load it into memory just before accessing the first entry? I've read the C++ API reference, and it seems that I can do the following:
1, Create a DB environment;
2, Call DB_ENV->set_cachesize() to set a memory pool large enough to hold the BDB file;
3, Call DB_MPOOLFILE->open() to open the BDB file in memory pool of that DB environment;
4, Create a DB handle in that DB environment and open the BDB file (again) via this DB handle.
My questions are:
1, Is there a more elegant way instead of using that DB environment? If the DB environment is a must, then:
2, Does step 3 above load the BDB file into memory pool or just reserve enough space for that file?
Thanks in advance,
Feng

Hello,
Does the documentation on "Memory-only or Flash configurations" at:
http://download.oracle.com/docs/cd/E17076_02/html/programmer_reference/program_ram.html
answer the question?
From there we have:
By default, databases are periodically flushed from the Berkeley DB memory cache to backing physical files in the filesystem. To keep databases from being written to backing physical files, pass the DB_MPOOL_NOFILE flag to the DB_MPOOLFILE->set_flags() method. This flag implies the application's databases must fit entirely in the Berkeley DB cache, of course. To avoid a database file growing to consume the entire cache, applications can limit the size of individual databases in the cache by calling the DB_MPOOLFILE->set_maxsize() method.
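A minimal sketch of that approach in the C API (the environment path, cache size, and database name are assumptions for illustration): the DB_MPOOL_NOFILE flag is set on the database's mpool file handle before the database is opened, so its pages are never flushed to a backing file:

```c
#include <stdio.h>
#include <stdlib.h>
#include <db.h>

/* Abort on any Berkeley DB error, printing the failing call. */
#define CHECK(call)                                     \
    do {                                                \
        int _ret = (call);                              \
        if (_ret != 0) {                                \
            fprintf(stderr, "%s: %s\n", #call,          \
                    db_strerror(_ret));                 \
            exit(EXIT_FAILURE);                         \
        }                                               \
    } while (0)

int main(void) {
    DB_ENV *dbenv;
    DB *dbp;
    DB_MPOOLFILE *mpf;

    CHECK(db_env_create(&dbenv, 0));

    /* 2 GB cache: the database must fit entirely inside it. */
    CHECK(dbenv->set_cachesize(dbenv, 2, 0, 1));

    CHECK(dbenv->open(dbenv, "/tmp/bdbenv",
          DB_CREATE | DB_INIT_MPOOL | DB_PRIVATE, 0));

    CHECK(db_create(&dbp, dbenv, 0));

    /* Never flush this database's pages to a backing file. */
    mpf = dbp->get_mpf(dbp);
    CHECK(mpf->set_flags(mpf, DB_MPOOL_NOFILE, 1));

    /* Optionally cap it at 1 GB so it cannot consume the whole cache. */
    CHECK(mpf->set_maxsize(mpf, 1, 0));

    CHECK(dbp->open(dbp, NULL, "cache.db", NULL,
          DB_BTREE, DB_CREATE, 0));

    /* ... use the in-memory database ... */

    dbp->close(dbp, 0);
    dbenv->close(dbenv, 0);
    return EXIT_SUCCESS;
}
```

On the second question above: opening a database does not pre-load its pages; they are faulted into the cache on demand. To warm the cache for an existing on-disk database, a simple cursor scan over all records after opening will pull every page into the memory pool.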
Thanks,
Sandra