DB_RUNRECOVERY or DB_PAGE_NOTFOUND

I'm using BDB 4.5.20. After many "put", "delete", and "sync" operations followed by a program crash, I reload the database file and call the "delete" operation, and I get a DB_RUNRECOVERY or DB_PAGE_NOTFOUND error. How can I fix this bug?

So that we can better understand what is going on, please describe your use case in more detail. What are you doing with BDB? Also, can you provide a test case that we can use to reproduce the problem? Both of those are error messages being reported.
thanks
mike

Similar Messages

DB_RUNRECOVERY error after just opening and closing a DB_ENV and DB

Hi,
I'm currently writing a bdb database maintenance utility for the open-source project Netatalk, which uses bdb for some data storage.
We're using transactions and therefore we open DB_ENV with
(DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN | [DB_RECOVER]).
DB_RECOVER is only used from the main application, not from the utility I'm currently developing, because afaict running recovery on an environment which another process has opened is not allowed.
Now I can reproduce the following problem by just running my utility exactly 138 times(!) [1] while the utility just:
- opens the environment
- configures logging
- opens a database file which contains one main database with 2 additional indexes
- associates the indexes
- closes the database (indexes first)
- closes the environment
The main application (netatalk) is not running!
On the 138th run I get these errors when closing the database:
---8<---
Run: 137
Run: 138
CNID database dump:
Mai 08 16:06:04.449917 [12991] {dbif.c:328} (E:CNID): error closing database cnid2.db: DB_RUNRECOVERY: Fatal error, run database recovery
Mai 08 16:06:04.450054 [12991] {dbif.c:348} (E:CNID): error closing DB environment: DB_RUNRECOVERY: Fatal error, run database recovery
Error closing database
---8<---
These log messages come from my utility. As I've set up bdb logging, I've got some more interesting output from that:
---8<---
./log.0000000001: log file open failed: No such file or directory
PANIC: No such file or directory
DB_ENV->log_newfh: 1: DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
File handles still open at environment close
Open file handle: ./cnid2.db
PANIC: fatal region error detected; run recovery
---8<---
I've tested this on different platforms (Solaris, Linux) with the same result.
I was hoping this info might be enough for someone more experienced with bdb than I am. If not, I can put up a minimal source file which reproduces the error. But as that requires collecting code spread over quite a few files, I haven't put that together yet.
Anybody?
Thanks!
-Ralph
[1]
like this:
$ i=1; while true; do echo Run: $i ; pfexec ./dbd -d /Volumes/ACL/ ; if [ $? -ne 0 ] ; then break ; fi ; i=$((i+1)) ; done
The utility is named dbd.

Ok, got it, sorry for the noise:
I had mixed up the order of int cwd=open(".", O_RDONLY), chdir to bdb env, open bdb stuff, do bdb stuff, close bdb stuff, chdir back to cwd.
I was changing dir back before closing the bdb stuff...
-Ralph
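For anyone who lands here with the same symptom, the ordering that fixes it can be sketched in plain POSIX C. This is only an illustration, not code from Netatalk: run_in_directory and record_cwd are hypothetical names, and the Berkeley DB calls are left out so the sketch stays self-contained. The assumption is that the environment was opened with a relative path, so every bdb handle has to be closed inside work(), while the process is still in the environment directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define CWD_BUF_SIZE 4096

/* Hypothetical helper: run work(arg) with `dir` as the current directory,
 * then return to the original directory. The point is the ordering:
 * everything that depends on the changed directory (for Berkeley DB,
 * closing the database and environment handles) happens inside work(),
 * before the fchdir() back. */
static int run_in_directory(const char *dir, int (*work)(void *), void *arg)
{
    assert(dir != NULL && work != NULL);

    int saved_cwd = open(".", O_RDONLY);   /* remember where we started */
    if (saved_cwd < 0)
        return -1;

    if (chdir(dir) != 0) {
        close(saved_cwd);
        return -1;
    }

    int result = work(arg);                /* open, use AND close handles here */

    int restored = fchdir(saved_cwd);      /* only now change back */
    close(saved_cwd);
    return restored != 0 ? -1 : result;
}

/* Example work function: records the directory it ran in.
 * The caller must supply a buffer of CWD_BUF_SIZE bytes. */
static int record_cwd(void *arg)
{
    char *buf = arg;
    return getcwd(buf, CWD_BUF_SIZE) != NULL ? 0 : -1;
}
```

A caller would put its whole open/use/close sequence into the work callback and pass the environment directory as dir; record_cwd is only there to show that the callback really runs inside that directory.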

Frequent but unpredictable DB_PAGE_NOTFOUND corruption

Hi,
We have developed a multi-process data processing engine that uses BDB as state storage to store queues of pointers to datums in on-disk flat files. The engine is written in Perl, using the standard BerkeleyDB CPAN module as its interface to BDB.
Platform: Red Hat Enterprise Linux 5.1 x86-64
Perl: 5.8.8 (with 64-bit support)
BDB: 4.3.29 (the default for this version of RHEL)
After running in production for some time without any errors, one of the data queues (a Btree database) has started to become corrupted occasionally after a few hours of record creation/deletion by forked children. The error, which is raised by subsequent db_put() calls, is "DB_PAGE_NOTFOUND: Requested page not found", and running db_verify on the database returns:
"db_verify: Page 1: internal page is empty and should not be
db_verify: queue.db: DB_VERIFY_BAD: Database verification failed"
Worse, the error cannot be recreated in any of our development or staging environments; it occurs only intermittently in production, now roughly every 3 to 8 hours.
Some background:
Roughly - the child processes that seem to be causing the corruption read a bunch of key/values via a cursor, and then delete the keys from the DB.
The environment is created with: DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_THREAD | DB_INIT_TXN
The database is created with: DB_CREATE|DB_THREAD
The parent process closes all Env & DB handles before forking children, then re-opens upon returning from fork().
The child processes all open their own Env & DB handles after fork().
There are usually around 5-8 children running in parallel, and will execute the deletes on the DB in parallel.
Before exiting, the child processes always explicitly call db_sync() before calling db_close() - probably overkill.
Here's where my understanding of deadlocking in BDB gets shaky:
DB_INIT_LOCK should implement multiple-writer locking semantics, and because of the way the parent process distributes the work to the child processes, children are never competing to delete the same keys.
I suspect the reason for the corruption is that BDB's locking may be page-based, not key (record) based, and if (say) child A deleting a key causes an underlying page split (?) whilst child B is also deleting a key stored on that same page, corruption occurs. Am I on the right track here? The app is not yet doing any deadlock detection or resolution - we haven't yet gone down that route because nowhere are any errors regarding deadlocks being surfaced in the statuses of any DB calls, or the output of db_stat().
Interestingly, none of the db_del() calls in any of the children fail, with deadlock errors or otherwise. The corruption is only noticed by calls to db_put() into the same database during a subsequent processing run, obviously after the in-memory cache has been synced to disk.
We haven't yet upgraded BDB to 4.7 (or even 4.4), but will attempt to do this if no other fix is forthcoming.
An alternative, quicker fix we're trying out is to use DB_INIT_CDB to enforce single-writer semantics on the children, or to move the responsibility of writing back up to the parent process, and have no multiple-writers at all.
I know my understanding of the pitfalls of deadlocking, and how they relate to the underlying Btree store, isn't great, and I suspect herein lies the real problem. Many thanks in advance to anyone with advice or recommendations here.

Thanks Michael. I'll engage here for the sake of Googlers and also follow up by email.
- Yes, the same flags are used to open the environments and db in the children; all processes use the same storage class that wraps the BDB access.
- db_sync() before db_close() was paranoia on my part - noted and understood that it's unnecessary.
- The db_verify output is indeed all it reports. db_dump -qa queue.db on a corrupt DB reports:
In-memory DB structure:
btree: 0x120200 (duplicates, open called, read-only)
bt_meta: 0 bt_root: 1
bt_maxkey: 0 bt_minkey: 2
bt_compare: 0x30b2222900 bt_prefix: 0x30b2222970
bt_lpgno: 0
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
page 0: btree metadata level: 0 (lsn.file: 0 lsn.offset: 1)
magic: 0x53162
version: 9
pagesize: 8192
type: 9
keys: 0 records: 0
free list: 2, 0
last_pgno: 2
flags: 0x1 (duplicates)
uid: 5f 0 db 4 0 fd 0 0 1b d6 75 51 bf 5c 0 0 0 0 0 0
maxkey: 0 minkey: 2
root: 1
page 1: btree internal level: 2 records: 0 (lsn.file: 0 lsn.offset: 1)
entries: 0 offset: 8192
page 2: invalid level: 0 (lsn.file: 0 lsn.offset: 1)
prev: 0 next: 0 entries: 0 offset: 8192
There are records in the queue.db, though - viewing it reveals recognisable keys.
Other things I ought to mention, which may be giveaways:
- Although creating the environment with DB_INIT_TXN, the app does not perform any transaction handling or checkpointing - in effect it is in auto-commit mode.
- Since modifying the storage to use DB_INIT_CDB overnight, there has been (so far!) no corruption.
Thanks again.

DB_RUNRECOVERY from DB remove and other operations in same transaction

Hello,
I have a program which basically does the following:
- Start transaction
- Insert data into a database
- Remove that database
- Create another database
- Insert data into database
- Commit transaction - this returns DB_RUNRECOVERY.
Error messages are:
lock_downgrade: Lock is no longer valid
PANIC: Invalid argument
PANIC: fatal region error detected; run recovery
(repeated several times)
The program creates and removes some In-memory databases during its run.
If a commit/begin transaction is inserted somewhere, the error disappears.
I could not produce a small demonstration program yet because there is a rather complex layer between the program and Berkeley DB.
As stated above, there is a workaround, but the behavior nonetheless appears incorrect. Is there a way to fix it?
Edited by: user547613 on Jan 28, 2009 12:02 PM

Hello,
We would greatly appreciate a standalone program that reproduces this problem. If that is not feasible, then could you provide more details about the problem including relevant snippets of code which would allow us to try and recreate the problem?
Thanks and warm regards.
ashok joshi
Berkeley DB development

BDB0075 DB_PAGE_NOTFOUND: Requested page not found

Hello,
If I create a primary and a secondary database within the same BDB database file, BDB fails with "BDB0075 DB_PAGE_NOTFOUND: Requested page not found". This happens when the secondary callback returns multiple keys (DB_DBT_MULTIPLE).
A test case follows. The callback simply creates ngrams, e.g.:
ABCDEFG -> ABC BCD CDE DEF EFG
BDB version: 5.2.28 (most recent), Ubuntu 11.04
Thanks a lot
Josef
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <db.h>
#define DATABASE "/tmp/db.db"
static int callback_ngram( DB *secondary, const DBT *key, const DBT *data, DBT *result );
int main(int argc, char *argv[])
{
    unlink( DATABASE );
    // Open database with a secondary index
    DB *primary = NULL;
    for( int i=0; i < 2; ++i )
    {
        DB *handle = NULL;
        int status = db_create( &handle, NULL, 0 );
        if( status != 0 )
            return 1;
        handle->set_lorder( handle, 1234 );
        if( i != 0 )
            handle->set_flags( handle, DB_DUP | DB_DUPSORT );
        handle->set_msgfile( handle, stderr );
        handle->set_errfile( handle, stderr );
        // Open database, name is "0" and "1"
        char dbname[32];
        sprintf(dbname, "%d", i);
        status = handle->open( handle, NULL, DATABASE, dbname, DB_BTREE, DB_CREATE, 0644 );
        if( status != 0 )
            return 1;
        // Associate
        if( i == 0 )
            primary = handle;
        else
            primary->associate( primary, NULL, handle, callback_ngram, DB_CREATE );
    }
    // Insert records
    for( unsigned i=0; i < 100000; ++i )
    {
        char buffer[10];
        for( unsigned j=0; j < sizeof(buffer); ++j )
            buffer[j] = 'A' + (((i + 1) * (j + 1)) % 26);
        //printf("%c%c%c%c%c%c%c%c%c%c\n", buffer[0], buffer[1], buffer[2], buffer[3], buffer[4],
        //                                 buffer[5], buffer[6], buffer[7], buffer[8], buffer[9] );
        // put record
        DBT key;
        memset( &key, 0, sizeof(DBT) );
        key.data = &i;
        key.size = sizeof(i);
        DBT data;
        memset( &data, 0, sizeof(DBT) );
        data.data = buffer;
        data.size = sizeof(buffer);
        int status = primary->put( primary, NULL, &key, &data, 0 );
        if( status == 0 )
            continue;
        fprintf(stderr, "Put failed: %s\n", db_strerror(status));
        return 1;
    }
    return 0;
}
// Secondary key callback: returns every 3-byte window of the value as a key
static int callback_ngram( DB *secondary, const DBT *key, const DBT *data, DBT *result )
{
    int ngrams = data->size - 2;
    result->data = malloc( sizeof(DBT) * ngrams );
    result->size = ngrams;
    result->flags = DB_DBT_MULTIPLE | DB_DBT_APPMALLOC;
    for( int i=0; i < ngrams; ++i )
    {
        DBT *item = ((DBT*)result->data) + i;
        item->data = ((char*)data->data) + i;
        item->size = 3;
        item->flags = 0;
    }
    return 0;
}
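A side note on the callback above: the number of 3-grams is computed as data->size - 2, which is fine for the 10-byte values in this test case but would drop to zero or below for values shorter than three bytes. Here is a small sketch of the same arithmetic with that case guarded, in plain C with no Berkeley DB types. ngram_count and ngram_at are hypothetical helper names, not part of any BDB API, and this is not offered as the cause of the reported error.

```c
#include <assert.h>
#include <stddef.h>

/* Number of n-grams of width n in a value of len bytes; 0 when the value
 * is shorter than n. The explicit guard matters because subtracting on
 * unsigned sizes would wrap around instead of going negative. */
static size_t ngram_count(size_t len, size_t n)
{
    assert(n > 0);
    return len >= n ? len - n + 1 : 0;
}

/* Pointer to the i-th n-gram inside data. It aliases data; nothing is
 * copied, which mirrors how the callback points each item into the value. */
static const char *ngram_at(const char *data, size_t len, size_t n, size_t i)
{
    assert(data != NULL);
    assert(i < ngram_count(len, n));
    return data + i;
}
```

With the example from the post, "ABCDEFG" has seven bytes, so ngram_count(7, 3) gives the five 3-grams ABC, BCD, CDE, DEF and EFG.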

Hello Josef,
I just tried your test case exactly as is (with the addition of printing "done" at the end) on 5.2.28 on RHEL, and no error is raised:
./test
done
Do I need to do something else to see the error?
Thank you,
Sandra

DB_PAGE_NOTFOUND error after recovery on a CDS database

I am getting a DB_PAGE_NOTFOUND error after recovery with a database that is configured in CDS mode. The error occurs after I insert several rows, bring down the process, remove the environment files, and start the process again; when it attempts to insert additional rows, it gets a DB_PAGE_NOTFOUND error. The error persists, i.e., terminating and starting a new process does not resolve the problem.
Any help will be appreciated.

Assuming that the db is corrupted and that CDS databases can get corrupted, is there any way I can reduce the frequency of these errors by increasing the cache, page size, etc.?
Following is my DB configuration:
               dbEnv_->open(envHome_.c_str(), envFlags_, 0);
                dbEnv_->set_errpfx(envHome_.c_str());
                dbEnv_->set_thread_count(16);
                dbEnv_->set_cachesize(0,(100*1024*1024),2);
                envFlags_ =   DB_INIT_CDB | DB_THREAD | DB_INIT_MPOOL ;
                envFlags_ |= DB_CREATE ;
                dbEnv_->open(envHome_.c_str(), envFlags_, 0);
db->set_pagesize((32*1024));
            db->open(NULL, temp.c_str(), tableName.c_str(),\
                DB_BTREE, DB_CREATE | DB_THREAD , S_IRUSR | S_IWUSR);
                dbEnv_->open(envHome_.c_str(), envFlags_, 0);
The platform is HP-UX. The problem occurs periodically, i.e., not every time.
Thanks for your help.

DB_RUNRECOVERY: Fatal error, run database recovery

I am getting this error when trying to add data to QUEUE. But after I restart my app, this error does not happen anymore.
2009-08-16 10:27:12.558990 [ERR] mod_cdr_bdb.c:370 Unable to add cdr to Queue. Error=DB_RUNRECOVERY: Fatal error, run database recovery
Does anyone know what could be the cause of the error?

Hi,
Do you know the steps that lead up to this error? Can you reproduce it?
Were there any error messages sent to the error log file? Can you confirm that you have verbose error messages turned on, by always initializing one of the error callback interfaces in your environment? These provide verbose error messages:
DB_ENV->set_errcall, DB_ENV->set_errfile, DB_ENV->set_errpfx, and DB_ENV->set_verbose.
What flags are you using when opening the environment and the database?
The procedure you have to follow when you receive this error is described here: DB_RUNRECOVERY (http://www.oracle.com/technology/documentation/berkeley-db/db/ref/program/errorret.html#DB_RUNRECOVERY)
DB_RUNRECOVERY:
There exists a class of errors that Berkeley DB considers fatal to an entire Berkeley DB environment. An example of this type of error is a corrupted database page. The only way to recover from these failures is to have all threads of control exit the Berkeley DB environment, run recovery of the environment, and re-enter Berkeley DB. (It is not strictly necessary that the processes exit, although that is the only way to recover system resources, such as file descriptors and memory, allocated by Berkeley DB.)
When this type of error is encountered, the error value DB_RUNRECOVERY is returned. This error can be returned by any Berkeley DB interface. Once DB_RUNRECOVERY is returned by any interface, it will be returned from all subsequent Berkeley DB calls made by any threads of control participating in the environment.
Applications can handle such fatal errors in one of two ways: first, by checking for DB_RUNRECOVERY as part of their normal Berkeley DB error return checking, similarly to DB_LOCK_DEADLOCK or any other error. Alternatively, applications can specify a fatal-error callback function using the DB_ENV->set_event_notify method. Applications with no cleanup processing of their own should simply exit from the callback function.
Thanks,
Bogdan Coman

DB_PAGE_NOTFOUND given when do db_dump

Hi.
I'm going to do some development on my Motorola E680i cellphone, which runs Linux. I cross-compiled BDB 4.5.20.NC and ran db_dump to inspect the database on my cellphone. But when I type db_dump -p native.db, an error is reported:
#db_dump: DBcursor->get: DB_PAGE_NOTFOUND: Requested page not found
I copied native.db to my Windows computer and used the Windows version of db_dump, and all information was printed normally.
When I try to use Db::open to open one of the databases in the native.db file on my cellphone, the same error message is given. Can anyone tell me what's happening?

Here's the code. I will try DB_INIT_LOCK.
int CEmsDoc::ConnectDB()
{
    int ret = RET_OK;
    m_pdb = new Db( NULL, DB_CXX_NO_EXCEPTIONS );
    if( NULL == m_pdb )
    {
        TRACELOG("new Db failed");
        goto CLEANUP;
    }
    if ( m_pErrFile )
        m_pdb->set_error_stream( m_pErrFile );
    ret = m_pdb->open( NULL, "/ezxlocal/sysDatabase/native.db", "ems_table_in_flash", DB_UNKNOWN, DB_RDONLY, 0);
    if ( ret )
    {
        TRACE( "db open failed at line %i in %s" );
        goto CLEANUP;
    }
    ret = m_pdb->cursor( NULL, &m_pdbcCursor, DB_TXN_SNAPSHOT );
    if ( ret )
    {
        TRACE( "cursor failed at line %i in %s" );
        goto CLEANUP;
    }
CLEANUP:
    if ( ret )
        DisConnectDB();
    return ret;
}
I have also tried this:
m_pDbEnv = new DbEnv( DB_CXX_NO_EXCEPTIONS );
m_pDbEnv->open( "/ezxlocal/sysDatabase", DB_INIT_CDB|DB_INIT_MPOOL, 0 );
m_pdb->open( m_pDbEnv, "native.db", "ems_table_in_flash", DB_UNKNOWN, DB_RDONLY, 0);
It returns "No such file or directory" when opening the DbEnv, but I'm sure the path, the filename, and the database name are right.
BTW, am I right to cross-compile BDB like this?
First I add this at the beginning of dist/configure:
#my crosscompiler named arm-linux-...
CC=arm-linux-gcc
CXX=arm-linux-g++
AR=arm-linux-ar
RANLIB=arm-linux-ranlib
STRIP=arm-linux-strip
then under build_unix:
../dist/configure host=arm-linux perfix=/opt/bdb --enable-cxx
make
make install

Com.sleepycat.db.RunRecoveryException: DB_RUNRECOVERY error

Hi all,
Sometimes the following error occurs when initializing the DB environment:
com.sleepycat.db.RunRecoveryException: DB_RUNRECOVERY: Fatal error, run database recovery: DB_RUNRECOVERY: Fatal error, run database recovery
     at com.sleepycat.db.internal.db_javaJNI.DbEnv_close0(Native Method)
     at com.sleepycat.db.internal.DbEnv.close0(DbEnv.java:204)
     at com.sleepycat.db.internal.DbEnv.close(DbEnv.java:81)
     at com.sleepycat.db.Environment.close(Environment.java:39)
     at com.ssc.crd.db.BerkeleyDBUtil.shutdown(BerkeleyDBUtil.java:215)
     at com.ssc.util.BDBContextListener.contextDestroyed(BDBContextListener.java:85)
     at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:3770)
     at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4339)
     at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1066)
     at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1066)
     at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:447)
     at org.apache.catalina.core.StandardService.stop(StandardService.java:512)
     at org.apache.catalina.core.StandardServer.stop(StandardServer.java:743)
     at org.apache.catalina.startup.Catalina.stop(Catalina.java:601)
     at org.apache.catalina.startup.Catalina.start(Catalina.java:576)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
     at java.lang.reflect.Method.invoke(Unknown Source)
     at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:294)
     at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:432)
This issue can be resolved by running db_recover -c, for example:
C:\Work\Conf\Test\berkeley>db_recover -c
We also tried setRunRecovery(true) and setRunFatalRecovery(true), but both failed:
EnvironmentConfig envConf = new EnvironmentConfig();
envConf.setRunRecovery(true);
envConf.setRunFatalRecovery(true);
Only running db_recover -c resolves it. Why? And how can we deal with it in code?
Thanks.
Jane

But after I set up the whole environment, an error occurs when starting the Tomcat server: startup fails and hs_err_pid3020.txt is generated.
Details from hs_err_pid3020.txt:
# An unexpected error has been detected by HotSpot Virtual Machine:
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x27d4e525, pid=3020, tid=2760
# Java VM: Java HotSpot(TM) Client VM (1.5.0_08-b03 mixed mode)
# Problematic frame:
# C [libdb43.dll+0x3e525]
--------------- T H R E A D ---------------
Current thread (0x00238d50): JavaThread "main" [_thread_in_native, id=2760]
siginfo: ExceptionCode=0xc0000005, reading address 0x00000018
Registers:
EAX=0x27dcda30, EBX=0x00000000, ECX=0x00808000, EDX=0x00000000
ESP=0x0006f0e8, EBP=0x00000000, ESI=0x27dc4cb8, EDI=0x27dc4cb8
EIP=0x27d4e525, EFLAGS=0x00010206
Top of Stack: (sp=0x0006f0e8)
0x0006f0e8: 27dcda30 ffff86ff 00800000 00000000
0x0006f0f8: 27d50294 00800000 00000000 00800000
0x0006f108: 00000000 27dc4cb8 00000016 00238e10
0x0006f118: 00238d50 27458978 27dc4cb8 00000000
0x0006f128: 00000000 00800000 00008000 27cf8035
0x0006f138: 0000005c 27458978 0002e061 000001a4
0x0006f148: 00238d50 231f4e28 231f4e20 0006f194
0x0006f158: 00a0832f 00238e10 0006f19c 27dc4cb8
Instructions: (pc=0x27d4e525)
0x27d4e515: 00 3b c3 89 9e dc 00 00 00 74 12 8b 54 24 08 50
0x27d4e525: 8b 42 18 50 56 e8 01 c1 02 00 83 c4 0c 8b 86 08
Stack: [0x00030000,0x00070000), sp=0x0006f0e8, free space=252k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libdb43.dll+0x3e525]
[error occurred during error reporting, step 120, id 0xc0000005]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j com.sleepycat.db.internal.db_javaJNI.DbEnv_open(JLjava/lang/String;II)V+0
j com.sleepycat.db.internal.DbEnv.open(Ljava/lang/String;II)V+7
j com.sleepycat.db.EnvironmentConfig.openEnvironment(Ljava/io/File;)Lcom/sleepycat/db/internal/DbEnv;+312
j com.sleepycat.db.Environment.<init>(Ljava/io/File;Lcom/sleepycat/db/EnvironmentConfig;)V+6
j com.ssc.crd.db.BerkeleyDBUtil.setup()V+97
j com.ssc.util.BDBContextListener.contextInitialized(Ljavax/servlet/ServletContextEvent;)V+10
j org.apache.catalina.core.StandardContext.listenerStart()Z+429
j org.apache.catalina.core.StandardContext.start()V+1244
j org.apache.catalina.core.ContainerBase.addChildInternal(Lorg/apache/catalina/Container;)V+149
j org.apache.catalina.core.ContainerBase.addChild(Lorg/apache/catalina/Container;)V+26
j org.apache.catalina.core.StandardHost.addChild(Lorg/apache/catalina/Container;)V+25
j org.apache.catalina.startup.HostConfig.deployWAR(Ljava/lang/String;Ljava/io/File;Ljava/lang/String;)V+482
j org.apache.catalina.startup.HostConfig.deployWARs(Ljava/io/File;[Ljava/lang/String;)V+163
j org.apache.catalina.startup.HostConfig.deployApps()V+25
j org.apache.catalina.startup.HostConfig.start()V+147
j org.apache.catalina.startup.HostConfig.lifecycleEvent(Lorg/apache/catalina/LifecycleEvent;)V+132
j org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(Ljava/lang/String;Ljava/lang/Object;)V+68
j org.apache.catalina.core.ContainerBase.start()V+306
j org.apache.catalina.core.StandardHost.start()V+314
j org.apache.catalina.core.ContainerBase.start()V+266
j org.apache.catalina.core.StandardEngine.start()V+221
j org.apache.catalina.core.StandardService.start()V+132
j org.apache.catalina.core.StandardServer.start()V+88
j org.apache.catalina.startup.Catalina.start()V+32
v ~StubRoutines::call_stub
j sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
j sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+87
J sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
J java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
v ~RuntimeStub::alignment_frame_return Runtime1 stub
j org.apache.catalina.startup.Bootstrap.start()V+37
j org.apache.catalina.startup.Bootstrap.main([Ljava/lang/String;)V+158
v ~StubRoutines::call_stub
--------------- P R O C E S S ---------------
Java Threads: ( => current thread )
0x27388908 JavaThread "Dispatcher-Thread-1" daemon [_thread_blocked, id=3304]
0x009c8780 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=2904]
0x00238f80 JavaThread "CompilerThread0" daemon [_thread_blocked, id=560]
0x009c6860 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=2040]
0x009bd840 JavaThread "Finalizer" daemon [_thread_blocked, id=3420]
0x009bc368 JavaThread "Reference Handler" daemon [_thread_blocked, id=2584]
=>0x00238d50 JavaThread "main" [_thread_in_native, id=2760]
Other Threads:
0x009b8138 VMThread [id=2484]
0x009e35e0 WatcherThread [id=3052]
VM state:not at safepoint (normal execution)
VM Mutex/Monitor currently owned by a thread: None
Heap
def new generation total 36288K, used 31681K [0x02a80000, 0x051e0000, 0x051e0000)
eden space 32256K, 90% used [0x02a80000, 0x046eae78, 0x04a00000)
from space 4032K, 64% used [0x04df0000, 0x05075968, 0x051e0000)
to space 4032K, 0% used [0x04a00000, 0x04a00000, 0x04df0000)
tenured generation total 483968K, used 0K [0x051e0000, 0x22a80000, 0x22a80000)
the space 483968K, 0% used [0x051e0000, 0x051e0000, 0x051e0200, 0x22a80000)
compacting perm gen total 8192K, used 7782K [0x22a80000, 0x23280000, 0x26a80000)
the space 8192K, 94% used [0x22a80000, 0x23219850, 0x23219a00, 0x23280000)
No shared spaces configured.
Dynamic libraries:
0x00400000 - 0x0040d000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\java.exe
0x77f80000 - 0x77ffc000      C:\WINNT\system32\ntdll.dll
0x7c2d0000 - 0x7c335000      C:\WINNT\system32\ADVAPI32.dll
0x7c570000 - 0x7c624000      C:\WINNT\system32\KERNEL32.dll
0x77d30000 - 0x77d9f000      C:\WINNT\system32\RPCRT4.dll
0x78000000 - 0x78045000      C:\WINNT\system32\MSVCRT.dll
0x6d6c0000 - 0x6d85b000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\client\jvm.dll
0x77e10000 - 0x77e79000      C:\WINNT\system32\USER32.dll
0x77f40000 - 0x77f7c000      C:\WINNT\system32\GDI32.dll
0x77570000 - 0x775a0000      C:\WINNT\system32\WINMM.dll
0x6d280000 - 0x6d288000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\hpi.dll
0x690a0000 - 0x690ab000      C:\WINNT\system32\PSAPI.DLL
0x6d690000 - 0x6d69c000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\verify.dll
0x6d300000 - 0x6d31d000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\java.dll
0x6d6b0000 - 0x6d6bf000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\zip.dll
0x6d4c0000 - 0x6d4d3000      G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08\bin\net.dll
0x75030000 - 0x75044000      C:\WINNT\system32\WS2_32.dll
0x75020000 - 0x75028000      C:\WINNT\system32\WS2HELP.DLL
0x74fd0000 - 0x74fee000      C:\WINNT\system32\msafd.dll
0x75010000 - 0x75017000      C:\WINNT\System32\wshtcpip.dll
0x27cf0000 - 0x27d08000      G:\Jianfang Ye\FM\20070129\BDBXML\bin\libdb_java43.dll
0x27d10000 - 0x27db7000      G:\Jianfang Ye\FM\20070129\BDBXML\bin\libdb43.dll
0x7c340000 - 0x7c396000      G:\Jianfang Ye\FM\20070129\BDBXML\bin\MSVCR71.dll
0x7c3a0000 - 0x7c41b000      G:\Jianfang Ye\FM\20070129\BDBXML\bin\MSVCP71.dll
VM Arguments:
jvm_args: -Dsiteminder=false -Dmyss=false -Dconfig.file=G:\Jianfang Ye\FM\20070129\config.properties -Dcom.ssc.eis.myssc.jvmglobal.configfile=G:\Jianfang Ye\FM\20070129\mssconfig-50_dev.xml -Xms512m -Xmx512m -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=G:\Jianfang Ye\FM\20070129\apache-tomcat-5.5.17\conf\logging.properties -Djava.endorsed.dirs=G:\Jianfang Ye\FM\20070129\apache-tomcat-5.5.17\common\endorsed -Dcatalina.base=G:\Jianfang Ye\FM\20070129\apache-tomcat-5.5.17 -Dcatalina.home=G:\Jianfang Ye\FM\20070129\apache-tomcat-5.5.17 -Djava.io.tmpdir=G:\Jianfang Ye\FM\20070129\apache-tomcat-5.5.17\temp
java_command: org.apache.catalina.startup.Bootstrap start
Launcher Type: SUN_STANDARD
Environment Variables:
JAVA_HOME=C:\jdk1.3.1_10
JRE_HOME=G:\Jianfang Ye\FM\20070129\Java\jre1.5.0_08
CLASSPATH=C:\jdk1.3.1_10\lib\tools.jar;G:\Jianfang Ye\FM\20070129\apache-tomcat-5.5.17\bin\bootstrap.jar
PATH=G:\Jianfang Ye\FM\20070129\BDBXML\bin; C:\jdk1.3.1_10\bin
USERNAME=e461579
OS=Windows_NT
PROCESSOR_IDENTIFIER=x86 Family 15 Model 2 Stepping 9, GenuineIntel
--------------- S Y S T E M ---------------
OS: Windows 2000 Build 2195 Service Pack 4
CPU:total 1 (cores per cpu 1, threads per core 1) family 15 model 2 stepping 9, cmov, cx8, fxsr, mmx, sse, sse2
Memory: 4k page, physical 1046324k(275832k free), swap 2519928k(1509212k free)
vm_info: Java HotSpot(TM) Client VM (1.5.0_08-b03) for windows-x86, built on Jul 26 2006 01:10:50 by "java_re" with MS VC++ 6.0

DB_PAGE_NOTFOUND

Hello, I have a single-threaded application with BDB flags DB_INIT_MPOOL, DB_INIT_TXN, DB_INIT_LOCK ... I take backups through the API call env->backup. Yet the backup taken sometimes fails with DB_PAGE_NOTFOUND.

I am afraid I cannot provide a reproducer program, as the problem happens occasionally and not persistently. The scenario is that the app closes the env to make sure all the updates are written to the db, reopens it, and then calls env->backup. Attempting to iterate over the data in some of the backups fails with the given error at a get operation, while other backups work perfectly.
Thank you

About DB_RUNRECOVERY

I keep getting this error in unit tests (which I don't get using the application itself). Any hints? I don't really know where to start tracking down the problem.
The environment is opened with DB_THREAD | DB_RECOVER
txn = c.manager.createTransaction(model.isolation_level)
File "/usr/lib/python2.4/site-packages/dbxml.py", line 166, in createTransaction
def createTransaction(*args): return _dbxml.XmlManager_createTransaction(*args)
XmlDatabaseError: (5, 'Error: DbEnv::txn_begin: DB_RUNRECOVERY: Fatal error, run database recovery')
Thanks in advance

Hi,
Given that the problem is happening in createTransaction() and not when you first open the environment, it's probably related to concurrency in some way -- perhaps another thread closing the environment out from under you.
Regards,
George

Open environment failure "DB_RUNRECOVERY"

Hi BDB experts,
This is an HA master environment. For some unknown reason, opening the environment now always returns DB_RUNRECOVERY. The environment was opened with these flags:
DB_INIT_TXN | DB_INIT_LOCK | DB_INIT_LOG | DB_REGISTER | DB_RECOVER | DB_INIT_MPOOL | DB_THREAD | DB_INIT_REP;
In this case, can the environment be recovered? What can be done about this error?
Could you tell me the possible causes of the "DB_RUNRECOVERY" error?
Thanks,
Min

DB_RUNRECOVERY gets returned when there is a suspected issue in the database. You need to run recovery. Chapter 11 of the BDB Programmer's Reference covers the procedures for running recovery. In this case, I would suggest running the db_recover utility. When you run the utility, you should ensure that no processes are connected to the environment and that it is closed.
Possible causes include memory errors, disk errors, hardware errors, application errors, power failures, CPU errors, and human errors. There are many paths in the code that can throw this particular error. Figuring out the exact trigger for your case is something that Oracle support handles. Have you purchased a support contract for BDB?
thanks
mike

Getting a -30987 (DB_PAGE_NOTFOUND) after using struct as data in the C API

Hi,
I've been using Berkeley DB on Ubuntu for some time now. Today I spent several hours on the following problem.
My function for writing to the DB looks like this:
==========
int write_entry(char *argv_entry, char *main_header){
    typedef struct write_entry {
        char *header;
        char *entry;
    } WRITE_ENTRY;
    u_int32_t flags;
    int ret;
    DBT key, data;
    DB *blog_db;
    WRITE_ENTRY my_entry;
    int buffsize, bufflen;
    char *databuff;
    /* === Open DB */
    ret = db_create (&blog_db, NULL, 0);
    if (ret != 0){ printf(":( - ERROR while db_create\n"); }
    flags = DB_CREATE;
    ret = blog_db->open(blog_db, NULL, "blog.db", NULL, DB_BTREE, flags, 0);
    if (ret != 0){printf("ERROR\n"); }
    /* === declaring vars and mem */
    unsigned long my_key;
    my_key = time (NULL);
    my_entry.header = main_header;
    my_entry.entry = argv_entry;
    buffsize = (strlen (my_entry.header) + strlen (my_entry.entry) + 2);
    databuff = malloc (buffsize);
    memset(databuff, 0, buffsize);
    memcpy (databuff, my_entry.header, strlen (my_entry.header));
    bufflen = strlen (my_entry.header) + 1;
    memcpy (databuff + bufflen, my_entry.entry, strlen (my_entry.entry));
    bufflen += strlen (my_entry.entry) + 1;
    /* rdy to store */
    memset (&key, 0, sizeof(DBT));
    memset (&data, 0, sizeof(DBT));
    key.data = &(my_key);
    key.size = sizeof(unsigned long);
    data.data = databuff;
    data.size = bufflen;
    ret = blog_db->put(blog_db, NULL, &key, &data, DB_NOOVERWRITE);
    free(databuff);
    if(ret == 0) { return 0; }
    else { return (ret); }
    if (blog_db != NULL){
        blog_db->close(blog_db, 0);
    }
}
==========
I use the following code to retrieve data from the DB:
==========
void main(){
    write_entry ();
}
write_entry(){
    typedef struct write_entry {
        char *header;
        char *entry;
    } WRITE_ENTRY;
    int id;
    DBT key, data;
    DB *my_blog;
    WRITE_ENTRY my_entry;
    u_int32_t flags;
    int ret;
    ret = db_create (&my_blog, NULL, 0);
    flags = DB_CREATE;
    ret = my_blog->open(my_blog, NULL, "blog.db", NULL, DB_BTREE, flags, 0 );
    memset (&key, 0, sizeof(DBT));
    memset (&data, 0, sizeof(DBT));
    id = 1288563852;
    key.data = &id;
    key.size = sizeof (unsigned long);
    ret = my_blog->get(my_blog, NULL, &key, &data, 0);
    if (ret != 0)
    {
        printf("RET != 0 ==> %d\n", ret);
    }
    printf("Output: %s", data.data);
}
==========
Whatever I do - I get a "-30987" after the following line:
ret = my_blog->get(my_blog, NULL, &key, &data, 0);
I also tried to replace the long variable with an int - same result.
Could anybody tell me what's going wrong here?
Greetings
Jan

Hello,
We do not have all the details here i.e. product (DS, CDS, TDS, HA),
if the application is multi-process/multi-threaded, if transactions/
environment/locking are in use, BDB version, but a few suggestions would
be:
1. make sure that there are no environment/database corruptions
2. if applicable make sure that any log/region files are not corrupted.
3. ensure that the same database is not updated both from within
an environment and outside the environment
4. make sure any cursors are closed when the application is finished
with them.
5. ensure that the same database is not accessed from within and without
a transaction
6. check if any processes/threads have been killed unexpectedly
Since the application is long-running and you just hit this problem,
some type of corruption could have occurred.
Thanks,
Sandra
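
For reference, the buffer-packing step in the posted write function can be sketched on its own, without Berkeley DB. This is only a minimal sketch under stated assumptions, not the poster's code and not a confirmed fix for the -30987 error: `pack_entry` and `unpack_entry` are hypothetical helper names, and the layout (two NUL-terminated strings stored back to back) is taken from what the posted code appears to intend.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pack two NUL-terminated strings back to back into
 * one malloc'd buffer. On success returns the buffer and stores its total
 * length in *out_len; returns NULL if allocation fails. Caller frees. */
char *pack_entry(const char *header, const char *entry, size_t *out_len)
{
    size_t hlen = strlen(header) + 1;   /* include the terminating NUL */
    size_t elen = strlen(entry) + 1;    /* include the terminating NUL */
    char *buf = malloc(hlen + elen);

    if (buf == NULL)
        return NULL;
    memcpy(buf, header, hlen);          /* header first ... */
    memcpy(buf + hlen, entry, elen);    /* ... entry right after it */
    *out_len = hlen + elen;
    return buf;
}

/* Hypothetical helper: recover pointers to the two strings inside a packed
 * buffer. Returns 0 on success, -1 if the buffer does not contain two
 * NUL-terminated strings. The pointers refer into buf; nothing is copied. */
int unpack_entry(const char *buf, size_t len,
                 const char **header, const char **entry)
{
    const char *nul = memchr(buf, '\0', len);
    size_t hlen;

    if (nul == NULL)
        return -1;                      /* no terminator for the header */
    hlen = (size_t)(nul - buf) + 1;
    if (hlen >= len || buf[len - 1] != '\0')
        return -1;                      /* entry missing or unterminated */
    *header = buf;
    *entry = buf + hlen;
    return 0;
}
```

With a layout like this, the buffer and length returned by `pack_entry` would be what goes into `data.data` and `data.size` before the put, and `unpack_entry` would be applied to `data.data` and `data.size` after a successful get. Two other things are visible in the posted code, though the thread does not confirm either as the cause: the `close` call in the write function comes after both `return` statements and so is never reached, and the read side stores the key in an `int` while setting `key.size` to `sizeof (unsigned long)`.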

Spontaneous trouble receiving mail from Mail Application and Web Mail

Hello,
Unfortunately I need some assistance. For unknown reasons, today at about 12 PM our in house email server (10.3.9) started failing to allow any user to connect to download mail. I have users running POP and other users running IMAP accounts that had the same lack of function.
When I connect using web mail I get the following error:
Error connecting to IMAP server: localhost. 61 : Connection refused
When I connect from the Mail application I get the following error:
The server "mail.mycompany.com" refused to allow a connection on port 110
 When I connect using the Server Admin program the IMAP and POP logs are empty despite being set to the “All events” setting, and the SMTP log contains the following message frequently despite still being able to send email from the affected accounts:
(temporary failure. Command output: couldn't connect to lmtpd: Unknown Error Code: 0_ 421 4.3.0 deliver: couldn't connect to lmtpd_ )
I have restarted the server, I have run disk utility to repair permissions and made no intentional adjustments to the server in the last week. The last adjustment to the mail server I made was about 1 month ago to turn on IMAP authentication to allow me to switch from a POP to an IMAP email account.
Please help me figure this out!
Thanks in advance.
 Mike
(PS I am a veterinarian, not an IT guru. I run my own mail and web servers because with Apple products, I can.)

Thanks for your assistance.
I found the logs and while there is a mail.log there is no mailaccess.log file.
There is a mailaccess.log.0.gz, which I decompressed, and I also decompressed mailaccess.log.1.gz.
The mailaccess.log.1 file contains entries from July 10th through July 11th.
The mailaccess.log.0 file contains entries from August 14th through August 15th (today). Seems like there was a long period that did not get archived correctly.
Here is a subset of what was in the mail.log from this morning:
Aug 15 07:04:03 www postfix/pipe[24378]: 5B2201493AC: to=<[email protected]>, relay=cyrus, delay=1001, status=bounced (Command time limit exceeded: "/usr/bin/cyrus/bin/deliver")
Aug 15 07:04:03 www postfix/pipe[24376]: 5B2201493AC: to=<[email protected]>, relay=cyrus, delay=1001, status=bounced (Command time limit exceeded: "/usr/bin/cyrus/bin/deliver")
Aug 15 07:04:03 www postfix/cleanup[10663]: 86BD71493BA: message-id=<[email protected]>
Aug 15 07:04:03 www postfix/qmgr[25517]: 86BD71493BA: from=, size=19952, nrcpt=1 (queue active)
Aug 15 07:04:04 www postfix/smtp[10668]: 86BD71493BA: to=<sentto-12433195-1540-1187185326-mrbroome=avmi.net@returns.groups.yahoo.com> , relay=rtn7.grp.scd.yahoo.com[66.218.66.214], delay=1, status=sent (250 ok 1187186644 qp 73498)
Aug 15 07:06:16 www postfix/pipe[26600]: 8D50C1493AF: to=<[email protected]>, relay=cyrus, delay=1002, status=bounced (Command time limit exceeded: "/usr/bin/cyrus/bin/deliver")
Aug 15 07:06:16 www postfix/pipe[26602]: 8D50C1493AF: to=<[email protected]>, relay=cyrus, delay=1002, status=bounced (Command time limit exceeded: "/usr/bin/cyrus/bin/deliver")
Aug 15 07:06:16 www postfix/cleanup[12865]: 56B5E1493BC: message-id=<[email protected]>
Aug 15 07:06:16 www postfix/qmgr[25517]: 56B5E1493BC: from=, size=51949, nrcpt=1 (queue active)
Aug 15 07:06:22 www postfix/smtp[12866]: 56B5E1493BC: to=<[email protected]>, relay=mailer.versiontracker.com[66.179.48.93], delay=6, status=sent (250 2.0.0 l7FE6L807756 Message accepted for delivery)
Aug 15 07:30:50 www postfix/smtpd[7082]: unable to get certificate from '/etc/postfix/server.pem'
Aug 15 07:30:50 www postfix/smtpd[7082]: 7082:error:02001002:system library:fopen:No such file or directory:bss_file.c:278:fopen('/etc/postfix/server.pem','r'):
Aug 15 07:30:50 www postfix/smtpd[7082]: 7082:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:280:
Aug 15 07:30:50 www postfix/smtpd[7082]: 7082:error:140DC002:SSL routines:SSLCTX_use_certificate_chainfile:system lib:ssl_rsa.c:760:
Aug 15 07:30:50 www postfix/smtpd[7082]: TLS engine: cannot load RSA cert/key data
Aug 15 07:30:50 www postfix/smtpd[7082]: connect from valhalla.mailpure.com[66.109.52.210]
Aug 15 07:30:51 www postfix/smtpd[7082]: 183AA1493D0: client=valhalla.mailpure.com[66.109.52.210]
Aug 15 07:30:51 www postfix/cleanup[7088]: 183AA1493D0: message-id=<[email protected]>
Aug 15 07:30:52 www postfix/qmgr[25517]: 183AA1493D0: from=<[email protected]>, size=26283, nrcpt=2 (queue active)
Aug 15 07:30:52 www postfix/smtpd[7082]: disconnect from valhalla.mailpure.com[66.109.52.210]
Aug 15 07:47:32 www postfix/pipe[7105]: 183AA1493D0: to=<[email protected]>, relay=cyrus, delay=1001, status=bounced (Command time limit exceeded: "/usr/bin/cyrus/bin/deliver")
Aug 15 07:47:32 www postfix/pipe[7109]: 183AA1493D0: to=<[email protected]>, relay=cyrus, delay=1001, status=bounced (Command time limit exceeded: "/usr/bin/cyrus/bin/deliver")
Aug 15 07:47:32 www postfix/cleanup[22267]: CD81D1493F0: message-id=<[email protected]>
Aug 15 07:47:32 www postfix/qmgr[25517]: CD81D1493F0: from=, size=28115, nrcpt=1 (queue active)
Aug 15 07:47:37 www postfix/smtp[22271]: CD81D1493F0: to=<[email protected]>, relay=cvm39.vetmed.wsu.edu[134.121.130.6], delay=5, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery)
The following is copied from the mailaccess.log.0 file from yesterday when the trouble started:
Aug 14 21:43:56 www master[269]: exiting on SIGTERM/SIGINT
Aug 14 21:43:56 www deliver[24981]: backend_connect(): couldn't read initial greeting: (null)
Aug 14 21:43:56 www master[25420]: process started
Aug 14 21:48:55 www deliver[25478]: connect(/var/imap/socket/lmtp) failed: Connection refused
Aug 14 21:50:22 www master[25519]: process started
Aug 14 22:07:03 www deliver[25632]: connect(/var/imap/socket/lmtp) failed: Connection refused
Aug 14 22:40:22 www deliver[25829]: connect(/var/imap/socket/lmtp) failed: Connection refused
Aug 15 06:23:50 www ctl_cyrusdb[25520]: DBERROR db4: PANIC: Too many open files
Aug 15 06:23:50 www ctl_cyrusdb[25520]: DBERROR: critical database situation
Aug 14 23:23:50 www master[25519]: process 25520 exited, status 75
Aug 14 23:23:50 www master[25519]: ready for work
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: fatal region error detected; run recovery
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: dbenv->open '/var/imap/db' failed: DB_RUNRECOVERY: Fatal error, run database recovery
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: init() on berkeley
Aug 14 23:23:50 www ctl_cyrusdb[26081]: checkpointing cyrus databases
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: txn_checkpoint interface requires an environment configured for the transaction subsystem
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: couldn't checkpoint: Invalid argument
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: sync /var/imap/db: cyrusdb error
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: DBENV->logarchive interface requires an environment configured for the logging subsystem
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: error listing log files: Invalid argument
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: archive /var/imap/db: cyrusdb error
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: txn_checkpoint interface requires an environment configured for the transaction subsystem
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: couldn't checkpoint: Invalid argument
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: sync /var/imap/db: cyrusdb error
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: DBENV->logarchive interface requires an environment configured for the logging subsystem
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: error listing log files: Invalid argument
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: archive /var/imap/db: cyrusdb error
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: txn_checkpoint interface requires an environment configured for the transaction subsystem
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: couldn't checkpoint: Invalid argument
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: sync /var/imap/db: cyrusdb error
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR db4: DBENV->logarchive interface requires an environment configured for the logging subsystem
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: error listing log files: Invalid argument
Aug 14 23:23:50 www ctl_cyrusdb[26081]: DBERROR: archive /var/imap/db: cyrusdb error
Aug 14 23:23:50 www ctl_cyrusdb[26081]: done checkpointing cyrus databases
Looks like my mail database is trashed. Let me know if this is correct and if so, how I should go about trying to restore/rebuild, etc.
Thanks again for your assistance.
Mike
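
The `PANIC: Too many open files` line in the log above is the usual system message for the EMFILE error, which a process receives when it reaches its per-process limit on open file descriptors. As a rough diagnostic sketch (not something suggested in the thread, and `fd_limits` is a hypothetical helper name), that limit can be read with the POSIX `getrlimit` call:

```c
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft and hard limits on open file
 * descriptors for the calling process.
 * Returns 0 on success, -1 on failure (errno is set by getrlimit). */
int fd_limits(unsigned long long *soft, unsigned long long *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = (unsigned long long)rl.rlim_cur;  /* limit currently enforced */
    *hard = (unsigned long long)rl.rlim_max;  /* ceiling for the soft limit */
    return 0;
}
```

A low soft limit would only be one possible explanation for the panic; the log by itself does not show which process ran out of descriptors or why.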

Memory issue on replica client

I am using BDB 4.7.25 on FreeBSD 7.0 with the C++ API.
I have applied the patch from Link: Re: Question on replication error like "DB_ENV->rep_process_message: DB_NOTF.." to fix the log_archive issue. I have also applied the patch suggested in the reply to that message.
On the master node, I am doing a lot of write operations with periodic checkpointing.
Case 1:
=======
Later, when the master node archives (deletes) log files after checkpointing, after a few minutes of transactions I get the following error on the client node.
Log sequence error: page LSN 0 0; previous LSN 25 1048356
Recovery function for LSN 26 4263441 failed on forward pass
Client initialization failed. Need to manually restore client
PANIC: Invalid argument
DB_ENV->rep_process_message: DB_RUNRECOVERY: Fatal error, run database recovery
message thread failed: DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: fatal region error detected; run recovery
DB_ENV->rep_process_message: DB_RUNRECOVERY: Fatal error, run database recovery
message thread failed: DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Please advise: what could possibly be wrong, and how can I fix it?
Case 2:
=====
In similar runs, when I don't call log_archive on the master node to delete the log files, the memory footprint of the client process periodically increases a lot and then drops back to normal. I suspect this happens around checkpointing, when the master sends a burst of messages to the client to replicate. But gradually the footprint grows too high, the process starts using swap space, and there is not enough memory to allocate. Is this fluctuation of the memory footprint on the client node expected behaviour?
The following output of db_stat-4.7 -MA might help.
These are the statistics from the replica client machine.
Mpool REGINFO information:
Mpool Region type
3 Region ID
__db.003 Region name
0x28710000 Original region address
0x28710000 Region address
0x287100c0 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 4094 allocations, 12894388 failures, 4007 frees, 1 longest
Allocations by power-of-two sizes:
1KB 34
2KB 1
4KB 0
8KB 12898447
16KB 0
32KB 0
64KB 0
128KB 0
256KB 0
512KB 0
1024KB 0
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
MPOOL structure:
9 MPOOL region mutex 2 / 26M 0% 11489 / 674238720
401 / 2533580 Maximum checkpoint LSN
37 Hash table entries
11 Hash table last-checked
496749207 Hash table LRU count
497385622 Put counter
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Please help me resolve both cases.
Regards,
Sury

Thanks very much! It's long. Is there a right answer for the problem?
