What is Berkeley DB?

Please post some links which can help us understand what this new (??) product is.

Have a look here:
http://en.wikipedia.org/wiki/Berkeley_DB
Khurram

Similar Messages

Is Berkeley an object-oriented database?

As the subject says, please help me understand whether Berkeley DB is an object-oriented database or a data-structure-based one.

Hi Kevin,
A Berkeley DB database is like a relational table. Data is stored as key/data pairs and a key/data pair (a record) is similar to an RDBMS row with a primary key.
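To make the key/data model concrete, here is a minimal sketch that uses `std::map` as a stand-in for a single database. It is not the Berkeley DB API: `KeyDataTable`, `put_record` and `get_record` are hypothetical names, and the only point is that the key plays the role of a primary key while the data is a value the application interprets itself.

```cpp
#include <map>
#include <string>

// Stand-in for a single database: each key maps to one data value.
// Hypothetical illustration only; this is NOT the Berkeley DB API.
typedef std::map<std::string, std::string> KeyDataTable;

// Store `data` under `key`, replacing any existing record
// (comparable to writing a row identified by its primary key).
void put_record(KeyDataTable &table, const std::string &key,
                const std::string &data) {
    table[key] = data;
}

// Fetch the record for `key` into `data_out`.
// Returns false if no record with that key exists.
bool get_record(const KeyDataTable &table, const std::string &key,
                std::string &data_out) {
    KeyDataTable::const_iterator it = table.find(key);
    if (it == table.end())
        return false;
    data_out = it->second;
    return true;
}
```

Writing the same key twice leaves a single record holding the latest data, which is the sense in which a key/data pair resembles a row with a primary key.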
What Berkeley DB is not: http://www.oracle.com/technology/documentation/berkeley-db/db/ref/intro/dbisnot.html
What is Berkeley DB: http://www.oracle.com/technology/documentation/berkeley-db/db/ref/intro/dbis.html
Do you need Berkeley DB?: http://www.oracle.com/technology/documentation/berkeley-db/db/ref/intro/need.html
Bogdan Coman

Election problem after repeated split-brains with two nodes

Hi,
I'm using a customized program based on the BDB 5.1.19 example (excxx_repquote)
with two sites: one MASTER and the other SLAVE (client).
nsite=2
ack=quorum
- The master writes to quotedb at a rate of 10 txn per second.
- The test isolates the client from the master (split brain) and reconnects it after a random delay of 1 to 10 seconds.
The test runs fine about 10 times, but at some point the slave process receives DB_EVENT_REP_ELECTION_FAILED,
and the master enters election mode and never leaves CLIENT mode. Note that, to freeze the client's state, I make the process kill itself (kill -9 on its own pid) when it receives that event.
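The master log in this post reports `Start election nsites 2, ack 1`, and the comments in the source mention a "two-site strict majority rule". The following is a minimal sketch of the vote threshold those two facts suggest, assuming a simple majority of `nsites` with an exception for two-site groups that do not enforce strict majority. `votes_needed` and `can_win_alone` are hypothetical names, not Berkeley DB functions, and the library's real election code may differ.

```cpp
#include <cstdint>

// Votes a site must collect to win an election under the simplified
// model described above (an assumption, not Berkeley DB's actual code).
//   nsites           total number of sites in the replication group
//   two_site_strict  true if a two-site group needs both sites to elect
inline std::uint32_t votes_needed(std::uint32_t nsites, bool two_site_strict) {
    if (nsites == 2 && !two_site_strict)
        return 1;              // a lone site may elect itself
    return nsites / 2 + 1;     // simple majority
}

// True if a site cut off from every peer can still win with only its own vote.
inline bool can_win_alone(std::uint32_t nsites, bool two_site_strict) {
    return votes_needed(nsites, two_site_strict) <= 1;
}
```

Under this model, with `nsite=2` and no strict majority each isolated site can elect itself, which is one way two masters can coexist after a split. Requiring strict majority would avoid that, at the cost of having no master while the sites are disconnected.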
here is the verbose log on the master...
[1307872770:871621][6510/47655809107168] MASTER: rep_send_function returned: 110
[1307872770:973655][6510/47655809107168] MASTER: bulk_msg: Send buffer after copy due to PERM
[1307872770:973667][6510/47655809107168] MASTER: send_bulk: Send 266 (0x10a) bulk buffer bytes
[1307872770:973672][6510/47655809107168] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type bulk_log, LSN [21][986648] perm
[1307872770:973693][6510/47655809107168] MASTER: will await acknowledgement: need 1
[1307872771:26623][6510/47655809107168] MASTER: rep_send_function returned: 110
[1307872771:126380][6510/1162996032] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type log, LSN [21][946345]
[1307872771:126407][6510/1162996032] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type dupmaster, LSN [0][0] nobuf
[1307872771:126695][6510/1162996032] MASTER: rep_start: Found old version log 17
[1307872771:126753][6510/1162996032] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type newclient, LSN [0][0] nobuf
[1307872771:126833][6510/1183975744] CLIENT: starting election thread
[1307872771:126876][6510/1183975744] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:126890][6510/1183975744] CLIENT: Election thread owns egen 69
[1307872771:127423][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type newclient, LSN [0][0]
[1307872771:130079][6510/1183975744] CLIENT: Tallying VOTE1[0] (2147483647, 69)
[1307872771:130113][6510/1183975744] CLIENT: Beginning an election
[1307872771:130134][6510/1183975744] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type vote1, LSN [21][986728] nobuf
[1307872771:130147][6510/1173485888] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type master_req, LSN [0][0] nobuf
[1307872771:130438][6510/1152506176] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][946437]
[1307872771:130460][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728]
[1307872771:130467][6510/1152506176] CLIENT: Updating gen from 68 to 70
[1307872771:130482][6510/1162996032] CLIENT: Received ALIVE egen of 71, mine 69
[1307872771:130503][6510/1162996032] CLIENT: Election finished in 0.003602000 sec
[1307872771:130515][6510/1162996032] CLIENT: Election done; egen 70
[1307872771:130534][6510/1152506176] CLIENT: Received vote1 egen 71, egen 71
[1307872771:130581][6510/1152506176] CLIENT: Tallying VOTE1[0] (0, 71)
[1307872771:130593][6510/1089075520] CLIENT: starting election thread
[1307872771:130619][6510/1152506176] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,946437]
[1307872771:130642][6510/1152506176] CLIENT: Not in election, but received vote1 0x282c 0x8
[1307872771:130674][6510/1089075520] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:130692][6510/1089075520] CLIENT: Election thread owns egen 71
[1307872771:130704][6510/1194465600] CLIENT: starting election thread
[1307872771:130733][6510/1194465600] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:132922][6510/1089075520] CLIENT: Tallying VOTE1[1] (2147483647, 71)
[1307872771:132949][6510/1089075520] CLIENT: Accepting new vote
[1307872771:132958][6510/1089075520] CLIENT: Beginning an election
[1307872771:132973][6510/1089075520] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][986728] nobuf
[1307872771:132985][6510/1194465600] CLIENT: election thread is exiting
[1307872771:133012][6510/1089075520] CLIENT: Tallying VOTE2[0] (2147483647, 71)
[1307872771:133037][6510/1089075520] CLIENT: Counted my vote 1
[1307872771:133048][6510/1089075520] CLIENT: Skipping phase2 wait: already got 1 votes
[1307872771:133060][6510/1089075520] CLIENT: Got enough votes to win; election done; (prev) gen 70
[1307872771:133071][6510/1089075520] CLIENT: Election finished in 0.002367000 sec
[1307872771:133084][6510/1089075520] CLIENT: Election done; egen 72
[1307872771:133111][6510/1089075520] CLIENT: Ended election with 0, e_th 1, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x6
[1307872771:133170][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0]
[1307872771:133187][6510/1173485888] CLIENT: Racing replication msg lockout, ignore message.
[1307872771:173744][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0]
[1307872771:173769][6510/1162996032] CLIENT: Racing replication msg lockout, ignore message.
[1307872771:231593][6510/1183975744] CLIENT: Ended election with 0, e_th 0, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x1c
[1307872771:231629][6510/1183975744] CLIENT: election thread is exiting
[1307872777:443794][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307872971:644194][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873165:844583][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873360:44955][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873554:245347][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873748:445736][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873942:646117][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307874136:846509][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
... and it stays in this state indefinitely.
My question is: why does the master suddenly become a CLIENT, and why does it never return to MASTER?
Thanks in advance ...
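For anyone who wants to locate the MASTER to CLIENT transition in logs like these programmatically, here is a minimal sketch of a line parser. It assumes the format visible in this post, `[sec:usec][pid/tid] ROLE: message`, and that the sub-second field is an unpadded microsecond count (for example `26623` directly follows `973693` of the previous second), so it is read as an integer rather than a decimal fraction. `LogLine`, `parse_log_line` and `first_role_change` are hypothetical helper names.

```cpp
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// One parsed verbose-log line (format assumed from the logs in this post).
struct LogLine {
    std::uint64_t sec;    // seconds since the epoch
    std::uint32_t usec;   // microseconds, printed without zero padding
    std::string role;     // "MASTER" or "CLIENT"
    std::string message;  // everything after "ROLE: "
};

// Parse a single log line; returns false if it does not match the format.
bool parse_log_line(const std::string &line, LogLine &out) {
    static const std::regex pattern(
        R"(^\[(\d+):(\d+)\]\[\d+/\d+\] (MASTER|CLIENT): (.*)$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;
    out.sec = std::stoull(m[1].str());
    out.usec = static_cast<std::uint32_t>(std::stoul(m[2].str()));
    out.role = m[3].str();
    out.message = m[4].str();
    return true;
}

// Find the first line whose role differs from the previous parsed line.
// Lines that do not parse are skipped. Returns false if the role never changes.
bool first_role_change(const std::vector<std::string> &lines, LogLine &out) {
    std::string previous_role;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        LogLine current;
        if (!parse_log_line(lines[i], current))
            continue;
        if (!previous_role.empty() && current.role != previous_role) {
            out = current;
            return true;
        }
        previous_role = current.role;
    }
    return false;
}
```

Run over the master log above, the first role change falls on the `newclient` message that immediately follows the `dupmaster` message and `rep_start`.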
here is the log for the client
[1307872315:455113][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984396]
[1307872315:455134][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984483] perm
[1307872315:609962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984733] perm
[1307872315:764958][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984986] perm
[1307872315:919962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985238] perm
[1307872316:75018][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985491] perm
[1307872316:229959][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985741] perm
[1307872316:384949][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985993] perm
[1307872316:499899][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986141] perm
[1307872316:539895][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986221]
[1307872316:540078][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986307]
[1307872316:540100][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986394] perm
[1307872316:694950][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986648] perm
[1307872316:847349][1282/1129134400] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type log, LSN [21][946345]
[1307872316:847698][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type dupmaster, LSN [0][0]
[1307872316:847999][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type newclient, LSN [0][0]
[1307872316:848168][1282/1171093824] MASTER: rep_start: Found old version log 17
[1307872316:848222][1282/1181583680] CLIENT: Racing replication msg lockout, ignore message.
[1307872316:848398][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type newclient, LSN [0][0] nobuf
[1307872316:848504][1282/1192073536] CLIENT: starting election thread
[1307872316:848542][1282/1192073536] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872316:848566][1282/1192073536] CLIENT: Election thread owns egen 71
[1307872316:849634][1282/1192073536] CLIENT: Tallying VOTE1[0] (2147483647, 71)
[1307872316:849654][1282/1192073536] CLIENT: Beginning an election
[1307872316:849680][1282/1192073536] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][946437] nobuf
[1307872316:851403][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type vote1, LSN [21][986728]
[1307872316:851448][1282/1160603968] CLIENT: Received vote1 egen 69, egen 71
[1307872316:851470][1282/1160603968] CLIENT: Received old vote 69, egen 71, ignoring vote1
[1307872316:851481][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728] nobuf
[1307872316:851538][1282/1171093824] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type master_req, LSN [0][0]
[1307872316:851558][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0] nobuf
[1307872316:854254][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][986728]
[1307872316:854275][1282/1160603968] CLIENT: Received vote1 egen 71, egen 71
[1307872316:854317][1282/1160603968] CLIENT: Tallying VOTE1[1] (0, 71)
[1307872316:854339][1282/1160603968] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,986728]
[1307872316:854353][1282/1160603968] CLIENT: Existing vote: (eid)2147483647 (pri)100 (gen)70 (sites)2 [21,946437]
[1307872316:854369][1282/1160603968] CLIENT: Accepting new vote
[1307872316:854379][1282/1160603968] CLIENT: Phase1 election done
[1307872316:854395][1282/1160603968] CLIENT: Voting for 0
[1307872316:854407][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0] nobuf
[1307872317:960344][1282/1192073536] CLIENT: After phase 2: votes 0, nvotes 1, nsites 2
[1307872317:960389][1282/1192073536] CLIENT: Election finished in 1.111809000 sec
[1307872317:960401][1282/1192073536] CLIENT: Election done; egen 72
[1307872317:960412][1282/1192073536] CLIENT: Ended election with -30974, e_th 0, egen 72, flag 0x282c, e_fl 0x0, lo_fl 0x0
Kill me !!
--- my source
on the master I run manually :
txn_rate 1
loop_rate 10
loop 1 20000
/*-
 * See the file LICENSE for redistribution information.
 *
 * Copyright (c) 2001, 2010 Oracle and/or its affiliates. All rights reserved.
 *
 * $Id$
 */

/*
 * In this application, we specify all communication via the command line. In
 * a real application, we would expect that information about the other sites
 * in the system would be maintained in some sort of configuration file. The
 * critical part of this interface is that we assume at startup that we can
 * find out
 *      1) what our Berkeley DB home environment is,
 *      2) what host/port we wish to listen on for connections; and
 *      3) an optional list of other sites we should attempt to connect to.
 *
 * These pieces of information are expressed by the following flags.
 * -h home (required; h stands for home directory)
 * -l host:port (required; l stands for local)
 * -C or -M (optional; start up as client or master)
 * -r host:port (optional; r stands for remote; any number of these may be
 *     specified)
 * -R host:port (optional; R stands for remote peer; only one of these may
 *     be specified)
 * -a all|quorum (optional; a stands for ack policy)
 * -b (optional; b stands for bulk)
 * -n nsites (optional; number of sites in replication group; defaults to 0
 *     to try to dynamically compute nsites)
 * -p priority (optional; defaults to 100)
 * -v (optional; v stands for verbose)
 */
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <sstream>
#include <sys/types.h>
#include <signal.h>
#include <db_cxx.h>
#include "RepConfigInfo.h"
#include "dbc_auto.h"
using std::cout;
using std::cin;
using std::cerr;
using std::endl;
using std::ends;
using std::flush;
using std::istream;
using std::istringstream;
using std::ostringstream;
using std::string;
using std::getline;
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>
#define     CACHESIZE     (10 * 1024 * 1024)
#define     DATABASE     "quote.db"
#define     DATABASE2     "quote2.db"
const char *progname = "excxx_repquote";
#include <errno.h>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#define     snprintf          _snprintf
#define     sleep(s)          Sleep(1000 * (s))
extern "C" {
extern int getopt(int, char * const *, const char *);
extern char *optarg;
}
typedef HANDLE thread_t;
typedef DWORD thread_exit_status_t;
#define     thread_create(thrp, attr, func, arg)                    \
(((*(thrp) = CreateThread(NULL, 0,                         \
     (LPTHREAD_START_ROUTINE)(func), (arg), 0, NULL)) == NULL) ? -1 : 0)
#define     thread_join(thr, statusp)                         \
((WaitForSingleObject((thr), INFINITE) == WAIT_OBJECT_0) &&          \
GetExitCodeThread((thr), (LPDWORD)(statusp)) ? 0 : -1)
#else /* !_WIN32 */
#include <pthread.h>
typedef pthread_t thread_t;
typedef void* thread_exit_status_t;
#define     thread_create(thrp, attr, func, arg)                    \
pthread_create((thrp), (attr), (func), (arg))
#define     thread_join(thr, statusp) pthread_join((thr), (statusp))
#endif
// Struct used to store information in Db app_private field.
typedef struct {
     bool app_finished;
     bool in_client_sync;
     bool is_master;
     bool no_dummy_wr;
} APP_DATA;
static void log(const char *);
void *checkpoint_thread(void *);
void *log_archive_thread(void *);
void *dummy_write_thread(void *);
class RepQuoteExample {
public:
     RepQuoteExample();
     void init(RepConfigInfo* config);
     void doloop();
     int terminate();
     static void event_callback(DbEnv* dbenv, u_int32_t which, void *info);
     void print_stocks_size(Db *dbp);
private:
     // disable copy constructor.
     RepQuoteExample(const RepQuoteExample &);
     void operator = (const RepQuoteExample &);
     // internal data members.
     APP_DATA          app_data;
     RepConfigInfo *app_config;
     DbEnv          cur_env;
     thread_t ckp_thr;
     thread_t lga_thr;
     thread_t dmy_thr;
     // private methods.
     void print_stocks(Db *dbp);
     void print_env(DbEnv *dbenv);
     void prompt();
};
RepQuoteExample *g_runner=NULL;
RepConfigInfo *g_config=NULL;
class DbHolder {
public:
     DbHolder(DbEnv *env, const char *_dbname) : env(env)
     {
          dbp = 0;
          if (_dbname) dbname=_dbname;
          else dbname=DATABASE;
     }
     ~DbHolder() {
          try {
               close();
          } catch (...) {
               // Ignore: this may mean another exception is pending
          }
     }
     bool ensure_open(bool creating) {
          if (dbp)
               return (true);
          dbp = new Db(env, 0);
          u_int32_t flags = DB_AUTO_COMMIT;
          if (creating)
               flags |= DB_CREATE;
          try {
               //dbp->open(NULL, DATABASE, NULL, DB_BTREE, flags, 0);
               //dbp->open(NULL, dbname, NULL, DB_BTREE, flags, 0);
               dbp->open(NULL, NULL, dbname, DB_BTREE, flags, 0);
               return (true);
          } catch (DbDeadlockException e) {
          } catch (DbRepHandleDeadException e) {
          } catch (DbException e) {
               if (e.get_errno() == DB_REP_LOCKOUT) {
                    // Just fall through.
               } else if (e.get_errno() == ENOENT && !creating) {
                    // Provide a bit of extra explanation.
                    log("Stock DB does not yet exist");
               } else
                    throw;
          }
          // (All retryable errors fall through to here.)
          log("please retry the operation");
          close();
          return (false);
     }
     void close() {
          if (dbp) {
               try {
                    dbp->close(0);
                    delete dbp;
                    dbp = 0;
               } catch (...) {
                    delete dbp;
                    dbp = 0;
                    throw;
               }
          }
     }
     operator Db *() {
          return dbp;
     }
     Db *operator->() {
          return dbp;
     }
private:
     Db *dbp;
     DbEnv *env;
     const char *dbname;
};
class StringDbt : public Dbt {
public:
#define GET_STRING_OK 0
#define GET_STRING_INVALID_PARAM 1
#define GET_STRING_SMALL_BUFFER 2
#define GET_STRING_EMPTY_DATA 3
     int get_string(char **buf, size_t buf_len)
     {
          size_t copy_len;
          int ret = GET_STRING_OK;
          if (buf == NULL) {
               cerr << "Invalid input buffer to get_string" << endl;
               return GET_STRING_INVALID_PARAM;
          }
          // make sure the string is null terminated.
          memset(*buf, 0, buf_len);
          // if there is no string, just return.
          if (get_data() == NULL || get_size() == 0)
               return GET_STRING_OK;
          if (get_size() >= buf_len) {
               ret = GET_STRING_SMALL_BUFFER;
               copy_len = buf_len - 1; // save room for a terminator.
          } else
               copy_len = get_size();
          memcpy(*buf, get_data(), copy_len);
          return ret;
     }
     size_t get_string_length()
     {
          if (get_size() == 0)
               return 0;
          return strlen((char *)get_data());
     }
     void set_string(char *string)
     {
          set_data(string);
          set_size((u_int32_t)strlen(string));
     }
     StringDbt(char *string) :
          Dbt(string, (u_int32_t)strlen(string)) {};
     StringDbt() : Dbt() {};
     ~StringDbt() {};
     // Don't add extra data to this sub-class since we want it to remain
     // compatible with Dbt objects created internally by Berkeley DB.
};
Db *g_repquote=NULL;
RepQuoteExample::RepQuoteExample() : app_config(0), cur_env(0) {
     app_data.app_finished = 0;
     app_data.in_client_sync = 0;
     app_data.is_master = 0; // assume I start out as client
     app_data.no_dummy_wr = 0; // when set, the dummy write is not run
}
int (*old_rep_process_message)
          __P((DB_ENV *, DBT *, DBT *, int, DB_LSN *));
int my_rep_process_message __P((DB_ENV *arg1, DBT *arg2, DBT *arg3, int arg4, DB_LSN *arg5))
{
     printf("EZ->>> my_rep_process_message:%p\n", (void *)arg5);
     // Chain to the saved handler and propagate its result.
     return old_rep_process_message(arg1, arg2, arg3, arg4, arg5);
}
void RepQuoteExample::init(RepConfigInfo *config) {
     app_config = config;
     cur_env.set_app_private(&app_data);
     cur_env.set_errfile(stderr);
     app_data.no_dummy_wr=config->no_dummy_wr;
     if (app_data.no_dummy_wr)
          printf("No dummy !!!\n");
     //EZ->cur_env.set_errpfx(progname);
     cur_env.set_event_notify(event_callback);
     // Configure bulk transfer to send groups of records to clients
     // in a single network transfer. This is useful for master sites
     // and clients participating in client-to-client synchronization.
     if (app_config->bulk)
          cur_env.rep_set_config(DB_REP_CONF_BULK, 1);
     // Set the total number of sites in the replication group.
     // This is used by repmgr internal election processing.
     if (app_config->totalsites > 0)
          cur_env.rep_set_nsites(app_config->totalsites);
     // Turn on debugging and informational output if requested.
     if (app_config->verbose)
          cur_env.set_verbose(DB_VERB_REPLICATION, 1);
     cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
     cur_env.set_verbose(DB_VERB_RECOVERY, 1);
     cur_env.set_verbose(DB_VERB_REPLICATION, 1);
     cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
     cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
     cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
     cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
     // Set replication group election priority for this environment.
     // An election first selects the site with the most recent log
     // records as the new master. If multiple sites have the most
     // recent log records, the site with the highest priority value
     // is selected as master.
     cur_env.rep_set_priority(app_config->priority);
     // Set the policy that determines how master and client sites
     // handle acknowledgement of replication messages needed for
     // permanent records. The default policy of "quorum" requires only
     // a quorum of electable peers sufficient to ensure a permanent
     // record remains durable if an election is held. The "all" option
     // requires all clients to acknowledge a permanent replication
     // message instead.
     cur_env.repmgr_set_ack_policy(app_config->ack_policy);
     // Set the threshold for the minimum and maximum time the client
     // waits before requesting retransmission of a missing message.
     // Base these values on the performance and load characteristics
     // of the master and client host platforms as well as the round
     // trip message time.
     cur_env.rep_set_request(20000, 500000);
     // Configure deadlock detection to ensure that any deadlocks
     // are broken by having one of the conflicting lock requests
     // rejected. DB_LOCK_DEFAULT uses the lock policy specified
     // at environment creation time or DB_LOCK_RANDOM if none was
     // specified.
     cur_env.set_lk_detect(DB_LOCK_DEFAULT);
     // The following base replication features may also be useful to your
     // application. See Berkeley DB documentation for more details.
     // - Master leases: Provide stricter consistency for data reads
     // on a master site.
     // - Timeouts: Customize the amount of time Berkeley DB waits
     // for such things as an election to be concluded or a master
     // lease to be granted.
     // - Delayed client synchronization: Manage the master site's
     // resources by spreading out resource-intensive client
     // synchronizations.
     // - Blocked client operations: Return immediately with an error
     // instead of waiting indefinitely if a client operation is
     // blocked by an ongoing client synchronization.
     cur_env.repmgr_set_local_site(app_config->this_host.host,
     app_config->this_host.port, 0);
     for (REP_HOST_INFO *cur = app_config->other_hosts; cur != NULL;
          cur = cur->next) {
          cur_env.repmgr_add_remote_site(cur->host, cur->port,
               NULL, cur->peer ? DB_REPMGR_PEER : 0);
     }
     // Configure heartbeat timeouts so that repmgr monitors the
     // health of the TCP connection. Master sites broadcast a heartbeat
     // at the frequency specified by the DB_REP_HEARTBEAT_SEND timeout.
     // Client sites wait for message activity the length of the
     // DB_REP_HEARTBEAT_MONITOR timeout before concluding that the
     // connection to the master is lost. The DB_REP_HEARTBEAT_MONITOR
     // timeout should be longer than the DB_REP_HEARTBEAT_SEND timeout.
     cur_env.rep_set_timeout(DB_REP_HEARTBEAT_SEND, 5000000);
     cur_env.rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 10000000);
     // The following repmgr features may also be useful to your
     // application. See Berkeley DB documentation for more details.
     // - Two-site strict majority rule - In a two-site replication
     // group, require both sites to be available to elect a new
     // master.
     // - Timeouts - Customize the amount of time repmgr waits
     // for such things as waiting for acknowledgements or attempting
     // to reconnect to other sites.
     // - Site list - return a list of sites currently known to repmgr.
     // We can now open our environment, although we're not ready to
     // begin replicating. However, we want to have a dbenv around
     // so that we can send it into any of our message handlers.
     cur_env.set_cachesize(0, CACHESIZE, 0);
     cur_env.set_flags(DB_REP_PERMANENT, 1);
     //cur_env.set_flags(DB_TXN_WRITE_NOSYNC, 1);
/*     u_int32_t maxlocks=300000;
     if (maxlocks != 0)
          cur_env.set_lk_max_locks(maxlocks);
     u_int32_t maxlocks_o=300000;
     if (maxlocks_o != 0)
          cur_env.set_lk_max_objects(maxlocks_o);
     u_int32_t maxmutex=300000;
     if (maxmutex != 0)
          cur_env.mutex_set_max(maxmutex);
*/
     DbEnv          *m_env=&cur_env;
     m_env->set_flags(DB_TXN_NOSYNC, 1);
     m_env->set_lk_max_lockers(60000);
     m_env->set_lk_max_objects(60000);
     m_env->set_lk_max_locks(60000);
     m_env->set_tx_max(60000);
     //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_NONE);
     m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
     m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
     //m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
     m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 1 * 1000 * 1000); // 1 second
     m_env->rep_set_timeout(DB_REP_FULL_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
     m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 5 * 1000 * 1000);
     //m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds
     //m_env->rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 80 * 1000 * 1000); //80 seconds
     //m_env->rep_set_timeout(DB_REP_HEARTBEAT_SEND, 500 * 1000); //500 milli seconds
     // The minimum number of microseconds a client waits before requesting retransmission
     u_int32_t rep_req_min = 40000; // 40,000 microseconds = 40 ms
     // The maximum number of microseconds a client waits before requesting retransmission
     u_int32_t rep_req_max = 1280000; // 1,280,000 microseconds = 1.28 s
     u_int32_t rep_limit_gbytes = 0;
     u_int32_t rep_limit_bytes = 100 * 1024 * 1024; // 100MB
     m_env->rep_set_request(rep_req_min, rep_req_max);
     m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);
     cur_env.open(app_config->home, DB_CREATE | DB_RECOVER |
     DB_THREAD | DB_INIT_REP | DB_INIT_LOCK | DB_INIT_LOG |
     DB_INIT_MPOOL | DB_INIT_TXN , 0);
     //keep old function for chain
     //old_rep_process_message=cur_env.get_DB_ENV()->rep_process_message;
     //derouting
     //cur_env.get_DB_ENV()->rep_process_message=my_rep_process_message;
     /*int _i;
     cur_env.log_get_config(DB_LOG_DIRECT, &_i);printf ("DB_LOG_DIRECT = %d\n",_i);
     cur_env.log_get_config(DB_LOG_DSYNC, &_i);printf ("DB_LOG_DSYNC = %d\n",_i);
     cur_env.log_get_config(DB_LOG_AUTO_REMOVE, &_i);printf ("DB_LOG_AUTO_REMOVE = %d\n",_i);
     cur_env.log_get_config(DB_LOG_IN_MEMORY, &_i);printf ("DB_LOG_IN_MEMORY = %d\n",_i);
     cur_env.log_get_config(DB_LOG_ZERO,&_i);printf ("DB_LOG_ZERO = %d\n",_i);
     */
     // Start checkpoint and log archive support threads.
     (void)thread_create(&ckp_thr, NULL, checkpoint_thread, &cur_env);
     (void)thread_create(&lga_thr, NULL, log_archive_thread, &cur_env);
     (void)thread_create(&dmy_thr, NULL, dummy_write_thread, &cur_env);
     cur_env.repmgr_start(3, app_config->start_policy);
}

int RepQuoteExample::terminate() {
     try {
          // Wait for checkpoint and log archive threads to finish.
          // Windows does not allow NULL pointer for exit code variable.
          thread_exit_status_t exstat;
          (void)thread_join(lga_thr, &exstat);
          (void)thread_join(ckp_thr, &exstat);
          (void)thread_join(dmy_thr, &exstat);
          // We have used the DB_TXN_NOSYNC environment flag for
          // improved performance without the usual sacrifice of
          // transactional durability, as discussed in the
          // "Transactional guarantees" page of the Reference
          // Guide: if one replication site crashes, we can
          // expect the data to exist at another site. However,
          // in case we shut down all sites gracefully, we push
          // out the end of the log here so that the most
          // recent transactions don't mysteriously disappear.
          cur_env.log_flush(NULL);
          cur_env.close(0);
     } catch (DbException dbe) {
          cout << "error closing environment: " << dbe.what() << endl;
     }
     return 0;
}
void RepQuoteExample::prompt() {
     cout << "QUOTESERVER";
     if (!app_data.is_master)
          cout << "(read-only)";
     cout << "> " << flush;
}
void log(const char *msg) {
     time_t currentTime;
     // get the current time
     time(&currentTime);
     char buff[255];
     // copy the ctime() text and make sure it is terminated
     strncpy(buff, ctime(&currentTime), sizeof(buff) - 1);
     buff[sizeof(buff) - 1] = '\0';
     // strip the trailing newline that ctime() appends
     char *p;
     for (p = buff; *p != '\0' && *p != '\n'; p++)
          ;
     *p = '\0';
     cerr << buff << " - " << msg << endl;
}
// Simple command-line user interface:
// - enter "<stock symbol> <price>" to insert or update a record in the
//     database;
// - just press Return (i.e., blank input line) to print out the contents of
//     the database;
// - enter "quit" or "exit" to quit.
void RepQuoteExample::doloop() {
     DbHolder dbh1(&cur_env,DATABASE);
     DbHolder dbh2(&cur_env,DATABASE2);
     DbHolder *dbh=&dbh1;
     DbTxn *txn;
     string input;
bool truncate = false;
     char *c;
     using_history();
     g_repquote=*dbh;
     int loop_rate = 0;
     int txn_rate = 500;
     while (prompt(), /*getline(cin, input)*/c=readline(NULL)) {
          input=std::string(c);
          add_history(c);
          free(c);
          int start_loop = 0;
          int end_loop = 0;
          int start_loop_d = 0;
          int end_loop_d = 0;
          istringstream is(input);
          string token1, token2, token3;
truncate = false;
start_loop = 0;
end_loop = 0;
          // Read up to 3 tokens from the input.
          int count = 0;
          if (is >> token1) {
               count++;
               if (is >> token2)
                    count++;
               if (is >> token3)
                    count++;
          }
          if (count == 1) {
               if (token1 == "truncate") {
                    truncate = true;
               } else if (token1 == "env") {
                    print_env(&cur_env);
                    continue;
               } else if (token1 == "verbose") {
                    // Toggle all replication-related verbose channels together.
                    app_config->verbose = !app_config->verbose;
                    int onoff = app_config->verbose ? 1 : 0;
                    cur_env.set_verbose(DB_VERB_REPLICATION, onoff);
                    cur_env.set_verbose(DB_VERB_REPMGR_MISC, onoff);
                    cur_env.set_verbose(DB_VERB_RECOVERY, onoff);
                    cur_env.set_verbose(DB_VERB_REP_ELECT, onoff);
                    cur_env.set_verbose(DB_VERB_REP_LEASE, onoff);
                    cur_env.set_verbose(DB_VERB_REP_SYNC, onoff);
                    log(onoff ? "verbose is on" : "verbose is off");
                    continue;
               } else if (token1 == "print") {
                    print_stocks(*dbh);
                    count = 0;
               } else if (token1 == "db1") {
                    dbh=&dbh1;
                    g_repquote=*dbh;
                    log("switch to Db1");
                    count = 0;
               } else if (token1 == "db2") {
                    dbh=&dbh2;
                    g_repquote=*dbh;
                    log("switch to Db2");
                    count = 0;
               } else if (token1 == "exit" || token1 == "quit") {
                    app_data.app_finished = 1;
                    break;
               } else {
                    log("Format: <stock> <price>");
                    continue;
               }
          } else if (count == 2) {
               if (token1 == "loop_rate") {
                    loop_rate = atoi(token2.c_str());
                    continue;
               }
               if (token1 == "txn_rate") {
                    txn_rate = atoi(token2.c_str());
                    continue;
               }
          } else if (count == 3) {
               if (token1 == "loop") {
                    start_loop = atoi(token2.c_str());
                    end_loop = start_loop + atoi(token3.c_str());
               }
               if (token1 == "delete") {
                    start_loop_d = atoi(token2.c_str());
                    end_loop_d = start_loop_d + atoi(token3.c_str());
               }
          }
          // Here count is 0 (print), 1 (truncate), 2 (put) or 3 (loop or
          // delete), so we're about to try a DB operation.
          // Open database with DB_CREATE only if this is a master
          // database. A client database uses polling to attempt
          // to open the database without DB_CREATE until it is
          // successful.
          // This DB_CREATE polling logic can be simplified under
          // some circumstances. For example, if the application can
          // be sure a database is already there, it would never need
          // to open it with DB_CREATE.
          if (!dbh->ensure_open(app_data.is_master))
               continue;
          try {
               if (count == 0)
                    if (app_data.in_client_sync)
                         log( "Cannot read data during client initialization - please try again.");
                    else
                         print_stocks_size(*dbh);
               else if (!app_data.is_master)
                    log("Can't update at client");
               else {
                    if (truncate)
u_int32_t no_remove;
                    txn = NULL;
cur_env.txn_begin(NULL, &txn, DB_TXN_NOWAIT);
                         try
          (*dbh)->truncate(txn, &no_remove, 0);
// commit
txn->commit(0);
txn = NULL;
} catch (DbException &e) {
std::cout << "Error on txn commit: " << e.what() << std::endl;
                    //     } catch (DbDeadlockException &) {
                    if (txn != NULL)
                         (void)txn->abort();
// std::cout << "Error on txn commit: " << std::endl;
else if (start_loop)
int j=0;
for (int i=start_loop; i<=end_loop; i=i+txn_rate)
//transaction begin
               txn = NULL;
               cur_env.txn_begin(NULL, &txn, 0);
for (j=i; j<=end_loop && j<=(i+txn_rate); j++)
                              Dbt key, value;
     std::string key1, value1;
     std::stringstream sstrm;
     sstrm << "key" << j << ends;
     key1 = sstrm.str();
               key.set_data((void *)key1.c_str());
               key.set_size((u_int32_t)strlen(key1.c_str()));
     sstrm.str("");
     int payload = rand() + j;
                              sstrm << "price" << payload << ends;
     value1 = sstrm.str();
               value.set_data((void *)value1.c_str());
               value.set_size((u_int32_t)strlen(value1.c_str()));
     // Perform the database put
     (*dbh)->put(txn, &key, &value, 0);
                         printf("Kill me !!\n");
                         kill(getpid(), SIGKILL);   // signal numbers are positive; -9 is rejected with EINVAL
                         exit(0);
     try
                              // commit
                    txn->commit(0);
                    txn = NULL;
               } catch (DbException &e) {
                    std::cout << "Error on txn commit: " << e.what() << std::endl;
                         if (loop_rate>0)
                              usleep(txn_rate * 1000 * 1000 / loop_rate);
                    else if (start_loop_d)
int j=0;
for (int i=start_loop_d; i<=end_loop_d; i=i+100)
//transaction begin
               txn = NULL;
               cur_env.txn_begin(NULL, &txn, 0);
for (j=i; j<=end_loop_d && j<=(i+100); j++)
                              Dbt key, value;
     std::string key1, value1;
     std::stringstream sstrm;
     sstrm << "key" << j << ends;
     key1 = sstrm.str();
               key.set_data((void *)key1.c_str());
               key.set_size((u_int32_t)strlen(key1.c_str()));
     // Perform the database delete
     (*dbh)->del(txn, &key, 0);
     try
                              // commit
                    txn->commit(0);
                    txn = NULL;
               } catch (DbException &e) {
                    std::cout << "Error on txn commit: " << e.what() << std::endl;
                    else
                         const char *symbol = token1.c_str();
                         StringDbt key(const_cast<char*>(symbol));
                         const char *price = token2.c_str();
                         StringDbt data(const_cast<char*>(price));
                         (*dbh)->put(NULL, &key, &data, 0);
          } catch (DbDeadlockException &e) {
               log("please retry the operation");
               dbh->close();
          } catch (DbRepHandleDeadException &e) {
               log("please retry the operation");
               dbh->close();
          } catch (DbException &e) {
               if (e.get_errno() == DB_REP_LOCKOUT) {
                    log("please retry the operation");
                    dbh->close();
               } else
                    throw;
          }
     }
     dbh->close();
}
void RepQuoteExample::event_callback(DbEnv* dbenv, u_int32_t which, void *info)
{
     static char buf[256];
     APP_DATA *app = (APP_DATA *)dbenv->get_app_private();
     info = NULL;          /* Currently unused. */
     switch (which) {
     case DB_EVENT_REP_CLIENT:
          app->is_master = 0;
          app->in_client_sync = 1;
          snprintf(buf, sizeof(buf), "%s - %s", progname, "CLIENT");
          //EZ->dbenv->set_errpfx(buf);
          log("DB_EVENT_REP_CLIENT.");
          break;
     case DB_EVENT_REP_MASTER:
          app->is_master = 1;
          app->in_client_sync = 0;
          snprintf(buf, sizeof(buf), "%s - %s", progname, "MASTER");
          //EZ->dbenv->set_errpfx(buf);
          log("DB_EVENT_REP_MASTER.");
          break;
     case DB_EVENT_REP_NEWMASTER:
          log("DB_EVENT_REP_NEWMASTER.");
          app->in_client_sync = 1;
          break;
     case DB_EVENT_REP_PERM_FAILED:
          // Did not get enough acks to guarantee transaction
          // durability based on the configured ack policy. This
          // transaction will be flushed to the master site's
          // local disk storage for durability.
          log("DB_EVENT_REP_PERM_FAILED.");
          log("Insufficient acknowledgements to guarantee transaction durability.");
          break;
     case DB_EVENT_REP_STARTUPDONE:
          app->in_client_sync = 0;
          log("DB_EVENT_REP_STARTUPDONE.");
          break;
     case DB_EVENT_REP_ELECTION_FAILED:
          log("DB_EVENT_REP_ELECTION_FAILED.");
          //g_runner->init(g_config);
          printf("Kill me !!\n");
          // Signal numbers are positive; kill(getpid(), -9) fails with EINVAL
          // and execution would fall through to exit(0).
          kill(getpid(), SIGKILL);
          exit(0);
          break;
     case DB_EVENT_REP_DUPMASTER:
          log("DB_EVENT_REP_DUPMASTER.");
          break;
     default:
          dbenv->errx("ignoring event %d", which);
     }
}
void RepQuoteExample::print_stocks_size(Db *dbp) {
     DB_BTREE_STAT *statp;
     dbp->stat(NULL, &statp, 0);
     log("db_stat");
     cout << "***************************************** >>>>>>>>>>> : database contains " << (u_long)statp->bt_ndata << " records\n";
     // Db::stat() returns the statistics in allocated memory that the caller releases.
     free(statp);
}
void RepQuoteExample::print_env(DbEnv *dbenv) {
     dbenv->stat_print(DB_STAT_ALL);
}
void RepQuoteExample::print_stocks(Db *dbp) {
     StringDbt key, data;
#define     MAXKEYSIZE     10
#define     MAXDATASIZE     20
     char keybuf[MAXKEYSIZE + 1], databuf[MAXDATASIZE + 1];
     char *kbuf, *dbuf;
     memset(&key, 0, sizeof(key));
     memset(&data, 0, sizeof(data));
     kbuf = keybuf;
     dbuf = databuf;
     DbcAuto dbc(dbp, 0, 0);
     cout << "\tSymbol\tPrice" << endl
          << "\t======\t=====" << endl;
int no_records =0;
     for (int ret = dbc->get(&key, &data, DB_FIRST);
          ret == 0;
          ret = dbc->get(&key, &data, DB_NEXT)) {
          key.get_string(&kbuf, MAXKEYSIZE);
          data.get_string(&dbuf, MAXDATASIZE);
          no_records++;
          cout << "\t" << keybuf << "\t" << databuf << endl;
     }
     cout << "********************** NO Records " << no_records << endl;
     cout << endl << flush;
     dbc.close();
}
static void usage() {
     cerr << "usage: " << progname << " -h home -l host:port [-CM]"
     << "[-r host:port][-R host:port]" << endl
     << " [-a all|quorum][-b][-n nsites][-p priority][-v]" << endl;
     cerr << "\t -h home (required; h stands for home directory)" << endl
     << "\t -l host:port (required; l stands for local)" << endl
     << "\t -C or -M (optional; start up as client or master)" << endl
     << "\t -r host:port (optional; r stands for remote; any "
     << "number of these" << endl
     << "\t may be specified)" << endl
     << "\t -R host:port (optional; R stands for remote peer; only "
     << "one of" << endl
     << "\t these may be specified)" << endl
     << "\t -a all|quorum (optional; a stands for ack policy)" << endl
     << "\t -b (optional; b stands for bulk)" << endl
     << "\t -n nsites (optional; number of sites in replication "
     << "group; defaults " << endl
     << "\t     to 0 to try to dynamically compute nsites)" << endl
     << "\t -p priority (optional; defaults to 100)" << endl
     << "\t -v (optional; v stands for verbose)" << endl;
     exit(EXIT_FAILURE);
}
int main(int argc, char **argv) {
     RepConfigInfo config;
     int ch;     // getopt() returns int; a plain char may never compare equal to EOF
     char *portstr, *tmphost;
     int tmpport;
     bool tmppeer;
     config.no_dummy_wr = false;
     // Extract the command line parameters
     while ((ch = getopt(argc, argv, "E:a:bCh:l:Mn:p:R:r:vw")) != EOF) {
          tmppeer = false;
          switch (ch) {
          case 'a':
               if (strncmp(optarg, "all", 3) == 0)
                    config.ack_policy = DB_REPMGR_ACKS_ALL;
               else if (strncmp(optarg, "quorum", 6) != 0)
                    usage();
               break;
          case 'b':
               config.bulk = true;
               break;
          case 'C':
               config.start_policy = DB_REP_CLIENT;
               break;
          case 'E':
config.start_policy = DB_REP_ELECTION;
break;
          case 'h':
               config.home = optarg;
               break;
          case 'l':
               config.this_host.host = strtok(optarg, ":");
               if ((portstr = strtok(NULL, ":")) == NULL) {
                    cerr << "Bad host specification." << endl;
                    usage();
               }
               config.this_host.port = (unsigned short)atoi(portstr);
               config.got_listen_address = true;
               break;
          case 'M':
               config.start_policy = DB_REP_MASTER;
               break;
          case 'n':
               config.totalsites = atoi(optarg);
               break;
          case 'p':
               config.priority = atoi(optarg);
               break;
          case 'R':
               tmppeer = true; // FALLTHROUGH
          case 'r':
               tmphost = strtok(optarg, ":");
               if ((portstr = strtok(NULL, ":")) == NULL) {
                    cerr << "Bad host specification." << endl;
                    usage();
               }
               tmpport = (unsigned short)atoi(portstr);
               config.addOtherHost(tmphost, tmpport, tmppeer);
               break;
          case 'v':
               config.verbose = true;
               break;
          case 'w':
               config.no_dummy_wr = true;
               //config.priority = 2;
               break;
          case '?':
          default:
               usage();
          }
     }
     // Error check command line.
     if ((!config.got_listen_address) || config.home == NULL)
          usage();
     RepQuoteExample runner;
     g_runner=&runner;
     g_config=&config;
     try {
          runner.init(&config);
          runner.doloop();
     } catch (DbException &dbe) {
          cerr << "Caught an exception during initialization or"
               << " processing: " << dbe.what() << endl;
     }
     runner.terminate();
     return 0;
}
// This is a very simple thread that performs checkpoints at a fixed
// time interval. For a master site, the time interval is one minute
// plus the duration of the checkpoint_delay timeout (30 seconds by
// default.) For a client site, the time interval is one minute.
void *checkpoint_thread(void *args)
{
     DbEnv *env;
     APP_DATA *app;
     int i, ret;
     env = (DbEnv *)args;
     app = (APP_DATA *)env->get_app_private();
     for (;;) {
          // Wait for one minute, polling once per second to see if
          // application has finished. When application has finished,
          // terminate this thread.
          for (i = 0; i < 60; i++) {
               sleep(1);
               if (app->app_finished == 1)
                    return ((void *)EXIT_SUCCESS);
          }
          // Perform a checkpoint.
          // original line
          if ((ret = env->txn_checkpoint(0, 0, 0)) != 0) {
          //if ((ret = env->txn_checkpoint(0, 0, DB_FORCE)) != 0) {
               env->err(ret, "Could not perform checkpoint.\n");
               return ((void *)EXIT_FAILURE);
          }
     }
}
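The wait used by the checkpoint thread (sleep in short steps and check a "finished" flag after each step) can be sketched on its own. This is only an illustration under assumptions: wait_or_finished is a hypothetical helper name, and std::atomic<bool> is swapped in for the plain app_finished integer used in the listing.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Sleep for up to `ticks` intervals of length `tick`, checking the flag
// after each interval. Returns true as soon as the flag is seen set, and
// false if the whole wait elapsed (the caller would then do its periodic work).
bool wait_or_finished(const std::atomic<bool>& finished, int ticks,
                      std::chrono::milliseconds tick) {
    for (int i = 0; i < ticks; ++i) {
        std::this_thread::sleep_for(tick);
        if (finished.load())
            return true;
    }
    return false;
}
```

With ticks = 60 and tick = 1000 ms this corresponds to the one-minute wait above; the thread notices shutdown within about one second instead of after the full minute.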
// This is a simple log archive thread. Once per minute, it removes all but
// the most recent 3 logs that are safe to remove according to a call to
// DBENV->log_archive().
// Log cleanup is needed to conserve disk space, but aggressive log cleanup
// can cause more frequent client initializations if a client lags too far
// behind the current master. This can happen in the event of a slow client,
// a network partition, or a new master that has not kept as many logs as the
// previous master.
// The approach in this routine balances the need to mitigate against a
// lagging client by keeping a few more of the most recent unneeded logs
// with the need to conserve disk space by regularly cleaning up log files.
// Use of automatic log removal (DBENV->log_set_config() DB_LOG_AUTO_REMOVE
// flag) is not recommended for replication due to the risk of frequent
// client initializations.
void *log_archive_thread(void *args)
{
     DbEnv *env;
     APP_DATA *app;
     char **begin, **list;
     int i, listlen, logs_to_keep, minlog, ret;
     env = (DbEnv *)args;
     app = (APP_DATA *)env->get_app_private();
     logs_to_keep = 3;
     for (;;) {
          // Wait for one minute, polling once per second to see if
          // application has finished. When application has finished,
          // terminate this thread.
          for (i = 0; i < 60; i++) {
               sleep(1);
               if (app->app_finished == 1)
                    return ((void *)EXIT_SUCCESS);
          }
          // Get the list of unneeded log files.
          if ((ret = env->log_archive(&list, DB_ARCH_ABS)) != 0) {
               env->err(ret, "Could not get log archive list.");
               return ((void *)EXIT_FAILURE);
          }
          if (list != NULL) {
               listlen = 0;
               // Get the number of logs in the list.
               for (begin = list; *begin != NULL; begin++, listlen++);
               // Remove all but the logs_to_keep most recent
               // unneeded log files.
               minlog = listlen - logs_to_keep;
               for (begin = list, i= 0; i < minlog; list++, i++) {
                    if ((ret = unlink(*list)) != 0) {
                         env->err(ret,
                         "logclean: remove %s", *list);
                         env->errx(
                         "logclean: Error remove %s", *list);
                         free(begin);
                         return ((void *)EXIT_FAILURE);
                    }
               }
               free(begin);
          }
     }
}
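The retention rule applied by the log archive thread (delete every removable log except the most recent few) can be isolated from the Berkeley DB calls. A minimal sketch, assuming the removable list is ordered oldest first, as the loop in the listing also assumes; logs_to_delete and the file names in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Given the removable log files ordered oldest first, return the ones to
// delete so that the `keep` most recent entries are retained.
std::vector<std::string> logs_to_delete(const std::vector<std::string>& removable,
                                        std::size_t keep) {
    if (removable.size() <= keep)
        return {};
    return std::vector<std::string>(
        removable.begin(),
        removable.end() - static_cast<std::ptrdiff_t>(keep));
}
```

For five removable logs log.0000000001 through log.0000000005 and keep = 3, only the first two names are returned; with five or fewer logs to keep, nothing is deleted.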
#define DATABASE_DUMMY "dummy.db"
void create_dummy_db(DB_ENV *env, DB **dbp)
{
     DB_ENV *dbenv = env;
     int ret;
     u_int32_t db_flags;
     if ((ret = db_create(dbp, dbenv, 0)) != 0)
          dbenv->err(dbenv, ret, "create_dummy_db: db_create");
     db_flags = DB_AUTO_COMMIT | DB_CREATE;
     //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
     if ((ret = (*dbp)->open(*dbp, NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
          dbenv->err(dbenv, ret, "create_dummy_db: DB->open");
}
void reopen_dummy_db(DB_ENV *env, DB **dbp)
{
     DB_ENV *dbenv = env;
     int ret;
     u_int32_t db_flags;
     if ((ret = db_create(dbp, dbenv, 0)) != 0)
          dbenv->err(dbenv, ret, "reopen_dummy_db: db_create");
     db_flags = DB_AUTO_COMMIT | DB_CREATE;
     //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
     if ((ret = (*dbp)->open(*dbp, NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
          dbenv->err(dbenv, ret, "reopen_dummy_db: DB->open");
}
void perform_db_operation(DB_ENV *env, DB **dbp, bool bRead)
{
     //main loop
     //DB *dbp=NULL;
     DB_ENV *dbenv = env;
     int ret;
     DBT key, data;
     char buf[20] = "dummy", *rbuf;
     rbuf = buf;
     if (*dbp == NULL)
          create_dummy_db(dbenv, dbp);
     if (!bRead) {
          memset(&key, 0, sizeof(key));
          memset(&data, 0, sizeof(data));
          key.data = buf;
          key.size = (u_int32_t)strlen(buf);
          data.data = rbuf;
          data.size = (u_int32_t)strlen(rbuf);
          if ((ret = (*dbp)->put(*dbp, NULL, &key, &data, 0)) != 0) {
               if (ret == DB_REP_HANDLE_DEAD) {
                    //create_dummy_db(dbenv, dbp);
                    reopen_dummy_db(dbenv, dbp);
                    (*dbp)->err(*dbp, ret, "DB->put :");
               } else if (ret != DB_KEYEXIST)
                    (*dbp)->err(*dbp, ret, "perform_db_operation: DB->put");
          }
     } else {
          DB_BTREE_STAT *statp;
          (*dbp)->stat(*dbp, NULL, &statp, 0);
          std::cout << "dbp read stats: key#" << statp->bt_nkeys << std::endl;
          // DB->stat() returns the statistics in allocated memory that the caller releases.
          free(statp);
     }
}
void *dummy_write_thread(void *args)
{
     DbEnv *env;
     APP_DATA *app;
     // Start as NULL: perform_db_operation() creates the handle when *dbp == NULL.
     DB *m_dbp = NULL;
     env = (DbEnv *)args;
     app = (APP_DATA *)env->get_app_private();
     for (;;) {
          if (! app->no_dummy_wr) {
               if (app->is_master)
                    perform_db_operation(env->get_DB_ENV(), &m_dbp, false);
                    //env->txn_checkpoint(0, 0, DB_FORCE);
               usleep(1 * 1000 * 1000);
          } else {
               if (app->is_master)
                    //DB *db_quote=g_repquote->get_DB();
                    //perform_db_operation(env->get_DB_ENV(),&db_quote,true);
                    //if (g_repquote)
                    //     g_runner->print_stocks_size(g_repquote);
                    //env->txn_checkpoint(0, 0, DB_FORCE);
                    //perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                    env->rep_flush();
               usleep(4 * 1000 * 1000);
          }
     }
}
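The "Kill me" steps in the listing rely on kill(2), whose second argument is a positive signal number such as SIGKILL. A negative value like -9 is not a valid signal number, so the call fails with EINVAL and the process keeps running until the next statement. The sketch below only probes that behaviour; probe_signal_number is a hypothetical helper, not part of the example program.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Returns 0 if kill() accepted the signal number, otherwise the errno it set.
// Only call this with 0 (existence check, nothing is delivered) or with an
// invalid number: a valid signal would really be sent to this process.
int probe_signal_number(int sig) {
    errno = 0;
    if (kill(getpid(), sig) == 0)
        return 0;
    return errno;
}
```

probe_signal_number(-9) reports EINVAL, while probe_signal_number(0) succeeds. A process that wants to terminate itself the way "kill -9" does from the shell would call kill(getpid(), SIGKILL).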
My script to simulate the split brain:
#!/bin/sh
[ -z "$node1" ] && node1=10.10.32.121
[ -z "$node2" ] && node2=10.10.32.91
trap myend 0 1 2 3 6 9 14 15
myend()
{
     echo "Receive signal to stop test..."
     un_split_brain
     echo "done"
     exit 1
}
split_brain()
{
     echo -n "Split-Brain at node $node..."
     snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 2 >/dev/null 2>&1
     echo "done"
}
un_split_brain()
{
     echo -n "Undo Split-Brain at node $node..."
     snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 1 >/dev/null 2>&1
     echo "done"
}
is_slave()
{
     local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c CLIENT)
     [ $r -gt 1 ] && ret=1 || ret=0
     return $ret
}
is_master()
{
     local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c MASTER)
     [ $r -gt 1 ] && ret=1 || ret=0
     return $ret
}
wait_for_master()
{
     echo -n "Waiting for MASTER at node $node ... "
     is_master $node
     r=$?
     while ( [ ! $r -eq 1 ] )
     do
          usleep 500000
          is_master $node
          r=$?
          echo -n "."
     done
     echo "done"
}
wait_for_slave()
{
     local r
     local tm
     tm=0
     echo -n "Waiting for SLAVE at node $node ... "
     is_slave $node
     r=$?
     while ( [ ! $r -eq 1 ] )
     do
          usleep 500000
          is_slave $node
          r=$?
          echo -n "."
          tm=$((tm+1))
          [ $tm -gt 120 ] && break
     done
     [ $tm -gt 120 ] && ret=0 || ret=1
     echo "done"
     return $ret
}
run_test_split_brain()
{
     local nt
     nt=1
     nfails=0
     x=4
     [ -z "$1" ] && node=$node2
     while ((1))
     do
          printf "*************** TEST [%02d] ********************\n" $nt
          split_brain
          wait_for_master
          x=$((RANDOM%9))
          echo -n " waiting $x sec ..."
          sleep $x
          echo "done"
          un_split_brain
          wait_for_slave
          r=$?
          [ ! $r -eq 1 ] && echo "`date` - test [$nt] - fails ..." || echo "`date` - test [$nt] - OK ."
          [ ! $r -eq 1 ] && nfails=$((nfails+1))
          perc_success=$(echo "100.0 - $nfails / $nt * 100.0" | bc -l)
          echo "************************************************ [% Success test $perc_success % ]"
          nt=$((nt+1))
          x=$((RANDOM%9))
          echo -n " waiting $x sec ..."
          sleep $x
     done
}
run_test_split_brain
Here is the makefile that runs the two environments.
I run:
- make run
and, in another window, sh test_split_brain.sh
node1?=10.10.32.121
node2?=10.10.32.91
nsite?=2
debug?=0
all: RepQuoteExampleEric install
RepConfigInfo.o: RepConfigInfo.cpp RepConfigInfo.h
     g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 -c RepConfigInfo.cpp -o RepConfigInfo.o
RepQuoteExampleEric: RepQuoteExampleEric.cpp RepConfigInfo.o
     g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 RepQuoteExampleEric.cpp RepConfigInfo.o -o RepQuoteExampleEric -L /usr/local/BerkeleyDB.5.1/lib/ -lreadline -lcurses -ldb_cxx
kill:
     -ssh -X root@$(node1) "killall -9 /root/RepQuoteExampleEric"
     -ssh -X root@$(node2) "killall -9 /root/RepQuoteExampleEric"
run: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v | tee /tmp/BDB.log\"" &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
run_node2: clean_env2
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
debug_node2: clean_env2
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
     sleep 3
     ssh -X root@$(node2) /sbin/pidof RepQuoteExampleEric >/tmp/pid
     ssh -X root@$(node2) ~/kdbg /root/db-5.1.19/examples/cxx/excxx_repquote/RepQuoteExampleEric -p `cat /tmp/pid`
run_debug_node1: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\" " &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v\"" &
run_debug_node2: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v\" " &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\"" &
install: RepQuoteExampleEric
     scp RepQuoteExampleEric root@$(node1):~
     scp RepQuoteExampleEric root@$(node2):~
clean_env: clean_env1 clean_env2
clean_env1:
     ssh -X root@$(node1) rm -rf /opt/bdb/*
clean_env2:
     ssh -X root@$(node2) rm -rf /opt/bdb/*

Application dying with core dump on what appears to be berkeley

I have a web app running on a Solaris server. It ran for approximately 20 hours before crashing. I have the core dump file, but it is quite large (11 GB). Using mdb I was able to get the stack trace from the core dump. Could this be caused by a corrupted BDB, or by corrupted environment file(s)? A similar error, with a slightly different stack trace, occurred on another server a few hours later. Three other servers continue to run successfully with the same BDB objects.
We are not writing to any of the BDBs; they are read-only, as are the secondary BDBs. It is a Java web app interacting with the native Solaris libraries. I forgot to add that this is a 64-bit machine.
If you have any ideas/suggestions/need more info please let me know.
Top of the stack trace below.
libc.so.1`_lwp_kill+8(6, 0, ffffffff7ef45538, ffffffffffffffff, ffffffff7ef3a000, 0)
libc.so.1`abort+0x118(1, 1d8, ffffffff7e2fc6f8, 1ef13c, 0, 0)
libjvm.so`__1cCosFabort6Fb_v_+0x58(1, 1, 2dbc8, ffffffff7e69e000, 3abb94, 2d800)
libjvm.so`__1cHVMErrorOreport_and_die6M_v_+0xcb4(ffffffff7e700480, 0, 1, ffffffff7e70ace0, ffffffff7e6cbb80, ffffffff7e55c873)
libjvm.so`JVM_handle_solaris_signal+0xa6c(a, fffffffccbefd500, fffffffccbefd220, 1a0c00, 101b8a800, 280000)
libc.so.1`__sighndlr+0xc(a, fffffffccbefd500, fffffffccbefd220, ffffffff7ddf5df0, 0, 9)
libc.so.1`call_user_handler+0x3e0(ffffffff7bc15a00, ffffffff7bc15a00, fffffffccbefd220, c, 0, 0)
libc.so.1`sigacthandler+0x54(0, fffffffccbefd500, fffffffccbefd220, ffffffff7bc15a00, 0, ffffffff7ef3a000)
libdb_java-4.6.so`__env_alloc_free+0x140(100bab1c0, fffffffc6c0c75f8, fffffffc8bcff5dc, 1a, 66e, fffffffccbefe461)
libdb_java-4.6.so`__memp_free+0x1c(100bab1c0, fffffffc82123140, fffffffc6c0c75f8, fffffffccbefe710, 0, fffffffc6a00f1d0)
libdb_java-4.6.so`__memp_bhfree+0x6fc(100fc46e0, 100bab1c0, fffffffc6a00f1c8, fffffffc6c0c75f8, 1, 3)
libdb_java-4.6.so`__memp_alloc+0x1df8(100fc46e0, 100bab1c0, fffffffc82100698, 0, 0, fffffffccbefdeb0)
libdb_java-4.6.so`__memp_fget+0x233c(100ba5e50, 10142eb84, 0, 1, 10142eb78, 0)
libdb_java-4.6.so`__ham_get_cpage+0x33c(101258860, 1, 4, 1, 10142ebc8, 10142ebb0)
libdb_java-4.6.so`__ham_lookup+0x104(101258860, fffffffccbefe770, 0, 1, fffffffccbefe34c, 4c000)
libdb_java-4.6.so`__hamc_get+0x278(101258860, fffffffccbefe770, fffffffccbefe710, 1a, fffffffccbefe34c, 0)
libdb_java-4.6.so`__dbc_get+0x81c(101258860, fffffffccbefe770, fffffffccbefe710, 1a, 66e, fffffffccbefe461)
libdb_java-4.6.so`__db_get+0x1a4(1012b2570, 0, fffffffccbefe770, fffffffccbefe710, 0, 4c000)
libdb_java-4.6.so`__db_get_pp+0x3e0(1012b2570, 0, fffffffccbefe770, fffffffccbefe710, 0, fffffffccbefe700)
libdb_java-4.6.so`Db_get+0x40(1012b2570, 0, fffffffccbefe770, fffffffccbefe710, 0, 0)
libdb_java-4.6.so`Java_com_sleepycat_db_internal_db_1javaJNI_Db_1get+0x128(101b8a9b8, fffffffccbefe8e8, 1012b2570, 0, fffffffccbefe8c8, fffffffccbefe8d0)
0xffffffff78391410(1012b2570, 0, ffffffff11fc9a38, ffffffff11fc9a70, 0, fffffffccbefe101)
0xffffffff78005eac(ffffffff559ee358, b6, fffffffce2f93180, ffffffff78018100, 7124, fffffffccbefe221)
0xffffffff78005eac(ffffffff559ee338, b6, fffffffce2f92f60, ffffffff78017d20, ae1, fffffffccbefe331)
0xffffffff78005e60(ffffffff5fad4f80, b7, fffffffce2ff6ac0, ffffffff78017d28, 66e, fffffffccbefe461)
0xffffffff78005e60(ffffffff5fad4f80, fffffffce150e618, fffffffce2f9dd20, ffffffff78017f60, 5, fffffffccbefe5e1)
0xffffffff780063b8(ffffffff5fad4f20, b7, 0, ffffffff78018200, 1e400, fffffffccbefe701)
0xffffffff78005e60(ffffffff5fad4f20, fffffffce14a5828, 0, ffffffff78017f60, ffffffff11ff0cf0, fffffffccbefe811)
0xffffffff780063b8(ffffffff5f995ce0, fffffffce145f868, 0, ffffffff78017ce0, ffffffff11ff0cf0, fffffffccbefe931)
0xffffffff780063b8(ffffffff5f995d78, b6, 0, ffffffff78018200, ffffffff11ff0cf0, fffffffccbefea51)
0xffffffff78005fdc(fffffffef354e068, fffffffce00427e0, 0, ffffffff78017ce0, 0, fffffffccbefeb71)
0xffffffff78006534(fffffffef354e0f8, b7, 0, ffffffff78018200, 0, fffffffccbefecb1)
0xffffffff78005fdc(fffffffef354e0f8, fffffffce00427e0, 0, ffffffff78017f60, 912c14, fffffffccbefedc1)
0xffffffff78006534(fffffffccbefff50, 60800, 0, ffffffff78018200, fffffffef354e178, fffffffccbefeeb1)
0xffffffff78000240(fffffffccbeff8a0, fffffffccbeffd50, a, fffffffce00441e8, ffffffff7800bda0, fffffffccbeffb48)
libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x1f4(1, 101b8a800, fffffffccbeffb38, a,
ffffffff780001e0, fffffffccbeff870)
libjvm.so`__1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_+0x130(fffffffccbeffd48,
ffffffff7e6ea148, fffffffef354e178, 4c148, fffffffccbeffb38, ffffffffff6f4950)
libjvm.so`__1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_+0x50(fffffffccbeffd48, 100c8f880, 100c8f888,
ffffffff7e71c2b0, ffffffff7e71c798, 101b8a800)
libjvm.so`__1cMthread_entry6FpnKJavaThread_pnGThread__v_+0xf0(fffffffef354e178, 101b8a800, 7dd50, 85fc64, ffffffff7e71bd50, 7dc00)
Edited by: user11287228 on Jun 9, 2011 11:33 AM

Hello,
If you can, one thing that might help is rebuilding the Berkeley DB library with --enable-debug and --enable-diagnostic. With --enable-debug we might get line numbers and see exactly where we are in __env_alloc_free and what the input parameters leading up to the abort are. With --enable-diagnostic, run-time checking is in place, which might also help show what is causing this (see http://download.oracle.com/docs/cd/E17076_02/html/installation/build_unix_conf.html). Another thing to do, if you are not using a private environment, is to look at the db_stat -E output:
http://download.oracle.com/docs/cd/E17076_02/html/api_reference/C/db_stat.html
Thank you,
Sandra

What is the best way to use Berkeley DB, C or C++ interface ?

Hello,
I'm using the C++ interface, but many samples, solutions and utilities are in C.
What is the best way to use Berkeley DB: the C or the C++ interface?
Let's talk a little about this. Which interface do you prefer, and why?
Thanks
DelNeto

Hi DelNeto,
There is a complete documentation set for C, C++ and Java. There are also examples in all 3 languages in the examples directories in your kit.
http://www.oracle.com/technology/documentation/berkeley-db/db/index.html
Ron

DbEnv memory leak on Win7 x64 (possibly a Berkeley DB environment bug on x64)

I am new to Berkeley DB; this is my first time using it.
I created a BDB of PNG images, about 40 GB. The key is an ACE_UINT64 and the value is an ACE_Message_Block.
I used LoadRunner to simulate 100 users fetching images through my program.
It works correctly on 32-bit Win7, but on 64-bit it loses memory.
I open the environment with DB_PRIVATE | DB_INIT_MPOOL | DB_THREAD and set the cache size to 1 GB. The value Dbt is configured with set_flags(DB_DBT_MALLOC), and I release it with free(Dbt.get_data()).
In taskmgr.exe the commit size of my server process stays at 1 GB, but the system's memory in use never stops growing. Eventually all memory is used up and my server stalls inside Berkeley DB.
At that point 8 GB is in use: the system, LoadRunner and VS2008 take at most 1.5 GB and my server stays at 1 GB, so what is using the rest?
When I shut down the server, all of the memory is returned.
When I replace the Berkeley DB storage with reading the image .png files directly from the file system, memory use is normal.
So there must be something wrong in how my code uses Berkeley DB, probably in the Dbt allocation. How can I free that memory on x64?
I need help: what is wrong with my DbEnv, and how can I free the Dbt on 64-bit?
int IMG_Storage_Environment::Initialize( ISF_Profile_Visitor & Profile )
{
     Env = new DbEnv( 0 );
     int env_flags = DB_CREATE | // If the environment does not exist, create it
          DB_PRIVATE |
          DB_INIT_MPOOL | // Initialize the cache
          DB_THREAD ; // Free-thread the env handle
     // set_cachesize( gbytes, bytes, ncache ): 1 GB in a single cache region
     if ( Env->set_cachesize( 1, 0, 1) == 0 && Env->open( NULL, env_flags, 0 ) == 0 )
          return ERR_SUCCESS;
     // (failure path not shown in the original post)
}
int IMG_Storage_BerkeleyDB::Initialize( ACE_StrItrT Layer , ACE_StrItrT Path )
{
     this->db = new Db( IMG_Storage_Environment::Instance()->getDbEnv(), 0 );
     if (
          0 == db->open( NULL, STR_T2A( Path ) , NULL ,DB_UNKNOWN, DB_RDONLY ,NULL)
     )
     {
          ISF_DEBUG( "Open DB: %s Succeed" , Path );
          return ERR_SUCCESS;
     }
     // (failure path not shown in the original post)
}
int IMG_Storage_BerkeleyDB::GetTile( int x , int y , int z , ACE_Message_Block & Data )
{
     ACE_UINT64 uKey=this->Key( x, y, z);
     Dbt dbKey(&uKey, sizeof(uKey));
     Dbt dbData;
     dbData.set_flags( DB_DBT_MALLOC ); // BDB allocates the value buffer; the caller frees it
     int err = db->get(NULL, & dbKey, & dbData, 0);
     if ( 0 == err )
     {
          Data.size( dbData.get_size( ) );
          Data.rd_ptr( Data.base( ) );
          Data.wr_ptr( dbData.get_size( ) );
          ACE_OS::memcpy( Data.rd_ptr( ) , dbData.get_data( ) , dbData.get_size( ) );
     }
     else
     {
          ISF_DEBUG( "Image Not exist, Using Empty Image" , err );
     }
     free(dbData.get_data());
     return ERR_SUCCESS;
}
Edited by: 886522 on 2011-9-21 1:31 AM
Edited by: 886522 on 2011-9-21 1:39 AM

I encountered the same problem, although I run Berkeley DB (version 6.0.20, C#) under the .NET Framework on Windows Server 2008 (x64). Any Win32 BDB application runs well, but runs into trouble on the x64 platform once BDB is compiled for x64 (the DLL compiled and linked as Win32 does not show it). The bug is that Berkeley DB takes about as much memory as the size of the databases, regardless of the configured cache size. My guess is that the memory BDB allocates is never freed.

Load an existing Berkeley DB file into memory

Dear Experts,
I have created some Berkeley DB (BDB) files on disk.
I noticed that when I issue key-value retrievals, the page faults are substantial and the CPU utilization is low.
One sample of the time command output is as follows:
1.36user 1.45system 0:10.83elapsed 26%CPU (0avgtext+0avgdata 723504maxresident)k
108224inputs+528outputs (581major+76329minor)pagefaults 0swaps
I suspect that the bottleneck is the high frequency of file I/O.
This may be because of page faults on the BDB file, with pages being loaded from and evicted to disk fairly frequently.
I wish to explore how to reduce these page faults, and hence speed up retrieval.
One way I have read about is to load the entire BDB file into main memory.
There are some example programs on docs.oracle.com, under the heading "Writing In-Memory Berkeley DB Applications".
However, I could not get them to work.
My code is enclosed below:
--------------- start of code snippets ---------------
/* Initialize our handles */
DB *dbp = NULL;
DB_ENV *envp = NULL;
DB_MPOOLFILE *mpf = NULL;
const char *db_name = "db.id_url"; // A BDB file on disk, size 66,813,952
u_int32_t open_flags;
/* Create the environment */
db_env_create(&envp, 0);
open_flags =
    DB_CREATE |     /* Create the environment if it does not exist */
    DB_INIT_LOCK |  /* Initialize the locking subsystem */
    DB_INIT_LOG |   /* Initialize the logging subsystem */
    DB_INIT_MPOOL | /* Initialize the memory pool (in-memory cache) */
    DB_INIT_TXN |
    DB_PRIVATE;     /* Region files are not backed by the filesystem.
                     * Instead, they are backed by heap memory. */
/*
 * Specify the size of the in-memory cache.
 */
envp->set_cachesize(envp, 0, 70 * 1024 * 1024, 1); // 70 Mbytes, more than the BDB file size of 66,813,952
/*
 * Now actually open the environment. Notice that the environment home
 * directory is NULL. This is required for an in-memory only application.
 */
envp->open(envp, NULL, open_flags, 0);
/* Open the MPOOL file in the environment. */
envp->memp_fcreate(envp, &mpf, 0);
int pagesize = 4096;
if ((ret = mpf->open(mpf, "db.id_url", 0, 0, pagesize)) != 0) {
    envp->err(envp, ret, "DB_MPOOLFILE->open: ");
    goto err;
}
int cnt, hits = 66813952/pagesize;
void *p=0;
for (cnt = 0; cnt < hits; ++cnt) {
    db_pgno_t pageno = cnt;
    mpf->get(mpf, &pageno, NULL, 0, &p);
}
fprintf(stderr,"\n\nretrieve %5d pages\n",cnt);
/* Initialize the DB handle */
db_create(&dbp, envp, 0);
/*
 * Set the database open flags. Autocommit is used because we are
 * transactional.
 */
open_flags = DB_CREATE | DB_AUTO_COMMIT;
dbp->open(dbp,  // Pointer to the database
    NULL,       // Txn pointer
    NULL,       // File name -- NULL for inmemory
    db_name,    // Logical db name
    DB_BTREE,   // Database type (using btree)
    open_flags, // Open flags
    0);         // File mode. defaults is 0
DBT key,data; int test_key=103456;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
key.data = (int*)&test_key;
key.size = sizeof(test_key);
dbp->get(dbp, NULL, &key, &data, 0);
printf("%d --> %s ", *((int*)key.data),(char*)data.data );
/* Close our database handle, if it was opened. */
if (dbp != NULL) {
    dbp->close(dbp, 0);
}
if (mpf != NULL) (void)mpf->close(mpf, 0);
/* Close our environment, if it was opened. */
if (envp != NULL) {
    envp->close(envp, 0);
}
/* Final status message and return. */
printf("I'm all done.\n");
--------------- end of code snippets ---------------
After compilation, the code output is:
retrieve 16312 pages
103456 --> (null) I'm all done.
However, the lookup for test_key did not return the correct value.
I have been reading and trying things for the past 3 days.
I would appreciate any help or tips.
Thank you for your kind attention.
WAN
Singapore

Hi Mike
Thank you for your 3 steps:
-- create the database
-- load the database
-- run your retrievals
Recall that my original intention is to load in an existing BDB file (70Mbytes) completely into memory.
So following your 3 steps above, this is what I did:
Step-1 (create the database)
I have followed the oracle article on http://docs.oracle.com/cd/E17076_02/html/articles/inmemory/C/index.html
In this step, I have created the environment, set the cachesize to be bigger than the BDB file.
However, I have some problem with the code that opens the DB handle.
The code on the Oracle page is as follows:
/*
 * Open the database. Note that the file name is NULL.
 * This forces the database to be stored in the cache only.
 * Also note that the database has a name, even though its
 * file name is NULL.
 */
ret = dbp->open(dbp, /* Pointer to the database */
NULL, /* Txn pointer */
NULL, /* File name is not specified on purpose */
db_name, /* Logical db name. */
DB_BTREE, /* Database type (using btree) */
db_flags, /* Open flags */
0); /* File mode. Using defaults */
Note that the open(..) call does not include the BDB file name.
The documentation says that this is how the API knows that an in-memory database is wanted.
However, how do I tell the API which existing BDB file I wish to load entirely into memory?
Do I need to create another DB handle (not in-memory, with a file name as argument) that reads from this BDB file, and then call DB->put(.) to insert the records into the in-memory DB?
Step-2 (load the database)
My question in step-2 is the same as my last question in step-1: how do I tell the API to load my existing BDB file into memory?
That is, should I create another DB handle (not in-memory) that reads from the existing BDB file, use a cursor to read EVERY key-value pair, and then insert each pair into the in-memory DB?
Am I correct to say that by using the cursor to read EVERY key-value pair, I am effectively warming the file cache, so that the BDB retrieval performance can be maximized?
Step-3 (run your retrievals)
Are the retrieval APIs, e.g. c_get(..) and get(..), the same for the in-memory DB as for the file-based DB?
Thank you and always appreciative for your tips.
WAN
Singapore

What is the best way to manage opening a database?

Hi,
What is the best way to open a database?
I understand that the Environment is unique for the whole application, and I think it is a good fit for the Singleton pattern, but I don't know the best way to handle the databases.
I can think of two ways. The first is to create a singleton open database; with this approach I never close the database and the whole application uses it. The second is to open a database at the moment I need it, for example inside a DAO when I do inserts, deletes, etc., and close the database at the end of the operation.
I don't know which way is best, or whether both are correct. If there is another way to do it, I would like to know.

Hugo,
An application can choose to use one or more Environment and Database instances in a single process. In other words, it is the application's choice whether it wants to instantiate a single Environment or Database and share it among multiple threads, or whether it wants to instantiate multiple instances.
Choosing one pattern or another is a function of your application's design, and performance considerations. As you say, you could instantiate a single Database instance and pass it around, or you could open and close the Database every time you get a data record. A third choice is to open a Database instance per thread, and reuse that for the life of the application.
The second choice of opening and closing the Database per data record access is pretty heavyweight from a performance point of view. There is a fair bit of overhead to the initial open of a database, and to the final close. Follow-on opens are not as heavyweight, but do have some cost. For example:
// expensive
thread 1 calls Environment.openDatabase() or new EntityStore()
// less expensive, database is already open, thread 2 is really just getting a handle
// onto the database, but still more expensive than a read or write of a data record
thread 2 calls Environment.openDatabase() or new EntityStore()
The first choice of using a single Database instance in your process will perform much better. If your application has a high level of concurrency, there can be some contention on the Database instance, described in this FAQ - http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#32 and you may prefer to try the third option of using a Database instance per thread.
Regards,
Linda
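
For readers who want to see the first and third options side by side, here is a minimal sketch. The thread is about the Java edition, but the pattern is language-neutral, so it is shown in C++ with a stand-in class. FakeDatabase, shared_database() and per_thread_database() are hypothetical names invented for this illustration and are not part of any Berkeley DB API; the class only counts how often the "expensive open" runs.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for a handle that is expensive to open.
// It is NOT a Berkeley DB class; it only counts how often "open" runs.
class FakeDatabase {
public:
    explicit FakeDatabase(const std::string& name) : name_(name) {
        assert(!name.empty());
        ++open_count();  // pretend this is the costly initial open
    }
    const std::string& name() const { return name_; }

    // Number of FakeDatabase objects constructed so far in this process.
    static std::atomic<int>& open_count() {
        static std::atomic<int> count{0};
        return count;
    }

private:
    std::string name_;
};

// Option 1 from the reply: one handle per process, opened on first use.
// Since C++11 a function-local static is initialized exactly once, even
// when several threads reach this line at the same time.
FakeDatabase& shared_database() {
    static FakeDatabase db("quotes");
    return db;
}

// Option 3 from the reply: one handle per thread, reused for the life of
// that thread, so threads do not contend on a single shared object.
FakeDatabase& per_thread_database() {
    thread_local FakeDatabase db("quotes");
    return db;
}
```

Calling shared_database() repeatedly returns the same object and pays the open cost once; per_thread_database() pays it once per thread. Which of the two is preferable depends on how much contention the application actually sees on the shared handle.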

Multiple issues, Berkeley JE 3.2.15, 3.2.76

Hi all,
We have been using JE 3.2.15 with great satisfaction for more than a year, but we've been recently running into what seems to be a deadlock with the following stack trace:
#1) "Thread-12" daemon prio=4 tid=0x0b174320 nid=0x72 in Object.wait() [0x208cd000..0x208cdbb8]
at java.lang.Object.wait(Native Method)
- waiting on <0xf4202fc0> (a com.sleepycat.je.txn.Txn)
at com.sleepycat.je.txn.LockManager.lock(LockManager.java:227)
- locked <0xf4202fc0> (a com.sleepycat.je.txn.Txn)
at com.sleepycat.je.txn.Txn.lockInternal(Txn.java:295)
at com.sleepycat.je.txn.Locker.lock(Locker.java:283)
at com.sleepycat.je.dbi.CursorImpl.lockLNDeletedAllowed(CursorImpl.java:2375)
at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2297)
at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:1983)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:1188)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:1158)
at com.sleepycat.je.Cursor.search(Cursor.java:1024)
at com.sleepycat.je.Database.get(Database.java:557)
The call does not seem to return, not for as long as 22 hours anyway. Or maybe it's just very slow performing a single get (these are done in loops), giving the appearance that it's stuck. It appears more or less randomly (every couple of months, maybe) at various sites. We have not been able to reproduce it on our test systems.
We recently upgraded to 3.2.76, kinda hoping that this issue had been fixed. While the new version appeared to work fine during internal testing and then for 2 deployments, it failed repeatedly during the third deployment. The original issue just manifested itself again (the above stack trace is from 3.2.76). We also experienced the following:
#2) Environment invalid because of previous exception: com.sleepycat.je.log.DbChecksumException: (JE 3.2.76) Read invalid log entry type: 49
at com.sleepycat.je.log.LogEntryHeader.<init>(LogEntryHeader.java:69)
at com.sleepycat.je.log.LogManager.getLogEntryFromLogSource(LogManager.java:631)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:597)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:958)
at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:1963)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:1188)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:1158)
at com.sleepycat.je.Cursor.search(Cursor.java:1024)
at com.sleepycat.je.Database.get(Database.java:557)
#3) <DaemonThread name="Checkpointer"/> caught exception: com.sleepycat.je.DatabaseException: (JE 3.2.76) fetchTarget of 0x61d/0xa2d452 parent IN=147872417 lastFullVersion=0x69c/0x27e251 parent.getDirty()=true state=0 com.sleepycat.je.log.DbChecksumException: (JE 3.2.76) Read invalid log entry type: 58
com.sleepycat.je.DatabaseException: (JE 3.2.76) fetchTarget of 0x61d/0xa2d452 parent IN=147872417 lastFullVersion=0x69c/0x27e251 parent.getDirty()=true state=0 com.sleepycat.je.log.DbChecksumException: (JE 3.2.76) Read invalid log entry type: 58
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:989)
at com.sleepycat.je.cleaner.Cleaner.migrateLN(Cleaner.java:1100)
at com.sleepycat.je.cleaner.Cleaner.lazyMigrateLNs(Cleaner.java:928)
at com.sleepycat.je.tree.BIN.logInternal(BIN.java:1117)
at com.sleepycat.je.tree.IN.log(IN.java:2657)
at com.sleepycat.je.recovery.Checkpointer.logTargetAndUpdateParent(Checkpointer.java:975)
at com.sleepycat.je.recovery.Checkpointer.flushIN(Checkpointer.java:810)
at com.sleepycat.je.recovery.Checkpointer.flushDirtyNodes(Checkpointer.java:670)
at com.sleepycat.je.recovery.Checkpointer.doCheckpoint(Checkpointer.java:442)
at com.sleepycat.je.recovery.Checkpointer.onWakeup(Checkpointer.java:211)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:191)
at java.lang.Thread.run(Thread.java:595)
Caused by: com.sleepycat.je.log.DbChecksumException: (JE 3.2.76) Read invalid log entry type: 58
at com.sleepycat.je.log.LogEntryHeader.<init>(LogEntryHeader.java:69)
at com.sleepycat.je.log.LogManager.getLogEntryFromLogSource(LogManager.java:631)
at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:597)
at com.sleepycat.je.tree.IN.fetchTarget(IN.java:958)
... 11 more
* An out of memory condition:
#4) Exception in thread "Cleaner-1" java.lang.OutOfMemoryError: Java heap space
at com.sleepycat.je.log.LogUtils.readByteArray(LogUtils.java:204)
at com.sleepycat.je.log.entry.LNLogEntry.readEntry(LNLogEntry.java:104)
at com.sleepycat.je.log.FileReader.readEntry(FileReader.java:238)
at com.sleepycat.je.log.CleanerFileReader.processEntry(CleanerFileReader.java:140)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:321)
at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:411)
at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:259)
at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:161)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:191)
at java.lang.Thread.run(Thread.java:595)
It is possible, of course, that the actual memory issue is somewhere else in the application and Berkeley just happened to be the one getting the error. However the load is very much 24 hours-periodic and the most significant change was the Berkeley upgrade, which does not seem to be running entirely correctly...
I was wondering:
- is issue #1 indeed a deadlock? Has anyone else experienced it? Is there a fix/workaround?
- are #2 and #3 known issues? Note that these happened both while processing 3.2.15 files and, after failure and deletion, pure 3.2.76 files.
- is it possible that an invalid log entry would cause #4? Or is there a memory leak/incorrect allocation in BDB?
If that helps, the standard installation is on Sun x86 hardware running Solaris and (Sun) Java 1.5.0_11. Berkeley is run as an XA resource. Like I said we haven't seen any of these issues on our internal test systems. The customer that experienced #2, #3 and #4 has since been rolled back to 3.2.15, and #1 is rather infrequent. So that's pretty much all the information we have and are going to get, but if there's anything else you'd like to know...
Thanks in advance,
Matthieu Bentot

Hello Matthieu,
Thank you for your clear explanation.
On #1, the deadlock, I don't see any known JE bug that could cause this, in the current release or that has been fixed since 3.2.15. The thread dump you sent implies that a thread is waiting to acquire a record lock, which means that some other thread or transaction holds the record lock at that time. There are a few things that come to mind:
1) If you are setting the lock timeout to a large value, or to zero which means "wait forever", this kind of problem could occur as the result of normal record lock contention. I assume you're not doing this or you would have mentioned it, but I thought I should ask. Are you calling EnvironmentConfig or TransactionConfig.setLockTimeout, or setting the lock timeout via the je.properties file?
2) Are you performing retries when DeadlockException is thrown (as suggested in our documentation)? By retry I mean that when you catch DeadlockException, you close any open cursors, abort the transaction (if you're using transactions), and then start the operation or transaction from the beginning. One possibility is that two threads are continuously retrying, both trying to access the same record(s). If so, one possible solution is to delay for a small time interval before retrying, to give other threads a chance to finish their operation or transaction.
3) The most common reason for the problem you're seeing is that a cursor or transaction is accidentally left open. A cursor or transaction holds record locks until it is closed (or committed or aborted in the case of a transaction). If you have a cursor or transaction that has "leaked" -- been discarded without being closed -- or you have a thread that keeps a cursor or transaction open for a long period, another thread trying to access the record will be unable to lock it. Normally, DeadlockException will be thrown in one thread or the other. But if you are setting the lock timeout (1) or retrying operations (2), this could cause the problem you're seeing. Are you seeing DeadlockException?
If you can describe more about how you handle DeadlockException, send your environment, database, cursor and transaction configuration settings, and describe how you close cursors and commit/abort transactions, hopefully we will be able to narrow this down.
Your #2 and #3 are the same problem, which is very likely a known problem that we're currently focussed on. It is interesting that you never saw this problem in 3.2.15 -- that could be a clue for us.
When you started seeing the problem, were there any changes other than moving to JE 3.2.76? Did you start using a network file system of some kind? Or use different hardware that provides more concurrency? Or update your application to perform with more concurrency?
You mentioned that this occurred in a deployment but not in your test environment. If you have the JE log files from that deployment and/or you can reproduce the problem there, we would very much like to work with you to resolve this. If so, please send me an email -- mark.hayes at the obvious .com -- and I'll ask you to send log files and/or run with a debugging JE jar file when trying to reproduce the problem.
It is possible that #4 is a side effect of the earlier problems, but it's very difficult to say for sure. I don't see any known bugs since 3.2.76 that could cause an OutOfMemoryError.
Please send us more information on #1 as I mentioned above, and send me email on the checksum problem (#2 and #3).
Thanks,
--mark
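
The retry-with-delay idea from point 2 can be sketched roughly as follows. This is only an illustration in C++ with stand-in names: DeadlockError is a hypothetical exception playing the role of JE's DeadlockException, and run_with_retry is an invented helper, not a JE or Berkeley DB function. The doubling of the delay is one possible choice; the reply only asks for "a small time interval".

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical exception standing in for JE's DeadlockException.
struct DeadlockError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `operation` until it succeeds or `max_attempts` is used up.
// After each DeadlockError it sleeps, doubling the delay every time, so
// that competing threads get a chance to finish before the next attempt.
// `operation` is expected to clean up after itself (close its cursors,
// abort its transaction) before the exception leaves it.
template <typename Operation>
auto run_with_retry(Operation operation, int max_attempts,
                    std::chrono::milliseconds initial_delay)
    -> decltype(operation()) {
    assert(max_attempts >= 1);
    std::chrono::milliseconds delay = initial_delay;
    for (int attempt = 1;; ++attempt) {
        try {
            return operation();  // restart the whole operation each time
        } catch (const DeadlockError&) {
            if (attempt == max_attempts) {
                throw;  // give up and let the caller see the failure
            }
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
}
```

Bounding the number of attempts matters here: two threads that retry forever against the same records can look exactly like the hang described in #1.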

Berkeley DB XML as Message Store

Hi,
I need to build a 'Message Store' (as referred to in 'Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions') for our SOA. Components of the SOA are coupled by JMS / sockets and send XML messages to each other. The central message store would consume all messages and so provide a message history and allow comparison of messages for support, performance measurement, troubleshooting, etc.
One approach which occurred to me would be a lightweight Java framework on top of Berkeley DB XML, which would handle connectivity to all integration points.
All of our messages are small (< 5K) but some types are high frequency (say 2K/s).
Can anyone comment on this approach or share experiences, please?
Many thanks!
Pete

The short answer is that you need to shut down the FastCGI Perl script before you copy the files over, and then re-start it once the file copy is complete. Basically, what's happening is that the FastCGI Perl script has cached data in file-system backed shared memory. When you replace the underlying files via a file system copy "under the covers", so to speak, the in-memory and on-disk data become out of sync. Subsequent access to the repository can fail in many ways; a core dump is often the result.
If you don't want to shut down the FastCGI Perl script during the copy, there are several other options that you could consider.
1) Copy the new files into a new/alternate directory location, stop/re-start the FastCGI Perl script when the copy is complete, pointing to the new location. That minimizes the "downtime" or,
2) Delete and insert documents one at a time through the API or,
3) Update documents one at a time through the API or,
4) Truncate the container and insert the documents.
Options 2, 3 & 4 will allow the FastCGI Perl script to keep on running, but will probably take more time and may result in larger database files.
Replication really won't help much if your goal is to completely replace the repository once every 24 hours. The same action would have to be applied to the master and replicated to the other repository locations. The same action of "replace everything" is still occurring.
I hope that this helps.
Regards,
Dave

Berkeley db fatal region error detected run recovery

Hi,
I have initialized a Berkeley DB object.
Then, I am using multithreading to put data into the DB.
Here is how I open the single database handle.
ret = dbp->open(dbp, /* Pointer to the database */
NULL, /* Txn pointer */
db_file_name, /* File name */
db_logical_name , /* Logical db name (unneeded) */
DB_BTREE, /* Database type (using btree) */
DB_CREATE, /* Open flags */
0644); /* File mode. Using defaults */
Each thread puts data into the same handle when it needs to.
I am getting "berkeley db fatal region error detected run recovery".
What is the problem? Does it have anything to do with the way the handle is created, or with whether multiple threads can put data into the DB?
jb

Hi jb,
user8712854 wrote:
I am getting "berkeley db fatal region error detected run recovery".
This is a generic Berkeley DB panic message that means something went wrong. When do you get this error? Did you enable the verbose error messages? Are there any other warnings or error messages reported? Just posting the way you open the database doesn't help much. I reviewed the other questions you asked on the forum, and it seems that you are not following the advice given or consulting the documentation, but prefer to open new threads instead. I advise you to first work on configuring Berkeley DB for your needs, and to read the following pages in order to understand which Berkeley DB product best suits your application and how it should be configured for a multithreaded application:
[The Berkeley DB products|http://www.oracle.com/technology/documentation/berkeley-db/db/ref/intro/products.html]
[Concurrent Data Store introduction|http://www.oracle.com/technology/documentation/berkeley-db/db/ref/cam/intro.html]
[Multithreaded applications|http://www.oracle.com/technology/documentation/berkeley-db/db/ref/program/mt.html]
[Berkeley DB handles|http://www.oracle.com/technology/documentation/berkeley-db/db/ref/program/scope.html]
On the other hand, if the code that's calling the Berkeley DB APIs is not too big and is not private, you can post it here so we can review it for you and let you know what's wrong with it.
The procedures you should follow in order to fix a "run recovery" error are described here:
[Recovery procedures|http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/recovery.html]
Thanks,
Bogdan Coman

Berkeley DB C++ query on floating index

I'm using the Berkeley DB C++ API 6.0 on OS X. My application creates a database with the following tables:
Primary table: (int, myStruct) -> myStruct is a buffer.
Secondary index: (float, myStruct) -> The float key is an information that I retrieve in myStruct buffer with the following callback.
int meanExtractor(Db *sdbp, const Dbt *pkey, const Dbt *pdata, Dbt *skey)
{
    Dbt data = *pdata;
    feature<float> f;
    restoreDescriptor(f, data);
    void* mean = malloc( sizeof(float) );
    memcpy( mean, &f.mean, sizeof(float) );
    skey->set_data(mean);
    skey->set_size(sizeof(float));
    skey->set_flags( DB_DBT_APPMALLOC );
    return 0;
}
When I iterate over the secondary index and print the key/data pairs, the float keys are well stored. My problem is that I can't query this table. I would like to execute this SQL Query for example:
SELECT * FROM secondary index WHERE keys > 1.5 && keys < 3.4
My table is filled by 50000 keys between 0.001 and 49.999. The thing is when I use this method for example:
// I assume the Db and the table are already opened
float i = 0.05;
Dbt key = Dbt(&i, sizeof(float));
Dbc* dbc;
db->cursor( txn, &dbc, 0 );
int ret;
ret = dbc->get( &key, &vald, DB_SET_RANGE );
It retrieved this key: 0.275. It should retrieve 0.05 (because it exists) or at least 0.051. And for any other floating value in the Dbt key, it gives me seemingly arbitrary values. If I use the DB_SET flag, it just doesn't find any keys. My idea was to set the cursor to the smallest key greater than or equal to my key, and then to iterate with the DB_NEXT flag until I reach the end of my range.
This must come from the search algorithm of Berkeley DB, but I saw some (useful but not sufficient) examples that do exactly what I need with the Java API, so it proves that it is possible to do...
I'm pretty stuck on this one, so if anybody has had this problem before, thanks for helping me. I can post other parts of my code if necessary.

Hi,
Since the default byte-wise comparison does not reflect the sort order of float numbers, have you set the bt_compare function for your secondary database? From your description, your query relies on the correct float sort order, so I think you should set a custom bt_compare function.
Also, since you are doing a range search rather than an exact search, DBC->get(DB_SET) will not work for you. I think you need to use the DB_SET_RANGE flag to get the nearest item (the smallest key >= yours). You can check the documentation for DBC (or Dbc in C++) for more information.
Regards,
Winter, Oracle Berkeley DB
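
To make the point about byte order concrete: on a little-endian machine the float 1.0f is stored as the bytes 00 00 80 3F and 0.05f as CD CC 4C 3D, so a byte-by-byte comparison puts 1.0 before 0.05. The following is a minimal sketch of the comparison logic such a custom function would need. compare_float_keys is a hypothetical helper name, and it works on raw buffers rather than on Dbt objects so that it stands alone; it also assumes the keys were written as native floats by the same kind of machine, and it does not try to order NaN values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compares two keys that each hold one native `float`.
// Returns <0, 0 or >0 like memcmp does, but according to numeric
// order instead of byte order.
int compare_float_keys(const void* a, std::size_t a_size,
                       const void* b, std::size_t b_size) {
    assert(a_size == sizeof(float) && b_size == sizeof(float));
    float fa, fb;
    // memcpy sidesteps alignment problems: key buffers handed out by a
    // database are not guaranteed to be aligned for float access.
    std::memcpy(&fa, a, sizeof fa);
    std::memcpy(&fb, b, sizeof fb);
    if (fa < fb) return -1;
    if (fa > fb) return 1;
    return 0;
}
```

In a real application this logic would sit inside the callback registered with set_bt_compare on the secondary database before that database is opened, reading the buffers through get_data() and get_size() of the two Dbt arguments. The callback's exact parameter list differs between Berkeley DB releases, so check the API reference for the version in use. A database already built with the default ordering would have to be rebuilt after the comparison function changes.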

Berkeley DB and Tuxedo

Dear all,
I am trying to set up Berkeley DB from Sleepycat Software (an open
source database implementation) as a backend database for Tuxedo with
X/Open transaction support on a HP-UX 11 System. According to the
documentation, this should work. I have successfully compiled and
started the resource manager (called DBRM) and from the logs
everything looks fine.
The trouble starts, however, when I try to start services that use
DBRM. The startup call for opening the database environment ("database
environment" is a Berkeley DB specific term that refers to a grouping
of files that are opened together with transaction support) fails with
the error message
error: 12 (Not enough space)
Some digging in the documentation for Berkeley DB reveals the
following OS specific snippet (DBENV->open is the function call that
causes the error message above):
<quote>
An ENOMEM error is returned from DBENV->open or DBENV->remove.
Due to the constraints of the PA-RISC memory architecture, HP-UX
does not allow a process to map a file into its address space
multiple times. For this reason, each Berkeley DB environment may
be opened only once by a process on HP-UX, i.e., calls to
DBENV->open will fail if the specified Berkeley DB environment
has been opened and not subsequently closed.
</quote>
OK. So it appears that a call to DBENV->open does a mmap and that
cannot happen twice on the same file in the same process. Looking at
the source for the resource manager DBRM it appears that there is
indeed a Berkeley DB environment that is opened (once), otherwise
transactions would not work. A ps -l on the machine in question looks
like this (I have snipped a couple of columns to fit into a newsreader):
UID PID PPID C PRI NI ADDR SZ TIME COMD
101 29791 1 0 155 20 1017d2c00 84 0:00 DBRM
101 29787 1 0 155 20 10155bb00 81 0:00 TMS_QM
101 29786 1 0 155 20 106d54400 81 0:00 TMS_QM
101 29790 1 0 155 20 100ed2200 84 0:00 DBRM
0 6742 775 0 154 20 1016e3f00 34 0:00 telnetd
101 29858 6743 2 178 20 100ef3900 29 0:00 ps
101 29788 1 0 155 20 100dfc500 81 0:00 TMS_QM
101 29789 1 0 155 20 1024c8c00 84 0:00 DBRM
101 29785 1 0 155 20 1010d7e00 253 0:00 BBL
101 6743 6742 0 158 20 1017d2e00 222 0:00 bash
So every DBRM is started as its own process and the service process
(which does not appear above) would be its own process as well. So how
can it happen that mmap on the same file is called twice in the same
process? What exactly does tmboot do in terms of startup code? Is it
just a couple of fork/execs or is there more involved?
Thanks for any suggestions,
Joerg Lenneis

Peter Holditch:
Joerg,
Comments in-line.
Joerg Lenneis wrote:[snip]
I have no experience of Berkeley DB. Normally the xa_open routine provided by
your database, and called by tx_open, will connect the server process itself to
the database. What that means is database specific. I expect in the case of
Berkeley DB, it has done the mmap for you. I guess the open parameters in your
code above are also in your OPENINFO string in the Tuxedo ubbconfig file?
It does not sound to me like you have a problem.
Fortunately, I do not any more. Your comments and looking at the
source for the xa interface have put me on the right track. What I did
not realise is that (as you point out in the paragraph above) a
Tuxedo service process that uses a resource manager gets the
following structure linked in:
const struct xa_switch_t db_xa_switch = {
"Berkeley DB", /* name[RMNAMESZ] */
TMNOMIGRATE, /* flags */
0, /* version */
__db_xa_open, /* xa_open_entry */
__db_xa_close, /* xa_close_entry */
__db_xa_start, /* xa_start_entry */
__db_xa_end, /* xa_end_entry */
__db_xa_rollback, /* xa_rollback_entry */
__db_xa_prepare, /* xa_prepare_entry */
__db_xa_commit, /* xa_commit_entry */
__db_xa_recover, /* xa_recover_entry */
__db_xa_forget, /* xa_forget_entry */
__db_xa_complete /* xa_complete_entry */
};
This is database specific, of course, so it would look different for,
say, Oracle. The entries in that structure are pointers to various
functions which are called by Tuxedo on behalf of the server process
on startup and whenever transaction management is necessary. xa_open
does indeed open the database, which means opening an environment
with a mmap somewhere in the case of Berkeley DB. In my code I then
tried to open the environment again (you are right, the OPENINFO string
is the same in ubbconfig as in my code) which led to the error message
posted in my initial message.
I had previously thought that the service process would contact the
resource manager via some IPC mechanism for opening the database.
If I am mistaken, then things look a bit dire. Provided that this is
even the correct thing to do I could move the tx_open() after the call
to env->open, but this would still mean there are two mmaps in the
same process. I also need both calls to i) initiate the transaction
subsystem and ii) get hold of the pointer DB_ENV *env which is the
handle for all subsequent DB access.
In the case of servers using OCI to access Oracle, there is an OCI API that
allows a connection established through xa to be associated with an OCI
connection endpoint. I suspect there is an equivalent function provided by
Berkeley DB?
There is not, but see my comments below about how to get to the
Berkeley DB environment.
[snip]
I doubt it. xa works because xa routines are called in the same thread as the
data access routines. Typically, a server thread will run like this...
xa_start(Tuxedo Transaction ID) /* this is done by the Tux. service dispatcher
before your code is executed */
manipulate_data(whatever parameters necessary) /* this is the code you wrote in
your service routine */
xa_end() /* Tuxedo calls this after your service calls tpreturn or tpforward */
The association between the Tuxedo Transaction ID and the data manipulation is
made by the database because of this calling sequence.
OK, this makes sense. Good to know this as well ...
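For anyone who wants to see the calling sequence above in something that compiles, here is a minimal sketch in plain C. It is only a model: mock_xa_start, mock_xa_end, dispatch_service, current_xid and write_log are hypothetical names invented for illustration, not Tuxedo or Berkeley DB APIs, and a single global stands in for what a real resource manager would track per thread of control. The point it tries to show is that the data access routine takes no transaction argument and still ends up tied to the transaction ID, purely because it runs between the start and end calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names are Tuxedo or Berkeley DB APIs. */
#define NO_XID (-1L)
#define MAX_WRITES 8

/* Transaction currently associated with this thread of control. */
long current_xid = NO_XID;

/* Record of which transaction each write was attributed to. */
struct write_record { long xid; int value; };
struct write_record write_log[MAX_WRITES];
size_t write_count = 0;

/* Mock xa_start: the resource manager remembers the transaction ID. */
int mock_xa_start(long xid)
{
    if (current_xid != NO_XID)
        return -1;              /* a transaction is already associated */
    current_xid = xid;
    return 0;
}

/* Mock xa_end: the association is dropped again. */
int mock_xa_end(void)
{
    if (current_xid == NO_XID)
        return -1;              /* nothing to end */
    current_xid = NO_XID;
    return 0;
}

/* Mock data access: no transaction argument, it picks up current_xid. */
int manipulate_data(int value)
{
    if (current_xid == NO_XID || write_count == MAX_WRITES)
        return -1;              /* outside a transaction, or log is full */
    write_log[write_count].xid = current_xid;
    write_log[write_count].value = value;
    write_count++;
    return 0;
}

/* Mock dispatcher: brackets the service routine with start and end. */
int dispatch_service(long xid, int value)
{
    int ret;

    if (mock_xa_start(xid) != 0)
        return -1;
    ret = manipulate_data(value);
    assert(current_xid == xid); /* the service did not change the association */
    if (mock_xa_end() != 0)
        return -1;
    return ret;
}
```

Calling dispatch_service(42L, 7) records the value 7 against transaction 42, while calling manipulate_data(7) on its own fails because no transaction is associated at that moment.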
[snip]
For somebody else trying this, here is the correct way:
==================================================
int
tpsvrinit(int argc, char *argv[])
{
    int ret;

    /* tpopen() makes Tuxedo call xa_open, which opens the environment */
    if (tpopen() < 0)
        userlog("error tpopen");
    userlog("startup, opening database\n");
    /* dbp is assumed to be a DB * declared at file scope */
    if ((ret = db_create(&dbp, NULL, DB_XA_CREATE)) != 0) {
        userlog("error %i db_create: %s", ret, db_strerror(ret));
        return -1;
    }
    if ((ret = dbp->open(dbp, "sometablename", NULL, DB_BTREE, DB_CREATE, 0644)) != 0) {
        userlog("error %i db->open", ret);
        return -1;
    }
    return(0);
}
==================================================
What happens is that the call to the xa_open() function implicitly
opens the Berkeley DB environment for the database in question, which is
given in the OPENINFO string in the configuration file. It is an error
to specify the environment in the call to db_create() in such a
context. All calls to change the database do not need an environment
specified and the calls to begin/commit/abort transactions that are
normally used by Berkeley DB which use the environment are superseded
by tpbegin(), tpcommit(), tpabort() and friends. It would be an error to use
the Berkeley DB calls anyway.
Thank you very much Peter for your comments which have helped a lot.
Joerg Lenneis
email: [email protected]

Berkeley DB XML crash with multiple readers (dbxml-2.5.16 and db-4.8.26)

I am using Berkeley DB XML (v. 2.5.16 and the bundled underlying Berkeley DB 4.8.26, which I suppose is now fairly old) to manage an XML database which is read by a large number (order 100) of independent worker processes communicating via MPI. These processes only read from the database; a single master process performs writes.
Everything works as expected with one or two worker processes. But with three or more, I am experiencing database panics with the error
pthread lock failed: Invalid argument
PANIC: Invalid argument
From searching with Google I can see that issues arising from incorrectly setting up the environment to support concurrency are fairly common. But I have not been able to find a match for this problem, and as far as I can make out from the documentation I am using the correct combination of flags; I use DB_REGISTER and DB_RECOVER to handle the fact that multiple processes join the environment independently. Each process uses only a single environment handle, and joins using
DB_ENV* env;
db_env_create(&env, 0);
u_int32_t env_flags = DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;
env->open(env, path to environment, env_flags, 0);
Although the environment requests DB_INIT_TXN, I am not currently using transactions. There is an intention to implement this later, but my understanding was that concurrent reads would function correctly without the full transaction infrastructure.
All workers seem to join the environment correctly, but then fail when an attempt is made to read from the database. They will all try to access the same XML document in the same container (because it gives them instructions about what work to perform). However, the worker processes open each container setting the read-only flag:
DbXml::XmlContainerConfig models_config;
models_config.setReadOnly(true);
DbXml::XmlContainer models = this->mgr->openContainer(path to container, models_config);
Following the database panic, the stack trace is
[lcd-ds283:27730] [ 0] 2   libsystem_platform.dylib            0x00007fff8eed35aa _sigtramp + 26
[lcd-ds283:27730] [ 1] 3   ???                                 0x0000000000000000 0x0 + 0
[lcd-ds283:27730] [ 2] 4   libsystem_c.dylib                   0x00007fff87890bba abort + 125
[lcd-ds283:27730] [ 3] 5   libc++abi.dylib                     0x00007fff83aff141 __cxa_bad_cast + 0
[lcd-ds283:27730] [ 4] 6   libc++abi.dylib                     0x00007fff83b24aa4 _ZL25default_terminate_handlerv + 240
[lcd-ds283:27730] [ 5] 7   libobjc.A.dylib                     0x00007fff89ac0322 _ZL15_objc_terminatev + 124
[lcd-ds283:27730] [ 6] 8   libc++abi.dylib                     0x00007fff83b223e1 _ZSt11__terminatePFvvE + 8
[lcd-ds283:27730] [ 7] 9   libc++abi.dylib                     0x00007fff83b21e6b _ZN10__cxxabiv1L22exception_cleanup_funcE19_Unwind_Reason_CodeP17_Unwind_Exception + 0
[lcd-ds283:27730] [ 8] 10 libdbxml-2.5.dylib                  0x000000010f30e4de _ZN5DbXml18DictionaryDatabaseC2EP8__db_envPNS_11TransactionERKNSt3__112basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEERKNS_15ContainerConfigEb + 1038
[lcd-ds283:27730] [ 9] 11 libdbxml-2.5.dylib                  0x000000010f2f348c _ZN5DbXml9Container12openInternalEPNS_11TransactionERKNS_15ContainerConfigEb + 1068
[lcd-ds283:27730] [10] 12 libdbxml-2.5.dylib                  0x000000010f2f2dec _ZN5DbXml9ContainerC2ERNS_7ManagerERKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEPNS_11TransactionERKNS_15ContainerConfigEb + 492
[lcd-ds283:27730] [11] 13 libdbxml-2.5.dylib                  0x000000010f32a0af _ZN5DbXml7Manager14ContainerStore13findContainerERS0_RKNSt3__112basic_stringIcNS3_11char_traitsIcEENS3_9allocatorIcEEEEPNS_11TransactionERKNS_15ContainerConfigEb + 175
[lcd-ds283:27730] [12] 14 libdbxml-2.5.dylib                  0x000000010f329f75 _ZN5DbXml7Manager13openContainerERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEPNS_11TransactionERKNS_15ContainerConfigEb + 101
[lcd-ds283:27730] [13] 15 libdbxml-2.5.dylib                  0x000000010f34cd46 _ZN5DbXml10XmlManager13openContainerERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEERKNS_18XmlContainerConfigE + 102
Can I ask if it's clear to anyone what I am doing wrong?

Is it possible that the root problem to this is in the MPI code or usage? Because if the writer process crashes while holding an active transaction or open database handles, it could leave the environment in an inconsistent state that would result in the readers throwing a PANIC error when they notice the inconsistent environment.
Thanks for looking into this.
It looks like there was a small typo in the code I quoted, and I think it was this which caused the segmentation fault or memory corruption. Although I checked a few times that the code snippet produced expected results before posting it, I must have been unlucky that it just happened not to cause a segfault on those attempts.
This is a corrected version:
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include "dbxml/db.h"
#include "dbxml/dbxml/DbXml.hpp"
#include "boost/mpi.hpp"
static std::string envname = std::string("test");
static std::string pkgname = std::string("packages.dbxml");
static std::string intname = std::string("integrations.dbxml");
int main(int argc, char *argv[])
{
    boost::mpi::environment mpi_env;
    boost::mpi::communicator mpi_world;
    if(mpi_world.rank() == 0)
    {
        std::cerr << "-- Writer creating environment" << std::endl;
        DB_ENV *env;
        int dberr = ::db_env_create(&env, 0);
        std::cerr << "**   creation response = " << dberr << std::endl;
        if(dberr > 0) std::cerr << "**   " << ::db_strerror(dberr) << std::endl;
        std::cerr << "-- Writer opening environment" << std::endl;
        u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;
        dberr = env->open(env, envname.c_str(), env_flags, 0);
        std::cerr << "**   opening response = " << dberr << std::endl;
        if(dberr > 0) std::cerr << "**   " << ::db_strerror(dberr) << std::endl;
        // set up XmlManager object
        DbXml::XmlManager *mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);
        // create containers - these will be used by the workers
        DbXml::XmlContainerConfig pkg_config;
        DbXml::XmlContainerConfig int_config;
        pkg_config.setTransactional(true);
        int_config.setTransactional(true);
        std::cerr << "-- Writer creating containers" << std::endl;
        DbXml::XmlContainer packages       = mgr->createContainer(pkgname.c_str(), pkg_config);
        DbXml::XmlContainer integrations   = mgr->createContainer(intname.c_str(), int_config);
        std::cerr << "-- Writer instructing workers" << std::endl;
        std::vector<boost::mpi::request> reqs(mpi_world.size() - 1);
        for(unsigned int i = 1; i < mpi_world.size(); i++)
        {
            reqs[i - 1] = mpi_world.isend(i, 0); // instruct workers to open the environment
        }
        // wait for all messages to be received
        boost::mpi::wait_all(reqs.begin(), reqs.end());
        std::cerr << "-- Writer waiting for termination responses" << std::endl;
        // wait for workers to advise successful termination
        unsigned int outstanding_workers = mpi_world.size() - 1;
        while(outstanding_workers > 0)
        {
            boost::mpi::status stat = mpi_world.probe();
            switch(stat.tag())
            {
                case 1:
                    mpi_world.recv(stat.source(), 1);
                    outstanding_workers--;
                    break;
            }
        }
        delete mgr; // exit, closing database and environment
    }
    else
    {
        mpi_world.recv(0, 0);
        std::cerr << "++ Reader " << mpi_world.rank() << " beginning work" << std::endl;
        DB_ENV *env;
        ::db_env_create(&env, 0);
        u_int32_t env_flags = DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_REGISTER | DB_RECOVER | DB_INIT_TXN | DB_CREATE;
        env->open(env, envname.c_str(), env_flags, 0);
        // set up XmlManager object
        DbXml::XmlManager *mgr = new DbXml::XmlManager(env, DbXml::DBXML_ADOPT_DBENV | DbXml::DBXML_ALLOW_EXTERNAL_ACCESS);
        // open containers which were set up by the master
        DbXml::XmlContainerConfig pkg_config;
        DbXml::XmlContainerConfig int_config;
        pkg_config.setTransactional(true);
        pkg_config.setReadOnly(true);
        int_config.setTransactional(true);
        int_config.setReadOnly(true);
        DbXml::XmlContainer packages     = mgr->openContainer(pkgname.c_str(), pkg_config);
        DbXml::XmlContainer integrations = mgr->openContainer(intname.c_str(), int_config);
        mpi_world.isend(0, 1);
        delete mgr; // exit, closing database and environment
    }
    return (EXIT_SUCCESS);
}
This repeatably causes the crash on OS X Mavericks 10.9.1. Also, I have checked that it repeatably causes the crash on a virtualized OS X Mountain Lion 10.8.5. But I do not see any crashes on a virtualized Ubuntu 13.10. My full code likewise works as expected with a large number of readers under the virtualized Ubuntu. I am compiling with clang and libc++ on OS X, and gcc 4.8.1 and libstdc++ on Ubuntu, but using openmpi in both cases. Edit: I have also compiled with clang and libc++ on Ubuntu, and it works equally well.
Because the virtualized OS X experiences the crash, I hope the fact that it works on Ubuntu is not just an artefact of virtualization. (Unfortunately I don't currently have a physical Linux machine with which to check.) In that case the implication would seem to be that it's an OS X-specific problem. 2nd edit (14 Feb 2014): I have now managed to test on a physical Linux cluster, and it appears to work as expected. Therefore it does appear to be an OS X-specific issue.
In either OS X 10.8 or 10.9, the crash produces this result:
-- Writer creating environment
**   creation response = 0
-- Writer opening environment
**   opening response = 0
-- Writer creating containers
++ Reader 7 beginning work
-- Writer instructing workers
-- Writer waiting for termination responses
++ Reader 1 beginning work
++ Reader 2 beginning work
++ Reader 3 beginning work
++ Reader 4 beginning work
++ Reader 5 beginning work
++ Reader 6 beginning work
pthread lock failed: Invalid argument
PANIC: Invalid argument
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
libc++abi.dylib: terminate called throwing an exception
[mountainlion-test-rig:00319] *** Process received signal ***
[mountainlion-test-rig:00319] Signal: Abort trap: 6 (6)
[mountainlion-test-rig:00319] Signal code: (0)
David

Why multiple log files are created while using transaction in berkeley db

We are using the Berkeley DB Java Edition base API. We have read and written a CDR file of 9 lakh (900,000) rows, both with and without transactions, using secondary databases. The issues we are seeing are as follows:
With transactions: the database environment is 1.63 GB, which is due to the number of log files created, each of 10 MB.
Without transactions: the database environment is 588 MB, and only one log file of 10 MB is created. We would like to know the concrete reason for this difference.
How are log files created, what does it mean to use or not use transactions in a DB environment, and what are the files __db.001, __db.002, __db.003, __db.004, __db.005 and the log files such as log.0000000001? Please reply soon.

> we are using berkeleydb java edition db base api,
If you are seeing __db.NNN files in your environment root directory, these are the environment's shared region files. Since you see these, you are using Berkeley DB Core (with the Java/JNI Base API), not Berkeley DB Java Edition.
> with transaction ...
> without transaction ...
First of all, do you need transactions or not? Review the documentation section called "Why transactions?" in the Berkeley DB Programmer's Reference Guide.
> without transaction-------size of database environment 588mb and here only one log file is created which is of 10mb.
There should be no logs created when transactions are not used. That single log file has likely remained there from the previous transactional run.
> how log files are created and what is meant of using transaction and not using transaction in db environment and what are this db files __db.001, __db.002, __db.003, __db.004, __db.005 and log files like log.0000000001
Have you reviewed the basic documentation references for Berkeley DB Core?
- Berkeley DB Programmer's Reference Guide, in particular the sections: The Berkeley DB products, Shared memory regions, Chapter 11. Berkeley DB Transactional Data Store Applications, Chapter 17. The Logging Subsystem.
- Getting Started with Berkeley DB (Java API Guide) and Getting Started with Berkeley DB Transaction Processing (Java API Guide).
If so, you would have had the answers to these questions: the __db.NNN files are the environment shared region files needed by the environment's subsystems (transaction, locking, logging, memory pool buffer, mutexes), and the log.MMMMMMMMMM files are the log files needed for recoverability and created when running with transactions.
--Andrei
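
As a rough cross-check of the sizes reported in the question, the arithmetic can be sketched in a few lines of C. This is only a back-of-the-envelope estimate under stated assumptions: the database files are the same size in both runs, the shared region files are small enough to ignore, the single 10 MB log in the non-transactional run is a leftover as suggested above, and 1 GB is taken as 1024 MB. The function name estimate_log_files is made up for this illustration.

```c
#include <assert.h>

/*
 * Estimate how many fixed-size log files account for the extra space
 * used by the transactional run. All sizes are in megabytes.
 * Assumption: the database files are the same size in both runs.
 */
long estimate_log_files(double env_with_txn_mb, double env_without_txn_mb,
                        double leftover_log_mb, double log_file_mb)
{
    double data_mb;
    double log_mb;

    assert(log_file_mb > 0.0);

    /* Size of the database files alone, without the leftover log. */
    data_mb = env_without_txn_mb - leftover_log_mb;

    /* Everything else in the transactional run is attributed to logs. */
    log_mb = env_with_txn_mb - data_mb;
    if (log_mb <= 0.0)
        return 0;

    /* Number of complete log files of the given size. */
    return (long)(log_mb / log_file_mb);
}
```

With the figures from the question, estimate_log_files(1.63 * 1024.0, 588.0, 10.0, 10.0) comes out at 109, so under these assumptions roughly a hundred 10 MB log files would account for the difference between the two runs.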
