How to test "Split brain"

Hi ,
I tried to demonstrate the automatic reboot at obnormal cluster
situations but i failed:
Setup:
IAS1, IAS2 and IAS3 running on their own Sun with Solaris 8
Load balancing: round robin
Number of sync backups: 1
restart at abnormal cluster on
I deployed The HaServlet and everything went fine, session data was
distrubuted
I brought down IAS3, which was in Sync alternate role, unplugged the
network cable of the machine on which IAS3 was running and started IAS3
again.
IAS3 came up as the a primary, because it could not detect the other
primary.
So far, according to what I understand, everything OK
After plugging the network cable back in no ias restarts occurred. The
round robin load balancing worked again and as a result of 2 primary
servers everytime i visisted ias3 my session data was lost.
Question: Did i miss someting in my "Split Brain" experiment??
Version SP3 on Solaris 8, only HaServlet deployed and working nicely
DSync distributed before the experiment.
After restarting ias3 it came up as alternate again, as expected and
everything worked as before.
Any hints are welcome, thanks
Robert
Robert Schrijvers
Javix Training & Software development
e-mail: [email protected]
website: www.javix.nl
phone +31 (0) 629594749

Just a quick question before we go any further, did you have the option to restart cluster in case of an abnormal cluster detected turned on? Sorry but I needed to ask.

Similar Messages

  • How to handle split brain problem in BDB?

    In case network between master and replica fails, there will be two master.
    It is a remote replica.
    How to handle the problem? Thanks.

    You should post this in the Berkeley DB High Availability forum: Berkeley DB High Availability (Replication)

  • How do i test split by value functionality in mesage mapping with multiple

    how do i test split by value functionality in mesage mapping with multiple values ?
    regards,
    venkat

    repeat your source node. in mapping editor you can view queues by right clicking to mapped element.. selecting Display Queues option.. this will show u your values .
    You can also select this Display Queue option for splitByValue option
    for example
    source--->splitByValue>target
    Try viewing your queues to each this step... for splitByValue in display Queue you will see context inserted(grey colour) accodingly

  • Election problem after repeated split-brains with two nodes

    Hi
    I'm using a customized source based on BDB-5.1.19 (excxx_repquote)
    with two site one - MASTER and the other SLAVE...
    nsite=2
    ack=quorum
    - the master is writing to quotedb at a rate of 10 txn per sec
    - the test consist to isolate the client from the master (split brain) and reconnect it after a random time include from 1sec to 10sec
    the test run well about 10 times but at a moment the process slave receive DB_EVENT_REP_ELECTION_FAILED
    and the master enter in election mode and never exit from the CLIENT mode. I must say that to freeze the client I decide to kill me (kill -9 my pid) when I receive such event...
    here is the verbose log on the master...
    [1307872770:871621][6510/47655809107168] MASTER: rep_send_function returned: 110
    [1307872770:973655][6510/47655809107168] MASTER: bulk_msg: Send buffer after copy due to PERM
    [1307872770:973667][6510/47655809107168] MASTER: send_bulk: Send 266 (0x10a) bulk buffer bytes
    [1307872770:973672][6510/47655809107168] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type bulk_log, LSN [21][986648] perm
    [1307872770:973693][6510/47655809107168] MASTER: will await acknowledgement: need 1
    [1307872771:26623][6510/47655809107168] MASTER: rep_send_function returned: 110
    [1307872771:126380][6510/1162996032] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type log, LSN [21][946345]
    [1307872771:126407][6510/1162996032] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type dupmaster, LSN [0][0] nobuf
    [1307872771:126695][6510/1162996032] MASTER: rep_start: Found old version log 17
    [1307872771:126753][6510/1162996032] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type newclient, LSN [0][0] nobuf
    [1307872771:126833][6510/1183975744] CLIENT: starting election thread
    [1307872771:126876][6510/1183975744] CLIENT: Start election nsites 2, ack 1, priority 100
    [1307872771:126890][6510/1183975744] CLIENT: Election thread owns egen 69
    [1307872771:127423][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type newclient, LSN [0][0]
    [1307872771:130079][6510/1183975744] CLIENT: Tallying VOTE1[0] (2147483647, 69)
    [1307872771:130113][6510/1183975744] CLIENT: Beginning an election
    [1307872771:130134][6510/1183975744] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type vote1, LSN [21][986728] nobuf
    [1307872771:130147][6510/1173485888] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type master_req, LSN [0][0] nobuf
    [1307872771:130438][6510/1152506176] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][946437]
    [1307872771:130460][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728]
    [1307872771:130467][6510/1152506176] CLIENT: Updating gen from 68 to 70
    [1307872771:130482][6510/1162996032] CLIENT: Received ALIVE egen of 71, mine 69
    [1307872771:130503][6510/1162996032] CLIENT: Election finished in 0.003602000 sec
    [1307872771:130515][6510/1162996032] CLIENT: Election done; egen 70
    [1307872771:130534][6510/1152506176] CLIENT: Received vote1 egen 71, egen 71
    [1307872771:130581][6510/1152506176] CLIENT: Tallying VOTE1[0] (0, 71)
    [1307872771:130593][6510/1089075520] CLIENT: starting election thread
    [1307872771:130619][6510/1152506176] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,946437]
    [1307872771:130642][6510/1152506176] CLIENT: Not in election, but received vote1 0x282c 0x8
    [1307872771:130674][6510/1089075520] CLIENT: Start election nsites 2, ack 1, priority 100
    [1307872771:130692][6510/1089075520] CLIENT: Election thread owns egen 71
    [1307872771:130704][6510/1194465600] CLIENT: starting election thread
    [1307872771:130733][6510/1194465600] CLIENT: Start election nsites 2, ack 1, priority 100
    [1307872771:132922][6510/1089075520] CLIENT: Tallying VOTE1[1] (2147483647, 71)
    [1307872771:132949][6510/1089075520] CLIENT: Accepting new vote
    [1307872771:132958][6510/1089075520] CLIENT: Beginning an election
    [1307872771:132973][6510/1089075520] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][986728] nobuf
    [1307872771:132985][6510/1194465600] CLIENT: election thread is exiting
    [1307872771:133012][6510/1089075520] CLIENT: Tallying VOTE2[0] (2147483647, 71)
    [1307872771:133037][6510/1089075520] CLIENT: Counted my vote 1
    [1307872771:133048][6510/1089075520] CLIENT: Skipping phase2 wait: already got 1 votes
    [1307872771:133060][6510/1089075520] CLIENT: Got enough votes to win; election done; (prev) gen 70
    [1307872771:133071][6510/1089075520] CLIENT: Election finished in 0.002367000 sec
    [1307872771:133084][6510/1089075520] CLIENT: Election done; egen 72
    [1307872771:133111][6510/1089075520] CLIENT: Ended election with 0, e_th 1, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x6
    [1307872771:133170][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0]
    [1307872771:133187][6510/1173485888] CLIENT: Racing replication msg lockout, ignore message.
    [1307872771:173744][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0]
    [1307872771:173769][6510/1162996032] CLIENT: Racing replication msg lockout, ignore message.
    [1307872771:231593][6510/1183975744] CLIENT: Ended election with 0, e_th 0, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x1c
    [1307872771:231629][6510/1183975744] CLIENT: election thread is exiting
    [1307872777:443794][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307872971:644194][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307873165:844583][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307873360:44955][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307873554:245347][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307873748:445736][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307873942:646117][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    [1307874136:846509][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
    .... and infinite stay to this situation
    My question is why the Master is suddenly transformed into CLIENT and why it's never returning to the MASTER
    Thanks in advance ...
    here is the log for the client
    [1307872315:455113][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984396]
    [1307872315:455134][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984483] perm
    [1307872315:609962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984733] perm
    [1307872315:764958][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984986] perm
    [1307872315:919962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985238] perm
    [1307872316:75018][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985491] perm
    [1307872316:229959][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985741] perm
    [1307872316:384949][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985993] perm
    [1307872316:499899][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986141] perm
    [1307872316:539895][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986221]
    [1307872316:540078][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986307]
    [1307872316:540100][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986394] perm
    [1307872316:694950][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986648] perm
    [1307872316:847349][1282/1129134400] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type log, LSN [21][946345]
    [1307872316:847698][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type dupmaster, LSN [0][0]
    [1307872316:847999][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type newclient, LSN [0][0]
    [1307872316:848168][1282/1171093824] MASTER: rep_start: Found old version log 17
    [1307872316:848222][1282/1181583680] CLIENT: Racing replication msg lockout, ignore message.
    [1307872316:848398][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type newclient, LSN [0][0] nobuf
    [1307872316:848504][1282/1192073536] CLIENT: starting election thread
    [1307872316:848542][1282/1192073536] CLIENT: Start election nsites 2, ack 1, priority 100
    [1307872316:848566][1282/1192073536] CLIENT: Election thread owns egen 71
    [1307872316:849634][1282/1192073536] CLIENT: Tallying VOTE1[0] (2147483647, 71)
    [1307872316:849654][1282/1192073536] CLIENT: Beginning an election
    [1307872316:849680][1282/1192073536] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][946437] nobuf
    [1307872316:851403][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type vote1, LSN [21][986728]
    [1307872316:851448][1282/1160603968] CLIENT: Received vote1 egen 69, egen 71
    [1307872316:851470][1282/1160603968] CLIENT: Received old vote 69, egen 71, ignoring vote1
    [1307872316:851481][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728] nobuf
    [1307872316:851538][1282/1171093824] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type master_req, LSN [0][0]
    [1307872316:851558][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0] nobuf
    [1307872316:854254][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][986728]
    [1307872316:854275][1282/1160603968] CLIENT: Received vote1 egen 71, egen 71
    [1307872316:854317][1282/1160603968] CLIENT: Tallying VOTE1[1] (0, 71)
    [1307872316:854339][1282/1160603968] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,986728]
    [1307872316:854353][1282/1160603968] CLIENT: Existing vote: (eid)2147483647 (pri)100 (gen)70 (sites)2 [21,946437]
    [1307872316:854369][1282/1160603968] CLIENT: Accepting new vote
    [1307872316:854379][1282/1160603968] CLIENT: Phase1 election done
    [1307872316:854395][1282/1160603968] CLIENT: Voting for 0
    [1307872316:854407][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0] nobuf
    [1307872317:960344][1282/1192073536] CLIENT: After phase 2: votes 0, nvotes 1, nsites 2
    [1307872317:960389][1282/1192073536] CLIENT: Election finished in 1.111809000 sec
    [1307872317:960401][1282/1192073536] CLIENT: Election done; egen 72
    [1307872317:960412][1282/1192073536] CLIENT: Ended election with -30974, e_th 0, egen 72, flag 0x282c, e_fl 0x0, lo_fl 0x0
    Kill me !!
    --- my source
    on the master I run manually :
    txn_rate 1
    loop_rate 10
    loop 1 20000
    * See the file LICENSE for redistribution information.
    * Copyright (c) 2001, 2010 Oracle and/or its affiliates. All rights reserved.
    * $Id$
    * In this application, we specify all communication via the command line. In
    * a real application, we would expect that information about the other sites
    * in the system would be maintained in some sort of configuration file. The
    * critical part of this interface is that we assume at startup that we can
    * find out
    *      1) what our Berkeley DB home environment is,
    *      2) what host/port we wish to listen on for connections; and
    *      3) an optional list of other sites we should attempt to connect to.
    * These pieces of information are expressed by the following flags.
    * -h home (required; h stands for home directory)
    * -l host:port (required; l stands for local)
    * -C or -M (optional; start up as client or master)
    * -r host:port (optional; r stands for remote; any number of these may be
    *     specified)
    * -R host:port (optional; R stands for remote peer; only one of these may
    * be specified)
    * -a all|quorum (optional; a stands for ack policy)
    * -b (optional; b stands for bulk)
    * -n nsites (optional; number of sites in replication group; defaults to 0
    *     to try to dynamically compute nsites)
    * -p priority (optional; defaults to 100)
    * -v (optional; v stands for verbose)
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    #include <string>
    #include <sstream>
    #include <sys/types.h>
    #include <signal.h>
    #include <db_cxx.h>
    #include "RepConfigInfo.h"
    #include "dbc_auto.h"
    using std::cout;
    using std::cin;
    using std::cerr;
    using std::endl;
    using std::ends;
    using std::flush;
    using std::istream;
    using std::istringstream;
    using std::ostringstream;
    using std::string;
    using std::getline;
    #include <stdio.h>
    #include <readline/readline.h>
    #include <readline/history.h>
    #define     CACHESIZE     (10 * 1024 * 1024)
    #define     DATABASE     "quote.db"
    #define     DATABASE2     "quote2.db"
    const char *progname = "excxx_repquote";
    #include <errno.h>
    #ifdef _WIN32
    #define WIN32_LEAN_AND_MEAN
    #include <windows.h>
    #define     snprintf          _snprintf
    #define     sleep(s)          Sleep(1000 * (s))
    extern "C" {
    extern int getopt(int, char * const *, const char *);
    extern char *optarg;
    typedef HANDLE thread_t;
    typedef DWORD thread_exit_status_t;
    #define     thread_create(thrp, attr, func, arg)                    \
    (((*(thrp) = CreateThread(NULL, 0,                         \
         (LPTHREAD_START_ROUTINE)(func), (arg), 0, NULL)) == NULL) ? -1 : 0)
    #define     thread_join(thr, statusp)                         \
    ((WaitForSingleObject((thr), INFINITE) == WAIT_OBJECT_0) &&          \
    GetExitCodeThread((thr), (LPDWORD)(statusp)) ? 0 : -1)
    #else /* !_WIN32 */
    #include <pthread.h>
    typedef pthread_t thread_t;
    typedef void* thread_exit_status_t;
    #define     thread_create(thrp, attr, func, arg)                    \
    pthread_create((thrp), (attr), (func), (arg))
    #define     thread_join(thr, statusp) pthread_join((thr), (statusp))
    #endif
    // Struct used to store information in Db app_private field.
    typedef struct {
         bool app_finished;
         bool in_client_sync;
         bool is_master;
         bool no_dummy_wr;
    } APP_DATA;
    static void log(const char *);
    void checkpoint_thread (void );
    void log_archive_thread (void );
    void dummy_write_thread (void );
    class RepQuoteExample {
    public:
         RepQuoteExample();
         void init(RepConfigInfo* config);
         void doloop();
         int terminate();
         static void event_callback(DbEnv* dbenv, u_int32_t which, void *info);
         void print_stocks_size(Db *dbp);
    private:
         // disable copy constructor.
         RepQuoteExample(const RepQuoteExample &);
         void operator = (const RepQuoteExample &);
         // internal data members.
         APP_DATA          app_data;
         RepConfigInfo *app_config;
         DbEnv          cur_env;
         thread_t ckp_thr;
         thread_t lga_thr;
         thread_t dmy_thr;
         // private methods.
         void print_stocks(Db *dbp);
         void print_env(DbEnv *dbenv);
         void prompt();
    RepQuoteExample *g_runner=NULL;
    RepConfigInfo *g_config=NULL;
    class DbHolder {
    public:
         DbHolder(DbEnv env, const char _dbname) : env(env)
              dbp = 0;
              if (_dbname) dbname=_dbname;
              else dbname=DATABASE;
         ~DbHolder() {
         try {
              close();
         } catch (...) {
              // Ignore: this may mean another exception is pending
         bool ensure_open(bool creating) {
         if (dbp)
              return (true);
         dbp = new Db(env, 0);
         u_int32_t flags = DB_AUTO_COMMIT;
         if (creating)
              flags |= DB_CREATE;
         try {
              //dbp->open(NULL, DATABASE, NULL, DB_BTREE, flags, 0);
              //dbp->open(NULL, dbname, NULL, DB_BTREE, flags, 0);
              dbp->open(NULL, NULL, dbname, DB_BTREE, flags, 0);
              return (true);
         } catch (DbDeadlockException e) {
         } catch (DbRepHandleDeadException e) {
         } catch (DbException e) {
              if (e.get_errno() == DB_REP_LOCKOUT) {
              // Just fall through.
              } else if (e.get_errno() == ENOENT && !creating) {
              // Provide a bit of extra explanation.
              log("Stock DB does not yet exist");
              } else
              throw;
         // (All retryable errors fall through to here.)
         log("please retry the operation");
         close();
         return (false);
         void close() {
         if (dbp) {
              try {
              dbp->close(0);
              delete dbp;
              dbp = 0;
              } catch (...) {
              delete dbp;
              dbp = 0;
              throw;
         operator Db *() {
         return dbp;
         Db *operator->() {
         return dbp;
    private:
         Db *dbp;
         DbEnv *env;
         const char *dbname;
    class StringDbt : public Dbt {
    public:
    #define GET_STRING_OK 0
    #define GET_STRING_INVALID_PARAM 1
    #define GET_STRING_SMALL_BUFFER 2
    #define GET_STRING_EMPTY_DATA 3
         int get_string(char **buf, size_t buf_len)
              size_t copy_len;
              int ret = GET_STRING_OK;
              if (buf == NULL) {
                   cerr << "Invalid input buffer to get_string" << endl;
                   return GET_STRING_INVALID_PARAM;
              // make sure the string is null terminated.
              memset(*buf, 0, buf_len);
              // if there is no string, just return.
              if (get_data() == NULL || get_size() == 0)
                   return GET_STRING_OK;
              if (get_size() >= buf_len) {
                   ret = GET_STRING_SMALL_BUFFER;
                   copy_len = buf_len - 1; // save room for a terminator.
              } else
                   copy_len = get_size();
              memcpy(*buf, get_data(), copy_len);
              return ret;
         size_t get_string_length()
              if (get_size() == 0)
                   return 0;
              return strlen((char *)get_data());
         void set_string(char *string)
              set_data(string);
              set_size((u_int32_t)strlen(string));
         StringDbt(char *string) :
         Dbt(string, (u_int32_t)strlen(string)) {};
         StringDbt() : Dbt() {};
         ~StringDbt() {};
         // Don't add extra data to this sub-class since we want it to remain
         // compatible with Dbt objects created internally by Berkeley DB.
    Db *g_repquote=NULL;
    RepQuoteExample::RepQuoteExample() : app_config(0), cur_env(0) {
         app_data.app_finished = 0;
         app_data.in_client_sync = 0;
         app_data.is_master = 0; // assume I start out as client
         app_data.no_dummy_wr = 0 ; //prevent to run dummy write
    int (*old_rep_process_message)
              __P((DB_ENV *, DBT *, DBT *, int, DB_LSN *));
    int my_rep_process_message __P((DB_ENV arg1, DBT arg2, DBT arg3, int arg4, DB_LSN arg5))
         printf("EZ->>> my_rep_process_message:%p\n",arg5);
         old_rep_process_message(arg1,arg2,arg3,arg4,arg5);
    void RepQuoteExample::init(RepConfigInfo *config) {
         app_config = config;
         cur_env.set_app_private(&app_data);
         cur_env.set_errfile(stderr);
         app_data.no_dummy_wr=config->no_dummy_wr;
         if (app_data.no_dummy_wr)
              printf("No dummy !!!\n");
         //EZ->cur_env.set_errpfx(progname);
         cur_env.set_event_notify(event_callback);
         // Configure bulk transfer to send groups of records to clients
         // in a single network transfer. This is useful for master sites
         // and clients participating in client-to-client synchronization.
         if (app_config->bulk)
              cur_env.rep_set_config(DB_REP_CONF_BULK, 1);
         // Set the total number of sites in the replication group.
         // This is used by repmgr internal election processing.
         if (app_config->totalsites > 0)
              cur_env.rep_set_nsites(app_config->totalsites);
         // Turn on debugging and informational output if requested.
         if (app_config->verbose)
              cur_env.set_verbose(DB_VERB_REPLICATION, 1);
         cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
         cur_env.set_verbose(DB_VERB_RECOVERY, 1);
         cur_env.set_verbose(DB_VERB_REPLICATION, 1);
         cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
         cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
         cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
         cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
         // Set replication group election priority for this environment.
         // An election first selects the site with the most recent log
         // records as the new master. If multiple sites have the most
         // recent log records, the site with the highest priority value
         // is selected as master.
         cur_env.rep_set_priority(app_config->priority);
         // Set the policy that determines how master and client sites
         // handle acknowledgement of replication messages needed for
         // permanent records. The default policy of "quorum" requires only
         // a quorum of electable peers sufficient to ensure a permanent
         // record remains durable if an election is held. The "all" option
         // requires all clients to acknowledge a permanent replication
         // message instead.
         cur_env.repmgr_set_ack_policy(app_config->ack_policy);
         // Set the threshold for the minimum and maximum time the client
         // waits before requesting retransmission of a missing message.
         // Base these values on the performance and load characteristics
         // of the master and client host platforms as well as the round
         // trip message time.
         cur_env.rep_set_request(20000, 500000);
         // Configure deadlock detection to ensure that any deadlocks
         // are broken by having one of the conflicting lock requests
         // rejected. DB_LOCK_DEFAULT uses the lock policy specified
         // at environment creation time or DB_LOCK_RANDOM if none was
         // specified.
         cur_env.set_lk_detect(DB_LOCK_DEFAULT);
         // The following base replication features may also be useful to your
         // application. See Berkeley DB documentation for more details.
         // - Master leases: Provide stricter consistency for data reads
         // on a master site.
         // - Timeouts: Customize the amount of time Berkeley DB waits
         // for such things as an election to be concluded or a master
         // lease to be granted.
         // - Delayed client synchronization: Manage the master site's
         // resources by spreading out resource-intensive client
         // synchronizations.
         // - Blocked client operations: Return immediately with an error
         // instead of waiting indefinitely if a client operation is
         // blocked by an ongoing client synchronization.
         cur_env.repmgr_set_local_site(app_config->this_host.host,
         app_config->this_host.port, 0);
         for ( REP_HOST_INFO *cur = app_config->other_hosts; cur != NULL;
              cur = cur->next) {
              cur_env.repmgr_add_remote_site(cur->host, cur->port,
              NULL, cur->peer ? DB_REPMGR_PEER : 0);
         // Configure heartbeat timeouts so that repmgr monitors the
         // health of the TCP connection. Master sites broadcast a heartbeat
         // at the frequency specified by the DB_REP_HEARTBEAT_SEND timeout.
         // Client sites wait for message activity the length of the
         // DB_REP_HEARTBEAT_MONITOR timeout before concluding that the
         // connection to the master is lost. The DB_REP_HEARTBEAT_MONITOR
         // timeout should be longer than the DB_REP_HEARTBEAT_SEND timeout.
         cur_env.rep_set_timeout(DB_REP_HEARTBEAT_SEND, 5000000);
         cur_env.rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 10000000);
         // The following repmgr features may also be useful to your
         // application. See Berkeley DB documentation for more details.
         // - Two-site strict majority rule - In a two-site replication
         // group, require both sites to be available to elect a new
         // master.
         // - Timeouts - Customize the amount of time repmgr waits
         // for such things as waiting for acknowledgements or attempting
         // to reconnect to other sites.
         // - Site list - return a list of sites currently known to repmgr.
         // We can now open our environment, although we're not ready to
         // begin replicating. However, we want to have a dbenv around
         // so that we can send it into any of our message handlers.
         cur_env.set_cachesize(0, CACHESIZE, 0);
         cur_env.set_flags(DB_REP_PERMANENT, 1);
         //cur_env.set_flags(DB_TXN_WRITE_NOSYNC, 1);
    /*     u_int32_t maxlocks=300000;
         if (maxlocks != 0)
              cur_env.set_lk_max_locks(maxlocks);
         u_int32_t maxlocks_o=300000;
         if (maxlocks_o != 0)
              cur_env.set_lk_max_objects(maxlocks_o);
         u_int32_t maxmutex=300000;
         if (maxmutex != 0)
              cur_env.mutex_set_max(maxmutex);
         DbEnv          *m_env=&cur_env;
         m_env->set_flags(DB_TXN_NOSYNC, 1);
         m_env->set_lk_max_lockers(60000);
         m_env->set_lk_max_objects(60000);
         m_env->set_lk_max_locks(60000);
         m_env->set_tx_max(60000);
         //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_NONE);
         m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
         m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
         //m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
         m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 1 * 1000 * 1000); // 5 seconds
         m_env->rep_set_timeout(DB_REP_FULL_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
         m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 5 * 1000 * 1000);
         //m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds
         //m_env->rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 80 * 1000 * 1000); //80 seconds
         //m_env->rep_set_timeout(DB_REP_HEARTBEAT_SEND, 500 * 1000); //500 milli seconds
         //The minimum number of microseconds a client waits before requesting retransmission
         u_int32_t rep_req_min = 40000; //40 000 microsec = 40 mili
         //The maximum number of microseconds a client waits before requesting retransmission
         u_int32_t rep_req_max = 1280000;// 1 280 000 microsec = 1.28 sec
         u_int32_t rep_limit_gbytes = 0;
         u_int32_t rep_limit_bytes = 100 * 1024 * 1024; // 100MB
         m_env->rep_set_request(rep_req_min, rep_req_max);
         m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);
         cur_env.open(app_config->home, DB_CREATE | DB_RECOVER |
         DB_THREAD | DB_INIT_REP | DB_INIT_LOCK | DB_INIT_LOG |
         DB_INIT_MPOOL | DB_INIT_TXN , 0);
         //keep old function for chain
         //old_rep_process_message=cur_env.get_DB_ENV()->rep_process_message;
         //derouting
         //cur_env.get_DB_ENV()->rep_process_message=my_rep_process_message;
         /*int _i;
         cur_env.log_get_config(DB_LOG_DIRECT, &_i);printf ("DB_LOG_DIRECT = %d\n",_i);
         cur_env.log_get_config(DB_LOG_DSYNC, &_i);printf ("DB_LOG_DSYNC = %d\n",_i);
         cur_env.log_get_config(DB_LOG_AUTO_REMOVE, &_i);printf ("DB_LOG_AUTO_REMOVE = %d\n",_i);
         cur_env.log_get_config(DB_LOG_IN_MEMORY, &_i);printf ("DB_LOG_IN_MEMORY = %d\n",_i);
         cur_env.log_get_config(DB_LOG_ZERO,&_i);printf ("DB_LOG_ZERO = %d\n",_i);
         // Start checkpoint and log archive support threads.
         (void)thread_create(&ckp_thr, NULL, checkpoint_thread, &cur_env);
         (void)thread_create(&lga_thr, NULL, log_archive_thread, &cur_env);
         (void)thread_create(&dmy_thr, NULL, dummy_write_thread, &cur_env);
         cur_env.repmgr_start(3, app_config->start_policy);
    }

    int RepQuoteExample::terminate() {
         try {
              // Wait for checkpoint and log archive threads to finish.
              // Windows does not allow NULL pointer for exit code variable.
              thread_exit_status_t exstat;
              (void)thread_join(lga_thr, &exstat);
              (void)thread_join(ckp_thr, &exstat);
              (void)thread_join(dmy_thr, &exstat);
              // We have used the DB_TXN_NOSYNC environment flag for
              // improved performance without the usual sacrifice of
              // transactional durability, as discussed in the
              // "Transactional guarantees" page of the Reference
              // Guide: if one replication site crashes, we can
              // expect the data to exist at another site. However,
              // in case we shut down all sites gracefully, we push
              // out the end of the log here so that the most
              // recent transactions don't mysteriously disappear.
              cur_env.log_flush(NULL);
              cur_env.close(0);
         } catch (DbException dbe) {
              cout << "error closing environment: " << dbe.what() << endl;
         return 0;
    void RepQuoteExample::prompt() {
         cout << "QUOTESERVER";
         if (!app_data.is_master)
              cout << "(read-only)";
         cout << "> " << flush;
    void log(const char *msg) {
    time_t currentTime;
    // get and print the current time
    time (&currentTime); // fill now with the current time
         char buff[255];
         strncpy(buff,ctime(&currentTime),sizeof(buff));
         char *p;
         for(p =buff ; *p != '\n'; p++);
         *p = '\0';
         cerr << buff << " - " << msg << endl;
    // Simple command-line user interface:
    // - enter "<stock symbol> <price>" to insert or update a record in the
    //     database;
    // - just press Return (i.e., blank input line) to print out the contents of
    //     the database;
    // - enter "quit" or "exit" to quit.
    void RepQuoteExample::doloop() {
         DbHolder dbh1(&cur_env,DATABASE);
         DbHolder dbh2(&cur_env,DATABASE2);
         DbHolder *dbh=&dbh1;
         DbTxn *txn;
         string input;
    bool truncate = false;
         char *c;
         using_history();
         g_repquote=*dbh;
         int loop_rate = 0;
         int txn_rate = 500;
         while (prompt(), /*getline(cin, input)*/c=readline(NULL)) {
              input=std::string(c);
              add_history(c);
              free(c);
              int start_loop = 0;
              int end_loop = 0;
              int start_loop_d = 0;
              int end_loop_d = 0;
              istringstream is(input);
              string token1, token2, token3;
    truncate = false;
    start_loop = 0;
    end_loop = 0;
              // Read 0, 1 or 2 tokens from the input.
              int count = 0;
              if (is >> token1) {
                   count++;
                   if (is >> token2)
                   count++;
                   if (is >> token3)
                   count++;
              if (count == 1) {
         if (token1 == "truncate" ) {
                        truncate = true;     
                   else if (token1 == "env" ){
                        print_env(&cur_env);
                        continue;
         else if (token1 == "verbose" ) {
                        app_config->verbose = !app_config->verbose;
                        if (app_config->verbose)
                             cur_env.set_verbose(DB_VERB_REPLICATION, 1);
                             cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
                             cur_env.set_verbose(DB_VERB_RECOVERY, 1);
                             cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
                             cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
                             cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
                             cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
                             log("verbose is on");
                        else
                             cur_env.set_verbose(DB_VERB_REPLICATION, 0);
                             cur_env.set_verbose(DB_VERB_REPMGR_MISC, 0);
                             cur_env.set_verbose(DB_VERB_RECOVERY, 0);
                             cur_env.set_verbose(DB_VERB_REP_ELECT, 0);
                             cur_env.set_verbose(DB_VERB_REP_LEASE, 0);
                             cur_env.set_verbose(DB_VERB_REP_SYNC, 0);
                             cur_env.set_verbose(DB_VERB_REPMGR_MISC, 0);
                             log("verbose is off");
                        continue;
         else if (token1 == "print" ) {
                   print_stocks(*dbh);
                        count = 0;      
         else if (token1 == "db1" ) {
                        dbh=&dbh1;
                        g_repquote=*dbh;
                        log( "switch to Db1");
                        count = 0;      
         else if (token1 == "db2" ) {
                        dbh=&dbh2;
                        g_repquote=*dbh;
                        log( "switch to Db2");
                        count = 0;      
                   else if (token1 == "exit" || token1 == "quit") {
                        app_data.app_finished = 1;
                        break;
                   } else {
                        log("Format: <stock> <price>");
                        continue;
    else if (count == 2)
                   if (token1 == "loop_rate" ){
         loop_rate = atoi(token2.c_str());
                        continue;
                   if (token1 == "txn_rate" ){
         txn_rate = atoi(token2.c_str());
                        continue;
    else if (count == 3)
    if (token1 == "loop" ) {
    start_loop = atoi(token2.c_str());
    end_loop = start_loop + atoi(token3.c_str());
    if (token1 == "delete" ) {
    start_loop_d = atoi(token2.c_str());
    end_loop_d = start_loop_d + atoi(token3.c_str());
              // Here we know count is either 0 or 2, so we're about to try a
              // DB operation.
              // Open database with DB_CREATE only if this is a master
              // database. A client database uses polling to attempt
              // to open the database without DB_CREATE until it is
              // successful.
              // This DB_CREATE polling logic can be simplified under
              // some circumstances. For example, if the application can
              // be sure a database is already there, it would never need
              // to open it with DB_CREATE.
              if (!dbh->ensure_open(app_data.is_master))
                   continue;
              try {
                   if (count == 0)
                        if (app_data.in_client_sync)
                             log( "Cannot read data during client initialization - please try again.");
                        else
                             print_stocks_size(*dbh);
                   else if (!app_data.is_master)
                        log("Can't update at client");
                   else {
                        if (truncate)
    u_int32_t no_remove;
                        txn = NULL;
    cur_env.txn_begin(NULL, &txn, DB_TXN_NOWAIT);
                             try
              (*dbh)->truncate(txn, &no_remove, 0);
    // commit
    txn->commit(0);
    txn = NULL;
    } catch (DbException &e) {
    std::cout << "Error on txn commit: " << e.what() << std::endl;
                        //     } catch (DbDeadlockException &) {
                        if (txn != NULL)
                             (void)txn->abort();
    // std::cout << "Error on txn commit: " << std::endl;
    else if (start_loop)
    int j=0;
    for (int i=start_loop; i<=end_loop; i=i+txn_rate)
    //transaction begin
                   txn = NULL;
                   cur_env.txn_begin(NULL, &txn, 0);
    for (j=i; j<=end_loop && j<=(i+txn_rate); j++)
                                  Dbt key, value;
         std::string key1, value1;
         std::stringstream sstrm;
         sstrm << "key" << j << ends;
         key1 = sstrm.str();
                   key.set_data((void *)key1.c_str());
                   key.set_size((u_int32_t)strlen(key1.c_str()));
         sstrm.str("");
         int payload = rand() + j;
                                  sstrm << "price" << payload << ends;
         value1 = sstrm.str();
                   value.set_data((void *)value1.c_str());
                   value.set_size((u_int32_t)strlen(value1.c_str()));
         // Perform the database put
         (*dbh)->put(txn, &key, &value, 0);
                             printf("Kill me !!\n");
                             kill(getpid(),-9);
                             exit(0);
         try
                                  // commit
                        txn->commit(0);
                        txn = NULL;
                   } catch (DbException &e) {
                        std::cout << "Error on txn commit: " << e.what() << std::endl;
                             if (loop_rate>0)
                                  usleep(txn_rate * 1000 * 1000 / loop_rate);
                        else if (start_loop_d)
    int j=0;
    for (int i=start_loop_d; i<=end_loop_d; i=i+100)
    //transaction begin
                   txn = NULL;
                   cur_env.txn_begin(NULL, &txn, 0);
    for (j=i; j<=end_loop_d && j<=(i+100); j++)
                                  Dbt key, value;
         std::string key1, value1;
         std::stringstream sstrm;
         sstrm << "key" << j << ends;
         key1 = sstrm.str();
                   key.set_data((void *)key1.c_str());
                   key.set_size((u_int32_t)strlen(key1.c_str()));
         // Perform the database put
         (*dbh)->del(txn, &key, 0);
         try
                                  // commit
                        txn->commit(0);
                        txn = NULL;
                   } catch (DbException &e) {
                        std::cout << "Error on txn commit: " << e.what() << std::endl;
                        else
                             const char *symbol = token1.c_str();
                             StringDbt key(const_cast<char*>(symbol));
                             const char *price = token2.c_str();
                             StringDbt data(const_cast<char*>(price));
                             (*dbh)->put(NULL, &key, &data, 0);
              } catch (DbDeadlockException e) {
                   log("please retry the operation");
                   dbh->close();
              } catch (DbRepHandleDeadException e) {
                   log("please retry the operation");
                   dbh->close();
              } catch (DbException e) {
                   if (e.get_errno() == DB_REP_LOCKOUT) {
                   log("please retry the operation");
                   dbh->close();
                   } else
                   throw;
         dbh->close();
    void RepQuoteExample::event_callback(DbEnv* dbenv, u_int32_t which, void *info)
         static char buf[256];
         APP_DATA app = (APP_DATA)dbenv->get_app_private();
         info = NULL;          /* Currently unused. */
         switch (which) {
         case DB_EVENT_REP_CLIENT:
              app->is_master = 0;
              app->in_client_sync = 1;
              sprintf(buf,"%s - %s",progname,"CLIENT");
              //EZ->dbenv->set_errpfx(buf);
              log("DB_EVENT_REP_CLIENT.");
              break;
         case DB_EVENT_REP_MASTER:
              app->is_master = 1;
              app->in_client_sync = 0;
              sprintf(buf,"%s - %s",progname,"MASTER");
              //EZ->dbenv->set_errpfx(buf);
              log("DB_EVENT_REP_MASTER.");
              break;
         case DB_EVENT_REP_NEWMASTER:
              log("DB_EVENT_REP_NEWMASTER.");
              app->in_client_sync = 1;
              break;
         case DB_EVENT_REP_PERM_FAILED:
              // Did not get enough acks to guarantee transaction
              // durability based on the configured ack policy. This
              // transaction will be flushed to the master site's
              // local disk storage for durability.
              log("DB_EVENT_REP_PERM_FAILED.");
              log("Insufficient acknowledgements to guarantee transaction durability.");
              break;
         case DB_EVENT_REP_STARTUPDONE:
              app->in_client_sync = 0;
              log("DB_EVENT_REP_STARTUPDONE.");
              break;
         case DB_EVENT_REP_ELECTION_FAILED:
              log("DB_EVENT_REP_ELECTION_FAILED.");
              //g_runner->init(g_config);
              printf("Kill me !!\n");
              kill(getpid(),-9);
              exit(0);
              break;
         case DB_EVENT_REP_DUPMASTER:
              log("DB_EVENT_REP_DUPMASTER.");
              break;
         default:
              dbenv->errx("ignoring event %d", which);
    void RepQuoteExample::print_stocks_size(Db *dbp) {
         DB_BTREE_STAT *statp;
    dbp->stat(NULL, &statp, 0);
         log("db_stat");
    cout << "***************************************** >>>>>>>>>>> : database contains " << (u_long)statp->bt_ndata << " records\n";
    void RepQuoteExample::print_env(DbEnv *dbenv) {
         dbenv->stat_print(DB_STAT_ALL);
    void RepQuoteExample::print_stocks(Db *dbp) {
         StringDbt key, data;
    #define     MAXKEYSIZE     10
    #define     MAXDATASIZE     20
         char keybuf[MAXKEYSIZE + 1], databuf[MAXDATASIZE + 1];
         char kbuf, dbuf;
         memset(&key, 0, sizeof(key));
         memset(&data, 0, sizeof(data));
         kbuf = keybuf;
         dbuf = databuf;
         DbcAuto dbc(dbp, 0, 0);
         cout << "\tSymbol\tPrice" << endl
              << "\t======\t=====" << endl;
    int no_records =0;
         for (int ret = dbc->get(&key, &data, DB_FIRST);
              ret == 0;
              ret = dbc->get(&key, &data, DB_NEXT)) {
              key.get_string(&kbuf, MAXKEYSIZE);
              data.get_string(&dbuf, MAXDATASIZE);
    no_records++;
              cout << "\t" << keybuf << "\t" << databuf << endl;
    cout << "********************** NO Records " << no_records << endl;
         cout << endl << flush;
         dbc.close();
    static void usage() {
         cerr << "usage: " << progname << " -h home -l host:port [-CM]"
         << "[-r host:port][-R host:port]" << endl
         << " [-a all|quorum][-b][-n nsites][-p priority][-v]" << endl;
         cerr << "\t -h home (required; h stands for home directory)" << endl
         << "\t -l host:port (required; l stands for local)" << endl
         << "\t -C or -M (optional; start up as client or master)" << endl
         << "\t -r host:port (optional; r stands for remote; any "
         << "number of these" << endl
         << "\t may be specified)" << endl
         << "\t -R host:port (optional; R stands for remote peer; only "
         << "one of" << endl
         << "\t these may be specified)" << endl
         << "\t -a all|quorum (optional; a stands for ack policy)" << endl
         << "\t -b (optional; b stands for bulk)" << endl
         << "\t -n nsites (optional; number of sites in replication "
         << "group; defaults " << endl
         << "\t     to 0 to try to dynamically compute nsites)" << endl
         << "\t -p priority (optional; defaults to 100)" << endl
         << "\t -v (optional; v stands for verbose)" << endl;
         exit(EXIT_FAILURE);
    int main(int argc, char **argv) {
         RepConfigInfo config;
         char ch, portstr, tmphost;
         int tmpport;
         bool tmppeer;
         config.no_dummy_wr = false;
         // Extract the command line parameters
         while ((ch = getopt(argc, argv, "E:a:bCh:l:Mn:p:R:r:vw")) != EOF) {
              tmppeer = false;
              switch (ch) {
              case 'a':
                   if (strncmp(optarg, "all", 3) == 0)
                        config.ack_policy = DB_REPMGR_ACKS_ALL;
                   else if (strncmp(optarg, "quorum", 6) != 0)
                        usage();
                   break;
              case 'b':
                   config.bulk = true;
                   break;
              case 'C':
                   config.start_policy = DB_REP_CLIENT;
                   break;
              case 'E':
    config.start_policy = DB_REP_ELECTION;
    break;
              case 'h':
                   config.home = optarg;
                   break;
              case 'l':
                   config.this_host.host = strtok(optarg, ":");
                   if ((portstr = strtok(NULL, ":")) == NULL) {
                        cerr << "Bad host specification." << endl;
                        usage();
                   config.this_host.port = (unsigned short)atoi(portstr);
                   config.got_listen_address = true;
                   break;
              case 'M':
                   config.start_policy = DB_REP_MASTER;
                   break;
              case 'n':
                   config.totalsites = atoi(optarg);
                   break;
              case 'p':
                   config.priority = atoi(optarg);
                   break;
              case 'R':
                   tmppeer = true; // FALLTHROUGH
              case 'r':
                   tmphost = strtok(optarg, ":");
                   if ((portstr = strtok(NULL, ":")) == NULL) {
                        cerr << "Bad host specification." << endl;
                        usage();
                   tmpport = (unsigned short)atoi(portstr);
                   config.addOtherHost(tmphost, tmpport, tmppeer);
                   break;
              case 'v':
                   config.verbose = true;
                   break;
              case 'w':
                   config.no_dummy_wr = true;
                   //config.priority = 2;
                   break;
              case '?':
              default:
                   usage();
         // Error check command line.
         if ((!config.got_listen_address) || config.home == NULL)
              usage();
         RepQuoteExample runner;
         g_runner=&runner;
         g_config=&config;
         try {
              runner.init(&config);
              runner.doloop();
         } catch (DbException dbe) {
              cerr << "Caught an exception during initialization or"
                   << " processing: " << dbe.what() << endl;
         runner.terminate();
         return 0;
    // This is a very simple thread that performs checkpoints at a fixed
    // time interval. For a master site, the time interval is one minute
    // plus the duration of the checkpoint_delay timeout (30 seconds by
    // default.) For a client site, the time interval is one minute.
    void checkpoint_thread(void args)
         DbEnv *env;
         APP_DATA *app;
         int i, ret;
         env = (DbEnv *)args;
         app = (APP_DATA *)env->get_app_private();
         for (;;) {
              // Wait for one minute, polling once per second to see if
              // application has finished. When application has finished,
              // terminate this thread.
              for (i = 0; i < 60; i++) {
                   sleep(1);
                   if (app->app_finished == 1)
                        return ((void *)EXIT_SUCCESS);
              // Perform a checkpoint.
              // original line
              if ((ret = env->txn_checkpoint(0, 0, 0)) != 0) {
              //if ((ret = env->txn_checkpoint(0, 0, DB_FORCE)) != 0) {
                   env->err(ret, "Could not perform checkpoint.\n");
                   return ((void *)EXIT_FAILURE);
    // This is a simple log archive thread. Once per minute, it removes all but
    // the most recent 3 logs that are safe to remove according to a call to
    // DBENV->log_archive().
    // Log cleanup is needed to conserve disk space, but aggressive log cleanup
    // can cause more frequent client initializations if a client lags too far
    // behind the current master. This can happen in the event of a slow client,
    // a network partition, or a new master that has not kept as many logs as the
    // previous master.
    // The approach in this routine balances the need to mitigate against a
    // lagging client by keeping a few more of the most recent unneeded logs
    // with the need to conserve disk space by regularly cleaning up log files.
    // Use of automatic log removal (DBENV->log_set_config() DB_LOG_AUTO_REMOVE
    // flag) is not recommended for replication due to the risk of frequent
    // client initializations.
    void log_archive_thread(void args)
         DbEnv *env;
         APP_DATA *app;
         char **begin, **list;
         int i, listlen, logs_to_keep, minlog, ret;
         env = (DbEnv *)args;
         app = (APP_DATA *)env->get_app_private();
         logs_to_keep = 3;
         for (;;) {
              // Wait for one minute, polling once per second to see if
              // application has finished. When application has finished,
              // terminate this thread.
              for (i = 0; i < 60; i++) {
                   sleep(1);
                   if (app->app_finished == 1)
                        return ((void *)EXIT_SUCCESS);
              // Get the list of unneeded log files.
              if ((ret = env->log_archive(&list, DB_ARCH_ABS)) != 0) {
                   env->err(ret, "Could not get log archive list.");
                   return ((void *)EXIT_FAILURE);
              if (list != NULL) {
                   listlen = 0;
                   // Get the number of logs in the list.
                   for (begin = list; *begin != NULL; begin++, listlen++);
                   // Remove all but the logs_to_keep most recent
                   // unneeded log files.
                   minlog = listlen - logs_to_keep;
                   for (begin = list, i= 0; i < minlog; list++, i++) {
                        if ((ret = unlink(*list)) != 0) {
                             env->err(ret,
                             "logclean: remove %s", *list);
                             env->errx(
                             "logclean: Error remove %s", *list);
                             free(begin);
                             return ((void *)EXIT_FAILURE);
                   free(begin);
    #define DATABASE_DUMMY "dummy.db"
    void create_dummy_db(DB_ENV env, DB *dbp)
    DB_ENV *dbenv=env;
    int ret;
    u_int32_t db_flags;
    if ((ret = db_create(dbp, dbenv, 0)) != 0)
    dbenv->err(dbenv, ret, "create_dummy_db: db_create");
    db_flags = DB_AUTO_COMMIT | DB_CREATE;
    //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
    if ((ret = (*dbp)->open(*dbp,NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
    dbenv->err(dbenv, ret, "create_dummy_db: DB->open");
    void reopen_dummy_db(DB_ENV env, DB *dbp)
    DB_ENV *dbenv=env;
    int ret;
    u_int32_t db_flags;
    if ((ret = db_create(dbp, dbenv, 0)) != 0)
    dbenv->err(dbenv, ret, "create_dummy_db: db_create");
    db_flags = DB_AUTO_COMMIT | DB_CREATE;
    //if ((ret = (*dbp)->open(*dbp,NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
    if ((ret = (*dbp)->open(*dbp,NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0)
    dbenv->err(dbenv, ret, "reopen_dummy_db: DB->open");
    void perform_db_operation(DB_ENV env, DB *dbp, bool bRead)
    //main loop
    //DB *dbp=NULL;
    DB_ENV *dbenv=env;
    int ret;
    u_int32_t db_flags;
    DBT key, data;
    char buf[20]="dummy", *rbuf;
    rbuf=buf;
    if (*dbp == NULL)
    create_dummy_db(dbenv, dbp);
    if (! bRead)
         memset(&key, 0, sizeof(key));
         memset(&data, 0, sizeof(data));
         key.data = buf;
         key.size = (u_int32_t)strlen(buf);
         data.data = rbuf;
         data.size = (u_int32_t)strlen(rbuf);
         if ((ret = (*dbp)->put(*dbp, NULL, &key, &data, 0)) != 0)
              if (ret == DB_REP_HANDLE_DEAD)
                   //create_dummy_db(dbenv, dbp);
                   reopen_dummy_db(dbenv, dbp);
                   (*dbp)->err(*dbp, ret, "DB->put :");
              else
              if (ret != DB_KEYEXIST)
                   (*dbp)->err(*dbp, ret, "perform_db_operation: DB->put");
         else
              DB_BTREE_STAT *statp;
              (*dbp)->stat(*dbp,NULL, &statp, 0);
              std::cout<<"dbp read stats: key#"<< statp->bt_nkeys <<std::endl;
    void dummy_write_thread(void args)
         DbEnv *env;
         APP_DATA *app;
         char **begin, **list;
         int i, listlen, logs_to_keep, minlog, ret;
         DB *m_dbp; // a pointer
         env = (DbEnv *)args;
         app = (APP_DATA *)env->get_app_private();
         logs_to_keep = 3;
         for (;;) {
              if (! app->no_dummy_wr)
                   if (app->is_master)
                   perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                        //env->txn_checkpoint(0, 0, DB_FORCE);
              usleep(1 * 1000 * 1000);
              else
                   if (app->is_master)
                        //DB *db_quote=g_repquote->get_DB();
                        //perform_db_operation(env->get_DB_ENV(),&db_quote,true);
                        //if (g_repquote)
                        //     g_runner->print_stocks_size(g_repquote);
                        //env->txn_checkpoint(0, 0, DB_FORCE);
                        //perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                        env->rep_flush();
              usleep(4 * 1000 * 1000);
    my script to simulate the split brain
    #!/bin/sh
    [ -z "$node1" ] && node1=10.10.32.121
    [ -z "$node2" ] && node2=10.10.32.91
    trap myend 0 1 2 3 6 9 14 15
    myend()
         echo "Receive signal to stop test..."
         un_split_brain
         echo "done"
         exit 1
    split_brain()
         echo -n "Split-Brain at node $node..."
         snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 2 >/dev/null 2>&1
         echo "done"
    un_split_brain()
         echo -n "Undo Split-Brain at node $node..."
         snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 1 >/dev/null 2>&1
         echo "done"
    is_slave()
         local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c CLIENT)
         [ $r -gt 1 ] && ret=1 || ret=0
         return $ret
    is_master()
         local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c MASTER)
         [ $r -gt 1 ] && ret=1 || ret=0
         return $ret
    wait_for_master()
         echo -n "Waiting for MASTER at node $node ... "
         is_master $node
         r=$?
         while ( [ ! $r -eq 1 ] )
         do
         usleep 500000
         is_master $node
         r=$?
         echo -n "."
         done
         echo "done"
    wait_for_slave()
         local r
         local tm
         tm=0
         echo -n "Waiting for SLAVE at node $node ... "
         is_slave $node
         r=$?
         while ( [ ! $r -eq 1 ] )
         do
              usleep 500000
              is_slave $node
              r=$?
              echo -n "."
              tm=$((tm+1))
              [ $tm -gt 120 ] && break
         done
         [ $tm -gt 120 ] && ret=0 || ret=1
         echo "done"
         return $ret
    run_test_split_brain()
         local nt
         nt=1
         nfails=0
         x=4
         [ -z "$1" ] && node=$node2
         while ((1))
         do
              printf "*************** TEST [%02d] ********************\n" $nt
              split_brain
              wait_for_master
              x=$((RANDOM%9))
              echo -n " waiting $x sec ..."
              sleep $x
              echo "done"
              un_split_brain
              wait_for_slave
              r=$?
              [ ! $r -eq 1 ] && echo "`date` - test [$nt] - fails ..." || echo "`date` - test [$nt] - OK ."
              [ ! $r -eq 1 ] && nfails=$((nfails+1))
              perc_failure=$(echo "100.0 - $nfails / $nt * 100.0" | bc -l)
              echo "************************************************ [% Success test $perc_failure % ]"
              nt=$((nt+1))
              x=$((RANDOM%9))
              echo -n " waiting $x sec ..."
              sleep $x
         done
    run_test_split_brain
    here is the makefile to run to two environments
    i run:
    - make run
    and in another window sh test_split_brain.sh
    node1?=10.10.32.121
    node2?=10.10.32.91
    nsite?=2
    debug?=0
    all: RepQuoteExampleEric install
    RepConfigInfo.o: RepConfigInfo.cpp RepConfigInfo.h
         g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 -c RepConfigInfo.cpp -o RepConfigInfo.o
    RepQuoteExampleEric: RepQuoteExampleEric.cpp RepConfigInfo.o
         g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 RepQuoteExampleEric.cpp RepConfigInfo.o -o RepQuoteExampleEric -L /usr/local/BerkeleyDB.5.1/lib/ -lreadline -lcurses -ldb_cxx
    kill:
         -ssh -X root@$(node1) "killall -9 /root/RepQuoteExampleEric"
         -ssh -X root@$(node2) "killall -9 /root/RepQuoteExampleEric"
    run: RepQuoteExampleEric kill install clean_env
         ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v | tee /tmp/BDB.log\"" &
         ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
    run_node2: clean_env2
         ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
    debug_node2: clean_env2
         ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
         sleep 3
         ssh -X root@$(node2) /sbin/pidof RepQuoteExampleEric >/tmp/pid
         ssh -X root@$(node2) ~/kdbg /root/db-5.1.19/examples/cxx/excxx_repquote/RepQuoteExampleEric -p `cat /tmp/pid`
    run_debug_node1: RepQuoteExampleEric kill install clean_env
         ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\" " &
         ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v\"" &
    run_debug_node2: RepQuoteExampleEric kill install clean_env
         ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v\" " &
         ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\"" &
    install: RepQuoteExampleEric
         scp RepQuoteExampleEric root@$(node1):~
         scp RepQuoteExampleEric root@$(node2):~
    clean_env: clean_env1 clean_env2
    clean_env1:
         ssh -X root@$(node1) rm -rf /opt/bdb/*
    clean_env2:
         ssh -X root@$(node2) rm -rf /opt/bdb/*

  • How to test if particular character is empty in a row

    Hi All ,
    Requirement : To split the input file into two files .
    If the length of the row is 93 characters and the condition is If characters from 28 to 32 are empty then create a file with the name PCO1 .txt else create a file with the name PCO2 .txt
    Please let me know how to test in a row if particular character has certain value or from 28 to 32 are empty .
    Thanks for your help

    hi Sharma,
    IF line+27(5) EQ space.
    ==> positions 28-32 have no value (i. e. all characters are spaces)
    ELSE.
    ==> positions 28-32 have some value
    ENDIF.
    some explanation:
    line+27(5) ==> this means: 5 characters after position 27
    hope this helps
    ec

  • How can I split long words to feet the line in JEditorPane

    Hi,
    Can someone help me ?
    I'm trying to write long words in JEditorPane,
    and I want the JEditorPane to split them, so it will feet the line...
    (something like word wrapping in JTExtArea).
    How can I fix my code so it will work?
    import java.awt.*;
    import javax.swing.*;
    import javax.swing.text.html.*;
    public class Testing extends JFrame {
        public Testing() {
            setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
            Dimension screenSize = Toolkit.getDefaultToolkit().getScreenSize();
            setBounds((screenSize.width-210)/2, (screenSize.height-200)/2, 210, 200);
            editorPane = new JEditorPane();
            HTMLEditorKit kit = new HTMLEditorKit();
            editorPane.setEditorKit( kit );
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            editorPane.setDocument( doc );
            editorPane.setEditable(false); // or true
            String htmlText =
                    "<html>"
                    +"<body>"
                    +"How can I use word wrapping with html JEditorPane?"
                    +"<br>How can I split the next long word, so it will feet the line?"
                    +"<br>loooooooooooooooooooooooooooooooooooooooong"
                    +"</body>"
                    +"</html>";
            try {
                kit.insertHTML(doc, editorPane.getCaretPosition(), htmlText.substring(6) + "<br>", 0, 0, HTML.Tag.HTML);
            catch( Exception exp ) {
                exp.printStackTrace();
            getContentPane().add(editorPane, BorderLayout.CENTER);
        public static void main(String args[]) {
            new Testing().setVisible(true);
        private JEditorPane editorPane;
    }Thanks,
    Maoz

    Sorry forget about paragraph view.
    Here the new code.
    regards,
    Stas
    import java.awt.*;
    import javax.swing.*;
    import javax.swing.text.html.*;
    import javax.swing.text.*;
    public class Test extends JFrame
        public Test()
            setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
            Dimension screenSize = Toolkit.getDefaultToolkit().getScreenSize();
            setBounds((screenSize.width-210)/2, (screenSize.height-200)/2, 210, 200);
            editorPane = new JEditorPane();
            MyHTMLEditorKit kit = new MyHTMLEditorKit();
            editorPane.setEditorKit( kit );
            HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
            editorPane.setDocument( doc );
    //        editorPane.setEditable(false); // or true
            String htmlText =
                "<html>"
                +"<body>"
                +"<br>How can I split the next long word, so it will feet the line?"
                +"<br>loooooooooooooooooooooooooooooooooooooooong"
                +"</body>"
                +"</html>";
            try
                 kit.insertHTML(doc, editorPane.getCaretPosition(), htmlText.substring(6) + "<br>", 0, 0, HTML.Tag.HTML);
            catch( Exception exp )
                exp.printStackTrace();
            getContentPane().add(new JScrollPane(editorPane), BorderLayout.CENTER);
        public static void main(String args[])
            new Test().setVisible(true);
        private JEditorPane editorPane;
    class MyHTMLEditorKit extends HTMLEditorKit
        private static final ViewFactory defaultFactory = new MyViewFactory ();
        public ViewFactory getViewFactory()
             return defaultFactory;
        static class MyViewFactory extends HTMLFactory
            public View create(Element elem)
                View v=super.create(elem);
                if (v instanceof InlineView )
                    return new LabelView(elem);
                else if (v instanceof javax.swing.text.html.ParagraphView) {
                    return new MyParagraphView(elem);
                return v;
        static class MyParagraphView extends javax.swing.text.html.ParagraphView {
            public MyParagraphView(Element elem) {
                super(elem);
            protected SizeRequirements calculateMinorAxisRequirements(int axis, SizeRequirements r) {
                if (r == null) {
                    r = new SizeRequirements();
                float pref = layoutPool.getPreferredSpan(axis);
                float min = layoutPool.getMinimumSpan(axis);
                // Don't include insets, Box.getXXXSpan will include them.
                r.minimum = (int)min;
                r.preferred = Math.max(r.minimum, (int) pref);
                r.maximum = Short.MAX_VALUE;
                r.alignment = 0.5f;
                return r;
    }

  • Split brain syndrome in RAC

    As per Split brain syndrome in Oracle RAC in case of inter-connect failures the master node will evict other/dead nodes .
    Let say 2 node RAC configuration node 1 is defined as master node (by some parameter like load and others) incase of network failures node 1 will terminate node 2 from cluster.
    ;l
    what happens if master node, in this case node 1 fails. which will terminate node 1 and is node 2 will become master node ?

    Hi,
    It occurs when the instance members in a RAC fail to ping/connect to each other via this private interconnect, but the servers are all pysically up and running and the database instance on each of these servers is also running. These individual nodes are running fine and can conceptually accept user connections and work independently. So basically due to lack of commincation the instance thinks that the other instance that it is not able to connect is down and it needs to do something about the situation. The problem is if we leave these instance running, the sane block might get read, updated in these individual instances and there would be data integrity issue, as the blocks changed in one instance, will not be locked and could be over-written by another instance. Oracle has efficiently implemented check for the split brain syndrome.
    In RAC if any node becomes inactive, or if other nodes are unable to ping/connect to a node in the RAC, then the node which first detects that one of the node is not accessible, it will evict that node from the RAC group. e.g. there are 4 nodes in a rac instance, and node 3 becomes unavailable, and node 1 tries to connect to node 3 and finds it not responding, then node 1 will evict node 3 out of the RAC groups and will leave only Node1, Node2 & Node4 in the RAC group to continue functioning.
    The split brain concepts can become more complicated in large RAC setups. For example there are 10 RAC nodes in a cluster. And say 4 nodes are not able to communicate with the other 6. So there are 2 groups formed in this 10 node RAC cluster ( one group of 4 nodes and other of 6 nodes). Now the nodes will quickly try to affirm their membership by locking controlfile, then the node that lock the controlfile will try to check the votes of the other nodes. The group with the most number of active nodes gets the preference and the others are evicted. Moreover, I have seen this node eviction issue with only 1 node getting evicted and the rest function fine, so I cannot really testify that if thats how it work by experience, but this is the theory behind it.
    When we see that the node is evicted, usually oracle rac will reboot that node and try to do a cluster reconfiguration to include back the evicted node.
    You will see oracle error: ORA-29740, when there is a node eviction in RAC. There are many reasons for a node eviction like heart beat not received by the controlfile, unable to communicate with the clusterware etc.
    And also You can go through Metalink Note ID: 219361.1

  • Split Brain Scenario

    Hello,
    Whilst testing Dataguard with FSFO, I seem to have managed to achieve a split brain situation where 2 databases in the Data Guard configuration were considered as primary databases and both were available to receive client connections. I dont understand why this happened and I'm looking for some assurance that this is not a bug in Oracle.
    My Data Guard Configuration is as follows:-
    1 primary database (DBa)
    3 physical standby databases (DBb, DBc, DBd)
    All databases are single instance, i.e no RAC, and are running Oracle 11g (11.1.0.7) on RHEL 5.3
    DBb is the Fast Start Failover Target for DBa. The FSFO observer process is running on a stand-alone server called OBS1,
    To simulate a 'data centre disaster' i did the following :-
    1) Kill the SMON processes on the servers running DBa and DBb (Note I did not kill the Observer process)
    2) From DGMGRL on the server running DBc issue the following commands :-
    DGMGRL> disable fast_start failover force (Without doing this I could not issue the subsequent failover command)
    DGMGRL> failover to DBc ;
    This worked as expected and DBc was established as the new primary database in the configuration. DBd continued to function correctly as a stand db. Subsequent client connections were routed to DBc as expected.
    3) I then attempted to simulate the two failed databases DBa and DBb rejoining the configuration. Firstly I put DBa into MOUNT status using the STARTUP MOUNT command from the SQLPLUS command line.
    4) Before I did anything with DBb, the Observer process that was still running on OBS1, detected that DBa was 'active' again and OPENed the database. In doing this it took no notice of the fact that DBc was already open and acting as the primary database in the configuration. The result of this was that two databases - DBa and DBc in the configuration were in an an OPEN state and acting as a primary database i.e Split Brain. The TNSNAMES.ora configuration on the Oracle machines meant that it was now perfectly possible for client connection to be spread over both these machines.
    I am very concerned as to why Oracle allowed the above situation to happen. Was my test unreasonable or should Oracle have detected that DBc was the new primary database after I attmepted to restart DBa in MOUNT state?? I now understand that if I had also issued the STOP OBSERVER command in DGMGRL, after issuing the FAILOVER ro DBc command, then the FSFO Observer could not have OPENed DBa once DBc had become the primary database, so is this the only mechanism that must be used in the above scenario to prevent a Split brain ?
    Any advice would be greatly appreciated.
    Thanks,
    Shaun

    Your environment is complex enough, three standbys, I've not worked in a similar environment. My recommendation would be to open an SR at metalink and please let us know what they tell you as we may very well find ourselves with multiple standbys in the future.

  • Some quick help needed with certificates and split brain dns.

    I run exch 2010 and have one cas server(srv03).  I have split brain dns configured and working in my system.  I got a new certificate this year because of the new regulations that won't allow .internal names in the san portion of an ssl cert.
     I have followed several tids on the internet and still when I tried to implement it today the outlook clients started getting a popup that says [the name on the certificate is invalid or does not match the name of the site]  At the top of this popup
    is srv03.abccorp.internal which is what it was before.
    The certificate is for mail.abccorp.com and also includes autodiscover.abccorp.com and srv03.abccorp.com.  
    When I run [Get-clientAccessServer | fl Name,AutoDiscoverServiceInternalUri] the name and the Url is correct and has the .com value.
    When I run the test email autoconfiguration from my Outlook icon, and look at the log, Autodiscover URL found through SCP, is correct and it says Succeeded at the end.  In the results tab however the Server, Availability Service, OOF URL are still showing
    the .internal instead of .com.  The Internal OWA, External OWA and the OAB are correctly displaying the .com.  What commands do I need to run to change these as they seem to be the problem.
    I wasted a lot of time chasing the autodiscover before I found out about this test in outlook and realized the autodiscover url was correct. :-)
    I have two days left on my old cert that has both .com and .internal SANs so I rolled that back into service so the users stop getting messages.  Any help would be appreciated.

    Hi OTS,
    You can run the following command to Change the InternalUrl attribute of the EWS:
    Set-WebServicesVirtualDirectory -Identity "CAS_Server_Name\EWS (Default Web Site)" -InternalUrl https://mail.abccorp.com/ews/exchange.asmx
    Best regards,
    Please remember to mark the replies as answers if they help, and unmark the answers if they provide no help. If you have feedback for TechNet Support, contact [email protected]
    Niko Cheng
    TechNet Community Support

  • CUC 10.0.1 cluster status stuck in Split Brain Recovery (SBR) on Primary server - HA reports fine.

    Hi,
    Have a 10.01.11900 CUC cluster and everything is working fine (no one having issues with voice mail, etc) but the cluster status reports is not consistent. 
    DBreplication is showing 2 on both servers. 
    Primary unity server cluster status shows Primary/split brain recovery.
    HA Unity server cluster status shows Primary/Secondary.
    utils diagnose test - everything tests fine except the tomcat_connectors test.
    test - tomcat_connectors   : Failed - The HTTPS port is not responding to local requests.  Please collect all of the Tomcat logs for root cause analysis: file get activelog tomcat/logs/*
    We've shutdown the HA server and rebooted primary, and then waited awhile after primary was back up/active before bringing the HA server back up and still same.
    We reset DB replication and same. 
    On the HA server I made the HA primary and the cluster status flipped to Seconday/Primary and I then made primary the primary again, but the primary server cluster status always shows Split Brain Recovery for the secondary/HA server. 
    No core dumps on either server and all services are started. 
    Any one seen this before or have any thoughts?  I have a TAC Case on this but so far in same boat. 
    Would the utils cuc cluster renegotiate command help? Did not replace a server so don't really want to overwrite data to publisher server. Issue seems to be with the publisher since HA shows fine but not sure. I don't want to lose messages/etc so don't want really want to run these commands.  
    Thanks.

    Ok, thanks.
    The SRM logs indicate the Connection Digital Networking Replication Agent service is not running, however when I start it it stops right away and the cuReplicator log states digital networking is not enabled. 
    From SRM Log:
    23:47:20.100 |17755,,,SRM,7,<svcmon> checkServiceStatus: started service monitoring
    23:47:20.100 |17755,,,SRM,7,<svcmon> Service Status: 1 service(s) not running. Service name(s):
    23:47:20.100 |17755,,,SRM,7,<svcmon> Connection Digital Networking Replication Agent
    23:47:24.674 |28471,,,SRM,11,<Timer-3> [snd] Type: Heartbeat
    From Replicator log:
    admin:file tail activelog cuc/diag_CuReplicator_00000049.uc
    23:42:59.208 HDR|09/14/2014 ,Significant
    23:42:59.208 |28914,,,CuReplicator,0,Digital Networking is not enabled. Replicator will stop now.
    There is no digital networking setup to other unity systems, and only one location. 
    Also, the Server role manager can't be restarted from CLI or the GUI so either root or a server reboot. 
    I compared it to another CUC cluster and deactivated the Digital Networking service and the SRM logs seem happier now, will wait a bit and see if it clears the SBR status up. 

  • Split Brain handling in oracle 10g

    How does oracle10g deal with split brain? Does it use voting disk or files or does it use any of the SCSI protocols ?

    It uses the CRS voting disk to determine who should and shouldn't continue.

  • Webform HFM : How can I split a customheader in col  in 2 lines ?

    Dear All,
    How can I split a customheader in column in 2 lines ?
    For row , we can use ";" for return next line. But in column It'sn't work.
    Thanks for your help
    Jean Daniel

    Dear 634 Dp,
    I try it but I have the same result.
    Could you send me a example of script ?
    I have typing that : C4=C1#P020,CustomHeader:Test L1; Test L2
    C4=C1#P020,CustomHeader:Test L1; Test L2
    Thanks a lot for your help
    Jean

  • Quick questions (bulk operations, WAN, provisioning, split brain)

    Thank you for your assistance with the following questions..
    1. Can I use the "disk overflow" feature when using the "partitioned cache" topology? I need to just do puts and gets.
    2. Are bulk operations (i.e. bulk put, bulk get, bulk erase) available in all cache topologies? I am interested in bulk put/get when using a "partitioned cache" and when using your "WAN capability" to connect geographical dispersed clusters (Coherence*Extend). Do you guarantee data and event delivery over the WAN?
    3. Does the WAN capability (Coherence*Extend) allow multiple clusters to talk with each other (N:N) or does it only support a "star topology" (1 main cluster connected to N other "peripheral" clusters)?
    4. As a result of a load bottleneck (i.e. a cache server(s) running too hot) in a "partitioned region" topology, can I "on-the-fly" add new cache servers/boxes to load balancing my "client load"? What client load balancing policies are typically recommended (i.e. sticky, random)? How does dynamic provisioning of new servers (e.g., adding more RAM to my distributed cache) affect the hashing of my map regions and my network bandwidth? What kind of performance would I expect if I add one or more new cache members when I was close to have an OOM on one or more of my already running cache servers? Even if I was not close to an OOM, what kind of network spike would I expect and what kind of client operation latency should I expect?
    5. Could I have more information to understand your "data location transparency"?
    6. How do you solve "split brain" problems (i.e. sets of members get separated and later try to rejoin cluster)? Do you have or is it possible to implement a "quorum-based policy" to decide who should live and who should die? We have seen "split brain" problems with messaging retransmission storms in the past and are looking for options.

    Hi user614602,
    1 - 3. Yes to all.
    4 - 6. Those are quite advanced and open-ended questions (and by no means "quick"). Please contact your sales rep to schedule a conference call to discuss your requirements and how Coherence addresses the issues you rise.
    Regards,
    Gene

  • HSRP "Split Brain" on the STP Topology

    Hello. I'm a network administrator in my company.
    I have a question about HSRP "Split Brain" on the STP Topology.
    verifying to HSRP Down Time that attached network topology.
    Trying "ICMP Ping" from PC to HSRP Virtual IP.
    I have found unexpected senario. PIng goes down when reboot the L2 Core SW 1 and Router 1 HSRP goes Active from Init Status.
    Why ping goes down?
    Router 1's Gratuitouse ARP, It's should not be transffered to L2 Core SW 2?
    Sorry to trouble you, Could you please teach.

    Thank you for your reply, rejeevh. I retried "L2 Core SW 1 Down Test".
    From a result, My verification senario was wrong.
    It was not shown in the figure, there is another link of both routers to other routers over the Core SWs that enable OSPF and "redistribute connected" in practice.
    "ping from PC to HSRP Virtual IP" was wrong. "ping from PC to another OSPF Router's Interface" is correct senario.
    I verified correct senario rebooting L2 Core SW 1. Also, ping goes down.
    but this result was simple, It was dropped at SW 1's EtherChannel Interface in STP LIS/LER status when recieved return packet from another Router. (SW 1's other Interface was enabled Portfast or Portfast Trunk. )
    I was confirmed the result was improved, It was enable Portfast Trunk in Etherchannel Interfaces of SW 2 and SW 1.
    Thank you very much for your reply.

  • Split Brain Syndrome

    As oracle says voting disk is used for avoiding split brain syandrome.
    Can any one please explain me in details how voting disk is used for avoiding split brain syndrome.
    Can i use Solaris ipmp concept to avoid interconnect failure and split brain syandrome...if yes then why to use voting disk
    Regards,
    Yasser.

    The oracle clusterware is responsible for maintaining cluster health.
    In order to do so, the cluster daemons keep track of local and remote health.
    Local health means the cluster deamons can use a kernel module to see if it got 'lost time' which means the system is too busy (and can stable in a cluster). Also if the machine can not write to the voting disk, the machine is rebooted to avoid erroneous writings to disks. the rebooting is called 'stonith' (shoot the other node in the head) and the prevention of writing erroneous information is called 'fencing'.
    Remote health is determined by sending network packets through the private network, and by writing in the voting disk or voting disks.
    Also, the machines in a cluster do a voting every time the number of machines in a cluster modifies, and determine a master. The results of the voting a written to the voting disk, and every machine in the cluster keeps on writing it is still available in the cluster. that way a dead machine can be both by the network and by looking at the voting disk.
    If the network between two machines in a cluster is disturbed, the cluster is said to have a 'split brain'.
    Because of the voting disk or disks, the split brain can be solved by the master by terminating the other or others.
    Since oracle 10, the only clusterware supported for RAC is the oracle clusterware. the network and voting disk are mandatory components in oracle clusterware.
    this means the voting disk is needed, even with ip multipathing (ipmp)

Maybe you are looking for