Recovery from split brain

We are observing the following in a two-site replication group:
1. Both nodes start up as masters.
2. They write some configuration data to the DB.
3. They probably do a checkpoint (we have not been able to verify this; it is a time-based checkpoint).
4. They connect and hold an election.
5. After the election, one node loses and is demoted to slave.
6. At this point the slave should read data from the DB that is consistent with the master, but this is not the case.
7. When the slave process is restarted (with DB_RECOVER), the slave reads data consistent with the master.
This seems like a bug in BDB. Any clues on how to handle such a situation?

In order to read data consistent with the new master, the client
should wait until it gets the DB_EVENT_REP_STARTUPDONE event.
When you say waiting several hours, do you mean several hours after
the reconnect, or several hours after the master becomes a client? I
presume you're detecting the master becoming a client by having
received the DB_EVENT_REP_CLIENT event; is that correct?
After the reconnect, if neither site is generating new transactions
for a while, there would be nothing to cause the other site to realize
that there were two masters. Is it possible there are no new
transactions being generated during those several hours?
If you can reproduce this behavior, it should be fairly easy to see
what's going on by turning on verbose diagnostic output. When either
master detects that there is another master, it sends a message of
"type dupmaster"; you should be able to see that in the verbose
output. After that one or more election attempts should be evident,
and finally you should be able to see "Start-up is done" at the client site.
Ritesh S. wrote:
> What happens if the two masters write data to the DB and checkpoint, and
> then one gets demoted to slave?
I'm sorry, I don't understand what you mean here. Isn't this the
scenario I've just described?
Alan Bram
Oracle

Similar Messages

CUC 10.0.1 cluster status stuck in Split Brain Recovery (SBR) on Primary server - HA reports fine.

Hi,
Have a 10.01.11900 CUC cluster and everything is working fine (no one is having issues with voice mail, etc.), but the cluster status report is not consistent.
DB replication is showing 2 on both servers.
Primary unity server cluster status shows Primary/split brain recovery.
HA Unity server cluster status shows Primary/Secondary.
utils diagnose test - everything tests fine except the tomcat_connectors test.
test - tomcat_connectors : Failed - The HTTPS port is not responding to local requests. Please collect all of the Tomcat logs for root cause analysis: file get activelog tomcat/logs/*
We've shut down the HA server and rebooted the primary, then waited a while after the primary was back up/active before bringing the HA server back up, and it is still the same.
We reset DB replication, and it is the same.
On the HA server I made the HA server primary, and the cluster status flipped to Secondary/Primary. I then made the primary the primary again, but the primary server's cluster status always shows Split Brain Recovery for the secondary/HA server.
No core dumps on either server and all services are started.
Anyone seen this before, or have any thoughts? I have a TAC case on this, but so far we are in the same boat.
Would the utils cuc cluster renegotiate command help? We did not replace a server, so I don't really want to overwrite data on the publisher server. The issue seems to be with the publisher, since HA shows fine, but I'm not sure. I don't want to lose messages etc., so I don't really want to run these commands.
Thanks.

Ok, thanks.
The SRM logs indicate the Connection Digital Networking Replication Agent service is not running; however, when I start it, it stops right away and the CuReplicator log states that digital networking is not enabled.
From SRM Log:
23:47:20.100 |17755,,,SRM,7,<svcmon> checkServiceStatus: started service monitoring
23:47:20.100 |17755,,,SRM,7,<svcmon> Service Status: 1 service(s) not running. Service name(s):
23:47:20.100 |17755,,,SRM,7,<svcmon> Connection Digital Networking Replication Agent
23:47:24.674 |28471,,,SRM,11,<Timer-3> [snd] Type: Heartbeat
From Replicator log:
admin:file tail activelog cuc/diag_CuReplicator_00000049.uc
23:42:59.208 HDR|09/14/2014 ,Significant
23:42:59.208 |28914,,,CuReplicator,0,Digital Networking is not enabled. Replicator will stop now.
There is no digital networking set up to other Unity systems, and there is only one location.
Also, the Server Role Manager can't be restarted from the CLI or the GUI, so it takes either root access or a server reboot.
I compared it to another CUC cluster and deactivated the Digital Networking service; the SRM logs seem happier now. I will wait a bit and see if it clears up the SBR status.

Recovery from failure question

In order to use two racks of servers (in our case located in separate buildings) and make sure that the primary and backup (backup count 1) of each partition are allocated to nodes in different racks, I understand that a "machine id" can be assigned to each node.
     As I understand it, the cluster will (when configured this way) survive the failure of one rack (as long as no more failures occur before the other rack is online again), i.e. it will run in a lower "safety mode". I have two questions about this:
     1. Let's say that one rack loses its connection to the rest of the network long enough for the cluster to consider the other rack to be down. This will happen on both sides: each rack will consider the other one down. What happens when the racks can communicate again? Will they automatically sort out the "split brain" situation?
     2. Let's assume that one of the racks, A, is hit by a power spike that causes the switch and all the servers except one to reboot. This completely cuts the failed rack off from the other rack while the switch reboots, and the nodes in rack B will consider all nodes in rack A to be down. After the switch has rebooted, the nodes on the one still-working server in rack A will be able to communicate with the nodes in rack B again. My question is whether Coherence in this situation will try to create backups for all partitions of rack B on the nodes of the single available server in rack A. If so, the nodes on that server will most likely quickly run out of heap. What will happen in this situation? Will Coherence retry rebalancing when more nodes join the cluster, or will the cluster crash?
     Best Regards
     Magnus

     > The second answer makes me a bit confused - two
     > follow-up questions:
     >
     > 2A - If I use machine id 0 for N servers in rack A
     > and machine id 1 for N servers in rack B, I had the
     > impression that primary and backup could NEVER end up
     > in the same rack (even after failures, i.e. the system
     > would run with only primaries if all nodes with, say,
     > machine id 1 went down). Is this not the way it
     > works?
     "Never" is not exactly true; it holds only if it is theoretically possible to fit all the data on distinct machines. For example, if you have two boxes and the cluster nodes on one box do not provide enough capacity to hold an entire copy of the data set, then, since both copies still have to exist, some of the duplicate data will inevitably end up on the nodes of the larger box, because it cannot fit into the smaller one. Also be aware that data distribution is done at the granularity of partitions. You cannot split a single partition across two nodes, so overly uneven partitioning of the data set can lead to problems.
     > 2B - My concern was that in the case of failures (or,
     > as I phrased my question, during recovery from
     > failures) the number of physical servers (and
     > therefore nodes!) with machine id zero and one
     > may differ (it seems quite hard to make sure
     > that this can't happen!). In that case (say, for
     > instance, that only one out of N servers with machine
     > id zero comes up immediately, and the other N - 1 come
     > up much later because they were forced to do a disk
     > check or required operator intervention to start
     > after a failure), the memory of the single server
     > (the heap space of its nodes) will not be enough to
     > accommodate backups for all the primaries on all the
     > N servers with the other machine id.
     >
     While you are losing nodes, and immediately afterwards (when their death is detected), Coherence will attempt to promote partition backups to primary copies. This might cause OutOfMemoryErrors in those nodes. I do not know whether Coherence is able to recover from that (theoretically it should be possible to drop the newly created object references and get back to the state before the attempt to promote the partition backup to a primary partition).
     So while nodes are dying, you can either temporarily or permanently lose access to data whose primaries were on boxes that died, if the reconstruction from the backup caused an OutOfMemoryError. If the Coherence JVM is not able to recover from that OutOfMemoryError, you might lose other primaries that reside on this cluster node, causing a ripple effect.
     You can reduce the risk of such events by having more than two racks if multiple nodes in a rack can fail together, and also by sizing the cluster JVMs so that cached data takes up a lower share of memory. This way a rack failure will, first, cause fewer primaries to be lost and, second, leave more free memory in which to reconstruct data. You can read the Production Checklist for some information on sizing the cluster.
     Whatever happens, I don't expect that you will lose data due to OutOfMemory errors when you do not lose a cluster node. So after that blackout on one of the racks, and after the cluster reached a quiescent state as far as redistribution of partitions is concerned, you will not lose data just because you added back a node (I expect a copy of a backup is not dropped before the safe reception of the backup at the transfer destination is acknowledged).
     If there are not enough cluster nodes running on one of the racks to accommodate the entire data set, your data will not be balanced so that copies of the same data are fully on different racks, but they will still reside on separate cluster nodes. When more nodes are started, the data will gradually be rebalanced so that more of it is held on separate racks.
     Best regards,
     Robert

Election problem after repeated split-brains with two nodes

Hi,
I'm using customized source based on BDB-5.1.19 (excxx_repquote)
with two sites, one MASTER and the other SLAVE:
nsite=2
ack=quorum
- the master writes to quotedb at a rate of 10 txn per second
- the test consists of isolating the client from the master (split brain) and reconnecting it after a random interval of 1 to 10 seconds
The test runs well about 10 times, but at some point the slave process receives DB_EVENT_REP_ELECTION_FAILED,
and the master enters election mode and never leaves CLIENT mode. I should mention that, to freeze the client, I decided to have it kill itself (kill -9 on its own pid) when it receives that event...
Here is the verbose log on the master:
[1307872770:871621][6510/47655809107168] MASTER: rep_send_function returned: 110
[1307872770:973655][6510/47655809107168] MASTER: bulk_msg: Send buffer after copy due to PERM
[1307872770:973667][6510/47655809107168] MASTER: send_bulk: Send 266 (0x10a) bulk buffer bytes
[1307872770:973672][6510/47655809107168] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type bulk_log, LSN [21][986648] perm
[1307872770:973693][6510/47655809107168] MASTER: will await acknowledgement: need 1
[1307872771:26623][6510/47655809107168] MASTER: rep_send_function returned: 110
[1307872771:126380][6510/1162996032] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type log, LSN [21][946345]
[1307872771:126407][6510/1162996032] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type dupmaster, LSN [0][0] nobuf
[1307872771:126695][6510/1162996032] MASTER: rep_start: Found old version log 17
[1307872771:126753][6510/1162996032] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type newclient, LSN [0][0] nobuf
[1307872771:126833][6510/1183975744] CLIENT: starting election thread
[1307872771:126876][6510/1183975744] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:126890][6510/1183975744] CLIENT: Election thread owns egen 69
[1307872771:127423][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type newclient, LSN [0][0]
[1307872771:130079][6510/1183975744] CLIENT: Tallying VOTE1[0] (2147483647, 69)
[1307872771:130113][6510/1183975744] CLIENT: Beginning an election
[1307872771:130134][6510/1183975744] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type vote1, LSN [21][986728] nobuf
[1307872771:130147][6510/1173485888] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 68 eid -1, type master_req, LSN [0][0] nobuf
[1307872771:130438][6510/1152506176] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][946437]
[1307872771:130460][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728]
[1307872771:130467][6510/1152506176] CLIENT: Updating gen from 68 to 70
[1307872771:130482][6510/1162996032] CLIENT: Received ALIVE egen of 71, mine 69
[1307872771:130503][6510/1162996032] CLIENT: Election finished in 0.003602000 sec
[1307872771:130515][6510/1162996032] CLIENT: Election done; egen 70
[1307872771:130534][6510/1152506176] CLIENT: Received vote1 egen 71, egen 71
[1307872771:130581][6510/1152506176] CLIENT: Tallying VOTE1[0] (0, 71)
[1307872771:130593][6510/1089075520] CLIENT: starting election thread
[1307872771:130619][6510/1152506176] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,946437]
[1307872771:130642][6510/1152506176] CLIENT: Not in election, but received vote1 0x282c 0x8
[1307872771:130674][6510/1089075520] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:130692][6510/1089075520] CLIENT: Election thread owns egen 71
[1307872771:130704][6510/1194465600] CLIENT: starting election thread
[1307872771:130733][6510/1194465600] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872771:132922][6510/1089075520] CLIENT: Tallying VOTE1[1] (2147483647, 71)
[1307872771:132949][6510/1089075520] CLIENT: Accepting new vote
[1307872771:132958][6510/1089075520] CLIENT: Beginning an election
[1307872771:132973][6510/1089075520] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][986728] nobuf
[1307872771:132985][6510/1194465600] CLIENT: election thread is exiting
[1307872771:133012][6510/1089075520] CLIENT: Tallying VOTE2[0] (2147483647, 71)
[1307872771:133037][6510/1089075520] CLIENT: Counted my vote 1
[1307872771:133048][6510/1089075520] CLIENT: Skipping phase2 wait: already got 1 votes
[1307872771:133060][6510/1089075520] CLIENT: Got enough votes to win; election done; (prev) gen 70
[1307872771:133071][6510/1089075520] CLIENT: Election finished in 0.002367000 sec
[1307872771:133084][6510/1089075520] CLIENT: Election done; egen 72
[1307872771:133111][6510/1089075520] CLIENT: Ended election with 0, e_th 1, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x6
[1307872771:133170][6510/1173485888] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0]
[1307872771:133187][6510/1173485888] CLIENT: Racing replication msg lockout, ignore message.
[1307872771:173744][6510/1162996032] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0]
[1307872771:173769][6510/1162996032] CLIENT: Racing replication msg lockout, ignore message.
[1307872771:231593][6510/1183975744] CLIENT: Ended election with 0, e_th 0, egen 72, flag 0x2a2c, e_fl 0x0, lo_fl 0x1c
[1307872771:231629][6510/1183975744] CLIENT: election thread is exiting
[1307872777:443794][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307872971:644194][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873165:844583][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873360:44955][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873554:245347][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873748:445736][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307873942:646117][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
[1307874136:846509][6510/1131526464] CLIENT: init connection to site 2.0.0.210:12345 with result 115
... and it stays in this state indefinitely.
My question is: why is the master suddenly turned into a CLIENT, and why does it never return to MASTER?
Thanks in advance...
Here is the log for the client:
[1307872315:455113][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984396]
[1307872315:455134][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][984483] perm
[1307872315:609962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984733] perm
[1307872315:764958][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][984986] perm
[1307872315:919962][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985238] perm
[1307872316:75018][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985491] perm
[1307872316:229959][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985741] perm
[1307872316:384949][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][985993] perm
[1307872316:499899][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986141] perm
[1307872316:539895][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986221]
[1307872316:540078][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986307]
[1307872316:540100][1282/1160603968] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type log, LSN [21][986394] perm
[1307872316:694950][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type bulk_log, LSN [21][986648] perm
[1307872316:847349][1282/1129134400] MASTER: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type log, LSN [21][946345]
[1307872316:847698][1282/1171093824] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type dupmaster, LSN [0][0]
[1307872316:847999][1282/1181583680] MASTER: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type newclient, LSN [0][0]
[1307872316:848168][1282/1171093824] MASTER: rep_start: Found old version log 17
[1307872316:848222][1282/1181583680] CLIENT: Racing replication msg lockout, ignore message.
[1307872316:848398][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type newclient, LSN [0][0] nobuf
[1307872316:848504][1282/1192073536] CLIENT: starting election thread
[1307872316:848542][1282/1192073536] CLIENT: Start election nsites 2, ack 1, priority 100
[1307872316:848566][1282/1192073536] CLIENT: Election thread owns egen 71
[1307872316:849634][1282/1192073536] CLIENT: Tallying VOTE1[0] (2147483647, 71)
[1307872316:849654][1282/1192073536] CLIENT: Beginning an election
[1307872316:849680][1282/1192073536] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid -1, type vote1, LSN [21][946437] nobuf
[1307872316:851403][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type vote1, LSN [21][986728]
[1307872316:851448][1282/1160603968] CLIENT: Received vote1 egen 69, egen 71
[1307872316:851470][1282/1160603968] CLIENT: Received old vote 69, egen 71, ignoring vote1
[1307872316:851481][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [21][986728] nobuf
[1307872316:851538][1282/1171093824] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 68 eid 0, type master_req, LSN [0][0]
[1307872316:851558][1282/1171093824] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type alive, LSN [0][0] nobuf
[1307872316:854254][1282/1160603968] CLIENT: /opt/bdb/ rep_process_message: msgv = 5 logv 17 gen = 70 eid 0, type vote1, LSN [21][986728]
[1307872316:854275][1282/1160603968] CLIENT: Received vote1 egen 71, egen 71
[1307872316:854317][1282/1160603968] CLIENT: Tallying VOTE1[1] (0, 71)
[1307872316:854339][1282/1160603968] CLIENT: Incoming vote: (eid)0 (pri)100 ELECTABLE (gen)70 (egen)71 [21,986728]
[1307872316:854353][1282/1160603968] CLIENT: Existing vote: (eid)2147483647 (pri)100 (gen)70 (sites)2 [21,946437]
[1307872316:854369][1282/1160603968] CLIENT: Accepting new vote
[1307872316:854379][1282/1160603968] CLIENT: Phase1 election done
[1307872316:854395][1282/1160603968] CLIENT: Voting for 0
[1307872316:854407][1282/1160603968] CLIENT: /opt/bdb/ rep_send_message: msgv = 5 logv 17 gen = 70 eid 0, type vote2, LSN [0][0] nobuf
[1307872317:960344][1282/1192073536] CLIENT: After phase 2: votes 0, nvotes 1, nsites 2
[1307872317:960389][1282/1192073536] CLIENT: Election finished in 1.111809000 sec
[1307872317:960401][1282/1192073536] CLIENT: Election done; egen 72
[1307872317:960412][1282/1192073536] CLIENT: Ended election with -30974, e_th 0, egen 72, flag 0x282c, e_fl 0x0, lo_fl 0x0
Kill me !!
--- my source
On the master I run these commands manually:
txn_rate 1
loop_rate 10
loop 1 20000
/*
 * See the file LICENSE for redistribution information.
 * Copyright (c) 2001, 2010 Oracle and/or its affiliates. All rights reserved.
 * $Id$
 */
/*
 * In this application, we specify all communication via the command line. In
 * a real application, we would expect that information about the other sites
 * in the system would be maintained in some sort of configuration file. The
 * critical part of this interface is that we assume at startup that we can
 * find out
 *      1) what our Berkeley DB home environment is,
 *      2) what host/port we wish to listen on for connections; and
 *      3) an optional list of other sites we should attempt to connect to.
 * These pieces of information are expressed by the following flags.
 * -h home (required; h stands for home directory)
 * -l host:port (required; l stands for local)
 * -C or -M (optional; start up as client or master)
 * -r host:port (optional; r stands for remote; any number of these may be
 *     specified)
 * -R host:port (optional; R stands for remote peer; only one of these may
 *     be specified)
 * -a all|quorum (optional; a stands for ack policy)
 * -b (optional; b stands for bulk)
 * -n nsites (optional; number of sites in replication group; defaults to 0
 *     to try to dynamically compute nsites)
 * -p priority (optional; defaults to 100)
 * -v (optional; v stands for verbose)
 */
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <sstream>
#include <sys/types.h>
#include <signal.h>
#include <db_cxx.h>
#include "RepConfigInfo.h"
#include "dbc_auto.h"
using std::cout;
using std::cin;
using std::cerr;
using std::endl;
using std::ends;
using std::flush;
using std::istream;
using std::istringstream;
using std::ostringstream;
using std::string;
using std::getline;
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>
#define     CACHESIZE     (10 * 1024 * 1024)
#define     DATABASE     "quote.db"
#define     DATABASE2     "quote2.db"
const char *progname = "excxx_repquote";
#include <errno.h>
#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#define     snprintf          _snprintf
#define     sleep(s)          Sleep(1000 * (s))
extern "C" {
extern int getopt(int, char * const *, const char *);
extern char *optarg;
}
typedef HANDLE thread_t;
typedef DWORD thread_exit_status_t;
#define     thread_create(thrp, attr, func, arg)                    \
(((*(thrp) = CreateThread(NULL, 0,                         \
     (LPTHREAD_START_ROUTINE)(func), (arg), 0, NULL)) == NULL) ? -1 : 0)
#define     thread_join(thr, statusp)                         \
((WaitForSingleObject((thr), INFINITE) == WAIT_OBJECT_0) &&          \
GetExitCodeThread((thr), (LPDWORD)(statusp)) ? 0 : -1)
#else /* !_WIN32 */
#include <pthread.h>
typedef pthread_t thread_t;
typedef void* thread_exit_status_t;
#define     thread_create(thrp, attr, func, arg)                    \
pthread_create((thrp), (attr), (func), (arg))
#define     thread_join(thr, statusp) pthread_join((thr), (statusp))
#endif
// Struct used to store information in Db app_private field.
typedef struct {
     bool app_finished;
     bool in_client_sync;
     bool is_master;
     bool no_dummy_wr;
} APP_DATA;
static void log(const char *);
void *checkpoint_thread(void *);
void *log_archive_thread(void *);
void *dummy_write_thread(void *);
class RepQuoteExample {
public:
     RepQuoteExample();
     void init(RepConfigInfo* config);
     void doloop();
     int terminate();
     static void event_callback(DbEnv* dbenv, u_int32_t which, void *info);
     void print_stocks_size(Db *dbp);
private:
     // disable copy constructor.
     RepQuoteExample(const RepQuoteExample &);
     void operator = (const RepQuoteExample &);
     // internal data members.
     APP_DATA          app_data;
     RepConfigInfo *app_config;
     DbEnv          cur_env;
     thread_t ckp_thr;
     thread_t lga_thr;
     thread_t dmy_thr;
     // private methods.
     void print_stocks(Db *dbp);
     void print_env(DbEnv *dbenv);
     void prompt();
};
RepQuoteExample *g_runner=NULL;
RepConfigInfo *g_config=NULL;
class DbHolder {
public:
     DbHolder(DbEnv *env, const char *_dbname) : env(env) {
          dbp = 0;
          if (_dbname) dbname = _dbname;
          else dbname = DATABASE;
     }

     ~DbHolder() {
          try {
               close();
          } catch (...) {
               // Ignore: this may mean another exception is pending
          }
     }

     bool ensure_open(bool creating) {
          if (dbp)
               return (true);
          dbp = new Db(env, 0);
          u_int32_t flags = DB_AUTO_COMMIT;
          if (creating)
               flags |= DB_CREATE;
          try {
               //dbp->open(NULL, DATABASE, NULL, DB_BTREE, flags, 0);
               //dbp->open(NULL, dbname, NULL, DB_BTREE, flags, 0);
               // File name NULL + database name: an in-memory named database.
               dbp->open(NULL, NULL, dbname, DB_BTREE, flags, 0);
               return (true);
          } catch (DbDeadlockException &) {
          } catch (DbRepHandleDeadException &) {
          } catch (DbException &e) {
               if (e.get_errno() == DB_REP_LOCKOUT) {
                    // Just fall through.
               } else if (e.get_errno() == ENOENT && !creating) {
                    // Provide a bit of extra explanation.
                    log("Stock DB does not yet exist");
               } else
                    throw;
          }
          // (All retryable errors fall through to here.)
          log("please retry the operation");
          close();
          return (false);
     }

     void close() {
          if (dbp) {
               try {
                    dbp->close(0);
                    delete dbp;
                    dbp = 0;
               } catch (...) {
                    delete dbp;
                    dbp = 0;
                    throw;
               }
          }
     }

     operator Db *() {
          return dbp;
     }

     Db *operator->() {
          return dbp;
     }

private:
     Db *dbp;
     DbEnv *env;
     const char *dbname;
};
class StringDbt : public Dbt {
public:
#define GET_STRING_OK 0
#define GET_STRING_INVALID_PARAM 1
#define GET_STRING_SMALL_BUFFER 2
#define GET_STRING_EMPTY_DATA 3
     int get_string(char **buf, size_t buf_len)
     {
          size_t copy_len;
          int ret = GET_STRING_OK;
          if (buf == NULL) {
               cerr << "Invalid input buffer to get_string" << endl;
               return GET_STRING_INVALID_PARAM;
          }
          // make sure the string is null terminated.
          memset(*buf, 0, buf_len);
          // if there is no string, just return.
          if (get_data() == NULL || get_size() == 0)
               return GET_STRING_OK;
          if (get_size() >= buf_len) {
               ret = GET_STRING_SMALL_BUFFER;
               copy_len = buf_len - 1; // save room for a terminator.
          } else
               copy_len = get_size();
          memcpy(*buf, get_data(), copy_len);
          return ret;
     }

     size_t get_string_length()
     {
          if (get_size() == 0)
               return 0;
          return strlen((char *)get_data());
     }

     void set_string(char *string)
     {
          set_data(string);
          set_size((u_int32_t)strlen(string));
     }

     StringDbt(char *string) :
          Dbt(string, (u_int32_t)strlen(string)) {}
     StringDbt() : Dbt() {}
     ~StringDbt() {}
     // Don't add extra data to this sub-class since we want it to remain
     // compatible with Dbt objects created internally by Berkeley DB.
};
Db *g_repquote=NULL;
RepQuoteExample::RepQuoteExample() : app_config(0), cur_env(0) {
     app_data.app_finished = 0;
     app_data.in_client_sync = 0;
     app_data.is_master = 0; // assume I start out as client
     app_data.no_dummy_wr = 0; // when set, the dummy write thread is not run
}
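// Pointer to the original rep_process_message function; the wrapper
// below logs each call and then forwards it, returning its result.
int (*old_rep_process_message)(DB_ENV *, DBT *, DBT *, int, DB_LSN *);
int my_rep_process_message(DB_ENV *arg1, DBT *arg2, DBT *arg3, int arg4, DB_LSN *arg5)
{
     printf("EZ->>> my_rep_process_message:%p\n", (void *)arg5);
     return old_rep_process_message(arg1, arg2, arg3, arg4, arg5);
}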
void RepQuoteExample::init(RepConfigInfo *config) {
     app_config = config;
     cur_env.set_app_private(&app_data);
     cur_env.set_errfile(stderr);
     app_data.no_dummy_wr=config->no_dummy_wr;
     if (app_data.no_dummy_wr)
          printf("No dummy !!!\n");
     //EZ->cur_env.set_errpfx(progname);
     cur_env.set_event_notify(event_callback);
     // Configure bulk transfer to send groups of records to clients
     // in a single network transfer. This is useful for master sites
     // and clients participating in client-to-client synchronization.
     if (app_config->bulk)
          cur_env.rep_set_config(DB_REP_CONF_BULK, 1);
     // Set the total number of sites in the replication group.
     // This is used by repmgr internal election processing.
     if (app_config->totalsites > 0)
          cur_env.rep_set_nsites(app_config->totalsites);
     // Turn on debugging and informational output if requested.
     if (app_config->verbose)
          cur_env.set_verbose(DB_VERB_REPLICATION, 1);
     // Additional verbose categories, enabled unconditionally.
     cur_env.set_verbose(DB_VERB_REPMGR_MISC, 1);
     cur_env.set_verbose(DB_VERB_RECOVERY, 1);
     cur_env.set_verbose(DB_VERB_REPLICATION, 1);
     cur_env.set_verbose(DB_VERB_REP_ELECT, 1);
     cur_env.set_verbose(DB_VERB_REP_LEASE, 1);
     cur_env.set_verbose(DB_VERB_REP_SYNC, 1);
     // Set replication group election priority for this environment.
     // An election first selects the site with the most recent log
     // records as the new master. If multiple sites have the most
     // recent log records, the site with the highest priority value
     // is selected as master.
     cur_env.rep_set_priority(app_config->priority);
     // Set the policy that determines how master and client sites
     // handle acknowledgement of replication messages needed for
     // permanent records. The default policy of "quorum" requires only
     // a quorum of electable peers sufficient to ensure a permanent
     // record remains durable if an election is held. The "all" option
     // requires all clients to acknowledge a permanent replication
     // message instead.
     cur_env.repmgr_set_ack_policy(app_config->ack_policy);
     // Set the threshold for the minimum and maximum time the client
     // waits before requesting retransmission of a missing message.
     // Base these values on the performance and load characteristics
     // of the master and client host platforms as well as the round
     // trip message time.
     cur_env.rep_set_request(20000, 500000);
     // Configure deadlock detection to ensure that any deadlocks
     // are broken by having one of the conflicting lock requests
     // rejected. DB_LOCK_DEFAULT uses the lock policy specified
     // at environment creation time or DB_LOCK_RANDOM if none was
     // specified.
     cur_env.set_lk_detect(DB_LOCK_DEFAULT);
     // The following base replication features may also be useful to your
     // application. See Berkeley DB documentation for more details.
     // - Master leases: Provide stricter consistency for data reads
     // on a master site.
     // - Timeouts: Customize the amount of time Berkeley DB waits
     // for such things as an election to be concluded or a master
     // lease to be granted.
     // - Delayed client synchronization: Manage the master site's
     // resources by spreading out resource-intensive client
     // synchronizations.
     // - Blocked client operations: Return immediately with an error
     // instead of waiting indefinitely if a client operation is
     // blocked by an ongoing client synchronization.
     cur_env.repmgr_set_local_site(app_config->this_host.host,
          app_config->this_host.port, 0);
     for (REP_HOST_INFO *cur = app_config->other_hosts; cur != NULL;
          cur = cur->next) {
          cur_env.repmgr_add_remote_site(cur->host, cur->port,
               NULL, cur->peer ? DB_REPMGR_PEER : 0);
     }
     // Configure heartbeat timeouts so that repmgr monitors the
     // health of the TCP connection. Master sites broadcast a heartbeat
     // at the frequency specified by the DB_REP_HEARTBEAT_SEND timeout.
     // Client sites wait for message activity the length of the
     // DB_REP_HEARTBEAT_MONITOR timeout before concluding that the
     // connection to the master is lost. The DB_REP_HEARTBEAT_MONITOR
     // timeout should be longer than the DB_REP_HEARTBEAT_SEND timeout.
     cur_env.rep_set_timeout(DB_REP_HEARTBEAT_SEND, 5000000);
     cur_env.rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 10000000);
     // The following repmgr features may also be useful to your
     // application. See Berkeley DB documentation for more details.
     // - Two-site strict majority rule - In a two-site replication
     // group, require both sites to be available to elect a new
     // master.
     // - Timeouts - Customize the amount of time repmgr waits
     // for such things as waiting for acknowledgements or attempting
     // to reconnect to other sites.
     // - Site list - return a list of sites currently known to repmgr.
     // We can now open our environment, although we're not ready to
     // begin replicating. However, we want to have a dbenv around
     // so that we can send it into any of our message handlers.
     cur_env.set_cachesize(0, CACHESIZE, 0);
     cur_env.set_flags(DB_REP_PERMANENT, 1);
     //cur_env.set_flags(DB_TXN_WRITE_NOSYNC, 1);
/*   u_int32_t maxlocks=300000;
     if (maxlocks != 0)
          cur_env.set_lk_max_locks(maxlocks);
     u_int32_t maxlocks_o=300000;
     if (maxlocks_o != 0)
          cur_env.set_lk_max_objects(maxlocks_o);
     u_int32_t maxmutex=300000;
     if (maxmutex != 0)
          cur_env.mutex_set_max(maxmutex);
*/
     DbEnv          *m_env=&cur_env;
     m_env->set_flags(DB_TXN_NOSYNC, 1);
     m_env->set_lk_max_lockers(60000);
     m_env->set_lk_max_objects(60000);
     m_env->set_lk_max_locks(60000);
     m_env->set_tx_max(60000);
     //m_env->repmgr_set_ack_policy(DB_REPMGR_ACKS_NONE);
     m_env->rep_set_timeout(DB_REP_ACK_TIMEOUT, 50 * 1000); //50ms
     m_env->rep_set_timeout(DB_REP_CHECKPOINT_DELAY, 0);
     //m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 30 * 1000 * 1000); // 30 seconds
     m_env->rep_set_timeout(DB_REP_ELECTION_TIMEOUT, 1 * 1000 * 1000); // 1 second
     m_env->rep_set_timeout(DB_REP_FULL_ELECTION_TIMEOUT, 5 * 1000 * 1000); // 5 seconds
     m_env->rep_set_timeout(DB_REP_CONNECTION_RETRY, 5 * 1000 * 1000);
     //m_env->rep_set_timeout(DB_REP_ELECTION_RETRY, 10 * 1000 * 1000); //10 seconds
     //m_env->rep_set_timeout(DB_REP_HEARTBEAT_MONITOR, 80 * 1000 * 1000); //80 seconds
     //m_env->rep_set_timeout(DB_REP_HEARTBEAT_SEND, 500 * 1000); //500 milli seconds
     //The minimum number of microseconds a client waits before requesting retransmission
     u_int32_t rep_req_min = 40000; // 40,000 us = 40 ms
     //The maximum number of microseconds a client waits before requesting retransmission
     u_int32_t rep_req_max = 1280000; // 1,280,000 us = 1.28 s
     u_int32_t rep_limit_gbytes = 0;
     u_int32_t rep_limit_bytes = 100 * 1024 * 1024; // 100 MB
     // Overrides the rep_set_request(20000, 500000) call made earlier.
     m_env->rep_set_request(rep_req_min, rep_req_max);
     m_env->rep_set_limit(rep_limit_gbytes, rep_limit_bytes);
     cur_env.open(app_config->home, DB_CREATE | DB_RECOVER |
     DB_THREAD | DB_INIT_REP | DB_INIT_LOCK | DB_INIT_LOG |
     DB_INIT_MPOOL | DB_INIT_TXN , 0);
     //keep old function for chain
     //old_rep_process_message=cur_env.get_DB_ENV()->rep_process_message;
     //derouting
     //cur_env.get_DB_ENV()->rep_process_message=my_rep_process_message;
     /*int _i;
     cur_env.log_get_config(DB_LOG_DIRECT, &_i);printf ("DB_LOG_DIRECT = %d\n",_i);
     cur_env.log_get_config(DB_LOG_DSYNC, &_i);printf ("DB_LOG_DSYNC = %d\n",_i);
     cur_env.log_get_config(DB_LOG_AUTO_REMOVE, &_i);printf ("DB_LOG_AUTO_REMOVE = %d\n",_i);
     cur_env.log_get_config(DB_LOG_IN_MEMORY, &_i);printf ("DB_LOG_IN_MEMORY = %d\n",_i);
     cur_env.log_get_config(DB_LOG_ZERO,&_i);printf ("DB_LOG_ZERO = %d\n",_i);
     */
     // Start checkpoint and log archive support threads.
     (void)thread_create(&ckp_thr, NULL, checkpoint_thread, &cur_env);
     (void)thread_create(&lga_thr, NULL, log_archive_thread, &cur_env);
     (void)thread_create(&dmy_thr, NULL, dummy_write_thread, &cur_env);
     cur_env.repmgr_start(3, app_config->start_policy);
}
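The priority comment in init() describes the election rule: the site with the most recent log records wins, and priority only breaks ties. The sketch below models that comparison alone. It assumes a hypothetical SiteInfo struct that reduces each site's last LSN to a (file, offset) pair, and it treats priority 0 as unelectable. It ignores vote counting, nsites and election timeouts, so it illustrates the ordering rule, not Berkeley DB's election protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical summary of one site's state at election time.
struct SiteInfo {
    std::string name;
    uint32_t log_file;    // LSN file number of the site's last log record
    uint32_t log_offset;  // LSN offset within that file
    uint32_t priority;    // value given to rep_set_priority(); 0 = unelectable
};

// Returns the index of the site this rule would pick, or nullopt when no
// site is electable. A full tie on LSN and priority keeps the earlier site.
std::optional<std::size_t> pick_winner(const std::vector<SiteInfo> &sites) {
    std::optional<std::size_t> best;
    for (std::size_t i = 0; i < sites.size(); ++i) {
        const SiteInfo &s = sites[i];
        if (s.priority == 0)
            continue;  // a priority-0 site never becomes master
        if (!best) {
            best = i;
            continue;
        }
        const SiteInfo &b = sites[*best];
        // Most recent log position first; priority only breaks ties.
        if (std::tie(s.log_file, s.log_offset, s.priority) >
            std::tie(b.log_file, b.log_offset, b.priority))
            best = i;
    }
    return best;
}
```

In the two-site test here, a low-priority site that holds newer log records would still win under this ordering.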

int RepQuoteExample::terminate() {
    try {
        // Wait for the checkpoint, log archive and dummy write threads
        // to finish.
        // Windows does not allow NULL pointer for exit code variable.
        thread_exit_status_t exstat;
        (void)thread_join(lga_thr, &exstat);
        (void)thread_join(ckp_thr, &exstat);
        (void)thread_join(dmy_thr, &exstat);

        // We have used the DB_TXN_NOSYNC environment flag for
        // improved performance without the usual sacrifice of
        // transactional durability, as discussed in the
        // "Transactional guarantees" page of the Reference
        // Guide: if one replication site crashes, we can
        // expect the data to exist at another site. However,
        // in case we shut down all sites gracefully, we push
        // out the end of the log here so that the most
        // recent transactions don't mysteriously disappear.
        cur_env.log_flush(NULL);
        cur_env.close(0);
    } catch (DbException &dbe) {
        cout << "error closing environment: " << dbe.what() << endl;
    }
    return 0;
}
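init() calls rep_set_request() with (20000, 500000) and, in the tuning block near the end, with (40000, 1280000). When both calls run, the later one takes effect. The Berkeley DB documentation describes the two values as a doubling window. A client waits at least min microseconds before asking for a missing record, doubles the wait before each further request for the same record, and stops growing at max. The hypothetical request_schedule() helper below lists those waits, assuming that rule.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Wait (in microseconds) before each successive retransmission request,
// assuming the doubling rule: start at min_us, double each time, cap at max_us.
std::vector<uint32_t> request_schedule(uint32_t min_us, uint32_t max_us,
                                       int requests) {
    assert(min_us > 0 && min_us <= max_us && requests >= 0);
    std::vector<uint32_t> waits;
    uint64_t wait = min_us;  // 64-bit so doubling cannot overflow before the cap
    for (int i = 0; i < requests; ++i) {
        waits.push_back(static_cast<uint32_t>(wait));
        wait = (wait * 2 > max_us) ? max_us : wait * 2;
    }
    return waits;
}
```

With 40000/1280000 the wait reaches the 1.28 s cap on the sixth request. With 20000/500000 it is also capped on the sixth request, where 640 ms is clipped to 500 ms.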
void RepQuoteExample::prompt() {
    cout << "QUOTESERVER";
    if (!app_data.is_master)
        cout << "(read-only)";
    cout << "> " << flush;
}
void log(const char *msg) {
    // Get the current time; ctime() formats it with a trailing '\n'.
    time_t currentTime;
    time(&currentTime);
    char buff[255];
    strncpy(buff, ctime(&currentTime), sizeof(buff) - 1);
    buff[sizeof(buff) - 1] = '\0';
    // Strip the trailing newline without running past the buffer.
    char *p = strchr(buff, '\n');
    if (p != NULL)
        *p = '\0';
    cerr << buff << " - " << msg << endl;
}
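log() takes its timestamp from ctime(). That function returns a pointer to a shared static buffer and reports local time. When the /tmp/BDB.log files from the two hosts are compared side by side, a thread-safe UTC timestamp can make the event order easier to line up. The hypothetical format_utc_timestamp() below is one way to build one. It relies on POSIX gmtime_r(), which fills a caller-supplied struct, and on strftime().

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Formats t as "YYYY-MM-DD HH:MM:SS" in UTC. gmtime_r() (POSIX) writes into
// a caller-supplied struct, so concurrent callers do not share a static
// buffer the way ctime() callers do.
std::string format_utc_timestamp(std::time_t t) {
    std::tm tm_utc{};
    if (gmtime_r(&t, &tm_utc) == nullptr)
        return "invalid-time";
    char buf[32];
    std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm_utc);
    return std::string(buf, n);
}
```

log() could then print `format_utc_timestamp(time(NULL))` in place of the ctime() string.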
// Simple command-line user interface:
// - enter "<stock symbol> <price>" to insert or update a record in the
//   database;
// - just press Return (i.e., blank input line) to print the number of
//   records in the database;
// - "print" lists the records; "db1" and "db2" switch the current database;
// - "env" prints environment statistics; "verbose" toggles diagnostics;
// - "truncate" empties the current database;
// - "loop <start> <count>" inserts keys in batches of txn_rate records per
//   transaction; "delete <start> <count>" deletes them in batches of 100;
// - "txn_rate <n>" sets the loop batch size and "loop_rate <n>" throttles
//   the loop to roughly n records per second (0 = no throttling);
// - enter "quit" or "exit" to quit.
void RepQuoteExample::doloop() {
    DbHolder dbh1(&cur_env, DATABASE);
    DbHolder dbh2(&cur_env, DATABASE2);
    DbHolder *dbh = &dbh1;
    DbTxn *txn;
    string input;
    bool truncate = false;
    char *c;

    using_history();
    g_repquote = *dbh;
    int loop_rate = 0;
    int txn_rate = 500;

    while (prompt(), /*getline(cin, input)*/ (c = readline(NULL)) != NULL) {
        input = std::string(c);
        add_history(c);
        free(c);

        int start_loop = 0;
        int end_loop = 0;
        int start_loop_d = 0;
        int end_loop_d = 0;
        istringstream is(input);
        string token1, token2, token3;
        truncate = false;

        // Read 0, 1, 2 or 3 tokens from the input.
        int count = 0;
        if (is >> token1) {
            count++;
            if (is >> token2) {
                count++;
                if (is >> token3)
                    count++;
            }
        }

        if (count == 1) {
            if (token1 == "truncate") {
                truncate = true;
            } else if (token1 == "env") {
                print_env(&cur_env);
                continue;
            } else if (token1 == "verbose") {
                app_config->verbose = !app_config->verbose;
                int onoff = app_config->verbose ? 1 : 0;
                cur_env.set_verbose(DB_VERB_REPLICATION, onoff);
                cur_env.set_verbose(DB_VERB_RECOVERY, onoff);
                cur_env.set_verbose(DB_VERB_REP_ELECT, onoff);
                cur_env.set_verbose(DB_VERB_REP_LEASE, onoff);
                cur_env.set_verbose(DB_VERB_REP_SYNC, onoff);
                cur_env.set_verbose(DB_VERB_REPMGR_MISC, onoff);
                log(onoff ? "verbose is on" : "verbose is off");
                continue;
            } else if (token1 == "print") {
                print_stocks(*dbh);
                count = 0;
            } else if (token1 == "db1") {
                dbh = &dbh1;
                g_repquote = *dbh;
                log("switch to Db1");
                count = 0;
            } else if (token1 == "db2") {
                dbh = &dbh2;
                g_repquote = *dbh;
                log("switch to Db2");
                count = 0;
            } else if (token1 == "exit" || token1 == "quit") {
                app_data.app_finished = 1;
                break;
            } else {
                log("Format: <stock> <price>");
                continue;
            }
        } else if (count == 2) {
            if (token1 == "loop_rate") {
                loop_rate = atoi(token2.c_str());
                continue;
            }
            if (token1 == "txn_rate") {
                txn_rate = atoi(token2.c_str());
                if (txn_rate < 1)
                    txn_rate = 1;   // a batch size of 0 would never advance the loop
                continue;
            }
        } else if (count == 3) {
            if (token1 == "loop") {
                start_loop = atoi(token2.c_str());
                end_loop = start_loop + atoi(token3.c_str());
            }
            if (token1 == "delete") {
                start_loop_d = atoi(token2.c_str());
                end_loop_d = start_loop_d + atoi(token3.c_str());
            }
        }

        // Here count is 0 (print), 1 ("truncate"), 2 (<stock> <price>) or
        // 3 (loop/delete), so we're about to try a DB operation.
        //
        // Open database with DB_CREATE only if this is a master
        // database. A client database uses polling to attempt
        // to open the database without DB_CREATE until it is
        // successful.
        //
        // This DB_CREATE polling logic can be simplified under
        // some circumstances. For example, if the application can
        // be sure a database is already there, it would never need
        // to open it with DB_CREATE.
        if (!dbh->ensure_open(app_data.is_master))
            continue;

        try {
            if (count == 0) {
                if (app_data.in_client_sync)
                    log("Cannot read data during client initialization - please try again.");
                else
                    print_stocks_size(*dbh);
            } else if (!app_data.is_master) {
                log("Can't update at client");
            } else if (truncate) {
                // Empty the current database in one transaction.
                u_int32_t no_remove;
                txn = NULL;
                cur_env.txn_begin(NULL, &txn, DB_TXN_NOWAIT);
                try {
                    (*dbh)->truncate(txn, &no_remove, 0);
                } catch (DbException &e) {
                    std::cout << "Error on truncate: " << e.what() << std::endl;
                    (void)txn->abort();
                    txn = NULL;
                }
                // A DbTxn handle may not be used after commit(), whether or
                // not the commit succeeds, so there is no abort() on this path.
                if (txn != NULL) {
                    try {
                        txn->commit(0);
                    } catch (DbException &e) {
                        std::cout << "Error on txn commit: " << e.what() << std::endl;
                    }
                    txn = NULL;
                }
            } else if (start_loop) {
                // Insert keys start_loop..end_loop, txn_rate records per transaction.
                for (int i = start_loop; i <= end_loop; i += txn_rate) {
                    txn = NULL;
                    cur_env.txn_begin(NULL, &txn, 0);
                    try {
                        for (int j = i; j <= end_loop && j < i + txn_rate; j++) {
                            Dbt key, value;
                            std::string key1, value1;
                            std::stringstream sstrm;
                            sstrm << "key" << j << ends;
                            key1 = sstrm.str();
                            key.set_data((void *)key1.c_str());
                            key.set_size((u_int32_t)strlen(key1.c_str()));
                            sstrm.str("");
                            int payload = rand() + j;
                            sstrm << "price" << payload << ends;
                            value1 = sstrm.str();
                            value.set_data((void *)value1.c_str());
                            value.set_size((u_int32_t)strlen(value1.c_str()));
                            // Perform the database put.
                            (*dbh)->put(txn, &key, &value, 0);
                            // Terminates the process immediately after the first put.
                            printf("Kill me !!\n");
                            fflush(stdout);
                            kill(getpid(), SIGKILL);
                            exit(0);
                        }
                    } catch (DbException &) {
                        // Release the batch's locks before reporting the error.
                        (void)txn->abort();
                        throw;
                    }
                    try {
                        txn->commit(0);
                    } catch (DbException &e) {
                        std::cout << "Error on txn commit: " << e.what() << std::endl;
                    }
                    txn = NULL;
                    // Throttle to roughly loop_rate records per second.
                    if (loop_rate > 0)
                        usleep((useconds_t)(txn_rate * 1000000LL / loop_rate));
                }
            } else if (start_loop_d) {
                // Delete keys start_loop_d..end_loop_d, 100 records per transaction.
                for (int i = start_loop_d; i <= end_loop_d; i += 100) {
                    txn = NULL;
                    cur_env.txn_begin(NULL, &txn, 0);
                    try {
                        for (int j = i; j <= end_loop_d && j < i + 100; j++) {
                            Dbt key;
                            std::string key1;
                            std::stringstream sstrm;
                            sstrm << "key" << j << ends;
                            key1 = sstrm.str();
                            key.set_data((void *)key1.c_str());
                            key.set_size((u_int32_t)strlen(key1.c_str()));
                            // Perform the database delete.
                            (*dbh)->del(txn, &key, 0);
                        }
                    } catch (DbException &) {
                        (void)txn->abort();
                        throw;
                    }
                    try {
                        txn->commit(0);
                    } catch (DbException &e) {
                        std::cout << "Error on txn commit: " << e.what() << std::endl;
                    }
                    txn = NULL;
                }
            } else {
                const char *symbol = token1.c_str();
                StringDbt key(const_cast<char*>(symbol));
                const char *price = token2.c_str();
                StringDbt data(const_cast<char*>(price));
                (*dbh)->put(NULL, &key, &data, 0);
            }
        } catch (DbDeadlockException &) {
            log("please retry the operation");
            dbh->close();
        } catch (DbRepHandleDeadException &) {
            log("please retry the operation");
            dbh->close();
        } catch (DbException &e) {
            if (e.get_errno() == DB_REP_LOCKOUT) {
                log("please retry the operation");
                dbh->close();
            } else
                throw;
        }
    }
    dbh->close();
}
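The comment in doloop() says a master opens the database with DB_CREATE, while a client keeps trying to open it without DB_CREATE until the database has been replicated. doloop() retries on the next command whenever ensure_open() returns false. A standalone client could poll in a loop instead. DbHolder::ensure_open() is not shown in this post, so the sketch below abstracts the open call as a hypothetical callback that takes a "create" flag and returns true once the handle is open.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical signature: try_open(create) returns true once the handle is open.
using OpenFn = std::function<bool(bool create)>;

// A master opens once with create = true. A client never creates the
// database: it retries without create until the master's copy has been
// replicated, or until max_attempts is used up.
bool open_with_polling(const OpenFn &try_open, bool is_master,
                       int max_attempts, std::chrono::milliseconds delay) {
    assert(max_attempts > 0);
    if (is_master)
        return try_open(true);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_open(false))
            return true;
        if (attempt < max_attempts)
            std::this_thread::sleep_for(delay);
    }
    return false;
}
```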
void RepQuoteExample::event_callback(DbEnv *dbenv, u_int32_t which, void *info)
{
    static char buf[256];
    APP_DATA *app = (APP_DATA *)dbenv->get_app_private();

    info = NULL;    /* Currently unused. */

    switch (which) {
    case DB_EVENT_REP_CLIENT:
        // This site is now a client. Its data may not match the master's
        // until DB_EVENT_REP_STARTUPDONE arrives.
        app->is_master = 0;
        app->in_client_sync = 1;
        snprintf(buf, sizeof(buf), "%s - %s", progname, "CLIENT");
        //EZ->dbenv->set_errpfx(buf);
        log("DB_EVENT_REP_CLIENT.");
        break;
    case DB_EVENT_REP_MASTER:
        app->is_master = 1;
        app->in_client_sync = 0;
        snprintf(buf, sizeof(buf), "%s - %s", progname, "MASTER");
        //EZ->dbenv->set_errpfx(buf);
        log("DB_EVENT_REP_MASTER.");
        break;
    case DB_EVENT_REP_NEWMASTER:
        log("DB_EVENT_REP_NEWMASTER.");
        app->in_client_sync = 1;
        break;
    case DB_EVENT_REP_PERM_FAILED:
        // Did not get enough acks to guarantee transaction
        // durability based on the configured ack policy. This
        // transaction will be flushed to the master site's
        // local disk storage for durability.
        log("DB_EVENT_REP_PERM_FAILED.");
        log("Insufficient acknowledgements to guarantee transaction durability.");
        break;
    case DB_EVENT_REP_STARTUPDONE:
        // The client has finished its start-up synchronization with the master.
        app->in_client_sync = 0;
        log("DB_EVENT_REP_STARTUPDONE.");
        break;
    case DB_EVENT_REP_ELECTION_FAILED:
        log("DB_EVENT_REP_ELECTION_FAILED.");
        //g_runner->init(g_config);
        printf("Kill me !!\n");
        fflush(stdout);
        // SIGKILL rather than -9: kill() rejects a negative signal number
        // with EINVAL, so kill(getpid(), -9) never delivered a signal.
        kill(getpid(), SIGKILL);
        exit(0);
        break;
    case DB_EVENT_REP_DUPMASTER:
        log("DB_EVENT_REP_DUPMASTER.");
        break;
    default:
        dbenv->errx("ignoring event %d", which);
    }
}
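The reply above recommends that a demoted master wait for DB_EVENT_REP_STARTUPDONE before trusting its reads. event_callback() records this in in_client_sync, and doloop() rejects reads while the flag is set. A reader that would rather block than be rejected could wait on a condition variable. In the sketch below, RepEvent is a hypothetical stand-in for the DB_EVENT_REP_* constants. In the real program the callback would translate each constant and call on_event(). The state changes mirror the ones event_callback() makes.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-ins for DB_EVENT_REP_CLIENT, DB_EVENT_REP_NEWMASTER,
// DB_EVENT_REP_STARTUPDONE and DB_EVENT_REP_MASTER.
enum class RepEvent { Client, NewMaster, StartupDone, Master };

class ClientSyncGate {
public:
    // Called from the replication event callback.
    void on_event(RepEvent ev) {
        std::lock_guard<std::mutex> lock(mu_);
        switch (ev) {
        case RepEvent::Client:
        case RepEvent::NewMaster:
            in_sync_ = true;   // data may lag the master until start-up is done
            break;
        case RepEvent::StartupDone:
        case RepEvent::Master:
            in_sync_ = false;
            break;
        }
        if (!in_sync_)
            cv_.notify_all();
    }

    // Blocks until the site is no longer synchronizing or the timeout
    // expires. Returns true if reads are considered safe.
    bool wait_until_readable(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        return cv_.wait_for(lock, timeout, [this] { return !in_sync_; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool in_sync_ = false;
};
```

The timeout matters. As the reply points out, when neither site is writing, nothing may reveal the duplicate master for a long time, so a reader should not block indefinitely.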
void RepQuoteExample::print_stocks_size(Db *dbp) {
    DB_BTREE_STAT *statp;
    dbp->stat(NULL, &statp, 0);
    log("db_stat");
    cout << "***************************************** >>>>>>>>>>> : database contains "
         << (u_long)statp->bt_ndata << " records\n";
    // The statistics structure is allocated by Berkeley DB; the caller frees it.
    free(statp);
}
void RepQuoteExample::print_env(DbEnv *dbenv) {
    dbenv->stat_print(DB_STAT_ALL);
}
void RepQuoteExample::print_stocks(Db *dbp) {
    StringDbt key, data;
#define MAXKEYSIZE  10
#define MAXDATASIZE 20
    char keybuf[MAXKEYSIZE + 1], databuf[MAXDATASIZE + 1];
    char *kbuf, *dbuf;

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    kbuf = keybuf;
    dbuf = databuf;

    DbcAuto dbc(dbp, 0, 0);
    cout << "\tSymbol\tPrice" << endl
         << "\t======\t=====" << endl;

    int no_records = 0;
    for (int ret = dbc->get(&key, &data, DB_FIRST);
        ret == 0;
        ret = dbc->get(&key, &data, DB_NEXT)) {
        key.get_string(&kbuf, MAXKEYSIZE);
        data.get_string(&dbuf, MAXDATASIZE);
        no_records++;
        cout << "\t" << keybuf << "\t" << databuf << endl;
    }
    cout << "********************** NO Records " << no_records << endl;
    cout << endl << flush;
    dbc.close();
}
static void usage() {
    cerr << "usage: " << progname << " -h home -l host:port [-CEM]"
         << "[-r host:port][-R host:port]" << endl
         << " [-a all|quorum][-b][-n nsites][-p priority][-v][-w]" << endl;
    cerr << "\t -h home (required; h stands for home directory)" << endl
         << "\t -l host:port (required; l stands for local)" << endl
         << "\t -C, -E or -M (optional; start up as client, start up and "
         << "hold an election, or start up as master)" << endl
         << "\t -r host:port (optional; r stands for remote; any "
         << "number of these" << endl
         << "\t may be specified)" << endl
         << "\t -R host:port (optional; R stands for remote peer; only "
         << "one of" << endl
         << "\t these may be specified)" << endl
         << "\t -a all|quorum (optional; a stands for ack policy)" << endl
         << "\t -b (optional; b stands for bulk)" << endl
         << "\t -n nsites (optional; number of sites in replication "
         << "group; defaults " << endl
         << "\t     to 0 to try to dynamically compute nsites)" << endl
         << "\t -p priority (optional; defaults to 100)" << endl
         << "\t -v (optional; v stands for verbose)" << endl
         << "\t -w (optional; the master calls rep_flush() every 4 s "
         << "instead of writing a dummy record every second)" << endl;
    exit(EXIT_FAILURE);
}
int main(int argc, char **argv) {
    RepConfigInfo config;
    int ch;     // getopt() returns int; a char breaks the EOF test where char is unsigned
    char *portstr, *tmphost;
    int tmpport;
    bool tmppeer;

    config.no_dummy_wr = false;

    // Extract the command line parameters. -E takes no argument.
    while ((ch = getopt(argc, argv, "Ea:bCh:l:Mn:p:R:r:vw")) != EOF) {
        tmppeer = false;
        switch (ch) {
        case 'a':
            if (strncmp(optarg, "all", 3) == 0)
                config.ack_policy = DB_REPMGR_ACKS_ALL;
            else if (strncmp(optarg, "quorum", 6) != 0)
                usage();
            break;
        case 'b':
            config.bulk = true;
            break;
        case 'C':
            config.start_policy = DB_REP_CLIENT;
            break;
        case 'E':
            config.start_policy = DB_REP_ELECTION;
            break;
        case 'h':
            config.home = optarg;
            break;
        case 'l':
            config.this_host.host = strtok(optarg, ":");
            if ((portstr = strtok(NULL, ":")) == NULL) {
                cerr << "Bad host specification." << endl;
                usage();
            }
            config.this_host.port = (unsigned short)atoi(portstr);
            config.got_listen_address = true;
            break;
        case 'M':
            config.start_policy = DB_REP_MASTER;
            break;
        case 'n':
            config.totalsites = atoi(optarg);
            break;
        case 'p':
            config.priority = atoi(optarg);
            break;
        case 'R':
            tmppeer = true; // FALLTHROUGH
        case 'r':
            tmphost = strtok(optarg, ":");
            if ((portstr = strtok(NULL, ":")) == NULL) {
                cerr << "Bad host specification." << endl;
                usage();
            }
            tmpport = (unsigned short)atoi(portstr);
            config.addOtherHost(tmphost, tmpport, tmppeer);
            break;
        case 'v':
            config.verbose = true;
            break;
        case 'w':
            config.no_dummy_wr = true;
            //config.priority = 2;
            break;
        case '?':
        default:
            usage();
        }
    }

    // Error check command line.
    if ((!config.got_listen_address) || config.home == NULL)
        usage();

    RepQuoteExample runner;
    g_runner = &runner;
    g_config = &config;
    try {
        runner.init(&config);
        runner.doloop();
    } catch (DbException &dbe) {
        cerr << "Caught an exception during initialization or"
             << " processing: " << dbe.what() << endl;
    }
    runner.terminate();
    return 0;
}
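main() splits the -l, -r and -R arguments with strtok(). That function modifies optarg in place and keeps hidden static state, and atoi() silently turns a malformed port into 0. The hypothetical parse_host_port() below is a stricter alternative. It splits at the last ':' and rejects a missing host or port, a non-numeric port, and ports outside 1..65535. IPv6 literals are out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct HostPort {
    std::string host;
    uint16_t port;
};

// Parses "host:port". Returns nullopt for a missing host, a missing or
// non-numeric port, or a port outside 1..65535.
std::optional<HostPort> parse_host_port(const std::string &spec) {
    std::size_t colon = spec.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == spec.size())
        return std::nullopt;
    unsigned long port = 0;
    for (std::size_t i = colon + 1; i < spec.size(); ++i) {
        char c = spec[i];
        if (c < '0' || c > '9')
            return std::nullopt;
        port = port * 10 + static_cast<unsigned long>(c - '0');
        if (port > 65535)
            return std::nullopt;  // also stops the accumulator from overflowing
    }
    if (port == 0)
        return std::nullopt;
    return HostPort{spec.substr(0, colon), static_cast<uint16_t>(port)};
}
```

On failure the caller would still print "Bad host specification." and call usage(), as main() does now.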
// This is a very simple thread that performs checkpoints at a fixed
// time interval. For a master site, the time interval is one minute
// plus the duration of the checkpoint_delay timeout (30 seconds by
// default). For a client site, the time interval is one minute.
void *checkpoint_thread(void *args)
{
    DbEnv *env;
    APP_DATA *app;
    int i, ret;

    env = (DbEnv *)args;
    app = (APP_DATA *)env->get_app_private();

    for (;;) {
        // Wait for one minute, polling once per second to see if the
        // application has finished. When it has finished, terminate
        // this thread.
        for (i = 0; i < 60; i++) {
            sleep(1);
            if (app->app_finished == 1)
                return ((void *)EXIT_SUCCESS);
        }

        // Perform a checkpoint (call unchanged from the original example).
        // The commented-out variant forces a checkpoint record even when
        // there has been no activity since the last one.
        //if ((ret = env->txn_checkpoint(0, 0, DB_FORCE)) != 0) {
        if ((ret = env->txn_checkpoint(0, 0, 0)) != 0) {
            env->err(ret, "Could not perform checkpoint.\n");
            return ((void *)EXIT_FAILURE);
        }
    }
}
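checkpoint_thread() and log_archive_thread() wait a minute by sleeping one second at a time and polling app_finished. Shutdown can therefore lag by up to a second, and app_finished is a plain int shared between threads. With a condition variable, a stop request wakes the worker immediately. The sketch below uses std::thread and a hypothetical PeriodicTask class. In this program the task body would be the txn_checkpoint() call. This is a swapped-in technique, not how the Berkeley DB example is written.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs task() every `interval` until stop() is called. stop() wakes the
// worker at once instead of waiting for the next poll.
class PeriodicTask {
public:
    PeriodicTask(std::chrono::milliseconds interval, std::function<void()> task)
        : interval_(interval), task_(std::move(task)),
          worker_([this] { run(); }) {}

    ~PeriodicTask() { stop(); }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable())
            worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            // wait_for returns true if stopping_ became true before the timeout.
            if (cv_.wait_for(lock, interval_, [this] { return stopping_; }))
                return;
            lock.unlock();
            task_();  // e.g. the txn_checkpoint() call in checkpoint_thread()
            lock.lock();
        }
    }

    std::chrono::milliseconds interval_;
    std::function<void()> task_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so every member above exists before it starts
};
```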
// This is a simple log archive thread. Once per minute, it removes all but
// the most recent 3 logs that are safe to remove according to a call to
// DBENV->log_archive().
// Log cleanup is needed to conserve disk space, but aggressive log cleanup
// can cause more frequent client initializations if a client lags too far
// behind the current master. This can happen in the event of a slow client,
// a network partition, or a new master that has not kept as many logs as the
// previous master.
// The approach in this routine balances the need to mitigate against a
// lagging client by keeping a few more of the most recent unneeded logs
// with the need to conserve disk space by regularly cleaning up log files.
// Use of automatic log removal (DBENV->log_set_config() DB_LOG_AUTO_REMOVE
// flag) is not recommended for replication due to the risk of frequent
// client initializations.
void *log_archive_thread(void *args)
{
    DbEnv *env;
    APP_DATA *app;
    char **begin, **list;
    int i, listlen, logs_to_keep, minlog, ret;

    env = (DbEnv *)args;
    app = (APP_DATA *)env->get_app_private();
    logs_to_keep = 3;

    for (;;) {
        // Wait for one minute, polling once per second to see if the
        // application has finished. When it has finished, terminate
        // this thread.
        for (i = 0; i < 60; i++) {
            sleep(1);
            if (app->app_finished == 1)
                return ((void *)EXIT_SUCCESS);
        }

        // Get the list of unneeded log files.
        if ((ret = env->log_archive(&list, DB_ARCH_ABS)) != 0) {
            env->err(ret, "Could not get log archive list.");
            return ((void *)EXIT_FAILURE);
        }
        if (list != NULL) {
            listlen = 0;
            // Get the number of logs in the list.
            for (begin = list; *begin != NULL; begin++, listlen++);
            // Remove all but the logs_to_keep most recent
            // unneeded log files.
            minlog = listlen - logs_to_keep;
            for (begin = list, i = 0; i < minlog; list++, i++) {
                if (unlink(*list) != 0) {
                    // unlink() returns -1 and reports the cause in errno.
                    env->err(errno, "logclean: remove %s", *list);
                    env->errx("logclean: Error remove %s", *list);
                    free(begin);
                    return ((void *)EXIT_FAILURE);
                }
            }
            free(begin);
        }
    }
}
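log_archive_thread() keeps the logs_to_keep newest entries of the DB_ARCH_ABS list and unlinks the rest. The selection step is easier to check on its own. The hypothetical logs_to_remove() below is a pure function. Like the thread, it assumes the list is ordered oldest first.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Given removable log files ordered oldest first, returns the ones to delete
// so that the newest `keep` removable files stay on disk.
std::vector<std::string> logs_to_remove(const std::vector<std::string> &removable,
                                        std::size_t keep) {
    if (removable.size() <= keep)
        return {};
    return std::vector<std::string>(
        removable.begin(),
        removable.end() - static_cast<std::ptrdiff_t>(keep));
}
```

Keeping a few removable files, as the thread's comment explains, trades some disk space for fewer full client initializations after a partition like the one in this test.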
#define DATABASE_DUMMY "dummy.db"

void create_dummy_db(DB_ENV *env, DB **dbp)
{
    DB_ENV *dbenv = env;
    int ret;
    u_int32_t db_flags;

    if ((ret = db_create(dbp, dbenv, 0)) != 0) {
        dbenv->err(dbenv, ret, "create_dummy_db: db_create");
        *dbp = NULL;
        return;
    }
    db_flags = DB_AUTO_COMMIT | DB_CREATE;
    // A NULL file name with a database name opens an in-memory named database.
    //if ((ret = (*dbp)->open(*dbp, NULL, DATABASE, NULL, DB_BTREE, db_flags, 0)) != 0)
    if ((ret = (*dbp)->open(*dbp, NULL, NULL, DATABASE_DUMMY, DB_BTREE, db_flags, 0)) != 0) {
        dbenv->err(dbenv, ret, "create_dummy_db: DB->open");
        // A handle whose open failed must still be closed.
        (void)(*dbp)->close(*dbp, 0);
        *dbp = NULL;
    }
}
void reopen_dummy_db(DB_ENV *env, DB **dbp)
{
    // A handle that returned DB_REP_HANDLE_DEAD must be closed before a
    // new one is opened; otherwise the old handle leaks.
    if (*dbp != NULL) {
        (void)(*dbp)->close(*dbp, 0);
        *dbp = NULL;
    }
    create_dummy_db(env, dbp);
}
void perform_db_operation(DB_ENV *env, DB **dbp, bool bRead)
{
    //main loop
    //DB *dbp=NULL;
    DB_ENV *dbenv = env;
    int ret;
    DBT key, data;
    char buf[20] = "dummy", *rbuf;

    rbuf = buf;
    if (*dbp == NULL)
        create_dummy_db(dbenv, dbp);
    if (*dbp == NULL)
        return;     // creation failed; the error has already been reported

    if (!bRead) {
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = buf;
        key.size = (u_int32_t)strlen(buf);
        data.data = rbuf;
        data.size = (u_int32_t)strlen(rbuf);
        if ((ret = (*dbp)->put(*dbp, NULL, &key, &data, 0)) != 0) {
            if (ret == DB_REP_HANDLE_DEAD) {
                //create_dummy_db(dbenv, dbp);
                reopen_dummy_db(dbenv, dbp);
                // *dbp may be NULL if the reopen failed, so report via the env.
                dbenv->err(dbenv, ret, "DB->put :");
            } else if (ret != DB_KEYEXIST)
                (*dbp)->err(*dbp, ret, "perform_db_operation: DB->put");
        }
    } else {
        DB_BTREE_STAT *statp;
        if ((ret = (*dbp)->stat(*dbp, NULL, &statp, 0)) != 0) {
            (*dbp)->err(*dbp, ret, "perform_db_operation: DB->stat");
            return;
        }
        std::cout << "dbp read stats: key#" << statp->bt_nkeys << std::endl;
        free(statp);
    }
}
void *dummy_write_thread(void *args)
{
    DbEnv *env;
    APP_DATA *app;
    DB *m_dbp = NULL;   // must start as NULL: perform_db_operation() creates it on first use

    env = (DbEnv *)args;
    app = (APP_DATA *)env->get_app_private();

    for (;;) {
        // Exit when the application finishes, so terminate() can join this thread.
        if (app->app_finished == 1) {
            if (m_dbp != NULL)
                (void)m_dbp->close(m_dbp, 0);
            return ((void *)EXIT_SUCCESS);
        }
        if (!app->no_dummy_wr) {
            if (app->is_master)
                perform_db_operation(env->get_DB_ENV(), &m_dbp, false);
            //env->txn_checkpoint(0, 0, DB_FORCE);
            usleep(1 * 1000 * 1000);
        } else {
            if (app->is_master) {
                //DB *db_quote=g_repquote->get_DB();
                //perform_db_operation(env->get_DB_ENV(),&db_quote,true);
                //if (g_repquote)
                //     g_runner->print_stocks_size(g_repquote);
                //env->txn_checkpoint(0, 0, DB_FORCE);
                //perform_db_operation(env->get_DB_ENV(),&m_dbp,false);
                env->rep_flush();
            }
            usleep(4 * 1000 * 1000);
        }
    }
}
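init() sets DB_REP_HEARTBEAT_SEND to 5 s and DB_REP_HEARTBEAT_MONITOR to 10 s. Its comment says a client waits for message activity for the monitor timeout before deciding that the connection to the master is lost. The hypothetical helpers below follow that comment: they check the two values and apply the client-side rule. They assume, as the comment suggests, that any message from the master counts as activity, which would include the records sent by dummy_write_thread(). The model treats a send value of 0 as invalid.

```cpp
#include <cassert>
#include <cstdint>

// Timeouts in microseconds, as passed to rep_set_timeout().
struct HeartbeatConfig {
    uint32_t send_us;     // DB_REP_HEARTBEAT_SEND (master side)
    uint32_t monitor_us;  // DB_REP_HEARTBEAT_MONITOR (client side)
};

// The monitor window must be longer than the send interval; otherwise a
// client could give up on a healthy master between two heartbeats.
bool heartbeat_config_ok(const HeartbeatConfig &c) {
    return c.send_us > 0 && c.monitor_us > c.send_us;
}

// Client-side rule: silence longer than monitor_us since the last message
// from the master means the connection is presumed lost.
bool master_presumed_lost(const HeartbeatConfig &c, uint64_t now_us,
                          uint64_t last_msg_us) {
    assert(now_us >= last_msg_us);
    return now_us - last_msg_us > c.monitor_us;
}
```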
My script to simulate the split brain (test_split_brain.sh):
#!/bin/bash
# Uses bash features (local, $RANDOM, (( )) arithmetic).
#
# Repeatedly partitions the two sites by setting interface 41 on 10.10.0.100
# down (ifAdminStatus 2), waits until $node reports MASTER, heals the
# partition (ifAdminStatus 1), and checks that $node returns to CLIENT
# within about 60 seconds.

[ -z "$node1" ] && node1=10.10.32.121
[ -z "$node2" ] && node2=10.10.32.91

trap myend 0 1 2 3 6 9 14 15

myend()
{
    echo "Received signal to stop test..."
    un_split_brain
    echo "done"
    exit 1
}

split_brain()
{
    echo -n "Split-Brain at node $node..."
    snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 2 >/dev/null 2>&1
    echo "done"
}

un_split_brain()
{
    echo -n "Undo Split-Brain at node $node..."
    snmpset -m ALL -v 2c -c svil 10.10.0.100 ifAdminStatus.41 i 1 >/dev/null 2>&1
    echo "done"
}

# is_slave and is_master return 1 when the role matches (the reverse of the
# usual shell convention); callers test for $? -eq 1.
is_slave()
{
    local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c CLIENT)
    [ $r -gt 1 ] && ret=1 || ret=0
    return $ret
}

is_master()
{
    local r=$(ssh root@$1 "tail -2 /tmp/BDB.log" | grep -c MASTER)
    [ $r -gt 1 ] && ret=1 || ret=0
    return $ret
}

wait_for_master()
{
    echo -n "Waiting for MASTER at node $node ... "
    is_master $node
    r=$?
    while ( [ ! $r -eq 1 ] )
    do
        usleep 500000
        is_master $node
        r=$?
        echo -n "."
    done
    echo "done"
}

wait_for_slave()
{
    local r
    local tm
    tm=0
    echo -n "Waiting for SLAVE at node $node ... "
    is_slave $node
    r=$?
    while ( [ ! $r -eq 1 ] )
    do
        usleep 500000
        is_slave $node
        r=$?
        echo -n "."
        tm=$((tm+1))
        [ $tm -gt 120 ] && break
    done
    [ $tm -gt 120 ] && ret=0 || ret=1
    echo "done"
    return $ret
}

run_test_split_brain()
{
    local nt
    nt=1
    nfails=0
    x=4
    [ -z "$1" ] && node=$node2
    while ((1))
    do
        printf "*************** TEST [%02d] ********************\n" $nt
        split_brain
        wait_for_master
        x=$((RANDOM%9))
        echo -n " waiting $x sec ..."
        sleep $x
        echo "done"
        un_split_brain
        wait_for_slave
        r=$?
        [ ! $r -eq 1 ] && echo "`date` - test [$nt] - fails ..." || echo "`date` - test [$nt] - OK ."
        [ ! $r -eq 1 ] && nfails=$((nfails+1))
        perc_success=$(echo "100.0 - $nfails / $nt * 100.0" | bc -l)
        echo "************************************************ [% Success test $perc_success % ]"
        nt=$((nt+1))
        x=$((RANDOM%9))
        echo -n " waiting $x sec ..."
        sleep $x
    done
}

run_test_split_brain
Here is the makefile I use to run the two environments. I run:
- make run
- and, in another window, sh test_split_brain.sh
node1?=10.10.32.121
node2?=10.10.32.91
nsite?=2
debug?=0

all: RepQuoteExampleEric install

RepConfigInfo.o: RepConfigInfo.cpp RepConfigInfo.h
	g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 -c RepConfigInfo.cpp -o RepConfigInfo.o

RepQuoteExampleEric: RepQuoteExampleEric.cpp RepConfigInfo.o
	g++ -I/usr/local/BerkeleyDB.5.1/include/ -g -O0 RepQuoteExampleEric.cpp RepConfigInfo.o -o RepQuoteExampleEric -L /usr/local/BerkeleyDB.5.1/lib/ -lreadline -lcurses -ldb_cxx

kill:
	-ssh -X root@$(node1) "killall -9 /root/RepQuoteExampleEric"
	-ssh -X root@$(node2) "killall -9 /root/RepQuoteExampleEric"

run: RepQuoteExampleEric kill install clean_env
	ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v | tee /tmp/BDB.log\"" &
	ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &

run_node2: clean_env2
	ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &

debug_node2: clean_env2
	ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v -w | tee /tmp/BDB.log\"" &
	sleep 3
	ssh -X root@$(node2) /sbin/pidof RepQuoteExampleEric >/tmp/pid
	ssh -X root@$(node2) ~/kdbg /root/db-5.1.19/examples/cxx/excxx_repquote/RepQuoteExampleEric -p `cat /tmp/pid`

run_debug_node1: RepQuoteExampleEric kill install clean_env
	ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\" " &
	ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.210:12345 -r 2.0.0.110:12345 -a quorum -b -n $(nsite) -v\"" &
run_debug_node2: RepQuoteExampleEric kill install clean_env
     ssh -X root@$(node1) "xterm -geom 100x20+100+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/RepQuoteExampleEric -h /opt/bdb/ -l 2.0.0.110:12345 -r 2.0.0.210:12345 -a quorum -b -n $(nsite) -v\" " &
     ssh -X root@$(node2) "xterm -geom 100x20+800+100 -e \"LD_LIBRARY_PATH=/usr/local/BerkeleyDB.5.1/lib/ /root/kdbg /root/RepQuoteExampleEric\"" &
install: RepQuoteExampleEric
     scp RepQuoteExampleEric root@$(node1):~
     scp RepQuoteExampleEric root@$(node2):~
clean_env: clean_env1 clean_env2
clean_env1:
     ssh -X root@$(node1) rm -rf /opt/bdb/*
clean_env2:
     ssh -X root@$(node2) rm -rf /opt/bdb/*

Split brain syndrome in RAC

As I understand split-brain handling in Oracle RAC, in case of interconnect failures the master node will evict the other (dead) nodes.
Let's say that in a 2-node RAC configuration node 1 is defined as the master node (by some parameter such as load). In case of network failure, node 1 will terminate node 2 from the cluster.
What happens if the master node (node 1 in this case) fails? Which node will terminate node 1, and will node 2 become the master node?

Hi,
It occurs when the instances in a RAC fail to ping/connect to each other via the private interconnect, even though all the servers are physically up and running and the database instance on each of these servers is also running. These individual nodes are running fine and could, in principle, accept user connections and work independently. Because of the lack of communication, each instance thinks that the instance it cannot reach is down and that it needs to do something about the situation. The problem is that if we leave these instances running, the same block might be read and updated in each of them, and there would be a data integrity issue: a block changed in one instance will not be locked and could be overwritten by another instance. Oracle has implemented an efficient check for split-brain syndrome.
In RAC, if a node becomes inactive, or if the other nodes are unable to ping/connect to it, the node that first detects that it is not accessible will evict it from the RAC group. For example, if there are 4 nodes in a RAC cluster and node 3 becomes unavailable, node 1 tries to connect to node 3, finds it not responding, and evicts node 3 from the RAC group, leaving Node1, Node2 and Node4 to continue functioning.
Split-brain handling can become more complicated in large RAC setups. For example, suppose there are 10 RAC nodes in a cluster and 4 of them are not able to communicate with the other 6. Two groups are then formed in this 10-node cluster (one of 4 nodes and one of 6 nodes). The nodes quickly try to affirm their membership by locking the controlfile, and the node that locks the controlfile then checks the votes of the other nodes. The group with the larger number of active nodes gets preference and the others are evicted. That said, I have only seen node evictions in which a single node was evicted while the rest functioned fine, so I cannot confirm from experience that this is exactly how it works; this is the theory behind it.
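The voting rule described above can be modelled in a few lines. This is a simplified, hypothetical sketch, not Oracle Clusterware's code: it takes the subclusters that formed after the interconnect failure and keeps the largest one. The tie-break (the group containing the lowest node number survives) is an assumption added so that an even split, such as the 2-node case in the question, has a deterministic answer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model of how a partitioned cluster picks its survivors.
 * Not Oracle Clusterware's implementation; the tie-break rule is an assumption.
 */
public final class SplitBrainResolver {
    private SplitBrainResolver() {}

    /**
     * Returns the subcluster that keeps running; nodes in every other group are evicted.
     * The group with the most nodes wins. On a tie (assumption), the group that
     * contains the lowest node number wins.
     */
    public static Set<Integer> survivors(List<Set<Integer>> subclusters) {
        Set<Integer> best = Collections.emptySet();
        for (Set<Integer> group : subclusters) {
            if (group.isEmpty()) {
                continue;
            }
            boolean larger = group.size() > best.size();
            boolean tieWithLowerNode = group.size() == best.size()
                    && Collections.min(group) < Collections.min(best);
            if (larger || tieWithLowerNode) {
                best = group;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The 10-node example: nodes 1-4 lose contact with nodes 5-10.
        Set<Integer> four = new TreeSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> six = new TreeSet<>(Arrays.asList(5, 6, 7, 8, 9, 10));
        System.out.println("Survivors: " + survivors(Arrays.asList(four, six)));
        // prints "Survivors: [5, 6, 7, 8, 9, 10]"

        // A 2-node cluster whose interconnect fails while both nodes stay up.
        Set<Integer> node1 = new TreeSet<>(Arrays.asList(1));
        Set<Integer> node2 = new TreeSet<>(Arrays.asList(2));
        System.out.println("Survivors: " + survivors(Arrays.asList(node2, node1)));
        // prints "Survivors: [1]"
    }
}
```

In the 2-node question above, if node 1 has genuinely failed, node 2 is the only group left and keeps running; in this model the tie-break only matters when both nodes are alive but cannot see each other.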
When a node is evicted, Oracle RAC will usually reboot it and try to do a cluster reconfiguration to include the evicted node again.
You will see Oracle error ORA-29740 when there is a node eviction in RAC. There are many reasons for a node eviction, such as a heartbeat not being received by the controlfile or being unable to communicate with the clusterware.
You can also go through Metalink Note ID 219361.1.

Split Brain Scenario

Hello,
Whilst testing Data Guard with FSFO, I seem to have managed to achieve a split-brain situation in which 2 databases in the Data Guard configuration were both considered primary databases and both were available to receive client connections. I don't understand why this happened, and I'm looking for some assurance that this is not a bug in Oracle.
My Data Guard Configuration is as follows:-
1 primary database (DBa)
3 physical standby databases (DBb, DBc, DBd)
All databases are single instance, i.e. no RAC, and are running Oracle 11g (11.1.0.7) on RHEL 5.3.
DBb is the fast-start failover target for DBa. The FSFO observer process is running on a stand-alone server called OBS1.
To simulate a 'data centre disaster' I did the following:
1) Killed the SMON processes on the servers running DBa and DBb (note that I did not kill the observer process).
2) From DGMGRL on the server running DBc, issued the following commands:
DGMGRL> disable fast_start failover force (without this, I could not issue the subsequent failover command)
DGMGRL> failover to DBc ;
This worked as expected, and DBc was established as the new primary database in the configuration. DBd continued to function correctly as a standby database. Subsequent client connections were routed to DBc as expected.
3) I then attempted to simulate the two failed databases DBa and DBb rejoining the configuration. Firstly I put DBa into MOUNT status using the STARTUP MOUNT command from the SQLPLUS command line.
4) Before I did anything with DBb, the observer process that was still running on OBS1 detected that DBa was 'active' again and OPENed the database. In doing this it took no notice of the fact that DBc was already open and acting as the primary database in the configuration. The result was that two databases in the configuration, DBa and DBc, were in an OPEN state and acting as primary databases, i.e. split brain. The TNSNAMES.ora configuration on the Oracle machines meant that client connections could now be spread over both of these machines.
I am very concerned about why Oracle allowed this to happen. Was my test unreasonable, or should Oracle have detected that DBc was the new primary database after I attempted to restart DBa in MOUNT state? I now understand that if I had also issued the STOP OBSERVER command in DGMGRL after issuing the FAILOVER to DBc command, the FSFO observer could not have OPENed DBa once DBc had become the primary database. Is this the only mechanism that must be used in the above scenario to prevent split brain?
Any advice would be greatly appreciated.
Thanks,
Shaun

Your environment is fairly complex, with three standbys, and I have not worked in a similar one. My recommendation would be to open an SR on Metalink, and please let us know what they tell you, as we may very well find ourselves with multiple standbys in the future.

Some quick help needed with certificates and split brain dns.

I run Exchange 2010 with one CAS server (srv03). I have split-brain DNS configured and working. I got a new certificate this year because of the new regulations that don't allow .internal names in the SAN portion of an SSL certificate.
I followed several articles on the internet, but when I tried to implement it today the Outlook clients started getting a popup saying [the name on the certificate is invalid or does not match the name of the site]. At the top of this popup is srv03.abccorp.internal, which is what it was before.
The certificate is for mail.abccorp.com and also includes autodiscover.abccorp.com and srv03.abccorp.com.
When I run [Get-ClientAccessServer | fl Name,AutoDiscoverServiceInternalUri], the name and the URL are correct and have the .com value.
When I run Test E-mail AutoConfiguration from the Outlook icon and look at the log, the Autodiscover URL found through SCP is correct and it says Succeeded at the end. In the Results tab, however, the Server, Availability Service and OOF URL entries still show .internal instead of .com. The internal OWA, external OWA and OAB URLs correctly display .com. What commands do I need to run to change these, as they seem to be the problem?
I wasted a lot of time chasing the autodiscover before I found out about this test in outlook and realized the autodiscover url was correct. :-)
I have two days left on my old cert that has both .com and .internal SANs so I rolled that back into service so the users stop getting messages. Any help would be appreciated.

Hi OTS,
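You can run the following command to change the InternalUrl attribute of the EWS virtual directory:
Set-WebServicesVirtualDirectory -Identity "CAS_Server_Name\EWS (Default Web Site)" -InternalUrl https://mail.abccorp.com/ews/exchange.asmx
The Availability Service and OOF URLs shown by the Outlook test are both served by EWS, so they should follow this InternalUrl once it is updated.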
Best regards,
Niko Cheng
TechNet Community Support

HSRP "Split Brain" on the STP Topology

Hello. I'm a network administrator in my company.
I have a question about HSRP "Split Brain" on the STP Topology.
I am verifying HSRP downtime on the attached network topology by sending ICMP pings from a PC to the HSRP virtual IP.
I found an unexpected scenario: the ping fails when I reboot L2 Core SW 1 and Router 1's HSRP state goes from Init to Active.
Why does the ping fail?
Shouldn't Router 1's gratuitous ARP be forwarded to L2 Core SW 2?
Sorry to trouble you; could you please explain?

Thank you for your reply, rejeevh. I retried the "L2 Core SW 1 down" test.
As it turns out, my verification scenario was wrong.
Although it was not shown in the figure, both routers have another link, across the core switches, to other routers that run OSPF with "redistribute connected" in practice.
Pinging from the PC to the HSRP virtual IP was the wrong test; pinging from the PC to another OSPF router's interface is the correct scenario.
I verified the correct scenario by rebooting L2 Core SW 1, and the ping still failed.
But the cause was simple: the return packets from the other router were dropped at SW 1's EtherChannel interface while it was in the STP listening/learning state. (SW 1's other interfaces had PortFast or PortFast trunk enabled.)
I confirmed that enabling PortFast trunk on the EtherChannel interfaces of SW 1 and SW 2 improved the result.
Thank you very much for your reply.

Missing partitions after running Recovery from previous System Image Backup in Windows 8.1

After running Recovery from a recent System Image Backup, the LRS_ESP partition is now missing, and the used space on the PBR_DRV partition is reported as 95.20MB, whereas it previously reported 9.52GB in use. The WINRE_DRV, SYSTEM_DRV, Windows8_OS and D:LENOVO partitions appear to be OK.
I ran the recovery after replacing the original 500GB HDD with a 1TB one. Any details, links or recommendations on recreating or correcting the LRS_ESP and PBR_DRV partitions? Thanks!

Here are the before and after partition details obtained through Partition Wizard:
Link to image 1
Link to image 2

Data recovery from corrupt boot partition

The boot partition on my MacBook running 10.7.6 has a corrupt volume structure and will not mount, much less boot. The recovery partition boots but really doesn't let me do anything. Disk Utility can't repair nor even complete verification. I have lots of images which I need to recover, so am looking for a utility which might help.
Everything was apparently OK until the failure, and the drive hardware checked out OK, so I'm hopeful my files are recoverable. A week or so ago I optimized the volume with TechTool Pro and the directory with DiskWarrior, and last night I attempted to sync 1Password on my iPad with 1Password for Mac. This did not work, probably because of my unfamiliarity with the procedure, but it seemed to do no damage. When I ran TTPro last night from its eDrive partition, it reported a volume structure problem on the partition, which I started to fix with the program, but I then cancelled the analysis and decided to use Disk Utility instead. Disk Utility also reported a problem but stopped the verification with an instruction to do a repair, which it again could not complete. When I rebooted, the system partition would again not boot, but this time I got a grey screen with a message that the debugger had loaded and <panic>. A message box said to power down and restart, but that only repeats the process.
I'm thinking of attempting to install a system on an external boot drive and accessing the corrupt partition with a data recovery utility. Any insights, ideas or shared experiences appreciated. Thanks in advance for any desperately needed help.

The safest thing to do here is to install a new disk in the system and do a clean install of OS X. From there, you can put your corrupted volume in an external USB enclosure and mount the file system to try and recover as much data as you can.
It's likely obvious now, but it's REALLY worth investing in a large-capacity external drive or a Time Capsule to use with Time Machine. Backups are essential. Hardware always eventually dies. We all need an effective strategy to deal with that.
A long time ago, I was backing my Linux system up to 4mm DAT. When the inevitable HDD crash came, I thought I was ready. Unfortunately, I hadn't tested recovery from tape and I ended up losing everything. Which is to say, until you know you can restore, you don't even have a backup. I lost thousands upon thousands of photos of my kids growing up.
BTW, if it is essential that you recover as much as possible, consider taking the disk to a data recovery service. Be warned: It's expensive.

Backup/Recovery from web application

Hello guys,
I am using Oracle 9i as the DB and Oracle 9iAS as the web application server. I want to provide backup and recovery functionality to users via the web, but I don't know anything about this.
Is it possible to perform backup and recovery from a web application?
Is there any alternative for this function?
Any other comments will be appreciated.
Thank you,
Jawed Nazar Ali

Read this article in order to get an idea about Java Stored Procedures.
Oracle Developer JAVA STORED PROCEDURES
Simplify with Java Stored Procedures
By Kuassi Mensah
Use Java stored procedures to bridge SQL, XML, Java, and J2EE and Web Services.
Stored procedures allow a clean separation of persistence logic that runs in the database tier from business logic that runs in the middle tier. This separation reduces overall application complexity and increases reuse, security, performance, and scalability.
A major obstacle, however, for widespread adoption of stored procedures is the set of various proprietary, database-dependent implementation languages that different database vendors use. The use of Java-based stored procedures fixes this concern. Oracle has implemented ANSI standards that specify the ability to invoke static Java methods from SQL as procedures or functions. This implementation is called simply "Java stored procedures."
In this article, you will learn how Java stored procedures help simplify and increase the performance of your business logic and extend database functionality. I'll show how Oracle enables the use of Java stored procedures within the database. I'll also look at how Java stored procedures access data, and show how to create a basic Java stored procedure.
PL/SQL or Java
When you think of Oracle stored procedures, you probably think of PL/SQL. Oracle, however, has provided Java support in the database since Oracle8i, to offer an open and portable alternative to PL/SQL for stored procedures. I can hear the $64,000 question: "How do I choose between PL/SQL and Java? Should I forget all the things I've been told about PL/SQL and move on to the greener Java pastures?"
Both languages are suitable for database programming, and each has its strengths and weaknesses. In deciding which language to use, here's a general rule of thumb:
Use PL/SQL for database-centric logic that requires seamless integration with SQL and therefore complete access to database objects, types, and features.
Use Java as an open alternative to PL/SQL for database independence, but also for integrating and bridging the worlds of SQL, XML, J2EE, and Web services.
OracleJVM Lets You Run Java within the Database
Since Oracle8i, Release 1 (Oracle 8.1.5), Oracle has offered a tightly integrated Java virtual machine (JVM) that supports Oracle's database session architecture. Any database session may activate a virtually dedicated JVM during the first Java code invocation; subsequent users then benefit from this already Java-enabled session. In reality, all sessions share the same JVM code and statics; only private states are kept and garbage collected in an individual session space, to provide Java sessions the same session isolation and data integrity capabilities as SQL operations. There is no need for a separate Java-enabled process for data integrity. This session-based architecture provides a small memory footprint and gives OracleJVM the same linear SMP scalability as the Oracle database.
Creating Java Stored Procedures
There are a few steps involved in turning a Java method into a Java stored procedure. These include loading the Java class into the database using the loadjava utility, and publishing the Java methods using a call specification (Call Spec) to map Java methods, parameter types, and return types to their SQL counterparts. The following section shows how to do this.
I'll use a simple Hello class, with one method, Hello.world(), that returns the string "Hello world":
public class Hello {
    public static String world() {
        return "Hello world";
    }
}
The Loadjava Utility
Loadjava is a utility for loading Java source files, Java class files, and Java resource files; verifying bytecodes; and deploying Java classes and JAR files into the database. It is invoked either from the command line or through the loadjava() procedure contained in the DBMS_JAVA package. To load our Hello.class example, type:
loadjava -user scott/tiger Hello.class
As of Oracle9i Release 2, loadjava allows you to automatically publish Java classes as stored procedures by creating the corresponding Call Specs for methods contained in the processed classes. Oracle provides Oracle9i JDeveloper for developing, testing, debugging, and deploying Java stored procedures.
The Resolver Spec
The JDK-based JVM looks for and resolves class references within the directories listed in the CLASSPATH. Because Oracle database classes live in the database schema, the OracleJVM uses a database resolver to look for and resolve class references through the schemas listed in the Resolver Spec. Unlike the CLASSPATH, which applies to all classes, the Resolver Spec is applied on a per-class basis. The default resolver looks for classes first in the schema in which the class is loaded and then for classes with public synonyms.
loadjava -resolve <myclass>
You may need to specify different resolvers, and you can force resolution to occur when you use loadjava, to determine at deployment time any problems that may occur later at runtime.
loadjava -resolve -resolver "((* SCOTT) (foo/bar/* OTHERS) (* PUBLIC))" <myclass>
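The ordered lookup that a Resolver Spec describes can be modelled in a few lines. This is a hypothetical sketch, not OracleJVM's resolver: schema contents are passed in as a map, the class and method names are invented for illustration, and treating a pattern such as foo/bar/* as a simple name prefix is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical model of resolver-spec lookup order; not OracleJVM's implementation.
 * Each entry pairs a class-name pattern with a schema, tried in the order given.
 */
public final class ResolverSpec {
    private final List<String[]> entries = new ArrayList<>();

    /** Appends a (pattern, schema) pair, e.g. ("foo/bar/*", "OTHERS"). */
    public ResolverSpec add(String pattern, String schema) {
        entries.add(new String[] {pattern, schema});
        return this;
    }

    /** "*" matches any class; "pkg/*" is treated as a name prefix (assumption); otherwise exact match. */
    static boolean matches(String pattern, String className) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.endsWith("/*")) {
            return className.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(className);
    }

    /** Returns the first schema, in spec order, whose pattern matches and which contains the class. */
    public Optional<String> resolve(String className, Map<String, Set<String>> schemaContents) {
        for (String[] entry : entries) {
            String pattern = entry[0];
            String schema = entry[1];
            if (matches(pattern, className)
                    && schemaContents.getOrDefault(schema, Collections.emptySet()).contains(className)) {
                return Optional.of(schema);
            }
        }
        return Optional.empty();
    }
}
```

In this model, with the spec from the command above, a reference to foo/bar/Util is checked against SCOTT first (because * matches every name) and only then against OTHERS and PUBLIC, which is why the order of the pairs matters.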
Call Spec and Stored Procedures Invocation
To invoke a Java method from SQL (as well as from PL/SQL and JDBC), you must first publish the public static method through a Call Spec, which defines for SQL the arguments the method takes and the SQL types it returns.
In our example, we'll use SQL*Plus to connect to the database and define a top-level Call Spec for Hello.world():
SQL> connect scott/tiger
SQL> create or replace function helloworld return
VARCHAR2 as language java name 'Hello.world () return java.lang.String';
Function created.
You can then invoke the Java stored procedure as shown below:
SQL> variable myString varchar2(20);
SQL> call helloworld() into :myString;
Call completed.
SQL> print myString;
MYSTRING
Hello world
Java stored procedures are callable, through their Call Spec, from SQL DML statements (INSERT, UPDATE, DELETE, SELECT, CALL, EXPLAIN PLAN, LOCK TABLE, and MERGE), PL/SQL blocks, subprograms, and packages, as well as database triggers. The beauty of Call Spec is that stored procedure implementations can change over time from PL/SQL to Java or vice versa, transparently to the requesters.
Call Spec abstracts the call interface from the implementation language (PL/SQL or Java) and therefore enables sharing business logic between legacy applications and newer Java/J2EE-based applications. At times, however, when invoking a database-resident Java class from a Java client, you may not want to go through the PL/SQL wrapper. In a future release, Oracle plans to provide a mechanism that will allow developers to bypass the Call Spec.
Advanced Data-Access Control
Java stored procedures can be used to control and restrict access to Oracle data by allowing users to manipulate the data only through stored procedures that execute under their invoker's privileges while denying access to the table itself. For example, you can disable updates during certain hours or give managers the ability to query salary data but not update it, or log all access and notify a security service.
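As an illustration of the "disable updates during certain hours" case, here is a hypothetical helper that such a stored procedure might call before performing its DML. The class and method names are invented for this sketch, and it uses java.time, so it assumes a modern JVM rather than the one shipped with Oracle9i.

```java
import java.time.LocalTime;

/**
 * Hypothetical guard that a Java stored procedure could call before applying an
 * update, to implement "disable updates during certain hours".
 */
public final class UpdateWindow {
    private final LocalTime blockedFrom;  // inclusive
    private final LocalTime blockedUntil; // exclusive

    public UpdateWindow(LocalTime blockedFrom, LocalTime blockedUntil) {
        this.blockedFrom = blockedFrom;
        this.blockedUntil = blockedUntil;
    }

    /** True if time t falls in the blocked window; handles windows that wrap past midnight. */
    public boolean isBlocked(LocalTime t) {
        if (blockedFrom.isBefore(blockedUntil)) {
            return !t.isBefore(blockedFrom) && t.isBefore(blockedUntil);
        }
        // Wrapping window such as 22:00-06:00 (equal bounds block the whole day).
        return !t.isBefore(blockedFrom) || t.isBefore(blockedUntil);
    }

    /** Throws if an update is not permitted at time t; call this before the DML. */
    public void checkUpdateAllowed(LocalTime t) {
        if (isBlocked(t)) {
            throw new IllegalStateException(
                    "Updates are disabled between " + blockedFrom + " and " + blockedUntil);
        }
    }
}
```

Inside the database, the exception would surface to the SQL caller as an error, so the update is refused while reads through other procedures remain available.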
Sharing Data Logic Between Legacy and J2EE Applications
Because legacy applications and J2EE applications both invoke stored procedures through the Call Spec, the same data logic can be shared between J2EE and non-J2EE worlds. Thanks to Call Spec, this data logic can be shared regardless of the implementation language used (whether PL/SQL or Java).
Autogeneration of Primary Keys for BMP Entity Beans
When using BMP for EJB entity beans, a bean instance can be uniquely identified by the auto-generated primary key associated with the newly inserted data as a return value for ejbCreate(). You can retrieve this value within ejbCreate() in one database operation by using a stored procedure that inserts the corresponding data and retrieves or computes the primary key. Alternatively, you could insert the data and retrieve the corresponding key (or ROWID) in one SQL statement, using the RETURN_GENERATED_KEYS feature in JDBC 3.0. However, the stored procedure approach is more portable across JDBC driver versions and databases.
You can implement this pattern with these three steps:
Create the Java stored procedure, defining a public static Java method insertAccount() within a public GenPK class. This method will insert data, compute a unique key (by passing out a sequence number), and return the computed key as primary key.
Define the Call Spec.
CREATE OR REPLACE PROCEDURE insertAccount(owner IN varchar2, bal IN number, newid OUT number)
AS LANGUAGE JAVA NAME 'GenPK.insertAccount(java.lang.String, int, int[])';
Invoke the stored procedure within ejbCreate().
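public AccountPK ejbCreate(String ownerName, int balance) throws CreateException {
    try (CallableStatement call = conn.prepareCall("{call insertAccount(?, ?, ?)}")) {
        call.setString(1, ownerName); call.setInt(2, balance);
        call.registerOutParameter(3, java.sql.Types.INTEGER); // newid OUT parameter
        call.execute();
        return new AccountPK(call.getInt(3));
    } catch (java.sql.SQLException e) {
        throw new CreateException(e.getMessage());
    }
}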
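The article does not show the GenPK class itself. The following is a minimal sketch of what it might look like, assuming a hypothetical ACCOUNTS table (ID, OWNER, BALANCE) and a hypothetical ACCOUNT_SEQ sequence; neither name comes from the article. It follows the usual OracleJVM conventions: the server-side connection comes from the jdbc:default:connection: URL, and the OUT parameter newid is passed in as a one-element int[]. Under these assumptions, the Call Spec would name it as 'GenPK.insertAccount(java.lang.String, int, int[])'. It uses try-with-resources, so it also assumes a modern Java level rather than the JDK that shipped with Oracle9i.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical sketch of the GenPK Java stored procedure.
 * Assumes an ACCOUNTS(ID, OWNER, BALANCE) table and an ACCOUNT_SEQ sequence.
 */
public class GenPK {
    /** Inserts an account and hands its generated key back through newid[0]. */
    public static void insertAccount(String owner, int bal, int[] newid) throws SQLException {
        // Inside OracleJVM this URL returns the caller's own session connection;
        // it must not be closed by the procedure.
        Connection conn = DriverManager.getConnection("jdbc:default:connection:");
        int id;
        try (Statement s = conn.createStatement();
             ResultSet rs = s.executeQuery("SELECT account_seq.NEXTVAL FROM dual")) {
            if (!rs.next()) {
                throw new SQLException("sequence returned no row");
            }
            id = rs.getInt(1);
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)")) {
            ps.setInt(1, id);
            ps.setString(2, owner);
            ps.setInt(3, bal);
            ps.executeUpdate();
        }
        newid[0] = id; // surfaces as the newid OUT parameter of the Call Spec
    }
}
```

Computing the key from a sequence inside the same call is what lets ejbCreate() get the primary key back in a single round trip.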
Custom Primary Key Finders for CMP Entity Beans
Finder methods are used for retrieving existing EJB entity bean instances. Primary key finders allow you to retrieve a uniquely identified EJB instance. For CMP entity beans, the EJB container automatically generates the primary key finder findByPrimaryKey() method, based on declarative description. In some situations, however, you might need more control; for example, you may need a specialized finder such as findByStoredProcKey(). In these situations, you can use Java stored procedures in conjunction with an object relational framework (such as Oracle9i Application Server [Oracle9iAS] TopLink) to implement a custom primary key finder method. After you define the EJB finder as a REDIRECT or NAMED finder, TopLink will generate the SQL query for retrieving the bean instance.
Data-Driven EJB Invocation
In a data-driven architecture, business logic invocation can be triggered as a result of database operations (such as inserts, updates, or deletes). A Java stored procedure implementing the data logic can be declared as a database trigger to invoke EJBs running in a middle-tier J2EE application server. You can make EJB calls by using either standard remote method invocation (RMI) over Interoperable Inter-ORB Protocol (IIOP), using a J2EE 1.3 compatible server, or RMI over a vendor-specific transport protocol (such as ORMI with Oracle9iAS/OC4J or RMI over T3 with BEA WebLogic). Each application server vendor has its own optimized protocol while providing RMI over IIOP for interoperability. Oracle9iAS supports both RMI calls over IIOP and ORMI protocols.
Data-Driven Messaging
Oracle9i Database embeds Advanced Queuing (AQ), which is an integrated, persistent, reliable, secure, scalable, and transactional message-queuing framework. Oracle exposes AQ features to Java developers through the standard Java Message Service (JMS) API. Java stored procedures can invoke AQ operations through the JMS interface to allow fast, intra-session, scalable, data-driven messaging.
Java stored procedures can use JMS to invoke AQ operations. You can implement this pattern in four steps:
Create and start the JMS Queue (to do so, embed the following operations within a SQL script):
execute dbms_aqadm.create_queue_table(queue_table => 'queue1', queue_payload_type => 'SYS.AQ$_JMS_TEXT_MESSAGE', comment => 'a test queue', multiple_consumers => false, compatible => '8.1.0');
execute dbms_aqadm.create_queue(queue_name => 'queue1', queue_table => 'queue1');
execute dbms_aqadm.start_queue(queue_name => 'queue1');
Create the Java stored procedure (a code snippet is shown):
public static void runTest(String msgBody) {
    try {
        // get database connection
        ora_drv = new OracleDriver();
        db_conn = ora_drv.defaultConnection();
        // set up sender and receiver (fields declared in the online code sample)
        // create message
        s_msg = s_session.createTextMessage(msgBody);
        // send message
        sender.send(s_msg);
        s_session.commit();
        // receive message
        r_msg = (TextMessage) receiver.receive();
        r_session.commit();
        // output message text
        String body = r_msg.getText();
        System.out.println("message was '" + body + "'");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Create the Call Spec:
create or replace procedure jmsproc (t1 IN VARCHAR)
as language java name 'jmsSample.main (java.lang.String[])';
Invoke the stored procedure:
call jmsproc('hello');
Database-Assisted Web Publishing (Cache Invalidation)
One of the common issues application architects must face is how to cache database information reliably to increase overall system performance. JCACHE is an upcoming standard specification (JSR 107) that addresses this problem. It specifies an approach for temporary, in-memory caching of Java objects, including object creation, shared access, spooling, invalidation, and consistency across JVMs. It can be used to cache read-mostly data such as product catalogs and price lists within JSP. Using JCACHE, most queries will have response times an order of magnitude faster because of cached data (in-house testing showed response times about 15 times faster).
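The caching pattern can be sketched without the JCACHE API itself. The class below is a hypothetical stand-in built on ConcurrentHashMap, not JSR 107: reads go through the cache and only call the loader, which represents the database query, on a miss, while invalidate() is the hook that a trigger-driven callout would reach so that the next read is refreshed from the database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical read-through cache with explicit invalidation (not the JSR 107 API). */
public class CatalogCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the database query

    public CatalogCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, running the loader only on a miss. */
    public V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops a stale entry; called when the underlying row changes. */
    public void invalidate(K key) {
        cache.remove(key);
    }
}
```

For read-mostly data such as price lists, most get() calls never reach the loader, which is where the response-time gain comes from; the cost is that correctness depends on every change path calling invalidate().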
In order to track all the changes to the origin data and refresh the cached data, a Java stored procedure is attached to a table as a trigger. Any change to this table will result in the automatic invocation of this stored procedure, which in turn will call out a defined JSP to invalidate the JCACHE object that maps its state to the database table. Upon invalidation, the very next query will force the cache to be refreshed from the database.
Next Steps
READ MORE about Java Stored Procedures
This article is adapted from the white paper "Unleash the Power of Java Stored Procedures." You can find the white paper at:
/tech/java/java_db/pdf/OW_30820_JAVA_STORED_PROC_paper.PDF
New PL/SQL features in Oracle9i Database, Release 2
/tech/pl_sql/pdf/Paper_30720_Doc.pdf
Resolver Spec
/docs/products/oracle9i/doc_library/release2/java.920/a96659.pdf
OracleJVM and Java 2 Security
/docs/products/oracle9i/doc_library/release2/java.920/a96656.pdf
DOWNLOAD Code
Exercise code examples from this article:
/sample_code/tech/java/jsp/Oracle9iJSPSamples.html
LEARN about stored procedures as Web services
/tech/webservices
Extending Database Functionality
One of the great things about running Java code directly in the database is the ability to implement new functionality by simply loading the code or library and using the Call Spec to make the entry points (public static methods) available to SQL, PL/SQL, Java, J2EE, and non-Java APIs. Oracle9i Database customers can easily extend database functionality. Oracle itself leverages this capability for new utilities and packages such as the XML Developer Kits (XDKs).
Bridging SQL, PL/SQL, Java, J2EE, .NET, and XML
The Oracle XDK is written in Java and exposes its public methods as Java stored procedures, extending the database's XML programmability. SQL, PL/SQL, Java, J2EE, and non-Java (.NET) business logic all have access to the XML parser, the XSLT processor, the XPath engine, and XML SQL Utility (XSU).
The XML parser is accessible through the xmlparser and xmldom packages. XSU is a Java utility that generates an XML document from SQL queries or a JDBC ResultSet, and writes data from an XML document into a database table or view. Using XSU, XML output can be produced as Text, DOM trees, or DTDs. XSU is exposed to PL/SQL through the dbms_xmlquery and dbms_xmlsave packages.
Conclusion
The integration of the Oracle database with a Java VM enables the creation of portable, powerful, database-independent data logic and persistence logic. The loose coupling of business logic that runs in the middle tier with data logic that runs in the database tier improves application scalability, performance, flexibility, and maintenance.
Kuassi Mensah is a product manager in the Server Technologies division at Oracle.
http://otn.oracle.com/oramag/oracle/03-jan/o13java.html
Joel Pérez

Windows Server 2012 - Backup failing with Exchange - The application will not be available for recovery from this backup. the consistency check failed

Hi
We have a Windows 2012 server with Exchange 2013. All is working fine, except that now I am getting issues with the backup:
'Exchange - The application will not be available for recovery from this backup. the consistency check failed for the component Microsoft Exchange Server'
I have checked the database all is fine, i have created a new db and move all mailbox;s over and then removed the old db, i have enabled circular logging and then disabled it, it seems no matter what i do i cannot get a full backup!
i did have to restore the server once and the backups still worked for about 4 days after that and then stopped, i have also tried to remove and re add the backup role!
i am stumped, any advice would be great!

Hi
OK, I created a test database and tried to back it up right away, and it failed. I had not added any mailboxes to it either. I got quite a few events in the Windows logs; in addition to the same event as above, I got the following:
Log Name: Application
Source: MSExchangeRepl
Date: 21/01/2013 10:16:30
Event ID: 2038
Task Category: Exchange VSS Writer
Level: Warning
Keywords: Classic
User: N/A
Computer: NERDS-DC01.nerds.local
Description:
Microsoft Exchange VSS Writer backup failed. No log files were truncated. Instance 75754d0d-8dfe-4909-8beb-5a4f824254a9. Database 4843b37c-7b3c-42b2-8b57-1393615c2c15.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSExchangeRepl" />
<EventID Qualifiers="32772">2038</EventID>
<Level>3</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2013-01-21T10:16:30.000000000Z" />
<EventRecordID>261645</EventRecordID>
<Channel>Application</Channel>
<Computer>NERDS-DC01.nerds.local</Computer>
<Security />
</System>
<EventData>
<Data>75754d0d-8dfe-4909-8beb-5a4f824254a9</Data>
<Data>4843b37c-7b3c-42b2-8b57-1393615c2c15</Data>
</EventData>
</Event>
AND
Log Name: Application
Source: MSExchangeRepl
Date: 21/01/2013 10:16:30
Event ID: 2038
Task Category: Exchange VSS Writer
Level: Warning
Keywords: Classic
User: N/A
Computer: NERDS-DC01.nerds.local
Description:
Microsoft Exchange VSS Writer backup failed. No log files were truncated. Instance 75754d0d-8dfe-4909-8beb-5a4f824254a9. Database db5826f3-1029-4219-ad80-441a0e94537a.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSExchangeRepl" />
<EventID Qualifiers="32772">2038</EventID>
<Level>3</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2013-01-21T10:16:30.000000000Z" />
<EventRecordID>261646</EventRecordID>
<Channel>Application</Channel>
<Computer>NERDS-DC01.nerds.local</Computer>
<Security />
</System>
<EventData>
<Data>75754d0d-8dfe-4909-8beb-5a4f824254a9</Data>
<Data>db5826f3-1029-4219-ad80-441a0e94537a</Data>
</EventData>
</Event>
and
Log Name: Application
Source: MSExchangeRepl
Date: 21/01/2013 10:16:30
Event ID: 2034
Task Category: Exchange VSS Writer
Level: Error
Keywords: Classic
User: N/A
Computer: NERDS-DC01.nerds.local
Description:
The Microsoft Exchange Replication service VSS Writer (Instance 75754d0d-8dfe-4909-8beb-5a4f824254a9) failed with error FFFFFFFC when processing the backup completion event.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSExchangeRepl" />
<EventID Qualifiers="49156">2034</EventID>
<Level>2</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2013-01-21T10:16:30.000000000Z" />
<EventRecordID>261649</EventRecordID>
<Channel>Application</Channel>
<Computer>NERDS-DC01.nerds.local</Computer>
<Security />
</System>
<EventData>
<Data>75754d0d-8dfe-4909-8beb-5a4f824254a9</Data>
<Data>FFFFFFFC</Data>
</EventData>
</Event>
and
Log Name: Application
Source: SPP
Date: 21/01/2013 10:16:30
Event ID: 16389
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: NERDS-DC01.nerds.local
Description:
Writer Microsoft Exchange Writer experienced retryable error during shadow copy creation. Retrying... More info: .
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="SPP" />
<EventID Qualifiers="0">16389</EventID>
<Level>2</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2013-01-21T10:16:30.000000000Z" />
<EventRecordID>261650</EventRecordID>
<Channel>Application</Channel>
<Computer>NERDS-DC01.nerds.local</Computer>
<Security />
</System>
<EventData>
<Data>Microsoft Exchange Writer</Data>
<Data>
</Data>
<Data>The writer experienced a transient error. If the backup process is retried, the error may not reoccur. (0x800423F3)</Data>
<Data>
</Data>
<Binary>00000000A5120000981200000000000042BEB7C511CAC619E59C92030000000000000000</Binary>
</EventData>
</Event>
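When several events like these are logged together, it can help to reduce each one to its provider, event ID, and data values before comparing them. The following is a minimal sketch, assuming each event has been exported as a standalone `<Event>` XML document like the ones above. The `EventSummary` class and its output format are hypothetical, and the code uses only the JDK's built-in DOM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: summarize one exported <Event> as "Provider EventID: data | data".
public class EventSummary {

    static String summarize(String eventXml) {
        try {
            // DocumentBuilderFactory is not namespace-aware by default, so plain tag
            // names match even though <Event> declares a default namespace.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(eventXml)));
            Element provider = (Element) doc.getElementsByTagName("Provider").item(0);
            String eventId = doc.getElementsByTagName("EventID").item(0)
                    .getTextContent().trim();
            NodeList data = doc.getElementsByTagName("Data");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < data.getLength(); i++) {
                String v = data.item(i).getTextContent().trim();
                if (!v.isEmpty()) {
                    values.add(v); // skip empty <Data> elements, like those in the SPP event
                }
            }
            return provider.getAttribute("Name") + " " + eventId + ": "
                    + String.join(" | ", values);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable Event document", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed copy of the event 2034 XML above.
        String xml = "<Event xmlns=\"http://schemas.microsoft.com/win/2004/08/events/event\">"
                + "<System><Provider Name=\"MSExchangeRepl\" />"
                + "<EventID Qualifiers=\"49156\">2034</EventID></System>"
                + "<EventData><Data>75754d0d-8dfe-4909-8beb-5a4f824254a9</Data>"
                + "<Data>FFFFFFFC</Data></EventData></Event>";
        System.out.println(summarize(xml));
    }
}
```

Run against the trimmed event 2034 in `main`, the helper prints the provider and event ID, followed by the VSS writer instance ID and the `FFFFFFFC` error code.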

Time Machine Full Recovery From Wireless Backup

I am reporting this as a learning experience for all.
I just had a hard drive fail on my MacBook. AppleCare was golden, and now I have a clean install of Leopard.
Here is my situation: my MacBook was backed up with Time Machine over a wireless network, with Windows on a Boot Camp partition. The dedicated backup disk (a 1 TB Buffalo) is connected to my wife's iMac via FireWire.
I want to do a full recovery. When I start up Leopard, setup gives me the option to do a full recovery, but no list of Time Machine backups is provided. I continue with setup. When I go to Time Machine's restore option, I am asked to select a file. This is when I call tech support.
Tech support says I should have had the backup disk connected directly via FireWire for setup to see it. Under these conditions (a backup over the network), Time Machine stores the backup as a sparse bundle. Now that I am past setup, I have to do a migration rather than a recovery. I start up Migration Assistant, but I still cannot access the sparse bundle because it is grayed out. The leap here is that you have to mount the sparse bundle by double-clicking it; it then shows up as a disk you can migrate from.
The Windows partition, as tech support puts it, is "on its own".
The confusion for me was that I could not recover exactly as I had backed up.
I hope this helps someone. Cheers!

coma53 wrote:
It seems to be. All music, photos, applications, etc. are present. I have not had time to do a complete review of my migration; I will track this post for a while and report any problems. There were significant OS updates that had to be installed. I thought they would be included as part of the migration, but they were not. Perhaps they are included in a true "full recovery" from setup; does anyone know?
Migration has no option for any part of the OS. It assumes you already have it.
If you recover your entire system from a TM backup (assuming nothing significant was excluded), you get your entire system back exactly the way it was: the OS, Apps, settings, preferences, user data, etc.
You don't load a new OS from the Leopard Install disc; you only use it for its copy of Disk Utility and the installer. Everything else comes from your TM backup.

RMAN RECOVERY FROM INCREMENTAL BACKUP

Hi,
How do I recover from an incremental backup?
Is there a script for recovering from a full + incremental backup?
Many thanks in advance.

Thank you,
but I cannot connect with RMAN when the database is shut down.
C:\Documents and Settings\Farid>rman catalog rman/****@reprman target sys/***@bd1
Recovery Manager: Release 10.2.0.1.0 - Production on Fri Feb 16 18:28:19 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
tnsping BD1
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST
E = bd1.virtualparc.com)))
OK (30 msec)

OneKey Recovery from External drive

Hello,
I am trying to restore my system with OneKey Recovery from an external drive. However, OneKey Recovery only shows two paths for selecting images: the C drive and the E drive.
When I select the E drive, it does not point to my external drive. How do I change the configuration so that OneKey can access my external drive to find the backed-up image?
Thanks,
Kim

Hi @Karlo12333,
Welcome to the HP Forums!
It is a great place to find answers and information!
For you to have the best experience in the HP Forums, I would like to direct your attention to the HP Forums Guide: Learn How to Post and More.
Unfortunately, copying the partition does not work. I don't think you will be able to create recovery media from the copied partition on the external drive, but if you have not yet created recovery disks, you can give it a try.
If you cannot create the recovery media you will need to contact HP support to obtain a recovery kit.
Please call our technical support at 800 474 6836. If you live outside the US/Canada Region, please click the link below to get a support number for your region.
World Wide Phone Support
Good Luck!
Sparkles1
I work on behalf of HP
Please click “Accept as Solution” if you feel my post solved your issue; it will help others find the solution.
Click the “Kudos, Thumbs Up” on the bottom right to say “Thanks” for helping!

Maybe you are looking for