Master failover

I have set up a replication group with 4 nodes (1 master, 3 replica nodes).
1. When there is a failure in the current master node, an election is triggered. Is it possible to specify which replica node should be selected as the next master, either programmatically or through some configuration?
2. I want to specify the isolation level for my read operations as READ_COMMITTED. The API only allows specifying this if I use a Transaction object. How can I set the isolation level without using a transaction?
primaryKey.get(Transaction, PK, LockMode)
Thanks
PS

1. When there is a failure in the current master node, the election is triggered. Is it possible to specify which replica node should be selected as next master either programmatically or through some configuration?
You can use rep_set_priority() to assign an integer priority value to each site in your replication group. Larger integer values indicate more desirable election winners. The special priority value 0 indicates that a site should never win an election.
An election determines a winner first by finding the client with the most recent log records. If clients are tied in this respect, the election chooses the client with the highest priority value. If both log records and client priorities are tied, a winner is selected at random.
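The selection order described above can be sketched as a small simulation. This is an illustrative model only, not the Berkeley DB implementation; the `Site` and `Election` names and the numeric log position are assumptions of the sketch.

```java
import java.util.*;

// Illustrative model of the election rule described above (not the actual
// Berkeley DB code): sites with priority 0 never win; among the rest, the
// most recent log record wins, then the highest priority, then a random
// pick among any remaining ties.
class Election {
    record Site(String name, long lastLogRecord, int priority) {}

    static Site winner(List<Site> sites, Random rnd) {
        List<Site> eligible = sites.stream()
                .filter(s -> s.priority() > 0)            // priority 0 = never win
                .toList();
        long maxLog = eligible.stream()
                .mapToLong(Site::lastLogRecord).max().orElseThrow();
        List<Site> tied = eligible.stream()
                .filter(s -> s.lastLogRecord() == maxLog) // most recent log wins
                .toList();
        int maxPri = tied.stream().mapToInt(Site::priority).max().orElseThrow();
        List<Site> best = tied.stream()
                .filter(s -> s.priority() == maxPri)      // then highest priority
                .toList();
        return best.get(rnd.nextInt(best.size()));        // then random tie-break
    }
}
```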
2. I want to specify the isolation level for my read operations as READ_COMMITTED. The API only allows specifying this if I use a Transaction object. How can I set the isolation level without using a transaction?
primaryKey.get(Transaction, PK, LockMode)
I'm not an expert in transactions, but I do not see a way you can achieve READ_COMMITTED (degree 2 isolation) guarantees without transactions. I have asked some of our transaction experts about this, and one of us will respond either to confirm this or to let you know about anything I have missed.
Paula Bingham
Oracle

Similar Messages

  • Selecting master failover

    I have two questions regarding replicated applications.
    I have set up a replication group with 4 nodes (1 master, 3 replica nodes) using Berkeley DB JE 4.0.103. I am using the Direct Persistence Layer (DPL) Java API, specifically the com.sleepycat.je.rep.ReplicatedEnvironment class.
    1. When there is a failure in the current master node, an election is triggered. Is it possible to specify which replica node should be selected as the next master, either programmatically or through some configuration?
    2. I want to specify the isolation level for my read operations as READ_COMMITTED. The API only allows specifying this if I use a Transaction object. How can I set the isolation level without using a transaction?
    primaryKey.get(Transaction, PK, LockMode)
    Thanks
    PS

    Hi,
    I'll leave your first question for someone else to answer.
    On your second question:
    2. I want to specify the isolation level for my read operations as READ_COMMITTED. The API only allows specifying this if I use a Transaction object. How can I set the isolation level without using a transaction?
    primaryKey.get(Transaction, PK, LockMode)
    This general issue is addressed here:
    http://www.oracle.com/technology/documentation/berkeley-db/je/TransactionGettingStarted/isolation.html#readcommitted
    It doesn't mention the fact that you can use LockMode.READ_COMMITTED without also using a transaction, and the reason for this omission is that it isn't really useful to do so. If you do not pass a transaction (you pass null) to a get() method, then no lock will be held by the operation, which is effectively the same as using READ_COMMITTED.
    If this doesn't answer your question, please describe more about why you want to use READ_COMMITTED, and what behavior you are expecting.
    --mark
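    Mark's point can be illustrated with a toy lock-table model (my own sketch, not JE internals): a READ_COMMITTED read releases its read lock before get() returns, and a non-transactional read never takes one, so in both cases no lock outlives the call.

```java
import java.util.*;

// Toy model (not the JE implementation) of why a non-transactional read is
// effectively READ_COMMITTED: neither style of read leaves a lock held after
// get() returns, so neither can block writers afterward.
class ReadModel {
    private final Set<String> heldReadLocks = new HashSet<>();

    // READ_COMMITTED inside a transaction: lock only for the duration of the read.
    String readCommitted(Map<String, String> db, String key) {
        heldReadLocks.add(key);
        String value = db.get(key);
        heldReadLocks.remove(key);   // released before the call returns
        return value;
    }

    // Non-transactional read: no lock is taken at all.
    String nonTransactional(Map<String, String> db, String key) {
        return db.get(key);
    }

    boolean holdsLocks() { return !heldReadLocks.isEmpty(); }
}
```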
                                                             

  • OVM-2020 Server pool master  can not be set to maintenance mode

    Hi Guys.
    I have 2 server in ha mode
    server 1 (10.99.99.161) with oracle Manager installed (server pool master, utility server, vm server)
    Server 2 (10.99.99.161) without Oracle Manager installed (utility server, vmserver)
    HA enable with server1 and server 2
    10.99.99.165 is the virtual server pool master
    1 San connected with server 1 and server 2.
    I can connect to https://10.99.99.165:4443/OVS without problem.
    All seems work, but :
    1 - If I try to set server 1 to maintenance mode, the VM Manager tells me:
    OVM-2020 Server pool master (10.99.99.162) can not be set to maintenance mode, please use server pool master failover policy to change its role, then try again.
    What should I do?
    2 - If I turn off server 1, the server pool master role moves to server 2, but I'm unable to connect to the https://10.99.99.165:4443/OVS webpage. Is this correct, is it a bug, or am I doing something wrong? Must I upgrade something?
    Many Thanks
    Luca

    user8857532 wrote:
    1 - If I try to set server 1 to maintenance mode, the VM Manager tells me:
    What should I do?
    You need to live migrate all your guests off the current Master server, then issue service ovs-agent stop on the command line. This will cause the pool mastery to switch to another server. Once that's done, you can start the agent again.
    2 - If I turn off server 1, the server pool master role moves to server 2, but I'm unable to connect to the https://10.99.99.165:4443/OVS webpage. Is this correct, is it a bug, or am I doing something wrong? Must I upgrade something?
    You need to ensure the VM that's running the Oracle VM Manager software is still running. Live Migrate it first (preferred option) and flag it HA-enabled so that it is automatically restarted in the case of a server failure.

  • Set Server Pool Master to Maintenance Mode

    Hi all,
    when trying to set a server pool master to maintenance, I got this: "OVM-2020 Server pool master (vm-13) can not be set to maintenance mode, please use server pool master failover policy to change its role, then try again."
    I have dug through the help and docs, but can not find a method for dynamically reassigning the server pool master role.
    This document: http://download.oracle.com/docs/cd/E15458_01/doc.22/e15441/server.htm#CCHIEBCE , just says "You must first reassign the Server Pool Master role to another server in the server pool.", without describing how.
    This document: http://download.oracle.com/docs/cd/E15458_01/doc.22/e15441/site.htm#insertedID4 , states "You can also dynamically change the Oracle VM Server which acts as the Server Pool Master without causing any outages. See Section 3.4.1, "Editing Server Pool".", which just links back to the same section I am already in.
    Now, I can just migrate all my machines and then stop ovs-agent, but I am not always the one managing this, and having a nice way through the web interface would be great.
    Please just tell me I am blind and point me to the section in the manuals which tells me how to do it, or is this a feature which got dropped before release?
    Thanks.

    I have got the same problem: I want to switch the role of the server pool master to another server in the same pool, but after reading these answers I have absolutely no clue how to do it!
    We have a production environment, so any ideas like shutting down or stopping VMs are not acceptable.
    So here is the situation:
    "ovs#1" is hosting all VMs at the moment - Server type: Utility Server,Virtual Machine Server
    "ovs#2" is free of any VMs - Server type: Server Pool Master,Utility Server,Virtual Machine Server
    I want to switch the Server Pool Master from "ovs#2" to "ovs#1" without any downtime. How does this work? Thanks!
    Specifications:
    - Oracle VM Manager 2.2.0
    - Oracle VM server release 2.2.1
    Edited by: user11932329 on 15.09.2010 04:50

  • [SOLVED] Cannot bring wifi interface up

    Hello,
    A bit of a newbie here...
    I've just installed Arch Linux on a laptop and I'm trying to get it to use a static IP on the wireless network (I already succeeded in getting a static IP working on the wired network) via WEP, to an access point that doesn't broadcast its ESSID.
    I have set up a netctl profile which runs at boot time. Said profile seems to run correctly and I even get an IP address for the wireless. However, it doesn't seem to bring the wifi interface up and typing "ip link set <INTERFACE> up" as root seems to have no effect. I have already installed the firmware for the card and I'm not successful in getting a connection.
    This is the output of the commands I've tried so far:
    # ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: wlp2s3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:12:f0:02:47:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.6/26 brd 192.168.1.63 scope global wlp2s3
    valid_lft forever preferred_lft forever
    3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:11:43:4b:d6:ec brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.5/26 brd 192.168.1.63 scope global enp2s0
    valid_lft forever preferred_lft forever
    inet6 fe80::211:43ff:fe4b:d6ec/64 scope link
    valid_lft forever preferred_lft forever
    # iw dev
    phy#0
    Interface wlp2s3
    ifindex 2
    wdev 0x1
    addr 00:12:f0:02:47:bd
    type managed
    # iw dev wlp2s3 link
    Not connected.
    # ip link set wlp2s3 up
    <this command doesn't produce any output>
    # ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: wlp2s3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:12:f0:02:47:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.6/26 brd 192.168.1.63 scope global wlp2s3
    valid_lft forever preferred_lft forever
    3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:11:43:4b:d6:ec brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.5/26 brd 192.168.1.63 scope global enp2s0
    valid_lft forever preferred_lft forever
    inet6 fe80::211:43ff:fe4b:d6ec/64 scope link
    valid_lft forever preferred_lft forever
    # iw dev
    phy#0
    Interface wlp2s3
    ifindex 2
    wdev 0x1
    addr 00:12:f0:02:47:bd
    type managed
    # iw dev wlp2s3 link
    Not connected.
    # lspci -k
    02:03.0 Network controller: Intel Corporation PRO/Wireless 2200BG [Calexico2] Network Connection (rev 05)
    Subsystem: Intel Corporation Dell Latitude D600
    Kernel driver in use: ipw2200
    Kernel modules: ipw2200
    # dmesg | grep ipw2200
    [ 6.606394] ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, 1.2.2kmprq
    [ 6.606401] ipw2200: Copyright(c) 2003-2006 Intel Corporation
    [ 6.608009] ipw2200: Detected Intel PRO/Wireless 2200BG Network Connection
    [ 6.854658] ipw2200: Detected geography ZZD (13 802.11bg channels, 0 802.11a channels)
    # rfkill list
    0: phy0: Wireless LAN
    Soft blocked: no
    Hard blocked: no
    As a reference, here you have the netctl profile I created:
    Description='WEP encrypted wireless connection'
    Interface=wlp2s3
    Connection=wireless
    Security=wep
    ESSID='MYNETWORKNAME'
    # Prepend \" to hexadecimal keys
    Key=\"01234567890ABCDEF012345678
    IP=static
    Address=('192.168.1.6/26')
    Gateway='192.168.1.XXX'
    DNS=('XXX.XXX.XXX.XXX' 'YYY.YYY.YYY.YYY')
    SkipNoCarrier=yes
    # Uncomment this if your ssid is hidden
    Hidden=yes
    # Uncomment if you are using an ad-hoc connection
    #AdHoc=yes
    Thanks in advance for your help/advice!!
    Last edited by AMaudio (2013-11-09 21:08:35)

    Thanks for the pointers, Strike0. The wired-wireless failover scenario does look quite interesting. So I'd like to explore that option more extensively. I attempted to create a master bond/failover connection via netctl with two slaves: one for the wired connection and another one for the wireless connection. For the wired connection I used a netctl profile and for the wireless connection I used a systemd service manually setting up the wireless connection, somewhat following the wiki pages you refer to.
    All three services start up correctly, but when I unplug the ethernet cable, the wireless network doesn't take over. It seems that it's not really connected.
    Here's the details of my configuration:
    The master failover (bond) netctl profile:
    # cat /etc/netctl/failover
    Description='A wired connection with failover to wireless'
    Interface='bond0'
    Connection=bond
    BindsToInterfaces=('enp2s0' 'wlp2s3')
    IP=static
    Address="192.168.1.5/26"
    Gateway='192.168.1.1'
    DNS=('XXX.XXX.XXX.XXX' 'YYY.YYY.YYY.YYY')
    SkipNoCarrier='no'
    The failover wired netctl profile:
    # cat /etc/netctl/failover_wired
    Description='Failover wired network'
    Interface=enp2s0
    Connection=ethernet
    IP=no
    SkipNoCarrier=yes
    ## For IPv6 autoconfiguration
    #IP6=stateless
    ## For IPv6 static address configuration
    #IP6=static
    #Address6=('1234:5678:9abc:def::1/64' '1234:3456::123/96')
    #Routes6=('abcd::1234')
    #Gateway6='1234:0:123::abcd'
    The systemd service for the wireless network:
    # cat /etc/systemd/system/network-wireless\@.service
    [Unit]
    Description=Wireless network connectivity (%i)
    Wants=network.target
    Before=network.target
    BindsTo=sys-subsystem-net-devices-%i.device
    After=sys-subsystem-net-devices-%i.device
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    #EnvironmentFile=/etc/conf.d/network-wireless@%i
    ExecStart=/usr/bin/ip link set dev %i up
    ExecStart=/usr/bin/iwconfig wlp2s3 essid MYNETWORKNAME
    ExecStart=/usr/bin/iwconfig wlp2s3 key 0123456789ABCDEF
    #ExecStart=/usr/bin/ip addr add ${address}/${netmask} broadcast ${broadcast} dev %i
    #ExecStart=/usr/bin/ip route add default via ${gateway}
    #ExecStop=/usr/bin/ip addr flush dev %i
    ExecStop=/usr/bin/ip link set dev %i down
    [Install]
    WantedBy=multi-user.target
    After rebooting the machine, I see that all three profile/services started successfully:
    # netctl status failover
    netctl@failover.service - A wired connection with failover to wireless
    Loaded: loaded (/etc/systemd/system/netctl@failover.service; enabled)
    Active: active (exited) since Sat 2013-11-09 13:59:27 GMT; 34min ago
    Docs: man:netctl.profile(5)
    Process: 165 ExecStart=/usr/lib/network/network start %I (code=exited, status=0/SUCCESS)
    Main PID: 165 (code=exited, status=0/SUCCESS)
    Nov 09 13:59:24 mycomputer systemd[1]: Starting A wired connection with failover to wireless...
    Nov 09 13:59:27 mycomputer network[165]: Starting network profile 'failover'...
    Nov 09 13:59:27 mycomputer network[165]: RTNETLINK answers: File exists
    Nov 09 13:59:27 mycomputer systemd[1]: Started A wired connection with failover to wireless.
    Nov 09 13:59:28 mycomputer systemd[1]: Started A wired connection with failover to wireless.
    # netctl status failover_wired
    netctl@failover_wired.service - Failover wired network
    Loaded: loaded (/etc/systemd/system/netctl@failover_wired.service; enabled)
    Active: active (exited) since Sat 2013-11-09 13:59:27 GMT; 34min ago
    Docs: man:netctl.profile(5)
    Process: 163 ExecStart=/usr/lib/network/network start %I (code=exited, status=0/SUCCESS)
    Main PID: 163 (code=exited, status=0/SUCCESS)
    Nov 09 13:59:24 mycomputer systemd[1]: Starting Failover wired network...
    Nov 09 13:59:27 mycomputer systemd[1]: Started Failover wired network.
    Nov 09 13:59:28 mycomputer systemd[1]: Started Failover wired network.
    # systemctl status network-wireless@wlp2s3.service
    network-wireless@wlp2s3.service - Wireless network connectivity (wlp2s3)
    Loaded: loaded (/etc/systemd/system/network-wireless@.service; enabled)
    Active: active (exited) since Sat 2013-11-09 13:59:25 GMT; 36min ago
    Process: 177 ExecStart=/usr/bin/iwconfig wlp2s3 key 12345667890ABCDEF (code=exited, status=0/SUCCESS)
    Process: 175 ExecStart=/usr/bin/iwconfig wlp2s3 essid MYNETWORKNAME (code=exited, status=0/SUCCESS)
    Process: 164 ExecStart=/usr/bin/ip link set dev %i up (code=exited, status=0/SUCCESS)
    Main PID: 177 (code=exited, status=0/SUCCESS)
    Nov 09 13:59:24 mycomputer systemd[1]: Starting Wireless network connectivity (wlp2s3)...
    Nov 09 13:59:27 mycomputer systemd[1]: Started Wireless network connectivity (wlp2s3).
    This is the output from ip addr show which, as far as I can tell, is all correct:
    # ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
    valid_lft forever preferred_lft forever
    2: enp2s0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:11:43:4b:d6:ec brd ff:ff:ff:ff:ff:ff
    3: wlp2s3: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc pfifo_fast master bond0 state DOWN qlen 1000
    link/ether 00:11:43:4b:d6:ec brd ff:ff:ff:ff:ff:ff
    4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:11:43:4b:d6:ec brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.5/26 brd 192.168.1.63 scope global bond0
    valid_lft forever preferred_lft forever
    inet6 fe80::211:43ff:fe4b:d6ec/64 scope link
    valid_lft forever preferred_lft forever
    And here's the output from iwconfig. Notice that it says "Access point: not-associated"
    # iwconfig
    wlp2s3 IEEE 802.11bg ESSID:"MYNETWORKNAME"
    Mode:Managed Channel:0 Access Point: Not-Associated
    Bit Rate:0 kb/s Tx-Power=20 dBm Sensitivity=8/0
    Retry limit:7 RTS thr:off Fragment thr:off
    Encryption key:900A-0B0D-4B57-C007-1F5D-5C63-79 Security mode:open
    Power Management:off
    Link Quality:0 Signal level:0 Noise level:0
    Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
    Tx excessive retries:0 Invalid misc:0 Missed beacon:0
    lo no wireless extensions.
    enp2s0 no wireless extensions.
    bond0 no wireless extensions.

  • Pool Master server failover issue in Oracle VM 2.2.1

    Hello All, we are new to the Oracle VM world. Sorry about the detailed explanation.
    Our current configuration is, server1-poolmaster/utility/VM server & Server2-utility/VM server
    We have guest-VM running on both servers, and serverpool-VIP is configured properly. Below is our OVS-version.
    #rpm -qa | grep -i ovs
    oracle-logos-4.9.17-7ovs
    enterprise-linux-ovs-5-1.0
    ovs-release-2.2-1.0
    ovs-utils-1.0-34
    kernel-ovs-2.6.18-128.2.1.4.25.el5
    ovs-agent-2.3-42
    When we tested HA failover(shutting down server1), it work fine as expected. Pool master moved from server1 to server2, and guest VM restarted on server-2(which was running on server1 earlier).
    Now-- Pool master is server2.
    When we shut down server2 now, the pool master is not migrated to server1, the guest VMs (running on server2) all went to power-off mode, and the server pool is in 'inactive' status.
    We found the below error in server1's /var/log/messages. It seems like a dead-lock situation: the serverpool-VIP is not moved from server2 to server1 until server2 comes back online. Why is that? The expected result should be that the pool-master role and the serverpool-VIP move to server1, but they didn't.
    Anyone experienced this? Any help/ input is appreciated.
    log file from server1's /var/log/ovs-agent/ovs_remaster.log
    2011-01-14 01:47:56 INFO=> run(): release_master_dlm_lock ...
    2011-01-14 01:48:02 INFO=> run(): release_master_dlm_lock ...
    2011-01-14 01:48:08 INFO=> run(): release_master_dlm_lock ...
    2011-01-14 01:48:14 INFO=> run(): release_master_dlm_lock ...
    2011-01-14 01:48:20 INFO=> run(): release_master_dlm_lock ...
    2011-01-14 01:48:26 INFO=> run(): release_master_dlm_lock ...
    ***** At this time it's waiting to release the server pool-VIP on server 2
    *** Once server2 came online, serverpool-VIP released and taken by server1***
    2011-01-14 01:54:11 INFO=> cluster_get_next_master: => {"status": "SUCC", "value": "10.24.60.41"}
    2011-01-14 01:54:11 INFO=> run(): cluster_get_next_master: => {"status": "SUCC", "value": "10.24.60.41"}
    2011-01-14 01:54:13 INFO=> run(): clusterm_setup_master_env: => {"status": "SUCC"}
    2011-01-14 01:54:20 INFO=> run(): i am the new master. vip=10.24.60.45
    truncated logs from server1's /var/log/messages
    Jan 14 01:46:40 fwblade1 kernel: ocfs2_dlm: Node 1 leaves domain 70FFE4CF84634F5DB61BEA66E04693A7
    Jan 14 01:46:40 fwblade1 kernel: ocfs2_dlm: Nodes in domain ("70FFE4CF84634F5DB61BEA66E04693A7"): 0
    Jan 14 01:47:59 fwblade1 kernel: ocfs2_dlm: Node 1 leaves domain ovm
    Jan 14 01:47:59 fwblade1 kernel: ocfs2_dlm: Nodes in domain ("ovm"): 0
    Jan 14 01:48:55 fwblade1 kernel: o2net: connection to node fwblade2.wg.kns.com (num 1) at 10.24.60.42:7777 has been idle for 30.0 second
    s, shutting it down.
    Jan 14 01:48:55 fwblade1 kernel: (0,0):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1294987705.66
    5702 now 1294987735.663612 dr 1294987705.665695 adv 1294987705.665724:1294987705.665725 func (53ed487f:505) 1294987705.665424:1294987705
    .665428)
    Jan 14 01:48:55 fwblade1 kernel: o2net: no longer connected to node fwblade2.wg.kns.com (num 1) at 10.24.60.42:7777
    Jan 14 01:48:55 fwblade1 kernel: (5190,0):dlm_send_remote_lock_request:333 ERROR: status = -112
    Jan 14 01:48:55 fwblade1 kernel: (5186,2):dlm_send_remote_lock_request:333 ERROR: status = -107
    Jan 14 01:48:55 fwblade1 kernel: (5190,0):dlm_send_remote_lock_request:333 ERROR: status = -107
    Jan 14 01:48:55 fwblade1 kernel: (5186,2):dlm_send_remote_lock_request:333 ERROR: status = -107
    ** the above message is repeated till server2 came online ***
    Jan 14 01:48:57 fwblade1 kernel: (4694,2):dlm_drop_lockres_ref:2211 ERROR: status = -107
    Jan 14 01:48:57 fwblade1 kernel: (4694,2):dlm_purge_lockres:206 ERROR: status = -107
    Jan 14 01:48:57 fwblade1 kernel: (4694,2):dlm_drop_lockres_ref:2211 ERROR: status = -107
    Jan 14 01:48:57 fwblade1 kernel: (4694,2):dlm_purge_lockres:206 ERROR: status = -107
    Jan 14 01:49:30 fwblade1 kernel: (4651,0):ocfs2_dlm_eviction_cb:98 device (253,0): dlm has evicted node 1
    Jan 14 01:49:30 fwblade1 kernel: (32373,0):dlm_get_lock_resource:844 78CD07B6D4C34CEAB756BF56E6D9C561:M00000000000000000002182aa14db5: a
    t least one node (1) to recover before lock mastery can begin
    ** Still no sign of server1 taking up the serverpool-VIP, all the guest-VM are still power-off status***
    Jan 14 01:49:35 fwblade1 kernel: (4695,0):dlm_get_lock_resource:844 78CD07B6D4C34CEAB756BF56E6D9C561:$RECOVERY: at least one node (1) to
    recover before lock mastery can begin
    Jan 14 01:49:35 fwblade1 kernel: (4695,0):dlm_get_lock_resource:878 78CD07B6D4C34CEAB756BF56E6D9C561: recovery map is not empty, but mus
    t master $RECOVERY lock now
    Jan 14 01:49:35 fwblade1 kernel: (4695,0):dlm_do_recovery:524 (4695) Node 0 is the Recovery Master for the Dead Node 1 for Domain 78CD07
    B6D4C34CEAB756BF56E6D9C561
    ** still no luck.. all guest VM are down***
    Jan 14 01:53:59 fwblade1 kernel: (5186,1):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:53:59 fwblade1 kernel: (5190,10):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:53:59 fwblade1 kernel: (5186,1):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:53:59 fwblade1 kernel: (5190,10):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:53:59 fwblade1 kernel: (5186,1):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:54:00 fwblade1 kernel: (5190,10):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:54:00 fwblade1 kernel: (5186,1):dlm_send_remote_lock_request:333 ERROR: status = -92
    Jan 14 01:54:00 fwblade1 kernel: ocfs2_dlm: Node 1 joins domain 78CD07B6D4C34CEAB756BF56E6D9C561
    Jan 14 01:54:00 fwblade1 kernel: ocfs2_dlm: Nodes in domain ("78CD07B6D4C34CEAB756BF56E6D9C561"): 0 1
    Jan 14 01:54:00 fwblade1 kernel: (5190,10):dlmlock_remote:269 ERROR: dlm status = DLM_IVLOCKID
    Jan 14 01:54:00 fwblade1 kernel: (5190,10):dlmlock:747 ERROR: dlm status = DLM_IVLOCKID
    Jan 14 01:54:00 fwblade1 kernel: (5190,10):ocfs2_lock_create:997 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource F000000
    000000000000a50dd279960c: bad lockid
    Jan 14 01:54:00 fwblade1 kernel: (5190,10):ocfs2_file_lock:1584 ERROR: status = -22
    Jan 14 01:54:00 fwblade1 kernel: (5190,10):ocfs2_do_flock:79 ERROR: status = -22
    Jan 14 01:54:00 fwblade1 kernel: (5186,1):dlmlock_remote:269 ERROR: dlm status = DLM_IVLOCKID
    Jan 14 01:54:00 fwblade1 kernel: (5186,1):dlmlock:747 ERROR: dlm status = DLM_IVLOCKID
    Jan 14 01:54:00 fwblade1 kernel: (5186,1):ocfs2_lock_create:997 ERROR: DLM error DLM_IVLOCKID while calling dlmlock on resource F0000000
    00000000000a50dd279960c: bad lockid
    Jan 14 01:54:00 fwblade1 kernel: (5186,1):ocfs2_file_lock:1584 ERROR: status = -22
    Jan 14 01:54:00 fwblade1 kernel: (5186,1):ocfs2_do_flock:79 ERROR: status = -22
    Jan 14 01:54:05 fwblade1 kernel: ocfs2_dlm: Node 1 joins domain 70FFE4CF84634F5DB61BEA66E04693A7
    Jan 14 01:54:05 fwblade1 kernel: ocfs2_dlm: Nodes in domain ("70FFE4CF84634F5DB61BEA66E04693A7"): 0 1
    ** Now server2 came online(old pool-master) and server-pool-VIP is moved to server1.** All guest-VM are restarted on SERVER2 itself.
    Thanks
    Prakash

    You might be running into an OCFS2 bug. Check the bug list for bug 1099:
    http://oss.oracle.com/bugzilla/show_bug.cgi?id=1099
    Also related to this subject might be bugs 1095 and 1080. You might want to check with the OCFS2 guys at Oracle and participate in resolving this bug. I'm not sure this is the case, but I think this is a good starting point.
    Please keep us posted.
    Regards,
    Johan Louwers.

  • Reliability: OD Master & Replica, or IP-based failover?

    I have an Xserve that serves approx. 60 users and around 200GB of shared files. The primary Xserve is the OD master for my workgroup, and the backup Xserve is an OD replica.
    I'm new to OS X Server (but not client), so I don't know what happens in the event of, say, the OD master dying. Do client login/access requests automatically get answered by the OD replica? The reason I'm asking this is I'm trying to decide between using two separate servers with an OD master & replica relationship, or using IP-based failover where the backup server would effectively "pretend" to be the primary server in case of a problem.
    But with IP-based failover, I don't know if the backup server will automatically know to become an OD master or not. Or, perhaps, the OD integration is completely separate; and I can choose to use IP-based failover and do OD replication whichever way I want to. Does anyone have concrete info?

    If your OD Master fails the OD Replica will automatically start handling authentication requests so for the most part you're covered (at least users can still log in).
    As noted in that .pdf, though, the replica doesn't allow changes to the directory other than password changes for users with Open Directory passwords.
    The main issue is how long your master is going to be down, and the relative pain of promoting the replica to be a master.
    For various reasons there is no automatic promote-to-master function because most failures of the master are temporary (e.g. a simple reboot after a software update).
    For the failover question I would use IP failover for any services that don't have their own failover mechanism. In all cases you need to write failover scripts to startup whatever processes you need your standby machine to take over. In the case of OD there is a built-in, automatic failover process so I would use that. The only exception to that would be if you wanted the replica to become a master server should the master fail, but I can't think of any circumstance where that would be the case.

  • How to achieve failover when the master definition site is down ?

    I have two Oracle 8i instances in a multi-master async replication environment. Two J2EE application servers connect to each of the Oracle instances respectively. When one of the Oracle instances is down, I'd like to have my J2EE application fail over to the second Oracle instance automatically, without human intervention. I understand I can use the OCI driver to allow my J2EE app to connect to the other Oracle instance when the first is down. But here are some questions:
    1. If I understand it correctly, there is only one master definition site. Let's say my master definition site sits with the first Oracle instance. If this server is down, the DBA has to go in to configure the second Oracle instance to be the master definition site. Can this be done automatically, without the DBA?
    2. If the master definition site is down and the DBA is yet to come in, will the second Oracle instance still work? I understand the replication won't work, but will normal database reads/writes work? What happens between the master definition site going down and its being reconfigured?
    3. Are there any low-cost third-party tools that can handle what I wanted?
    Thanks in advance,

    To answer your questions:
    1. Yes, there is only one master def site. It is not important to relocate the master def site in the event of its failure; the def site is used as the source for replication definition generation.
    2. It depends upon what type of multi-master replication you have implemented. If you are using async replication, the site that is up will continue to allow updates and will queue the transactions. When the other site(s) are active again, it will forward the transactions. Note that this can cause untold problems if you have a heavily used system and narrow bandwidth. If, on the other hand, you are using synchronous replication, when one site goes down the other site(s) lock their tables to prevent DML/DDL changes. Then you have to decide whether you can wait for the down site to be restored, or break replication to continue operations.
    3. Yes, there are a few; one that comes to mind, I believe, is Shareplex.
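    As a concrete illustration of the client-side failover the OCI driver gives you, a tnsnames.ora entry can list both instances with connect-time failover enabled. This is only a sketch: the hostnames, port, and service name are placeholders, and depending on your 8i setup you may need SID rather than SERVICE_NAME in CONNECT_DATA:

    ```
    MYDB =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (LOAD_BALANCE = OFF)
          (FAILOVER = ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = db1.example.com)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = db2.example.com)(PORT = 1521))
        )
        (CONNECT_DATA = (SERVICE_NAME = mydb))
      )
    ```

    With FAILOVER = ON and LOAD_BALANCE = OFF the client tries the addresses in order, so new connections go to the second host only when the first is unreachable; sessions that are already established still need application-level retry to move over.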

  • MySQL Master/Slave w/IP-Failover

    Hello All,
    I just set up a MySQL master and slave, which are working great! I used these instructions and it is working excellently on Leopard Server 10.5.8 running the MySQL 5.1.37 GA release:
    http://homepage.mac.com/kelleherk/iblog/C711669388/E351220100/index.html
    My question is: is it possible to set up IP failover on these servers so that when the MySQL master server goes down my clients can use the MySQL slave?
    Or do I need to setup a master/master environment first that both replicate to each other?
    Any help would be greatly appreciated!
    Thank you very much!

    There are many ways of achieving this, but the 'right' approach depends a lot on your network infrastructure and environment.
    For example, you probably have some kind of application that's talking to the database server. In some respects the 'best' solution is to have the application be aware of the database setup and failover its connections to the replica if the master goes away. This kind of application-awareness is likely to be the most robust and scalable solution if your application supports it. Without knowing the application it's impossible to predict if this is an option for you.
    Another approach is to use a load balancer - point your application to the load balancer and let the load balancer decide which database server to relay the connection to. Due to the cost of load balancers this is likely only an option for larger sites but it's one to bear in mind.
    You could go the IP-failover route but this is more complex since you will likely need to restart MySQL after a failover event. That's because MySQL won't automatically listen to the newly-assigned IP address, so your failover script will need to stop MySQL, reconfigure the network, then restart MySQL. Clearly this won't be a seamless failover (or failback).
    Another approach to consider would be viable if you have multiple application servers, and that would be to set half your servers to talk to one server and half to talk to the other. That way if you do lose one database server you still have half your application servers running. This avoids the whole issue of needing to run IPFailover provided the active half of your application servers can handle the load.
    Of course, all these approaches require two-way replication - you need to be able to update your secondary database and have the updates push back to the master. If your secondary is read-only your application is likely to fail no matter what scenario you use (unless your application is primarily SELECTs and few INSERTS and it can fail gracefully when an INSERT fails).
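    The application-aware approach described above can be sketched in a few lines. The following is a minimal illustration, not a specific driver's API: it assumes each database server is represented by a zero-argument connect callable, and the error class and names are placeholders.

    ```python
    def connect_with_failover(connectors):
        """Return the first successful connection from an ordered list of
        zero-argument connect callables (primary first, then replicas).

        Raises the last error if every server is unreachable.
        """
        last_error = None
        for connect in connectors:
            try:
                return connect()
            except OSError as exc:  # in real code, catch your driver's error class
                last_error = exc
        raise last_error
    ```

    In practice the application would also need a story for failback, and, as noted above, the fallback server must actually accept the writes the application sends it.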

  • Mac OS X 10.7.2 Lion Client that bind to replica OD 10.7.2 Server refuse to failover to look at Master 10.7.2 OD Server.

    Hi All,
    I got a tricky situation here.
    My Setup
    Mac OS X  Master OD 10.7.2 Server
    Mac OS X Replica OD 10.7.2 Server
    Clients are bind to the replica Mac OS X  server( 10.7.2) for MCX management and Bind to AD for Authentication.
    Here comes the tricky part:
    When I shut down the replica Mac OS X server (10.7.2) to test whether the Mac client (10.7.2) will fail over and point to the master OD (10.7.2) server, it refuses to fail over.
    Likewise, if I bind the Mac client (10.7.2) to the master OD server and shut down the master OD server to test whether the client will fail over to the replica OD server for MCX, it refuses to fail over and shows the famous red light.
    But manual binding works like a charm:
    If I go into Directory Utility and manually bind OD and AD, it works like a charm.
    The client will then automatically fail over to the replica or master, depending on which Open Directory it is bound to.
    A single command of authenticated binding, or using a binding script, still produces the same result:
    dsconfigldap -f -a "${ODM_SERVER}" -c "${COMPUTER_ID}" -u "${ADMIN_LOGIN}" -p "${ADMIN_PWD}"
    My binding script (took from DeployStudio Bind OD script  )
    http://pastebin.com/ncXvAAgZ
    I am at a lost. Any suggestion will be good.
    Thanks
    Roy

    Someone in the Lion Server Forum might know?
    Regards,
    Colin R.

  • OAM- "You do not have sufficient access rights" message with Master Admin

    Customer has configured the OAM system to have both the primary and the secondary side for failover purposes. The back end directory server on both systems are in sync. The primary side of the systems works well as far as this issue is concerned.
    On the secondary side, if you login with the MASTER administrator of the system and click 'Identity System Console' or click any of the configurations under the Configurations in the User Manager, you get the error message saying "You do not have sufficient access rights". However, if they navigate to the Access system on the same browser and access the "Access System Console", and then navigate back to the Identity system, the Master Administrative rights are granted and now have a full access to the system.
    We tried following things to resolve the issue, but could not resolve it:
    1) Tried deleting 'cookieencryptionkey' which is found under "obcontainerid=encryptionkey,o=oblix" and restarted both the Identity Servers.
    2) Confirmed that the OAM administrator is present in cn=Web Masters,o=Oblix,<> and cn=Directory Administrators,o=Oblix,<> from the LDAP.
    3) Under the apps=PSC node, checked the Advance Properties for the 'obuniquememberStr' attribute:
    - Master Web Resource Admins (cn=master web resource admins, obapp=PSC, o=oblix, ...)
    Made sure that the values for the 'obuniquememberStr' attribute has the correct value there.
    4) Reconfigured the Secondary Identity Server.
    None of the above really helped to resolve the issue.
    Could anybody please help to get rid of this issue?
    -Amol

    Hi Vinod,
    Here is the customer's response to your above 2 questions:
    1. We have 4 Directory server profiles for Identity servers; one for user data and one for configuration data for each server.
    I have at least reduced them to two and used only the ones initially used by the primary identity server as our user and configuration data do not reside together. User data is consumed via OVD.
    However, this does not seem to have any effect on the current behavior.
    2. All components except for the access server are on 10.1.4.2 and the access server is on 10.1.4.1
    Also below are the errors from the oblogs:
    Identity Server log
    =============
    2008/03/19@10:04:16.508530 4332 262160 PPP INFO 0x000008C7 obeventcatalog.cpp:183 "Cannot find the action" function^ObEventCatalog::GetActionEntry2Modify() actionName^ENCRYPTION_cookieEncryptionKey
    Access Server Log
    =============
    2008/03/19@10:03:56.329959 13608 1687633 CONNECTIVITY DEBUG3 0x00000201 /usr/abuild/Oblix/1014lwhf/palantir/netlib/src/obmessagechannel.cpp:601 "Received " ipaddr^10.217.209.81 ipport^1853 seqno^12 opcode^1 opcodeStr^IsResrcOpProtected Message^ro=t%253d0%2520o%253d%2520no%253d%2520r%253d%2520nr%253d%2520wu%253d/identity/oblix/apps/admin/bin/frontpage_admin.cgi%2520wh%253d10.217.209.81%2520wo%253d1%2520wa%253d0%2520ws%253d st=ma%253d2%2520mi%253d2%2520sg%253d0%2520sm%253d version=3 pd=
    2008/03/19@10:03:56.340433 3099 802864 AUTHENTICATION DEBUG2 0x00000201 /usr/abuild/Oblix/1014lwhf/palantir/aaa_server/src/aaa_service_server.cpp:2779 "Authorization successful"
    Webgate Log
    ==========
    2008/03/19@10:04:05.661000 5796 4516 HTTP_REQ DEBUG3 0x00000201 \Oblix\coreid1014\palantir\webgate2\src\isprotected.cpp:185 "Resource is protected" ResourceOperation^GET ResourceType^http Resource^//10.217.209.81/identity/oblix/apps/admin/bin/front_page_admin.cgi authnSchemeName^Oracle Access and Identity Basic Over LDAP
    2008/03/19@10:04:14.661000 5796 4516 LDAP DEBUG3 0x00000201 \Oblix\coreid1014\np_common\db\ldap\util\ldap_util2.cpp:537 "MLK-Memory leak for LDAP error information. This will show up as memory leak in LDAP SDK calls." key^25
    2008/03/19@10:04:14.661000 5796 4516 LDAP DEBUG3 0x00000201 \Oblix\coreid1014\np_common\db\ldap\util\ldap_util2.cpp:537 "MLK-Memory leak for LDAP error information. This will show up as memory leak in LDAP SDK calls." key^25
    2008/03/19@10:05:54.552000 5796 5256 CONFIG DEBUG2 0x00000201 \Oblix\coreid1014\palantir\access_api\src\obconfig.cpp:865 "Client configuration not updated"
    2008/03/19@10:05:54.552000 5796 5256 CONFIG INFO 0x0000182D \Oblix\coreid1014\palantir\access_api\src\obconfig.cpp:866 "The Access Server has returned a fatal error with no detailed information." raw_code^302
    I checked the OVD logs but did not find any error in them. The customer also tried to unprotect the /identity and /access URLs, but the issue persists.
    Also, I do not feel this is a bug, because this environment was working for quite a few months without any such issues, and there were no changes made to the OVD/AD configurations. However, the server that hosts OVD/AD was shut down, and when it was restarted, we started experiencing this issue.

  • Failover cluster using BEA WL 7.0

    Hi there,
    Since this is my first posting here, let me first describe my background: I have some experience developing J2EE applications using various application servers, and I have some knowledge of setting up a network and a BEA server, but mostly in Unix environments. I know a little bit about the concepts of clustering, but have never done it myself.
    Now I have some trouble understanding the possibilities of BEA WL clustering. What I want to do is the following:
    I have two machines with BEA WL Server; one should be the "master", the other a "backup". Normally the master should receive all requests, i.e. the backup is idle. Only if the master fails should the backup take over its functionality (fail-over). When the master is back again, the backup returns control (fail-back). The process of fail-over and fail-back may take some time, and I do not really need to transfer sessions between them (i.e., if the master fails, all sessions will be terminated). So this is really a very low level of "high availability": I simply want to make sure that there is another server which can process my requests if the first fails.
    The BEA servers run on two Windows 2000 Advanced Server machines (yuck!); the whole situation looks like this:
                |
            +--------+
            | Apache |   (Win 2000)
            +--------+
                |
              +----+
              | FW |
              +----+
                |
          +-----+------+
          |            |
       +-----+      +-----+
       | BEA |      | BEA |   (Win 2000)
       +-----+      +-----+
          |            |
          +-----+------+
                |
              +----+
              | FW |
              +----+
                |
    I read a couple of documents on BEA clustering and some postings in various newsgroups, but until now I still do not know:
    a) if I really need BEA clustering services here, and if yes,
    b) how to use BEA clustering in this situation, which, when using the Apache plugin, does round-robin all the time, AFAIK.
    Maybe I only need to deploy my application to two servers and use some "dispatcher" which directs all requests to the master normally, and only if that is not available, to the backup. Can't this be done without BEA clustering with the hardware shown above (using the MS Clustering Service and/or the BEA Apache plugin)?
    A second question is whether there will be any problem with the FW between Apache and BEA when using the Apache plugin.
    Any help appreciated!
              Thanks in advance
              Jonas
              

    Hi Jonas,
    My experience with WLS clustering is just at the level of preparing a proof of concept and testing with toy applications, so although this helped me understand how to use it, I cannot say I have any experience in a production environment. Having said that, here are my comments.
    As far as I have seen, you do not need to do any clustering at all. One of the biggies about failover is being able to recover the session data, and for that WLS provides several options (replication, file, JDBC, cookies). However, since you do not care about preserving session information, all you really need to do is create a bunch of managed servers (not clustered), the administration server, and the server with the plug-in (Apache, WLS, Netscape). The application would be load balanced across the managed servers by the plug-in, and if any of them fails, the other ones will continue working OK.
    I understand you are talking about only two application servers. Please note that you do not need to keep one of them offline as a backup; you can have both of them serving your application, which will help your performance, and if one fails the whole load will go to the other one.
    You could run the plug-in in WLS on the administration server, and in this way you would need only three servers.
    Hope this helps.
    Ziweth

  • Failover not working correctly on "redundancy-phy" (box to box style)

    Hi,
    I've got 2 CSS 11506 boxes configured with box-to-box failover. Failing the master CSS box itself (powering it down) causes the backup CSS to become master, and all is well.
    However, when the switch that the CSS is connected to fails, the CSS didn't fail over, so I added redundancy-phy to both interfaces connected to the switch and failed the switch again. At this point "show redundancy" shows the master becoming backup, but between 3 and 5 seconds later it re-assumes master status, and it keeps flipping every 60-90 seconds.
    I also tried a service with a type of redundancy-up and saw the same symptoms: it fails over but assumes master again within 3-5 seconds.
    Any help gratefully received!
    Cheers

    box-to-box is the least interesting redundancy mechanism.
    I definitely prefer vip/interface redundancy.
    More complex to configure but better control.
    Regarding your problem: is the switch connected to both CSSes? Do you have a direct link between the CSSes for the redundancy protocol? What version do you run?
    Gilles

  • Is it possible to set up a bigger cluster with different AGs failover in sets of the nodes

    For SQL 2012 AlwaysOn, can it support this scenario?
    Set up a 12-node Windows Server failover cluster (A, B, C, ..., L)
    Create 8 availability groups and let each fail over within 3 nodes of the cluster, e.g.
    AG1 fails over within nodes A, B, C
    AG2 fails over within nodes B, C, D
    AG3 fails over within nodes C, D, E
    etc.
    If this is supported, I guess it is then possible for any node in the Windows cluster to be, at the same time, primary for some AGs and secondary for other AGs (if we don't worry about performance).
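    To visualize the layout being asked about, here is a minimal sketch (in Python, with illustrative node and AG names; this is not any Microsoft API) of the rolling three-node possible-owner windows:

    ```python
    def ag_owner_map(nodes, ag_count, window=3):
        """Map AG names to the window of cluster nodes each may fail over within.

        AG i starts its window at node i, wrapping around the node list.
        """
        return {
            f"AG{i + 1}": [nodes[(i + j) % len(nodes)] for j in range(window)]
            for i in range(ag_count)
        }
    ```

    With `nodes = list("ABCDEFGHIJKL")` and `ag_count = 8`, AG1 maps to A, B, C and AG2 to B, C, D, matching the layout above; a node such as C appears in several AGs' windows, which is exactly the mixed primary/secondary situation the question describes.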

    Microsoft supporting it is one thing; you and your team supporting it is another. Highly available systems need to be kept as simple as possible, so long as they meet your recovery objectives and service level agreements. The question here is this: do all databases in those Availability Groups have the same recovery objectives and service level agreements? That should dictate how you architect your solution.
    Edwin Sarmiento SQL Server MVP | Microsoft Certified Master
    Blog |
    Twitter | LinkedIn
    SQL Server High Availability and Disaster Recovery Deep Dive Course

  • UCCX8.5 HA node 1 not master

    Hi,
    We are running UCCX 8.5 in our dev lab (NFR kit) in an HA (LAN) setup. When I reboot both servers, node 1 is master
    (checked via http://node1server/uccx/isDBMaster)
    but after a while (an hour or so) node 2 is suddenly master.
    When I run CAD I can log in and go ready/not ready, but IPPA says the service is not active; I guess the IP address in the URL should be changed?
    Anyway, why would node 2 be master? I expect node 1 to always be master unless it becomes unavailable.
    Can I force node 1 to be master?
    I am not sure which log to check for issues, if there are any.

    I wonder if we ran into this bug:
    CSCto90417: 3 GC pauses cause CVD heartbeat failures and thereby failover
    (I see all the missed heartbeats on node 2,
    28979: May 29 19:17:04.201 NZST %MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:CVD does not receive heartbeat from node for a long period: nodeId=1,dt=2617
    28980: May 29 19:17:04.201 NZST %MCVD-CVD-4-HEARTBEAT_SUSPECT_NODE_CRASH:CVD suspects node crash: state=Heartbeat State,nodeInfo=Node id=1 ip=a.b.c.d convId=11 cmd=34 viewLen=1,dt=524)
    We are a development partner and don't have access to downloads, so we can't apply any SRs.
    I rebuilt node 2; once it was up and running, I lost realtime stats updating.
    Edit: node 2 just rebooted
    29750: May 29 19:18:05.412 NZST %MCVD-CVD-7-UNK:Heartbeat State activated, retransmitInterval=500, silenceInterval=2550
    29751: May 29 19:18:05.569 NZST %MCVD-PROMPT_MGR-6-NO_SYSTEM_TTS_PROVIDER:No system TTS provider registered
    29752: May 29 19:18:06.571 NZST %MCVD-CVD-5-HEARTBEAT_MISSING_HEARTBEAT:CVD does not receive heartbeat from node for a long period: nodeId=1,dt=1045
    29753: May 29 19:18:06.587 NZST %MCVD-CVD-3-CVD_REGISTER_SUBSCRIBER_ERROR:CVD can not register subscriber on the remote node: nodeId=1,Exception=null
    29754: May 29 19:18:06.588 NZST %MCVD-CVD-7-UNK:Dispatcher:: readMcvdProperty() property file = application.MCVD.properties Property value=2
    29755: May 29 19:18:06.588 NZST %MCVD-CVD-7-UNK:Dispatcher:: processHeartbeatNodeJoinCmd() Restarting system
    29756: May 29 19:18:06.588 NZST %MCVD-CVD-7-UNK:Dispatcher:: updateMcvdProperty() property file= application.MCVD.properties Property value=3
    29757: May 29 19:18:06.588 NZST %MCVD-CVD-3-REBOOT_ON_CVD_REGISTER_SUBSCRIBER_ERROR:CVD can not register subscriber on the remote node due to connection issue. System is being rebooted once to recover from the issue.: Exception=java.rmi.ConnectException: Connection refused to host: a.b.c.d; nested exception is:
    java.net.ConnectException: Connection refused
