Master node down
Dear DBAs,
For maintenance reasons we had to shut down the 3 cluster servers (Windows 2003, 64-bit).
After starting up the master node (with the others still down), Windows got stuck on "Applying profile"; however, if we disconnect this server from the network, the server starts.
We noticed that after reconnecting the server to the network, it takes a long time to discover the storage disks.
We are planning to move to another master node.
Could you please send me a link with step-by-step instructions on how to move to a new master node?
The database is Oracle 10g, version 10.2.0.4.0.
OS: Windows 2003, 64-bit.
Your quick reply is highly appreciated.
Regards
Elie
we noticed that after reconnecting the server to the network, it takes long time to discover the storage disks.
I am not sure that moving would decrease that time. You need to check with your network and storage guys why it's taking so long, since the same will happen with the new node as well.
we are planning to move to another master node.
You need to take a backup of this DB and then move it to the new host. Once done, you would need to add the remaining nodes to that node. But as said above, first find out why you are experiencing the slow connectivity.
Aman....
Similar Messages
-
Abrupt shutdown of master node causes problem
A service in usmbcom1 (LMID) makes a tpacall to a service which is present in both usmbapp1 (master node LMID) and usmbapp2 (slave node LMID). (Here usmbcom1, usmbapp1 and usmbapp2 are LMIDs, whereas the corresponding physical machine unames are usmbd5, usmbd3 and usmbd4; they are separate Sun boxes.) LDBAL = Y and tpacall is done at 25/sec.
Now there are 2 scenarios.
1. While tpacall is in progress we kill the servers on usmbapp1 using the kill command (not kill -9), then clean up the IPCs. Only a few (3-5 out of a total of 5000) messages were lost. This is understandable, since messages which were already in the queue got lost. The rest of the messages were processed by usmbapp2.
2. In this case we switched off the Sun box usmbapp1 (machine name usmbd3) while tpacall was in progress. This time we lost approx. 50% of the messages. However, if we go to the slave machine, i.e. usmbapp2, and manually make it master (tmadmin ... master), then from that point on we stopped losing messages.
Does that mean manual intervention is necessary if the DBBL goes down? Is there anything I am missing while configuring the system?
Hi Scott,
You did understand the scenario and the problems; the answers are quite convincing.
Actually, the QA team here is doing failover testing, and they have both of these (kill and machine shutdown) as their test cases.
However, I would like to know what you meant by High Availability Solutions.
Do you also mean that if I shut down my master machine, an event would be written in the ULOG of the slave, which could be monitored and used to convert the slave into master programmatically (I mean through tpadmcall)?
Thanks
Somashish
Scott Orshan <[email protected]> wrote:
Hi,
I'm not sure if I completely comprehend your situation, but let me take a guess.
When you killed the processes (including the Bridge), which by the way is a bad thing to do to TUXEDO, TCP notified the other connected nodes that the connection had dropped. This happens fairly quickly.
But if you just turn off a machine, TCP may not detect it until it times out, which can take several minutes. Since TUXEDO was doing round-robin load balancing, half the requests were sent to the Bridge with a destination of the dead machine.
To answer your final question, the DBBL has to be migrated manually, unless you are using one of our High Availability solutions that uses an external monitor.
The reason is that it is very hard to distinguish between a network failure or slowdown and a real failure of the Master node. And it would be very bad to have two machines in the domain acting as the Master.
Scott Orshan
BEA Systems
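To make the manual step concrete, here is a hedged sketch of migrating the master (and with it the DBBL) from the backup machine with tmadmin. It assumes a UBBCONFIG with MASTER SITE1,SITE2 and OPTIONS MIGRATE, run on the backup site while the acting master's machine is down; the tmadmin invocation is guarded, since a live Tuxedo domain is assumed and may not be present.

```shell
# Write the tmadmin commands to a file: "master" starts the master/DBBL
# migration, and "y" answers tmadmin's confirmation prompt.
cat > /tmp/migrate_master.cmds <<'EOF'
master
y
EOF
cat /tmp/migrate_master.cmds
# Only invoke tmadmin if a Tuxedo installation is actually on the PATH:
if command -v tmadmin >/dev/null 2>&1; then
  tmadmin < /tmp/migrate_master.cmds
fi
```

After the migration, the former master can rejoin as the backup once its machine is back up.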
Somashish Gupta wrote: (original message quoted above) -
How do we know the Master Node in RAC?
Hi Experts,
We have implemented a 2-node RAC 11g R2 on the Linux platform. My query is: how do we know the master node in RAC?
Thanks
Venkat
Hi,
There is no such thing as "master node" in RAC configurations. All nodes are equal.
Sebastian wrote: The only thing closest to something like a "Master" is that only one node has the role to update the Oracle Cluster Registry (OCR), all other nodes only read the OCR.
{message:id=9827969}
and
{message:id=2154683}
Regards,
Levi Pereira -
Hi,
I am reading 'Advanced RAC troubleshooting' written by Riyaj Shamsudeen and have some questions about the wait event 'gc current/cr grant 2-way'.
It says:
CR – disk read
Select c1 from t1 where n1 = :b1;
1. User process in instance 1 requests the master for a PR mode block.
2. Assuming no current owner, the master node grants the lock to inst 1.
3. User process in instance 1 reads the block from the disk and holds PR.
Why is the master node inst 2? I think that if there is no current owner, inst 1 should read the block from the disk directly and become the master.
If only one block is read from the disk and the object that block belongs to has never been read before, how is the master node determined? Is it the requesting node?
Please help me.
Thanks.
Hi,
The master node for a block is determined by the number of times a particular node has accessed that block; whichever node accesses it more becomes the master node for that particular block, as I believe.
The wait event gc current appears when the request is for current mode, i.e. for updating blocks.
cr grant 2-way appears when the same block is modified on two nodes.
Corrections are highly appreciated.
Tinku -
Hi,
I would just like to know: how can I tell which node is the master in RAC? Is there any command for this?
cheers
fzheng
One way to find that information is to look at the $ORA_CRS_HOME/log/`hostname`/cssd/ocssd.* log files, where you would find something like this:
ocssd.l05:[ CSSD]CLSS-3001: local node number 1, master node number 1
ocssd.l05:[ CSSD]2007-02-08 13:58:44.508 [507920] >TRACE: clssgmEstablishMasterNode: MASTER for 21 is node(1) birth(6)
ocssd.l05:[ CSSD]2007-02-08 13:58:44.508 [507920] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
ocssd.l05:[ CSSD]2007-02-08 13:58:44.514 [507920] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
ocssd.l05:[ CSSD]CLSS-3001: local node number 1, master node number 1
ocssd.l05:[ CSSD]2007-02-08 14:01:46.236 [524304] >TRACE: clssgmEstablishMasterNode: MASTER for 22 is node(1) birth(6)
ocssd.l05:[ CSSD]2007-02-08 14:01:46.236 [524304] >TRACE: clssgmMasterCMSync: Synchronizing group/lock status
ocssd.l05:[ CSSD]2007-02-08 14:01:46.241 [524304] >TRACE: clssgmMasterSendDBDone: group/lock status synchronization complete
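Lines like those can also be parsed mechanically. Below is a small sketch that pulls the most recent "master node number" out of such output; the sample lines are inlined here, since on a real node the files live under $ORA_CRS_HOME/log/`hostname`/cssd/.

```shell
# Inline a couple of sample ocssd lines (stand-ins for the real ocssd.* files).
cat > /tmp/ocssd.sample <<'EOF'
ocssd.l05:[ CSSD]CLSS-3001: local node number 1, master node number 1
ocssd.l05:[ CSSD]2007-02-08 14:01:46.236 [524304] >TRACE: clssgmEstablishMasterNode: MASTER for 22 is node(1) birth(6)
EOF
# Print the last reported master node number.
sed -n 's/.*master node number \([0-9][0-9]*\).*/\1/p' /tmp/ocssd.sample | tail -1
```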
But this information is not really that critical for ongoing operations or regular maintenance; it is just informational for all practical purposes.
HTH
Thanks
Chandra Pabba -
Dear all,
I have a one-primary, two-secondary setup.
Both the primary and one secondary (A) are running fine, but when I ran ./vda-db-status on the other secondary (B), it showed it as down:
data node down
Secondary B had actually been shut down due to a power failure. When we restarted the server it gave us a boot-archive error; fsck -F ufs /dev/rdisk/... solved that problem.
But after booting, it was still down in the MySQL Cluster status. This server is also a data node in the cluster.
I checked svcs: svc:/application/database/vdadb:core is showing offline*.
In /var/opt/SUNWvda/mysql-cluster/ndb_3.error.log I found the following:
Current byte-offset of file-pointer is: 1566
Time: Tuesday 8 June 2010 - 13:08:28
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: DBLQH: File system open failed. OS errno: 2
Error object: DBLQH (Line: 3083) 0x0000000a
Program: /opt/SUNWvda/mysql/bin/ndbd
Pid: 875
Version: mysql-5.1.37 ndb-7.0.8a
Trace: /var/opt/SUNWvda/mysql-cluster/ndb_3_trace.log.1
***EOM***
Time: Tuesday 8 June 2010 - 13:32:00
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: DBLQH: File system open failed. OS errno: 2
Error object: DBLQH (Line: 3083) 0x0000000a
Program: /opt/SUNWvda/mysql/bin/ndbd
Pid: 5686
Version: mysql-5.1.37 ndb-7.0.8a
Trace: /var/opt/SUNWvda/mysql-cluster/ndb_3_trace.log.2
***EOM***
Time: Tuesday 8 June 2010 - 13:42:26
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: DBLQH: File system open failed. OS errno: 2
Error object: DBLQH (Line: 3083) 0x0000000a
Program: /opt/SUNWvda/mysql/bin/ndbd
Pid: 764
Version: mysql-5.1.37 ndb-7.0.8a
Trace: /var/opt/SUNWvda/mysql-cluster/ndb_3_trace.log.3
***EOM***
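The Status line in those entries already hints at the standard remedy ("restart node initial", i.e. starting ndbd with --initial so the data node wipes its local file system and rebuilds from the other replica); treat that as a hypothesis to confirm, not a guaranteed fix. For triage, the key fields can be pulled out of an entry mechanically, as this sketch over an inlined copy of the log shows:

```shell
# Inline one entry from ndb_3.error.log (copied from the excerpt above).
cat > /tmp/ndb_3.error.log <<'EOF'
Time: Tuesday 8 June 2010 - 13:08:28
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: DBLQH: File system open failed. OS errno: 2
EOF
# Extract the NDB error code and the underlying OS errno (2 = ENOENT).
awk -F': ' '/^Error:/    {print "ndb error " $2}
            /OS errno:/  {sub(/.*OS errno: /, ""); print "os errno " $0}' /tmp/ndb_3.error.log
```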
Further, I saw the following in vdadb:core.log:
.........................(logs omitted)
[ Jun 8 13:08:16 Executing start method ("/opt/SUNWvda/lib/vda-db-service start") ]
Configuration:
MGMT_NODE=[0]; NDBD_NODE=[1]; SQL_NODE=[0]; MULTI_HOST_MODE=[1];
NDBD_CONNECTSTRING=[mycompnay.com]; NDBD_INITIAL_ARG=[]; NDBD_NODE_ID=[3];
MYSQL_BIN=[opt/SUNWvda/mysql/bin];
Starting the Sun Virtual Desktop Infrastructure Database service:
- Starting Data Node... 2010-06-08 13:08:18 [ndbd] INFO -- Configuration fetched from 'mycompnay.com:1186', generation: 1
Arguments: [mycompnay.com ]...
Error
[ Jun 8 13:14:46 Method "start" exited with status 95 ]
[ Jun 8 13:31:35 Leaving maintenance because disable requested. ]
[ Jun 8 13:31:35 Disabled. ]
[ Jun 8 13:31:56 Enabled. ]
[ Jun 8 13:31:56 Executing start method ("/opt/SUNWvda/lib/vda-db-service start") ]
Configuration:
MGMT_NODE=[0]; NDBD_NODE=[1]; SQL_NODE=[0]; MULTI_HOST_MODE=[1];
NDBD_CONNECTSTRING=[mycompnay.com]; NDBD_INITIAL_ARG=[]; NDBD_NODE_ID=[3];
MYSQL_BIN=[opt/SUNWvda/mysql/bin];
Starting the Sun Virtual Desktop Infrastructure Database service:
- Starting Data Node... 2010-06-08 13:31:57 [ndbd] INFO -- Configuration fetched from 'mycompnay.com:1186', generation: 1
Arguments: [mycompnay.com ]...
Error
[ Jun 8 13:38:27 Method "start" exited with status 95 ]
[ Jun 8 13:38:27 Leaving maintenance because disable requested. ]
[ Jun 8 13:38:27 Disabled. ]
[ Jun 8 13:42:21 Executing start method ("/opt/SUNWvda/lib/vda-db-service start") ]
Configuration:
MGMT_NODE=[0]; NDBD_NODE=[1]; SQL_NODE=[0]; MULTI_HOST_MODE=[1];
NDBD_CONNECTSTRING=[mycompnay.com]; NDBD_INITIAL_ARG=[]; NDBD_NODE_ID=[3];
MYSQL_BIN=[opt/SUNWvda/mysql/bin];
Starting the Sun Virtual Desktop Infrastructure Database service:
- Starting Data Node... 2010-06-08 13:42:22 [ndbd] INFO -- Configuration fetched from 'mycompnay.com:1186', generation: 1
Arguments: [mycompnay.com ]...
Error
[ Jun 8 13:48:50 Method "start" exited with status 95 ]
Any ideas? -
How to sort master-node in master-detail scenario without losing subnodes?
Hi,
I've a master-detail scenario and want to sort my master node.
How can I sort the master node without losing the detail-subnodes?
If I take a look in class CL_WDR_TABLE_METHOD_HNDL and method IF_WD_TABLE_METHOD_HNDL~APPLY_SORTING
Sorting is done by
- unload node with context_node->get_static_attributes_table into an internal table
- keeping node state like lead_selection(s) and attribute_properties
- sort internal table
- bind internal table to node
- set lead_selection and properties
But all subnodes are gone.
How do you sort a master node?
Thanks and Regards
CarstenI think you have to write your own logic for that . May be you can implement IF_WD_TABLE_METHOD_HNDL in your class and extend the current logic to support subnodes.
-
TUXEDO11 in MP mode can't boot TMS_ORA on the non-master node
I have Tuxedo 11 installed on an Ubuntu 9.10 server as the master node (SITE1) and on CentOS 6.2 as the non-master node (SITE2). The client program uses WSL to communicate with the servers. Tuxedo 11 has no patch, and both Tuxedo 11 and Oracle 10gR2 are 32-bit, running on 32-bit OSes.
On both nodes a TMS_ORA associated with an Oracle 10gR2 database was installed. When I issue "tmboot -y", the servers on the master node boot normally; however, the TMS_ORA server and the server using TMS_ORA on SITE2 report "Assume started (pipe).". There is no core file for these servers on SITE2, and in the ULOG on SITE2 there is no Error or Warning concerning the failed start of TMS_ORA.
To check that my servers and TMS_ORA work OK on SITE2, I used the master command under tmadmin to first swap the master and non-master nodes; after the migration succeeded, on SITE2 I issued "tmshutdown -cy" and then "tmboot -y". Surprisingly, all the servers booted correctly on both nodes. I then migrated the master node back to SITE1, and the servers were still alive; my client program can successfully call them, which means the TMS_ORA and the servers using TMS_ORA work fine on both nodes.
The problem is, when I "tmshutdown -s server" (those on SITE2, either TMS_ORA or a server using TMS_ORA) and then use "tmboot -s server" to boot them again, I get "Assume started (pipe)." reported and those server processes don't appear on SITE2.
It seems that I can't boot TMS_ORA on SITE2 from the master node SITE1, but I can boot all the servers correctly if SITE2 is acting as the master node. Servers that don't use TMS_ORA on SITE2 can be booted successfully from SITE1.
Can anybody figure out what's wrong? Thanks in advance.
Best regards,
Orlando
Edited by: user10950876 on 2012-6-13 3:02 PM
Edited by: user10950876 on 2012-6-13 3:33 PM
Hi Todd,
Thank you for your reply. Following are my ULOG and tmboot report:
ubuntu9:~/tuxapp$tmboot -y
Booting all admin and server processes in /home/xp/tuxapp/tuxconfig
INFO: Oracle Tuxedo, Version 11.1.1.2.0, 32-bit, Patch Level (none)
Booting admin processes ...
exec DBBL -A :
on SITE1 -> process id=8803 ... Started.
exec BBL -A :
on SITE1 -> process id=8804 ... Started.
exec BBL -A :
on SITE2 -> process id=3964 ... Started.
Booting server processes ...
exec TMS_ORA -A :
on SITE1 -> process id=8812 ... Started.
exec TMS_ORA -A :
on SITE1 -> process id=8838 ... Started.
exec TMS_ORA2 -A :
on SITE2 -> CMDTUX_CAT:819: INFO: Process id=3967 Assume started (pipe).
exec TMS_ORA2 -A :
on SITE2 -> CMDTUX_CAT:819: INFO: Process id=3968 Assume started (pipe).
exec WSL -A -- -n //128.0.88.24:5000 -m 3 -M 5 -x 5 :
on SITE1 -> process id=8841 ... Started.
8 processes started.
ULOG on ubuntu9
134547.ubuntu9!DBBL.8803.3071841984.0: 06-14-2012: client high water (0), total client (0)
134547.ubuntu9!DBBL.8803.3071841984.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134547.ubuntu9!DBBL.8803.3071841984.0: LIBTUX_CAT:262: INFO: Standard main starting
134549.ubuntu9!DBBL.8803.3071841984.0: CMDTUX_CAT:4350: INFO: BBL started on SITE1 - Release 11112
134550.ubuntu9!BBL.8804.3072861888.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit, Patch Level (none)
134550.ubuntu9!BBL.8804.3072861888.0: LIBTUX_CAT:262: INFO: Standard main starting
134550.ubuntu9!BRIDGE.8806.3072931520.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134550.ubuntu9!BRIDGE.8806.3072931520.0: LIBTUX_CAT:262: INFO: Standard main starting
134555.ubuntu9!DBBL.8803.3071841984.0: CMDTUX_CAT:4350: INFO: BBL started on SITE2 - Release 11112
134556.ubuntu9!BRIDGE.8806.3072931520.0: CMDTUX_CAT:1371: INFO: Connection received from redhat62
134557.ubuntu9!TMS_ORA.8812.3057321664.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134557.ubuntu9!TMS_ORA.8812.3057321664.0: LIBTUX_CAT:262: INFO: Standard main starting
134559.ubuntu9!TMS_ORA.8838.3056805568.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134559.ubuntu9!TMS_ORA.8838.3056805568.0: LIBTUX_CAT:262: INFO: Standard main starting
134559.ubuntu9!WSL.8841.3072153920.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134559.ubuntu9!WSL.8841.3072153920.0: LIBTUX_CAT:262: INFO: Standard main starting
134559.ubuntu9!WSH.8842.3072411328.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134559.ubuntu9!WSH.8842.3072411328.0: WSNAT_CAT:1030: INFO: Work Station Handler joining application
134559.ubuntu9!WSH.8843.3073169088.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134559.ubuntu9!WSH.8843.3073169088.0: WSNAT_CAT:1030: INFO: Work Station Handler joining application
134559.ubuntu9!WSH.8844.3073066688.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134559.ubuntu9!WSH.8844.3073066688.0: WSNAT_CAT:1030: INFO: Work Station Handler joining application
ULOG on redhat62
134615.redhat62!tmloadcf.3961.3078567616.-2: 06-14-2012: client high water (0), total client (0)
134615.redhat62!tmloadcf.3961.3078567616.-2: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134615.redhat62!tmloadcf.3961.3078567616.-2: CMDTUX_CAT:872: INFO: TUXCONFIG file /home/tuxedo/tuxedo/simpapp/tuxconfig has been updated
134617.redhat62!BSBRIDGE.3963.3078089312.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134617.redhat62!BSBRIDGE.3963.3078089312.0: LIBTUX_CAT:262: INFO: Standard main starting
134619.redhat62!BBL.3964.3079420512.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit, Patch Level (none)
134619.redhat62!BBL.3964.3079420512.0: LIBTUX_CAT:262: INFO: Standard main starting
134620.redhat62!BRIDGE.3965.3077868128.0: 06-14-2012: Tuxedo Version 11.1.1.2.0, 32-bit
134620.redhat62!BRIDGE.3965.3077868128.0: LIBTUX_CAT:262: INFO: Standard main starting
134620.redhat62!BRIDGE.3965.3077868128.0: CMDTUX_CAT:4488: INFO: Connecting to ubuntu9 at //128.0.88.24:1800
ubb file content (just in case you want to see it too; I've commented out all the services in the ubb file except the TMS_ORA2 group on SITE2, to make it more distinct):
*RESOURCES
IPCKEY 123456
DOMAINID TUXTEST
MASTER SITE1, SITE2
MAXACCESSERS 50
MAXSERVERS 35
MAXCONV 10
MAXGTT 20
MAXSERVICES 70
OPTIONS LAN, MIGRATE
MODEL MP
LDBAL Y
*MACHINES
DEFAULT: MAXWSCLIENTS=30
ubuntu9 LMID=SITE1
APPDIR="/home/xp/tuxapp"
TUXCONFIG="/home/xp/tuxapp/tuxconfig"
TUXDIR="/home/xp/tuxedo11gR1"
TLOGDEVICE="/home/xp/tuxapp/TLOG"
TLOGNAME="TLOG"
TLOGSIZE=100
TYPE=Linux
ULOGPFX="/home/xp/tuxapp/ULOG"
ENVFILE="/home/xp/tuxapp/ENVFILE"
UID=1000
GID=1000
redhat62 LMID=SITE2
TUXDIR="/usr/oracle/tuxedo11gR1"
APPDIR="/home/tuxedo/tuxedo/simpapp"
TLOGDEVICE="/home/tuxedo/tuxedo/simpapp/TLOG"
TLOGNAME="TLOG"
TUXCONFIG="/home/tuxedo/tuxedo/simpapp/tuxconfig"
TYPE=Linux
ULOGPFX="/home/tuxedo/tuxedo/simpapp/ULOG"
ENVFILE="/home/tuxedo/tuxedo/simpapp/ENVFILE"
UID=501
GID=501
*GROUPS
BANK1
LMID=SITE1 GRPNO=1 TMSNAME=TMS_ORA TMSCOUNT=2
OPENINFO="Oracle_XA:Oracle_XA+Acc=P/scott/tiger+SesTm=120+MaxCur=5+LogDir=.+SqlNet=xpdev"
CLOSEINFO="NONE"
BANK2
LMID=SITE2 GRPNO=2 TMSNAME=TMS_ORA2 TMSCOUNT=2
OPENINFO="Oracle_XA:Oracle_XA+Acc=P/scott/scott+SesTm=120+MaxCur=5+LogDir=.+SqlNet=tuxdev"
CLOSEINFO="NONE"
WSGRP
LMID=SITE1 GRPNO=3
OPENINFO=NONE
*NETGROUPS
DEFAULTNET NETGRPNO=0 NETPRIO=100
SITE1_SITE2 NETGRPNO=1 NETPRIO=200
*NETWORK
SITE1 NETGROUP=DEFAULTNET
NADDR="//128.0.88.24:1800"
NLSADDR="//128.0.88.24:1500"
SITE2 NETGROUP=DEFAULTNET
NADDR="//128.0.88.215:1800"
NLSADDR="//128.0.88.215:1500"
*SERVERS
DEFAULT:
CLOPT="-A"
#XFER SRVGRP=BANK1 SRVID=1
#TLR_ORA SRVGRP=BANK1 SRVID=2
#TLR_ORA2 SRVGRP=BANK2 SRVID=3
WSL SRVGRP=WSGRP SRVID=4
CLOPT="-A -- -n //128.0.88.24:5000 -m 3 -M 5 -x 5"
*SERVICES
#INQUIRY
#WITHDRAW
#DEPOSIT
#XFER_NOXA
#XFER_XA
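One thing worth checking in a setup like this (a hedged sketch, not a confirmed diagnosis): when a remote server reports "Assume started (pipe)" and then never appears, a common cause is that the process died right after being exec'd, e.g. because Oracle environment variables were missing from that machine's ENVFILE. The check below runs against an inlined sample; on SITE2 the real file is /home/tuxedo/tuxedo/simpapp/ENVFILE per the MACHINES section above, and the variable list is illustrative.

```shell
# Build a sample ENVFILE (stand-in for the real one on the non-master node).
cat > /tmp/ENVFILE <<'EOF'
ORACLE_HOME=/usr/oracle/product/10.2.0
LD_LIBRARY_PATH=/usr/oracle/product/10.2.0/lib
EOF
# Report which of the variables an XA server typically needs are present.
for v in ORACLE_HOME ORACLE_SID NLS_LANG LD_LIBRARY_PATH; do
  grep -q "^$v=" /tmp/ENVFILE && echo "$v set" || echo "$v MISSING"
done
```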
Edited by: user10950876 on 2012-6-13 10:58 PM -
Identify the OCR master node for 11.2
My customer is on an 11.1.0.6 RAC DB with 11.2 CRS+ASM and is interested in finding the "OCR master node" at any point in time.
I noticed that one way to identify the OCR master node is to search the
$ORA_CRS_HOME/log/hostname/crsd/crsd.log
file for the line "I AM THE NEW OCR MASTER" or "MASTER" with the most recent timestamp. Is this applicable to the 11.2 release?
And what are the other alternative ways to identify the master node?
Thanks in advance.
Hi,
as it was mentioned before, you can use the RAC FAQ Oracle Support Note to determine the masters in a RAC system. Except that this note would not elaborate on the OCR Master that you are asking for (OCR Writer as it is called in the documentation these days) in this context.
However, your command to check $ORA_CRS_HOME/log/hostname/crsd/crsd.log works and the message is pretty much the same in 11.2 as it was pre-11.2. However, note that only checking the CRSD.log may not always tell you the OCR master all the time. Reason: The CRSD.log is used in a rolling fashion. Once the log entries have reached approx. 50MB, it is rolled over to crsd.l01 or something like that and a fresh crsd.log is used. 10 archived logs are maintained.
For an average cluster this will last for a while, but in general there might come a time when all these logs have rolled over while the OCR master never changed. In that case, you cannot determine it from the logs anymore. Luckily, you should not have to find the OCR master all the time. Why are you interested in knowing which node the OCR master resides on at all times?
At least, you should therefore cat all crsd.l* files under the respective directory on all nodes to determine this. But again, that should not be necessary.
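The "cat all crsd.l* files" suggestion can be sketched as a one-liner; the files below are inlined stand-ins for $ORA_CRS_HOME/log/<hostname>/crsd/crsd.l* on a real node, and the exact message text varies between Clusterware versions, so treat the grep pattern as an assumption.

```shell
# Stand-in copies of a rotated log and the current log.
mkdir -p /tmp/crsd
printf '2011-03-01 10:00:00.000: I AM THE NEW OCR MASTER\n' > /tmp/crsd/crsd.l01
printf '2011-03-02 09:00:00.000: NEW OCR MASTER IS NODE 2\n' > /tmp/crsd/crsd.log
# crsd.l* matches both names; timestamps sort lexicographically,
# so the last line is the most recent master claim.
grep -h "OCR MASTER" /tmp/crsd/crsd.l* | sort | tail -1
```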
Hope that helps. Thanks,
Markus -
Sun Cluster 3.2 - zpools - master node
How do we determine which node is mastering the zpool from the clustering software? With SVM, we can determine the master node of the diskset. Thanks in advance.
Ryan
Why do you need to? I'm pretty sure the HAStoragePlus resource will validate the zpool; once it is under its control, you don't need to worry about which node masters it. It is governed by whichever node has the HASP resource online. If the resource isn't online, the zpool is deported and not owned at all.
I would guess that zpool import will give you some idea of whether a pool is mastered. If it errors, it's either owned or wasn't deported properly.
Regards,
Tim
-
8130: CREATE ACTIVE STANDBY PAIR must only be run on one of the MASTER nodes
CkptFrequency=600
CkptLogVolume=128
OracleNetServiceName=abmsrv1
PassThrough=1
Please give me some help. Thanks.
(This post was edited by user11036969.)
Here's the original post (the forum seems to have truncated it for some reason):
Content of the new Post:
Command> CREATE ACTIVE STANDBY PAIR abmmd ON "node1",abmmd ON "node2"
> RETURN RECEIPT
> STORE abmmd ON "node1" PORT 21000 TIMEOUT 30
> STORE abmmd ON "node2" PORT 20000 TIMEOUT 30;
8130: CREATE ACTIVE STANDBY PAIR must only be run on one of the MASTER nodes.
[ABMMD]
Driver=/abm/tt02/tt/TimesTen/tt1121/lib/libtten.so
DataStore=/abm/tt02/tt/tt11g/data/abm
LogDir=/abm/tt02/tt/tt11g/data/logs
SMPOptLevel=1
TypeMode =0
DurableCommits=0
ExclAccess=0
Connections=1000
Isolation=1
LockLevel=0
PermSize=50000
TempSize=1000
ThreadSafe=1
WaitForConnect=0
Logging=1
LogFileSize=256
LogPurge=1
CkptFrequency=600
CkptLogVolume=128
OracleNetServiceName=abmsrv1
PassThrough=1
Please give me some help. Thanks.
This error means that when TimesTen processes this statement and asks the operating system for the official hostname of the local node, the OS is returning something different from 'node1' or 'node2'.
It may be that you have incorrectly set the hostname to include a DNS domain (e.g. node1.xxx.yy.zzz).
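A quick way to see such a mismatch is to compare what the OS reports against the names used in the DDL; a minimal sketch (the node1/node2 names come from the statement above):

```shell
# What does the OS report as this machine's name?
os_name=$(uname -n)
echo "OS reports: $os_name"
# The CREATE ACTIVE STANDBY PAIR statement used "node1" and "node2".
case "$os_name" in
  node1|node2) echo "matches a name in the DDL" ;;
  *)           echo "no match - TimesTen would raise error 8130 here" ;;
esac
```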
Chris -
BDB Native Version 5.0.21 - asynchronous write at the master node
Hi There,
As part of performance tuning, we are thinking of introducing asynchronous write capability at the master node in replication code that uses BDB Native Edition (11g).
Are there any known issues with the asynchronous write at the master node? We'd like to confirm with Oracle before we promote to production.
For asynchronous write at the master node we have configured a TaskExecutor with the following configuration:
<bean id="MasterAsynchronousWriteTaskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="3"/>
<property name="maxPoolSize" value="10"/>
<property name="daemon" value="true"/>
<property name="queueCapacity" value="200000"/>
<property name="threadNamePrefix" value="Master_Entity_Writer_Thread"/>
<property name="threadGroupName" value="BDBMasterWriterThreads"/>
</bean>
Local test showed no issues. Please let us know at the EARLIEST convenience if there are any changes required to corePoolSize, “maxPoolSize” and “queueCapacity” values as a result of asynchronous write.
To summarize, 2 questions:
1) Are there any known issues with the asynchronous write at the master node for BDB Native, version 5.0.21?
2) If there are no issues, are any changes required to corePoolSize, “maxPoolSize” and “queueCapacity” values as a result of asynchronous write, and based on the configuration above?
Thank you!
Hello,
If you have not already, please take a look at the documentation
on "Database and log file archival" at:
http://download.oracle.com/docs/cd/E17076_02/html/programmer_reference/transapp_archival.html
Are you periodically creating backups of your database files?
These snapshots are either a standard backup, which creates a
consistent picture of the databases as of a single instant in
time; or an on-line backup (also known as a hot backup), which
creates a consistent picture of the databases as of an
unspecified instant during the period of time when the
snapshot was made. After backing up the database files you
should periodically archive the log files being created in the
environment. And I believe the question here is how often
the periodic archive should take place to establish the
best protocol for catastrophic recovery in the case of a
failure like physical hardware being destroyed, etc.
As the documentation describes, it is often helpful to think
of database archival in terms of full and incremental filesystem
backups. A snapshot is a full backup, whereas the periodic
archival of the current log files is an incremental backup.
For example, it might be reasonable to take a full snapshot
of a database environment weekly or monthly, and archive
additional log files daily. Using both the snapshot and the
log files, a catastrophic crash at any time can be recovered
to the time of the most recent log archival; a time long after
the original snapshot.
What other details can you provide about how much activity there is on your system with regard to log file creation, how often a full backup is being taken, etc.?
Thanks,
Sandra -
SDL Link Out of Service - Node down
We keep seeing these RTMT alerts and then receive an SDL Link out of service or server node down. I have been working with TAC, but they are not able to pinpoint whether it's a CUCM issue or a network issue. I had the network team check the CPU usage during this time and nothing major was happening; same thing on the CUCM side.
At Tue Oct 28 14:56:48 PDT 2014 on node the following SyslogSeverityMatchFound events generated:
SeverityMatch : Alert
MatchedEvent : Oct 28 14:56:23 PUB local7 1 : 18: PUB: Oct 28 2014 21:56:23.242 UTC : %UC_Location Bandwidth Manager-1-LBMLinkOOS: %[LocalNodeId=1][LocalApplicationID=700][RemoteIPAddress=][RemoteApplicationID=700][LinkID=1:700:SUB02:700][AppID=Cisco Location Bandwidth Manager][ClusterID=EvergreenHospital][NodeID=PUB]: LBM link to remote application is out of service AppID : Cisco Syslog Agent ClusterID
Has anyone experienced this issue? We recently upgraded to 9.x.
We are experiencing the exact same thing. There doesn't seem to be anything out of the ordinary in the logs, but these errors randomly kick off, as far as we can tell, for no reason.
I opened a TAC case and they related it to CPU load on our Publisher; I bumped it as recommended and the error cleared for a few days, but it just came back today.
Anyone have any insight? As I said, it doesn't seem to be service impacting, so is it just another 'Cisco feature'? -
Error "Comatose" when leaving the master node
Hi All,
I'm trying to configure a cluster on OES11 and I have many problems with it. This is one of them:
OES5:~ # cluster status
Master_IP_Address_Resource Running OES1 3
DATAP_SERVER Comatose OES5 6
The "Comatose" status shows for the data resource when I try to migrate DATAP_SERVER to the second node (OES5) with the command "cluster migrate DATAP_SERVER OES5".
The same problem occurs when I "cluster leave" on the master node: the Master_IP_Address_Resource comes up on the second node, but DATAP_SERVER does not.
My system:
Openfire(10.10.5.56) : ISCSI target with: 1GB for SBD, 19GB for DATA
OES1(10.10.5.155): Master Node, eDirectory
IP Cluster: 10.10.5.44
Data Resource: 10.10.5.43
Mounted ISCSI initiator SBD and DATA From Openfire
OES5(10.10.5.123): Second Node.
Mounted ISCSI initiator SBD and DATA From Openfire
I saw in the log file of the second node, OES5:
Apr 26 23:31:15 oes1 ncs-resourced: DATAP_SERVER.load: Error opening NSS management file (No such file or directory) on server at /sbin/nss line 49.
It's because the folder "_admin" doesn't exist on the second node.
I wonder how I can create an _admin folder on the second node; it's not a normal folder.
Do you have any idea?
Thanks for reading.
ndhuynh
This is an NSS problem (likely caused by eDir not running correctly at the time the server started up).
The easiest way to fix this is to reboot the node and check NSS with this command "nss /pools". If the command fails, you can further check eDir status with this command "ndsstat".
If reboot comes back good, you don't need to do anything. If it doesn't, please contact NTS.
Regards,
Changju
Originally Posted by ndhuynh
(original question quoted above)
Hello,
I have a two-node RAC setup. I need to change the master node to a different node in the cluster. How do I change the master node?
Please help me out.
Hi,
A "master node" does not exist in RAC, and I am pretty sure you are not talking about the "OCR Master"; you are talking about the RESOURCE MASTER.
The only thing closest to something like a "master" is that only one node has the role of updating the Oracle Cluster Registry (OCR); all other nodes only read the OCR.
However, this is just a role, which can easily switch between the nodes (but normally stays fixed as long as the responsible node lives).
This node is then called the OCR master.
How do I change which node holds the "OCR Master" role?
You can't decide that, and you can't change it manually; the Clusterware does it automatically, without human intervention.
The OCR master is not a MASTER NODE; it's only a role.
Good answer here:
Re: Identify the OCR master node for 11.2
Re: which node will become master
Please don't confuse the concept of the OCR MASTER with the RESOURCE MASTER (e.g. of a data block).
All nodes hold resource masters; that is why I said all nodes are equal.
With "billions" of data blocks spread across the memory of the cluster (the instance SGAs), one node maintains extensive information (locks, versions, etc.) about a particular resource (i.e. a data block).
So one node masters data block "data_01", another node masters data block "data_02", and so on. If a node which holds a resource master fails (shuts down), GCS chooses another node in the cluster to be the master of that particular resource.
http://www.oracleracsig.org/pls/apex/RAC_SIG.download_my_file?p_file=1003567
RAC object remastering ( Dynamic remastering ) Oracle database internals by Riyaj
Message was edited by: Levi-Pereira