Distributing load in a 4-node mixed-hardware cluster

Hi,
We have a 2-node cluster that will shortly become a 4-node cluster: 2 x blades and 2 x midrange servers, with similar processor and bus capacity.
Unfortunately, the existing 2-node cluster is heavily overloaded with 6 databases (not by choice: inheritance!).
To use our 2 new nodes for dedicated processing, the idea is to build a 4-node RAC but run 1 DB on the 2 blades only, without bringing up instances on the existing midrange nodes. Eventually we hope to extend to a proper 4 instances across all nodes and reduce the number of DBs to 2 or 3. I've no idea why so many DBs were created, but I cannot get around it. To hedge off processing we could bring up a 4-node DB, but again I would like to keep the 2 new nodes relatively clean with respect to load, to facilitate some performance testing.
Could I have people's views on the above? Mitigating load via resource management is only really useful when you have the 1 DB with multiple apps running off it. Similarly, using services to pin work to the new nodes doesn't overcome the fact that the existing nodes are a bit of a mess.
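Pinning work with services can be pictured with a toy model. This is a minimal Python sketch of how a RAC service with preferred and available instances gets placed, assuming the usual rule that a service tries to run on as many instances as it has preferred instances and borrows an available instance only when a preferred one is down. The instance names (blade1, mid1 and so on) are hypothetical, and the model ignores details such as a relocated service not moving back on its own by default.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    """A RAC-style service definition (hypothetical instance names)."""
    name: str
    preferred: list[str]
    available: list[str] = field(default_factory=list)


def place(service: Service, up: set[str]) -> list[str]:
    """Return the instances the service would run on, given which are up.

    Simplified model: run on every preferred instance that is up, then top
    back up to len(preferred) instances using available instances that are
    up. Failback after a preferred instance returns is not modelled.
    """
    running = [inst for inst in service.preferred if inst in up]
    missing = len(service.preferred) - len(running)
    spares = [inst for inst in service.available if inst in up]
    return running + spares[:missing]


if __name__ == "__main__":
    all_up = {"blade1", "blade2", "mid1", "mid2"}
    report = Service("report", preferred=["blade1", "blade2"])
    print(place(report, all_up))               # ['blade1', 'blade2']
    print(place(report, all_up - {"blade1"}))  # ['blade2']
    batch = Service("batch", preferred=["blade1"], available=["blade2"])
    print(place(batch, {"blade2", "mid1"}))    # ['blade2']
```

With no available instances listed, the service never lands on the midrange nodes, which is the pinning described above; it does nothing about the load the other databases already put on those nodes.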
Thanks,
Conor.

Hi Conor,
I used to administer a 4-node RAC environment with 3 databases. Services for one database were started on two nodes, and services for the other databases were started on the other two servers.
I didn't have a good experience with this kind of architecture. Sometimes I had problems related to one database and had to stop the whole environment to solve them. :(
Patching was the same: sometimes I had to patch the whole environment just for one problem related to one database.
Performance was another issue for me. Although I had spread the load across the servers in the cluster, sometimes a performance problem on one server froze the whole cluster (I still can't explain why, but it happened often).
Now we've split this one 4-node cluster into three 2-node clusters (one 2-node cluster per database), and it has been working better than the old configuration.
The idea of putting everything in one cluster doesn't sound good, at least to me. A friend told me about an 8-node cluster that works very well, but it was very carefully planned and designed.
Regards,
Cerreia

Similar Messages

Load balancing between Analytic Provider Services nodes in a cluster

Hi All,
- First, a little background on my architecture. My EPM environment consists of 3 Solaris servers:
Server1: Foundation Services + APS + EAS + WLS Admin server
Server2: Foundation Services + APS + EAS
Server3: Essbase server + Essbase Studio server
All of the above services are deployed to a single domain. We have a load balancer sitting in front of Server1 and Server2 that redirects requests based on the availability of the services.
- Consider APS:
We have an APS cluster "AnalyticProviderServices" with members AnalyticProviderServices1 deployed on Server1 and AnalyticProviderServices2 deployed on Server2.
So I connect to APS and log in as user1. Say the load balancer decides to forward my request to Server1; all my requests are then handled by APS on Server1. Now if APS on Server1 is brought down, any requests to APS on Server1 are redirected by WebLogic to APS on Server2.
Ideally, APS on Server2 should say "hey, I see APS on Server1 is down, so I will take up your session where it left off". So I expect the 2nd APS node in the cluster to take up my session. But this does not happen: I need to log in again when I hit refresh in Excel, as I get the error "Invalid session.. Please login again". When I open EAS I see I have been logged in with a new session ID. So it seems that the cluster nodes simply act as load balancers and are not smart enough to take up a failed node's sessions where they left off.
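The behaviour described is what you would expect if each APS node keeps its sessions only in its own memory. The toy Python model below (class names are hypothetical, not APS internals) contrasts node-local sessions with sessions replicated to every node; whether APS itself can be configured to share session state is not answered in this thread.

```python
from __future__ import annotations

import uuid


class ApsNode:
    """One application node that holds sessions only in its own memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.sessions: dict[str, str] = {}


class Cluster:
    def __init__(self, nodes: list[ApsNode], replicate: bool = False) -> None:
        self.nodes = nodes
        self.replicate = replicate  # copy each new session to every node?

    def login(self, node: ApsNode, user: str) -> str:
        session_id = uuid.uuid4().hex
        targets = self.nodes if self.replicate else [node]
        for target in targets:
            target.sessions[session_id] = user
        return session_id

    def request(self, session_id: str) -> str:
        """Route to the first live node, as a failover redirect would."""
        for node in self.nodes:
            if node.alive:
                if session_id in node.sessions:
                    return node.sessions[session_id]
                raise LookupError("Invalid session. Please login again")
        raise RuntimeError("no live nodes")
```

With `replicate=False`, taking Server1 down makes the next request fail on Server2 with the "invalid session" error; with `replicate=True` the same request succeeds, at the cost of copying every session to every node.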
Is my understanding correct, or do I have to configure something to allow this to happen?
Thanks,
Kent

Thanks for your reply, John!
I was hoping APS could do something like that. I am not sure how often restoring the sessions of a dead APS cluster node on another APS node would help, but I can think of one situation: a drill-through report is running for a long time on the Essbase server and APS goes down. It would be good for the other APS node to take up the session and return the drill-through output to the user.

Migrating the DB Tier (DB and CM) to a two-node non-RAC cluster

Hi,
The current set-up of our E-Business Suite is a two-node install:
The DB Tier (Database and Concurrent Manager) on one node
The Apps Tier (Forms /Web Server) on another node.
For the HA solution (NON ORACLE RAC) we are planning to:
Move the DB Tier (Database and Concurrent Manager) to a three-node hardware Sun cluster managed by Veritas cluster manager (NOT ORACLE RAC). We need to know whether the Database Tier (Database and Concurrent Manager) will work on a hardware cluster node and whether it will support COLD FAILOVER from one node to another. We know the database on its own would be fine with a cold failover, because we have tested database cold failover on the three-node cluster for a non-E-Business Suite database. But here we have the added complication of the Concurrent Manager sitting on the node along with the database on the DB Tier.
The Apps Tier (Forms / Web server) will be put on a separate set of servers using load balancers etc.
Has anybody implemented a similar HA set-up? Will this planned set-up work, or are there any issues with it?
Any help / info would be appreciated.
Thanks

Hi,
Yes, you can do a cold failover of the database and all the 11i services as well.
1. In the concurrent manager service
Before you fail over the concurrent manager, do the following:
Create listener and tnsnames files that include the new hostname, and keep them with the Veritas failover service. That is, when you fail over, these files should replace the existing files, and the existing files should be backed up first. Also create a script to change the hostname, logfile_hostname and outfile_hostname in the fnd_concurrent_processes table.
Add the nodes in the Install -> Nodes navigation.
Do the failover manually, check with tnsping that the listener files work properly, and start the concurrent manager.
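The back-up-then-replace step for the listener and tnsnames files could look like this minimal Python sketch. The file names and the idea of keeping one prepared copy per node (for example `listener.ora.nodea`) are assumptions for illustration; in the setup described, the Veritas failover service would run the equivalent step.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def swap_in(prepared: Path, live: Path) -> Path | None:
    """Back up `live` (if present) with a timestamp suffix, then copy
    `prepared` over it. Returns the backup path, or None if there was no
    live file to back up."""
    backup = None
    if live.exists():
        stamp = time.strftime("%Y%m%d%H%M%S")
        backup = live.with_name(f"{live.name}.bak.{stamp}")
        shutil.copy2(live, backup)
    shutil.copy2(prepared, live)
    return backup
```

Calling it once for `listener.ora` and once for `tnsnames.ora` (hypothetical paths) before starting the listener keeps the previous versions next to the live files for rollback.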
So in total, you have to prepare two SQL scripts:
one to change the hostname from node B to node A,
one to change it from node A to node B,
and 2 listener + 2 tnsnames files, each containing the respective hostname.
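The two direction-specific SQL scripts can be generated from one function, so they cannot drift apart. This is a hedged Python sketch: the column names are copied from the post and have not been checked against the FND_CONCURRENT_PROCESSES definition, and the host names are hypothetical. Review the generated UPDATE statements before running them.

```python
from __future__ import annotations

# Column names exactly as given in the post; verify them against the
# FND_CONCURRENT_PROCESSES definition in your release before use.
COLUMNS = ("hostname", "logfile_hostname", "outfile_hostname")


def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def failover_sql(old_host: str, new_host: str,
                 table: str = "fnd_concurrent_processes",
                 columns: tuple[str, ...] = COLUMNS) -> str:
    """Build a script that rewrites old_host to new_host in each column."""
    lines = [
        f"UPDATE {table} SET {col} = {sql_literal(new_host)} "
        f"WHERE {col} = {sql_literal(old_host)};"
        for col in columns
    ]
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # One script per direction, as suggested above (hypothetical hosts).
    for old, new in (("NODEB", "NODEA"), ("NODEA", "NODEB")):
        print(f"-- {old} -> {new}")
        print(failover_sql(old, new))
```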
Use adcmctl.sh to stop and start the concurrent managers.
Finally, create a shell script to kill the sysmgr processes of the concurrent managers when the managers take a long time to shut down. Before running the kill script, wait at least 5 minutes.
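The wait-then-kill advice can be written as a small, testable routine. In this Python sketch the three callables are placeholders you would wire up yourself (for instance, `request_stop` might run adcmctl.sh stop, `is_running` might look for the manager processes, and `force_kill` might run the kill script); that wiring is not shown, and the 300-second default simply mirrors the 5-minute wait.

```python
import time
from typing import Callable


def stop_then_kill(request_stop: Callable[[], None],
                   is_running: Callable[[], bool],
                   force_kill: Callable[[], None],
                   grace_seconds: float = 300.0,
                   poll_seconds: float = 5.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Ask for a clean shutdown, wait up to grace_seconds, then force-kill.

    Returns True if the clean stop finished in time, False if a kill was
    needed. clock/sleep are injectable so the logic can be tested quickly.
    """
    request_stop()
    deadline = clock() + grace_seconds
    while clock() < deadline:
        if not is_running():
            return True
        sleep(poll_seconds)
    if is_running():
        force_kill()
        return False
    return True
```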
I have done the same scenario many times, with a Veritas failover service on 2 Sun V880 servers.
regards,
Pandian

Messages in Distributed Queue remain on one JMSServer in Cluster

Hi,
The situation is as follows:
We have Distributed Queues on a two-node server setup. Each queue therefore exists twice, once on the JMSServer on each node.
Forward Delay has been set on the DistributedQueue to 1 second (we tested with values 0 and 10, too).
An external Java client (with both cluster nodes in the provider_url for the InitialContext) connects to the DistributedQueue and sends messages. In the WebLogic console you can see these messages on the physical queue on one JMSServer; let's call it QueueA on NodeA.
An external Java process (again with both cluster nodes in the JNDI connection string) is started to consume messages. Because of round-robin, it sometimes gets connected to QueueB (part of the same DistributedQueue as QueueA, but on the other node in the cluster).
When it is connected to QueueB on NodeB, not a single message is received. You can even see in the WebLogic console that QueueA on NodeA has zero consumers and 50 messages, while QueueB on NodeB has zero messages but one consumer.
When we restart the consumer process and it randomly gets connected to QueueA on NodeA, it consumes fine, as it should.
Shouldn't Forward Delay do exactly this, i.e. transport messages from QueueA on NodeA, which has zero consumers, to QueueB on NodeB, which actually has a consumer?
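Our reading of the WebLogic documentation is that Forward Delay is the number of seconds a distributed queue member that holds messages but has no consumers waits before forwarding those messages to members that do have consumers, and that -1 disables forwarding. The toy Python model below encodes only that rule, using the numbers from the post (50 messages on QueueA with no consumer, one consumer on QueueB). It is not how WebLogic implements forwarding and cannot explain why forwarding was not observed here.

```python
from __future__ import annotations

import copy


def forward_pass(members: dict[str, dict[str, int]],
                 forward_delay: float,
                 idle_seconds: dict[str, float]) -> dict[str, dict[str, int]]:
    """One forwarding pass over distributed-queue members (simplified model).

    members maps a member name to {"messages": n, "consumers": n}.
    idle_seconds is how long each member has held messages with no consumers.
    A negative forward_delay disables forwarding. The input is not modified.
    """
    result = copy.deepcopy(members)
    targets = [name for name, m in members.items() if m["consumers"] > 0]
    if forward_delay < 0 or not targets:
        return result
    turn = 0
    for name, m in members.items():
        stranded = m["consumers"] == 0 and m["messages"] > 0
        if stranded and idle_seconds.get(name, 0.0) >= forward_delay:
            # Spread stranded messages round-robin over members with consumers.
            for _ in range(m["messages"]):
                result[targets[turn % len(targets)]]["messages"] += 1
                turn += 1
            result[name]["messages"] = 0
    return result
```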
Any help would be appreciated.
Axel van Lil

Hi,
Thanks for your answer. We opened a support case today for this issue... I don't know. The cluster seems to run fine: no warnings, no exceptions.
The next step of this issue is the following (just to give an idea of our upcoming problems :-)
An external Java process listens on QueueA on NodeA (QueueA is part of a Distributed Queue) and receives all the messages on that queue. We kill NodeA on that machine to simulate failover.
The external process fails over immediately to NodeB (which is good, as proven by netstat and the debugger) but just doesn't receive any messages, even though there are messages in that queue! It seems that the new consumer is simply forgotten by the BEA server.
I don't know... either our configuration is somehow broken, or the BEA implementation doesn't work at all...
Rgds,
Axel

Unable to see other OC4J nodes in a cluster

I have installed 2 instances of OracleAS on 2 separate machines; both machines (Lnx-5 and Lnx-6) were installed with the J2EE and Web components.
During installation, I selected Lnx-5 as the administration node of the cluster, and I configured the discovery address using the multicast address 225.0.0.33:8001.
No installation errors were encountered, and things seem to work fine.
However, Lnx-5 can't "see" Lnx-6 as one of its cluster nodes. On both Lnx-5 and Lnx-6, I see the following when I issue "opmnctl @cluster status":
---- On Lnx-5 , here is what I got ---------
[root@Lnx-5 conf]# opmnctl @cluster status
Processes in Instance: Lnx5.anydomain.com
--------------------------------------------------------------+---------
ias-component | process-type | pid | status
--------------------------------------------------------------+---------
OC4JGroup:default_group | OC4J:home | 5392 | Alive
ASG | ASG | N/A | Down
HTTP_Server | HTTP_Server | 5391 | Alive
---- On Lnx-6 , here is what I got ---------
[root@Lnx-6 conf]# opmnctl @cluster status
Processes in Instance: Lnx6.anydomain.com
--------------------------------------------------------------+---------
ias-component | process-type | pid | status
--------------------------------------------------------------+---------
OC4JGroup:default_group | OC4J:home | 5392 | Alive
ASG | ASG | N/A | Down
HTTP_Server | HTTP_Server | 5391 | Alive
I would expect to see both Lnx-5 and Lnx-6 when I issue the command on either node.
I have also verified that both machines are synchronized to the NTP server.
I have also run tcpdump on both nodes, and I can indeed see multicast (225.0.0.33:8001) packets arriving at both nodes.
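tcpdump shows that the packets reach the network interface; a small user-space listener can additionally show whether the kernel hands them to a socket that has joined the group. This is a minimal Python sketch using the standard `socket` module (`IP_ADD_MEMBERSHIP` is available on Linux). It is a diagnostic aid, not part of OPMN, and binding to 8001 may fail if another process already holds that port exclusively.

```python
import socket
import struct


def build_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure: group address, then local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def open_multicast_listener(group: str, port: int,
                            iface: str = "0.0.0.0") -> socket.socket:
    """Return a UDP socket bound to `port` that has joined `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface))
    return sock
```

On each node, `open_multicast_listener("225.0.0.33", 8001)` followed by `recvfrom(65535)` (with a timeout set) should return the announcements if delivery to applications works.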
I really need some help with what could have gone wrong, and what information I should look for to address this issue.
Thanks in advance!!

OK, for the discovery server configuration, here is the config I have in the opmn.xml file; both Lnx-5 and Lnx-6 use exactly the same configuration:
<notification-server interface="ipv4">
<port local="6101" remote="6201" request="6004"/>
<ssl enabled="true" wallet-file="$ORACLE_HOME/opmn/conf/ssl.wlt/default"/>
<topology>
<discover list="10.1.230.11:6201,10.1.230.12:6201"/>
</topology>
</notification-server>
The IP address of Lnx-5 is 10.1.230.11, and that of Lnx-6 is 10.1.230.12.
Once this was configured on both Lnx-5 and Lnx-6, I kept seeing these errors in Lnx-6's log file:
07/05/16 22:10:18 [pm-process] Process Alive: default_group~home~default_group~1 (1542677438:3859)
07/05/16 22:10:18 [pm-requests] Request 2 Completed. Command: /start
07/05/16 22:13:25 [ons-connect] Connection 9,10.1.230.11,6201 connect (Connection refused)
07/05/16 22:13:26 [ons-connect] Connection a,10.1.230.12,6201 connect (Connection refused)
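In the log above, Lnx-6 is refused even when connecting to its own address (10.1.230.12:6201), which suggests nothing was accepting connections on remote port 6201 at that moment. A quick check from each node is to try a TCP connection to every entry in the discover list. This minimal Python sketch does that; treating a leading `*` as marking a multicast entry reflects our understanding of the opmn.xml discover syntax and should be checked against the OPMN documentation.

```python
from __future__ import annotations

import socket


def parse_discover_list(value: str) -> list[tuple[str, int, bool]]:
    """'h1:p1,h2:p2' -> [(h1, p1, False), ...].

    A leading '*' is treated as marking a multicast entry (assumption).
    """
    entries = []
    for raw in value.split(","):
        entry = raw.strip()
        if not entry:
            continue
        multicast = entry.startswith("*")
        host, _, port = entry.lstrip("*").rpartition(":")
        entries.append((host, int(port), multicast))
    return entries


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `can_connect(host, port)` for each non-multicast entry of `"10.1.230.11:6201,10.1.230.12:6201"` on both nodes shows whether the ONS remote port is listening at all, independently of OPMN's own retry logic.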
Well, once I enabled debugging, some errors were reported when OPMN started. They are as follows:
Loading Module libopmnohs callback functions
Module libopmnohs: loaded callback function opmnModInitialize
Module libopmnohs: unable to load callback function opmnModSetNumProcs
Module libopmnohs: unable to load callback function opmnModParse
Module libopmnohs: unable to load callback function opmnModDebug
Module libopmnohs: unable to load callback function opmnModDepend
Module libopmnohs: loaded callback function opmnModStart
Module libopmnohs: unable to load callback function opmnModReady
Module libopmnohs: loaded callback function opmnModNotify
Module libopmnohs: loaded callback function opmnModRestart
Module libopmnohs: loaded callback function opmnModStop
Module libopmnohs: loaded callback function opmnModPing
Module libopmnohs: loaded callback function opmnModProcRestore
Module libopmnohs: loaded callback function opmnModProcComp
Module libopmnohs: unable to load callback function opmnModReqComp
Module libopmnohs: unable to load callback function opmnModCall
Module libopmnohs: unable to load callback function opmnModInfo
Module libopmnohs: unable to load callback function opmnModCron
Module libopmnohs: loaded callback function opmnModTerminate
Loading Module libopmnoc4j callback functions
Module libopmnoc4j: loaded callback function opmnModInitialize
Module libopmnoc4j: unable to load callback function opmnModSetNumProcs
Module libopmnoc4j: loaded callback function opmnModParse
Module libopmnoc4j: unable to load callback function opmnModDebug
Module libopmnoc4j: unable to load callback function opmnModDepend
Module libopmnoc4j: loaded callback function opmnModStart
Module libopmnoc4j: unable to load callback function opmnModReady
Module libopmnoc4j: loaded callback function opmnModNotify
Module libopmnoc4j: loaded callback function opmnModRestart
Module libopmnoc4j: loaded callback function opmnModStop
Module libopmnoc4j: loaded callback function opmnModPing
Module libopmnoc4j: loaded callback function opmnModProcRestore
Module libopmnoc4j: loaded callback function opmnModProcComp
Module libopmnoc4j: unable to load callback function opmnModReqComp
Module libopmnoc4j: unable to load callback function opmnModCall
Module libopmnoc4j: unable to load callback function opmnModInfo
Module libopmnoc4j: unable to load callback function opmnModCron
Module libopmnoc4j: loaded callback function opmnModTerminate
Loading Module libopmncustom callback functions
Module libopmncustom: loaded callback function opmnModInitialize
Module libopmncustom: unable to load callback function opmnModSetNumProcs
Module libopmncustom: loaded callback function opmnModParse
Module libopmncustom: loaded callback function opmnModDebug
Module libopmncustom: unable to load callback function opmnModDepend
Module libopmncustom: loaded callback function opmnModStart
Module libopmncustom: loaded callback function opmnModReady
Module libopmncustom: unable to load callback function opmnModNotify
Module libopmncustom: loaded callback function opmnModRestart
Module libopmncustom: loaded callback function opmnModStop
Module libopmncustom: loaded callback function opmnModPing
Module libopmncustom: loaded callback function opmnModProcRestore
Module libopmncustom: loaded callback function opmnModProcComp
Module libopmncustom: loaded callback function opmnModReqComp
Module libopmncustom: unable to load callback function opmnModCall
Module libopmncustom: unable to load callback function opmnModInfo
Module libopmncustom: unable to load callback function opmnModCron
Module libopmncustom: loaded callback function opmnModTerminate
Loading Module libopmniaspt callback functions
Module libopmniaspt: loaded callback function opmnModInitialize
Module libopmniaspt: unable to load callback function opmnModSetNumProcs
Module libopmniaspt: unable to load callback function opmnModParse
Module libopmniaspt: unable to load callback function opmnModDebug
Module libopmniaspt: unable to load callback function opmnModDepend
Module libopmniaspt: loaded callback function opmnModStart
Module libopmniaspt: loaded callback function opmnModReady
Module libopmniaspt: unable to load callback function opmnModNotify
Module libopmniaspt: unable to load callback function opmnModRestart
Module libopmniaspt: loaded callback function opmnModStop
Module libopmniaspt: unable to load callback function opmnModPing
Module libopmniaspt: unable to load callback function opmnModProcRestore
Module libopmniaspt: loaded callback function opmnModProcComp
Module libopmniaspt: unable to load callback function opmnModReqComp
Module libopmniaspt: unable to load callback function opmnModCall
Module libopmniaspt: unable to load callback function opmnModInfo
Module libopmniaspt: unable to load callback function opmnModCron
Module libopmniaspt: loaded callback function opmnModTerminate
Looks pretty bad... What causes these errors? Are they related?
Thanks!!

Adding node back into cluster after removal...

Hi,
I removed a cluster node using "scconf -r -h <node>" (after carrying out all the other usual removal steps needed to get this command to work).
Because this is a pair+1 cluster and the node I was trying to remove was physically attached to the quorum device (SCSI), I had to create a dummy node before the removal command above would work.
I reinstalled Solaris, the SC3.1u4 framework, patches etc. and then tried to run scinstall again on the node (having first reintroduced the node to the cluster using scconf -a -T node=<node>).
However, during the scinstall I got the following problem:
Updating file ("ntp.conf.cluster") on node n20-2-sup ... done
Updating file ("hosts") on node n20-2-sup ... done
Updating file ("ntp.conf.cluster") on node n20-3-sup ... done
Updating file ("hosts") on node n20-3-sup ... done
scrconf: RPC: Unknown host
scinstall: Failed communications with "bogusnode"
scinstall: scinstall did NOT complete successfully!
Press Enter to continue:
I was not sure what to do at this point, but since the other cluster nodes could now see my 'new' node again, I removed the dummy node, rebooted the new node and said a little prayer...
Now my node will not boot as part of the cluster:
Rebooting with command: boot
Boot device: /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cfa3e691,0:a File and args:
SunOS Release 5.10 Version Generic_127111-06 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: n20-1-sup
/usr/cluster/bin/scdidadm: Could not load DID instance list.
Cannot open /etc/cluster/ccr/did_instances.
Booting as part of a cluster
NOTICE: CMM: Node n20-1-sup (nodeid = 1) with votecount = 0 added.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) with votecount = 2 added.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) with votecount = 1 added.
NOTICE: CMM: Node bogusnode (nodeid = 4) with votecount = 0 added.
NOTICE: clcomm: Adapter qfe5 constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being constructed
NOTICE: clcomm: Adapter qfe1 constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being constructed
NOTICE: CMM: Node n20-1-sup: attempting to join cluster.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being initiated
NOTICE: CMM: Node n20-2-sup (nodeid: 2, incarnation #: 1205318308) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being initiated
NOTICE: CMM: Node n20-3-sup (nodeid: 3, incarnation #: 1205265086) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 online
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 online
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node n20-1-sup (nodeid = 1) is up; new incarnation number = 1205346037.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) is up; new incarnation number = 1205318308.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) is up; new incarnation number = 1205265086.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #18 completed.
NOTICE: CMM: Node n20-1-sup: joined cluster.
NOTICE: CMM: Node (nodeid = 4) with votecount = 0 removed.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #19 completed.
WARNING: clcomm: per node IP config clprivnet0:-1 (349): 172.16.193.1 failed with 19
WARNING: clcomm: per node IP config clprivnet0:-1 (349): 172.16.193.1 failed with 19
cladm: CLCLUSTER_ENABLE: No such device
UNRECOVERABLE ERROR: Sun Cluster boot: Could not initialize cluster framework
Please reboot in non cluster mode(boot -x) and Repair
syncing file systems... done
WARNING: CMM: Node being shut down.
Program terminated
{1} ok
Any ideas how I can recover from this situation without having to reinstall the node again?
(I have a flash archive with the OS, SC3.1u4 framework etc., so it's not the end of the world, but...)
Thanks a mil if you can help here!
- headwrecked

Hi - got this problem sorted...
Basically, I just removed the SC3.1u4 software (scinstall -r) from the node that was not booting, and then reinstalled the software. This time the dummy node had already been removed, so scinstall did not try to contact it and completed without any errors.
I think the only problem with the procedure I used to remove and re-add the node was that I forgot to remove the dummy node before re-adding the actual cluster node...
If anyone can confirm this, great; if not... well, it's working now, so this thread can be closed.
root@n20-1-sup # /usr/cluster/bin/scinstall -r
Verifying that no unexpected global mounts remain in /etc/vfstab ... done
Verifying that no device services still reference this node ... done
Archiving the following to /var/cluster/uninstall/uninstall.1036/archive:
/etc/cluster ...
/etc/path_to_inst ...
/etc/vfstab ...
/etc/nsswitch.conf ...
Updating vfstab ... done
The /etc/vfstab file was updated successfully.
The original entry for /global/.devices/node@1 has been commented out.
And, a new entry has been added for /globaldevices.
Mounting /dev/dsk/c3t0d0s6 on /globaldevices ... done
Attempting to contact the cluster ...
Trying "n20-2-sup" ... okay
Trying "n20-3-sup" ... okay
Attempting to unconfigure n20-1-sup from the cluster ... failed
Please consider the following warnings:
scrconf: Failed to remove node (n20-1-sup).
scrconf: All two-node clusters must have at least one shared quorum device.
Additional housekeeping may be required to unconfigure
n20-1-sup from the active cluster.
Removing the "cluster" switch from "hosts" in /etc/nsswitch.conf ... done
Removing the "cluster" switch from "netmasks" in /etc/nsswitch.conf ... done
** Removing Sun Cluster framework packages **
Removing SUNWkscspmu.done
Removing SUNWkscspm..done
Removing SUNWksc.....done
Removing SUNWjscspmu.done
Removing SUNWjscspm..done
Removing SUNWjscman..done
Removing SUNWjsc.....done
Removing SUNWhscspmu.done
Removing SUNWhscspm..done
Removing SUNWhsc.....done
Removing SUNWfscspmu.done
Removing SUNWfscspm..done
Removing SUNWfsc.....done
Removing SUNWescspmu.done
Removing SUNWescspm..done
Removing SUNWesc.....done
Removing SUNWdscspmu.done
Removing SUNWdscspm..done
Removing SUNWdsc.....done
Removing SUNWcscspmu.done
Removing SUNWcscspm..done
Removing SUNWcsc.....done
Removing SUNWscrsm...done
Removing SUNWscspmr..done
Removing SUNWscspmu..done
Removing SUNWscspm...done
Removing SUNWscva....done
Removing SUNWscmasau.done
Removing SUNWscmasar.done
Removing SUNWmdmu....done
Removing SUNWmdmr....done
Removing SUNWscvm....done
Removing SUNWscsam...done
Removing SUNWscsal...done
Removing SUNWscman...done
Removing SUNWscgds...done
Removing SUNWscdev...done
Removing SUNWscnmu...done
Removing SUNWscnmr...done
Removing SUNWscscku..done
Removing SUNWscsckr..done
Removing SUNWscu.....done
Removing SUNWscr.....done
Removing the following:
/etc/cluster ...
/dev/did ...
/devices/pseudo/did@0:* ...
The /etc/inet/ntp.conf file has not been updated.
You may want to remove it or update it after uninstall has completed.
The /var/cluster directory has not been removed.
Among other things, this directory contains
uninstall logs and the uninstall archive.
You may remove this directory once you are satisfied
that the logs and archive are no longer needed.
Log file - /var/cluster/uninstall/uninstall.1036/log
root@n20-1-sup #
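For reading the vote counts in these logs: Sun Cluster grants quorum to a set of members holding a majority (more than half) of all configured votes, which is also why the uninstall above warns that a two-node cluster needs a shared quorum device. The short Python sketch below applies the majority rule to the node votes printed at boot (n20-1-sup 0, n20-2-sup 2, n20-3-sup 1, bogusnode 0). Quorum device votes are not shown in the log, so they are left out; that is an assumption, and the real total on this cluster may differ.

```python
from __future__ import annotations


def votes_needed(votes: dict[str, int]) -> int:
    """Smallest vote total that is a strict majority of all configured votes."""
    return sum(votes.values()) // 2 + 1


def has_quorum(votes: dict[str, int], present: set[str]) -> bool:
    """True if the present members (nodes or devices) hold a majority."""
    held = sum(count for member, count in votes.items() if member in present)
    return held >= votes_needed(votes)


# Node votes as printed in the boot log; quorum device votes omitted.
boot_votes = {"n20-1-sup": 0, "n20-2-sup": 2, "n20-3-sup": 1, "bogusnode": 0}
```

A node with 0 votes, like n20-1-sup before its votecount changed to 1 or the dummy node, adds nothing to quorum either way.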
Ran the scinstall again:
>>> Confirmation <<<
Your responses indicate the following options to scinstall:
scinstall -ik \
-C N20_Cluster \
-N n20-2-sup \
-M patchdir=/var/cluster/patches \
-A trtype=dlpi,name=qfe1 -A trtype=dlpi,name=qfe5 \
-m endpoint=:qfe1,endpoint=switch1 \
-m endpoint=:qfe5,endpoint=switch2
Are these the options you want to use (yes/no) [yes]?
Do you want to continue with the install (yes/no) [yes]?
Checking device to use for global devices file system ... done
Installing patches ... failed
scinstall: Problems detected during extraction or installation of patches.
Adding node "n20-1-sup" to the cluster configuration ... skipped
Skipped node "n20-1-sup" - already configured
Adding adapter "qfe1" to the cluster configuration ... skipped
Skipped adapter "qfe1" - already configured
Adding adapter "qfe5" to the cluster configuration ... skipped
Skipped adapter "qfe5" - already configured
Adding cable to the cluster configuration ... skipped
Skipped cable - already configured
Adding cable to the cluster configuration ... skipped
Skipped cable - already configured
Copying the config from "n20-2-sup" ... done
Copying the postconfig file from "n20-2-sup" if it exists ... done
Copying the Common Agent Container keys from "n20-2-sup" ... done
Setting the node ID for "n20-1-sup" ... done (id=1)
Verifying the major number for the "did" driver with "n20-2-sup" ... done
Checking for global devices global file system ... done
Updating vfstab ... done
Verifying that NTP is configured ... done
Initializing NTP configuration ... done
Updating nsswitch.conf ...
done
Adding clusternode entries to /etc/inet/hosts ... done
Configuring IP Multipathing groups in "/etc/hostname.<adapter>" files
IP Multipathing already configured in "/etc/hostname.qfe2".
Verifying that power management is NOT configured ... done
Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
Ensure network routing is disabled ... done
Updating file ("ntp.conf.cluster") on node n20-2-sup ... done
Updating file ("hosts") on node n20-2-sup ... done
Updating file ("ntp.conf.cluster") on node n20-3-sup ... done
Updating file ("hosts") on node n20-3-sup ... done
Log file - /var/cluster/logs/install/scinstall.log.938
Rebooting ...
Mar 13 13:59:13 n20-1-sup reboot: rebooted by root
Terminated
root@n20-1-sup # syncing file systems... done
rebooting...
R
LOM event: +103d+20h44m26s host reset
screen not found.
keyboard not found.
Keyboard not present. Using lom-console for input and output.
Sun Netra T4 (2 X UltraSPARC-III+) , No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.1, 4096 MB memory installed, Serial #52960491.
Ethernet address 0:3:ba:28:1c:eb, Host ID: 83281ceb.
Initializing 15MB Rebooting with command: boot
Boot device: /pci@8,600000/SUNW,qlc@4/fp@0,0/disk@w21000004cfa3e691,0:a File and args:
SunOS Release 5.10 Version Generic_127111-06 64-bit
Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: n20-1-sup
Configuring devices.
devfsadm: minor_init failed for module /usr/lib/devfsadm/linkmod/SUNW_scmd_link.so
Loading smf(5) service descriptions: 24/24
/usr/cluster/bin/scdidadm: Could not load DID instance list.
Cannot open /etc/cluster/ccr/did_instances.
Booting as part of a cluster
NOTICE: CMM: Node n20-1-sup (nodeid = 1) with votecount = 0 added.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) with votecount = 2 added.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) with votecount = 1 added.
NOTICE: clcomm: Adapter qfe5 constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being constructed
NOTICE: clcomm: Adapter qfe1 constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being constructed
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being constructed
NOTICE: CMM: Node n20-1-sup: attempting to join cluster.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 being initiated
NOTICE: CMM: Node n20-2-sup (nodeid: 2, incarnation #: 1205318308) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-2-sup:qfe1 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 being initiated
NOTICE: CMM: Node n20-3-sup (nodeid: 3, incarnation #: 1205265086) has become reachable.
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-3-sup:qfe5 online
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe5 - n20-2-sup:qfe5 online
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 being initiated
NOTICE: clcomm: Path n20-1-sup:qfe1 - n20-3-sup:qfe1 online
NOTICE: CMM: Cluster has reached quorum.
NOTICE: CMM: Node n20-1-sup (nodeid = 1) is up; new incarnation number = 1205416931.
NOTICE: CMM: Node n20-2-sup (nodeid = 2) is up; new incarnation number = 1205318308.
NOTICE: CMM: Node n20-3-sup (nodeid = 3) is up; new incarnation number = 1205265086.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #23 completed.
NOTICE: CMM: Node n20-1-sup: joined cluster.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
NOTICE: CMM: Votecount changed from 0 to 1 for node n20-1-sup.
NOTICE: CMM: Cluster members: n20-1-sup n20-2-sup n20-3-sup.
NOTICE: CMM: node reconfiguration #24 completed.
Mar 13 14:02:23 in.ndpd[351]: solicit_event: giving up on qfe1
Mar 13 14:02:23 in.ndpd[351]: solicit_event: giving up on qfe5
did subpath /dev/rdsk/c1t3d0s2 created for instance 2.
did subpath /dev/rdsk/c2t3d0s2 created for instance 12.
did subpath /dev/rdsk/c1t3d1s2 created for instance 3.
did subpath /dev/rdsk/c1t3d2s2 created for instance 6.
did subpath /dev/rdsk/c1t3d3s2 created for instance 7.
did subpath /dev/rdsk/c1t3d4s2 created for instance 8.
did subpath /dev/rdsk/c1t3d5s2 created for instance 9.
did subpath /dev/rdsk/c1t3d6s2 created for instance 10.
did subpath /dev/rdsk/c1t3d7s2 created for instance 11.
did subpath /dev/rdsk/c2t3d1s2 created for instance 13.
did subpath /dev/rdsk/c2t3d2s2 created for instance 14.
did subpath /dev/rdsk/c2t3d3s2 created for instance 15.
did subpath /dev/rdsk/c2t3d4s2 created for instance 16.
did subpath /dev/rdsk/c2t3d5s2 created for instance 17.
did subpath /dev/rdsk/c2t3d6s2 created for instance 18.
did subpath /dev/rdsk/c2t3d7s2 created for instance 19.
did instance 20 created.
did subpath n20-1-sup:/dev/rdsk/c0t6d0 created for instance 20.
did instance 21 created.
did subpath n20-1-sup:/dev/rdsk/c3t0d0 created for instance 21.
did instance 22 created.
did subpath n20-1-sup:/dev/rdsk/c3t1d0 created for instance 22.
Configuring DID devices
t_optmgmt: System error: Cannot assign requested address
obtaining access to all attached disks
n20-1-sup console login:

Is a Load Balancer a Must for a SOA Cluster?

Hi,
We are trying to bring up a SOA cluster of 4 soa_servers.
We have deployed the code, and it seems to be picked up properly by all the soa_servers.
The issue we are facing is that all the partner link references (which are internal composites in the cluster) point to one of the SOA servers (i.e. the one we used during deployment).
Because of this, no load balancing happens, and effectively only one of the servers is fully loaded.
If this is the expected behaviour, is a load balancer/proxy server a must for a SOA cluster?
Thanks,
Ajay....
Edited by: ajaykumar on Sep 6, 2010 8:54 AM

Dear All
By default, a WebLogic Server cluster domain does NOT include a load balancer and does NOT distribute incoming HTTP requests across its managed servers. Any WebLogic cluster should have some sort of software or hardware load balancer in front of it. Most customers use the open source Apache HTTP Server, which is simple to configure and works fine; see the link below for the Apache details. From that page you can navigate to the other supported web servers, such as IIS or Sun Web Server. You can also use a hardware load balancer.
http://download.oracle.com/docs/cd/E14571_01/web.1111/e14395/apache.htm#CDEGCBAC
The basic step is this: in one of the web server's configuration files, you specify the host and port of each back-end clustered WLS managed server, separated by commas. The web server itself runs on some host and port, and that host and port is what you need to configure in your web service WSDL URLs. Then load balancing will happen.
During development, say in JDeveloper, we create WSDLs or web service clients against a single-server WLS domain, so naturally the WSDL has that domain's host and port hard-coded. But when you export the web service .war file and deploy it in a cluster environment, you can choose which WSDL URL to invoke. When you get the web service Service object, there are usually 2 constructors: one takes no parameters and internally uses the default WSDL URL specified in the .wsdl file; the other takes 2 parameters, the WSDL URL and the QName of the service. If you look at the service .java file generated when you create the client jar for the web service, you will get the idea. So the point is: in your client program, use the second constructor and, for the WSDL URL, use the load balancer's host and port; the rest stays the same. In fact, you can also put this WSDL URL in a .properties file, say under the WLS domain folder, so that for different environments such as Test, QA and Prod you can have a different WSDL URL in the .properties file specific to that environment.
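As a rough illustration of the last point, here is a minimal Python sketch (Python is used only for illustration; the real client would be Java) of reading the WSDL URL from an environment-specific .properties file. The key name `wsdl.url`, the host name and the simplified parser (plain `key=value` lines, no escapes or line continuations) are all assumptions, not a WebLogic convention.

```python
# Minimal sketch: pick the WSDL URL for the current environment from a
# simplified key=value ".properties" file instead of the URL baked into the
# generated client. The "wsdl.url" key and host names are hypothetical.


def parse_properties(text: str) -> dict[str, str]:
    """Parse a simplified Java-style .properties file (key=value, # or ! comments)."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # this simplified parser ignores lines without '='
        props[key.strip()] = value.strip()
    return props


def wsdl_url(props: dict[str, str]) -> str:
    """Return the load balancer's WSDL URL, failing loudly if it is missing."""
    if "wsdl.url" not in props:
        raise KeyError("wsdl.url is not set for this environment")
    return props["wsdl.url"]


# One file per environment (Test, QA, Prod); each points at that
# environment's load balancer, not at an individual managed server.
QA_PROPERTIES = """
# QA environment (hypothetical host and port)
wsdl.url=http://qa-lb.example.com:80/MyService/MyServicePort?wsdl
"""
```

The resolved string is what the Java client would turn into a `java.net.URL` and pass, together with the service QName, to the generated Service class's two-argument constructor described above.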
So, bottom line: for load balancing we do need an external load balancer in front of the WebLogic cluster domain.
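To make "then load balancing will happen" concrete, the sketch below models what the front-end web server does with the comma-separated list of managed servers: it hands each new request to the next server in turn and skips servers marked down. This is a simplified round-robin with assumed host names, not the WebLogic plug-in's actual algorithm; real proxies also deal with session affinity, health checks and retries.

```python
from itertools import cycle

# Hypothetical back-end list, in the same "host:port,host:port" form that is
# typed into the web server's proxy configuration.
BACKENDS = "soa1.example.com:8001,soa2.example.com:8001,soa3.example.com:8001,soa4.example.com:8001"


def parse_backends(spec: str) -> list[tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs."""
    servers = []
    for entry in spec.split(","):
        host, _, port = entry.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


class RoundRobin:
    """Hand out back-end servers in turn, skipping ones marked down."""

    def __init__(self, servers: list[tuple[str, int]]):
        self._servers = servers
        self._ring = cycle(servers)
        self.down: set[tuple[str, int]] = set()

    def next_server(self) -> tuple[str, int]:
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server not in self.down:
                return server
        raise RuntimeError("no back-end server is available")
```

With `soa2` marked down, four consecutive calls return soa1, soa3, soa4 and soa1 again, so the load is spread over the remaining servers instead of landing on the one used at deployment time.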
Thanks
Ravi Jegga

Problem in writing to a property node of a cluster

Hello everyone!
I have a problem writing to a property node of a cluster that contains several controls, such as combo boxes or string controls.
I would like to set the options to choose from for an array of such clusters.
I tried to do this by writing to the property node --> Value, but then the control in the cluster does not remain a control; it becomes an indicator instead, so I can't choose any of the options that I set. So I additionally set the property node --> Indicator (of the cluster) to "False", to keep the control a control. This results in a message from LabVIEW saying that this is not possible as long as the VI is not in edit mode. I don't understand this message. If I look at LabVIEW's "Operate" menu, I see that I am obviously in edit mode.
If anybody could help me, or suggest a better solution to my problem, I would be very glad.
Thanks a lot!
Woodi
An example of what I tried to do:
Attachments:
How to write to a cluster.vi 42 KB

I took the liberty of modifying your VI to show an alternative approach (LabVIEW 7.1). You should keep your array of 32 clusters in a shift register and show only a single cluster as a front panel control.
Selecting a different transducer from the listbox on the left will load its settings into that control via a local variable.
Any changes to the settings will modify the currently selected array element.
At any given time, a boolean array shows which transducers have changed settings.
At any given time, a listbox summarizes all settings.
Let me know if this makes sense to you. These are just some ideas; modify as needed. Good luck!
LabVIEW Champion. Do more with less code and in less time.
Attachments:
How_to_write_to_a_clusterMOD.vi 87 KB
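LabVIEW is graphical, so the following Python sketch only mirrors the data flow of the suggested design: one array of settings (the shift register), one editable "control" for the selected element, a changed-flags array and a summary list. The field names (`unit`, `gain`) and the class names are hypothetical stand-ins for whatever the real cluster contains.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransducerSettings:
    # Hypothetical fields standing in for the cluster's combo boxes/strings.
    name: str
    unit: str = "V"
    gain: float = 1.0


class SettingsEditor:
    """All settings live in one array (the 'shift register');
    the UI edits only the currently selected element."""

    def __init__(self, count: int = 32):
        self.settings = [TransducerSettings(name=f"T{i}") for i in range(count)]
        self._originals = list(self.settings)  # snapshot for change tracking
        self.selected = 0

    def select(self, index: int) -> TransducerSettings:
        """Like picking a row in the listbox: load that element into the single control."""
        self.selected = index
        return self.settings[index]

    def edit(self, **changes) -> None:
        """Write a change from the control back into the selected array element."""
        self.settings[self.selected] = replace(self.settings[self.selected], **changes)

    def changed(self) -> list[bool]:
        """Boolean array: which transducers differ from their initial settings."""
        return [cur != orig for cur, orig in zip(self.settings, self._originals)]

    def summary(self) -> list[str]:
        """One line per transducer, like the summary listbox."""
        return [f"{s.name}: {s.unit}, gain {s.gain}" for s in self.settings]
```

The design point is the same as in the VI: only one control exists on the front panel, so there is never a need to flip the Indicator property of array elements at run time.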

Can I use one transport adapter on the nodes of the cluster?

Hi
I am new to Sun Cluster. The cluster documentation says that each node should have 2 network cards, one for public connections and one for private connections. What if I do not want the nodes to have public connections, except for one node? In other words, I want to use one network card on each node except the first node in the cluster; users can access the rest of the nodes through the first node. Is that possible? If yes, what should the name of the second transport adapter be when installing the cluster software on the nodes?
Thank you for the help

Dear
We use the cluster for HA in failover conditions. If you have only one network adapter, how will failover work? Also, you can't assign one adapter to two nodes; you need at least 2 network adapters per node in a 2-node cluster.
Good luck :)
Mohammed Tanvir

What will happen when adding a new node to the current cluster if the new node's CPU is slower?

Hello,
Say I have a 3-node RAC and I want to add a new node to the current cluster, but the new node's CPUs are slower than the others. What will happen?
(My concern is: can I add this new node successfully? If yes, can it improve the whole cluster's performance or not?)
Thank you
s9225

You can also refer to MOS note RAC: Frequently Asked Questions (Doc ID 220970.1):
Can I have different servers in my Oracle RAC? Can they be from different vendors? Can they be different sizes?

How to find out the IP addresses of all nodes in a cluster?

Is there any way to retrieve the IP addresses of all nodes in a cluster?
The problem is the following. We intend to write an administration program that administers all nodes of a cluster using RMI (e.g. telling all singletons in the cluster to reload configuration values). My understanding is that RMI only talks to a single node in a cluster. It would be convenient if the administration program could discover all nodes in a cluster by itself and then administer each node sequentially. So far we're planning to pass all IP addresses to the administration program, e.g. as command-line arguments, but what if a node gets left out due to human error?
Thanks for your help.
Bernie

There is no public interface to inquire about the IP addresses of the servers in a cluster. If you use WLS 6.0, there is an administrative console that uses JMX to manage the cluster. Perhaps that would be of use to you?
Bernhard Lenz wrote:
Is there any way to retrieve the IP addresses of all nodes in a cluster?
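Since there was no public API for listing the servers, one way to reduce the human-error risk is to keep the node list in a single file that the tool reads, and to check its length against the expected cluster size. The Python sketch below assumes a plain-text file with one host:port per line; the actual RMI/JMX call is left as a hypothetical `action` callback.

```python
from typing import Callable, Optional


def read_node_list(text: str) -> list[str]:
    """One host (or host:port) per line; '#' starts a comment. Duplicates are dropped."""
    nodes: list[str] = []
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry and entry not in nodes:
            nodes.append(entry)
    return nodes


def administer_all(nodes: list[str], action: Callable[[str], None],
                   expected_count: Optional[int] = None) -> dict[str, str]:
    """Run `action` on every node in turn; return {node: error} for failures.

    `expected_count` is a cheap guard against a node silently missing from the list.
    """
    if expected_count is not None and len(nodes) != expected_count:
        raise ValueError(f"expected {expected_count} nodes, found {len(nodes)}")
    failures: dict[str, str] = {}
    for node in nodes:
        try:
            action(node)  # hypothetical: e.g. tell this node's singletons to reload
        except Exception as exc:  # keep going so one bad node doesn't stop the run
            failures[node] = str(exc)
    return failures
```

Collecting failures instead of stopping at the first error means one unreachable node is reported at the end rather than leaving the remaining nodes unadministered.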

RAC Installation Problem (shared across all the nodes in the cluster)

All experts,
I am trying to install Oracle 10.2.0 RAC on Red Hat 4.7.
Ref: http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnLinux
All steps completed successfully on all nodes (rac1, rac2); everything is OK on each node.
A single-node RAC installation was successful.
When I try to install on two nodes, the "Specify Oracle Cluster Registry (OCR) Location" step shows this error:
The location /nfsmounta/crs.configuration is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster.
I created the shared disks on all nodes as follows:
1. First we need to set up some NFS shares. Create shared disks on NAS or a third server if you have one available. Otherwise create the following directories on the RAC1 node.
mkdir /nfssharea
mkdir /nfsshareb
2. Add the following lines to the /etc/exports file. (edit /etc/exports)
/nfssharea *(rw,sync,no_wdelay,insecure_locks,no_root_squash)
/nfsshareb *(rw,sync,no_wdelay,insecure_locks,no_root_squash)
3. Run the following command to export the NFS shares.
chkconfig nfs on
service nfs restart
4. On both RAC1 and RAC2 create some mount points to mount the NFS shares to.
mkdir /nfsmounta
mkdir /nfsmountb
5. Add the following lines to the "/etc/fstab" file. The mount options are suggestions from Kevin Closson.
nas:/nfssharea /nfsmounta nfs rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 0 0
nas:/nfsshareb /nfsmountb nfs rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 0 0
6. Mount the NFS shares on both servers.
mount /mount1
mount /mount2
7. Create the shared CRS Configuration and Voting Disk files.
touch /nfsmounta/crs.configuration
touch /nfsmountb/voting.disk
Please guide me on what is wrong.

I think you did not really mount it on the second server. What is the output of 'ls /nfsmounta'?
Step 6 should be 'mount /nfsmounta', not 'mount /mount1' (and likewise 'mount /nfsmountb'). I also don't know whether simply creating a zero-size file is sufficient for the OCR (I have always used raw devices, not NFS, for this).
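Following the suggestion to check what is actually mounted, this minimal Python sketch parses `/proc/mounts`-style text on a Linux node and reports shared mount points that are missing or not NFS; it would need to be run on every node (rac1 and rac2). The mount points come from the steps above. Which options to require is left to the caller, because the kernel may report options differently from how they are spelled in /etc/fstab (for example, `tcp` may appear as `proto=tcp`).

```python
def parse_mounts(text: str) -> dict[str, dict]:
    """Parse /proc/mounts-style lines: device mountpoint fstype options dump pass."""
    mounts: dict[str, dict] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        mounts[mountpoint] = {"device": device, "fstype": fstype,
                              "options": set(options.split(","))}
    return mounts


def check_shared_mounts(text: str, mountpoints: list[str],
                        required_options: frozenset = frozenset()) -> list[str]:
    """Return a list of problems; an empty list means every mount point looks OK."""
    mounts = parse_mounts(text)
    problems = []
    for mp in mountpoints:
        info = mounts.get(mp)
        if info is None:
            # e.g. 'mount /mount1' was run instead of 'mount /nfsmounta'
            problems.append(f"{mp} is not mounted")
        elif not info["fstype"].startswith("nfs"):
            problems.append(f"{mp} is {info['fstype']}, not NFS")
        else:
            missing = set(required_options) - info["options"]
            if missing:
                problems.append(f"{mp} missing options: {', '.join(sorted(missing))}")
    return problems
```

On each node, calling `check_shared_mounts` with the contents of `/proc/mounts` and `["/nfsmounta", "/nfsmountb"]` should return an empty list before the OCR location is accepted by the installer.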

Regarding the number of nodes in an Endeca cluster

Hi,
I have a question regarding the number of nodes in the Endeca Server cluster.
Our solution contains one data domain running in an Endeca cluster with two nodes.
The Endeca Server documentation recommends running the cluster with at least 3 nodes; however, our solution can't accommodate another server straight away.
Can anyone please explain the implications of running the cluster with two nodes, for example:
1. Can the cluster still serve requests if one node goes down?
2. How does leader promotion work if a node goes down?
Thank you,
regards,
rp

Hi rp,
You can definitely start with two nodes and then add another Endeca Server node later, if needed. It is recommended to run a cluster of three, for increased availability.
Here are some answers to you questions about the cluster behavior:
Q: Can the cluster still serve requests if one node goes down?
A: Quoting from this portion of the Endeca Server Cluster Guide > How enhanced availability is achieved:
Availability of Endeca Server nodes
In an Endeca Server cluster with more than one Endeca Server instance, an ensemble of the Cluster Coordinator services running on a subset of nodes in the Endeca Server cluster ensures enhanced availability of the Endeca Server nodes in the Endeca Server cluster.
When an Endeca Server node in an Endeca Server cluster goes down, all Dgraph nodes hosted on it, and the Cluster Coordinator service (which may also be running on this node) also go down. As long as the Endeca Server cluster consists of more than one node, this does not disrupt the processing of non-updating user requests for the data domains. (It may negatively affect the Cluster Coordinator services. For information on this, see Availability of Cluster Coordinator services.)
If an Endeca Server node fails, the Endeca Server cluster is notified and stops routing all requests to the data domain nodes hosted on that Endeca Server node, until you restart the Endeca Server node.
Let's consider an example that helps illustrate this case. Consider a three-node single data domain cluster hosted on the Endeca Server cluster consisting of three nodes, where each Endeca Server node hosts one Dgraph node for the data domain. In this case:
If one Endeca Server node fails, incoming requests will be routed to the remaining nodes.
If the Endeca Server node that fails happens to be the node that hosts the leader node for the data domain cluster, the Endeca Server cluster selects a new leader node for the data domain from the remaining Endeca Server nodes and routes subsequent requests accordingly. This ensures availability of the leader node for a data domain.
If the Endeca Server node goes down, the data domain nodes (Dgraphs) it is hosting are not moved to another Endeca Server node. If your data domain has more than two nodes dedicated to processing queries, the data domain continues to function. Otherwise, query processing for this data domain may stop until you restart the Endeca Server node.
When you restart the failed Endeca Server node, its processes are restarted by the Endeca Server cluster. Once the node rejoins the cluster, it will rejoin any data domain clusters for the data domains it hosts. Additionally, if the node hosts a Cluster Coordinator, it will also rejoin the ensemble of Cluster Coordinators.
Q: How does leader promotion work if a node goes down? A: See part of the answer above. Also this, from the same topic but later in the text:
Failure of the leader node. When the leader node goes offline, the Endeca Server cluster elects a new leader node and starts sending updates to it. During this stage, follower nodes continue maintaining a consistent view of the data and answering queries. When the node that was the leader node is restarted and joins the cluster, it becomes one of the follower nodes. Note that it is also possible that the leader node is restarted and joins the cluster before the Endeca Server cluster needs to appoint a new leader node. In this case, the node continues to serve as the leader node. If the leader node in the data domain changes, the Endeca Server continues routing those requests that require the leader node to the Endeca Server cluster node hosting the newly appointed leader node.
Note: If the leader node in the data domain cluster fails, and if an outer transaction has been in progress, the outer transaction is not applied and is automatically rolled back. In this case, a new outer transaction must be started. For information on outer transactions, see the section about the Transaction Web Service in the Oracle Endeca Server Developer's Guide.
Failure of a follower node. When one of the follower nodes goes offline, the Endeca Server cluster starts routing requests to other available nodes, and attempts to restart the Dgraph process for this follower node. Once the follower node rejoins the cluster, the Endeca Server adjusts its routing information accordingly.
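The failure behaviour quoted above can be modelled roughly as follows. This is a toy Python sketch, not Endeca code: read queries are spread over live nodes, updates go to the leader, and when the leader fails a new one is chosen. The "first live node in the list wins" election rule is a hypothetical stand-in, because the guide does not describe how the new leader is picked, and the case where the old leader returns before a new one is appointed is not modelled.

```python
from typing import Optional


class DataDomain:
    """Toy model of one data domain: one leader, the rest followers."""

    def __init__(self, nodes: list[str]):
        self.nodes = list(nodes)
        self.alive = set(self.nodes)
        self.leader: Optional[str] = self.nodes[0]
        self._next = 0  # round-robin cursor for read queries

    def _live(self) -> list[str]:
        return [n for n in self.nodes if n in self.alive]

    def fail(self, node: str) -> None:
        self.alive.discard(node)
        if node == self.leader:
            live = self._live()
            # Hypothetical election rule: first live node in list order.
            self.leader = live[0] if live else None

    def rejoin(self, node: str) -> None:
        # A restarted former leader comes back as a follower.
        self.alive.add(node)
        if self.leader is None:
            self.leader = node

    def route_query(self) -> str:
        """Round-robin read queries over live nodes only."""
        live = self._live()
        if not live:
            raise RuntimeError("no live nodes for this data domain")
        node = live[self._next % len(live)]
        self._next += 1
        return node

    def route_update(self) -> str:
        """Updates must go to the current leader."""
        if self.leader is None:
            raise RuntimeError("no leader available")
        return self.leader
```

In a three-node data domain, failing the leader `n1` moves updates to `n2` while queries keep flowing to `n2` and `n3`; when `n1` rejoins, `n2` stays leader, matching the "becomes one of the follower nodes" behaviour quoted above.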
You may ask, why do you need three nodes then? This is to achieve the high availability of the cluster services themselves.
Quoting:
If you do not configure at least three Endeca Server nodes to run the Cluster Coordinator service, the Cluster Coordinator service will be a single point of failure. Should the Cluster Coordinator service fail, access to the data domain clusters hosted in the Endeca Server cluster becomes read-only. This means that it is not possible to change the data domains in any way. You cannot create, resize, start, stop, or change data domains; you also cannot define data domain profiles. You can send read queries to the data domains and perform read operations with the Cluster and Manage Web Services, such as listing data domains or listing nodes. No updates, writes, or changes of any kind are possible while the Cluster Coordinator service in the Endeca Server cluster is down — this applies to both the Endeca Server cluster and data domain clusters. To recover from this situation, the Endeca Server instance that was running a failed Cluster Coordinator must be restarted or replaced (the action required depends on the nature of the failure).
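The three-node recommendation matches majority-quorum arithmetic, assuming the Cluster Coordinator ensemble needs a strict majority of its members alive (as ZooKeeper-style ensembles do; the guide quoted above only states the single-point-of-failure consequence). This Python sketch computes how many coordinator failures each ensemble size can survive.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of coordinators that can fail while a strict majority survives."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one member")
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority
```

For ensembles of 1 to 5 members this gives 0, 0, 1, 1 and 2: a two-node ensemble tolerates no more failures than a single node, which is why the third node matters for write availability.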
Julia

Find nodes in a cluster

How can I find all the nodes in a cluster?
olsnodes -n gives the info for the nodes that are up. If any of the nodes is down, it will not show. Is there any command or file from which I can check all nodes of a cluster, even if they are down?

In 11.2.0.3 I use:
crsctl stat res -t
Hope it helps.

Cloned two vm cluster nodes from development cluster to act as template to create production cluster

Morning,
There was so much done to set up the development cluster that I thought it would be easy to clone the two nodes in the cluster. To my surprise, the development cluster was up and happily running on the new VM servers. Stopping resources confirms that it is stopping and starting resources on the original cluster. I am not sure how to safely stop the two new servers from managing the development cluster and create a new production cluster on them.
I am hesitant to destroy the cluster, as I suspect it will destroy the real development cluster. How do I do this? How do I remove the Windows cluster software and reinstall it without affecting the development cluster?
Note that I tried to create a new cluster in Failover Cluster Manager and specify the new VM cluster servers, but it says they are already part of a cluster. I do not see them listed as nodes. I am not sure how to see which cluster it thinks the new servers are part of, or how to remove them from the development cluster. That might be the path to my solution.

This actually worked out okay. I found these steps and ran them on both of the nodes that were claiming to already be in a cluster:
powershell
Import-Module FailoverClusters
Clear-ClusterNode
