Wrong hostname setting after Sun Cluster failover

Hi Gurus,
our PI system has been set up to fail over in a Sun Cluster with a virtual hostname s280m (primary host s280, secondary host s281).
The Basis team set up the system profiles to use the virtual hostname, and I did all the steps in SAP Note 1052984 "Process Integration 7.1 High Availability" (my PI is 7.11).
I believe I have substituted "s280m" everywhere "s280" previously appeared, but when I start the system on the DR box (s281), the Java stack throws errors during startup. Both the SCS01 and DVEBMGS00 work directories contain a file called dev_sldregs with the following error:
Mon Apr 04 11:55:22 2011 Parsing XML document.
Mon Apr 04 11:55:22 2011 Supplier Name: BCControlInstance
Mon Apr 04 11:55:22 2011 Supplier Version: 1.0
Mon Apr 04 11:55:22 2011 Supplier Vendor:
Mon Apr 04 11:55:22 2011 CIM Model Version: 1.5.29
Mon Apr 04 11:55:22 2011 Using destination file '/usr/sap/XP1/SYS/global/slddest.cfg'.
Mon Apr 04 11:55:22 2011 Use binary key file '/usr/sap/XP1/SYS/global/slddest.cfg.key' for data decryption
Mon Apr 04 11:55:22 2011 Use encryted destination file '/usr/sap/XP1/SYS/global/slddest.cfg' as data source
Mon Apr 04 11:55:22 2011 HTTP trace: false
Mon Apr 04 11:55:22 2011 Data trace: false
Mon Apr 04 11:55:22 2011 Using destination file '/usr/sap/XP1/SYS/global/slddest.cfg'.
Mon Apr 04 11:55:22 2011 Use binary key file '/usr/sap/XP1/SYS/global/slddest.cfg.key' for data decryption
Mon Apr 04 11:55:22 2011 Use encryted destination file '/usr/sap/XP1/SYS/global/slddest.cfg' as data source
Mon Apr 04 11:55:22 2011 ******************************
Mon Apr 04 11:55:22 2011 *** Start SLD Registration ***
Mon Apr 04 11:55:22 2011 ******************************
Mon Apr 04 11:55:22 2011 HTTP open timeout     = 420 sec
Mon Apr 04 11:55:22 2011 HTTP send timeout     = 420 sec
Mon Apr 04 11:55:22 2011 HTTP response timeout = 420 sec
Mon Apr 04 11:55:22 2011 Used URL: http://s280:50000/sld/ds
Mon Apr 04 11:55:22 2011 HTTP open status: false - NI RC=0
Mon Apr 04 11:55:22 2011 Failed to open HTTP connection!
Mon Apr 04 11:55:22 2011 ****************************
Mon Apr 04 11:55:22 2011 *** End SLD Registration ***
Mon Apr 04 11:55:22 2011 ****************************
Notice it is using the wrong hostname (s280 instead of s280m). Where did I forget to change the hostname? Any ideas?
thanks in advance,
Peter

Please note that the PI system is transparent with regard to the failover solution used.
When you configure the parameters according to the mentioned note, then if one of the nodes is down, the load will be sent to the other node under the same Web Dispatcher/load balancer.
The Solaris failover solution covers the whole environment, including the Web Dispatcher, the database, and all nodes.
Therefore, please check the configuration against the page below, which deals specifically with the Solaris failover solution for SAP:
http://wikis.sun.com/display/SunCluster/Installing+and+Configuring+Sun+Cluster+HA+for+SAP
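In this particular trace the offending URL comes from the SLD Data Supplier destination file /usr/sap/XP1/SYS/global/slddest.cfg (see the "Using destination file" and "Used URL" lines in dev_sldregs above). That file is encrypted, so grep will not find the old hostname in it; it has to be regenerated with sldreg. A minimal sketch, using the paths from the log above - the exact sldreg options can vary by release, so verify with sldreg -help first:
# grep -il s280 /usr/sap/XP1/SYS/profile/*        (find leftover plain-text references)
# sldreg -configure /usr/sap/XP1/SYS/global/slddest.cfg
When prompted, enter the virtual hostname s280m and the SLD port (50000 in the log above); this rewrites slddest.cfg and its key file slddest.cfg.key. Then restart the instances and re-check dev_sldregs for the "Used URL" line.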

Similar Messages

  • Patch set on Sun Cluster

    Hi,
    I have to upgrade 9.2.0.6 to 9.2.0.8 on Sun Cluster. Can you tell me in what sequence I need to install the patch set? Are there any prerequisites I need to take care of in advance? If anybody can point me to an exact doc, that would be very helpful. I have seen one doc on Metalink, but it is not sufficient.
    Thanks,
    Mk

    Have you checked the 9.2.0.8 patch README? There are several references on how to patch clustered instances - if the information is insufficient or lacking, please open an SR with Support.
    http://updates.oracle.com/ARULink/Readme/process_form?aru=8690150
    HTH
    Srini

  • Cannot import a disk group after Sun Cluster 3.1 installation

    Installed Sun Cluster 3.1u3 on nodes with Veritas VxVM running and disk groups in use. After cluster configuration and reboot, we can no longer import our disk groups. VxVM displays the message: Disk group dg1: import failed: No valid disk found containing disk group.
    Did anyone run into the same problem?
    The dump of the private region for every single disk in VxVM returns the following error:
    # /usr/lib/vxvm/diag.d/vxprivutil dumpconfig /dev/did/rdsk/d22s2
    VxVM vxprivutil ERROR V-5-1-1735 scan operation failed:
    Format error in disk private region
    Any help or suggestion would be greatly appreciated
    Thx
    Max

    If I understand correctly, you had VxVM configured before you installed Sun Cluster - correct? And once you installed Sun Cluster you could no longer import your disk groups.
    First thing you need to know is that you have to register the disk groups with Sun Cluster - this happens automatically with Solaris Volume Manager but is a manual process with VxVM. Note that you will also have to update the configuration after any changes to a disk group, e.g. permission changes, volume creation, etc.
    You can use the scsetup menu to achieve this, though it can also be done from the command line with an scconf command, as sketched below.
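    For reference, a rough sketch of the manual registration (the disk group name dg1 is taken from your error message; the node names are placeholders, and the exact syntax should be verified against the scconf man page for your release):
    # scconf -a -D type=vxvm,name=dg1,nodelist=node1:node2
    and after any later change to the disk group:
    # scconf -c -D name=dg1,sync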
    Having said that, I'm still confused by the error. See if the above solves the problem first.
    Regards,
    Tim
    ---

  • Sun Cluster 3.1 setup

    Dear All,
    Soon we will be upgrading Sun Cluster 3.1. I am now working at a testing site.
    What I am trying to set up is two servers with SC 3.1 to simulate the migration procedure; however, we don't have a SAN at the testing site, so I am stuck at the quorum configuration.
    I was told SC 3.1 can be set up without any SAN, using local disk only, but I cannot locate any related documentation.
    Could anyone please help with any tips? How can I set up the quorum device on NFS or even just a local disk?
    Thanks and Regards,
    Donald
    Edited by: Foo Donald on 2011/7/14 at 1:07 AM

    Hi Nik,
    I have set up a Sun Cluster 3.2 Quorum Server on a third system, listening on port 9000.
    Please correct me if I am wrong, but it seems the Sun Cluster 3.1 command scconf cannot see the quorum server; there is no way to specify its IP or port.
    The testing site is on Solaris 8 + Sun Cluster 3.1; it will be upgraded to Solaris 10 + Sun Cluster 3.2 by Live Upgrade.
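    For the archive: quorum servers are a Sun Cluster 3.2 device type, which matches what you are seeing with scconf. Once the upgrade to 3.2 is done, the quorum server can be added with clquorum. A sketch with a placeholder IP; the type name and property syntax should be checked against the clquorum man page on your release:
    # clquorum add -t quorum_server -p qshost=<qs-ip> -p port=9000 qs1
    # clquorum status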
    Thanks and Regards,
    Donald

  • Sun Cluster 3.1 Failover Resource without Logical Hostname

    Maybe it sounds strange, but I need to create a failover service without any network resource in use (or at least with a dependency on a logical hostname created in a different resource group).
    Does anybody know how to do that?

    Well, you don't really NEED a LogicalHostname in an RG, so I guess I am not understanding the question.
    Is there an application agent which demands a network resource in the RG? Sometimes the VALIDATE method of such agents refuses to work if there is no network resource in the RG.
    If so, tell us a bit more about the application. Is it GDS-based and generated by the Sun Cluster Agent Builder? The Agent Builder has a "non Network Aware" option; if you select that while building your app, it ought to work without a network resource in the RG (see the sketch below).
    But maybe I should back up and ask the more basic question of exactly what is REQUIRING you to create a LogicalHostname?
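    As an illustration of the non-network-aware route, here is a rough SC 3.1 sketch - the start/stop paths and resource names are placeholders, and the Network_aware property name should be double-checked in the SUNW.gds man page:
    # scrgadm -a -t SUNW.gds
    # scrgadm -a -g app-rg
    # scrgadm -a -j app-rs -g app-rg -t SUNW.gds \
        -x Start_command="/opt/app/bin/start" \
        -x Stop_command="/opt/app/bin/stop" \
        -x Network_aware=false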
    HTH,
    -ashu

  • Creating a logical hostname in Sun Cluster

    Can someone tell me what exactly a logical hostname in Sun Cluster means?
    For registering a logical hostname resource in a failover group, what exactly do I need to specify?
    For example, I have two nodes in a Sun Cluster. How do I create or configure a logical hostname, and which IP address should it point to (should it point to the IP addresses of the nodes in the Sun Cluster)? Can I get clarification on this?

    Thanks Thorsten for your continued help...
    The output of clrs status abc_lg:
    === Cluster Resources ===
    Resource Name   Node Name   State     Status Message
    abc_lg          node1       Offline   Offline
                    node2       Offline   Offline
    The status is offline...
    The output of clresourcegroup status:
    === Cluster Resource Groups ===
    Group Name   Node Name   Suspended   Status
    abc_rg       node1       No          Unmanaged
                 node2       No          Unmanaged
    You say the resource should be enabled after creating it. I am using GDS, and I am just following the steps provided to achieve high availability (in the developer's guide...).
    I have 1) a logical hostname resource and 2) an application resource in my failover resource group.
    When I bring the failover resource group online, what should the status of my failover resource group be, and the status of the resources in it?
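    For the record, the Unmanaged status above means the group has never been brought under RGM control; that has to happen before any resource can go online. A minimal sketch with the names from the output, assuming SC 3.2 command syntax:
    # clresourcegroup online -eM abc_rg
    # clrs status abc_lg
    Here -M puts the group into the managed state and -e enables its resources. Afterwards both resources should report Online on one node, with the logical hostname plumbed as a floating IP address that is listed in /etc/hosts on both nodes and is distinct from the nodes' own addresses.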

  • "didadm: unable to determine hostname" error on Sun Cluster 4.0 - Solaris 11

    Trying to install Sun Cluster 4.0 on Sun Solaris 11 (x86-64).
    iSCSI shared quorum disks are available in /dev/rdsk/. I ran:
    devfsadm
    cldevice populate
    But I don't see DID devices getting populated in /dev/did.
    Also, when scdidadm -L is issued, I get the following error. Has anyone seen the same error?
    - didadm: unable to determine hostname.
    I found that in Cluster 3.2 there was Bug 6380956: didadm should exit with an error message if it cannot determine the hostname.
    The Sun Cluster command didadm, didadm -l in particular, requires the hostname to function correctly. It uses the standard C library function gethostname to achieve this.
    Early in the cluster boot, prior to the service svc:/system/identity:node coming online, gethostname() returns an empty string. This breaks didadm.
    Can anyone point me in the right direction to get past this issue with the shared quorum disk DID?

    Let's step back a bit. First, what hardware are you installing on? Is it a supported platform, or is it some guest VM? (That might contribute to the problems.)
    Next, after you installed Solaris 11, did the system boot cleanly with all the services coming up (svcs -x)? If it did boot cleanly, what did 'uname -n' return? Do commands like 'getent hosts <your_hostname>' work? If there are problems here, Solaris Cluster won't be able to get around them.
    If the Solaris install was clean, what were the results of the above hostname commands after OSC was installed? Do the hostnames still resolve? If not, you need to look at why that is happening first.
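    A compact way to run those checks on each node - hostname resolution has to work before the cluster software can (the service name is the one from your bug reference):
    # svcs -x
    # uname -n
    # getent hosts `uname -n`
    # svcs svc:/system/identity:node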
    Regards,
    Tim
    ---

  • Sun Cluster + metaset shared disks

    Guys, I am looking for some instructions that I believe most Sun administrators would know.
    I am trying to create some cluster resource groups and resources, but before that I am creating the file systems that are going to be used by the two nodes in the Sun Cluster 3.2. We use SVM.
    I have some drives that I plan to use for this specific cluster resource group, which is yet to be created.
    I know I have to create a metaset, since that is how the other resource groups in my environment are already set up, so I will go with the same concept.
    # metaset -s TESTNAME
    Set name = TESTNAME, Set number = 5
    Host                Owner
      server1
      server2
    Mediator Host(s)    Aliases
      server1
      server2
    # metaset -s TESTNAME -a /dev/did/dsk/d15
    metaset: server1: TESTNAME: drive d15 is not common with host server2
    # scdidadm -L | grep d6
    6 server1:/dev/rdsk/c10t6005076307FFC4520000000000004133d0 /dev/did/rdsk/d6
    6 server2:/dev/rdsk/c10t6005076307FFC4520000000000004133d0 /dev/did/rdsk/d6
    # scdidadm -L | grep d15
    15 server1:/dev/rdsk/c10t6005076307FFC4520000000000004121d0 /dev/did/rdsk/d15
    Do you see what I am trying to say? If I want to add d6 to the metaset it will go through fine, but not d15, since it shows up against only one node, as you can see from the scdidadm output above.
    Please let me know how I can share drive d15 with the other node, the same as d6. Thanks much for your help.
    -Param
    Edited by: paramkrish on Feb 18, 2010 11:01 PM

    Hi, thanks for your reply. You got me wrong: I am not asking you to be liable for the changes you recommend, since I know that's not reasonable when asking for help. I am aware this is not a support site but a forum to exchange information that people already have.
    We have a support contract, but that is only for the Sun hardware, and those support folks are somewhat OK when it comes to Solaris and setup, but not experts. I will certainly seek their help when needed; that's my last option. Since I thought this problem might be something trivial, I quickly posted a question in this forum.
    We do have a test environment, but it has not two nodes but a single node with zone clusters. Hence I don't see this problem in the test environment, and "cldev populate" would be of no use to me there either, I think, since we don't have two nodes.
    I will check the logs as you suggested and will get back if I find something. If you have any other thoughts, feel free to let me know (don't worry about the risks; I know I can take care of that).
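    In case it helps: once the LUN for d15 is mapped to server2 at the storage/HBA level, the usual rediscovery sequence is roughly the following, run on both nodes (command names per the SC 3.2 docs; verify against your release):
    # devfsadm
    # scdidadm -r
    # scdidadm -L | grep d15
    d15 should then list paths from both server1 and server2, after which the metaset add should succeed.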
    -Param

  • Do the SameSubnetDelay or SameSubnetThreshold settings require restarting the Windows failover cluster?

    Hi,
    I set SameSubnetDelay=2000 and SameSubnetThreshold=10, but after a failover, when I check the cluster log, we still see SameSubnetDelay and SameSubnetThreshold using the default settings.
    So I want to know: do these settings require restarting the Windows failover cluster core resource group to take effect?
    Our windows version: Windows 2008 R2 SP1
    Many thanks.

    Thanks for your reply.
    After setting new values for these two settings, I checked them with the "cluster /prop" command, and it did show the new values.
    But after a SQL Server cluster instance failover, I checked the cluster log to find the cause of the failover, and the log showed the default values for these settings. This has happened twice.
    I will keep watching this, and next time I will post the setting values and the cluster log.
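    For reference, the values can be set and read back with cluster.exe (the same command family you used; run from an elevated prompt):
    C:\> cluster /prop SameSubnetDelay=2000
    C:\> cluster /prop SameSubnetThreshold=10
    C:\> cluster /prop
    As far as I know, these common properties are dynamic and take effect without restarting the cluster, so it is worth confirming that the log entries you checked postdate the change.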

  • What are typical failover times for application X on Sun Cluster

    Our company does not yet have any hands-on experience with clustering anything on Solaris, although we do with Veritas and Microsoft. My experience with MS is that it is as close to seamless (instantaneous) as possible. Veritas clustering takes a little longer to activate the standbys. A new application we are bringing in-house soon runs on Sun Cluster (it is some BEA Tuxedo/WebLogic/Oracle monster). They claim the time it takes to flip from the active node to the standby node is ~30 minutes. This seems a bit insane to us, since they are calling this "HA". Is this type of failover time typical in Sun land? Thanks for any numbers or references.

    This is a hard question to answer because it depends on the cluster agent/application.
    On one hand you may have a simple Sun Cluster application that fails over in seconds because it has to do a limited amount of work (umount here, mount there, plumb a network interface, etc.) to actually fail over.
    On the other hand these operations may, depending on the application, take longer than another application due to the very nature of that application.
    An Apache web server failover may take 10-15 seconds but an Oracle failover may take longer. There are many variables that control what happens from the time that a node failure is detected to the time that an application appears on another cluster node.
    If the failover time is 30 minutes, I would ask your vendor why that is, exactly.
    Not in a confrontational way, but as a 'I don't get how this is high availability', since the assumption is that up to 30 minutes could elapse from the time your application goes down to it coming back on another node.
    A better solution might be a different application vendor (I know, I know) or a scalable application that can run on more than one cluster node at a time.
    The logic of the scalable approach is that if a failover takes 30 minutes or so to complete, failover becomes an expensive operation, so I would rather have my application use multiple nodes at once than eat a 30-minute failover if one node dies in a two-node cluster:
    serverA > 30 minute failover > serverB
    seems less desirable than
    serverA, serverB, serverC, etc. concurrently providing access to the application, so that failover only happens when we get down to a handful of nodes.
    Either one is probably more desirable than having an application outage(?)

  • Failover Zones / Containers with Sun Cluster Geographic Edition and AVS

    Hi everyone,
    Is the following solution supported/certified by Oracle/Sun? I did find some docs saying it is, but I cannot find concrete technical information yet...
    * Two sites with a 2-node cluster at each site
    * 2x failover containers/zones that are part of the two protection groups (1x group for SAP, the other group for a 3rd-party application)
    * Sun Cluster 3.2 and Geographic Edition 3.2 with Availability Suite for SYNC/ASYNC replication over TCP/IP between the two sites
    The zones and their applications need to be able to fail over between the two sites.
    Thanks!
    Wim Olivier

    Fritz,
    Obviously, my colleagues and I in the Geo Cluster group build and test Geo clusters all the time :-)
    We have certainly built and tested Oracle (non-RAC) configurations on AVS. One issue you do have, unfortunately, is that of zones plus AVS (see my Blueprint for more details: http://wikis.sun.com/display/BluePrints/Using+Solaris+Cluster+and+Sun+Cluster+Geographic+Edition). Consequently, you can't build the configuration you described. The alternative is to sacrifice zones for now and wait for the fixes to RG affinities (no idea on the schedule for this feature) or find another way to do this - probably hand-crafted.
    If you follow the OHAC pages (http://www.opensolaris.org/os/community/ha-clusters/) and look at the endorsed projects, you'll see that there is a Script Based Plug-in on the way (for OHACGE) that I'm writing. So if you are interested in playing with the OHACGE source or the SCXGE binaries, you might see that appear at some point. Of course, these aren't supported solutions.
    Regards,
    Tim
    ---

  • Any experience with NFS failover in Sun Cluster?

    Hello,
    I am planning to install a dual-node Sun Cluster for an NFS failover configuration. The SAN storage is shared between the nodes via Fibre Channel. The NFS shares will be manually assigned to nodes and should fail over / take back between nodes.
    Is this setup well tested? How do the NFS clients survive the failover (without "stale NFS handle" errors)? Does it work smoothly for Solaris, Linux, and FreeBSD clients?
    Please share your experience.
    TIA,
    -- Leon

    The 3-year-old Linux installation on my laptop, which is my NFS client most of the time, uses UDP by default (kernel 2.4.19).
    Anyway, the key is that the NFS client - or rather, the RPC implementation on the client - is intelligent enough to detect a failed TCP connection and tries to reestablish it with the same IP address. Once the cluster has failed over the logical IP, the reconnect succeeds and NFS traffic continues as if nothing bad had happened. This only(!) works if the NFS mount was done with the "hard" option; only that makes the client retry the connection.
    Other "dumb" TCP-based applications might not retry and would thus need manual intervention.
    Regarding UFS or PxFS, it makes no difference; NFS does not know the difference. It shares a mount point.
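    For example, a hard TCP mount would look something like this (server and path names are placeholders; nfs-lh stands for the logical hostname that fails over with the resource group):
    # mount -F nfs -o hard,intr,proto=tcp nfs-lh:/export/data /mnt/data     (Solaris client)
    # mount -t nfs -o hard,intr,tcp nfs-lh:/export/data /mnt/data           (Linux client)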
    Hope that helped.

  • Errors after initial Sun Cluster install

    - SunOS conch 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V210
    - Sun Cluster 3.2
    I've gone through the scinstall process using the standard answers to the questions. The only exception is that when it came to quorum, I answered that I would set it up later, as I want to try the quorum server. There's no shared storage - I'm seeing if it's possible to create a cluster using IP-based replication.
    I'm getting these error messages every 30 seconds (they look like a result of this legacy service):
    # svcs lrc:/etc/rc3_d/S91initgchb_resd
    STATE STIME FMRI
    legacy_run 16:19:29 lrc:/etc/rc3_d/S91initgchb_resd
    Feb 8 16:38:59 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 8 16:38:59 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
    Feb 8 16:38:59 conch : Bad file number
    Feb 8 16:39:29 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 8 16:39:29 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
    Feb 8 16:39:29 conch : Bad file number
    Feb 8 16:39:59 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 8 16:39:59 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
    Feb 8 16:39:59 conch : Bad file number
    Feb 8 16:40:29 conch Cluster.GCHB_resd: Unable to open door descriptor /var/run/rgmd_receptionist_door
    Feb 8 16:40:29 conch Cluster.GCHB_resd: GCHB system error: scha_cluster_open failed with 18
    Feb 8 16:40:29 conch : Bad file number
    There are no file system errors, and I'm at a complete loss as to why this problem appears. Can anyone offer any advice?
    Cheers,
    Iain

    Hi,
    there are 2 issues here.
    1. The error messages that you see. I get them on my freshly installed cluster as well. What did I do? I used the JES installer and installed SC 3.2 and SC Geo 3.2 - to be configured later. I think it should only install the packages but not configure any part of them; it seems that it does otherwise. To me GCHB sounds like "global cluster heartbeat". I'll follow up with the developers to get this clarified.
    2. Replication within a cluster and no shared storage. This has several aspects. I, too, see more and more customer demand for this. If you get it to work, let us know. I am not sure, though, why you installed the SC Geo edition to achieve this, as I do not think it will help you here.
    In any case I can only recommend setting up the quorum server before proceeding, otherwise your whole cluster will panic as soon as you do a single reboot. That is by design.
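    To make that concrete, the quorum server side is a one-line config plus a start; the paths and syntax below are as I recall them from the SC 3.2 docs, so double-check scqsd.conf and the clquorumserver man page locally. On the quorum server host, /etc/scqsd/scqsd.conf should contain a line like:
    /usr/cluster/lib/sc/scqsd -d /var/scqsd -p 9000
    then:
    # clquorumserver start +
    After that, register it from a cluster node as a quorum device of type quorum_server, either through the clsetup quorum menu or with clquorum add.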
    Regards
    Hartmut

  • RDP Services do not accept connection after cluster failover

    Hi guys,
    I am having weird behaviour on my Windows Server 2012 R2.
    server 1 - 10.100.1.201
    server 2 - 10.100.1.203
    VIP - 10.100.1.202
    When I open remote desktop sessions to server 1 and server 2 after both servers have been rebooted, they work perfectly fine. During the remote desktop session, I perform a cluster node failover, switching the node to server 2. Immediately after I perform the task, my server 1 connection hangs and I am not able to log in anymore.
    Strangely, when I connect from within the same server zone and use remote desktop, it works perfectly fine and does not disconnect me from either server 1 or 2.
    I suspect the network routing gets messed up during the cluster failover, but the route prints are identical and show no problem.
    Has anyone here had the same problem I am experiencing?
    zhiyuan

    Hi,
    sorry if I lost you... here is the story:
    node 1 - 10.100.1.201
    node 2 - 10.100.1.203
    vip - 10.100.1.202
    When both servers restarted, I could remote to both servers, no problem.
    1. RDP to both node 1 and node 2 together on their physical IPs. Connected successfully.
    2. Checked the active node: node 1. Performed a failover from node 1 to node 2; node 1's RDP session lost its connection immediately. Checked on node 2: cluster node active on node 2, no errors.
    3. Performed a node 2 to node 1 failover. Node 2's RDP session lost its connection immediately; node 1's session came back. Checked: cluster node active on node 1, no errors.
    4. In order to get both nodes to accept RDP again, performed a restart on node 2 (the node that cannot reconnect); after the reboot, RDP was back to normal.
    5. The firewall team confirmed the connection reaches the server; the server is apparently not responding to RDP.

  • DS6 in a zone on a Sun Cluster

    I have a Sun Cluster that I am trying to configure, and I don't know if I am trying to do something wrong, so I thought I would ask.
    I am using Sun Cluster 3.2 on a pair of Sun T2000s with a Fibre Channel disk array attached to both nodes. I have configured the disk array to have two file systems, one for each server. I have configured two resource groups in the global zone and set up an HAStoragePlus resource for each file system. I can successfully fail the file systems over between the two nodes. On each of the file systems I have installed a zone. The zone is managed with the resource type provided by the SUNWsczone package to start and stop the zone. The resource is in the same resource group as the HAStoragePlus resource.
    At this point I have created a resource group for the zone to manage the directory server. After creating the resource group, I am trying to create a resource for the directory service HA service. When I use the clresource command, it complains that the resource group does not contain a logical hostname. Using the services provided by the SUNWsczone package, I created a logical hostname that is assigned to the zone in question. Is there a way to install the Directory Server HA resource into the resource group for the zone?

    Philippe,
    DS 6 Sun Cluster Agent was not tested with SC 3.2 in Zones.
    Zone support came with SC 3.2, and DS 6 Cluster Agent was built with SC 3.1, tested with SC 3.1 and 3.2 in the Global zone.
    Regards,
    Ludovic.
