Sun cluster quorum disk
Hi,
I just want to know how to assign a quorum disk under Sun Cluster. Can I use a LUN that is shared between both nodes as the quorum disk, and do I need to bring the disk under VxVM control first before using it as a quorum disk? Appreciate any response/advice.
Thanks.
No, you don't need to bring the disk under VxVM control.
First run scdidadm -L from either node. This lists the shared disk devices. Find one that is shared between the nodes and note its DID, e.g. d21.
scconf -a -q globaldev=d21
Once you have added a quorum disk you can set install mode to off.
scconf -c -q installmode=off
I would also recommend reading this:
http://docs.sun.com/app/docs/doc/816-3384/6m9lu6fig?q=sun+cluster+add+quorum+disk&a=view
Then reset your quorum count.
scconf -c -q reset
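The shared-disk check in the steps above can be scripted. A minimal sketch, assuming the usual three-column `scdidadm -L` output; the sample data and node names below are made up, and on a real cluster node you would pipe `scdidadm -L` in instead:

```shell
# List DID devices that appear under more than one node in scdidadm -L
# output, i.e. candidates for a quorum disk. Sample output is hard-coded
# here; replace the printf with a real `scdidadm -L` on a cluster node.
scdid_output='1 node1:/dev/rdsk/c3t5d0 /dev/did/rdsk/d1
1 node2:/dev/rdsk/c3t5d0 /dev/did/rdsk/d1
2 node1:/dev/rdsk/c1t0d0 /dev/did/rdsk/d2'
shared=$(printf '%s\n' "$scdid_output" |
  awk '{count[$3]++} END {for (d in count) if (count[d] > 1) print d}')
echo "$shared"
```

In this sample only d1 is seen from both nodes, so d1 is the one you would pass to scconf -a -q globaldev=.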
Similar Messages
-
Hi, I'm having a problem in a VM guest cluster using Windows Server 2012 R2 with virtual disk sharing enabled.
It's a SQL 2012 cluster, which has around 10 VHDX disks shared this way. All the VHDX files are inside LUNs on a SAN. These LUNs are presented to all clustered members of the Windows Server 2012 R2 Hyper-V cluster via Cluster Shared Volumes.
Yesterday a very strange problem happened: both the Quorum disk and the DTC disks had their contents completely erased. The VHDX files themselves were there, but the information inside was gone.
The SQL admin had to recreate both disks, but now we don't know whether this issue was related to the virtualization platform or to another event inside the cluster itself.
Right now I'm seeing these errors on one of the VM guests:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 3/4/2014 11:54:55 AM
Event ID: 1069
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: ServerDB02.domain.com
Description:
Cluster resource 'Quorum-HDD' of type 'Physical Disk' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster
Manager or the Get-ClusterResource Windows PowerShell cmdlet.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1069</EventID>
<Version>1</Version>
<Level>2</Level>
<Task>3</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2014-03-04T17:54:55.498842300Z" />
<EventRecordID>14140</EventRecordID>
<Correlation />
<Execution ProcessID="1684" ThreadID="2180" />
<Channel>System</Channel>
<Computer>ServerDB02.domain.com</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">Quorum-HDD</Data>
<Data Name="ResourceGroup">Cluster Group</Data>
<Data Name="ResTypeDll">Physical Disk</Data>
</EventData>
</Event>
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 3/4/2014 11:54:55 AM
Event ID: 1558
Task Category: Quorum Manager
Level: Warning
Keywords:
User: SYSTEM
Computer: ServerDB02.domain.com
Description:
The cluster service detected a problem with the witness resource. The witness resource will be failed over to another node within the cluster in an attempt to reestablish access to cluster configuration data.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1558</EventID>
<Version>0</Version>
<Level>3</Level>
<Task>42</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2014-03-04T17:54:55.498842300Z" />
<EventRecordID>14139</EventRecordID>
<Correlation />
<Execution ProcessID="1684" ThreadID="2180" />
<Channel>System</Channel>
<Computer>ServerDB02.domain.com</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="NodeName">ServerDB02</Data>
</EventData>
</Event>
We don't know if this can happen again; what if it happens on a disk with data? We don't know if this is related to the virtual disk sharing technology or to anything else about virtualization, but I'm asking here to find out if that is a possibility.
Any ideas are appreciated.
Thanks.
Eduardo Rojas
Hi,
Please refer to the following link:
http://blogs.technet.com/b/keithmayer/archive/2013/03/21/virtual-machine-guest-clustering-with-windows-server-2012-become-a-virtualization-expert-in-20-days-part-14-of-20.aspx#.Ux172HnxtNA
Best Regards,
Vincent Wu
-
Failover Cluster Quorum Disk is fallen off the shared volume
Hi, we had a cluster that was holding 40+ VMs and was originally set up with the built-in 1 GB Ethernet adapter. Yesterday we installed Qlogic 10 GB NIC teaming on both of the nodes and reconfigured the network on both nodes. However, now the quorum disk
is not part of a Cluster Shared Volume. How can I add that disk back to the shared volume, please?
Hi Riaz,
Adding a quorum disk is easy, so please let us know if any specific error occurs during the steps provided in the following thread:
http://social.technet.microsoft.com/Forums/windowsserver/en-US/0566ede4-55bb-4694-a134-104fac2a7052/replace-quorum-disk-on-failover-cluster-on-different-lun?forum=winserverClustering
If you have any feedback on our support, please send to [email protected] -
Sun Cluster with Netapps - iSCSI quorum and network port
I am proposing Sun cluster with Netapps 3020C.
May I know
1) The OS is Solaris 9. The Sun OSP says that we need to obtain an iSCSI license from Netapps. Is this the iSCSI initiator software for Solaris 9 to talk to the NAS quorum, or do I need to purchase a 3rd-party iSCSI initiator?
2) We provide 2 network ports for the Netapps private NAS LAN. Is it a must to cater another dedicated network port for the iSCSI communication with the quorum?
3) If we need purchase a 3rd party iSCSI initiator, where can we get this? I have checked Qlogic and Cisco, they are both not suitable for my solution.
Appreciate your help.
Hi,
1) OS is Solaris 9. The SUN OSP says that we need to
obtain an iSCSI license from Netapps. Is this the
iSCSI initiator software for Solaris 9 to talk to the
NAS quorum? Or do I need to purchase a 3rd party
iSCSI initiator?
Have a look at http://docs.sun.com/app/docs/doc/817-7957/6mn8834r2?a=view
I read the "Requirements When Configuring NAS Devices as Quorum Devices"
section as saying this is the license for the iSCSI initiator software.
So you need to enable iSCSI on the netapps box and need to install a package from netapps (NTAPclnas) on the cluster nodes.
2) We provide 2 network ports for the Netapps
private NAS LAN. Is it a must to cater another
dedicated network port for the iSCSI communication
with the quorum?
Have a look at http://docs.sun.com/app/docs/doc/819-0580/6n30eahcc?a=view#ch4_quorum-9
I don't read such a requirement there.
3) If we need purchase a 3rd party iSCSI initiator,
where can we get this? I have checked Qlogic and
Cisco, they are both not suitable for my solution.
Appreciate your help.
I don't think you need such a 3rd party iSCSI initiator, unless it is stated in the above docs.
Greets
Thorsten -
Can't start cluster, 2 node 3.3 cluster lost 2 quorum disks
Hi,
I have a 2-node cluster with one iSCSI quorum disk. I was in the middle of migrating the quorum device to another iSCSI disk when the servers lost contact with the disks (an iSCSI target problem), so the 2 cluster nodes were left with no quorum. Because of the 2 quorum devices, 3 votes are needed, and I only have the 2 votes from the 2 cluster nodes.
The iSCSI disks are back online, but the cluster/quorum isn't able to get hold of them.
May 11 11:21:59 vmcluster1 genunix: [ID 965873 kern.notice] NOTICE: CMM: Node vmcluster2 (nodeid = 1) with votecount = 1 added.
May 11 11:21:59 vmcluster1 genunix: [ID 965873 kern.notice] NOTICE: CMM: Node vmcluster1 (nodeid = 2) with votecount = 1 added.
May 11 11:22:04 vmcluster1 genunix: [ID 832830 kern.warning] WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d1s2 with error 2.
May 11 11:22:10 vmcluster1 genunix: [ID 832830 kern.warning] WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d2s2 with error 2.
May 11 11:22:14 vmcluster1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g2 constructed
May 11 11:22:15 vmcluster1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g1 constructed
May 11 11:22:15 vmcluster1 genunix: [ID 843983 kern.notice] NOTICE: CMM: Node vmcluster1: attempting to join cluster.
May 11 11:22:15 vmcluster1 e1000g: [ID 801725 kern.info] NOTICE: pci8086,100e - e1000g[2] : link up, 1000 Mbps, full duplex
May 11 11:22:16 vmcluster1 e1000g: [ID 801725 kern.info] NOTICE: pci8086,100e - e1000g[1] : link up, 1000 Mbps, full duplex
May 11 11:23:20 vmcluster1 genunix: [ID 832830 kern.warning] WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d1s2 with error 2.
May 11 11:23:25 vmcluster1 genunix: [ID 832830 kern.warning] WARNING: CMM: Open failed for quorum device /dev/did/rdsk/d2s2 with error 2.
May 11 11:23:25 vmcluster1 genunix: [ID 980942 kern.notice] NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
It looks like the server thinks the IDs of the disks have changed:
[root@vmcluster1:/]# scdidadm -L (05-11 11:27)
1 vmcluster1:/dev/rdsk/c3t5d0 /dev/did/rdsk/d1
1 vmcluster2:/dev/rdsk/c3t5d0 /dev/did/rdsk/d1
2 vmcluster1:/dev/rdsk/c3t4d0 /dev/did/rdsk/d2
2 vmcluster2:/dev/rdsk/c3t4d0 /dev/did/rdsk/d2
3 vmcluster2:/dev/rdsk/c1t0d0 /dev/did/rdsk/d3
4 vmcluster1:/dev/rdsk/c1t0d0 /dev/did/rdsk/d4
5 vmcluster2:/dev/rdsk/c3t6d0 /dev/did/rdsk/d5
5 vmcluster1:/dev/rdsk/c3t6d0 /dev/did/rdsk/d5
6 vmcluster2:/dev/rdsk/c1t1d0 /dev/did/rdsk/d6
7 vmcluster1:/dev/rdsk/c1t1d0 /dev/did/rdsk/d7
[root@vmcluster1:/]# scdidadm -r (05-11 11:27)
scdidadm: Device ID "vmcluster1:/dev/rdsk/c3t5d0" does not match physical device ID for "d1".
Warning: Device "vmcluster1:/dev/rdsk/c3t5d0" might have been replaced.
scdidadm: Device ID "vmcluster1:/dev/rdsk/c3t4d0" does not match physical device ID for "d2".
Warning: Device "vmcluster1:/dev/rdsk/c3t4d0" might have been replaced.
scdidadm: Device ID "vmcluster1:/dev/rdsk/c3t6d0" does not match physical device ID for "d5".
Warning: Device "vmcluster1:/dev/rdsk/c3t6d0" might have been replaced.
scdidadm: Could not save DID instance list to file.
scdidadm: File /etc/cluster/ccr/global/did_instances exists.
The disks are OK, and accessible from format:
[root@vmcluster1:/]# echo | format (05-11 11:28)
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0 <DEFAULT cyl 8351 alt 2 hd 255 sec 63>
/pci@0,0/pci8086,2829@d/disk@0,0
1. c1t1d0 <DEFAULT cyl 1020 alt 2 hd 64 sec 32>
/pci@0,0/pci8086,2829@d/disk@1,0
2. c3t4d0 <IET-VIRTUAL-DISK-0-1.00GB>
/iscsi/[email protected]%3Astorage.lun10001,0
3. c3t5d0 <DEFAULT cyl 497 alt 2 hd 64 sec 32>
/iscsi/[email protected]%3Astorage.lun20001,1
4. c3t6d0 <DEFAULT cyl 496 alt 2 hd 64 sec 32>
/iscsi/[email protected]%3Astorage.lun30001,2
Is there a way to remove a quorum device without the cluster online?
Or is there another alternative, such as trying to fix the DID problem?
Thanks!
This is the primary reason that you have one and only one quorum device. There are many failure modes that result in your cluster not starting. It looks like your only option is to hand-edit the CCR. If this is a production cluster, please log a service desk ticket for the full procedure. If it's just a development cluster and you are happy to take a risk, the basic outline is (IIRC):
1. Boot nodes into non-cluster mode
2. Edit /etc/cluster/ccr/global/infrastructure and either remove the cluster.quorum_devices.* entries or set the votecount to 0
3. cd /etc/cluster/ccr/global
4. Run /usr/cluster/lib/sc/ccradm replace -i infrastructure infrastructure
5. Reboot back into cluster mode
6. Add one new quorum disk
You may need to run one or more of:
# cldev refresh
# cldev check
# cldev clean
# cldev populate
to get the right DID entries between steps 5 and 6.
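Step 2 of the outline can be done mechanically. A minimal sketch of zeroing the quorum-device vote counts, assuming (this is an assumption; the key names below are illustrative, not a real CCR dump) that the infrastructure file stores them as whitespace-separated key/value lines:

```shell
# Zero the votecount on cluster.quorum_devices.* entries while leaving
# node vote entries alone. The sample lines are illustrative only; work
# on a copy of the infrastructure file, never the original.
sample='cluster.quorum_devices.1.properties.votecount 1
cluster.nodes.1.properties.quorum_vote 1'
edited=$(printf '%s\n' "$sample" |
  awk '$1 ~ /^cluster\.quorum_devices\..*\.votecount$/ {$2 = 0} {print}')
echo "$edited"
```

After editing the real file you would still need the ccradm step in the outline to regenerate the file's checksum.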
Tim
--- -
RAW disks for Oracle 10R2 RAC NO SUN CLUSTER
Yes, you read it correctly: no Sun Cluster. Then why am I on the forum, right? Well, we have one Sun Cluster and another cluster that is RAC-only, for testing. Between Oracle and Sun, neither accepts any fault for problems with their perfectly honed products. Currently I have multipathed fibre HBAs to a StorEdge 3510, and I've tried to get Oracle to use a raw LUN for the OCR and voting disks. It doesn't see the disk. I've made sure they are stamped oracle:dba, and tried oracle:oinstall. When presenting /dev/rdsk/C7t<long number>d0s6 for the OCR, I get a "can not find disk path." Does Oracle raw mean SVM raw? Should I create metadevices?
"Between Oracle and Sun, neither accepts any fault for problems with their perfectly honed products"... to be more specific:
Not that the word "fault" is a characterization of any liability; it is a technical characterization of acting like a responsible stakeholder when you sell your product to a corporation. I've been working on the same project for a year, as an engineer. Notwithstanding a huge expanse of management issues over the project, whenever technical gray areas have been reached and our team has tried to get information to solve the issue, the area has become a big bouncing hot potato. Specifically, when Oracle has a problem reading a storage device, according to Oracle that is a Sun issue. According to Sun, they didn't certify the software on that piece of equipment, so go talk to Oracle. In the Sun Cluster arena, if starting the database causes a node eviction from the cluster, good luck getting any specific team to say "that's our problem." Sun will say that Oracle writes crappy cluster verify scripts, and Oracle will say that Sun has not properly certified the device for use with their product. Man, I've seen it. The first time, I said "OK, how do we avoid this in the future?"; the second time, "How did I let this happen again?"; and after more issues, money spent, hours lost, and customers pissed, do the math. I've even gone as far as to say, "Find me a plug-and-play production model for this specific environment," but good luck getting the two companies to sign the specs for it; neither wants to stamp their name on the product due to the liability. Yes, you're right, I should beat up the account team, but as an engineer that's not my area, and I have other problems that I was hired to deal with. I could go on. What really is a slap in the face is that no one wants to work on these projects if given the choice of doing a Windows deployment instead, because there they can pop out mind-bending numbers of builds, while we plod along figuring out why clusterware doesn't like slice 6 of a /device/scsi_vhci/. Try finding good documentation on that. ~You can deploy faster, but you can't pay more!
Cannot import a disk group after sun cluster 3.1 installation
I installed Sun Cluster 3.1u3 on nodes with Veritas VxVM running and disk groups in use. After cluster configuration and reboot, we can no longer import our disk groups. VxVM displays the message: Disk group dg1: import failed: No valid disk found containing disk group.
Did anyone run into the same problem?
The dump of the private region for every single disk in VxVM returns the following error:
# /usr/lib/vxvm/diag.d/vxprivutil dumpconfig /dev/did/rdsk/d22s2
VxVM vxprivutil ERROR V-5-1-1735 scan operation failed:
Format error in disk private region
Any help or suggestion would be greatly appreciated
Thx
Max
If I understand correctly, you had VxVM configured before you installed Sun Cluster - correct? And when you installed Sun Cluster, you could no longer import your disk groups.
The first thing you need to know is that you must register the disk groups with Sun Cluster - this happens automatically with Solaris Volume Manager but is a manual process with VxVM. Note that you will also have to update the configuration after any changes to the disk group, e.g. permission changes, volume creation, etc.
You need to use the scsetup menu to achieve this, though it can also be done from the command line using an scconf command.
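As a sketch of the command-line route (dg1 and the node names below are placeholders; double-check the exact syntax against the scconf man page for your release), the registration and the post-change resync look roughly like this, echoed here as a dry run:

```shell
# Build the scconf invocations for registering a VxVM disk group as a
# cluster device group, and for resyncing after disk group changes.
# Echoed as a dry run; drop the echo wrappers to actually execute these
# on a cluster node. Disk group and node names are placeholders.
dg=dg1
nodes=node1:node2
register="scconf -a -D type=vxvm,name=$dg,nodelist=$nodes"
resync="scconf -c -D name=$dg,sync"
echo "$register"
echo "$resync"
```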
Having said that, I'm still confused by the error. See if the above solves the problem first.
Regards,
Tim
--- -
Bizarre disk reservation problem with Sun Cluster 3.2 - Solaris 10 X4600
We have a 4-node X4600 Sun cluster with shared AMS500 storage. There are over 30 LUNs presented to the cluster.
When either of the two higher nodes (i.e. node ID 2 and node ID 3) is booted, its keys are not added to 4 out of the 30 LUNs. These 4 LUNs show up as drive type unknown in format. I've noticed that the only thing common to these LUNs is that their size is bigger than 1 TB.
To resolve this I simply scrub the keys and run scgdevs; then they show up as normal in format and all nodes' keys are present on the LUNs.
Has anybody come across this behaviour.
Commands used to resolve problem
1. check keys #/usr/cluster/lib/sc/scsi -c inkeys -d devicename
2. scrub keys #/usr/cluster/lib/sc/scsi -c scrub -d devicename
3. #scgdevs
4. check keys #/usr/cluster/lib/sc/scsi -c inkeys -d devicename
all nodes' keys are now present on the LUN.
Hi,
according to http://www.sun.com/software/cluster/osp/emc_clarion_interop.xml you can use both.
So at the end it all boils down to
- cost: Solaris multipathing is free, as it is bundled
- support: Sun can offer better support for the Sun software
You can try browsing this forum to see what others have experienced with Powerpath. From a pure "use as much integrated software as possible" standpoint, I would go with the Solaris drivers.
Hartmut -
Sun Cluster + meta set shared disks -
Guys, I am looking for some instructions that most Sun administrators would know, I believe.
I am trying to create some cluster resource groups and resources, etc., but before that I am creating the file systems that are going to be used by the two nodes in the Sun Cluster 3.2. We use SVM.
I have some drives that I plan to use for this specific cluster resource group that is yet to be created.
I know I have to create a metaset; that's how the other resource groups in my environment are already set up, so I will go with the same concept.
# metaset -s TESTNAME
Set name = TESTNAME, Set number = 5
Host Owner
server1
server2
Mediator Host(s) Aliases
server1
server2
# metaset -s TESTNAME -a /dev/did/dsk/d15
metaset: server1: TESTNAME: drive d15 is not common with host server2
# scdidadm -L | grep d6
6 server1:/dev/rdsk/c10t6005076307FFC4520000000000004133d0 /dev/did/rdsk/d6
6 server2:/dev/rdsk/c10t6005076307FFC4520000000000004133d0 /dev/did/rdsk/d6
# scdidadm -L | grep d15
15 server1:/dev/rdsk/c10t6005076307FFC4520000000000004121d0 /dev/did/rdsk/d15
Do you see what I am trying to say? If I want to add d6 to the metaset it will go through fine, but not d15, since d15 shows up against only one node, as you can see from the scdidadm output above.
Please let me know how I can share drive d15 with the other node, the same as d6. Thanks much for your help.
-Param
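As an aside, the d15 mismatch above can be spotted mechanically: a minimal sketch that prints the DIDs listed under only one host in `scdidadm -L` output, and which host that is. The sample data, host names, and shortened disk names here are made up; on a real node you would pipe in `scdidadm -L` instead:

```shell
# Print DID devices that scdidadm -L lists for only one host, i.e. drives
# that are not shared and cannot go into a multi-host metaset. Sample
# output is hard-coded; pipe in a real `scdidadm -L` on a cluster node.
scdid_output='6 server1:/dev/rdsk/c10t41d0 /dev/did/rdsk/d6
6 server2:/dev/rdsk/c10t41d0 /dev/did/rdsk/d6
15 server1:/dev/rdsk/c10t21d0 /dev/did/rdsk/d15'
unshared=$(printf '%s\n' "$scdid_output" |
  awk '{count[$3]++; seen[$3] = $2}
       END {for (d in count) if (count[d] == 1) {
              split(seen[d], a, ":"); print d, "only on", a[1]}}')
echo "$unshared"
```

Once the LUN really is mapped to both hosts on the storage side, re-running the DID discovery commands (e.g. cldev populate, as discussed in the reply) on the node that cannot see it is the usual next step.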
Edited by: paramkrish on Feb 18, 2010 11:01 PM
Hi, thanks for your reply. You got me wrong. I am not asking you to be liable for the changes you recommend, since I know that's not reasonable when asking for help. I am aware this is not a support site but a forum to exchange information that people are already aware of.
We have a support contract, but it covers only the Sun hardware, and those support folks are somewhat OK when it comes to Solaris and setup but are not real experts. I will certainly seek their help when needed; that's my last option. Since I thought this problem was possibly something trivial, I quickly posted a question in this forum.
We do have a test environment, but it has not two nodes but one node with zone clusters. Hence I don't get to see this problem in the test environment, and "cldev populate" would also be of no use to me there, I think, since we don't have two nodes.
I will check the logs as you suggested and will get back if I find something. If you have any other thoughts, feel free to let me know (don't worry about the risks; I know I can take care of that).
-Param -
Beta Refresh Release Now Available! Sun Cluster 3.2 Beta Program
The Sun Cluster 3.2 Release team is pleased to announce a Beta Refresh release. This release is based on our latest and greatest build of Sun Cluster 3.2, build 70, which is close to the final Revenue Release build of the product.
To apply for the Sun Cluster 3.2 Beta program, please visit:
https://feedbackprograms.sun.com/callout/default.html?callid=%7B11B4E37C-D608-433B-AF69-07F6CD714AA1%7D
or contact Eric Redmond <[email protected]>.
New Features in Sun Cluster 3.2
Ease of use
* New Sun Cluster Object Oriented Command Set
* Oracle RAC 10g improved integration and administration
* Agent configuration wizards
* Resources monitoring suspend
* Flexible private interconnect IP address scheme
Availability
* Extended flexibility for fencing protocol
* Disk path failure handling
* Quorum Server
* Cluster support for SMF services
Flexibility
* Solaris Container expanded support
* HA ZFS
* HDS TrueCopy campus cluster
* Veritas Flashsnap Fast Mirror Resynchronization 4.1 and 5.0 option support
* Multi-terabyte disk and EFI label support
* Veritas Volume Replicator 5.0 support
* Veritas Volume Manager 4.1 support on x86 platform
* Veritas Storage Foundation 5.0 File System and Volume Manager
OAMP
* Live upgrade
* Dual partition software swap (aka quantum leap)
* Optional GUI installation
* SNMP event MIB
* Command logging
* Workload system resource monitoring
Note: Veritas 5.0 features are not supported with SC 3.2 Beta.
Sun Cluster 3.2 beta supports the following Data Services
* Apache (shipped with the Solaris OS)
* DNS
* NFS V3
* Java Enterprise System 2005Q4: Application Server, Web Server, Message Queue, HADB
Without speculating on the release date of Sun Cluster 3.x or even its feature list, I would like to understand what risk Sun would take if Sun Cluster supported ZFS as a failover filesystem. Once ZFS is part of Solaris 10, I am sure customers will want to use it in clustered environments.
BTW: this means that even Veritas will have to do something about ZFS!!!
If VCS is a much better option, it would be interesting to understand what features are missing from Sun Cluster to make it really competitive.
Thanks
Hartmut -
Sun Cluster 3.3 Mirror 2 SAN storages (storagetek) with SVM
Hello all,
I would like to know if you have any best practice for mirroring two storage systems with SVM on Sun Cluster without corrupting/losing the data on the storages.
I currently have multipathing enabled on the FC (stmsboot); after that I configured the cluster and created the SVM mirror with the DID devices.
I have some questions, since I want to know if there's going to be any problem.
a) 4 quorum votes. As I have two (2) nodes and 2 storages (and I need to know which is up), I have 4 votes, so the cluster needs 3 votes in order to start. Is there any solution to this, like cldevice combine?
b) The mirror is at the SVM level, so when a failover happens the metasets go to the other node. Is there any chance the mirror starts from the second SAN instead of the first and causes any kind of corruption? Is there some way to better protect the storage?
c) The StorageTek has an option for snapshots; is there a good way of using this feature or not?
d) Is there any problem with failing over global filesystems (the global option in mount)? The only thing that may write to this filesystem is the application itself, which belongs to the same resource group, so when it needs to fail over it will stop all the processes accessing this filesystem and it would be OK to unmount it.
Best regards to all of you,
PiT
Thank you very much for your answers Tim, they are really very helpful. I only have some comments on them so they are fully answered.
a) It's all answered for me. I think I will add the vote from only one storage, and if that storage goes down I will tell the customer to check the quorum status and add the second storage as the QD. The quorum server is not a bad idea, but if the network is down for some reason I think bad things will happen, so I don't want to rely on that.
b) I think you are clear enough.
c) I think you are clear enough! (Just as I thought it would happen with the snapshots....)
d) Finally, if this filesystem is in a metadevice that is started from the first node, and the second node is proxying to the first node for the metaset disks, is there any chance of the filesystem/metaset group being locked so that it cannot be taken over?
Thanks in advance,
Pit
(I will also look at the document you mention, many thanks) -
LDOM SUN Cluster Interconnect failure
I am building a test Sun Cluster on Solaris 10 in LDOM 1.3.
In my environment I have a T5120. I have set up two guest OS instances with some configuration and installed the Sun Cluster software; when I executed scinstall, it failed.
Node 2 came up, but node 1 throws the following messages:
Boot device: /virtual-devices@100/channel-devices@200/disk@0:a File and args:
SunOS Release 5.10 Version Generic_139555-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: test1
Configuring devices.
Loading smf(5) service descriptions: 37/37
/usr/cluster/bin/scdidadm: Could not load DID instance list.
/usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.
Booting as part of a cluster
NOTICE: CMM: Node test2 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node test1 (nodeid = 2) with votecount = 0 added.
NOTICE: clcomm: Adapter vnet2 constructed
NOTICE: clcomm: Adapter vnet1 constructed
NOTICE: CMM: Node test1: attempting to join cluster.
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.
NOTICE: clcomm: Path test1:vnet1 - test2:vnet1 errors during initiation
NOTICE: clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
WARNING: Path test1:vnet1 - test2:vnet1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
WARNING: Path test1:vnet2 - test2:vnet2 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
clcomm: Path test1:vnet2 - test2:vnet2 errors during initiation
I created the virtual switches and vnets on the primary domain like this:
532 ldm add-vsw mode=sc cluster-vsw0 primary
533 ldm add-vsw mode=sc cluster-vsw1 primary
535 ldm add-vnet vnet2 cluster-vsw0 test1
536 ldm add-vnet vnet3 cluster-vsw1 test1
540 ldm add-vnet vnet2 cluster-vsw0 test2
541 ldm add-vnet vnet3 cluster-vsw1 test2
Primary domain:
bash-3.00# dladm show-dev
vsw0 link: up speed: 1000 Mbps duplex: full
vsw1 link: up speed: 0 Mbps duplex: unknown
vsw2 link: up speed: 0 Mbps duplex: unknown
e1000g0 link: up speed: 1000 Mbps duplex: full
e1000g1 link: down speed: 0 Mbps duplex: half
e1000g2 link: down speed: 0 Mbps duplex: half
e1000g3 link: up speed: 1000 Mbps duplex: full
bash-3.00# dladm show-link
vsw0 type: non-vlan mtu: 1500 device: vsw0
vsw1 type: non-vlan mtu: 1500 device: vsw1
vsw2 type: non-vlan mtu: 1500 device: vsw2
e1000g0 type: non-vlan mtu: 1500 device: e1000g0
e1000g1 type: non-vlan mtu: 1500 device: e1000g1
e1000g2 type: non-vlan mtu: 1500 device: e1000g2
e1000g3 type: non-vlan mtu: 1500 device: e1000g3
bash-3.00#
Node 1:
-bash-3.00# dladm show-link
vnet0 type: non-vlan mtu: 1500 device: vnet0
vnet1 type: non-vlan mtu: 1500 device: vnet1
vnet2 type: non-vlan mtu: 1500 device: vnet2
-bash-3.00# dladm show-dev
vnet0 link: unknown speed: 0 Mbps duplex: unknown
vnet1 link: unknown speed: 0 Mbps duplex: unknown
vnet2 link: unknown speed: 0 Mbps duplex: unknown
-bash-3.00#
Node 2:
-bash-3.00# dladm show-link
vnet0 type: non-vlan mtu: 1500 device: vnet0
vnet1 type: non-vlan mtu: 1500 device: vnet1
vnet2 type: non-vlan mtu: 1500 device: vnet2
-bash-3.00#
-bash-3.00#
-bash-3.00# dladm show-dev
vnet0 link: unknown speed: 0 Mbps duplex: unknown
vnet1 link: unknown speed: 0 Mbps duplex: unknown
vnet2 link: unknown speed: 0 Mbps duplex: unknown
-bash-3.00#
and this is the configuration I gave while setting up scinstall:
Cluster Transport Adapters and Cables
You must identify the two cluster transport adapters which attach
this node to the private cluster interconnect.
For node "test1",
What is the name of the first cluster transport adapter [vnet1]?
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
All transport adapters support the "dlpi" transport type. Ethernet
and Infiniband adapters are supported only with the "dlpi" transport;
however, other adapter types may support other types of transport.
For node "test1",
Is "vnet1" an Ethernet adapter (yes/no) [yes]?
Is "vnet1" an Infiniband adapter (yes/no) [yes]? no
For node "test1",
What is the name of the second cluster transport adapter [vnet3]? vnet2
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "test1",
Name of the switch to which "vnet2" is connected [switch2]?
For node "test1",
Use the default port name for the "vnet2" connection (yes/no) [yes]?
For node "test2",
What is the name of the first cluster transport adapter [vnet1]?
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "test2",
Name of the switch to which "vnet1" is connected [switch1]?
For node "test2",
Use the default port name for the "vnet1" connection (yes/no) [yes]?
For node "test2",
What is the name of the second cluster transport adapter [vnet2]?
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "test2",
Name of the switch to which "vnet2" is connected [switch2]?
For node "test2",
Use the default port name for the "vnet2" connection (yes/no) [yes]?
I have set up the configurations like this:
ldm list -l nodename
Node 1:
NETWORK
NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:61:63 1 1500
vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f8:87:27 1 1500
vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:f8:f0:db 1 1500
ldm list -l nodename
Node 2:
NETWORK
NAME SERVICE ID DEVICE MAC MODE PVID VID MTU LINKPROP
vnet1 primary-vsw0@primary 0 network@0 00:14:4f:f9:a1:68 1 1500
vnet2 cluster-vsw0@primary 1 network@1 00:14:4f:f9:3e:3d 1 1500
vnet3 cluster-vsw1@primary 2 network@2 00:14:4f:fb:03:83 1 1500
ldm list-services
VSW
NAME LDOM MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
primary-vsw0 primary 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
cluster-vsw0 primary 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
cluster-vsw1 primary 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
ldm list-bindings primary
VSW
NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
primary-vsw0 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
vnet1@gitserver 00:14:4f:f8:c0:5f 1 1500
vnet1@racc2 00:14:4f:f8:2e:37 1 1500
vnet1@test1 00:14:4f:f9:61:63 1 1500
vnet1@test2 00:14:4f:f9:a1:68 1 1500
NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
cluster-vsw0 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
vnet2@test1 00:14:4f:f8:87:27 1 1500
vnet2@test2 00:14:4f:f9:3e:3d 1 1500
NAME MAC NET-DEV ID DEVICE LINKPROP DEFAULT-VLAN-ID PVID VID MTU MODE INTER-VNET-LINK
cluster-vsw1 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on
PEER MAC PVID VID MTU LINKPROP INTERVNETLINK
vnet3@test1 00:14:4f:f8:f0:db 1 1500
vnet3@test2 00:14:4f:fb:03:83 1 1500
Any Idea Team, i beleive the cluster interconnect adapters were not successfull.
I need any guidance/any clue, how to correct the private interconnect for clustering in two guest LDOMS.You dont have to stick to default IP's or subnet . You can change to whatever IP's you need. Whatever subnet mask you need. Even change the private names.
You can do all this during install or even after install.
Read the cluster install doc at docs.sun.com -
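For reference, in the ldm list-services output above the cluster-interconnect switches are the ones whose MODE column shows sc. A quick way to pick them out of the output is sketched below, run here against sample lines copied from the listing (real output may carry extra columns such as LINKPROP, so the field positions are an assumption):

```shell
# Sample (abbreviated) VSW lines from "ldm list-services", taken from
# the listing above; real output may differ slightly.
vsw_lines='primary-vsw0 primary 00:14:4f:f9:25:5e e1000g0 0 switch@0 1 1 1500 on
cluster-vsw0 primary 00:14:4f:fb:db:cb 1 switch@1 1 1 1500 sc on
cluster-vsw1 primary 00:14:4f:fa:c1:58 2 switch@2 1 1 1500 sc on'

# Print only the switches whose second-to-last column is "sc"
# (the cluster-interconnect virtual switches).
interconnects=$(printf '%s\n' "$vsw_lines" | awk '$(NF-1) == "sc" {print $1}')
echo "$interconnects"
```

On a live system you would pipe the real `ldm list-services` output through the same awk filter instead of the sample variable.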
Didadm: unable to determine hostname. error on Sun cluster 4.0 - Solaris11
Trying to install Sun Cluster 4.0 on Sun Solaris 11 (x86-64).
The iSCSI shared quorum disks are available in /dev/rdsk/. I ran:
devfsadm
cldevice populate
But I don't see DID devices getting populated in /dev/did.
Also, when scdidadm -L is issued, I get the following error. Has anyone seen the same error?
- didadm: unable to determine hostname.
I found that in cluster 3.2 there was Bug 6380956: didadm should exit with an error message if it cannot determine the hostname.
The Sun Cluster command didadm (didadm -l in particular) requires the hostname to function correctly; it uses the standard C library function gethostname() for this.
Early in the cluster boot, prior to the service svc:/system/identity:node coming online, gethostname() returns an empty string. This breaks didadm.
Can anyone point me in the right direction to get past this issue with the shared quorum disk DID?

Let's step back a bit. First, what hardware are you installing on? Is it a supported platform, or is it some guest VM? (That might contribute to the problems.)
Next, after you installed Solaris 11, did the system boot cleanly and all the services come up? (svcs -x). If it did boot cleanly, what did 'uname -n' return? Do commands like 'getent hosts <your_hostname>' work? If there are problems here, Solaris Cluster won't be able to get round them.
If the Solaris install was clean, what were the results of the above host name commands after OSC was installed? Do the hostnames still resolve? If not, you need to look at why that is happening first.
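The hostname checks suggested above can be scripted as a quick sanity test. This is a sketch of the idea, not a cluster-specific tool; it only verifies the two things didadm depends on (a non-empty node name from gethostname(), and that the name resolves):

```shell
#!/bin/sh
# Sanity-check the hostname resolution that scdidadm/didadm relies on.

# gethostname() is what "uname -n" reports; it must not be empty.
host=$(uname -n)
if [ -z "$host" ]; then
    echo "FAIL: uname -n returned an empty string" >&2
    exit 1
fi
echo "node name: $host"

# The name must also resolve (via /etc/hosts or the configured resolver).
if getent hosts "$host" >/dev/null 2>&1; then
    echo "resolves: yes"
else
    echo "resolves: no - check /etc/hosts and svc:/system/identity:node"
fi
```

If the second check fails early in boot, that matches the Bug 6380956 scenario described above, where svc:/system/identity:node has not yet come online.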
Regards,
Tim
--- -
Sun Cluster 3.0 and VxVM 3.2 problems at boot
I've got a little problem with a two-node cluster (2 x 480R servers + 2 x 3310 arrays, each with a single RAID controller).
Each 3310 has 3 (RAID 5) LUNs.
I've mirrored these 3 LUNs with VxVM, and I've also mirrored the 2 internal (OS) disks.
One of the disks of the first 3310 is the quorum disk.
Every time I boot the nodes, I see an error at block 0 of the quorum disk, and then a tedious synchronization of the mirrors starts (sometimes also of the OS mirror).
Why does it happen?
Thanks.
Regards,
Mauro.

We did another test today and again the resource group went into a STOP_FAILED state. On this occasion, the export of the corresponding ZFS pool timed out. We were able to successfully bring the resource group online on the desired cluster node, and subsequent failovers worked fine. There's something strange happening when the zpool is being exported (e.g. error correction?). Once the zpool is exported, further imports of it seem to work fine.
When we first had the problem, we were able to manually export and import the zpools, though they did take quite some time to export/import.
"zpool list" shows we have a total of 7 zpools.
"zfs list" shows we have a total of 27 zfs file systems.
Are there any specific Sun (or other) links about problems with Sun Cluster and ZFS? -
Dear All,
We will soon be upgrading Sun Cluster 3.1, and I am now working at a testing site.
What I am trying to set up is two servers with SC 3.1 to simulate the migration procedure; however, we don't have a SAN at the testing site, so I'm stuck on the quorum configuration.
I was told SC 3.1 can be set up with only local disks and no SAN; however, I cannot locate any related documentation.
Could anyone please help with any tips? How can I set up the quorum device on NFS or even just a local disk?
Thanks and Regards,
Donald
Edited by: Foo Donald on 2011/7/14 1:07 AM

Hi Nik,
I have set up a Sun Cluster 3.2 quorum server on a third system, listening on port 9000.
Please correct me if I'm wrong, but it seems the Sun Cluster 3.1 scconf command cannot see the quorum server; there is no way to specify its IP or port.
The testing site is on Solaris 8 + Sun Cluster 3.1; it will be upgraded to Solaris 10 + Sun Cluster 3.2 via Live Upgrade.
Thanks and Regards,
Donald
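That observation is correct: quorum servers were introduced as a quorum device type in Sun Cluster 3.2, so the 3.1 scconf has no option for a host/port pair. After the Live Upgrade to 3.2, the registration would look roughly as below. This is a sketch, not a tested procedure: the host address and device name are placeholders, and the exact type name should be confirmed against the clquorum(1CL) man page before use.

```shell
# Placeholders: set QS_HOST to the quorum server's address and
# QS_NAME to whatever the quorum device should be called.
QS_HOST=192.168.10.50
QS_PORT=9000
QS_NAME=qs1

# Sun Cluster 3.2 command to register a quorum server as a quorum device
# (built as a string here; run it on a cluster node after the upgrade).
CMD="clquorum add -t quorum_server -p qshost=${QS_HOST} -p port=${QS_PORT} ${QS_NAME}"
echo "$CMD"
```

Afterwards, `clquorum show` on a cluster node should list the new device and its vote count.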