OUI failed to Select Cluster node on AIX- 9i RAC with HACMP

Dear all,
The HACMP cluster seems to be working fine, and rlogin and rcp also work.
However, the Oracle installer (OUI) fails to display the "Cluster Node Selection" window.
[p650_cdr1][root]/> lssrc -a | grep -E "ES|svcs"
clcomdES clcomdES 35888 active
topsvcs topsvcs 34634 active
grpsvcs grpsvcs 23258 active
emsvcs emsvcs 26588 active
emaixos emsvcs 25356 active
clstrmgrES cluster 26090 active
clsmuxpdES cluster 37556 active
clinfoES cluster 26118 active
grpglsm grpsvcs inoperative
[p650_cdr1][root]/> lssrc -g cluster
Subsystem Group PID Status
clstrmgrES cluster 26090 active
clsmuxpdES cluster 37556 active
clinfoES cluster 26118 active
[p650_cdr1][root]/>
Would you please let me know what to do now?
Oracle : 9.2.0.1
$ oslevel -r
5200-04
$
HACMP :5.2
Regards
Faruque

The problem is resolved. A cluster patch (IY73937) was required.
Thanks and regards
Faruque
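
For anyone hitting the same symptom, here is a minimal sketch (Python, not from the original poster) of how the lssrc -a output above could be scanned for subsystems that are not active. The function name and the embedded sample are illustrative assumptions. It only reports status; it cannot tell whether a missing fix such as IY73937 is the cause.

```python
# Sample copied from the lssrc -a output in the question.
SAMPLE = """\
clcomdES clcomdES 35888 active
topsvcs topsvcs 34634 active
grpsvcs grpsvcs 23258 active
emsvcs emsvcs 26588 active
emaixos emsvcs 25356 active
clstrmgrES cluster 26090 active
clsmuxpdES cluster 37556 active
clinfoES cluster 26118 active
grpglsm grpsvcs inoperative
"""


def inoperative_subsystems(lssrc_output):
    """Return the names of subsystems whose status column is 'inoperative'."""
    names = []
    for line in lssrc_output.splitlines():
        fields = line.split()
        # Skip blank lines and the 'Subsystem Group PID Status' header.
        if len(fields) < 2 or fields[0] == "Subsystem":
            continue
        # Inoperative subsystems have no PID, so the status is always the last field.
        if fields[-1] == "inoperative":
            names.append(fields[0])
    return names


print(inoperative_subsystems(SAMPLE))
```

An inoperative subsystem is not necessarily a fault. As far as I know, whether a given APAR is installed on AIX is normally checked with instfix -ik followed by the APAR number, which is a separate check from the one sketched here.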


CRS oui failed to find the node

Hi everyone,
I am trying to install CRS on two Solaris 9 nodes (VERITAS CFS). When OUI reaches the "Specify Cluster Configuration" page, it cannot find any node, and the Add, Edit, and Remove buttons are greyed out. From the help I understand that when OUI detects third-party vendor clusterware, it uses the node information obtained from that clusterware, and this information cannot be edited in OUI. However, it does not seem to obtain any node information at all. I have pinged the public and private IPs of both nodes and they are reachable. How can I make OUI find the node information?
Thank you very much!

Hi,

Yes, it is. 
Environment >> Servers >> jmsserver2 >> Configuration >> General > Machine : sv-jms2

regards
 Lukas 
 WLS 9.2

Cluster nodes receiving by logical, sending with physical IP

Hi. I'm a cluster beginner using Cluster 2.2 with two nodes, and I'm having problems with the logical IP.
Both nodes have their physical and logical IPs configured in the same subnet (e.g., x.y.z.1, .2, and .3).
The primary node is reached fine via the logical IP, but when it sends frames back it uses its physical IP as the source IP.
Am I doing anything wrong? Does the same thing happen with Cluster 3.0?
Thanks in advance.

You're quite right, this is a known problem, but it is actually a problem on all operating systems, not just Solaris. We have been approached to solve it and we have; see the following for the details:
http://www.DefaultRouter.com/
Although the white paper describes an environment slightly different from how I imagine yours is, you can see that the technology does solve the problem.

OSB 10.2.0.2 Implementation on AIX 5.2 with HACMP - SSL Trust Issues??

Hello All
I think I'm on a bit of a long shot with this one unfortunately, but I am trying to implement an OSB solution on a production HACMP cluster. The configuration would look as follows:
OSB Admin & Media Host : Windows 2003 x86 (Host: FPTXOSB01)
OSB Clients : Server 'pserver1' is node 1 in an HACMP cluster, public IP address 192.168.14.6
: Server 'pserver2' is node 2 in the same HACMP cluster, public IP address 192.168.14.10
: Server 'ptest1' is a standalone AIX 5.2 host
OSB Version : 10.2.0.2.0
I have implemented the solution on the standalone host 'ptest1' without any problems, and a full database RMAN backup on this test server worked at the first attempt. The problem I am running into is with adding the HACMP clients to the OSB admin domain.
HACMP is configured (rightly or wrongly, I do not know as yet) with boot, public and cluster service addresses. For example, server 'pserver1' returns 'pserver1' if you enter the 'hostname' command at the AIX command prompt. Entering the 'uname -a' command also returns 'pserver1' as the machine host name. However, in the folder '/usr/local/oracle/backup/bin' there is a link to a binary called 'hostinfo', which is called by the installob routine during the installation phase. When I run this command manually, it returns the HACMP host boot address 'pserver1_boot'. The /etc/hosts file looks like this on one of the nodes:
# Internet Address Hostname # Comments
# 192.9.200.1 net0sample # ethernet name/address
# 128.100.0.1 token0sample # token ring name/address
# 10.2.0.2 x25sample # x.25 name/address
127.0.0.1 loopback localhost
10.10.10.86 pserver1_boot1 pserver1
10.10.10.87 pserver2_boot1 pserver2
10.11.10.86 pserver1_boot2
10.11.10.87 pserver2_boot2
10.12.10.86 pserver1_hb
10.12.10.87 pserver2_hb
192.168.14.5 pserver_svc
192.168.14.6 pserver1_pers
192.168.14.10 pserver2_pers
As you can see, the main host name is on the same line as the boot1 IP address. Unfortunately, the 10.10.10.xx range is private and dedicated to the HACMP cluster configuration. All of the clients on the network access the cluster via the 'pserver_svc' virtual IP, which is fine. The Oracle databases listen on that VIP too, no problems. For telnet/SSH access to the hosts, we log on via the '?_pers' addresses (persistent addresses), no problem. However, the two hosts themselves resolve their own host names to the '?_boot1' addresses. So, on 'pserver1', pinging 'pserver1' resolves to 10.10.10.86. All good, except that the OSB admin server is going to come in on the 192.168.14 public network.
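
To make the name resolution issue concrete, here is a rough sketch (Python) that models a plain /etc/hosts lookup over the entries quoted above and shows which names land on the public network. The /24 netmask for the 192.168.14 network and the helper names are assumptions made for illustration; real resolution on the hosts may also involve DNS and the configured lookup order.

```python
import ipaddress

# Assumption: the public network is 192.168.14.0/24 (the post only says "192.168.14").
PUBLIC_NET = ipaddress.ip_network("192.168.14.0/24")

# Active (non-comment) entries copied from the /etc/hosts listing above.
HOSTS = """\
127.0.0.1 loopback localhost
10.10.10.86 pserver1_boot1 pserver1
10.10.10.87 pserver2_boot1 pserver2
10.11.10.86 pserver1_boot2
10.11.10.87 pserver2_boot2
10.12.10.86 pserver1_hb
10.12.10.87 pserver2_hb
192.168.14.5 pserver_svc
192.168.14.6 pserver1_pers
192.168.14.10 pserver2_pers
"""


def parse_hosts(text):
    """Map every name and alias to its IP address; the first matching line wins."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)
    return table


def on_public_net(ip):
    """True if the address belongs to the assumed public network."""
    return ipaddress.ip_address(ip) in PUBLIC_NET


table = parse_hosts(HOSTS)
for name in ("pserver1", "pserver1_pers", "pserver_svc"):
    ip = table[name]
    print(name, ip, "public" if on_public_net(ip) else "private")
```

With this file, 'pserver1' maps to the private boot address while only the '_pers' and '_svc' names fall inside the public range, which matches the behaviour described in the post.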
When adding the host using either the 'mkhost' command or the web tool, the host creation just sits there and eventually times out. If I change the '/etc/hosts' file so that 'pserver1' sits on a line of its own with the correct persistent address of 192.168.14.6 and then try adding the host in OSB, the host is added successfully. However, if I then try to ping the host using OSB, it returns the following:
ob> pingh pserver1
Error: can't connect to NDMP server on pserver1 (address 192.168.14.6) - timeout waiting for connection status message
pserver1 (address 192.168.14.6): Oracle Secure Backup services are available
Additionally, we have to switch the '/etc/hosts' configuration back because the HACMP cluster services expect that configuration and it will fail over if it performs a cluster host state check.
With this in mind, we've cabled up another unused NIC port on each of the two hosts and put these NICs on the network as 192.168.13.110 and 111. I have retried adding the hosts with the machine's actual host name, with the boot address (pserver1_boot1) and also with a new alias for the new NICs, 'pserver1_en1'. In most of these cases, adding the host actually comes back with a success status. However, the OSB ping consistently fails.
I believe that the mismatch in host names on each of the cluster hosts is causing the OSB trust relationships to break down, as the certificates will be created with the non-routable host/IP combination. The following is an extract of the 'observiced.log' from 'pserver2' following the host addition specifying the '192.168.13.xxx' network:
2009/01/07.14:33:53 listening for requests on --
2009/01/07.14:33:53 en0 (10.10.10.87) port 400
2009/01/07.14:33:53 en2 (10.11.10.87) port 400
2009/01/07.14:33:53 en1 (192.168.13.111) port 400
2009/01/07.14:34:01 listening for NDMP connections on --
2009/01/07.14:34:01 en0 (10.10.10.87) port 10000
2009/01/07.14:34:01 en2 (10.11.10.87) port 10000
2009/01/07.14:34:01 en1 (192.168.13.111) port 10000
2009/01/07.14:38:54 failure to negotiate SSL connection with component obtool on fd 6 - SSL fatal alert during negotation (FSP Oracle network security functions)
I am clearly looking for help from anyone else who has had the unfortunate experience of implementing OSB in an HACMP environment. People who work with HACMP tell me that this configuration is perfectly normal. To me, it's odd that a machine called one thing should return another value when it looks itself up, and one that isn't routable.
I would welcome any suggestion that might help: additional tracing, manually creating SSL certificates to work around the host name, disabling SSL, or anything else that might allow two-way communication on ports 400 and 10000 using the OSB tools.
Any help here would be much appreciated.
Regards
Simon
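
As a first step before looking at certificates, it may help to confirm plain TCP reachability of the two ports named in the log extract (400 and 10000) from the admin host. The sketch below (Python) is only an illustration; the function names are made up for this example. A successful connect does not prove that the SSL negotiation will work, it only rules out routing or firewall problems.

```python
import socket

# Ports taken from the observiced.log extract in the post.
OSB_PORTS = (400, 10000)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and host names that cannot be resolved.
        return False


def check_hosts(hosts, ports=OSB_PORTS, timeout=3.0):
    """Return {(host, port): reachable} for every host/port combination."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}


# Example call, to be run from the OSB admin server:
#   for (host, port), ok in check_hosts(["192.168.13.111"]).items():
#       print(host, port, "open" if ok else "NOT reachable")
```

Running the same check in the other direction (from each cluster node towards the admin server) would show whether the two-way communication the post asks about is possible at the network level.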

I already have.
Thanks,

RAC on AIX 6.1 with HACMP

Hi
Can I have the installation procedure and documents for installing 10gR2 RAC on AIX 6.1 using HACMP (for cluster control) instead of Oracle Clusterware? Is that possible? Has anyone tried it before who can share the experience?
Regards,
Akthar

Do you have any particular reason for using HACMP for cluster control instead of CRS?
Since you are going to install 10gR2 RAC, there is no need to use HACMP, because 10g CRS can be installed without OS clusterware.
CRS can be installed alongside HACMP, but the Oracle services should be controlled by CRS only.
Regards
Rajesh

Oracle 11g upgrade in AIX 6.1 with HACMP

Hi Friends,
I have two Power servers running AIX 6.1 with Oracle 10g under HACMP, on which an SAP application is running.
One is the standalone database and the other is the central instance.
I have done the 11g upgrade successfully on my DEV and QAS servers, which are non-cluster environments.
Now I want to do the same upgrade on PRD, which is under HACMP.
Please let me know which areas I should concentrate on specifically for clustered servers.
Thanks,
Hari

DB Filesystems
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd4 4.00 2.62 35% 15438 3% /
/dev/hd2 8.00 5.03 38% 57744 5% /usr
/dev/hd9var 4.00 2.85 29% 10914 2% /var
/dev/hd3 4.00 3.50 13% 2575 1% /tmp
/dev/fwdump 1.00 1.00 1% 13 1% /var/adm/ras/platform
/dev/hd1 1.00 1.00 1% 6 1% /home
/dev/hd11admin 0.25 0.25 1% 107 1% /admin
/proc - - - - - /proc
/dev/hd10opt 1.00 0.58 43% 9040 7% /opt
/dev/livedump 0.25 0.25 1% 7 1% /var/adm/ras/livedump
/dev/lv_oracle 2.00 1.86 8% 21 1% /oracle
/dev/lv_ora_pip 2.00 2.00 1% 80 1% /oracle/PIP
/dev/lv_usr_sap 2.00 1.92 5% 78 1% /usr/sap
/dev/lv_sapmnt 2.00 0.62 70% 978 1% /sapmnt
/dev/dumplv 95.00 32.80 66% 26790 1% /dump
/dev/saparchlv 2.00 1.99 1% 57 1% /home/pipadm
/dev/lv_pip_64 10.00 5.73 43% 18988 2% /oracle/PIP/102_64
/dev/lv_mirlogA 1.00 0.61 40% 6 1% /oracle/PIP/mirrlogA
/dev/lv_mirlogB 1.00 0.61 40% 6 1% /oracle/PIP/mirrlogB
/dev/lv_oraarch 200.00 121.48 40% 433 1% /oracle/PIP/oraarch
/dev/lv_oralogA 1.00 0.59 41% 8 1% /oracle/PIP/origlogA
/dev/lv_oralogB 1.00 0.59 41% 8 1% /oracle/PIP/origlogB
/dev/fslv01 2.00 1.97 2% 102 1% /oracle/PIP/saparch
/dev/lv_sapbkp 5.00 5.00 1% 40 1% /oracle/PIP/sapbackup
/dev/lv_sapchk 5.00 5.00 1% 80 1% /oracle/PIP/sapcheck
/dev/lv_data1 200.00 86.26 57% 30 1% /oracle/PIP/sapdata1
/dev/lv_data2 200.00 84.92 58% 26 1% /oracle/PIP/sapdata2
/dev/lv_data3 200.00 84.92 58% 26 1% /oracle/PIP/sapdata3
/dev/lv_data4 200.00 84.92 58% 26 1% /oracle/PIP/sapdata4
/dev/lv_data5 200.00 84.92 58% 26 1% /oracle/PIP/sapdata5
/dev/lv_data6 200.00 84.92 58% 26 1% /oracle/PIP/sapdata6
/dev/lv_data7 200.00 84.92 58% 26 1% /oracle/PIP/sapdata7
/dev/lv_data8 200.00 84.93 58% 26 1% /oracle/PIP/sapdata8
/dev/lv_saporg 20.00 20.00 1% 7 1% /oracle/PIP/sapreorg
/dev/saptrance 5.00 4.92 2% 588 1% /oracle/PIP/saptrace
/dev/lv_inventry 2.00 1.99 1% 55 1% /oracle/oraInventory
/dev/lv_102_64 10.00 5.05 50% 11044 1% /oracle/stage/102_64
CI
/dev/hd4 4.00 1.99 51% 14429 3% /
/dev/hd2 8.00 5.01 38% 57680 5% /usr
/dev/hd9var 4.00 3.38 16% 10936 2% /var
/dev/hd3 4.00 3.82 5% 1362 1% /tmp
/dev/fwdump 1.00 1.00 1% 18 1% /var/adm/ras/platform
/dev/hd1 1.00 1.00 1% 55 1% /home
/dev/hd11admin 0.25 0.25 1% 5 1% /admin
/proc - - - - - /proc
/dev/hd10opt 1.00 0.58 42% 9024 7% /opt
/dev/livedump 0.25 0.25 1% 8 1% /var/adm/ras/livedump
/dev/lv_oracle 2.00 2.00 1% 9 1% /oracle
/dev/lv_ora_pip 2.00 2.00 1% 52 1% /oracle/PIP
/dev/lv_usr_sap 10.00 10.00 1% 17 1% /usr/sap
/dev/lv_client 2.00 1.86 8% 16 1% /oracle/client
/dev/lv_smnt_pip 10.00 2.20 78% 114142 18% /sapmnt/PIP
/dev/lv_sap_pip 10.00 8.10 19% 1577 1% /usr/sap/PIP
/dev/lv_sap_cms 5.00 5.00 1% 8 1% /usr/sap/ccms
root@pagedb:/ $ su - orapip
pagedb:orapip 1> echo $ORACLE_HOME
/oracle/PIP/102_64
I have upgraded successfully in my DEV and QAS systems.
So can I follow the same procedure that I used for the non-cluster environments?
Thanks

SQL LOG Backup failed in one Cluster Node

I have a 2-node SQL failover cluster (node 01 and node 02) and have configured a SQL log backup job via SQL log shipping.
When the SQL service is running on node 02, the backup job works without any issues. Once it moves to node 01, the job fails with the error below:
Executed as user: <domain>\administrator. The process could not be created for step 1 of job 0xAC90A0F3623AE44285089E9EF53B12C7 (reason: The system cannot find the file specified). The step failed.
Does anyone have a fix for this?
Thanks

Does SQL Server Agent run under the same domain account on both nodes?
Are you sure that the path location is correct?
Best Regards, Uri Dimant, SQL Server MVP

Grid installation: root.sh failed on the first node on Solaris cluster 4.1

Hi all,
I'm trying to install Grid (11.2.0.3.0) on a 2-node cluster (OSC 4.1).
When I run root.sh on the first node, I get the following output:
xha239080-root-5.11# root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /Grid/CRShome
Enter the full pathname of the local bin directory: [/usr/local/bin]:
/usr/local/bin is read only. Continue without copy (y/n) or retry (r)? [y]:
Warning: /usr/local/bin is read only. No files will be copied.
Creating /var/opt/oracle/oratab file...
Entries will be added to the /var/opt/oracle/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /Grid/CRShome/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'xha239080'
CRS-2676: Start of 'ora.mdnsd' on 'xha239080' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xha239080'
CRS-2676: Start of 'ora.gpnpd' on 'xha239080' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xha239080'
CRS-2672: Attempting to start 'ora.gipcd' on 'xha239080'
CRS-2676: Start of 'ora.cssdmonitor' on 'xha239080' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xha239080' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xha239080'
CRS-2672: Attempting to start 'ora.diskmon' on 'xha239080'
CRS-2676: Start of 'ora.diskmon' on 'xha239080' succeeded
CRS-2676: Start of 'ora.cssd' on 'xha239080' succeeded
ASM created and started successfully.
Disk Group DATA created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 9cdb938773bc4f16bf332edac499fd06.
Successful addition of voting disk 842907db11f74f59bf65247138d6e8f5.
Successful addition of voting disk 748852d2a5c84f72bfcd50d60f65654d.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
1. ONLINE 9cdb938773bc4f16bf332edac499fd06 (/dev/did/rdsk/d10s6) [DATA]
2. ONLINE 842907db11f74f59bf65247138d6e8f5 (/dev/did/rdsk/d8s6) [DATA]
3. ONLINE 748852d2a5c84f72bfcd50d60f65654d (/dev/did/rdsk/d9s6) [DATA]
Located 3 voting disk(s).
Start of resource "ora.cssd" failed
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xha239080'
CRS-2672: Attempting to start 'ora.gipcd' on 'xha239080'
CRS-2676: Start of 'ora.cssdmonitor' on 'xha239080' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xha239080' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xha239080'
CRS-2672: Attempting to start 'ora.diskmon' on 'xha239080'
CRS-2676: Start of 'ora.diskmon' on 'xha239080' succeeded
CRS-2674: Start of 'ora.cssd' on 'xha239080' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'xha239080'
CRS-2681: Clean of 'ora.cssd' on 'xha239080' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'xha239080'
CRS-2677: Stop of 'ora.gipcd' on 'xha239080' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'xha239080'
CRS-2677: Stop of 'ora.cssdmonitor' on 'xha239080' succeeded
CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Grid Infrastructure stack
Failed to start Cluster Synchorinisation Service in clustered mode at /Grid/CRShome/crs/install/crsconfig_lib.pm line 1211.
/Grid/CRShome/perl/bin/perl -I/Grid/CRShome/perl/lib -I/Grid/CRShome/crs/install /Grid/CRShome/crs/install/rootcrs.pl execution failed
xha239080-root-5.11# history
Checking ocssd.log, I see the following:
2013-09-16 18:46:24.238: [    CSSD][1]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1379371584
2013-09-16 18:46:24.239: [    CSSD][1]clssscmain: Environment is production
2013-09-16 18:46:24.239: [    CSSD][1]clssscmain: Core file size limit extended
2013-09-16 18:46:24.248: [    CSSD][1]clssscmain: GIPCHA down 1
2013-09-16 18:46:24.249: [    CSSD][1]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
2013-09-16 18:46:24.250: [    CSSD][1]clssscExtendLimits: The current soft limit for file descriptors is 65536, hard limit is 65536
2013-09-16 18:46:24.250: [    CSSD][1]clssscExtendLimits: The current soft limit for locked memory is 4294967293, hard limit is 4294967293
2013-09-16 18:46:24.250: [    CSSD][1]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
2013-09-16 18:46:24.250: [    CSSD][1]clssscSetPrivEnv: Setting priority to 4
2013-09-16 18:46:24.253: [    CSSD][1]clssscSetPrivEnv: unable to set priority to 4
2013-09-16 18:46:24.253: [    CSSD][1]SLOS: cat=-2, opn=scls_mem_lockdown, dep=11, loc=mlockall
unable to lock memory
2013-09-16 18:46:24.253: [    CSSD][1](:CSSSC00011:)clssscExit: A fatal error occurred during initialization
Does anyone have any idea what is going on and how I can fix it?
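
For long ocssd.log files, a small filter can pull out the entries worth reading first. The following is a sketch only (Python); the regular expression is written against the line layout shown in the extract above, and the keyword list is an arbitrary assumption, not an Oracle-defined set.

```python
import re

# Matches lines such as:
# 2013-09-16 18:46:24.253: [    CSSD][1]clssscSetPrivEnv: unable to set priority to 4
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): \[\s*(\w+)\]\[\d+\](.*)$")

# Assumed keywords; adjust to taste.
KEYWORDS = ("failed", "unable", "fatal")

SAMPLE = """\
2013-09-16 18:46:24.239: [    CSSD][1]clssscmain: Environment is production
2013-09-16 18:46:24.253: [    CSSD][1]clssscSetPrivEnv: unable to set priority to 4
2013-09-16 18:46:24.253: [    CSSD][1]SLOS: cat=-2, opn=scls_mem_lockdown, dep=11, loc=mlockall
unable to lock memory
2013-09-16 18:46:24.253: [    CSSD][1](:CSSSC00011:)clssscExit: A fatal error occurred during initialization
"""


def entries(log_text):
    """Split the log into [timestamp, message] pairs.

    Lines without a timestamp are treated as continuations of the previous entry.
    """
    out = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            out.append([match.group(1), match.group(3).strip()])
        elif out and line.strip():
            out[-1][1] += " " + line.strip()
    return out


def suspicious(log_text):
    """Return (timestamp, message) tuples whose message contains a keyword."""
    return [
        (ts, msg)
        for ts, msg in entries(log_text)
        if any(word in msg.lower() for word in KEYWORDS)
    ]


for ts, msg in suspicious(SAMPLE):
    print(ts, msg)
```

On the extract above this keeps the priority, mlockall and clssscExit entries together, which is the sequence that ends in the fatal initialization error.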

Hi,
Solaris has several issues with DISM, e.g.:
Solaris 10 and Solaris 11 Shared Memory Locking May Fail (Doc ID 1590151.1)
It sounds like Solaris Cluster has a similar bug. A "workaround" is to reboot the (cluster) zone, which "fixes" the mlock error. This bug was introduced with updates in September, at least in our environment (Solaris 11.1). Before that I did not have the issue, and now I have to restart the entire zone whenever I stop CRS.
With 11.2.0.3 the root.sh script can be rerun without cleaning up first, so you should be able to continue the installation at that point after the reboot. After root.sh completes, some configuration assistants need to be run to complete the installation. You will need to execute these manually, since the reboot wipes your OUI session.
Kind Regards
Thomas

The Cluster Node Selection screen doesn't appear installing RAC 9i

The Cluster Node Selection screen doesn't appear when installing RAC 9i with OCFS on Windows 2003.
We are using patch #2878462 (OUI 2.2.0.18.0).
The Oracle Cluster check was successful!

More info:
We are following the instructions in Doc ID 178882.1, "Step-By-Step Installation of RAC with OCFS on Windows 2000".

Hyper-V Guest Cluster Node Failing Regularly

Hi,
We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
Around once a week, the guest cluster node that is currently hosting the clustered file service fails. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on and reboot it.
Looking for suggestions on how to fix the following:
1. Crashing guest file cluster node
2. Failed VM with shared VHDX requiring physical host reboot
Event messages for the physical host that was hosting the failed VM, in the order that they occurred:
Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
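
The error codes in the Event ID 18590 message can be pulled out and decoded with a few lines of Python, which makes it easier to compare crashes from week to week. This is a sketch under assumptions: the helper name is invented, and the reading of ErrorCode0 as the guest's bug check code (0x9E being USER_MODE_HEALTH_MONITOR, with the second parameter a timeout in seconds) comes from my recollection of Microsoft's bug check reference and should be verified there.

```python
import re

# Text copied from the Event ID 18590 message above.
MESSAGE = (
    "'FS-03' has encountered a fatal error. The guest operating system reported "
    "that it failed with the following error codes: ErrorCode0: 0x9E, "
    "ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0."
)

# Assumption: name taken from Microsoft's bug check code reference, quoted from memory.
BUGCHECK_NAMES = {0x9E: "USER_MODE_HEALTH_MONITOR"}


def guest_error_codes(message):
    """Return {index: value} for every 'ErrorCodeN: 0x...' pair in the message."""
    pairs = re.findall(r"ErrorCode(\d): (0x[0-9A-Fa-f]+)", message)
    return {int(index): int(value, 16) for index, value in pairs}


codes = guest_error_codes(MESSAGE)
print("bug check:", hex(codes[0]), BUGCHECK_NAMES.get(codes[0], "unknown"))
print("ErrorCode2 as decimal:", codes[2])
```

If that reading is right, ErrorCode2 (0x3C, decimal 60) would be a 60-second health-check timeout inside the guest, which points at the guest cluster node hanging rather than at the Hyper-V host itself. Treat that as a hypothesis to check, not a diagnosis.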

Hi,
I have not found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewalls? Please rerun storage validation on the cluster during non-production hours; the cluster validation report should help locate the issue quickly.
More information:
Cluster
http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
Hope this helps.

Cluster node fails after testing removing both interconnects in a two node

Hi,
A cluster node panics and fails to join the cluster after a test in which both interconnects were removed in a two-node cluster. The cluster is up on one node, but the panicked node fails to rejoin the cluster, reporting insufficient quorum and that both cluster interconnects have failed (even after reconnecting the interconnects). The quorum device is a shared disk.
Is this a bug?
Any workaround or solution?
Cluster is 3.2 SPARC
Thanking you
Ushas Symon

Sounds like a networking problem to me. If the failed node genuinely can't communicate with the remaining node then it will not be allowed to join the cluster, hence the quorum message. I would suspect either:
* Misconnected cables
* A switch that has blocked or disabled the port
* A failed auto-negotiation
This is of course without knowing anything about what your network infrastructure actually is!
Tim
---

WDRuntimeException: Failed to create J2EE cluster node in SLD

Hello,
I am getting the below error, but to my knowledge I have everything set up properly. Let me briefly outline the logistics (I am running everything LOCALLY (will move to remote later)):
WAS 6.4 SP12
Set up JCo and tests fine
Set up Visual Administrator / SLD Data Supplier / HTTP and CIM configured and seem to test fine
Created SLD and it tests OK
Created Technical Landscape
I have noticed that in SP12, in the SLD config I actually have a NEW category called "System Landscape" above my "Technical Landscape" link. I have not seen this option in previous versions SP9 or SP11. Is it mandatory to configure this?
Also, I created a model for Adaptive RFC and found the function I needed successfully.
Anyway, here is the error when trying to deploy...
com.sap.tc.webdynpro.services.exceptions.WDRuntimeException: Error while obtaining JCO connection.
at com.sap.tc.webdynpro.services.datatypes.core.DataTypeBroker$1.fillSldConnection(DataTypeBroker.java:90)
Caused by: com.sap.tc.webdynpro.services.sal.sl.api.WDSystemLandscapeException: Error while obtaining JCO connection.
Caused by: com.sap.tc.webdynpro.services.exceptions.WDRuntimeException: Failed to create J2EE cluster node in SLD for 'J2E.SystemHome.bc347792': com.sap.lcr.api.cimclient.LcrException: CIM_ERR_NOT_FOUND: No such instance: SAP_J2EEEngineCluster.CreationClassName="SAP_J2EEEngineCluster",Name="J2E.SystemHome.bc347792"
Any help will be appreciated!

I figured it out for those that may have a similar problem.
Although I had created and tested my JCo connections properly and they were working fine, somehow, and I still don't know why, they went RED in the JCo Maintenance screen.
I had to "create" again and it works fine now.

Ora Fail Safe - under cluster ctl on one node but gives FS:10413 on other

Hi,
Trying to set up a single Fail Safe (3.3.3) instance (9.2) on a set of Windows 2003 servers running MSCS.
After creating the standalone DB on the first node, I was able to get it verified and added to cluster control (while the second server was down; I had to do this because the error I'm about to mention otherwise kept rolling back the changes on both nodes). Now that the first node is configured and the FS listener service and FS DB service are created, I tried setting up the second node to match the first one (viz. copying listener.ora, sqlnet.ora and even tnsnames.ora).
The "verify group" operation kept failing on creation of the Windows FS service for the DB and listener.
I found that there was an issue with picking up the Oracle home even when invoking sqlplus manually from the command line (though the registry entries seemed to be right and matched the ones set on the first node). Eventually, after adding ORACLE_HOME to the list of environment variables, it started picking up the Oracle home correctly, and the next "verify group" operation successfully added the Windows FS service for the DB (OracleService{SID}) on the second node. (The first node did not encounter these problems because this service was created during the standalone DB install using DBCA.)
Next, I started the Windows FS DB service (on the second node, the one with the problem) and tried starting the DB manually (outside cluster control), and it worked fine.
However, when trying to "verify group" again, I keep getting the error FS-10413 (and the subsequent FS:10046, FS:10111 and FS:10890) while trying to add the Windows FS listener service (OracleORA_HOMETNSListenerFsl{oradb_virtual_name}).
Could really do with extra eyes (and heads) helping to get to the bottom of this. Thanks.
<<< The gist of the output from the "verify group" action is below >>>
13 18:29:18 > FS-10300: Verifying Oracle Net listener resource OracleORA_HOME1TNSListenerFsloradbl
14 18:29:19 Oracle Net listener Fsloradbl for the group is not found
FS-10413: Error found in Oracle Net on node PANTHER; the group listener needs to be taken offline. Do you want it to be fixed? > Yes
15 18:29:19 >> FS-10600: Oracle Net configuration file updated: C:\ORACLE\PRODUCT\9.2\NETWORK\ADMIN\LISTENER.ORA
16 18:29:20 ** ERROR : FS-10066: Failed to start Windows service OracleORA_HOME1TNSListenerFsloradbl for the Oracle Net listener
<<<< >>>>>>>
After typing the above, I read section 7.10.8.2 in the "Oracle Fail Safe Concepts and Administration Guide" (http://download-uk.oracle.com/docs/html/B12070_01/ofs_db.htm#sthref985) and checked the output log. Bizarrely, the output file has a few "message not found" errors (akin to when it doesn't find the correct Oracle home).
I wonder where this has to be set in the registry.
The TNS-01151 (missing listener name in listener.ora) error is also a bit odd, as this is the new listener that OFS is trying to create and add to the new listener.ora file, isn't it?
<<<< Fsloradbl.out >>>>>
LSNRCTL for 32-bit Windows: Version 9.2.0.7.0 - Production on 07-FEB-2007 21:50:46
Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.
Starting tnslsnr: please wait...
TNSLSNR for 32-bit Windows: Version 9.2.0.7.0 - Production
Message 279 not found; No message file for product=NETWORK, facility=NL
TNS-01151: Message 1151 not found; No message file for product=NETWORK, facility=TNS [Fsloradbl]
Listener failed to start. See the error message(s) above...
<<<< >>>>
Thanks for your time (reading and hopefully responding to this) and any comments or suggestions.
Regards,
Ranjit
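
Since the "Message ... not found; No message file" lines suggest the listener process is not seeing a usable Oracle home, a quick check of the environment it would inherit can narrow things down. The sketch below (Python) assumes the conventional layout in which network message files (*.msb) live under network/mesg inside the Oracle home; that layout and the helper names are assumptions, and a Windows service may get its settings from the registry rather than from the environment checked here.

```python
import os
from pathlib import Path


def message_files(oracle_home):
    """List the *.msb files under <oracle_home>/network/mesg (empty if missing)."""
    mesg_dir = Path(oracle_home) / "network" / "mesg"
    if not mesg_dir.is_dir():
        return []
    return sorted(p.name for p in mesg_dir.glob("*.msb"))


def diagnose(env=None):
    """Return a one-line verdict about ORACLE_HOME in the given environment."""
    env = os.environ if env is None else env
    home = env.get("ORACLE_HOME")
    if not home:
        return "ORACLE_HOME is not set in this environment"
    if not message_files(home):
        return f"no .msb message files found under {home} (network/mesg)"
    return f"ORACLE_HOME looks usable for message lookup: {home}"


print(diagnose())
```

Running this under the same account that the cluster service uses on each node would show whether the two nodes really see the same Oracle home.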

Yes, I did. I mean, the DB starts correctly using Enterprise Manager, and listener.ora and tnsnames.ora are the ones created during DB creation. They seem all right.
I'm trying to execute those commands from the node of the cluster where the DB resides.
But I have one big question: do I have to create the DB instance on BOTH cluster nodes (both pointing to the same oradata on the shared disk), or is it enough to create my standalone DB only on node A and then leave everything to Fail Safe management (adding the standalone DB to the cluster group, with Fail Safe creating all the services needed)?
thanks

Cluster node addition fails on cleanup

We already have a 2-node cluster set up:
(2) HP BL460c G8 servers connected to a VNX5300 SAN (Nodes 1 & 2)
Server 2012 Datacenter installed
Quorum: Node + Disk
All failover tests went perfectly and all VMs are healthy
Verification on the cluster shows some warnings but no failures
We have rebuilt a server (node 3), renamed it, and run a single-machine verification test to see if it is suitable for clustering. It succeeded with minor warnings.
We ran verification on all three machines and received the aforementioned warnings but no showstoppers. However, when trying to add the host to the cluster we get the following error in the logs:
WARN mscs::ListenerWorker::operator (): ERROR_TIMEOUT(1460)' because of '[FTI][Initiator] Aborting connection because NetFT route to node <machine name> on virtual IP fe80::cdf2:f6ea:5ce:5f9c:~3343~ has failed to come up.'
This happens after the node is added to the cluster: it reports a failure during the cleanup processes and reverts everything. I have done all of this under my domain admin account.
Before and after the attempt to add the node, the NetFT adapter is in media disconnect; during the attempts it does pull down a 169 address, as it is supposed to.
Node 3 Networking breakdown
The new host uses an Intel/HP NC365T quad-port adapter
port 1: Mgmt : Static assignment subnet 1
port 2: VM net: Static assignment subnet 2
port 3: Heartbeat: assigned via DHCP subnet 1 pool (we have attempted the above with this disabled as well)
NCU is not installed for the adapter and bridging in server 2012 is not enabled.
I am at a loss and would appreciate any additional help, as I have spent 3 days researching this to try to find the cause.

Hi,
The error message mentioned an IPv6 address, have you enable IPv6 network for the cluster?
Check the IPv6 network configuration in the 3rd node server, what’s the status, enabled or disabled?
When two or more cluster nodes are running IPv6 for heartbeat communications, they will require any additional nodes that join to also running IPv6. If the node server has IPv6 disabled, it will fail to join.
Also check whether these cluster nodes have antivirus software installed; if so, you may temporarily disable it and try joining the new node again.
Check that and give us feedback for further troubleshooting. For more information, please refer to the following MS articles:
Failover Cluster Creation Issue
http://social.technet.microsoft.com/Forums/en-US/winserverClustering/thread/1ed1936d-6283-46cc-951d-9c236329b8be
Failure to re-add rebuilt cluster node to Windows 2008 R2 Cluster: System error 1460 has occurred (0x000005b4). Timeout.
http://social.technet.microsoft.com/Forums/en-US/winserverClustering/thread/a21e9a8e-9f68-4d83-a747-204000cda65a
Hope this helps!
Lawrence
TechNet Community Support

Microsoft Cluster node service failing automatically

Hello Expert,
We have NetWeaver 7.0 EHP 2 installed on Windows 2008 R2 for EP. It is installed in a cluster environment.
We have 2 cluster nodes, Host A and Host B. We also have 2 services: one for the database and another for SCS. During a failover these 2 services move to the other node.
My problem is that the SCS cluster service goes offline automatically, which brings my entire EP production server down. When it goes down, I manually start the cluster service first and then the app server, and my EP system starts again.
Please suggest how I can find the root cause of the SCS service going offline, or how we can keep it always online.
Regards,

Hi Sunil,
I checked the dev_ms.old file; the log is below:
trc file: "dev_ms", trc level: 1, release: "720"
[Thr 7224] Fri Mar 21 14:05:02 2014
[Thr 7224] ms/http_max_clients = 500 -> 500
[Thr 7224] MsSSetTrcLog: trc logging active, max size = 52428800 bytes
systemid 562 (PC with Windows NT)
relno 7200
patchlevel 0
patchno 101
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 9488
[Thr 7224] ***LOG Q01=> MsSInit, MSStart (Msg Server 1 9488) [msxxserv.c 2274]
[Thr 7224] Fri Mar 21 14:05:03 2014
[Thr 7224] load acl file = \\EP1SAPGRP\sapmnt\EP1\SYS\global\ms_acl_info.DAT
[Thr 7224] MsGetOwnIpAddr: my host addresses are :
[Thr 7224] 1 : [IP] HOST (HOSTNAME)
[Thr 7224] 2 : [127.0.0.1] FQDN (LOCALHOST)
[Thr 7224] 3 : [IP] FQDN (NILIST)
[Thr 7224] 4 : [IP] EPCLUSTER (NILIST)
[Thr 7224] 5 : [IP] EP1SAPGRP (NILIST)
[Thr 7224] 6 : [IP] EP1ORAGRP (NILIST)
[Thr 7224] 7 : [IP] FQDN (NILIST)
[Thr 7224] 8 : [IP] FQDN (NILIST)
[Thr 7224] MsHttpInit: full qualified hostname = NODE A
[Thr 7224] HTTP logging is switch off
[Thr 7224] set HTTP state to LISTEN
[Thr 7224] *** HTTP port 8110 state LISTEN ***
[Thr 7224] *** I listen to internal port 3910 (3910) ***
[Thr 7224] *** HTTP port 8110 state LISTEN ***
[Thr 7224] CUSTOMER KEY: ><
[Thr 7224] build version=720.2011.05.04
[Thr 7224] MsJ2EE_CheckLoggedInNode: logged in list is not initialized -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [114836600] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [114836600] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683700] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683700] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683700] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683751] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683751] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683751] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051900] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051900] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051900] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [114836650] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [114836650] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [114836650] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051951] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051951] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051951] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [139051950] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [139051950] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [139051950] into logged in list
[Thr 7224] MsJ2EE_CheckLoggedInNode: node [128683750] isn't in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_CheckDisconnectedNode: node [128683750] is not in the logged in list -> reconnect ok
[Thr 7224] MsJ2EE_AddLoggedInNode: add node [128683750] into logged in list
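The excerpt above contains only message server startup and J2EE node registrations. For a longer trace, a quick summary of which nodes registered can make gaps or repeated reconnects easier to spot. This is a rough Python sketch, assuming the `MsJ2EE_AddLoggedInNode: add node [<id>]` wording shown above; the function name `logged_in_nodes` and the file path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Matches the registration lines as they appear in the dev_ms excerpt above.
ADD_NODE = re.compile(r"MsJ2EE_AddLoggedInNode: add node \[(\d+)\]")


def logged_in_nodes(trace_lines):
    """Count how many times each J2EE node ID was added to the logged-in list."""
    counts = Counter()
    for line in trace_lines:
        match = ADD_NODE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage (the path is an assumption, adjust to your work directory):
# with open("dev_ms.old", encoding="utf-8", errors="replace") as fh:
#     for node_id, n in sorted(logged_in_nodes(fh).items()):
#         print(node_id, n)
```

A node ID that appears many times within a short period would suggest repeated reconnects and is worth comparing against the cluster event timestamps.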
