Cluster Node Crashes
I'm not sure this is the proper forum for this post; if it's not, please feel free to move it.
The situation I'm facing is this:
My company has clusters set up across North America running our software, which uses the Oracle database. 90% of the time everything functions exactly as it is supposed to. However, it is the other 10% of sites that I am here to ask about.
Our clusters are set up in a dual-server environment that basically acts as a single server. The application runs on one server and the database runs on the other, and in case of problems either can be failed over so that both sets of services run on a single server (basic, I realize). At certain sites we are unable to run services on one of the nodes. When the services are run as they are supposed to be, every so often (at some sites within minutes or hours, at others after a couple of weeks) the node will BSOD.
I fully understand what the blue screen is. The minidump shows that it's the orafencedrv.sys stop, where the Oracle database shuts down a node after loss of communications in order to prevent corruption of the database. This is a great feature and I'm grateful for it; however, it has caused us many headaches in diagnosing what is actually causing the drop in communications.
The interconnect and the public IP are both hooked up over a single switch but they operate on different subnets. Could operating on a single switch be part of the problem?
Could the problem be that the switches are being overloaded with traffic causing temporary packet losses between the two nodes, which I know is enough to have Oracle BSOD a node?
Below I'm posting one of the dumps written to the CSSD log when the node crashes; hopefully this will provide some information about what is happening.
If any other information is needed, please feel free to let me know. Thanks for your help in advance.
[ CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssnmvDiskKillCheck: Aborting, evicted by node 1, sync 13, stamp 99832890,
[ CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: ###################################
[ CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssscExit: CSSD aborting
[ CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: ###################################
[ CSSD]--- DUMP GROCK STATE DB ---
[ CSSD]----------
[ CSSD] type 2, Id 3, Name = (crs_version)
[ CSSD] flags: 0x0
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 0
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 5
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 6
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 11
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 1, nodeBirth 12
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 2, Name = (ocr_STLRZOPRCL)
[ CSSD] flags: 0x0
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 5
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 6
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 11
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 1, nodeBirth 12
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 32
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 15, Name = (_ORA_CRS_MEMBER_stlrzoprcl1)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 1, nodeBirth 12
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 3, Id 15, Name = (_ORA_CRS_MEMBER_stlrzoprcl2)
[ CSSD] flags: 0x0
[ CSSD] grant: count=1, type 3, wait 1
[ CSSD] Member Count =1, master -3
[ CSSD] . . . . .
[ CSSD] memberNo =0, seq 0
[ CSSD] flags = 0x12, granted 1
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 6
[ CSSD] privateDataSize = 0
[ CSSD] publicDataSize = 0
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 4, Name = (CRSDMAIN)
[ CSSD] flags: 0x0
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 5
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 6
[ CSSD] privateDataSize = 128
[ CSSD] publicDataSize = 128
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 11
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 1, nodeBirth 12
[ CSSD] privateDataSize = 128
[ CSSD] publicDataSize = 128
[ CSSD]----------
[ CSSD]----------
[ CSSD] type 2, Id 1, Name = (EVMDMAIN)
[ CSSD] flags: 0x0
[ CSSD] grant: count=0, type 0, wait 0
[ CSSD] Member Count =2, master 2
[ CSSD] . . . . .
[ CSSD] memberNo =2, seq 5
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 2, nodeBirth 6
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD] . . . . .
[ CSSD] memberNo =1, seq 11
[ CSSD] flags = 0x0, granted 0
[ CSSD] refCnt = 1
[ CSSD] nodeNum = 1, nodeBirth 12
[ CSSD] privateDataSize = 508
[ CSSD] publicDataSize = 504
[ CSSD]----------
[ CSSD]--- END OF GROCK STATE DUMP ---
[ CSSD]------- End Dump -------
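When comparing many sites, it may help to pull just the eviction events out of the CSSD logs so their timestamps can be lined up against switch or NIC logs. Below is a minimal Python sketch, not an Oracle tool: the pattern is modelled only on the single eviction line quoted above, and the names EVICTION and find_evictions are made up for illustration. Other CRS releases may word the message differently.

```python
import re
from datetime import datetime

# Pattern modelled on the one eviction line quoted in this thread (assumption:
# other log lines of interest follow the same "[ CSSD]<timestamp> ..." layout).
EVICTION = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"evicted by node (?P<node>\d+), sync (?P<sync>\d+)"
)

def find_evictions(lines):
    """Return (timestamp, evicting node, sync number) for each eviction line."""
    found = []
    for line in lines:
        match = EVICTION.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            found.append((ts, int(match.group("node")), int(match.group("sync"))))
    return found

# Two lines copied from the dump above; only the first is an eviction.
sample = [
    "[ CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssnmvDiskKillCheck: "
    "Aborting, evicted by node 1, sync 13, stamp 99832890,",
    "[ CSSD]2008-10-29 13:30:06.211 [2732] >ERROR: clssscExit: CSSD aborting",
]

evictions = find_evictions(sample)
for ts, node, sync in evictions:
    print(f"{ts.isoformat()} evicted by node {node} (sync {sync})")
```

In practice the list of lines would come from reading the CSSD log file on each node; the eviction times can then be compared with interconnect drops seen on the switch.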
Hi user10508733,
This seems to be your first post, so welcome to the forum!
What is the OS (a blue screen suggests Windows?) and what is the release of your CRS and RDBMS? Hopefully not 10.1x.x.x; if it is, please patch it to 10.2.0.4.
There seem to be a lot of CRS bugs in releases before 10.2.0.3; see this list:
Doc ID: Note:391116.1
Subject: 10.2.0.3 Patch Set - List of Bug Fixes by Problem Type
Let us know the result.
Thanks
Similar Messages
-
Hi all, we have a 2-node cluster running Solaris 10 11/06 and Sun Cluster 3.2.
Recently, we were asked to nfs mount on node 1 of the cluster, a directory from an external Linux host (ie node 1 of the cluster is the nfs client; the linux server is the nfs server).
A few days later, early on a Sunday morning, the linux server developed a high load and was very slow to log into. Around the same time, node 1 of the cluster rebooted. Was this reboot of node 1 a coincidence? I'm not sure.
Anyone got ideas/suggestions about this situation (eg the slow response of the nfs linux server caused node 1 of the cluster to reboot; the external nfs mount is a bad idea)?
Stewart
Hi,
your assumption sounds very unreasonable. But without any hard facts like
- the panic string
- contents of /var/adm/messages at time of crash
- configuration information
- etc.
it is impossible to tell.
Regards
Hartmut -
SCVMM losing connection to cluster nodes
Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there.
The problem we're having with SCVMM is that a host enters the 'needs attention' state, with a WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it. Do you guys have any ideas?
Problem summary:
We are seeing an issue on our new hyper-v platform. The platform should have been in production last week, but this issue is delaying our project as we can't seem to get it stable.
The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes. Not one specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I expect something to do with kerberos. See the bottom of this post for background details and software versions.
The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
Error (2916)
VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
Unknown error (0x80338126)
Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully running the following command:
winrm id -r:cc1-hyp-10.domaincloud1.local
This problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293) is installed on it.
If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again.
Refer to http://support.microsoft.com/kb/2742275 for more details.
Doing a simple test from the VMM server to the problematic cluster node shows this error:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
WSManFault
Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this
computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
Error number: -2144108250 0x80338126
WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM
firewall exception for public profiles limits access to remote computers within the same local subnet.
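Side note on the two numbers in that output: the negative "Error number" and the hex code are the same 32-bit value, once read as signed and once as unsigned. A tiny Python sketch (the helper name to_hex32 is just for illustration) shows the conversion, which is handy when searching for an error that a tool only reports in decimal:

```python
def to_hex32(code: int) -> str:
    """Render a signed 32-bit error number in its unsigned hexadecimal form."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"0x{code & 0xFFFFFFFF:08X}"

# The decimal error number from the winrm output above.
print(to_hex32(-2144108250))
```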
I CAN connect from other hosts to this problematic cluster node:
PS C:\> hostname
CC1-HYP-16
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
IdentifyResponse
ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
And I can connect from the vmm server to all other cluster nodes:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
IdentifyResponse
ProtocolVersion = http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName = http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
I followed the steps on the page https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, or it seems to give outdated recommendations.
I tried checking for duplicate SPNs by running setspn -x on affected machines. No results (although I do not understand what an SPN is or how it works). I rebuilt the performance counters.
I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
Background information:
We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
2 Dell R620's with 32GB of RAM, running hyper-v to virtualize the cloud management layer (DC's, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management vm's like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
The VMM machine is NOT clustered.
8 Dell M620 blades with 320GB of RAM, running hyper-v to virtualize the customer workload. The machines are called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
2 Equallogic units form a SAN (premium storage), and we have a Dell R515 running iscsi target (budget storage).
We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10GB links).
All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This account is in the local administrators group of each cluster node.
On top of this cloud layer we're running the tenant layer with a lot of vm's for a specific customer (although they are all off now).
I think I found the culprit: after an hour of analyzing Wireshark dumps I found that the VMM server had jumbo frames enabled on the management interface to the hosts (and the underlying infrastructure does not). Now my winrm commands have started working again.
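For anyone wanting to check for the same kind of MTU mismatch without packet captures: a ping with the Don't Fragment bit set only gets through if the whole packet fits the path MTU. The Python sketch below is only the arithmetic for picking the payload size, assuming plain IPv4 (20-byte header, no options) and ICMP echo (8-byte header). The 9000-byte jumbo MTU is an assumed typical value, not something measured in this environment, and the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def fits_without_fragmentation(payload: int, path_mtu: int) -> bool:
    """True if a ping with this payload can cross the path with DF set."""
    return payload <= max_ping_payload(path_mtu)

# 1500 is the standard Ethernet MTU; 9000 is an assumed jumbo-frame MTU.
for mtu in (1500, 9000):
    print(f"MTU {mtu}: max ping payload {max_ping_payload(mtu)} bytes")

# A jumbo-sized ping cannot cross a path that only carries 1500-byte frames.
print(fits_without_fragmentation(max_ping_payload(9000), 1500))
```

On Windows the matching test would be something like ping -f -l 1472 followed by the host name (-f sets Don't Fragment, -l sets the payload size). If 1472 works and a larger payload fails, the path is not carrying jumbo frames end to end.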
-
Services not starting after a node crash
Hi,
We have a 3-node cluster and one of the nodes crashed today. The services did not get relocated to another node, and when we try to manually stop/start/relocate the service we get the following error:
srvctl stop service -d BCB -s BCB_J2EE -f
PRCD-1085 : Failed to stop service BCB_J2EE
PRCR-1065 : Failed to stop resource ora.BCB.BCB_j2ee.svc
CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'ora.BCB.BCB_j2ee.svc'
Has anyone seen this before?
Thx
JJ
This is what I can find in the log:
[ CRSPE][60] Server [bcb528] is unreachable. Stopping the sequencer for: bcbCRON 1 1
2011-02-28 08:15:21.778: [ CRSPE][60] Sequencer for [bcbCRON 1 1] has completed with error: CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'bcbCRON'
2011-02-28 08:15:21.778: [ CRSPE][60] Required instruction failed in op: START of [bcbCRON 1 1] on [bcb529] : 105247290
2011-02-28 08:15:21.781: [UiServer][62] Container [ Name: ORDER
MESSAGE:
TextMessage[CRS-2533: Server 'bcb528' is down. Unable to perform the operation on 'bcbCRON']
MSGTYPE:
TextMessage[1]
OBJID:
TextMessage[bcbCRON 1 1]
WAIT:
TextMessage[0] -
Hyper-V Failover Cluster Node Corruption
Dear All,
Some of my nodes are showing abnormal behavior. They are restarting every now and then. I had updated the cluster nodes, but all updates were OS-specific; there was nothing specific to hardware.
I have analyzed the crash dumps and found that the following is causing the crash:
page_fault_in_nonpaged_area
Does anyone have any idea about this?
Thanks in advance.
Hi,
What is the OS of the cluster node?
Did you try to remove the protection client for troubleshooting?
If it is a 2008 R2 cluster, please refer to this thread:
http://social.technet.microsoft.com/Forums/en-US/32ab6a85-6002-4c3c-97ea-27cb1091e9b3/windows-cluster-server-is-getting-restarted?forum=winservergen
Hope it helps
Best Regards
Elton Ji
-
Hyper-V Guest Cluster Node Failing Regularly
Hi,
We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
Around once a week, the guest cluster node that is currently hosting the clustered file service will fail. It's as if the VM is blue screening. That in itself is fairly annoying, and I'll be doing all the updates and checking the event log for clues as to the cause.
The problem then is that whichever physical cluster node is hosting the VM when it fails will not unlock some of the VM's files. The virtual machine configuration lists as Online Pending. This means that the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on, and reboot.
Looking for suggestions on how to fix the following.
1. Crashing guest file cluster node
2. Failed VM with shared VHDX requiring physical host reboot.
Event messages for the physical host that was hosting the failed VM, in the order that they occurred.
Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
Hi,
I have not found a similar issue. Does your cluster pass cluster validation? Are all your Hyper-V hosts compatible with Server 2012 R2? Have you tried disabling all your AV software and firewalls? Please rerun storage validation on the cluster during non-production hours; the cluster validation report should quickly locate the issue.
More information:
Cluster
http://technet.microsoft.com/en-us/library/dd581778(v=ws.10).aspx
Hope this helps.
-
ASM disk busy 99% only on one cluster node
Hello,
We have a three-node Oracle RAC cluster. Our dba(s) called us and said they are getting OEM critical alerts for an ASM disk on one node only. I checked and the SAN-attached drive does not show the same high utilization on either of the other two nodes. I checked the hardware and it seems fine. If the issue was with the SAN-attached disk, we would be seeing the same errors on all three nodes since they share the same disks. The system crashed last week (alert dump in the +asm directories), and the disk has been busy ever since. I asked if the dba reviewed the ADDM reports and he said he had and that there were no suspicious-looking entries that would lead us to the root cause based on those reports. CPU utilization is fine. I am not sure where to look at this point and any help pointing me in the right direction would be appreciated. They do use RMAN; could there be a backup running using those disks only on one node? Has anyone ever seen this before?
Thank you,
Benita Ulisano
Unix/SAN Team
Chicago Public Schools
[email protected]
Hi Harish,
Thank you for responding. To answer your question, yes, the disks are all of the same spec and are shared among the three cluster nodes. The ASM disk sdw1 is the one with the issue.
Problem Node: coefsdb02
three nodes in RAC cluster
coefsdb01, coefsdb02, coefsdb03
iostat results for all three nodes - same disk
coefsdb01
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdw1 0.00 1.71 0.12 0.58 1.27 18.78 28.63 0.01 13.38 1.75 0.12
coefsdb02
sdw1 0.11 0.02 4.00 0.62 305.84 21.72 70.93 2.96 12.58 211.95 97.88
coefsdb03
sdw1 0.21 0.01 4.70 0.33 224.05 13.52 47.22 0.05 10.11 6.15 3.09
The dba(s) run RMAN backups, but only on coefsdb01.
Benita -
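To make the per-node comparison repeatable, the iostat lines above can be parsed and checked against a threshold. This is a minimal Python sketch, assuming the extended iostat column order shown in the header quoted for coefsdb01; the 90% threshold and the names parse_iostat_line and UTIL_THRESHOLD are arbitrary choices for illustration, not recommendations.

```python
# Column order taken from the iostat header quoted in the post.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

def parse_iostat_line(line):
    """Split one iostat device line into (device, {column: value})."""
    device, *values = line.split()
    return device, dict(zip(COLUMNS, map(float, values)))

# The three sdw1 lines exactly as posted, keyed by node name.
samples = {
    "coefsdb01": "sdw1 0.00 1.71 0.12 0.58 1.27 18.78 28.63 0.01 13.38 1.75 0.12",
    "coefsdb02": "sdw1 0.11 0.02 4.00 0.62 305.84 21.72 70.93 2.96 12.58 211.95 97.88",
    "coefsdb03": "sdw1 0.21 0.01 4.70 0.33 224.05 13.52 47.22 0.05 10.11 6.15 3.09",
}

UTIL_THRESHOLD = 90.0  # hypothetical cut-off for "busy", in percent

busy = []
for node, line in samples.items():
    device, stats = parse_iostat_line(line)
    print(f"{node} {device}: util={stats['%util']}% svctm={stats['svctm']} ms")
    if stats["%util"] >= UTIL_THRESHOLD:
        busy.append(node)

print("busy nodes:", busy)
```

With the posted figures only coefsdb02 crosses the threshold, and its svctm is also far above the other two nodes even though its request rate is similar to coefsdb03, which points at that node's path to the disk rather than at the shared LUN itself.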
Re-installing Hyper-V 2012 R2 cluster node
We have four HP BL460 Gen8 servers acting as part of a Hyper-V cluster, running Windows Server 2012 R2 Datacenter.
Storage is provided by two node 3PAR StoreServ 7400.
All network and fc connections are managed by HP Virtual Connect.
One of the four nodes crashed during an HP SPP upgrade, which resulted in a non-booting OS.
I managed to get the OS alive by running multiple check disks and by manually restoring registry hives from backup via Windows 7 installation media's recovery console.
After the recovery there were still some issues with filesystem. Corrupted, orphaned and missing files here and there.
Now I want to re-install the OS from scratch to make sure everything will work correctly and to avoid any future errors.
What I need to know is whether the best practice is to re-install the OS with a new computer name, or whether I should drop the current OS to a workgroup, re-install it and join the AD domain with the same computer name. I've already evicted the node from the Hyper-V cluster, but the server is still running as a member server in AD.
Any other things I should take into consideration before doing the re-installation?
Thanks in advance!
I agree that after a major problem it is much safer to rebuild the system. It sounds like you have the node rebuilt, so I would evict it from the cluster and then remove it from the domain. Rebuild it and you can use the same name because those two actions will clean up its 'footprints'.
If the machine were not running, you would still evict the node from the cluster, but you would need to go into Active Directory to delete the computer account. Then rebuild.
. : | : . : | : . tim -
Node crashes when enabling RDS for private interconnect.
OS: oel6.3 - 2.6.39-300.17.2.el6uek.x86_64
Grid and DB: 11.2.0.3.4
This is a two node Standard Edition cluster.
The node crashes upon restart of clusterware after following the instructions from note:751343.1 (RAC Support for RDS Over Infiniband) to enable RDS.
The cluster is running fine using ipoib for the cluster_interconnect.
1) As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM etc) that's running from the home. When stopping database, use NORMAL or IMMEDIATE option.
2) As root, if relinking 11gR2 Grid Infrastructure (GI) home, unlock GI home: GI_HOME/crs/install/rootcrs.pl -unlock
3) As the ORACLE_HOME/GI_HOME owner, go to ORACLE_HOME/GI_HOME and cd to rdbms/lib
4) As the ORACLE_HOME/GI_HOME owner, issue "make -f ins_rdbms.mk ipc_rds ioracle"
5) As root, if relinking 11gR2 Grid Infrastructure (GI) home, lock GI home: GI_HOME/crs/install/rootcrs.pl -patch
It appears to abend when ASM tries to start, with the message below on the console.
I have a service request open for this issue, but I am hoping someone has seen this and knows a way around it.
Thanks
Alan
kernel BUG at net/rds/ib_send.c:547!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: 8021q garp stp llc iptable_filter ip_tables nfs lockd
fscache auth_rpcgss nfs_acl sunrpc cpufreq_ondemand powernow_k8
freq_table mperf rds_rdma rds_tcp rds ib_ipoib rdma_ucm ib_ucm ib_uverbs
ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa sr_mod cdrom microcode
serio_raw pcspkr ghes hed k10temp hwmon amd64_edac_mod edac_core
edac_mce_amd i2c_piix4 i2c_core sg igb dca mlx4_ib ib_mad ib_core
mlx4_en mlx4_core ext4 mbcache jbd2 usb_storage sd_mod crc_t10dif ahci
libahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
Pid: 4140, comm: kworker/u:1 Not tainted 2.6.39-300.17.2.el6uek.x86_64
#1 Supermicro BHDGT/BHDGT
RIP: 0010:[<ffffffffa02db829>] [<ffffffffa02db829>]
rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
RSP: 0018:ffff880fb84a3c50 EFLAGS: 00010202
RAX: ffff880fbb694000 RBX: ffff880fb3e4e600 RCX: 0000000000000000
RDX: 0000000000000030 RSI: ffff880fbb6c3a00 RDI: ffff880fb058a048
RBP: ffff880fb84a3d30 R08: 0000000000000fd0 R09: ffff880fbb6c3b90
R10: 0000000000000000 R11: 000000000000001a R12: ffff880fbb6c3a00
R13: ffff880fbb6c3a00 R14: 0000000000000000 R15: ffff880fb84a3d90
FS: 00007fd0a3a56700(0000) GS:ffff88101e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002158ca2 CR3: 0000000001783000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:1 (pid: 4140, threadinfo ffff880fb84a2000, task
ffff880fae970180)
Stack:
0000000000012200 0000000000012200 ffff880f00000000 0000000000000000
000000000000e5b0 ffffffff8115af81 ffffffff81b8d6c0 ffffffffa02b2e12
00000001bf272240 ffffffff81267020 ffff880fbb6c3a00 0000003000000002
Call Trace:
[<ffffffff8115af81>] ? __kmalloc+0x1f1/0x200
[<ffffffffa02b2e12>] ? rds_message_alloc+0x22/0x90 [rds]
[<ffffffff81267020>] ? sg_init_table+0x30/0x50
[<ffffffffa02b2db2>] ? rds_message_alloc_sgs+0x62/0xa0 [rds]
[<ffffffffa02b31e4>] ? rds_message_map_pages+0xa4/0x110 [rds]
[<ffffffffa02b4f3b>] rds_send_xmit+0x38b/0x6e0 [rds]
[<ffffffff81089d53>] ? cwq_activate_first_delayed+0x53/0x100
[<ffffffffa02b6040>] ? rds_recv_worker+0xc0/0xc0 [rds]
[<ffffffffa02b6075>] rds_send_worker+0x35/0xc0 [rds]
[<ffffffff81089fd6>] process_one_work+0x136/0x450
[<ffffffff8108bbe0>] worker_thread+0x170/0x3c0
[<ffffffff8108ba70>] ? manage_workers+0x120/0x120
[<ffffffff810907e6>] kthread+0x96/0xa0
[<ffffffff81515544>] kernel_thread_helper+0x4/0x10
[<ffffffff81090750>] ? kthread_worker_fn+0x1a0/0x1a0
[<ffffffff81515540>] ? gs_change+0x13/0x13
Code: ff ff e9 b1 fe ff ff 48 8b 0d b4 54 4b e1 48 89 8d 70 ff ff ff e9
71 ff ff ff 83 bd 7c ff ff ff 00 0f 84 f4 f5 ff ff 0f 0b eb fe <0f> 0b
eb fe 44 8b 8d 48 ff ff ff 41 b7 01 e9 51 f6 ff ff 0f 0b
RIP [<ffffffffa02db829>] rds_ib_xmit+0xa69/0xaf0 [rds_rdma]
RSP <ffff880fb84a3c50>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.39-300.17.2.el6uek.x86_64
([email protected]) (gcc version 4.4.6 20110731 (Red
Hat 4.4.6-3) (GCC) ) #1 SMP Wed Nov 7 17:48:36 PST 2012
Command line: ro root=UUID=5ad1a268-b813-40da-bb76-d04895215677
rd_DM_UUID=ddf1_stor rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD
SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us numa=off
console=ttyS1,115200n8 irqpoll maxcpus=1 nr_cpus=1 reset_devices
cgroup_disable=memory mce=off memmap=exactmap memmap=538K@64K
memmap=130508K@770048K elfcorehdr=900556K memmap=72K#3668608K
memmap=184K#3668680K
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000100 - 0000000000096800 (usable)
BIOS-e820: 0000000000096800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000dfe90000 (usable)
BIOS-e820: 00000000dfe9e000 - 00000000dfea0000 (reserved)
BIOS-e820: 00000000dfea0000 - 00000000dfeb2000 (ACPI data)
BIOS-e820: 00000000dfeb2000 - 00000000dfee0000 (ACPI NVS)
BIOS-e820: 00000000dfee0000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
I believe the OFED version is 1.5.3.3, but I am not sure if this is correct.
We have not added any third-party drivers. All that has been done to add InfiniBand to our build is
a yum groupinstall "Infiniband Support".
I have not tried rds-stress, but rds-ping works fine and rds-info seems fine.
A service request has been opened but so far I have had better response here.
oracle@blade1-6:~> rds-info
RDS IB Connections:
LocalAddr RemoteAddr LocalDev RemoteDev
10.10.0.116 10.10.0.119 fe80::25:90ff:ff07:df1d fe80::25:90ff:ff07:e0e5
TCP Connections:
LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
Counters:
CounterName Value
conn_reset 5
recv_drop_bad_checksum 0
recv_drop_old_seq 0
recv_drop_no_sock 1
recv_drop_dead_sock 0
recv_deliver_raced 0
recv_delivered 18
recv_queued 18
recv_immediate_retry 0
recv_delayed_retry 0
recv_ack_required 4
recv_rdma_bytes 0
recv_ping 14
send_queue_empty 18
send_queue_full 0
send_lock_contention 0
send_lock_queue_raced 0
send_immediate_retry 0
send_delayed_retry 0
send_drop_acked 0
send_ack_required 3
send_queued 32
send_rdma 0
send_rdma_bytes 0
send_pong 14
page_remainder_hit 0
page_remainder_miss 0
copy_to_user 0
copy_from_user 0
cong_update_queued 0
cong_update_received 1
cong_send_error 0
cong_send_blocked 0
ib_connect_raced 4
ib_listen_closed_stale 0
ib_tx_cq_call 6
ib_tx_cq_event 6
ib_tx_ring_full 0
ib_tx_throttle 0
ib_tx_sg_mapping_failure 0
ib_tx_stalled 16
ib_tx_credit_updates 0
ib_rx_cq_call 33
ib_rx_cq_event 38
ib_rx_ring_empty 0
ib_rx_refill_from_cq 0
ib_rx_refill_from_thread 0
ib_rx_alloc_limit 0
ib_rx_credit_updates 0
ib_ack_sent 4
ib_ack_send_failure 0
ib_ack_send_delayed 0
ib_ack_send_piggybacked 0
ib_ack_received 3
ib_rdma_mr_alloc 0
ib_rdma_mr_free 0
ib_rdma_mr_used 0
ib_rdma_mr_pool_flush 8
ib_rdma_mr_pool_wait 0
ib_rdma_mr_pool_depleted 0
ib_atomic_cswp 0
ib_atomic_fadd 0
iw_connect_raced 0
iw_listen_closed_stale 0
iw_tx_cq_call 0
iw_tx_cq_event 0
iw_tx_ring_full 0
iw_tx_throttle 0
iw_tx_sg_mapping_failure 0
iw_tx_stalled 0
iw_tx_credit_updates 0
iw_rx_cq_call 0
iw_rx_cq_event 0
iw_rx_ring_empty 0
iw_rx_refill_from_cq 0
iw_rx_refill_from_thread 0
iw_rx_alloc_limit 0
iw_rx_credit_updates 0
iw_ack_sent 0
iw_ack_send_failure 0
iw_ack_send_delayed 0
iw_ack_send_piggybacked 0
iw_ack_received 0
iw_rdma_mr_alloc 0
iw_rdma_mr_free 0
iw_rdma_mr_used 0
iw_rdma_mr_pool_flush 0
iw_rdma_mr_pool_wait 0
iw_rdma_mr_pool_depleted 0
tcp_data_ready_calls 0
tcp_write_space_calls 0
tcp_sndbuf_full 0
tcp_connect_raced 0
tcp_listen_closed_stale 0
RDS Sockets:
BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
0.0.0.0 0 0.0.0.0 0 131072 131072 340441
RDS Connections:
LocalAddr RemoteAddr NextTX NextRX Flg
10.10.0.116 10.10.0.119 33 38 --C
Receive Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
Send Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
Retransmit Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
10.10.0.116 0 10.10.0.119 40549 32 0
oracle@blade1-6:~> cat /etc/rdma/rdma.conf
# Load IPoIB
IPOIB_LOAD=yes
# Load SRP module
SRP_LOAD=no
# Load iSER module
ISER_LOAD=no
# Load RDS network protocol
RDS_LOAD=yes
# Should we modify the system mtrr registers? We may need to do this if you
# get messages from the ib_ipath driver saying that it couldn't enable
# write combining for the PIO buffs on the card.
# Note: recent kernels should do this for us, but in case they don't, we'll
# leave this option
FIXUP_MTRR_REGS=no
# Should we enable the NFSoRDMA service?
NFSoRDMA_LOAD=yes
NFSoRDMA_PORT=2050
oracle@blade1-6:~> /etc/init.d/rdma status
Low level hardware support loaded:
mlx4_ib
Upper layer protocol modules:
rds_rdma ib_ipoib
User space access modules:
rdma_ucm ib_ucm ib_uverbs ib_umad
Connection management modules:
rdma_cm ib_cm iw_cm
Configured IPoIB interfaces: none
Currently active IPoIB interfaces: ib0 -
Unable to failover the services in active-active cluster node
Hi,
I am applying the SP2 patch for SQL Server 2008 R2 in an active-active cluster. We have 3 services in the cluster, with node 1 as the preferred owner of 2 and node 2 as the preferred owner of 1. When I try to move the service from node 2 to node 1, I get the errors below:
DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols.
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server XXXXXXXXX. The target name used was RPCSS/XXXXXX. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (XXXXXX) is different from the client domain (XXXXXXX), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
The Cluster service failed to bring clustered service or application 'CHCROCHC045' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Cluster resource 'SQL Server (CHCROCHC045)' in clustered service or application 'CHCROCHC045' failed.
Any input to resolve this issue would be appreciated, as I cannot proceed with patching.
BR
PGR
Hi PGR,
As the issue is more related to Windows Server, I would recommend posting it in the Windows Server forums for better support.
In addition, below are some articles about troubleshooting the error "DCOM was unable to communicate with the computer XXXXXXXXX using any of the configured protocols" for your reference.
Event ID 10009 — COM Remote Service Availability
How to troubleshoot DCOM 10009 error logged in system event?
Thanks,
Lydia Zhang
TechNet Community Support -
Error while getting cluster node subtree
Hi,
We are on SP15.
The console logs show the following error
log generation timestamp : 2006_01_17_at_17_14_05
java.rmi.RemoteException: Error while getting cluster node subtree of :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=""; nested exception is:
com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:242)
at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImplp4_Skel.dispatch(ConvenienceEngineAdministratorImplp4_Skel.java:99)
at com.sap.engine.services.rmi_p4.DispatchImpl._runInternal(DispatchImpl.java:304)
at com.sap.engine.services.rmi_p4.DispatchImpl._run(DispatchImpl.java:193)
at com.sap.engine.services.rmi_p4.server.P4SessionProcessor.request(P4SessionProcessor.java:122)
at com.sap.engine.core.service630.context.cluster.session.ApplicationSessionMessageListener.process(ApplicationSessionMessageListener.java:33)
at com.sap.engine.core.cluster.impl6.session.MessageRunner.run(MessageRunner.java:41)
at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37)
at java.security.AccessController.doPrivileged(Native Method)
at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:100)
at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:170)
Caused by: com.sap.engine.services.jmx.exception.MBeanServerClusterException: Exception during invocation of remote MBeanServer method, target node: 2053400
at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:816)
at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.invoke(MBeanServerInterceptorChain.java:330)
at com.sap.engine.services.adminadapter.impl.ConvenienceEngineAdministratorImpl.getClusterNodeSubTree(ConvenienceEngineAdministratorImpl.java:239)
... 10 more
Caused by: com.sap.engine.services.jmx.exception.JmxConnectorException: Unable to de-serialize request parameters, message [ JMX request (java) v1.0 len: 345 | src: cluster target-node: 2053400 req: invoke params-number: 4 params-bytes: 0 | :name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster="" null null null ]
at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invokeMbsInternal(MBeanServerConnectionImpl.java:680)
at com.sap.engine.services.jmx.MBeanServerConnectionImpl.invoke(MBeanServerConnectionImpl.java:467)
at com.sap.engine.services.jmx.MBeanServerConnectionSecurityWrapper.invoke(MBeanServerConnectionSecurityWrapper.java:221)
at com.sap.engine.services.jmx.ClusterInterceptor.invoke(ClusterInterceptor.java:813)
... 12 more
Caused by: javax.management.InstanceNotFoundException: MBean with name com.sap.default:name=ClusterNodeRepresentative,j2eeType=com.sap.engine.services.adminadapter.impl.ClusterNodeRepresentative,SAP_J2EEClusterNode=2053400,SAP_J2EECluster=XD1 not found in repository
at com.sap.pj.jmx.server.MBeanServerImpl.getClassLoaderFor(MBeanServerImpl.java:1408)
at com.sap.pj.jmx.server.interceptor.MBeanServerWrapperInterceptor.getClassLoaderFor(MBeanServerWrapperInterceptor.java:455)
at com.sap.engine.services.jmx.CompletionInterceptor.getClassLoaderFor(CompletionInterceptor.java:567)
at com.sap.pj.jmx.server.interceptor.BasicMBeanServerInterceptor.getClassLoaderFor(BasicMBeanServerInterceptor.java:438)
at com.sap.jmx.provider.ProviderInterceptor.getClassLoaderFor(ProviderInterceptor.java:330)
at com.sap.engine.services.jmx.RedirectInterceptor.getClassLoaderFor(RedirectInterceptor.java:501)
at com.sap.pj.jmx.server.interceptor.MBeanServerInterceptorChain.getClassLoaderFor(MBeanServerInterceptorChain.java:443)
at com.sap.engine.services.jmx.RequestMessage.readParams(RequestMessage.java:523)
at com.sap.engine.services.jmx.RequestMessage.getParams(RequestMessage.java:578)
at com.sap.engine.services.jmx.MBeanServerInvoker.invokeMbs(MBeanServerInvoker.java:106)
at com.sap.engine.services.jmx.JmxServiceConnectorServer.receiveWait(JmxServiceConnectorServer.java:173)
at com.sap.engine.core.service630.context.cluster.message.MessageListenerWrapper.process(MessageListenerWrapper.java:81)
at com.sap.engine.core.cluster.impl6.ms.MSListenerThread.run(MSListenerThread.java:47)
at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
at com.sap.engine.core.thread.impl6.SingleThread.execute(SingleThread.java:78)
at com.sap.engine.core.thread.impl6.SingleThread.run(SingleThread.java:148)
java.lang.NullPointerException
at com.sap.engine.services.adminadapter.gui.ClusterView.addGlobalDispatcherServiceProperties(ClusterView.java:455)
at com.sap.engine.services.adminadapter.gui.ClusterView.createGlobalTrees(ClusterView.java:508)
at com.sap.engine.services.adminadapter.gui.ClusterView.access$1200(ClusterView.java:29)
at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:420)
Any clue what this is?
rgds
Got the same error
+ /usr/java14_64/bin/java -showversion -Duser.language=en -DP4ClassLoad=P4Connection -Dp4Cache=clean -jar go.jar
java version "1.4.2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2)
Classic VM (build 1.4.2, J2RE 1.4.2 IBM AIX 5L for PowerPC (64 bit JVM) build caix64142ifx-20061222 (ifix 113727: SR7 + 112603) (JIT enabled: jitc))
java.lang.NullPointerException
at com.sap.engine.services.adminadapter.gui.ClusterView$4.run(ClusterView.java:405)
Need some help!
Bernard -
Hi All,
I am facing the below error while installing Oracle RAC in Silent Mode.
SEVERE: There are no common subnets represented by network interfaces across all cluster nodes.
SEVERE: [FATAL] [INS-40925] One or more nodes have interfaces not configured with a subnet that is common across all cluster nodes.
CAUSE: Not all nodes have network interfaces that are configured on subnets that are common to all nodes in the cluster.
ACTION: Ensure all cluster nodes have a public interface defined with the same subnet accessible by all nodes in the cluster.
My /etc/hosts is given below.
127.0.0.1 localhost localhost.localdomain
#Public
192.168.1.101 rac1 rac1.localdomain
192.168.1.102 rac2 rac2.localdomain
#Private
192.168.2.101 rac1-priv rac1-priv.localdomain
192.168.2.102 rac2-priv rac2-priv.localdomain
#Virtual
192.168.1.103 rac1-vip rac1-vip.localdomain
192.168.1.104 rac2-vip rac2-vip.localdomain
#SCAN
192.168.1.105 rac-scan rac-scan.localdomain
Could you please help me get rid of error INS-40925? Any ideas?
Hi Ramesh,
Please find the result of ifconfig -a from both nodes RAC1 & RAC2.
ifconfig -a in RAC1
[oracle@rac1 Desktop]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:17:7A:D5
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe17:7ad5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:102 errors:0 dropped:0 overruns:0 frame:0
TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:25472 (24.8 KiB) TX bytes:3322 (3.2 KiB)
Interrupt:19 Base address:0xd020
eth1 Link encap:Ethernet HWaddr 08:00:27:C0:AC:DB
inet addr:192.168.2.101 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec0:acdb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:240 (240.0 b) TX bytes:816 (816.0 b)
Interrupt:16 Base address:0xd240
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6394 (6.2 KiB) TX bytes:6394 (6.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
virbr0-nic Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
ifconfig -a in RAC2
[oracle@rac2 Desktop]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:C9:38:82
inet addr:192.168.1.102 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec9:3882/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:122 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:32617 (31.8 KiB) TX bytes:5157 (5.0 KiB)
Interrupt:19 Base address:0xd020
eth1 Link encap:Ethernet HWaddr 08:00:27:90:B5:A0
inet addr:192.168.2.102 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe90:b5a0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:240 (240.0 b) TX bytes:746 (746.0 b)
Interrupt:16 Base address:0xd240
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6390 (6.2 KiB) TX bytes:6390 (6.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
virbr0-nic Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) -
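The INS-40925 message is about subnets that every node has in common, so one quick cross-check is to compute the network of each interface and intersect the results per node. The sketch below does this with Python's standard `ipaddress` module, using the addresses and netmasks copied from the `ifconfig -a` output above. The dictionary layout and the function names `subnets_for` and `common_subnets` are illustrative assumptions; this is not how the Oracle installer performs its own check.

```python
import ipaddress

# Addresses and netmasks copied from the ifconfig output above (loopback omitted).
# The structure of this dictionary is an assumption made for illustration.
NODE_INTERFACES = {
    "rac1": {
        "eth0": ("192.168.1.101", "255.255.255.0"),
        "eth1": ("192.168.2.101", "255.255.255.0"),
        "virbr0": ("192.168.122.1", "255.255.255.0"),
    },
    "rac2": {
        "eth0": ("192.168.1.102", "255.255.255.0"),
        "eth1": ("192.168.2.102", "255.255.255.0"),
        "virbr0": ("192.168.122.1", "255.255.255.0"),
    },
}


def subnets_for(interfaces):
    """Return the set of IPv4 networks covered by one node's interfaces."""
    return {
        ipaddress.ip_interface(f"{addr}/{mask}").network
        for addr, mask in interfaces.values()
    }


def common_subnets(nodes):
    """Return the networks that appear on every node."""
    per_node = [subnets_for(ifaces) for ifaces in nodes.values()]
    return set.intersection(*per_node)


if __name__ == "__main__":
    for net in sorted(common_subnets(NODE_INTERFACES)):
        print(net)
```

On the addresses shown, eth0 and eth1 fall into the same /24 networks on both nodes, so this simple comparison does not by itself reproduce the INS-40925 condition. The installer may be evaluating something this sketch does not model.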
Cluster with 2 hosts 2012 R2
Scheduled CAU fails with:
CAU run {4EFE116C-AB49-456D-8EED-F7EDC764DA49} on cluster Cluster1 failed. Error Message:One or more errors occurred while checking the status of Windows Firewall on the cluster nodes. Review the errors for more information on how to resolve the problems.
Error Code:-2146233088 Stack: at MS.Internal.ClusterAwareUpdating.Util.<CheckFirewallsAsync>d__3a.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.ClusterAwareUpdating.Commands.InvokeCauRunCommand.<_ProcessCluster>d__78.MoveNext()
If I run CAU "Analyze Readiness", everything comes back as PASS.
If I run CAU by hand on the same hosts with NO change to the system (not even a reboot), it finishes OK.
Anybody any ideas?
Thanks
Seb
Hi,
In some cases, if you have disabled the inbound Windows Firewall connection for the "Cluster aware updating" service, CAU cannot be used.
More information:
Starting with Cluster-Aware Updating: Self-Updating
http://blogs.technet.com/b/filecab/archive/2012/05/17/starting-with-cluster-aware-updating-self-updating.aspx
What is Cluster Aware Updating in Windows Server 2012? (Part 1)
http://blogs.technet.com/b/mspfe/archive/2013/02/06/what-is-cluster-aware-updating-in-windows-server-2012.aspx
Cluster-Aware Updating Overview
http://technet.microsoft.com/en-us/library/hh831694.aspx
Hope this helps. -
What are the preferred methods for backing up a cluster node bootdisk?
Hi,
I would like to use flarcreate to back up the boot disks for each of the nodes in my cluster, but I cannot see this method mentioned in any cluster documentation.
Has anybody used flash backups for cluster nodes before (and, more importantly, successfully restored a cluster node from a flash image)?
Thanks very much,
Trevor
Hi, some background on this: I need to patch some production cluster nodes, and obviously would like to back up the root disk of each node before doing so.
What I really need is some advice about the best method to back up and patch my cluster nodes (with a recovery method as well).
The Sun documentation for this says to use ufsdump, which I have used in the past, but will FLAR do the same job? Has anyone had experience using FLAR to restore a cluster node?
Or does someone have another solution for patching the nodes? Maybe offline my root mirror (SVM), patch the root disk, and, barring any major problems, online the mirror again?
Cheers, Trevor -
File Being processed in two cluster nodes
Hi ,
We have two cluster nodes, and when my adapter picks up a file, the file gets processed on both cluster nodes.
I believe the file should be processed on one cluster node or the other, but not on both.
Has anyone faced this kind of situation in a project with multiple cluster nodes?
Thanks,
Chandra.
Hi Chandra,
Did you get a chance to see this post? It may help:
Processing in Multiple Cluster Nodes
Regards,
Sandeep
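As a generic illustration of the "one node or the other, but not both" expectation, a common pattern is to have each node claim a file with an atomic rename before processing it. The sketch below is a minimal Python version of that idea. The function name `try_claim`, the `.processing` suffix, and the node names are hypothetical, and this is not a description of how the adapter in question coordinates its cluster nodes.

```python
import os
import tempfile


def try_claim(path, node_name):
    """Try to claim `path` for `node_name`; return the claimed path or None.

    os.rename is atomic on POSIX when source and target are on the same
    filesystem, so only one caller can succeed in renaming a given file.
    """
    claimed = f"{path}.{node_name}.processing"
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        # Another node already renamed (claimed) the file.
        return None
    return claimed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as inbox:
        incoming = os.path.join(inbox, "order.xml")
        with open(incoming, "w") as handle:
            handle.write("<order/>")
        # The first caller gets the claimed path; the second gets None.
        print(try_claim(incoming, "node1"))
        print(try_claim(incoming, "node2"))
```

The rename guarantee depends on the filesystem; on some shared or network filesystems it may be weaker, so treat this as a sketch rather than a drop-in fix.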