Local NFS / LDAP on cluster nodes
Hi,
I have a 2-node cluster (3.2 1/09) on Solaris 10 U8, providing NFS (/home) and LDAP for clients. I would like to configure LDAP and NFS clients on each cluster node, so they share user information with the rest of the machines.
I assume the right way to do this is to configure the cluster nodes the same as other clients, using the HA Logical Hostnames for the LDAP and NFS server; this way, there's always a working LDAP and NFS server for each node. However, what happens if both nodes reboot at once (for example, power failure)? As the first node boots, there is no working LDAP or NFS server, because it hasn't been started yet. Will this cause the boot to fail and require manual intervention, or will the cluster boot without NFS and LDAP clients enabled, allowing me to fix it later?
Thanks. In that case, is it safe to configure the NFS-exported filesystem as a global mount, and symlink e.g. "/home" -> "/global/home", so home directories are accessible via the normal path on both nodes? (I understand global filesystems have worse performance, but this would just be for administrators logging in with their LDAP accounts.)
For LDAP, my concern is that if svc:/network/ldap/client:default fails during startup (because no LDAP server is running yet), it might prevent the cluster services from starting, even though all names required by the cluster are available from the /etc files.
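For reference, a sketch of the global-mount variant being asked about, with hypothetical metadevice names (adjust to your own diskset). Note that on Solaris 10 /home is normally managed by the automounter, so the auto_home entry would have to be removed or overridden before the symlink can work:

```shell
# /etc/vfstab entry on both nodes (example metadevice from a diskset "nfsset"):
/dev/md/nfsset/dsk/d100  /dev/md/nfsset/rdsk/d100  /global/home  ufs  2  yes  global,logging

# then, on each node, point the conventional path at the global mount:
# ln -s /global/home /home
```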
Similar Messages
-
OrainstRoot.sh: Failure to promote local gpnp setup to other cluster nodes
I'm trying to build a 2-node cluster, and everything appeared to be going swimmingly until the end of the orainstRoot.sh run on the first node.
The following is the end of the output:
Disk Group OCR_VOTE created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 4e3f692529584f8bbf7f16146bd90346.
Successful addition of voting disk 728bed918cf54f6cbf904d37638c674b.
Successful addition of voting disk 8ac20793405d4fdcbfcafc7e311f877d.
Successfully replaced voting disk group with +OCR_VOTE.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
1. ONLINE 4e3f692529584f8bbf7f16146bd90346 (ORCL:VOTE01) [OCR_VOTE]
2. ONLINE 728bed918cf54f6cbf904d37638c674b (ORCL:VOTE02) [OCR_VOTE]
3. ONLINE 8ac20793405d4fdcbfcafc7e311f877d (ORCL:VOTE03) [OCR_VOTE]
Located 3 voting disk(s).
Failed to rmtcopy "/tmp/fileLgKPGV" to "/u01/app/11.2.0/grid/gpnp/manifest.txt" for nodes {ilprevzedb01,ilprevzedb02}, rc=256
Failed to rmtcopy "/u01/app/11.2.0/grid/gpnp/ilprevzedb01/profiles/peer/profile.xml" to "/u01/app/11.2.0/grid/gpnp/profiles/peer/profile.xml" for nodes {ilprevzedb01,ilprevzedb02}, rc=256
rmtcopy aborted
Failed to promote local gpnp setup to other cluster nodes at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6504.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
Has anyone run into this problem and found a solution?
Thanks in advance!
OK, for everyone out there: I resolved the issue. Hopefully this will help others encountering the same problem.
It turns out that when the OS was installed, the iptables firewall was left enabled. This will cause havoc with the installer scripts.
My first inkling should have been when the installer stalled at 65% trying to copy home directories between nodes, the first time I ran through the installer.
At that time, Googling around found that iptables might be the problem and indeed it was running, so I just did a 'service iptables stop' WITHOUT REBOOTING THE NODES and re-ran the installer.
Well, it looks as though NOT REBOOTING THE NODES doesn't quite cut it. I then did a 'chkconfig iptables off' and REBOOTED BOTH NODES.
Oracle support simply pointed me to How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation (Doc ID 942166.1), which didn't really work all that well (lots of failures and errors), so I just deleted the 11.2.0 directory and ran the installer again.
This time the install went through without problems.
Thanks! -
Hi all, we have a 2-node cluster running Solaris 10 11/06 and Sun Cluster 3.2.
Recently, we were asked to NFS-mount, on node 1 of the cluster, a directory from an external Linux host (i.e. node 1 of the cluster is the NFS client; the Linux host is the NFS server).
A few days later, early on a Sunday morning, the Linux server developed a high load and became very slow to log into. Around the same time, node 1 of the cluster rebooted. Was this reboot of node 1 a coincidence? I'm not sure.
Has anyone got ideas or suggestions about this situation (e.g. could the slow response of the Linux NFS server have caused node 1 of the cluster to reboot? Is the external NFS mount a bad idea)?
Stewart
Hi,
your assumption does not sound unreasonable, but without any hard facts like
- the panic string
- contents of /var/adm/messages at time of crash
- configuration information
- etc.
it is impossible to tell.
Regards
Hartmut -
After reboot, cluster node went into maintenance mode (CONTROL-D)
Hi there!
I have configured a 2-node cluster on 2 x Sun Enterprise 220R and a StorEdge D1000.
Each time I reboot either of the cluster nodes, I get the following error during boot-up:
The / file system (/dev/rdsk/c0t1d0s0) is being checked.
/dev/rdsk/c0t1d0s0: UNREF DIR I=35540 OWNER=root MODE=40755
/dev/rdsk/c0t1d0s0: SIZE=512 MTIME=Jun 5 15:02 2006 (CLEARED)
/dev/rdsk/c0t1d0s0: UNREF FILE I=1192311 OWNER=root MODE=100600
/dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 (RECONNECTED)
/dev/rdsk/c0t1d0s0: LINK COUNT FILE I=1192311 OWNER=root MODE=100600
/dev/rdsk/c0t1d0s0: SIZE=96 MTIME=Jun 5 13:23 2006 COUNT 0 SHOULD BE 1
/dev/rdsk/c0t1d0s0: LINK COUNT INCREASING
/dev/rdsk/c0t1d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
In maintenance mode I run:
# fsck -y -F ufs /dev/rdsk/c0t1d0s0
and it manages to correct the problem ... but the problem occurs again after each reboot, on each cluster node!
I have installed Sun Cluster 3.1 on Solaris 9 SPARC.
How can I get rid of this?
Any ideas?
Brgds,
Sergej
Hi, I get this:
112941-09 SunOS 5.9: sysidnet Utility Patch
116755-01 SunOS 5.9: usr/snadm/lib/libadmutil.so.2 Patch
113434-30 SunOS 5.9: /usr/snadm/lib Library and Differential Flash Patch
112951-13 SunOS 5.9: patchadd and patchrm Patch
114711-03 SunOS 5.9: usr/sadm/lib/diskmgr/VDiskMgr.jar Patch
118064-04 SunOS 5.9: Admin Install Project Manager Client Patch
113742-01 SunOS 5.9: smcpreconfig.sh Patch
113813-02 SunOS 5.9: Gnome Integration Patch
114501-01 SunOS 5.9: drmproviders.jar Patch
112943-09 SunOS 5.9: Volume Management Patch
113799-01 SunOS 5.9: solregis Patch
115697-02 SunOS 5.9: mtmalloc lib Patch
113029-06 SunOS 5.9: libaio.so.1 librt.so.1 and abi_libaio.so.1 Patch
113981-04 SunOS 5.9: devfsadm Patch
116478-01 SunOS 5.9: usr platform links Patch
112960-37 SunOS 5.9: patch libsldap ldap_cachemgr libldap
113332-07 SunOS 5.9: libc_psr.so.1 Patch
116500-01 SunOS 5.9: SVM auto-take disksets Patch
114349-04 SunOS 5.9: sbin/dhcpagent Patch
120441-03 SunOS 5.9: libsec patch
114344-19 SunOS 5.9: kernel/drv/arp Patch
114373-01 SunOS 5.9: UMEM - abi_libumem.so.1 patch
118558-27 SunOS 5.9: Kernel Patch
115675-01 SunOS 5.9: /usr/lib/liblgrp.so Patch
112958-04 SunOS 5.9: patch pci.so
113451-11 SunOS 5.9: IKE Patch
112920-02 SunOS 5.9: libipp Patch
114372-01 SunOS 5.9: UMEM - llib-lumem patch
116229-01 SunOS 5.9: libgen Patch
116178-01 SunOS 5.9: libcrypt Patch
117453-01 SunOS 5.9: libwrap Patch
114131-03 SunOS 5.9: multi-terabyte disk support - libadm.so.1 patch
118465-02 SunOS 5.9: rcm_daemon Patch
113490-04 SunOS 5.9: Audio Device Driver Patch
114926-02 SunOS 5.9: kernel/drv/audiocs Patch
113318-25 SunOS 5.9: patch /kernel/fs/nfs and /kernel/fs/sparcv9/nfs
113070-01 SunOS 5.9: ftp patch
114734-01 SunOS 5.9: /usr/ccs/bin/lorder Patch
114227-01 SunOS 5.9: yacc Patch
116546-07 SunOS 5.9: CDRW DVD-RW DVD+RW Patch
119494-01 SunOS 5.9: mkisofs patch
113471-09 SunOS 5.9: truss Patch
114718-05 SunOS 5.9: usr/kernel/fs/pcfs Patch
115545-01 SunOS 5.9: nss_files patch
115544-02 SunOS 5.9: nss_compat patch
118463-01 SunOS 5.9: du Patch
116016-03 SunOS 5.9: /usr/sbin/logadm patch
115542-02 SunOS 5.9: nss_user patch
116014-06 SunOS 5.9: /usr/sbin/usermod patch
116012-02 SunOS 5.9: ps utility patch
117433-02 SunOS 5.9: FSS FX RT Patch
117431-01 SunOS 5.9: nss_nis Patch
115537-01 SunOS 5.9: /kernel/strmod/ptem patch
115336-03 SunOS 5.9: /usr/bin/tar, /usr/sbin/static/tar Patch
117426-03 SunOS 5.9: ctsmc and sc_nct driver patch
121319-01 SunOS 5.9: devfsadmd_mod.so Patch
121316-01 SunOS 5.9: /kernel/sys/doorfs Patch
121314-01 SunOS 5.9: tl driver patch
116554-01 SunOS 5.9: semsys Patch
112968-01 SunOS 5.9: patch /usr/bin/renice
116552-01 SunOS 5.9: su Patch
120445-01 SunOS 5.9: Toshiba platform token links (TSBW,Ultra-3i)
112964-15 SunOS 5.9: /usr/bin/ksh Patch
112839-08 SunOS 5.9: patch libthread.so.1
115687-02 SunOS 5.9:/var/sadm/install/admin/default Patch
115685-01 SunOS 5.9: sbin/netstrategy Patch
115488-01 SunOS 5.9: patch /kernel/misc/busra
115681-01 SunOS 5.9: usr/lib/fm/libdiagcode.so.1 Patch
113032-03 SunOS 5.9: /usr/sbin/init Patch
113031-03 SunOS 5.9: /usr/bin/edit Patch
114259-02 SunOS 5.9: usr/sbin/psrinfo Patch
115878-01 SunOS 5.9: /usr/bin/logger Patch
116543-04 SunOS 5.9: vmstat Patch
113580-01 SunOS 5.9: mount Patch
115671-01 SunOS 5.9: mntinfo Patch
113977-01 SunOS 5.9: awk/sed pkgscripts Patch
122716-01 SunOS 5.9: kernel/fs/lofs patch
113973-01 SunOS 5.9: adb Patch
122713-01 SunOS 5.9: expr patch
117168-02 SunOS 5.9: mpstat Patch
116498-02 SunOS 5.9: bufmod Patch
113576-01 SunOS 5.9: /usr/bin/dd Patch
116495-03 SunOS 5.9: specfs Patch
117160-01 SunOS 5.9: /kernel/misc/krtld patch
118586-01 SunOS 5.9: cp/mv/ln Patch
120025-01 SunOS 5.9: ipsecconf Patch
116527-02 SunOS 5.9: timod Patch
117155-08 SunOS 5.9: pcipsy Patch
114235-01 SunOS 5.9: libsendfile.so.1 Patch
117152-01 SunOS 5.9: magic Patch
116486-03 SunOS 5.9: tsalarm Driver Patch
121998-01 SunOS 5.9: two-key mode fix for 3DES Patch
116484-01 SunOS 5.9: consconfig Patch
116482-02 SunOS 5.9: modload Utils Patch
117746-04 SunOS 5.9: patch platform/sun4u/kernel/drv/sparcv9/pic16f819
121992-01 SunOS 5.9: fgrep Patch
120768-01 SunOS 5.9: grpck patch
119438-01 SunOS 5.9: usr/bin/login Patch
114389-03 SunOS 5.9: devinfo Patch
116510-01 SunOS 5.9: wscons Patch
114224-05 SunOS 5.9: csh Patch
116670-04 SunOS 5.9: gld Patch
114383-03 SunOS 5.9: Enchilada/Stiletto - pca9556 driver
116506-02 SunOS 5.9: traceroute patch
112919-01 SunOS 5.9: netstat Patch
112918-01 SunOS 5.9: route Patch
112917-01 SunOS 5.9: ifrt Patch
117132-01 SunOS 5.9: cachefsstat Patch
114370-04 SunOS 5.9: libumem.so.1 patch
114010-02 SunOS 5.9: m4 Patch
117129-01 SunOS 5.9: adb Patch
117483-01 SunOS 5.9: ntwdt Patch
114369-01 SunOS 5.9: prtvtoc patch
117125-02 SunOS 5.9: procfs Patch
117480-01 SunOS 5.9: pkgadd Patch
112905-02 SunOS 5.9: ippctl Patch
117123-06 SunOS 5.9: wanboot Patch
115030-03 SunOS 5.9: Multiterabyte UFS - patch mount
114004-01 SunOS 5.9: sed Patch
113335-03 SunOS 5.9: devinfo Patch
113495-05 SunOS 5.9: cfgadm Library Patch
113494-01 SunOS 5.9: iostat Patch
113493-03 SunOS 5.9: libproc.so.1 Patch
113330-01 SunOS 5.9: rpcbind Patch
115028-02 SunOS 5.9: patch /usr/lib/fs/ufs/df
115024-01 SunOS 5.9: file system identification utilities
117471-02 SunOS 5.9: fifofs Patch
118897-01 SunOS 5.9: stc Patch
115022-03 SunOS 5.9: quota utilities
115020-01 SunOS 5.9: patch /usr/lib/adb/ml_odunit
113720-01 SunOS 5.9: rootnex Patch
114352-03 SunOS 5.9: /etc/inet/inetd.conf Patch
123056-01 SunOS 5.9: ldterm patch
116243-01 SunOS 5.9: umountall Patch
113323-01 SunOS 5.9: patch /usr/sbin/passmgmt
116049-01 SunOS 5.9: fdfs Patch
116241-01 SunOS 5.9: keysock Patch
113480-02 SunOS 5.9: usr/lib/security/pam_unix.so.1 Patch
115018-01 SunOS 5.9: patch /usr/lib/adb/dqblk
113277-44 SunOS 5.9: sd and ssd Patch
117457-01 SunOS 5.9: elfexec Patch
113110-01 SunOS 5.9: touch Patch
113077-17 SunOS 5.9: /platform/sun4u/kernal/drv/su Patch
115006-01 SunOS 5.9: kernel/strmod/kb patch
113072-07 SunOS 5.9: patch /usr/sbin/format
113071-01 SunOS 5.9: patch /usr/sbin/acctadm
116782-01 SunOS 5.9: tun Patch
114331-01 SunOS 5.9: power Patch
112835-01 SunOS 5.9: patch /usr/sbin/clinfo
114927-01 SunOS 5.9: usr/sbin/allocate Patch
119937-02 SunOS 5.9: inetboot patch
113467-01 SunOS 5.9: seg_drv & seg_mapdev Patch
114923-01 SunOS 5.9: /usr/kernel/drv/logindmux Patch
117443-01 SunOS 5.9: libkvm Patch
114329-01 SunOS 5.9: /usr/bin/pax Patch
119929-01 SunOS 5.9: /usr/bin/xargs patch
113459-04 SunOS 5.9: udp patch
113446-03 SunOS 5.9: dman Patch
116009-05 SunOS 5.9: sgcn & sgsbbc patch
116557-04 SunOS 5.9: sbd Patch
120241-01 SunOS 5.9: bge: Link & Speed LEDs flash constantly on V20z
113984-01 SunOS 5.9: iosram Patch
113220-01 SunOS 5.9: patch /platform/sun4u/kernel/drv/sparcv9/upa64s
113975-01 SunOS 5.9: ssm Patch
117165-01 SunOS 5.9: pmubus Patch
116530-01 SunOS 5.9: bge.conf Patch
116529-01 SunOS 5.9: smbus Patch
116488-03 SunOS 5.9: Lights Out Management (lom) patch
117131-01 SunOS 5.9: adm1031 Patch
117124-12 SunOS 5.9: platmod, drmach, dr, ngdr, & gptwocfg Patch
114003-01 SunOS 5.9: bbc driver Patch
118539-02 SunOS 5.9: schpc Patch
112837-10 SunOS 5.9: patch /usr/lib/inet/in.dhcpd
114975-01 SunOS 5.9: usr/lib/inet/dhcp/svcadm/dhcpcommon.jar Patch
117450-01 SunOS 5.9: ds_SUNWnisplus Patch
113076-02 SunOS 5.9: dhcpmgr.jar Patch
113572-01 SunOS 5.9: docbook-to-man.ts Patch
118472-01 SunOS 5.9: pargs Patch
122709-01 SunOS 5.9: /usr/bin/dc patch
113075-01 SunOS 5.9: pmap patch
113472-01 SunOS 5.9: madv & mpss lib Patch
115986-02 SunOS 5.9: ptree Patch
115693-01 SunOS 5.9: /usr/bin/last Patch
115259-03 SunOS 5.9: patch usr/lib/acct/acctcms
114564-09 SunOS 5.9: /usr/sbin/in.ftpd Patch
117441-01 SunOS 5.9: FSSdispadmin Patch
113046-01 SunOS 5.9: fcp Patch
118191-01 gtar patch
114818-06 GNOME 2.0.0: libpng Patch
117177-02 SunOS 5.9: lib/gss module Patch
116340-05 SunOS 5.9: gzip and Freeware info files patch
114339-01 SunOS 5.9: wrsm header files Patch
122673-01 SunOS 5.9: sockio.h header patch
116474-03 SunOS 5.9: libsmedia Patch
117138-01 SunOS 5.9: seg_spt.h
112838-11 SunOS 5.9: pcicfg Patch
117127-02 SunOS 5.9: header Patch
112929-01 SunOS 5.9: RIPv2 Header Patch
112927-01 SunOS 5.9: IPQos Header Patch
115992-01 SunOS 5.9: /usr/include/limits.h Patch
112924-01 SunOS 5.9: kdestroy kinit klist kpasswd Patch
116231-03 SunOS 5.9: llc2 Patch
116776-01 SunOS 5.9: mipagent patch
117420-02 SunOS 5.9: mdb Patch
117179-01 SunOS 5.9: nfs_dlboot Patch
121194-01 SunOS 5.9: usr/lib/nfs/statd Patch
116502-03 SunOS 5.9: mountd Patch
113331-01 SunOS 5.9: usr/lib/nfs/rquotad Patch
113281-01 SunOS 5.9: patch /usr/lib/netsvc/yp/ypbind
114736-01 SunOS 5.9: usr/sbin/nisrestore Patch
115695-01 SunOS 5.9: /usr/lib/netsvc/yp/yppush Patch
113321-06 SunOS 5.9: patch sf and socal
113049-01 SunOS 5.9: luxadm & liba5k.so.2 Patch
116663-01 SunOS 5.9: ntpdate Patch
117143-01 SunOS 5.9: xntpd Patch
113028-01 SunOS 5.9: patch /kernel/ipp/flowacct
113320-06 SunOS 5.9: patch se driver
114731-08 SunOS 5.9: kernel/drv/glm Patch
115667-03 SunOS 5.9: Chalupa platform support Patch
117428-01 SunOS 5.9: picl Patch
113327-03 SunOS 5.9: pppd Patch
114374-01 SunOS 5.9: Perl patch
115173-01 SunOS 5.9: /usr/bin/sparcv7/gcore /usr/bin/sparcv9/gcore Patch
114716-02 SunOS 5.9: usr/bin/rcp Patch
112915-04 SunOS 5.9: snoop Patch
116778-01 SunOS 5.9: in.ripngd patch
112916-01 SunOS 5.9: rtquery Patch
112928-03 SunOS 5.9: in.ndpd Patch
119447-01 SunOS 5.9: ses Patch
115354-01 SunOS 5.9: slpd Patch
116493-01 SunOS 5.9: ProtocolTO.java Patch
116780-02 SunOS 5.9: scmi2c Patch
112972-17 SunOS 5.9: patch /usr/lib/libssagent.so.1 /usr/lib/libssasnmp.so.1 mibiisa
116480-01 SunOS 5.9: IEEE 1394 Patch
122485-01 SunOS 5.9: 1394 mass storage driver patch
113716-02 SunOS 5.9: sar & sadc Patch
115651-02 SunOS 5.9: usr/lib/acct/runacct Patch
116490-01 SunOS 5.9: acctdusg Patch
117473-01 SunOS 5.9: fwtmp Patch
116180-01 SunOS 5.9: geniconvtbl Patch
114006-01 SunOS 5.9: tftp Patch
115646-01 SunOS 5.9: libtnfprobe shared library Patch
113334-03 SunOS 5.9: udfs Patch
115350-01 SunOS 5.9: ident_udfs.so.1 Patch
122484-01 SunOS 5.9: preen_md.so.1 patch
117134-01 SunOS 5.9: svm flasharchive patch
116472-02 SunOS 5.9: rmformat Patch
112966-05 SunOS 5.9: patch /usr/sbin/vold
114229-01 SunOS 5.9: action_filemgr.so.1 Patch
114335-02 SunOS 5.9: usr/sbin/rmmount Patch
120443-01 SunOS 5.9: sed core dumps on long lines
121588-01 SunOS 5.9: /usr/xpg4/bin/awk Patch
113470-02 SunOS 5.9: winlock Patch
119211-07 NSS_NSPR_JSS 3.11: NSPR 4.6.1 / NSS 3.11 / JSS 4.2
118666-05 J2SE 5.0: update 6 patch
118667-05 J2SE 5.0: update 6 patch, 64bit
114612-01 SunOS 5.9: ANSI-1251 encodings file errors
114276-02 SunOS 5.9: Extended Arabic support in UTF-8
117400-01 SunOS 5.9: ISO8859-6 and ISO8859-8 iconv symlinks
113584-16 SunOS 5.9: yesstr, nostr nl_langinfo() strings incorrect in S9
117256-01 SunOS 5.9: Remove old OW Xresources.ow files
112625-01 SunOS 5.9: Dcam1394 patch
114600-05 SunOS 5.9: vlan driver patch
117119-05 SunOS 5.9: Sun Gigabit Ethernet 3.0 driver patch
117593-04 SunOS 5.9: Manual Page updates for Solaris 9
112622-19 SunOS 5.9: M64 Graphics Patch
115953-06 Sun Cluster 3.1: Sun Cluster sccheck patch
117949-23 Sun Cluster 3.1: Core Patch for Solaris 9
115081-06 Sun Cluster 3.1: HA-Sun One Web Server Patch
118627-08 Sun Cluster 3.1: Manageability and Serviceability Agent
117985-03 SunOS 5.9: XIL 1.4.2 Loadable Pipeline Libraries
113896-06 SunOS 5.9: en_US.UTF-8 locale patch
114967-02 SunOS 5.9: FDL patch
114677-11 SunOS 5.9: International Components for Unicode Patch
112805-01 CDE 1.5: Help volume patch
113841-01 CDE 1.5: answerbook patch
113839-01 CDE 1.5: sdtwsinfo patch
115713-01 CDE 1.5: dtfile patch
112806-01 CDE 1.5: sdtaudiocontrol patch
112804-02 CDE 1.5: sdtname patch
113244-09 CDE 1.5: dtwm patch
114312-02 CDE1.5: GNOME/CDE Menu for Solaris 9
112809-02 CDE:1.5 Media Player (sdtjmplay) patch
113868-02 CDE 1.5: PDASync patch
119976-01 CDE 1.5: dtterm patch
112771-30 Motif 1.2.7 and 2.1.1: Runtime library patch for Solaris 9
114282-01 CDE 1.5: libDtWidget patch
113789-01 CDE 1.5: dtexec patch
117728-01 CDE1.5: dthello patch
113863-01 CDE 1.5: dtconfig patch
112812-01 CDE 1.5: dtlp patch
113861-04 CDE 1.5: dtksh patch
115972-03 CDE 1.5: dtterm libDtTerm patch
114654-02 CDE 1.5: SmartCard patch
117632-01 CDE1.5: sun_at patch for Solaris 9
113374-02 X11 6.6.1: xpr patch
118759-01 X11 6.6.1: Font Administration Tools patch
117577-03 X11 6.6.1: TrueType fonts patch
116084-01 X11 6.6.1: font patch
113098-04 X11 6.6.1: X RENDER extension patch
112787-01 X11 6.6.1: twm patch
117601-01 X11 6.6.1: libowconfig.so.0 patch
117663-02 X11 6.6.1: xwd patch
113764-04 X11 6.6.1: keyboard patch
113541-02 X11 6.6.1: XKB patch
114561-01 X11 6.6.1: X splash screen patch
113513-02 X11 6.6.1: platform support for new hardware
116121-01 X11 6.4.1: platform support for new hardware
114602-04 X11 6.6.1: libmpg_psr patch
Is there a bundle to install, or do I have to install each patch separately? -
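Two hedged options: if memory serves, Sun distributed Recommended Patch Clusters for Solaris 9 that shipped with their own install_cluster script, and a directory of individually downloaded patches can be batch-applied with patchadd -M. A sketch of the latter, with a hypothetical spool directory and a two-entry sample list taken from the output above; the loop only builds the patchadd command lines, so its shape is easy to verify anywhere:

```shell
# Hypothetical layout: patches unpacked under /var/spool/patch, patch IDs
# listed one per line in patch_order (install order matters for some patches).
printf '112951-13\n118558-27\n' > patch_order   # sample IDs from the list above

# Print one patchadd command per listed patch (remove the echo to apply).
while read -r p; do
  echo "patchadd -M /var/spool/patch $p"
done < patch_order
```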
Cluster node is hung but not killed
Hello,
one of our two SC3.2 cluster nodes hung under heavy load, probably due to low memory.
One of visible symptoms was these error messages:
Jan 29 17:48:56 node2 genunix: [ID 661778 kern.warning] WARNING: clcomm: memory low: freemem 0xff8
Jan 29 17:49:15 node2 genunix: [ID 661778 kern.warning] WARNING: clcomm: memory low: freemem 0xff6
The problem is that the node wasn't killed, and the whole cluster (it's an HA-NFS active/passive configuration) became nonfunctional.
What can be done to prevent such situation?
TIA,
-- leon
Those error messages seem to indicate that at least some part of the system was
working enough to complain that more memory is needed.
Can you tell us a bit more about the exact problem you were experiencing? You mention
that there was heavy load, which indicates perhaps that there was lots of IO going on
on the system? If so, that is the opposite of "hung", which, to me, means that the system is
not able to perform any useful work at all.
Perhaps the system was merely very slow, because of lack of memory and very heavy
load?
It is possible that the lack of memory is not because of lots of load, but because of a
bug in the system (a daemon which is leaking memory, perhaps?). However, in that
case, doing a "prstat" should help you find out if that is the case. Otherwise, start with
memory analysis on your system and try to figure out what is consuming memory.
Assuming that, as you suggested, the problem is heavy load on the system, read on...
You mention that HA-NFS is the application running on the system. If you want the system to
failover in cases of extreme load and slowness, you can configure HA-NFS to do so by
reducing its timeouts etc. However, please realize that after the failover to another node,
(or restart on the local node), the client load would resume, the system would again become
slow, and you haven't really achieved anything.
You ask, "What can be done?" I would say adding more memory would be a start. But
I personally suspect that you are running into a bug in the system somewhere which is
causing this slowness. If so, let us start with figuring that out by looking closely at the
system. Do a prstat, note the size of processes on the system and rule out any user level
processes. Next, look at the filesystem in use by NFS, and make sure that is working
fine (you are able to create files etc.), and look at the CPU/disk usage to rule out "maxed-out
CPU or disk usage" as the cause of the slowness/near-hang...
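The process-size survey suggested above would be, on Solaris, prstat sorted by resident set size; a rough portable approximation with ps is shown alongside for comparison:

```shell
# On Solaris: prstat -s rss -n 10   (top ten processes by resident set size)
# A portable approximation using ps: RSS in KB, largest consumers first.
ps -eo rss,pid,comm | sort -rn | head -n 10
```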
HTH,
-ashu -
SCVMM losing connection to cluster nodes
Hey guys'n girls, I hope this is the right forum for this question. I already opened a ticket at MS support as well because it's impacting our production environment indirectly, but even after a week there's been no contact. Losing faith in MS support there
The problem we're having is that a host enters the 'needs attention' state in SCVMM, with WinRM error 0x80338126. I guess it has something to do with the network or with Kerberos, and I've found some info on it, but I still haven't been able to solve it.
Do you guys have any ideas?
Problem summary:
We are seeing an issue on our new Hyper-V platform. The platform should have been in production last week, but this issue is delaying our project, as we can't seem to get it stable.
The problem we are experiencing is that SCVMM loses the connection to some of the Hyper-V nodes. Not one
specific node. Last week it happened to two nodes, and today it happened to another node. I see issues with WinRM, and I expect something to do with kerberos. See the bottom of this post for background details and software versions.
The host gets the status 'needs attention', and if you look at the status of the machine, WinRM gives an error. The error is:
Error (2916)
VMM is unable to complete the request. The connection to the agent cc1-hyp-10.domaincloud1.local was lost.
WinRM: URL: [http://cc1-hyp-10.domaincloud1.local:5985], Verb: [ENUMERATE], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/cimv2/Win32_Service], Filter: [select * from Win32_Service where Name="WinRM"]
Unknown error (0x80338126)
Recommended Action
Ensure that the Windows Remote Management (WinRM) service and the VMM agent are installed and running and that a firewall is not blocking HTTP/HTTPS traffic. Ensure that VMM server is able to communicate with cc1-hyp-10.domaincloud1.local over WinRM by successfully
running the following command:
winrm id -r:cc1-hyp-10.domaincloud1.local
This
problem can also be caused by a Windows Management Instrumentation (WMI) service crash. If the server is running Windows Server 2008 R2, ensure that KB 982293 (http://support.microsoft.com/kb/982293)
is installed on it.
If the error persists, restart cc1-hyp-10.domaincloud1.local and then try the operation again. Refer to
http://support.microsoft.com/kb/2742275 for more details.
Doing a simple test from the VMM server to the problematic cluster node shows this error:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
WSManFault
Message = WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this
computer. By default, the WinRM firewall exception for public profiles limits access to remote computers within the same local subnet.
Error number: -2144108250 0x80338126
WinRM cannot complete the operation. Verify that the specified computer name is valid, that the computer is accessible over the network, and that a firewall exception for the WinRM service is enabled and allows access from this computer. By default, the WinRM
firewall exception for public profiles limits access to remote computers within the same local subnet.
I CAN connect from other hosts to this problematic cluster node:
PS C:\> hostname
CC1-HYP-16
PS C:\> winrm id -r:cc1-hyp-10.domaincloud1.local
IdentifyResponse
ProtocolVersion =
http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName =
http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
And I can connect from the vmm server to all other cluster nodes:
PS C:\> hostname
CC1-VMM-01
PS C:\> winrm id -r:cc1-hyp-11.domaincloud1.local
IdentifyResponse
ProtocolVersion =
http://schemas.dmtf.org/wbem/wsman/1/wsman.xsd
ProductVendor = Microsoft Corporation
ProductVersion = OS: 6.3.9600 SP: 0.0 Stack: 3.0
SecurityProfiles
SecurityProfileName =
http://schemas.dmtf.org/wbem/wsman/1/wsman/secprofile/http/spnego-kerberos
So at this point only the test from the cc1-vmm-01 to cc1-hyp-10 seems to be problematic.
I followed the steps in the page
https://support.microsoft.com/kb/2742275 (which is referred to above). I tried the VMMCA, but I can't really get it working the way I want, and it seems to give outdated recommendations.
I tried checking for duplicate SPNs by running setspn -x on the affected machines. No results (although I do not understand
what an SPN is or how it works). I also rebuilt the performance counters.
I tried setting 'sc config winrm type= own' as described in [http://blinditandnetworkadmin.blogspot.nl/2012/08/kb-how-to-troubleshoot-needs-attention.html].
If I reboot this cc1-hyp-10 machine, it will start working perfectly again. However, then I can't troubleshoot the issue, and it will happen again.
I want this problem to be solved, so vmm never loses connection to the hypervisors it's managing again!
Background information:
We've set up a platform with Hyper-V to run a VM workload. The platform consists of the following hardware:
2 Dell R620s with 32GB of RAM, running Hyper-V to virtualize the cloud management layer (DCs, VMM, SQL). These machines are called cc1-hyp-01 and cc1-hyp-02. They run the management VMs like cc1-dc-01/02, cc1-sql-01, cc1-vmm-01, etc. The names are self-explanatory.
The VMM machine is NOT clustered.
8 Dell M620 blades with 320GB of RAM, running Hyper-V to virtualize the customer workload. These machines are
called cc1-hyp-10 through cc1-hyp-17. They are in a cluster.
2 EqualLogic units form a SAN (premium storage), and we have a Dell R515 running an iSCSI target (budget storage).
We have Dell Force10 switches and Cisco C3750X switches to connect everything together (mostly 10GB links).
All hosts run Windows Server 2012 R2 Datacenter edition. The VMM server runs System Center Virtual Machine Manager 2012 R2.
All the latest Windows updates are installed on every host. There are no firewalls between any host (vmm and hypervisors) at this level. Windows firewalls are all disabled. No antivirus software is installed, no symantec software is installed.
The only non-standard software that is installed is the Dell Host Integration Tools 4.7.1, Dell Openmanage Server Administrator, and some small stuff like 7-zip, bginfo, net-snap, etc.
The SCVMM service is running under the domain account DOMAINCLOUD1\scvmm. This account is in the local Administrators group on each cluster node.
On top of this cloud layer we're running the tenant layer with a lot of VMs for a specific customer (although they are all off now).
I think I found the culprit: after an hour of analyzing Wireshark dumps I found the VMM had jumbo frames enabled on the management interface to the hosts (while the underlying infrastructure does not). Now my winrm commands are working again.
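An MTU mismatch like the one described can also be confirmed without Wireshark, by sending a don't-fragment ping just under the jumbo size. A sketch assuming the common 9000-byte jumbo MTU (the ICMP payload is the MTU minus the 20-byte IP header and 8-byte ICMP header); the commands are only printed here, run the one matching your OS:

```shell
# Compute the largest ICMP payload that fits in one jumbo frame, then show
# the don't-fragment ping for each platform. If the path MTU is really 1500,
# these pings fail, revealing the mismatch.
MTU=9000
PAYLOAD=$((MTU - 20 - 8))    # subtract IP header + ICMP header
echo "Windows: ping -f -l $PAYLOAD cc1-hyp-10.domaincloud1.local"
echo "Linux:   ping -M do -s $PAYLOAD cc1-hyp-10.domaincloud1.local"
```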
-
Hi All,
I am facing the below error while installing Oracle RAC in Silent Mode.
SEVERE: There are no common subnets represented by network interfaces across all cluster nodes.
SEVERE: [FATAL] [INS-40925] One or more nodes have interfaces not configured with a subnet that is common across all cluster nodes.
CAUSE: Not all nodes have network interfaces that are configured on subnets that are common to all nodes in the cluster.
ACTION: Ensure all cluster nodes have a public interface defined with the same subnet accessible by all nodes in the cluster.
My /etc/hosts is given below.
127.0.0.1 localhost localhost.localdomain
#Public
192.168.1.101 rac1 rac1.localdomain
192.168.1.102 rac2 rac2.localdomain
#Private
192.168.2.101 rac1-priv rac1-priv.localdomain
192.168.2.102 rac2-priv rac2-priv.localdomain
#Virtual
192.168.1.103 rac1-vip rac1-vip.localdomain
192.168.1.104 rac2-vip rac2-vip.localdomain
#SCAN
192.168.1.105 rac-scan rac-scan.localdomain
Could you please help me to get rid of the error INS-40925? Any ideas?
Hi Ramesh,
Please find the result of ifconfig -a from both nodes RAC1 & RAC2.
ifconfig -a in RAC1
[oracle@rac1 Desktop]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:17:7A:D5
inet addr:192.168.1.101 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe17:7ad5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:102 errors:0 dropped:0 overruns:0 frame:0
TX packets:48 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:25472 (24.8 KiB) TX bytes:3322 (3.2 KiB)
Interrupt:19 Base address:0xd020
eth1 Link encap:Ethernet HWaddr 08:00:27:C0:AC:DB
inet addr:192.168.2.101 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec0:acdb/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:240 (240.0 b) TX bytes:816 (816.0 b)
Interrupt:16 Base address:0xd240
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6394 (6.2 KiB) TX bytes:6394 (6.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
virbr0-nic Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
ifconfig -a in RAC2
[oracle@rac2 Desktop]$ ifconfig -a
eth0 Link encap:Ethernet HWaddr 08:00:27:C9:38:82
inet addr:192.168.1.102 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fec9:3882/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:122 errors:0 dropped:0 overruns:0 frame:0
TX packets:59 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:32617 (31.8 KiB) TX bytes:5157 (5.0 KiB)
Interrupt:19 Base address:0xd020
eth1 Link encap:Ethernet HWaddr 08:00:27:90:B5:A0
inet addr:192.168.2.102 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe90:b5a0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:240 (240.0 b) TX bytes:746 (746.0 b)
Interrupt:16 Base address:0xd240
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:56 errors:0 dropped:0 overruns:0 frame:0
TX packets:56 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6390 (6.2 KiB) TX bytes:6390 (6.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
virbr0-nic Link encap:Ethernet HWaddr 52:54:00:CC:BD:FB
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:500
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) -
Processing in Multiple Cluster Nodes
Hi All,
In our PI system we have two Java nodes due to a requirement. When the communication channel runs and we check the message log, one cluster node shows a successful message while the other cluster node shows the error "File not found".
The file is processed successfully on one cluster node, but I want to know whether there is a way to suppress processing of the same file by the same channel on the other node.
Is there a setting in administration or the Integration Builder that achieves this?
Thanks,
Rashmi.
Hello!
As per note #801926, please set the clusterSyncMode parameter on the Advanced tab of the communication channel to the value LOCK.
And also check the entries 4 and 48 of the FAQ note #821267:
4. FTP Sender File Processing in Cluster Environment
48. File System(NFS) File Sender Processing in Cluster Environment
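As a loose illustration of what cluster-wide locking achieves (this is not the PI implementation; the class and helper names here are hypothetical), only the node that wins an exclusive lock on a shared lock file goes on to process the data file; the other node's attempt fails fast and it skips the file:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class NodeFileLock {
    // Hypothetical helper: returns true if this caller acquired the
    // exclusive lock and should process the file; false if another
    // process (e.g. the channel on the other cluster node) holds it.
    public static boolean tryAcquire(File dataFile) {
        try {
            File lockFile = new File(dataFile.getPath() + ".lock");
            RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
            FileChannel channel = raf.getChannel();
            FileLock lock = channel.tryLock();   // non-blocking attempt
            if (lock == null) {                  // lock held elsewhere: skip
                raf.close();
                return false;
            }
            // Caller processes dataFile, then releases the lock and
            // closes the channel when done.
            return true;
        } catch (Exception e) {
            return false;                        // treat errors as "skip"
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("demo", ".dat");
        // First caller on a fresh file wins the lock.
        System.out.println(tryAcquire(f));
    }
}
```

Note that file locking over NFS depends on the lock manager working correctly between the mounts involved, which is exactly why the note above recommends the adapter's own clusterSyncMode setting rather than ad-hoc locking.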
Best regards,
Lucas -
We set up a Server Core single-node cluster (Windows Server 2012 R2). MS DTC is running, and the Distributed Transaction Coordinator firewall rules are enabled. I can manage the firewall rules and connect with compmgmt.msc remotely for this server. However, when I attempt to connect to the Component Services management console on this server, the MS DTC object is not displayed; only the COM+ applications are shown. We've set up other Server Core instances without clustering, and DTC is displayed there.
Hi Steve,
We just finished a test in our local lab and found this behavior is by design: the DTC is not expected to be visible remotely from Component Services for a cluster node.
In my lab, the 2012 R2 node3 is a single-node cluster and has the DTC role.
Error: Halting this cluster node due to unrecoverable service failure
Our cluster has experienced some sort of fault that has only become apparent today. The origin appears to have been nearly a month ago yet the symptoms have only just manifested.
The node in question is a standalone instance running a DistributedCache service with local storage. It output the following to stdout on Jan-22:
Coherence <Error>: Halting this cluster node due to unrecoverable service failure
It finally failed today with OutOfMemoryError: Java heap space.
We're running coherence-3.5.2.jar.
Q1: It looks like this node failed on Jan-22 yet we did not notice. What is the best way to monitor node health?
Q2: What might the root cause be for such a fault?
I found the following in the logs:
2011-01-22 01:18:58,296 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:18:58.296/9910749.462 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Attempting recovery (due to soft timeout) of Guard{Daemon=DistributedCache}
2011-01-22 01:19:04,772 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:04.772/9910755.938 Oracle Coherence EE 3.5.2/463 <Error> (thread=Cluster, member=33): Terminating guarded execution (due to hard timeout) of Guard{Daemon=DistributedCache}
2011-01-22 01:19:05,785 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:05.785/9910756.951 Oracle Coherence EE 3.5.2/463 <Error> (thread=Termination Thread, member=33): Full Thread Dump
Thread[Reference Handler,10,system]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
Thread[DistributedCache,5,Cluster]
java.nio.Bits.copyToByteArray(Native Method)
java.nio.DirectByteBuffer.get(DirectByteBuffer.java:224)
com.tangosol.io.nio.ByteBufferInputStream.read(ByteBufferInputStream.java:123)
java.io.DataInputStream.readFully(DataInputStream.java:178)
java.io.DataInputStream.readFully(DataInputStream.java:152)
com.tangosol.util.Binary.readExternal(Binary.java:1066)
com.tangosol.util.Binary.<init>(Binary.java:183)
com.tangosol.io.nio.BinaryMap$Block.readValue(BinaryMap.java:4304)
com.tangosol.io.nio.BinaryMap$Block.getValue(BinaryMap.java:4130)
com.tangosol.io.nio.BinaryMap.get(BinaryMap.java:377)
com.tangosol.io.nio.BinaryMapStore.load(BinaryMapStore.java:64)
com.tangosol.net.cache.SerializationPagedCache$WrapperBinaryStore.load(SerializationPagedCache.java:1547)
com.tangosol.net.cache.SerializationPagedCache$PagedBinaryStore.load(SerializationPagedCache.java:1097)
com.tangosol.net.cache.SerializationMap.get(SerializationMap.java:121)
com.tangosol.net.cache.SerializationPagedCache.get(SerializationPagedCache.java:247)
com.tangosol.net.cache.AbstractSerializationCache$1.getOldValue(AbstractSerializationCache.java:315)
com.tangosol.net.cache.OverflowMap$Status.registerBackEvent(OverflowMap.java:4210)
com.tangosol.net.cache.OverflowMap.onBackEvent(OverflowMap.java:2316)
com.tangosol.net.cache.OverflowMap$BackMapListener.onMapEvent(OverflowMap.java:4544)
com.tangosol.util.MultiplexingMapListener.entryDeleted(MultiplexingMapListener.java:49)
com.tangosol.util.MapEvent.dispatch(MapEvent.java:214)
com.tangosol.util.MapEvent.dispatch(MapEvent.java:166)
com.tangosol.util.MapListenerSupport.fireEvent(MapListenerSupport.java:556)
com.tangosol.net.cache.AbstractSerializationCache.dispatchEvent(AbstractSerializationCache.java:338)
com.tangosol.net.cache.AbstractSerializationCache.dispatchPendingEvent(AbstractSerializationCache.java:321)
com.tangosol.net.cache.AbstractSerializationCache.removeBlind(AbstractSerializationCache.java:155)
com.tangosol.net.cache.SerializationPagedCache.removeBlind(SerializationPagedCache.java:348)
com.tangosol.util.AbstractKeyBasedMap$KeySet.remove(AbstractKeyBasedMap.java:556)
com.tangosol.net.cache.OverflowMap.removeInternal(OverflowMap.java:1299)
com.tangosol.net.cache.OverflowMap.remove(OverflowMap.java:380)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$Storage.clear(DistributedCache.CDB:24)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onClearRequest(DistributedCache.CDB:32)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache$ClearRequest.run(DistributedCache.CDB:1)
com.tangosol.coherence.component.net.message.requestMessage.DistributedCacheRequest.onReceived(DistributedCacheRequest.CDB:12)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:9)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:136)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.DistributedCache.onNotify(DistributedCache.CDB:3)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Finalizer,8,system]
java.lang.Object.wait(Native Method)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread[PacketReceiver,7,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketReceiver.onWait(PacketReceiver.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[RMI TCP Accept-0,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
Thread[PacketSpeaker,8,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.waitForEntry(ConcurrentQueue.CDB:16)
com.tangosol.coherence.component.util.queue.ConcurrentQueue.remove(ConcurrentQueue.CDB:7)
com.tangosol.coherence.component.util.Queue.remove(Queue.CDB:1)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketSpeaker.onNotify(PacketSpeaker.CDB:62)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Logger@9216774 3.5.2/463,3,main]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListener1,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:712)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[main,5,main]
java.lang.Object.wait(Native Method)
com.tangosol.net.DefaultCacheServer.main(DefaultCacheServer.java:79)
com.networkfleet.cacheserver.Launcher.main(Launcher.java:122)
Thread[Signal Dispatcher,9,system]
Thread[RMI TCP Accept-41006,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
ThreadCluster
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[TcpRingListener,6,Cluster]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
com.tangosol.coherence.component.net.socket.TcpSocketAccepter.accept(TcpSocketAccepter.CDB:18)
com.tangosol.coherence.component.util.daemon.TcpRingListener.acceptConnection(TcpRingListener.CDB:10)
com.tangosol.coherence.component.util.daemon.TcpRingListener.onNotify(TcpRingListener.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[PacketPublisher,6,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketPublisher.onWait(PacketPublisher.CDB:2)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[RMI TCP Accept-0,5,system]
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
java.net.ServerSocket.implAccept(ServerSocket.java:453)
java.net.ServerSocket.accept(ServerSocket.java:421)
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
java.lang.Thread.run(Thread.java:619)
Thread[PacketListenerN,8,Cluster]
java.net.PlainDatagramSocketImpl.receive0(Native Method)
java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
java.net.DatagramSocket.receive(DatagramSocket.java:712)
com.tangosol.coherence.component.net.socket.UdpSocket.receive(UdpSocket.CDB:20)
com.tangosol.coherence.component.net.UdpPacket.receive(UdpPacket.CDB:4)
com.tangosol.coherence.component.util.daemon.queueProcessor.packetProcessor.PacketListener.onNotify(PacketListener.CDB:19)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
java.lang.Thread.run(Thread.java:619)
Thread[Invocation:Management,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[DistributedCache:PofDistributedCache,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onWait(Grid.CDB:9)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Invocation:Management:EventDispatcher,5,Cluster]
java.lang.Object.wait(Native Method)
com.tangosol.coherence.component.util.Daemon.onWait(Daemon.CDB:18)
com.tangosol.coherence.component.util.daemon.queueProcessor.Service$EventDispatcher.onWait(Service.CDB:7)
com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:39)
java.lang.Thread.run(Thread.java:619)
Thread[Termination Thread,5,Cluster]
java.lang.Thread.dumpThreads(Native Method)
java.lang.Thread.getAllStackTraces(Thread.java:1487)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
com.tangosol.net.GuardSupport.logStackTraces(GuardSupport.java:791)
com.tangosol.coherence.component.net.Cluster.onServiceFailed(Cluster.CDB:5)
com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$Guard.terminate(Grid.CDB:17)
com.tangosol.net.GuardSupport$2.run(GuardSupport.java:652)
java.lang.Thread.run(Thread.java:619)
2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 INFO 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Info> (thread=main, member=33): Restarting Service: DistributedCache
2011-01-22 01:19:06,738 Coherence Logger@9216774 3.5.2/463 ERROR 2011-01-22 01:19:06.738/9910757.904 Oracle Coherence EE 3.5.2/463 <Error> (thread=main, member=33): Failed to restart services: java.lang.IllegalStateException: Failed to unregister: DistributedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=16, BackupPartitions=16}
Hi
It seems the problem in this case is the call to clear(), which tries to load every entry stored in the overflow scheme in order to emit potential cache events to listeners. This likely requires much more memory than the available Java heap, hence the OutOfMemoryError.
Our recommendation in this case is to call destroy() instead, since that bypasses the event firing.
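The distinction can be sketched in plain Java (this is a hypothetical model, not the Coherence API): a clear() that must fire per-entry deleted events has to materialize every old value, while a destroy() can discard the backing store without ever loading one:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical overflow-style store whose deleted events carry the old
// value. clear() therefore loads every stored value; destroy() loads none.
public class OverflowStore {
    private Map<String, byte[]> backing = new HashMap<>();
    private BiConsumer<String, byte[]> deletedListener;
    int valuesLoaded = 0;   // counts how many old values were materialized

    void put(String k, byte[] v) { backing.put(k, v); }
    void setListener(BiConsumer<String, byte[]> l) { deletedListener = l; }

    // clear(): fires a deleted event per entry, so each old value is
    // loaded. On a large overflow scheme this can exhaust the Java heap.
    void clear() {
        if (deletedListener != null) {
            for (Map.Entry<String, byte[]> e : backing.entrySet()) {
                valuesLoaded++;   // old value materialized for the event
                deletedListener.accept(e.getKey(), e.getValue());
            }
        }
        backing.clear();
    }

    // destroy(): discard the store wholesale; no per-entry events, no loads.
    void destroy() { backing = new HashMap<>(); }

    int size() { return backing.size(); }

    public static void main(String[] args) {
        OverflowStore s = new OverflowStore();
        s.setListener((k, v) -> { /* listener observes the old value */ });
        for (int i = 0; i < 1000; i++) s.put("k" + i, new byte[16]);
        s.clear();                          // materializes all 1000 values
        System.out.println(s.valuesLoaded); // 1000

        OverflowStore t = new OverflowStore();
        for (int i = 0; i < 1000; i++) t.put("k" + i, new byte[16]);
        t.destroy();                        // materializes none
        System.out.println(t.valuesLoaded); // 0
    }
}
```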
/Charlie -
WDRuntimeException: Failed to create J2EE cluster node in SLD
Hello,
I am getting the error below, but to my knowledge I have everything set up properly. Let me briefly outline the setup (I am running everything locally; I will move to remote later):
WAS 6.4 <b>SP12</b>
Set up JCo and tests fine
Set up Visual Administrator / SLD Data Supplier / HTTP and CIM configured and seem to test fine
Created SLD and it tests OK
Created Technical Landscape
I have noticed that in SP12, in the SLD config I actually have a NEW category called "<b>System Landscape</b>" above my "Technical Landscape" link. I have not seen this option in previous versions SP9 or SP11. Is it mandatory to configure this?
Also, I created a model for Adaptive RFC and found the function I needed successfully.
Anyway, here is the error when trying to deploy...
com.sap.tc.webdynpro.services.exceptions.WDRuntimeException: Error while obtaining JCO connection.
at com.sap.tc.webdynpro.services.datatypes.core.DataTypeBroker$1.fillSldConnection(DataTypeBroker.java:90)
Caused by: com.sap.tc.webdynpro.services.sal.sl.api.WDSystemLandscapeException: Error while obtaining JCO connection.
Caused by: com.sap.tc.webdynpro.services.exceptions.WDRuntimeException: Failed to create J2EE cluster node in SLD for 'J2E.SystemHome.bc347792': com.sap.lcr.api.cimclient.LcrException: CIM_ERR_NOT_FOUND: No such instance: SAP_J2EEEngineCluster.CreationClassName="SAP_J2EEEngineCluster",Name="J2E.SystemHome.bc347792"
Any help will be appreciated!
I figured it out, for those that may have a similar problem.
Although I had created and tested my JCos properly and they were working fine, somehow (I still don't know why) they went RED in the JCo Maintenance screen.
I had to create them again, and it works fine now.
Add cluster nodes from multiple machines to WebLogic domain in OEM 10.2.0.5
Hello,
I want to monitor a WebLogic domain in Oracle Enterprise Manager 10.2.0.5 with the following layout:
- Admin server on machine 1
- managed server, cluster node a on machine 2
- managed server, cluster node b on machine 3
How can I do this?
When I go to "Add Weblogic Domain", I can enter the admin address (machine 1) and tick the box to say that there is an agent running on another host (where I specify machine 2). However, I do not see a possibility to discover managed servers from machine 3.
Does anyone know how to do this?
Thanks,
Nadja
LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
Alias LISTENER
Version TNSLSNR for Linux: Version 11.1.0.6.0 - Production
Start Date 28-JAN-2010 00:36:10
Uptime 0 days 17 hr. 11 min. 52 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /oracle/app/oracle/product/11.1.0/db/network/admin/listener.ora
Listener Log File /oracle/app/oracle/diag/tnslsnr/corp1052/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=corp1052)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "dex.example.com" has 2 instance(s).
Instance "dex1", status READY, has 1 handler(s) for this service...
Instance "dex2", status READY, has 2 handler(s) for this service...
Service "dexXDB.example.com" has 2 instance(s).
Instance "dex1", status READY, has 1 handler(s) for this service...
Instance "dex2", status READY, has 1 handler(s) for this service...
Service "dex_XPT.example.com" has 2 instance(s).
Instance "dex1", status READY, has 1 handler(s) for this service...
Instance "dex2", status READY, has 2 handler(s) for this service...
The command completed successfully
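For reference (not part of the original post), a client-side tnsnames.ora entry matching the dex.example.com service shown in the listener output above might look like the following sketch; the host names here are assumptions, not taken from the thread:

```
DEX =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = corp1051)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = corp1052)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = dex.example.com)
    )
  )
```

With an alias like this in place, `sqlplus dex@DEX` resolves DEX through local naming and can reach either instance registered for the service.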
The output of SQLPlus:
[oracle@dbhost: db]$ bin/sqlplus dex@DEX
SQL*Plus: Release 11.1.0.6.0 - Production on Thu Jan 28 18:40:11 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options -
Cluster Shared Volume is no longer accessible from cluster node
Hello,
We have a 3-node Hyper-V cluster running Windows Server 2012. Recently we started having the error below intermittently on a node, and the VMs running on that host and LUN power off.
Alert: Cluster Shared Volume is no longer accessible from cluster node
Source: Cluster Service
Path: HV01.itl.local
Last modified by: System
Last modified time: 12/1/2013 12:27:18 AM
Alert description: Cluster Shared Volume 'Volume1' ('Cluster_Vol1_R6') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
The only change made recently is that we installed Veeam on a test basis for DR replication. We switched off the Veeam server and stopped the Veeam services on the Hyper-V hosts, but we are still having the same issue.
We are using an EMC SAN connected via FC as shared storage, with PowerPath for multipathing. No errors were found on the SAN.
I don't think the issue is related to the I/O load, as we also experienced the issue at midnight during the weekend, when no one was working.
Any help would be very much appreciated.
Thanks.
Irfan
Irfan Goolab SALES ENGINEER (Microsoft UC) MCP, MCSA, MCTS, MCITP, MCT
Hi,
Also, try to install the following recommended KBs:
Recommended hotfixes and updates for Windows Server 2012-based Failover Clusters
http://support.microsoft.com/kb/2784261
Also, please confirm that your VSS provider has the correct version.
The third party article:
VSS Provider with 2012 HyperV and CSV
https://community.emc.com/thread/170636
Thanks. -
Hi all,
I have a two-node cluster. Should I set remote_login_passwordfile to EXCLUSIVE or SHARED? I think in a RAC environment we can have remote_login_passwordfile set to either SHARED or EXCLUSIVE. In SHARED mode, where should the init.ora file be placed so it can be shared by all instances? In EXCLUSIVE mode, can each instance have a different configuration? Is my understanding right?
regards
NICK
You are confusing the password file with initialization parameter files.
remote_login_passwordfile deals with sharing of password file among different databases (not instances).
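(As an aside, a minimal sketch of the usual shared-spfile layout; the paths and instance names below are assumptions, not from this thread. Each node keeps only a one-line local pointer init file:)

```
# $ORACLE_HOME/dbs/initdex1.ora on node 1 (initdex2.ora on node 2)
# Both point at the single server parameter file on shared storage:
SPFILE='+DATA/dex/spfiledex.ora'
```

With this layout, parameter changes made via ALTER SYSTEM ... SCOPE=SPFILE are seen by all instances, while each instance still starts from its own local pointer file.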
If you want to share the init file in RAC, you should create a local init file on each node that points to a single spfile in the shared location. -
Hi there
My Setup:
2 Cluster Nodes (HP DL380 G7 & HP DL380 Gen8)
HP P2000 G3 FC MSA (MPIO)
The Gen8 cluster node pauses after a few minutes, but stays online if the G7 is paused (no drain). My troubleshooting has led me to believe that there is a problem with the Cluster Shared Volume:
00001508.000010b4::2015/02/19-14:51:14.189 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:cf2dec1d-ee88-4fb6-a86d-0c2d1aa888b4:Netbios
00000d1c.0000299c::2015/02/19-14:51:14.615 INFO [API] s_ApiGetQuorumResource final status 0.
00000d1c.0000299c::2015/02/19-14:51:14.616 INFO [RCM [RES] Virtual Machine VirtualMachine1 embedded failure notification, code=0 _isEmbeddedFailure=false _embeddedFailureAction=2
00001508.000010b4::2015/02/19-14:51:15.010 INFO [RES] Network Name <Cluster Name>: Getting Read only private properties
00000d1c.00002294::2015/02/19-14:51:15.096 INFO [API] s_ApiGetQuorumResource final status 0.
00000d1c.00002294::2015/02/19-14:51:15.121 INFO [API] s_ApiGetQuorumResource final status 0.
000014a8.000024f4::2015/02/19-14:51:15.269 INFO [RES] Physical Disk <Quorum>: VolumeIsNtfs: Volume
\\?\GLOBALROOT\Device\Harddisk1\ClusterPartition2\ has FS type NTFS
00000d1c.00002294::2015/02/19-14:51:15.343 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00000d1c.00002294::2015/02/19-14:51:15.352 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
000014a8.000024f4::2015/02/19-14:51:15.386 INFO [RES] Physical Disk: HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='\\?\mpio#disk&ven_hp&prod_p2000_g3_fc&rev_t250#1&7f6ac24&0&36304346463030314145374646423434393243353331303030#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
000014a8.000024f4::2015/02/19-14:51:15.386 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: GetVolumeInformation failed for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
000014a8.000024f4::2015/02/19-14:51:15.386 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: failed to get partition size for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
00000d1c.00001420::2015/02/19-14:51:15.847 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00000d1c.00001420::2015/02/19-14:51:15.855 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
000014a8.000024f4::2015/02/19-14:51:15.887 INFO [RES] Physical Disk: HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='\\?\mpio#disk&ven_hp&prod_p2000_g3_fc&rev_t250#1&7f6ac24&0&36304346463030314145374646423434393243353331303030#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
000014a8.000024f4::2015/02/19-14:51:15.888 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: GetVolumeInformation failed for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
000014a8.000024f4::2015/02/19-14:51:15.888 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: failed to get partition size for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
00000d1c.00001420::2015/02/19-14:51:15.928 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00000d1c.00001420::2015/02/19-14:51:15.939 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
000014a8.000024f4::2015/02/19-14:51:15.968 INFO [RES] Physical Disk: HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='\\?\mpio#disk&ven_hp&prod_p2000_g3_fc&rev_t250#1&7f6ac24&0&36304346463030314145374646423434393243353331303030#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
000014a8.000024f4::2015/02/19-14:51:15.969 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: GetVolumeInformation failed for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
000014a8.000024f4::2015/02/19-14:51:15.969 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: failed to get partition size for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
00000d1c.00001420::2015/02/19-14:51:16.005 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00000d1c.00001420::2015/02/19-14:51:16.015 WARN [RCM] ResourceTypeChaseTheOwnerLoop::DoCall: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
000014a8.000024f4::2015/02/19-14:51:16.059 INFO [RES] Physical Disk: HardDiskpQueryDiskFromStm: ClusterStmFindDisk returned device='\\?\mpio#disk&ven_hp&prod_p2000_g3_fc&rev_t250#1&7f6ac24&0&36304346463030314145374646423434393243353331303030#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
000014a8.000024f4::2015/02/19-14:51:16.059 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: GetVolumeInformation failed for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
000014a8.000024f4::2015/02/19-14:51:16.059 ERR [RES] Physical Disk: HardDiskpGetDiskInfo: failed to get partition size for
\\?\GLOBALROOT\Device\Harddisk3\ClusterPartition2\, status 3
00000d1c.00002568::2015/02/19-14:51:17.110 INFO [GEM] Node 1: Deleting [2:395 , 2:396] (both included) as it has been ack'd by every node
00000d1c.0000299c::2015/02/19-14:51:17.444 INFO [RCM [RES] Virtual Machine VirtualMachine2 embedded failure notification, code=0 _isEmbeddedFailure=false _embeddedFailureAction=2
00000d1c.0000299c::2015/02/19-14:51:18.103 INFO [RCM] rcm::DrainMgr::PauseNodeNoDrain: [DrainMgr] PauseNodeNoDrain
00000d1c.0000299c::2015/02/19-14:51:18.103 INFO [GUM] Node 1: Processing RequestLock 1:164
00000d1c.00002568::2015/02/19-14:51:18.104 INFO [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 1470)
00000d1c.0000299c::2015/02/19-14:51:18.104 INFO [GUM] Node 1: executing request locally, gumId:1471, my action: /nsm/stateChange, # of updates: 1
00000d1c.00001420::2015/02/19-14:51:18.104 INFO [DM] Starting replica transaction, paxos: 99:99:50133, smartPtr: HDL( c9b16cf1e0 ), internalPtr: HDL( c9b21
This issue has been bugging me for some time now. The cluster is fully functional and works great until the node gets paused again. I've read somewhere that the MSMQ errors can be ignored, but I can't find anything about the HardDiskpGetDiskInfo: GetVolumeInformation failed messages. No errors in the SAN or the server event logs. Drivers and firmware are up to date. Any help would be greatly appreciated.
Best regards
Thank you for your replies.
First, some information I left out in my original post: we're using Windows Server 2012 R2 Datacenter and are currently only hosting virtual machines on the cluster.
I did some testing over the weekend, including a firmware update on the SAN and a cluster validation run.
The problem doesn't seem to be related to backup. We use Microsoft DPM to make a full express backup once every day, while the GetVolumeInformation failed error gets logged every half hour.
Excerpts from the validation report:
Validate Disk Failover
Description: Validate that a disk can fail over successfully with data intact.
Start: 21.02.2015 18:02:17.
Node Node2 holds the SCSI PR on Test Disk 3 and brought the disk online, but failed in its attempt to write file data to partition table entry 1. The disk structure is corrupted and unreadable.
Stop: 21.02.2015 18:02:37.
Node Node1 holds the SCSI PR on Test Disk 3 and brought the disk online, but failed in its attempt to write file data to partition table entry 1. The disk structure is corrupted and unreadable.
Validate File System
Description: Validate that the file system on disks in shared storage is supported by failover clusters and Cluster Shared Volumes (CSVs). Failover cluster physical disk resources support NTFS, ReFS, FAT32, FAT, and RAW. Only volumes formatted as NTFS or ReFS are accessible in disks added as CSVs.
The test was canceled.
Validate Simultaneous Failover
Description: Validate that disks can fail over simultaneously with data intact.
The test was canceled.
Validate Storage Spaces Persistent Reservation
Description: Validate that storage supports the SCSI-3 Persistent Reservation commands needed by Storage Spaces to support clustering.
Start: 21.02.2015 18:01:00.
Verifying there are no Persistent Reservations, or Registration keys, on Test Disk 3 from node Node1.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x30000000a for Test Disk 3 from node Node1.
Issuing Persistent Reservation RESERVE on Test Disk 3 from node Node1 using key 0x30000000a.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x3000100aa for Test Disk 3 from node Node2.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0x30000000a SERVICE ACTION RESERVATION KEY 0x30000000b for Test Disk 3 from node Node1 to change the registered key while holding the reservation for the disk.
Verifying there are no Persistent Reservations, or Registration keys, on Test Disk 2 from node Node1.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x20000000a for Test Disk 2 from node Node1.
Issuing Persistent Reservation RESERVE on Test Disk 2 from node Node1 using key 0x20000000a.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x2000100aa for Test Disk 2 from node Node2.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0x20000000a SERVICE ACTION RESERVATION KEY 0x20000000b for Test Disk 2 from node Node1 to change the registered key while holding the reservation for the disk.
Verifying there are no Persistent Reservations, or Registration keys, on Test Disk 0 from node Node1.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0xa for Test Disk 0 from node Node1.
Issuing Persistent Reservation RESERVE on Test Disk 0 from node Node1 using key 0xa.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x100aa for Test Disk 0 from node Node2.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0xa SERVICE ACTION RESERVATION KEY 0xb for Test Disk 0 from node Node1 to change the registered key while holding the reservation for the disk.
Verifying there are no Persistent Reservations, or Registration keys, on Test Disk 1 from node Node1.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x10000000a for Test Disk 1 from node Node1.
Issuing Persistent Reservation RESERVE on Test Disk 1 from node Node1 using key 0x10000000a.
Issuing Persistent Reservation REGISTER AND IGNORE EXISTING KEY using RESERVATION KEY 0x0 SERVICE ACTION RESERVATION KEY 0x1000100aa for Test Disk 1 from node Node2.
Issuing Persistent Reservation REGISTER using RESERVATION KEY 0x10000000a SERVICE ACTION RESERVATION KEY 0x10000000b for Test Disk 1 from node Node1 to change the registered key while holding the reservation for the disk.
Failure. Persistent Reservation not present on Test Disk 3 from node Node1 after successful call to update reservation holder's registration key 0x30000000b.
Failure. Persistent Reservation not present on Test Disk 1 from node Node1 after successful call to update reservation holder's registration key 0x10000000b.
Failure. Persistent Reservation not present on Test Disk 0 from node Node1 after successful call to update reservation holder's registration key 0xb.
Failure. Persistent Reservation not present on Test Disk 2 from node Node1 after successful call to update reservation holder's registration key 0x20000000b.
Test Disk 0 does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.
Test Disk 1 does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.
Test Disk 2 does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.
Test Disk 3 does not support SCSI-3 Persistent Reservations commands needed by clustered storage pools that use the Storage Spaces subsystem. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Contact your storage administrator or storage vendor for help with configuring the storage to function properly with failover clusters that use Storage Spaces.
Stop: 21.02.2015 18:01:02
Thank you for your help.
David