Connecting to the cluster

Hi,
          I have a cluster running on a multihomed Solaris Server. It is made of two
          servers running on IP1 and IP2 (both using port 7001). It says in the
          documentation that I can put in the IPs of the two servers separated by a
          comma for the cluster address. So the address that I put in was
          "IP1:7001,IP2:7001" instead of a DNS name.
          How do I connect to the cluster? If I connect to IP1:7001 then I am
          connecting to the server and not the cluster, correct?
          I would appreciate any help,
          Mohammad
          

Sorry for the delay, I have been extremely busy...
          There are several alternatives -- DNS round-robin, hardware load balancers
          (e.g., Cisco LocalDirector), etc.
          Mica Cooper wrote:
          > Robert,
          >
          > WLBS is Windows Load Balancing Service. It was designed to only use 1 IP per
          > box. It comes on the server editions. Do you know of something else? Linux
          > or Solaris maybe?
          >
          > Mica
          >
          > "Robert Patrick" <[email protected]> wrote in message
          > news:[email protected]...
          > > Hi,
          > >
          > > I'm not sure what WLBS is but I guess my question is does it really need
          > to know
          > > that the IP addresses happen to point to the same machine? What happens
          > if you
          > > just tell it that there are twice as many machines as there really are?
          > >
          > > Robert
          > >
          > > Mica Cooper wrote:
          > >
          > > > Currently,
          > > > I am using WLBS to do the load balancing and it will only handle one IP
          > per
          > > > box. Do you know of any other software to do this?
          > > > Mica
          > > >
          > > > "Robert Patrick" <[email protected]> wrote in message
          > > > news:[email protected]...
          > > > > The real answer here is that the "cluster" is virtual -- it only
          > exists
          > > > because
          > > > > there are one or more server instances running. Regardless of how you
          > > > attain
          > > > > it, you are going to connect to one or more of the server instances.
          > > > There are
          > > > > several ways to do this as has been previously discussed (e.g., DNS
          > alias
          > > > that
          > > > > does DNS round-robin-style load-balancing, a comma-separated list of
          > IP
          > > > > addresses, etc.).
          > > > >
          > > > > Mica, as for your question, you need to teach your "load-balancing IP"
          > > > about all
          > > > > of the IP addresses for the servers in the cluster and not just "one
          > IP
          > > > per
          > > > > physical machine".
          > > > >
          > > > > Hope this helps,
          > > > > Robert
          > > > >
          > > > > Mica Cooper wrote:
          > > > >
          > > > > > Tao,
          > > > > >
          > > > > > Thats not what he is asking.
          > > > > >
          > > > > > He's asking how to call MyServerClusterName instead of trying to
          > call
          > > > > > different instances by IP. I would like to know how to do this also.
          > > > > > Currently we are calling a load balancing IP and it proxies to each
          > > > instance
          > > > > > but it only works for 1 instance per box and I need 2 instances per
          > box.
          > > > > >
          > > > > > Mica Cooper
          > > > > >
          > > > > > "Tao Zhang" <[email protected]> wrote in message
          > > > > > news:[email protected]...
          > > > > > > If you don't use DNS, you can write the IP1:7001,IP2:7001 in your
          > > > > > > PROVIDER_URL, and then transfer it to InitialContext.
          > > > > > >
          > > > > > >
          > > > > > > Mohammad Khan <[email protected]> wrote in message
          > > > > > > news:[email protected]...
          > > > > > > > Hi,
          > > > > > > >
          > > > > > > > I have a cluster running on a multihomed Solaris Server. It is
          > made
          > > > of
          > > > > > two
          > > > > > > > servers running on IP1 and IP2 (both using port 7001). It says
          > in
          > > > the
          > > > > > > > documentation that I can put in the IPs of the two servers
          > separated
          > > > by
          > > > > > a
          > > > > > > > comma for the cluster address. So the address that I put in was
          > > > > > > > "IP1:7001,IP2:7001" instead of a DNS name.
          > > > > > > >
          > > > > > > > How do I connect to the cluster. If I connect to IP1:7001 then I
          > am
          > > > > > > > connecting to the server and not the cluster, correct?
          > > > > > > >
          > > > > > > > I would appreciate any help,
          > > > > > > > Mohammad
          > > > > > > >
          > > > > > > >
          > > > > > > >
          > > > > > >
          > > > > > >
          > > > >
          > >
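To make the comma-separated cluster address concrete, here is a minimal JNDI client sketch along the lines of Tao Zhang's PROVIDER_URL suggestion above. It assumes the t3 protocol and WebLogic's weblogic.jndi.WLInitialContextFactory; the addresses are the IP1/IP2 placeholders from the question, and the JNDI name being looked up is purely illustrative.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ClusterContextExample {
    public static void main(String[] args) throws NamingException {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "weblogic.jndi.WLInitialContextFactory");
        // The cluster address: a comma-separated list of the member servers,
        // used in place of a DNS alias that resolves to all of them.
        env.put(Context.PROVIDER_URL, "t3://IP1:7001,IP2:7001");

        Context ctx = new InitialContext(env);
        try {
            // Hypothetical JNDI name, just to show a lookup through the context.
            Object obj = ctx.lookup("some/jndi/name");
            System.out.println("Looked up: " + obj);
        } finally {
            ctx.close();
        }
    }
}

Connecting to IP1:7001 alone does reach a single server, but since that server is a cluster member, the cluster-aware (replica-aware) stubs obtained from its JNDI tree generally know about the other members and can load-balance and fail over to them, which is Robert's point that the "cluster" is virtual.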
          

Similar Messages

  • Cannot connect to the cluster or Hyper-V Manager (on one of my hosts) but virtual machines are running (Hyper-V 2012 R2)

    Hello guys,
    I'm running a Hyper-V 2012 R2 cluster with 9 hosts. After 12 hours without power it came back up, but my problem is:
    my cluster did not come back.
    All Hyper-V hosts are up and running the hypervisor, and all VMs are running. The problem is the cluster: the quorum does not come back online.
    I found the problem: WMI is not working on one of my hosts, and I can't connect to Hyper-V Manager on that host from anywhere. The virtual machines are running on that host, but I can't manage them.
    I found an article about this and am now running this script, but nothing happens:
    "C:\windows\system32\wbem> mofcomp.exe cluswmi.mof
    Microsoft (R) MOF Compiler Version 6.3.9600.16384
    Copyright (c) Microsoft Corp. 1997-2006. All rights reserved.
    Parsing MOF file: cluswmi.mof
    MOF file has been successfully parsed
    Storing data in the repository..."
    Can someone help me?
    Thanks!

    Thanks for the answer.
    "When you say "my cluster do not back", do you mean that the cluster is not running?"
    Yes, I can't connect to the cluster.
    "Is "quorum do not back online" another way you are saying the cluster is not running?"
    Yes.
    "Are you using a disk witness or a file share witness? A disk witness is recommended as it provides a more robust level of operation."
    Witness.
    "Did you run the cluster validation wizard? Turn off the disk tests to prevent VMs from failing over, but run all other tests. What errors/warnings do you get?"
    When I try to run the validation tests, one of my hosts (the same one whose virtual machines I can't manage) fails to join the tests, but all the others are tested.

  • How to find whether any disk array is connected to the system

    Hi Guys,
    Please help me regarding disk arrays.
    1) What command(s) should I use to check whether any disk array is connected to the server or not?
    2) Please share the method for:
    A single-server system (a single Sun server connected to an external disk array).
    A cluster system (two servers connected as a Sun cluster, with the disk array connected to the cluster as a resource).
    3) If the disk array is from Sun or from another vendor, how can I know that?
    4) How do I find how many disks are available in the disk array?
    Thanks

    Wow... I got a reply from nik!!!
    Thanks for your reply.
    *1) I am confused: does the format command report only the local disks and not the external disks (disk array)? Is that correct?*
    2) We have a cluster of V890 + V890 + disk array (Sun StorageTek). The format output is below.
    *(Are 5, 6, 7 & 8 the disk array?)*
    root# format
    Searching for disks...done
    AVAILABLE DISK SELECTIONS:
    0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01924a4e1,0
    1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01a467d91,0
    2. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e019dd3151,0
    3. c1t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01a467321,0
    4. c1t4d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01922a001,0
    5. c2t600A0B8000482D560000031B482A46BCd0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482d560000031b482a46bc
    6. c2t600A0B8000482D5600000318482A46A6d0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482d5600000318482a46a6
    7. c2t600A0B8000482DBC00000659482B59D3d0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482dbc00000659482b59d3
    8. c2t600A0B8000482E200000049E482B599Ad0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482e200000049e482b599a
    (End of output)
    *3) The cfgadm output is given below*
    root# cfgadm -al
    Ap_Id Type Receptacle Occupant Condition
    PCI0 unknown empty unconfigured unknown
    PCI1 unknown empty unconfigured unknown
    PCI2 unknown empty unconfigured unknown
    PCI3 unknown empty unconfigured unknown
    PCI4 scsi/hp connected configured ok
    PCI5 pci-pci/hp connected configured ok
    PCI6 pci-pci/hp connected configured ok
    PCI7 fibre/hp connected configured ok
    PCI8 fibre/hp connected configured ok
    c0 scsi-bus connected configured unknown
    c0::dsk/c0t0d0 CD-ROM connected configured unknown
    c1 fc-private connected configured unknown
    c1::500000e01922a001 disk connected configured unknown
    c1::500000e01924a4e1 disk connected configured unknown
    c1::500000e019dd3151 disk connected configured unknown
    c1::500000e01a467321 disk connected configured unknown
    c1::500000e01a467d91 disk connected configured unknown
    c1::508002000065adb9 ESI connected configured unknown
    c3 scsi-bus connected configured unknown
    c3::rmt/0 tape connected configured unknown
    c4 scsi-bus connected unconfigured unknown
    c5 scsi-bus connected unconfigured unknown
    c6 scsi-bus connected configured unknown
    c6::rmt/1 tape connected configured unknown
    c7 scsi-bus connected unconfigured unknown
    c8 scsi-bus connected unconfigured unknown
    c9 fc-private connected configured unknown
    c9::200600a0b8482dab disk connected configured unknown
    c10 fc-private connected configured unknown
    c10::200500a0b8482dbd disk connected configured unknown
    c11 fc-private connected configured unknown
    c11::200700a0b8482dab disk connected configured unknown
    c12 fc-private connected configured unknown
    c12::200400a0b8482dbd disk connected configured unknown
    usb0/1 unknown empty unconfigured ok
    usb0/2 unknown empty unconfigured ok
    usb0/3 unknown empty unconfigured ok
    usb0/4 unknown empty unconfigured ok
    *4) root# cfgadm -al -o show_FCP_dev*
    Ap_Id Type Receptacle Occupant Condition
    c1 fc-private connected configured unknown
    c1::500000e01922a001,0 disk connected configured unknown
    c1::500000e01924a4e1,0 disk connected configured unknown
    c1::500000e019dd3151,0 disk connected configured unknown
    c1::500000e01a467321,0 disk connected configured unknown
    c1::500000e01a467d91,0 disk connected configured unknown
    c1::508002000065adb9 ESI connected configured unknown
    c9 fc-private connected configured unknown
    c9::200600a0b8482dab,0 disk connected configured unknown
    c9::200600a0b8482dab,1 disk connected configured unknown
    c9::200600a0b8482dab,31 disk connected configured unknown
    c10 fc-private connected configured unknown
    c10::200500a0b8482dbd,0 disk connected configured unknown
    c10::200500a0b8482dbd,1 disk connected configured unknown
    c10::200500a0b8482dbd,31 disk connected configured unknown
    c11 fc-private connected configured unknown
    c11::200700a0b8482dab,0 disk connected configured unknown
    c11::200700a0b8482dab,1 disk connected configured unknown
    c11::200700a0b8482dab,31 disk connected configured unknown
    c12 fc-private connected configured unknown
    c12::200400a0b8482dbd,0 disk connected configured unknown
    c12::200400a0b8482dbd,1 disk connected configured unknown
    c12::200400a0b8482dbd,31 disk connected configured unknown
    Which part of the output indicates the disk array? Please help me to understand.
    FYI
    root@# more /etc/vfstab
    #device device mount FS fsck mount mount
    #to mount to fsck point type pass at boot options
    #/dev/dsk/c1d0s2 /dev/rdsk/c1d0s2 /usr ufs 1 yes -
    fd - /dev/fd fd - no -
    /proc - /proc proc - no -
    /dev/md/dsk/d103 - - swap - no -
    /dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs 1 no nologging
    /dev/md/dsk/d115 /dev/md/rdsk/d115 /global/.devices/node@1 ufs 2 no global
    swap - /tmp tmpfs - yes -
    /dev/vx/dsk/ossdg/exporthome /dev/vx/rdsk/ossdg/exporthome /export/home ufs 2 no logging
    /devices - /devices devfs - no -
    ctfs - /system/contract ctfs - no -
    objfs - /system/object objfs - no -
    sharefs - /etc/dfs/sharetab sharefs - no -

  • Libproxy SP12 behind a firewall fails to connect to WebLogic cluster

    Two WLS 5.1 SP11 servers in a cluster, iPlanet 4.1 with the wlproxy plugin, on a Solaris box. There is a firewall between the iPlanet web server and the WebLogic cluster. The NSAPI proxy plugin (libproxy.so) connects successfully to one of the WebLogic cluster members the first time, using the NAT IPs of the cluster machines maintained in obj.conf. The response contains the actual IP of the WebLogic cluster machine, which gets updated in the proxy; the proxy then uses that IP to connect to the cluster, and the firewall blocks it because it is not the NAT IP. This works fine with the libproxy of WebLogic 6.1, but unfortunately that one has a problem when the data passed is more than 1000 characters.

    Shouldn't the connect string look like jdbc:oracle:thin:@MyIP:1521:MySID?
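    For reference, that is the standard Oracle thin JDBC URL format. A minimal hedged sketch (host, port, SID, and credentials below are placeholders, and the Oracle JDBC driver jar is assumed to be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ThinConnectExample {
        public static void main(String[] args) throws SQLException {
            // Thin-driver format: jdbc:oracle:thin:@host:port:SID (placeholder values below)
            String url = "jdbc:oracle:thin:@MyIP:1521:MySID";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }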

  • "Failed to connect to the service manager" when I try to add nodes to a cluster on Windows Server 2008 R2

    Hello,
    I get the following error message every time I try to add a node to an existing cluster: "Failed to connect to the service manager".
    I'm running Windows Server 2008 R2,
    Any ideas?

    Hi saeedawadx,
    Please run the cluster validation and post the error or warning information. In the normal scenario, the "Failed to connect to the service manager" issue is often caused by the firewall
    or AV software blocking the other nodes' connections; please try disabling the firewall and AV software, then try again.
    The following related articles give more helpful tips:
    The case of the server who couldn’t join a cluster – operation returned because the timeout period expired
    http://blogs.technet.com/b/roplatforms/archive/2010/04/28/the-case-of-the-server-who-couldn-t-join-a-cluster-operation-returned-because-the-timeout-period-expired.aspx
    Trouble Connecting to Cluster Nodes? Check WMI!
    http://blogs.msdn.com/b/clustering/archive/2010/11/23/10095621.aspx
    I’m glad to be of help to you!
    Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Support, contact [email protected]

  • What is the best way to connect a firewall cluster to a VPC domain

    Hi All
    Can anyone help me decide on the best way to connect a firewall cluster to a VDC running in a pair of N7Ks that form a vPC domain?
    Can I configure a VLAN interface on each VDC and use HSRP?  I was planning on presenting one 10 Gb cable from each VDC to each firewall.  Would this work OK?  HSRP traffic will go across the vPC peer link, correct?
    thanks all

    No, but the one caveat is vPC orphan ports. If the vPC peer link between the Nexus switches fails for any reason, all the vPC ports on the vPC secondary switch will be forced down. So it's recommended to connect single-attached devices to the primary vPC switch so the connections stay up. But if you're OK with that, then I don't see any problems.
    You have a few options; one would be to run a separate link between your Nexus switches for non-vPC VLANs. These VLANs would not be allowed over the vPC peer link, or forwarded out vPCs.
    See page 49 here:
    http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf

  • WebLogic managed servers connecting to servers in a different cluster

              Hi All,
              We have a weird problem that has been going on for a while. We have a cluster configuration
              with an admin server and two managed servers. We have a similar configuration
              in DEV, TEST, and PROD. The problem is that the managed servers in the DEV cluster
              are making connections to managed servers that are members of the PROD cluster for
              session replication. In the same way, TEST servers are trying to connect to PROD and
              DEV.
              Has anyone seen this kind of problem before? BEA seems to be clueless so far.
              Thanks in advance for your input.
              Udit
              

              Venkat,
              That's a good suggestion, but these things are too obvious to have been missed. We have different
              multicast addresses in DEV and PROD, and the hosts are on different subnets. I do
              not know if the cluster name will make any difference though.
              Thanks for your input anyway,
              Udit
              "venkat" <[email protected]> wrote:
              >
              >Udit,
              > You can check the sub net, multicast address and the cluster name.
              >If the dev
              >and prod servers are in the same sub net with same multicast address,
              >then change
              >the multicast and try.
              >
              >Venkat
              >"venkat" <[email protected]> wrote:
              >>
              >>Udit,
              >>
              >>
              >>"Udit Singh" <[email protected]> wrote:
              >>>
              >>>Kumar,
              >>>Thanks for the reply.
              >>>The situation is that managed server in DEV try to replicate the session
              >>>to a
              >>>managed server in PROD and TEST and vice versa.
              >>>Let us say our dev managed servers are running on abc01 and abc02 and
              >>>prod managed
              >>>servers are running on xyz01 and xyz02. All the managed servers are
              >>running
              >>>on
              >>>port 7005.
              >>>If I do the netstat on abc01 or abc02 I could the see established connections
              >>>between abc01/02 and xyz01/02.
              >>>Why is that happening? We are running 6.1SP2.
              >>>
              >>>Udit
              >>>
              >>>Kumar Allamraju <[email protected]> wrote:
              >>>>We do not restrict intercluster communication as of 61 SP3.
              >>>>Once we get the IP from the cookie, we can safely make a
              >>>>connection to the other clustered node. We were not checking
              >>>>if the server is part of the same cluster or not. This is
              >>>>already fixed in 7.x and 61 SP4(not yet released) If you are
              >>>>on 61 Sp2 or SP3 then you should contact support and
              >>>>reference CR # CR089798 to get a one off patch.
              >>>>
              >>>>Regardless, are you traversing from DEV to PROD cluster and
              >>>>vice-versa. If not then this problem shouldn't happen unless
              >>>>plugin is routing the request to wrong cluster.
              >>>>
              >>>>--
              >>>>Kumar
              >>>>
              >>>>Udit Singh wrote:
              >>>>> Hi All,
              >>>>> We have a weired problem going on for a while. We have a cluster
              >>configuration
              >>>>> with an admin server and two managed servers. We have the similar
              >>>configuration
              >>>>> in DEV, TEST and PROD. The problem is that the managed server members
              >>>>in DEV cluster
              >>>>> are making connections to managed servers which are member of PROD
              >>>>cluster for
              >>>>> session replication. The same way TEST servers are trying to connect
              >>>>to PROD and
              >>>>> DEV.
              >>>>> Has anyone seen this kind of problem before. BEA seems to be cluless
              >>>>so far.
              >>>>>
              >>>>> Thanks in adavnce for your input.
              >>>>> Udit
              >>>>
              >>>
              >>
              >
              

  • Essbase Cluster - Lease manager is not connecting to the DB

    Hi All,
    This is a tough one for the Gurus :)
    We are deploying EPM 11.1.2.1, all Windows except for Essbase.
    Essbase is a clustered solution running on RHEL 5.3 with a shared repository for the ARBORPATH using OCFS2 as the filesystem. I am also using Oracle RAC 11g as the database.
    I am using opmnctl to start Essbase, but StartEssbase.sh ends up with the same errors.
    Now, I've deployed this sort of solution many, many times and never encountered this error before:
    leasemanager.log:
    [ESSBASE0] [LM-39] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Lease Database Connection Information:
    [ESSBASE0] [LM-41] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Odbc Driver [DataDirect 6.0 Oracle Wire Protocol]
    [ESSBASE0] [LM-40] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Host [ora-grid1] and port [1521]
    [ESSBASE0] [LM-42] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Service Name [prodepm]
    [ESSBASE0] [LM-44] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] User [HSS]
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-1] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Failed to acquire the lease after [6] consecutive attempts.
    [ESSBASE0] [LM-16] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Lease is being surrendered. [ESSBASE0] [LM-11] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Preparing to shutdown abort.
    [ESSBASE0] [LM-12] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Terminating the process.
    It is obviously not connecting to the database to determine which node is the active one, but the reason why is still a mystery.
    The DB connection info is OK; I tested it manually and it works like a charm.
    These are the rest of my logs... just in case:
    Essbase.log:
    [Mon Sep 17 22:15:28 2012]Local/ESSBASE0///47588224341616/Info(1051283)
    Retrieving License Information Please Wait...
    [Mon Sep 17 22:15:28 2012]Local/ESSBASE0///47588224341616/Info(1051286)
    License information retrieved.
    [Mon Sep 17 22:15:28 2012]Local/ESSBASE0///47588224341616/Info(1051216)
    JVM Started Successfully !
    [Mon Sep 17 22:15:30 2012]Local/ESSBASE0///47588224341616/Info(1051199)
    Single Sign-On Initialization Succeeded !
    [Mon Sep 17 22:15:30 2012]Local/ESSBASE0///47588224341616/Info(1051232)
    Using English_UnitedStates.Latin1@Binary as the Essbase Locale
    [Mon Sep 17 22:15:30 2012]Local/ESSBASE0///47588224341616/Info(1056797)
    Incremental security backup started by SYSTEM. The file created is [u01/EssbaseServer/essbaseserver1/bin/essbasets_1347916530.bak]
    Essbase_ODL.log
    [2012-09-17T22:15:28.35-21:15] [ESSBASE0] [AGENT-1283] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] Retrieving License Information Please Wait...
    [2012-09-17T22:15:28.35-21:15] [ESSBASE0] [AGENT-1286] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] License information retrieved.
    [2012-09-17T22:15:28.109-21:15] [ESSBASE0] [AGENT-1216] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] JVM Started Successfully !
    [2012-09-17T22:15:30.468-21:15] [ESSBASE0] [AGENT-1199] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] Single Sign-On Initialization Succeeded !
    [2012-09-17T22:15:30.468-21:15] [ESSBASE0] [NET-17] [ERROR] [16][] [ecid:1347916528494,0] [tid:-13298064] Host Name Not Available
    [2012-09-17T22:15:30.513-21:15] [ESSBASE0] [AGENT-1232] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064]
    Using English_UnitedStates.Latin1@Binary as the Essbase Locale
    [2012-09-17T22:15:30.513-21:15] [ESSBASE0] [NET-17] [ERROR] [16][] [ecid:1347916528494,0] [tid:-13298064] Host Name Not Available
    [2012-09-17T22:15:30.782-21:15] [ESSBASE0] [AGENT-6797] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] Incremental security backup started by SYSTEM. The file created is [u01/EssbaseServer/essbaseserver1/bin/essbasets_1347916530.bak]
    I have a SEV1 open with Oracle Support and it has been escalated to Development.
    Any help is welcome...
    Thank you all!

    Hi All,
    Problem solved...
    The problem, as you know, was a connection issue from OPMN to the Oracle DB: OPMN tries to connect to the DB (lease manager) to establish which node is the active one, and since the connection wasn't happening, all nodes killed themselves as the default action.
    After long troubleshooting with TCP/IP and SQL*Net traces we realized that the requests from OPMN were going out but the responses from the DB were not coming back, and the reason why is how the DataDirect driver works. All EPM apps use a JDBC driver, which is a little bit smarter: once you connect it to the SCAN VIP it resolves everything automatically, even though we were using an alias for the SCAN VIP (named oracle-grid). The DataDirect driver is not that smart and needs to be connected to the actual SCAN name configured in the Oracle RAC, so while the request was going out OK using the alias name (oracle-grid), the response was not coming back, because the driver needs to use the same name as the SCAN (scan.hostname.com).
    Once the alias was changed to the SCAN name, connections were happening and Essbase started in failover mode.
    Thank you all for your suggestions!
    Cheers,
    Pablo.-
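    To illustrate Pablo's point about using the actual SCAN name rather than an alias, here is a hedged Java/JDBC sketch. It is only an analogy: the lease manager actually connects through the DataDirect ODBC driver, and the host, service name, and credentials below are just the placeholders that appear in the thread.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ScanConnectSketch {
        public static void main(String[] args) throws SQLException {
            // Alias for the SCAN VIP (e.g. ora-grid1) was what the failing setup used:
            // String url = "jdbc:oracle:thin:@//ora-grid1:1521/prodepm";

            // Actual SCAN name configured in the RAC (placeholder from the thread):
            String url = "jdbc:oracle:thin:@//scan.hostname.com:1521/prodepm";

            // "HSS" is the user shown in the lease manager log; the password is a placeholder.
            try (Connection conn = DriverManager.getConnection(url, "HSS", "password")) {
                System.out.println("Connected via SCAN: " + !conn.isClosed());
            }
        }
    }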

  • Hyper-V could not replicate changes for virtual machine 'machinename': The connection with the server was terminated abnormally (0x00002EFE).

    I have a 3 node cluster that has replica setup to replicate to another cluster off-site.
    Suddenly one of the servers is not replicating with the error:
    Hyper-V could not replicate changes for virtual machine 'machinename': The connection with the server was terminated abnormally (0x00002EFE). (Virtual Machine ID CC0FD4CC-F9B7-4C68-ABE8-B7D52A87899F)
    All other servers are replicating fine so there cannot be a permissions or connectivity issue between the 2 clusters.
    This server has 2TB of data so I'd rather not have to start the replication again.
    Does anyone have any pointers?
    Thanks.

    Hi drensta,
    Based on my knowledge, the "Hyper-V Replica Broker" is needed for failover cluster replica.
    Here is a link for "Why is the 'Hyper-V Replica Broker' required?":
    http://blogs.technet.com/b/virtualization/archive/2012/03/27/why-is-the-quot-hyper-v-replica-broker-quot-required.aspx
    Hope this helps.
    Best Regards
    Elton Ji

  • The cluster does not fail over when I shut down one managed server?

    Hello, I created one cluster with two managed servers and deployed an application across the cluster, but WebLogic Server gave me two URLs with two different ports to access this application:
    http://server1:7003/App_name
    http://server1:7005/App_name
    When I shut down one managed server (shutdown immediate), I lose the connection to the application on that managed server. My question is: why don't the failover and the load balancing work?
    Why two different addresses?
    Thanks for any help

    Well, you have two different addresses (URLs) because those are two physical managed servers. By creating a cluster you do not automatically get a virtual address (URL) that will load-balance requests for that application between those two managed servers.
    If you want one URL to access this application, you will have to put some kind of web server in front of your WebLogic. You can install and configure Oracle HTTP Server to route requests to the WebLogic cluster. Refer to this:
    http://download.oracle.com/docs/cd/E12839_01/web.1111/e10144/intro_ohs.htm#i1008837
    And this for details on how to configure mod_wl_ohs to route requests from OHS to WLS:
    http://download.oracle.com/docs/cd/E12839_01/web.1111/e10144/under_mods.htm#BABGCGHJ
    Hope this helps.
    Thanks
    Shail

  • Issue with LCM while migrating a Planning application in the cluster environment

    Hi,
    Having issues with LCM while migrating the Planning application in the cluster environment. In LCM we get the error below, and the application is up and running. Please let me know if anyone else has faced the same issue before in a cluster environment. We have done the migration using LCM on a single server and it works fine. It is just that the cluster environment is an issue.
    Error on Shared Service screen:
    Post execution failed for - WebPlugin.importArtifacts.doImport. Unable to connect to "ApplicationName", ensure that the application is up and running.
    Error on network:
    “java.net.SocketTimeoutException: Read timed out”
    ERROR - Zip error. The exception is -
    java.net.SocketException: Connection reset by peer: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)

    Hi,
    First of all, if your source and target environments are the same, then you will have all the users and groups in Shared Services; in that case you just have to provision the users for this new application so that your security gets migrated when you migrate from the source application. If the environments are different, then you have to migrate the users and groups first and provision them before importing the security using LCM.
    Coming back to the process of importing the artifacts into the target application using LCM: you have to place the migrated file in the @admin native directory in Oracle/Middleware/epmsystem1.
    Open the Shared Services console -> File System and you will see your file name under that.
    Select the file and you will see all your exported artifacts. Select all of them if you want to do a complete migration to the target.
    Follow the steps, select the target application to which you want to migrate, and execute the migration.
    Open the application and you will see all your artifacts migrated to the target.
    If you face any error during the migration it will be shown in the migration report.
    Thanks,
    Sourabh

  • "Service Cluster left the cluster" - lost all my data

    My four storage-enabled cluster nodes lost all their cached data when all services left the cluster in response to some issue(?). Is that the expected behavior? Is the correct procedure to transactionally store to disk so you can reload when this happens, or should this simply never happen? It seems like this should not happen. These four nodes are on the same server. At about 12:31 everything goes pear-shaped.
    2011-01-14 12:31:16.904/50004.436 Oracle Coherence GE 3.6.0.0 <Error> (thread=Cluster, member=3): This senior Member(Id=3, Timestamp=2011-01-13 22:37:52.106, Address=192.168.3.20:8088, MachineId=27412, Location=machine:amd4,process:4428,member:Administrator, Role=CoherenceServer) appears to have been disconnected from other nodes due to a long period of inactivity and the seniority has been assumed by the Member(Id=9, Timestamp=2011-01-13 22:38:01.438, Address=192.168.3.20:8094, MachineId=27412, Location=machine:amd4,process:3904,member:Administrator, Role=CoherenceServer); stopping cluster service.
    2011-01-14 12:31:16.905/50004.437 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=3): Service Cluster left the cluster
    2011-01-14 12:31:16.906/50004.438 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedStatsCacheService, member=3): Service DistributedStatsCacheService left the cluster
    2011-01-14 12:31:16.906/50004.438 Oracle Coherence GE 3.6.0.0 <D5> (thread=Proxy:ExtendTcpProxyService, member=3): Service ExtendTcpProxyService left the cluster
    2011-01-14 12:31:16.907/50004.439 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedQuotesCacheService, member=3): Service DistributedQuotesCacheService left the cluster
    2011-01-14 12:31:16.913/50004.445 Oracle Coherence GE 3.6.0.0 <D5> (thread=Invocation:Management, member=3): Service Management left the cluster
    2011-01-14 12:31:16.913/50004.445 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedOrdersService, member=3): Service DistributedOrdersService left the cluster
    2011-01-14 12:31:16.913/50004.445 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedCacheService, member=3): Service DistributedCacheService left the cluster
    2011-01-14 12:31:16.914/50004.446 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=214992652, Open=false)
    2011-01-14 12:31:16.914/50004.446 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=8305999, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1383343339, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: TcpConnection(Id=0x0000012D84061C15C0A803149CF3279B334BE6140AC76C47CA03670D76A96D22, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65480)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1003858188, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1586910282, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: TcpConnection(Id=0x0000012D84060E5AC0A8031442EA3CC26AC425D55D93A6AFC5404E5A76A96D1E, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65472)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor:TcpProcessor, member=3): Released: TcpConnection(Id=0x0000012D84061C15C0A803149CF3279B334BE6140AC76C47CA03670D76A96D22, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65480)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=160435953, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor:TcpProcessor, member=3): Released: TcpConnection(Id=0x0000012D84060E5AC0A8031442EA3CC26AC425D55D93A6AFC5404E5A76A96D1E, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65472)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1635893341, Open=false)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: TcpConnection(Id=0x0000012D84061203C0A8031455CD3A790F6009CA79AEC8BACC464D9976A96D20, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65478)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor:TcpProcessor, member=3): Released: TcpConnection(Id=0x0000012D84061203C0A8031455CD3A790F6009CA79AEC8BACC464D9976A96D20, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65478)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedExecutionsService, member=3): Service DistributedExecutionsService left the cluster
    2011-01-14 12:31:16.919/50004.451 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedPositionsCacheService, member=3): Service DistributedPositionsCacheService left the cluster
    and ...
    2011-01-14 12:31:22.874/50006.273 Oracle Coherence GE 3.6.0.0 <Info> (thread=main, member=n/a): Restarting cluster
    2011-01-14 12:31:22.924/50006.323 Oracle Coherence GE 3.6.0.0 <D4> (thread=main, member=n/a): TCMP bound to /192.168.3.20:8094 using SystemSocketProvider
    2011-01-14 12:31:52.937/50036.336 Oracle Coherence GE 3.6.0.0 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2011-01-14 12:31:22.924, Address=192.168.3.20:8094, MachineId=27412, Location=machine:amd4,process:4136,member:Administrator, Role=CoherenceServer) has been attempting to join the cluster at address 225.0.0.1:54321 with TTL 4 for 30 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
    2011-01-14 12:31:52.950/50036.349 Oracle Coherence GE 3.6.0.0 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster that does not respond to join requests; this is usually caused by a network layer failure
    Logs starting at 12:30 from the four nodes are here:
    http://www.nmedia.net/~andrew/logs/1.log
    http://www.nmedia.net/~andrew/logs/2.log
    http://www.nmedia.net/~andrew/logs/3.log
    http://www.nmedia.net/~andrew/logs/4.log
    If someone could tell me if this is a bug in the cluster re-join logic or something I screwed up that would be great. Thanks!
    Andrew

    Hi Andrew
    I had a quick look at your logs but cannot say for certain why your cluster died. I can say that losing data is a normal consequence of node loss, though. If you have the backup count set to 1 then you can lose a single node without losing data. If you lose more than one node (on different machines, or on the same machine if you only have one) over a very short space of time, then you will almost certainly lose at least one partition and hence lose the data within that partition.
    Going back to your logs, it is difficult to determine the underlying cause without the whole set of logs. You have posted links to four logs, but from looking at them the cluster has about 16 nodes. I know from experience (as we had a cluster that was quite unstable for a while) that tracing these issues through the logs can be a bit awkward, but you soon get the hang of it :-)
    For example, in the log http://www.nmedia.net/~andrew/logs/1.log you have...
    2011-01-14 12:31:16.807/49993.331 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=9): MemberLeft notification for Member(Id=3, Timestamp=2011-01-13 22:37:52.106, Address=192.168.3.20:8088, MachineId=27412, Location=machine:amd4,process:4428,member:Administrator, Role=CoherenceServer, PublisherSuccessRate=0.9975, ReceiverSuccessRate=0.9999, PauseRate=0.0, Threshold=93, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, ReadyPackets=0, LastIn=261ms, LastOut=277ms, LastSlow=n/a) received from Member(Id=22, Timestamp=2011-01-14 08:21:22.284, Address=192.168.3.121:8092, MachineId=27513, Location=machine:H1,process:3716,member:Howard, Role=Order_entry_window, PublisherSuccessRate=0.8326, ReceiverSuccessRate=1.0, PauseRate=0.0024, Threshold=1456, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, ReadyPackets=0, LastIn=0ms, LastOut=8ms, LastSlow=n/a)
    ...which is Member-9 receiving a message about the departure of Member-3 from Member-22, so you would then need to look at the logs for Member-22 to see why it thought Member-3 had departed, and also look at the logs for Member-3 around that time to see what might be wrong with it.
    The more worrying messages would be these...
    2011-01-14 12:31:16.709/49993.233 Oracle Coherence GE 3.6.0.0 <Warning> (thread=PacketPublisher, member=9): Experienced a 19025 ms communication delay (probable remote GC) with Member(Id=21, Timestamp=2011-01-14 08:21:12.174, Address=192.168.3.121:8090, MachineId=27513, Location=machine:H1,process:4316,member:Howard, Role=OrderbookviewerViewer); 111 packets rescheduled, PauseRate=0.0014, Threshold=1696
    ...a 19-second delay is a long time and would suggest either very long GC pauses or a network problem. Do you have GC logs for these processes? Are all the servers connected to the same switch, or is the cluster distributed over more than one part of your network? Do you have too much on one machine, are you overloading the NIC, are you swapping? All of these can cause delays and/or loss of packets.
    We have had problems with storage-disabled nodes doing long GC pauses and causing storage nodes to drop out of the cluster. Our cluster was on 3.5.3-p8, whereas you are on 3.6.0.0, which is supposed to have better node death detection, so you might not have the same issues we had.
    Sorry not to be more help,
    JK
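    As a rough illustration of the backup-count point above, here is a small Coherence client sketch. The cache name is made up, and it assumes the com.tangosol.net.PartitionedService API exposes the configured backup and partition counts; the backup count itself is normally set with the <backup-count> element of the distributed scheme in the cache configuration.

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.net.PartitionedService;

    public class BackupCountCheck {
        public static void main(String[] args) {
            // "DistributedQuotesCache" is a made-up cache name for illustration.
            NamedCache cache = CacheFactory.getCache("DistributedQuotesCache");
            PartitionedService service = (PartitionedService) cache.getCacheService();
            // With a backup count of 1, losing a single node (or a single machine,
            // when backups are machine-safe) should not lose data; losing more nodes
            // than there are backups in a short window can lose partitions.
            System.out.println("Backup count:    " + service.getBackupCount());
            System.out.println("Partition count: " + service.getPartitionCount());
            CacheFactory.shutdown();
        }
    }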

  • Can I use one transport adapter on the nodes of the cluster?

    Hi
    I am new to Sun Cluster. In the cluster documentation they mention that each node should have two network cards: one for public connections and one for the private connection. What if I do not want the nodes to have public connections, except for one node? In other words, I want to use one network card on each node except for the first node in the cluster; users can access the rest of the nodes through the first node. Is that possible? If yes, what should the name of the second transport adapter be while installing the cluster software on the nodes?
    Thank You for the help

    Dear,
    We use a cluster for HA in failover conditions. If you have only one network adapter, how would failover work? And you can't assign one adapter to two nodes at the same time; you need a minimum of two network adapters for a two-node cluster.
    :) Good luck
    Mohammed Tanvir

  • [SOLVED] Can't add a node to the cluster with error (Exchange 2010 SP3 DAG Windows Server 2012)

    Hi there!
    I have a problem which makes me very angry already :)
    I have two Exchange 2010 SP3 servers with the MB role running on Windows Server 2012. I decided to create a DAG.
    I have created the prestaged AD object for the cluster called msc-co-exc-01c, assigned the necessary permissions, and disabled it. I allowed the traffic between nodes through the Windows Firewall and prepared the File Share Witness server.
    Then I tried to add the nodes. The first node was added successfully, but the second node doesn't want to be added :). Now I can add only one node to the DAG. I tried adding different servers first, but only the first one was added.
    Logs on the second node:
    Application Log
    "Failed to initialize cluster with error 0x80004005." (MSExchangeIS)
    Failover Clustering Diagnostic Log
    "[VER] Could not read version data from database for node msc-co-exc-04v (id 1)."
    CMDLET Error:
    Summary: 1 item(s). 0 succeeded, 1 failed.
    Elapsed time: 00:06:21
    MSC-CO-EXC-02V
    Failed
    Error:
    A database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode()
    (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed. [Server: msc-co-exc-04v.int.krls.ru]
    An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..
    This operation returned because the timeout period expired
    Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.141).aspx?v=14.3.174.1&t=exchgf1&e=ms.exch.err.ExC9C315
    Warning:
    Network name 'msc-co-exc-01c' is not online. Please check that the IP address configuration for the database availability group is correct.
    Warning:
    The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2014-11-17_13-54-56.543_add-databaseavailabiltygroupserver.log".
    Exchange Management Shell command attempted:
    Add-DatabaseAvailabilityGroupServer -MailboxServer 'MSC-CO-EXC-02V' -Identity 'msc-co-exc-01c'
    Elapsed Time: 00:06:21
    UPD:
    When the Exchange servers run on the same Hyper-V node, the DAG works well, but if I move one of the VMs to another node, it stops working.
    I have installed Wireshark and captured traffic on the cluster interface. When the DAG members are on the same HV node, there is inbound and outbound traffic on the cluster interface, but if I move one of the DAG members to another node, in Wireshark I see only outbound traffic
    on both nodes.
    This confuses me, because there is normal connectivity between these DAG members through the main interface.
    Please, help me if you can.

    Hi, Jared! Thank you for the reply.
    Of course I did that already :) I have new info:
    When the Exchange servers run on the same Hyper-V node, the DAG works well, but if I move one of the VMs to another node, it stops working.
    I have installed Wireshark and captured traffic on the cluster interface. When the DAG members are on the same HV node, there is inbound and outbound traffic on the cluster interface, but if I move one of the DAG members to another node, in Wireshark I see only outbound traffic
    on both nodes.
    This confuses me, because there is normal connectivity between these DAG members through the main interface.

  • DPM failing SQL backups due to error: "the SQL Server instance refused a connection to the protection agent. (ID 30172 Details: Internal error code: 0x80990F85)

    I ran across this error starting on 6/4/2011 and have been unable to find the root of the problem.  In our environment, we have a DPM 2010 server dedicated to backing up our entire SQL environment (about 45 SQL Servers total).  All of the SQL
    environment is backing up fine except for one SQL cluster application.  This particular SQL instance is part of a 6-node failover cluster with 6 SQL instances distributed amongst them.  The other 5 SQL instances in the cluster are backing
    up fine; only one instance is failing.  The DPM Alerts section shows this error when attempting a SQL backup of one of the databases on this SQL instance:
    Affected area: KEN-PROD-VDB001\POSREPL1\master
    Occurred since: 6/11/2011 11:00:56 PM
    Description: Recovery point creation jobs for SQL Server 2008 database KEN-PROD-VDB001\POSREPL1\master on SQL Server (POSREPL1) - Store Settings.ken-prod-cl004.aarons.aaronrents.com have been failing. The number of failed recovery point creation jobs =
    4.
     If the datasource protected is SharePoint, then click on the Error Details to view the list of databases for which recovery point creation failed. (ID 3114)
     The DPM job failed for SQL Server 2008 database KEN-PROD-VDB001\POSREPL1\master on SQL Server (POSREPL1) - Store Settings.ken-prod-cl004.aarons.aaronrents.com because the SQL Server instance refused a connection to the protection agent. (ID 30172 Details:
    Internal error code: 0x80990F85)
     More information
    Recommended action: This can happen if the SQL Server process is overloaded, or running short of memory. Please ensure that you are able to successfully run transactions against the SQL database in question and then retry the failed job.
     Create a recovery point...
    Resolution: To dismiss the alert, click below
     Inactivate alert
    I have checked the cluster node this particular SQL instance is running on using Perfmon, and the machine is nowhere near capacity on CPU, memory, network, or disk I/O.  I have failed this SQL application over to another node in the cluster and
    receive the same error (that other node has another clustered SQL application on it that is actively running as well as backing up fine).  The only thing that I am aware of that has changed is that we installed SP2 for SQL 2008 about 2 weeks prior
    to when the failures started to occur.  However, we updated all six clustered SQL instances at the same time and only this one is having the issue, so I don't believe that caused the problem.  We are running SQL 2008 SP2 (version 10.0.4000.0)
    on all clustered instances along with DPM 2010 (version 3.0.7696.0) on the particular DPM server that has the issue.
    One last thing: I have also noticed errors in the event log pertaining to the same SQL backups that are failing (but the timestamps do not coincide with each backup attempt):
    Log Name:      Application
    Source:        MSDPM
    Date:          6/13/2011 1:09:12 AM
    Event ID:      4223
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      KEN-PROD-BS002.aarons.aaronrents.com
    Description:
    The description for Event ID 4223 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
    If the event originated on another computer, the display information had to be saved with the event.
    The following information was included with the event:
    DPM writer was unable to snapshot the replica of KEN-PROD-VDB001\POSREPL1\model. This may be due to:
    1) No valid recovery points present on the replica.
    2) Failure of the last express full backup job for the datasource.
    3) Failure while deleting the invalid incremental recovery points on the replica.
    Problem Details:
    <DpmWriterEvent><__System><ID>30</ID><Seq>1833</Seq><TimeCreated>6/13/2011 5:09:12 AM</TimeCreated><Source>f:\dpmv3_rtm\private\product\tapebackup\dpswriter\vssfunctionality.cpp</Source><Line>815</Line><HasError>True</HasError></__System><DetailedCode>-2147212300</DetailedCode></DpmWriterEvent>
    the message resource is present but the message is not found in the string/message table
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="MSDPM" />
        <EventID Qualifiers="0">4223</EventID>
        <Level>2</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2011-06-13T05:09:12.000000000Z" />
        <EventRecordID>68785</EventRecordID>
        <Channel>Application</Channel>
        <Computer>KEN-PROD-BS002.aarons.aaronrents.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data>DPM writer was unable to snapshot the replica of KEN-PROD-VDB001\POSREPL1\model. This may be due to:
    1) No valid recovery points present on the replica.
    2) Failure of the last express full backup job for the datasource.
    3) Failure while deleting the invalid incremental recovery points on the replica.
    Problem Details:
    &lt;DpmWriterEvent&gt;&lt;__System&gt;&lt;ID&gt;30&lt;/ID&gt;&lt;Seq&gt;1833&lt;/Seq&gt;&lt;TimeCreated&gt;6/13/2011 5:09:12 AM&lt;/TimeCreated&gt;&lt;Source&gt;f:\dpmv3_rtm\private\product\tapebackup\dpswriter\vssfunctionality.cpp&lt;/Source&gt;&lt;Line&gt;815&lt;/Line&gt;&lt;HasError&gt;True&lt;/HasError&gt;&lt;/__System&gt;&lt;DetailedCode&gt;-2147212300&lt;/DetailedCode&gt;&lt;/DpmWriterEvent&gt;
    </Data>
        <Binary>3C00440070006D005700720069007400650072004500760065006E0074003E003C005F005F00530079007300740065006D003E003C00490044003E00330030003C002F00490044003E003C005300650071003E0031003800330033003C002F005300650071003E003C00540069006D00650043007200650061007400650064003E0036002F00310033002F003200300031003100200035003A00300039003A0031003200200041004D003C002F00540069006D00650043007200650061007400650064003E003C0053006F0075007200630065003E0066003A005C00640070006D00760033005F00720074006D005C0070007200690076006100740065005C00700072006F0064007500630074005C0074006100700065006200610063006B00750070005C006400700073007700720069007400650072005C00760073007300660075006E006300740069006F006E0061006C006900740079002E006300700070003C002F0053006F0075007200630065003E003C004C0069006E0065003E003800310035003C002F004C0069006E0065003E003C004800610073004500720072006F0072003E0054007200750065003C002F004800610073004500720072006F0072003E003C002F005F005F00530079007300740065006D003E003C00440065007400610069006C006500640043006F00640065003E002D0032003100340037003200310032003300300030003C002F00440065007400610069006C006500640043006F00640065003E003C002F00440070006D005700720069007400650072004500760065006E0074003E00</Binary>
      </EventData>
    </Event>
    Any help would be greatly appreciated!

    Don't know if this helps or not, but I also noticed another peculiar issue that stems from this problem.  If I go to "Modify protection group", then expand the cluster, then expand all six nodes in the cluster, five of them show "All SQL Servers"
    and allow me to expand the SQL instance and show all databases; on the one that is having a problem backing up, when I expand the node, it doesn't even show that SQL exists on the node, when in fact it does.
    I would also like to add that the databases on this node that will not back up are running fine.  They run hundreds of transactions daily, so we know SQL itself is OK.  Even though it is a busy SQL Server, there are plenty of available resources, as
    the SQL buffer and memory counters show the node is not under duress.
