Connecting to the cluster

Hi,
          I have a cluster running on a multihomed Solaris Server. It is made of two
          servers running on IP1 and IP2 (both using port 7001). It says in the
          documentation that I can put in the IPs of the two servers separated by a
          comma for the cluster address. So the address that I put in was
          "IP1:7001,IP2:7001" instead of a DNS name.
          How do I connect to the cluster? If I connect to IP1:7001 then I am
          connecting to the server and not the cluster, correct?
          I would appreciate any help,
          Mohammad
          

Sorry for the delay, I have been extremely busy...
          There are several alternatives -- DNS round-robin, hardware load balancers
          (e.g., Cisco LocalDirector), etc.
          Mica Cooper wrote:
          > Robert,
          >
          > WLBS is Windows Load Balancing Service. It was designed to only use 1 IP per
          > box. It comes on the server editions. Do you know of something else? Linux
          > or Solaris maybe?
          >
          > Mica
          >
          > "Robert Patrick" <[email protected]> wrote in message
          > news:[email protected]...
          > > Hi,
          > >
          > > I'm not sure what WLBS is but I guess my question is does it really need
          > to know
          > > that the IP addresses happen to point to the same machine? What happens
          > if you
          > > just tell it that there are twice as many machines as there really are?
          > >
          > > Robert
          > >
          > > Mica Cooper wrote:
          > >
          > > > Currently,
          > > > I am using WLBS to do the load balancing and it will only handle one IP
          > per
          > > > box. Do you know of any other software to do this?
          > > > Mica
          > > >
          > > > "Robert Patrick" <[email protected]> wrote in message
          > > > news:[email protected]...
          > > > > The real answer here is that the "cluster" is virtual -- it only
          > exists
          > > > because
          > > > > there are one or more server instances running. Regardless of how you
          > > > attain
          > > > > it, you are going to connect to one or more of the server instances.
          > > > There are
          > > > > several ways to do this as has been previously discussed (e.g., DNS
          > alias
          > > > that
          > > > > does DNS round-robin-style load-balancing, a comma-separated list of
          > IP
          > > > > addresses, etc.).
          > > > >
          > > > > Mica, as for your question, you need to teach your "load-balancing IP"
          > > > about all
          > > > > of the IP addresses for the servers in the cluster and not just "one
          > IP
          > > > per
          > > > > physical machine".
          > > > >
          > > > > Hope this helps,
          > > > > Robert
          > > > >
          > > > > Mica Cooper wrote:
          > > > >
          > > > > > Tao,
          > > > > >
          > > > > > Thats not what he is asking.
          > > > > >
          > > > > > He's asking how to call MyServerClusterName instead of trying to
          > call
          > > > > > different instances by IP. I would like to know how to do this also.
          > > > > > Currently we are calling a load balancing IP and it proxies to each
          > > > instance
          > > > > > but it only works for 1 instance per box and I need 2 instances per
          > box.
          > > > > >
          > > > > > Mica Cooper
          > > > > >
          > > > > > "Tao Zhang" <[email protected]> wrote in message
          > > > > > news:[email protected]...
          > > > > > > If you don't use DNS, you can write the IP1:7001,IP2:7001 in your
          > > > > > > PROVIDER_URL, and then transfer it to InitialContext.
          > > > > > >
          > > > > > >
          > > > > > > Mohammad Khan <[email protected]> wrote in message
          > > > > > > news:[email protected]...
          > > > > > > > Hi,
          > > > > > > >
          > > > > > > > I have a cluster running on a multihomed Solaris Server. It is
          > made
          > > > of
          > > > > > two
          > > > > > > > servers running on IP1 and IP2 (both using port 7001). It says
          > in
          > > > the
          > > > > > > > documentation that I can put in the IPs of the two servers
          > separated
          > > > by
          > > > > > a
          > > > > > > > comma for the cluster address. So the address that I put in was
          > > > > > > > "IP1:7001,IP2:7001" instead of a DNS name.
          > > > > > > >
          > > > > > > > How do I connect to the cluster. If I connect to IP1:7001 then I
          > am
          > > > > > > > connecting to the server and not the cluster, correct?
          > > > > > > >
          > > > > > > > I would appreciate any help,
          > > > > > > > Mohammad
          > > > > > > >
          > > > > > > >
          > > > > > > >
          > > > > > >
          > > > > > >
          > > > >
          > >
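To make the comma-separated cluster address concrete, here is a minimal JNDI client sketch along the lines of Tao Zhang's PROVIDER_URL suggestion above. It assumes the t3 protocol and WebLogic's weblogic.jndi.WLInitialContextFactory; the addresses are the IP1/IP2 placeholders from the question, and the JNDI name being looked up is purely illustrative.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ClusterContextExample {
    public static void main(String[] args) throws NamingException {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "weblogic.jndi.WLInitialContextFactory");
        // The cluster address: a comma-separated list of the member servers,
        // used in place of a DNS alias that resolves to all of them.
        env.put(Context.PROVIDER_URL, "t3://IP1:7001,IP2:7001");

        Context ctx = new InitialContext(env);
        try {
            // Hypothetical JNDI name, just to show a lookup through the context.
            Object obj = ctx.lookup("some/jndi/name");
            System.out.println("Looked up: " + obj);
        } finally {
            ctx.close();
        }
    }
}

Connecting to IP1:7001 alone does reach a single server, but since that server is a cluster member, the cluster-aware (replica-aware) stubs obtained from its JNDI tree generally know about the other members and can load-balance and fail over to them, which is Robert's point that the "cluster" is virtual.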
          

Similar Messages

  • Cannot connect to the cluster or Hyper-V Manager (on one of my hosts) but virtual machines are running (Hyper-V 2012 R2)

    Hello guys,
    I'm running a Hyper-V 2012 R2 cluster with 9 hosts. After 12 hours without power it came back up, but my problem is:
    my cluster did not come back.
    All Hyper-V hosts are up and running the hypervisor, and all VMs are running. The problem is the cluster: the quorum does not come back online.
    I found the problem: WMI is not working on one of my hosts, and I can't connect to Hyper-V Manager on that host from anywhere. The virtual machines are running on that host, but I can't manage them.
    I found an article about this and am now running this script, but nothing happens:
    "C:\windows\system32\wbem> mofcomp.exe cluswmi.mof
    Microsoft (R) MOF Compiler Version 6.3.9600.16384
    Copyright (c) Microsoft Corp. 1997-2006. All rights reserved.
    Parsing MOF file: cluswmi.mof
    MOF file has been successfully parsed
    Storing data in the repository..."
    Can someone help me?
    Thanks!

    Thanks for the answer.
    "When you say "my cluster do not back", do you mean that the cluster is not running?"
    Yes, I can't connect to the cluster.
    "Is "quorum do not back online" another way you are saying the cluster is not running?"
    Yes.
    "Are you using a disk witness or a file share witness? A disk witness is recommended as it provides a more robust level of operation."
    Witness.
    "Did you run the cluster validation wizard? Turn off the disk tests to prevent VMs from failing over, but run all other tests. What errors/warnings do you get?"
    When I try to run the validation tests, one of my hosts (the same one whose virtual machines I can't manage) fails to join the tests, but all the others are tested.

  • How to find whether any disk array is connected to the system

    Hi Guys,
    Please help me regarding disk arrays.
    1) What command(s) should I use to check whether any disk array is connected to the server or not?
    2) Please share the method for:
    A single-server system (a single Sun server connected to an external disk array).
    A cluster system (two servers connected as a Sun cluster, with the disk array connected to the cluster as a resource).
    3) If the disk array is from Sun or from another vendor, how can I know that?
    4) How do I find how many disks are available in the disk array?
    Thanks

    Wow... I got a reply from nik!!!
    Thanks for your reply.
    *1) I am confused: does the format command report only the local disks and not the external disks (disk array)? Is that correct?*
    2) We have a cluster of V890 + V890 + disk array (Sun StorageTek). The format output is below.
    *(Are 5, 6, 7 & 8 the disk array?)*
    root# format
    Searching for disks...done
    AVAILABLE DISK SELECTIONS:
    0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01924a4e1,0
    1. c1t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01a467d91,0
    2. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e019dd3151,0
    3. c1t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01a467321,0
    4. c1t4d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w500000e01922a001,0
    5. c2t600A0B8000482D560000031B482A46BCd0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482d560000031b482a46bc
    6. c2t600A0B8000482D5600000318482A46A6d0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482d5600000318482a46a6
    7. c2t600A0B8000482DBC00000659482B59D3d0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482dbc00000659482b59d3
    8. c2t600A0B8000482E200000049E482B599Ad0 <SUN-CSM200_R-0619 cyl 51838 alt 2 hd 512 sec 64>
    /scsi_vhci/ssd@g600a0b8000482e200000049e482b599a
    (End of output)
    *3) The cfgadm output is given below*
    root# cfgadm -al
    Ap_Id Type Receptacle Occupant Condition
    PCI0 unknown empty unconfigured unknown
    PCI1 unknown empty unconfigured unknown
    PCI2 unknown empty unconfigured unknown
    PCI3 unknown empty unconfigured unknown
    PCI4 scsi/hp connected configured ok
    PCI5 pci-pci/hp connected configured ok
    PCI6 pci-pci/hp connected configured ok
    PCI7 fibre/hp connected configured ok
    PCI8 fibre/hp connected configured ok
    c0 scsi-bus connected configured unknown
    c0::dsk/c0t0d0 CD-ROM connected configured unknown
    c1 fc-private connected configured unknown
    c1::500000e01922a001 disk connected configured unknown
    c1::500000e01924a4e1 disk connected configured unknown
    c1::500000e019dd3151 disk connected configured unknown
    c1::500000e01a467321 disk connected configured unknown
    c1::500000e01a467d91 disk connected configured unknown
    c1::508002000065adb9 ESI connected configured unknown
    c3 scsi-bus connected configured unknown
    c3::rmt/0 tape connected configured unknown
    c4 scsi-bus connected unconfigured unknown
    c5 scsi-bus connected unconfigured unknown
    c6 scsi-bus connected configured unknown
    c6::rmt/1 tape connected configured unknown
    c7 scsi-bus connected unconfigured unknown
    c8 scsi-bus connected unconfigured unknown
    c9 fc-private connected configured unknown
    c9::200600a0b8482dab disk connected configured unknown
    c10 fc-private connected configured unknown
    c10::200500a0b8482dbd disk connected configured unknown
    c11 fc-private connected configured unknown
    c11::200700a0b8482dab disk connected configured unknown
    c12 fc-private connected configured unknown
    c12::200400a0b8482dbd disk connected configured unknown
    usb0/1 unknown empty unconfigured ok
    usb0/2 unknown empty unconfigured ok
    usb0/3 unknown empty unconfigured ok
    usb0/4 unknown empty unconfigured ok
    *4) root# cfgadm -al -o show_FCP_dev*
    Ap_Id Type Receptacle Occupant Condition
    c1 fc-private connected configured unknown
    c1::500000e01922a001,0 disk connected configured unknown
    c1::500000e01924a4e1,0 disk connected configured unknown
    c1::500000e019dd3151,0 disk connected configured unknown
    c1::500000e01a467321,0 disk connected configured unknown
    c1::500000e01a467d91,0 disk connected configured unknown
    c1::508002000065adb9 ESI connected configured unknown
    c9 fc-private connected configured unknown
    c9::200600a0b8482dab,0 disk connected configured unknown
    c9::200600a0b8482dab,1 disk connected configured unknown
    c9::200600a0b8482dab,31 disk connected configured unknown
    c10 fc-private connected configured unknown
    c10::200500a0b8482dbd,0 disk connected configured unknown
    c10::200500a0b8482dbd,1 disk connected configured unknown
    c10::200500a0b8482dbd,31 disk connected configured unknown
    c11 fc-private connected configured unknown
    c11::200700a0b8482dab,0 disk connected configured unknown
    c11::200700a0b8482dab,1 disk connected configured unknown
    c11::200700a0b8482dab,31 disk connected configured unknown
    c12 fc-private connected configured unknown
    c12::200400a0b8482dbd,0 disk connected configured unknown
    c12::200400a0b8482dbd,1 disk connected configured unknown
    c12::200400a0b8482dbd,31 disk connected configured unknown
    Which part of the output indicates the disk array? Please help me to understand.
    FYI
    root@# more /etc/vfstab
    #device device mount FS fsck mount mount
    #to mount to fsck point type pass at boot options
    #/dev/dsk/c1d0s2 /dev/rdsk/c1d0s2 /usr ufs 1 yes -
    fd - /dev/fd fd - no -
    /proc - /proc proc - no -
    /dev/md/dsk/d103 - - swap - no -
    /dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs 1 no nologging
    /dev/md/dsk/d115 /dev/md/rdsk/d115 /global/.devices/node@1 ufs 2 no global
    swap - /tmp tmpfs - yes -
    /dev/vx/dsk/ossdg/exporthome /dev/vx/rdsk/ossdg/exporthome /export/home ufs 2 no logging
    /devices - /devices devfs - no -
    ctfs - /system/contract ctfs - no -
    objfs - /system/object objfs - no -
    sharefs - /etc/dfs/sharetab sharefs - no -

  • Libproxy SP12 behind a firewall fails to connect to WebLogic cluster

    Two WLS 5.1 SP11 servers in a cluster, iPlanet 4.1 with the wlproxy plugin, on a Solaris box. There is a firewall between the iPlanet web server and the WebLogic cluster. The NSAPI proxy plugin (libproxy.so) connects successfully to one of the WebLogic cluster members the first time, using the NAT IPs of the cluster machines maintained in obj.conf. The response contains the actual IP of the WebLogic cluster machine, which gets updated in the proxy; the proxy then uses that IP to connect to the cluster, and the firewall blocks it because it is not the NAT IP. This works fine with the libproxy of WebLogic 6.1, but unfortunately that one has a problem when the data passed is more than 1000 characters.

    Shouldn't the connect string look like jdbc:oracle:thin:@MyIP:1521:MySID?
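    For reference, that is the standard Oracle thin JDBC URL format. A minimal hedged sketch (host, port, SID, and credentials below are placeholders, and the Oracle JDBC driver jar is assumed to be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ThinConnectExample {
        public static void main(String[] args) throws SQLException {
            // Thin-driver format: jdbc:oracle:thin:@host:port:SID (placeholder values below)
            String url = "jdbc:oracle:thin:@MyIP:1521:MySID";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }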

  • "Failed to connect to the service manager" when I try to add nodes to a cluster on Windows Server 2008 R2

    Hello,
    I get the following error message every time I try to add a node to an existing cluster: "Failed to connect to the service manager".
    I'm running Windows Server 2008 R2,
    Any ideas?

    Hi saeedawadx,
    Please run the cluster validation and post the error or warning information. In the normal scenario, the "Failed to connect to the service manager" issue is often caused by the firewall
    or AV software blocking the other nodes' connections; please try disabling the firewall and AV software, then try again.
    The following related articles give more helpful tips:
    The case of the server who couldn’t join a cluster – operation returned because the timeout period expired
    http://blogs.technet.com/b/roplatforms/archive/2010/04/28/the-case-of-the-server-who-couldn-t-join-a-cluster-operation-returned-because-the-timeout-period-expired.aspx
    Trouble Connecting to Cluster Nodes? Check WMI!
    http://blogs.msdn.com/b/clustering/archive/2010/11/23/10095621.aspx
    I’m glad to be of help to you!
    Please remember to mark the replies as answers if they help and unmark them if they provide no help. If you have feedback for TechNet Support, contact [email protected]

  • What is the best way to connect a firewall cluster to a VPC domain

    Hi All
    Can anyone help me decide on the best way to connect a firewall cluster to a VDC running in a pair of N7Ks that form a vPC domain?
    Can I configure a VLAN interface on each VDC and use HSRP?  I was planning on presenting one 10 Gb cable from each VDC to each firewall.  Would this work OK?  HSRP traffic will go across the vPC peer link, correct?
    thanks all

    No, but the one caveat is vPC orphan ports. If the vPC peer link between the Nexus switches fails for any reason, all the vPC ports on the vPC secondary switch will be forced down. So it's recommended to connect single-attached devices to the primary vPC switch so the connections stay up. But if you're OK with that, then I don't see any problems.
    You have a few options; one would be to run a separate link between your Nexus switches for non-vPC VLANs. These VLANs would not be allowed over the vPC peer link, or forwarded out vPCs.
    See page 49 here:
    http://www.cisco.com/c/dam/en/us/td/docs/switches/datacenter/sw/design/vpc_design/vpc_best_practices_design_guide.pdf

  • WebLogic managed servers connecting to servers in a different cluster

              Hi All,
              We have a weird problem that has been going on for a while. We have a cluster configuration
              with an admin server and two managed servers. We have a similar configuration
              in DEV, TEST, and PROD. The problem is that the managed servers in the DEV cluster
              are making connections to managed servers that are members of the PROD cluster for
              session replication. In the same way, TEST servers are trying to connect to PROD and
              DEV.
              Has anyone seen this kind of problem before? BEA seems to be clueless so far.
              Thanks in advance for your input.
              Udit
              

              Venkat,
              That's a good suggestion, but these things are too obvious to have been missed. We have different
              multicast addresses in DEV and PROD, and the hosts are on different subnets. I do
              not know if the cluster name will make any difference though.
              Thanks for your input anyway,
              Udit
              "venkat" <[email protected]> wrote:
              >
              >Udit,
              > You can check the sub net, multicast address and the cluster name.
              >If the dev
              >and prod servers are in the same sub net with same multicast address,
              >then change
              >the multicast and try.
              >
              >Venkat
              >"venkat" <[email protected]> wrote:
              >>
              >>Udit,
              >>
              >>
              >>"Udit Singh" <[email protected]> wrote:
              >>>
              >>>Kumar,
              >>>Thanks for the reply.
              >>>The situation is that managed server in DEV try to replicate the session
              >>>to a
              >>>managed server in PROD and TEST and vice versa.
              >>>Let us say our dev managed servers are running on abc01 and abc02 and
              >>>prod managed
              >>>servers are running on xyz01 and xyz02. All the managed servers are
              >>running
              >>>on
              >>>port 7005.
              >>>If I do the netstat on abc01 or abc02 I could the see established connections
              >>>between abc01/02 and xyz01/02.
              >>>Why is that happening? We are running 6.1SP2.
              >>>
              >>>Udit
              >>>
              >>>Kumar Allamraju <[email protected]> wrote:
              >>>>We do not restrict intercluster communication as of 61 SP3.
              >>>>Once we get the IP from the cookie, we can safely make a
              >>>>connection to the other clustered node. We were not checking
              >>>>if the server is part of the same cluster or not. This is
              >>>>already fixed in 7.x and 61 SP4(not yet released) If you are
              >>>>on 61 Sp2 or SP3 then you should contact support and
              >>>>reference CR # CR089798 to get a one off patch.
              >>>>
              >>>>Regardless, are you traversing from DEV to PROD cluster and
              >>>>vice-versa. If not then this problem shouldn't happen unless
              >>>>plugin is routing the request to wrong cluster.
              >>>>
              >>>>--
              >>>>Kumar
              >>>>
              >>>>Udit Singh wrote:
              >>>>> Hi All,
              >>>>> We have a weired problem going on for a while. We have a cluster
              >>configuration
              >>>>> with an admin server and two managed servers. We have the similar
              >>>configuration
              >>>>> in DEV, TEST and PROD. The problem is that the managed server members
              >>>>in DEV cluster
              >>>>> are making connections to managed servers which are member of PROD
              >>>>cluster for
              >>>>> session replication. The same way TEST servers are trying to connect
              >>>>to PROD and
              >>>>> DEV.
              >>>>> Has anyone seen this kind of problem before. BEA seems to be cluless
              >>>>so far.
              >>>>>
              >>>>> Thanks in adavnce for your input.
              >>>>> Udit
              >>>>
              >>>
              >>
              >
              

  • Essbase Cluster - Lease manager is not connecting to the DB

    Hi All,
    This is a tough one for the Gurus :)
    We are deploying EPM 11.1.2.1, all Windows except for Essbase.
    Essbase is a clustered solution running on RHEL 5.3 with a shared repository for the ARBORPATH using OCFS2 as the filesystem. I am also using Oracle RAC 11g as the database.
    I am using opmnctl to start Essbase, but StartEssbase.sh ends up with the same errors.
    Now, I've deployed this sort of solution many, many times and never encountered this error before:
    leasemanager.log:
    [ESSBASE0] [LM-39] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Lease Database Connection Information:
    [ESSBASE0] [LM-41] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Odbc Driver [DataDirect 6.0 Oracle Wire Protocol]
    [ESSBASE0] [LM-40] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Host [ora-grid1] and port [1521]
    [ESSBASE0] [LM-42] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] Service Name [prodepm]
    [ESSBASE0] [LM-44] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:-13298064] User [HSS]
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-9] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12516: TNS:listener could not find available handler with matching protocol stack].
    [ESSBASE0] [LM-9] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Attempt to connect to database failed with error [[DataDirect][ODBC Oracle Wire Protocol driver][Oracle]TNS-12505: TNS:listener could not resolve SID given in connect descriptor].
    [ESSBASE0] [LM-1] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Failed to acquire the lease after [6] consecutive attempts.
    [ESSBASE0] [LM-16] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Lease is being surrendered. [ESSBASE0] [LM-11] [NOTIFICATION] [16][] [ecid:1347916530705,0] [tid:1090230592] Preparing to shutdown abort.
    [ESSBASE0] [LM-12] [ERROR] [32][] [ecid:1347916530705,0] [tid:1090230592] Terminating the process.
    It is obviously not connecting to the database to determine which node is the active one, but the reason why is still a mystery.
    The DB connection info is OK; I tested it manually and it works like a charm.
    These are the rest of my logs... just in case:
    Essbase.log:
    [Mon Sep 17 22:15:28 2012]Local/ESSBASE0///47588224341616/Info(1051283)
    Retrieving License Information Please Wait...
    [Mon Sep 17 22:15:28 2012]Local/ESSBASE0///47588224341616/Info(1051286)
    License information retrieved.
    [Mon Sep 17 22:15:28 2012]Local/ESSBASE0///47588224341616/Info(1051216)
    JVM Started Successfully !
    [Mon Sep 17 22:15:30 2012]Local/ESSBASE0///47588224341616/Info(1051199)
    Single Sign-On Initialization Succeeded !
    [Mon Sep 17 22:15:30 2012]Local/ESSBASE0///47588224341616/Info(1051232)
    Using English_UnitedStates.Latin1@Binary as the Essbase Locale
    [Mon Sep 17 22:15:30 2012]Local/ESSBASE0///47588224341616/Info(1056797)
    Incremental security backup started by SYSTEM. The file created is [u01/EssbaseServer/essbaseserver1/bin/essbasets_1347916530.bak]
    Essbase_ODL.log
    [2012-09-17T22:15:28.35-21:15] [ESSBASE0] [AGENT-1283] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] Retrieving License Information Please Wait...
    [2012-09-17T22:15:28.35-21:15] [ESSBASE0] [AGENT-1286] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] License information retrieved.
    [2012-09-17T22:15:28.109-21:15] [ESSBASE0] [AGENT-1216] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] JVM Started Successfully !
    [2012-09-17T22:15:30.468-21:15] [ESSBASE0] [AGENT-1199] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] Single Sign-On Initialization Succeeded !
    [2012-09-17T22:15:30.468-21:15] [ESSBASE0] [NET-17] [ERROR] [16][] [ecid:1347916528494,0] [tid:-13298064] Host Name Not Available
    [2012-09-17T22:15:30.513-21:15] [ESSBASE0] [AGENT-1232] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064]
    Using English_UnitedStates.Latin1@Binary as the Essbase Locale
    [2012-09-17T22:15:30.513-21:15] [ESSBASE0] [NET-17] [ERROR] [16][] [ecid:1347916528494,0] [tid:-13298064] Host Name Not Available
    [2012-09-17T22:15:30.782-21:15] [ESSBASE0] [AGENT-6797] [NOTIFICATION] [16][] [ecid:1347916528494,0] [tid:-13298064] Incremental security backup started by SYSTEM. The file created is [u01/EssbaseServer/essbaseserver1/bin/essbasets_1347916530.bak]
    I have a SEV1 open with Oracle Support and it has been escalated to Development.
    Any help is welcome...
    Thank you all!

    Hi All,
    Problem solved...
    The problem, as you know, was a connection issue from OPMN to the Oracle DB: OPMN tries to connect to the DB (lease manager) to establish which node is the active one, and since the connection wasn't happening, all nodes killed themselves as the default action.
    After long troubleshooting with TCP/IP and SQL*Net traces we realized that the requests from OPMN were going out but the responses from the DB were not coming back, and the reason why is how the DataDirect driver works. All EPM apps use a JDBC driver, which is a little bit smarter: once you connect it to the SCAN VIP it resolves everything automatically, even though we were using an alias for the SCAN VIP (named oracle-grid). The DataDirect driver is not that smart and needs to be connected to the actual SCAN name configured in the Oracle RAC, so while the request was going out OK using the alias name (oracle-grid), the response was not coming back, because the driver needs to use the same name as the SCAN (scan.hostname.com).
    Once the alias was changed to the SCAN name, connections were happening and Essbase started in failover mode.
    Thank you all for your suggestions!
    Cheers,
    Pablo.-
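    To illustrate Pablo's point about using the actual SCAN name rather than an alias, here is a hedged Java/JDBC sketch. It is only an analogy: the lease manager actually connects through the DataDirect ODBC driver, and the host, service name, and credentials below are just the placeholders that appear in the thread.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ScanConnectSketch {
        public static void main(String[] args) throws SQLException {
            // Alias for the SCAN VIP (e.g. ora-grid1) was what the failing setup used:
            // String url = "jdbc:oracle:thin:@//ora-grid1:1521/prodepm";

            // Actual SCAN name configured in the RAC (placeholder from the thread):
            String url = "jdbc:oracle:thin:@//scan.hostname.com:1521/prodepm";

            // "HSS" is the user shown in the lease manager log; the password is a placeholder.
            try (Connection conn = DriverManager.getConnection(url, "HSS", "password")) {
                System.out.println("Connected via SCAN: " + !conn.isClosed());
            }
        }
    }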

  • Hyper-V could not replicate changes for virtual machine 'machinename': The connection with the server was terminated abnormally (0x00002EFE).

    I have a 3 node cluster that has replica setup to replicate to another cluster off-site.
    Suddenly one of the servers is not replicating with the error:
    Hyper-V could not replicate changes for virtual machine 'machinename': The connection with the server was terminated abnormally (0x00002EFE). (Virtual Machine ID CC0FD4CC-F9B7-4C68-ABE8-B7D52A87899F)
    All other servers are replicating fine so there cannot be a permissions or connectivity issue between the 2 clusters.
    This server has 2TB of data so I'd rather not have to start the replication again.
    Does anyone have any pointers?
    Thanks.

    Hi drensta,
    Based on my knowledge, the "Hyper-V Replica Broker" is needed for failover cluster replica.
    Here is a link for "Why is the 'Hyper-V Replica Broker' required?":
    http://blogs.technet.com/b/virtualization/archive/2012/03/27/why-is-the-quot-hyper-v-replica-broker-quot-required.aspx
    Hope this helps.
    Best Regards
    Elton Ji

  • The cluster does not fail over when I shut down one managed server?

    Hello, I created one cluster with two managed servers and deployed an application across the cluster, but WebLogic Server gave me two URLs with two different ports to access this application:
    http://server1:7003/App_name
    http://server1:7005/App_name
    When I shut down one managed server (shutdown immediate), I lose the connection to the application on that managed server. My question is: why don't the failover and the load balancing work?
    Why two different addresses?
    Thanks for any help

    Well, you have two different addresses (URLs) because those are two physical managed servers. By creating a cluster you do not automatically get a virtual address (URL) that will load-balance requests for that application between those two managed servers.
    If you want one URL to access this application, you will have to put some kind of web server in front of your WebLogic. You can install and configure Oracle HTTP Server to route requests to the WebLogic cluster. Refer to this:
    http://download.oracle.com/docs/cd/E12839_01/web.1111/e10144/intro_ohs.htm#i1008837
    And this for details on how to configure mod_wl_ohs to route requests from OHS to WLS:
    http://download.oracle.com/docs/cd/E12839_01/web.1111/e10144/under_mods.htm#BABGCGHJ
    Hope this helps.
    Thanks
    Shail

  • Issue with LCM while migrating a Planning application in the cluster environment

    Hi,
    Having issues with LCM while migrating the Planning application in the cluster environment. In LCM we get the error below, and the application is up and running. Please let me know if anyone else has faced the same issue before in a cluster environment. We have done the migration using LCM on a single server and it works fine. It is just that the cluster environment is an issue.
    Error on Shared Service screen:
    Post execution failed for - WebPlugin.importArtifacts.doImport. Unable to connect to "ApplicationName", ensure that the application is up and running.
    Error on network:
    “java.net.SocketTimeoutException: Read timed out”
    ERROR - Zip error. The exception is -
    java.net.SocketException: Connection reset by peer: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)

    Hi,
    First of all, if your source and target environments are the same, then you will have all the users and groups in Shared Services; in that case you just have to provision the users for this new application so that your security gets migrated when you migrate from the source application. If the environments are different, then you have to migrate the users and groups first and provision them before importing the security using LCM.
    Coming back to the process of importing the artifacts into the target application using LCM: you have to place the migrated file in the @admin native directory in Oracle/Middleware/epmsystem1.
    Open the Shared Services console -> File System and you will see your file name under that.
    Select the file and you will see all your exported artifacts. Select all of them if you want to do a complete migration to the target.
    Follow the steps, select the target application to which you want to migrate, and execute the migration.
    Open the application and you will see all your artifacts migrated to the target.
    If you face any error during the migration it will be shown in the migration report.
    Thanks,
    Sourabh

  • "Service Cluster left the cluster" - lost all my data

    My four storage-enabled cluster nodes lost all their cached data when all services left the cluster in response to some issue(?). Is that the expected behavior? Is the correct procedure to transactionally store to disk so you can reload when this happens, or should this simply never happen? It seems like this should not happen. These four nodes are on the same server. At about 12:31 everything goes pear-shaped.
    2011-01-14 12:31:16.904/50004.436 Oracle Coherence GE 3.6.0.0 <Error> (thread=Cluster, member=3): This senior Member(Id=3, Timestamp=2011-01-13 22:37:52.106, Address=192.168.3.20:8088, MachineId=27412, Location=machine:amd4,process:4428,member:Administrator, Role=CoherenceServer) appears to have been disconnected from other nodes due to a long period of inactivity and the seniority has been assumed by the Member(Id=9, Timestamp=2011-01-13 22:38:01.438, Address=192.168.3.20:8094, MachineId=27412, Location=machine:amd4,process:3904,member:Administrator, Role=CoherenceServer); stopping cluster service.
    2011-01-14 12:31:16.905/50004.437 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=3): Service Cluster left the cluster
    2011-01-14 12:31:16.906/50004.438 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedStatsCacheService, member=3): Service DistributedStatsCacheService left the cluster
    2011-01-14 12:31:16.906/50004.438 Oracle Coherence GE 3.6.0.0 <D5> (thread=Proxy:ExtendTcpProxyService, member=3): Service ExtendTcpProxyService left the cluster
    2011-01-14 12:31:16.907/50004.439 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedQuotesCacheService, member=3): Service DistributedQuotesCacheService left the cluster
    2011-01-14 12:31:16.913/50004.445 Oracle Coherence GE 3.6.0.0 <D5> (thread=Invocation:Management, member=3): Service Management left the cluster
    2011-01-14 12:31:16.913/50004.445 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedOrdersService, member=3): Service DistributedOrdersService left the cluster
    2011-01-14 12:31:16.913/50004.445 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedCacheService, member=3): Service DistributedCacheService left the cluster
    2011-01-14 12:31:16.914/50004.446 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=214992652, Open=false)
    2011-01-14 12:31:16.914/50004.446 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=8305999, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1383343339, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: TcpConnection(Id=0x0000012D84061C15C0A803149CF3279B334BE6140AC76C47CA03670D76A96D22, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65480)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1003858188, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1586910282, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: TcpConnection(Id=0x0000012D84060E5AC0A8031442EA3CC26AC425D55D93A6AFC5404E5A76A96D1E, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65472)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor:TcpProcessor, member=3): Released: TcpConnection(Id=0x0000012D84061C15C0A803149CF3279B334BE6140AC76C47CA03670D76A96D22, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65480)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=160435953, Open=false)
    2011-01-14 12:31:16.915/50004.447 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor:TcpProcessor, member=3): Released: TcpConnection(Id=0x0000012D84060E5AC0A8031442EA3CC26AC425D55D93A6AFC5404E5A76A96D1E, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65472)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: Channel(Id=1635893341, Open=false)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor, member=3): Closed: TcpConnection(Id=0x0000012D84061203C0A8031455CD3A790F6009CA79AEC8BACC464D9976A96D20, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65478)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D6> (thread=Proxy:ExtendTcpProxyService:TcpAcceptor:TcpProcessor, member=3): Released: TcpConnection(Id=0x0000012D84061203C0A8031455CD3A790F6009CA79AEC8BACC464D9976A96D20, Open=false, LocalAddress=192.168.3.20:9091, RemoteAddress=192.168.3.6:65478)
    2011-01-14 12:31:16.916/50004.448 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedExecutionsService, member=3): Service DistributedExecutionsService left the cluster
    2011-01-14 12:31:16.919/50004.451 Oracle Coherence GE 3.6.0.0 <D5> (thread=DistributedCache:DistributedPositionsCacheService, member=3): Service DistributedPositionsCacheService left the cluster
    and ...
    2011-01-14 12:31:22.874/50006.273 Oracle Coherence GE 3.6.0.0 <Info> (thread=main, member=n/a): Restarting cluster
    2011-01-14 12:31:22.924/50006.323 Oracle Coherence GE 3.6.0.0 <D4> (thread=main, member=n/a): TCMP bound to /192.168.3.20:8094 using SystemSocketProvider
    2011-01-14 12:31:52.937/50036.336 Oracle Coherence GE 3.6.0.0 <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2011-01-14 12:31:22.924, Address=192.168.3.20:8094, MachineId=27412, Location=machine:amd4,process:4136,member:Administrator, Role=CoherenceServer) has been attempting to join the cluster at address 225.0.0.1:54321 with TTL 4 for 30 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
    2011-01-14 12:31:52.950/50036.349 Oracle Coherence GE 3.6.0.0 <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster that does not respond to join requests; this is usually caused by a network layer failure
    Logs starting at 12:30 from the four nodes are here:
    http://www.nmedia.net/~andrew/logs/1.log
    http://www.nmedia.net/~andrew/logs/2.log
    http://www.nmedia.net/~andrew/logs/3.log
    http://www.nmedia.net/~andrew/logs/4.log
    If someone could tell me if this is a bug in the cluster re-join logic or something I screwed up that would be great. Thanks!
    Andrew

    Hi Andrew
    I had a quick look at your logs but cannot say for certain why your cluster died. I can say that losing data is a normal consequence of node loss, though. If you have the backup count set to 1 then you can lose a single node without losing data. If you lose more than one node (on different machines, or on the same machine if you only have one) over a very short space of time, then you will almost certainly lose at least one partition and hence lose the data within that partition.
    Going back to your logs, it is difficult to determine the underlying cause without the whole set of logs. You have posted links to four logs, but from looking at them the cluster has about 16 nodes. I know from experience (as we had a cluster that was quite unstable for a while) that tracing these issues through the logs can be a bit awkward, but you soon get the hang of it :-)
    For example, in the log http://www.nmedia.net/~andrew/logs/1.log you have...
    2011-01-14 12:31:16.807/49993.331 Oracle Coherence GE 3.6.0.0 <D5> (thread=Cluster, member=9): MemberLeft notification for Member(Id=3, Timestamp=2011-01-13 22:37:52.106, Address=192.168.3.20:8088, MachineId=27412, Location=machine:amd4,process:4428,member:Administrator, Role=CoherenceServer, PublisherSuccessRate=0.9975, ReceiverSuccessRate=0.9999, PauseRate=0.0, Threshold=93, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, ReadyPackets=0, LastIn=261ms, LastOut=277ms, LastSlow=n/a) received from Member(Id=22, Timestamp=2011-01-14 08:21:22.284, Address=192.168.3.121:8092, MachineId=27513, Location=machine:H1,process:3716,member:Howard, Role=Order_entry_window, PublisherSuccessRate=0.8326, ReceiverSuccessRate=1.0, PauseRate=0.0024, Threshold=1456, Paused=false, Deferring=false, OutstandingPackets=0, DeferredPackets=0, ReadyPackets=0, LastIn=0ms, LastOut=8ms, LastSlow=n/a)
    ...which is Member-9 receiving a message about the departure of Member-3 from Member-22, so you would then need to look at the logs for Member-22 to see why it thought Member-3 had departed, and also look at the logs for Member-3 around that time to see what might be wrong with it.
    The more worrying messages would be these...
    2011-01-14 12:31:16.709/49993.233 Oracle Coherence GE 3.6.0.0 <Warning> (thread=PacketPublisher, member=9): Experienced a 19025 ms communication delay (probable remote GC) with Member(Id=21, Timestamp=2011-01-14 08:21:12.174, Address=192.168.3.121:8090, MachineId=27513, Location=machine:H1,process:4316,member:Howard, Role=OrderbookviewerViewer); 111 packets rescheduled, PauseRate=0.0014, Threshold=1696
    ...a 19-second delay is a long time and would suggest either very long GC pauses or a network problem. Do you have GC logs for these processes? Are all the servers connected to the same switch, or is the cluster distributed over more than one part of your network? Do you have too much on one machine, are you overloading the NIC, are you swapping? All of these can cause delays and/or loss of packets.
    We have had problems with storage-disabled nodes doing long GC pauses and causing storage nodes to drop out of the cluster. Our cluster was on 3.5.3-p8, whereas you are on 3.6.0.0, which is supposed to have better node death detection, so you might not have the same issues we had.
    Sorry not to be more help,
    JK
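    As a rough illustration of the backup-count point above, here is a small Coherence client sketch. The cache name is made up, and it assumes the com.tangosol.net.PartitionedService API exposes the configured backup and partition counts; the backup count itself is normally set with the <backup-count> element of the distributed scheme in the cache configuration.

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.net.PartitionedService;

    public class BackupCountCheck {
        public static void main(String[] args) {
            // "DistributedQuotesCache" is a made-up cache name for illustration.
            NamedCache cache = CacheFactory.getCache("DistributedQuotesCache");
            PartitionedService service = (PartitionedService) cache.getCacheService();
            // With a backup count of 1, losing a single node (or a single machine,
            // when backups are machine-safe) should not lose data; losing more nodes
            // than there are backups in a short window can lose partitions.
            System.out.println("Backup count:    " + service.getBackupCount());
            System.out.println("Partition count: " + service.getPartitionCount());
            CacheFactory.shutdown();
        }
    }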

  • Can I use one transport adapter on the nodes of the cluster?

    Hi
    I am new to Sun Cluster. In the cluster documentation they mention that each node should have two network cards: one for public connections and one for the private connection. What if I do not want the nodes to have public connections, except for one node? In other words, I want to use one network card on each node except for the first node in the cluster; users can access the rest of the nodes through the first node. Is that possible? If yes, what should the name of the second transport adapter be while installing the cluster software on the nodes?
    Thank You for the help

    Dear,
    We use a cluster for HA in failover conditions. If you have only one network adapter, how would failover work? And you can't assign one adapter to two nodes at the same time; you need a minimum of two network adapters for a two-node cluster.
    :) Good luck
    Mohammed Tanvir

  • [SOLVED] Can't add a node to the cluster with error (Exchange 2010 SP3 DAG Windows Server 2012)

    Hi there!
    I have a problem which makes me very angry already :)
    I have two Exchange 2010 SP3 servers with the MB role running on Windows Server 2012. I decided to create a DAG.
    I have created the prestaged AD object for the cluster called msc-co-exc-01c, assigned the necessary permissions, and disabled it. I allowed the traffic between nodes through the Windows Firewall and prepared the File Share Witness server.
    Then I tried to add the nodes. The first node was added successfully, but the second node doesn't want to be added :). Now I can add only one node to the DAG. I tried adding different servers first, but only the first one was added.
    Logs on the second node:
    Application Log
    "Failed to initialize cluster with error 0x80004005." (MSExchangeIS)
    Failover Clustering Diagnostic Log
    "[VER] Could not read version data from database for node msc-co-exc-04v (id 1)."
    CMDLET Error:
    Summary: 1 item(s). 0 succeeded, 1 failed.
    Elapsed time: 00:06:21
    MSC-CO-EXC-02V
    Failed
    Error:
    A database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode()
    (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed. [Server: msc-co-exc-04v.int.krls.ru]
    An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"AddClusterNode() (MaxPercentage=100) failed with 0x5b4. Error: This operation returned because the timeout period expired"' failed..
    This operation returned because the timeout period expired
    Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.141).aspx?v=14.3.174.1&t=exchgf1&e=ms.exch.err.ExC9C315
    Warning:
    Network name 'msc-co-exc-01c' is not online. Please check that the IP address configuration for the database availability group is correct.
    Warning:
    The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2014-11-17_13-54-56.543_add-databaseavailabiltygroupserver.log".
    Exchange Management Shell command attempted:
    Add-DatabaseAvailabilityGroupServer -MailboxServer 'MSC-CO-EXC-02V' -Identity 'msc-co-exc-01c'
    Elapsed Time: 00:06:21
    UPD:
    When the Exchange servers run on the same Hyper-V node, the DAG works well, but if I move one of the VMs to another node, it stops working.
    I have installed Wireshark and captured traffic on the cluster interface. When the DAG members are on the same HV node, there is inbound and outbound traffic on the cluster interface, but if I move one of the DAG members to another node, in Wireshark I see only outbound traffic
    on both nodes.
    This confuses me, because there is normal connectivity between these DAG members through the main interface.
    Please, help me if you can.

    Hi, Jared! Thank you for the reply.
    Of course I did that already :) I have new info:
    When the Exchange servers run on the same Hyper-V node, the DAG works well, but if I move one of the VMs to another node, it stops working.
    I have installed Wireshark and captured traffic on the cluster interface. When the DAG members are on the same HV node, there is inbound and outbound traffic on the cluster interface, but if I move one of the DAG members to another node, in Wireshark I see only outbound traffic
    on both nodes.
    This confuses me, because there is normal connectivity between these DAG members through the main interface.

  • DPM failing SQL backups due to error: "the SQL Server instance refused a connection to the protection agent. (ID 30172 Details: Internal error code: 0x80990F85)

    I ran across this error starting on 6/4/2011 and have been unable to find the root of the problem.  In our environment, we have a DPM 2010 server dedicated to backing up our entire SQL environment (about 45 SQL Servers total).  All of the SQL
    environment is backing up fine except for one SQL cluster application.  This particular SQL instance is part of a 6-node failover cluster with 6 SQL instances distributed amongst them.  The other 5 SQL instances in the cluster are backing
    up fine; only one instance is failing.  The DPM Alerts section shows this error when attempting a SQL backup of one of the databases on this SQL instance:
    Affected area: KEN-PROD-VDB001\POSREPL1\master
    Occurred since: 6/11/2011 11:00:56 PM
    Description: Recovery point creation jobs for SQL Server 2008 database KEN-PROD-VDB001\POSREPL1\master on SQL Server (POSREPL1) - Store Settings.ken-prod-cl004.aarons.aaronrents.com have been failing. The number of failed recovery point creation jobs =
    4.
     If the datasource protected is SharePoint, then click on the Error Details to view the list of databases for which recovery point creation failed. (ID 3114)
     The DPM job failed for SQL Server 2008 database KEN-PROD-VDB001\POSREPL1\master on SQL Server (POSREPL1) - Store Settings.ken-prod-cl004.aarons.aaronrents.com because the SQL Server instance refused a connection to the protection agent. (ID 30172 Details:
    Internal error code: 0x80990F85)
     More information
    Recommended action: This can happen if the SQL Server process is overloaded, or running short of memory. Please ensure that you are able to successfully run transactions against the SQL database in question and then retry the failed job.
     Create a recovery point...
    Resolution: To dismiss the alert, click below
     Inactivate alert
    I have checked the cluster node this particular SQL instance is running on using Perfmon, and the machine is nowhere near capacity on CPU, memory, network, or disk I/O.  I have failed this SQL application over to another node in the cluster and
    receive the same error (that other node has another clustered SQL application on it that is actively running as well as backing up fine).  The only thing that I am aware of that has changed is that we installed SP2 for SQL 2008 about 2 weeks prior
    to when the failures started to occur.  However, we updated all six clustered SQL instances at the same time and only this one is having the issue, so I don't believe that caused the problem.  We are running SQL 2008 SP2 (version 10.0.4000.0)
    on all clustered instances along with DPM 2010 (version 3.0.7696.0) on the particular DPM server that has the issue.
    One last thing: I have also noticed errors in the event log pertaining to the same SQL backups that are failing (but the timestamps do not coincide with each backup attempt):
    Log Name:      Application
    Source:        MSDPM
    Date:          6/13/2011 1:09:12 AM
    Event ID:      4223
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      KEN-PROD-BS002.aarons.aaronrents.com
    Description:
    The description for Event ID 4223 from source MSDPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
    If the event originated on another computer, the display information had to be saved with the event.
    The following information was included with the event:
    DPM writer was unable to snapshot the replica of KEN-PROD-VDB001\POSREPL1\model. This may be due to:
    1) No valid recovery points present on the replica.
    2) Failure of the last express full backup job for the datasource.
    3) Failure while deleting the invalid incremental recovery points on the replica.
    Problem Details:
    <DpmWriterEvent><__System><ID>30</ID><Seq>1833</Seq><TimeCreated>6/13/2011 5:09:12 AM</TimeCreated><Source>f:\dpmv3_rtm\private\product\tapebackup\dpswriter\vssfunctionality.cpp</Source><Line>815</Line><HasError>True</HasError></__System><DetailedCode>-2147212300</DetailedCode></DpmWriterEvent>
    the message resource is present but the message is not found in the string/message table
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="MSDPM" />
        <EventID Qualifiers="0">4223</EventID>
        <Level>2</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2011-06-13T05:09:12.000000000Z" />
        <EventRecordID>68785</EventRecordID>
        <Channel>Application</Channel>
        <Computer>KEN-PROD-BS002.aarons.aaronrents.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data>DPM writer was unable to snapshot the replica of KEN-PROD-VDB001\POSREPL1\model. This may be due to:
    1) No valid recovery points present on the replica.
    2) Failure of the last express full backup job for the datasource.
    3) Failure while deleting the invalid incremental recovery points on the replica.
    Problem Details:
    &lt;DpmWriterEvent&gt;&lt;__System&gt;&lt;ID&gt;30&lt;/ID&gt;&lt;Seq&gt;1833&lt;/Seq&gt;&lt;TimeCreated&gt;6/13/2011 5:09:12 AM&lt;/TimeCreated&gt;&lt;Source&gt;f:\dpmv3_rtm\private\product\tapebackup\dpswriter\vssfunctionality.cpp&lt;/Source&gt;&lt;Line&gt;815&lt;/Line&gt;&lt;HasError&gt;True&lt;/HasError&gt;&lt;/__System&gt;&lt;DetailedCode&gt;-2147212300&lt;/DetailedCode&gt;&lt;/DpmWriterEvent&gt;
    </Data>
        <Binary>3C00440070006D005700720069007400650072004500760065006E0074003E003C005F005F00530079007300740065006D003E003C00490044003E00330030003C002F00490044003E003C005300650071003E0031003800330033003C002F005300650071003E003C00540069006D00650043007200650061007400650064003E0036002F00310033002F003200300031003100200035003A00300039003A0031003200200041004D003C002F00540069006D00650043007200650061007400650064003E003C0053006F0075007200630065003E0066003A005C00640070006D00760033005F00720074006D005C0070007200690076006100740065005C00700072006F0064007500630074005C0074006100700065006200610063006B00750070005C006400700073007700720069007400650072005C00760073007300660075006E006300740069006F006E0061006C006900740079002E006300700070003C002F0053006F0075007200630065003E003C004C0069006E0065003E003800310035003C002F004C0069006E0065003E003C004800610073004500720072006F0072003E0054007200750065003C002F004800610073004500720072006F0072003E003C002F005F005F00530079007300740065006D003E003C00440065007400610069006C006500640043006F00640065003E002D0032003100340037003200310032003300300030003C002F00440065007400610069006C006500640043006F00640065003E003C002F00440070006D005700720069007400650072004500760065006E0074003E00</Binary>
      </EventData>
    </Event>
    Any help would be greatly appreciated!

    Don't know if this helps or not, but I also noticed another peculiar issue that stems from this problem.  If I go to "Modify protection group", then expand the cluster, then expand all six nodes in the cluster, five of them show "All SQL Servers"
    and allow me to expand the SQL instance and show all databases; on the one that is having a problem backing up, when I expand the node, it doesn't even show that SQL exists on the node, when in fact it does.
    I would also like to add that the databases on this node that will not back up are running fine.  They run hundreds of transactions daily, so we know SQL itself is OK.  Even though it is a busy SQL Server, there are plenty of available resources, as
    the SQL buffer and memory counters show the node is not under duress.
