EJB clustering fail-over problem.

          I have set up a 2-server cluster, WL 5.1 patch 6 on NT. When the client makes calls to EJBs, you can see the two servers load balancing (round-robin). After I stop one server in the cluster, I get a NullPointerException at the client. What did I miss?
          Thanks in advance,
          java.lang.NullPointerException
          at weblogic.rjvm.RJVMFinder.isThisHost(RJVMFinder.java:340)
          at weblogic.rjvm.RJVMFinder.isHostedByLocalRJVM(RJVMFinder.java:314)
          at weblogic.rjvm.RJVMFinder.findOrCreate(RJVMFinder.java:151)
          at weblogic.rjvm.ServerURL.findOrCreateRJVM(ServerURL.java:200)
          at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:19
          at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:14
          at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:123)
          at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:671)
          at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:242)
          at javax.naming.InitialContext.init(InitialContext.java:218)
          at javax.naming.InitialContext.<init>(InitialContext.java:194)
          at com.avinamart.ApplicationServices.ServiceManager.getInitialContext(ServiceManager.java:66)
          at com.avinamart.ApplicationServices.ServiceManager._getApplicationService(ServiceManager.java:117)
          at com.avinamart.ApplicationServices.ServiceManager.getOrganizationService(ServiceManager.java:210)
          at com.avinamart.WebInterface.EPASSServicesReference.getOrganizationService(EPASSServicesReference.java:77
          at com.avinamart.WebInterface.EPASSReferences.getBusinessUserStruct(EPASSReferences.java:367)
          at com.avinamart.WebInterface.EPASSReferences.getUserPath(EPASSReferences.java:200)
          at jsp_servlet._en._detail_95_org._jspService(_detail_95_org.java:326)
          at weblogic.servlet.jsp.JspBase.service(JspBase.java:27)
          at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:105)
          at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:742)
          at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:686)
          at weblogic.servlet.internal.ServletContextManager.invokeServlet(ServletContextManager.java:247)
          at weblogic.socket.MuxableSocketHTTP.invokeServlet(MuxableSocketHTTP.java:361)
          at weblogic.socket.MuxableSocketHTTP.execute(MuxableSocketHTTP.java:261)
          at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
          

What's the URL for your lookup?
          Is it mapped to one server or multiple servers?
          - Prasad
          Tony Lu wrote:
          > I have set up a 2-server cluster, WL 5.1 patch 6 on NT. When the client makes calls to EJBs, you can see the two servers load balancing (round-robin). After I stop one server in the cluster, I get a NullPointerException at the client. What did I miss?
          >
          > Thanks in advance,
          >
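          One thing worth checking, following up on the question above about the lookup URL: whether the client's JNDI provider URL names only one of the two servers. If the URL lists all cluster members (or a DNS alias that maps to both), creating the initial context can try another server when the first one is stopped. This is a minimal sketch, not a definitive fix for the NullPointerException; the hostnames node1/node2 and port 7001 are hypothetical, and it assumes this WLS version accepts a comma-separated address list in the t3 URL:

          import java.util.Hashtable;
          import javax.naming.Context;
          import javax.naming.InitialContext;
          import javax.naming.NamingException;

          public class ClusterLookup {
              public static Context getInitialContext() throws NamingException {
                  Hashtable env = new Hashtable();
                  env.put(Context.INITIAL_CONTEXT_FACTORY,
                          "weblogic.jndi.WLInitialContextFactory");
                  // List every cluster member (node1/node2 are hypothetical) so the
                  // context factory is not pinned to a single, possibly stopped, server.
                  env.put(Context.PROVIDER_URL, "t3://node1:7001,node2:7001");
                  return new InitialContext(env);
              }
          }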
          

Similar Messages

  • Servlet fail-over problem

               I'm testing WebLogic clustering of servlets with in-memory replication on the Sun platform (wls 5.1 sp9), using the Apache plug-in.
               I did this test:
               - I configured a cluster of two servers
               - I simulated a hung condition in one of the two servers by filling all execution threads with servlets doing Thread.sleep()
               - I tried to launch a request to the cluster (a JSP request), but my request timed out after ConnectTimeoutSecs.
               Looking at the wlproxy.log, it seems that the cluster attempts to fail over to the secondary server (after HungServerRecoverSecs), but it doesn't respond; then it retries with the primary server, and so on (waiting HungServerRecoverSecs each time for a response), until the "ConnectTimeoutSecs" timeout is reached.
               This is very strange because the secondary server is not hung; if I launch a request directly at it (specifying host:port in the URL), it responds.
               I have also tried setting the parameter Idempotent to ON, even though the default is ON, but with no result.
               Can anyone help me?
              

               I solved the problem by setting the parameter weblogic.system.servletThreadCount in the cluster properties file.
               Now another problem has come up.
               When one server of the cluster is hung, the cluster fails over to the second server, but session information is lost.
               Can anyone help me?
              "Mike Reiche" <[email protected]> wrote:
               >
               >Don't take this as absolute gospel - it is just my understanding of how things work.
               >
               >Since the WL server is still alive, it will accept connections. This takes ConnectTimeOutSecs out of the picture.
               >
               >Now you're just left with HungRecoverSeconds. If the response takes longer than HungRecoverSeconds, then wlproxy will deem the request to have 'timed out'. If it is not Idempotent, that's it, you're done. If it is Idempotent, wlproxy will retry - on the other WL instance. From what you describe, the second one should work - unless of course the second WL is also backed up with Thread.sleep() - then after HungRecoverSeconds, the request will be resent to an available WL instance.
               >
              >"Lucia Giraldo" <[email protected]> wrote:
               >>
               >>I'm testing WebLogic clustering of servlets with in-memory-replication in Sun platform (wls 5.1 sp9) and using Apache plug-in.
               >>I did this test:
               >>- I configured a cluster of two servers
               >>- I simulate a situation of hang, in one of the two servers, filling all execution threads with servlets doing Thread.sleep()
               >>- I tried to launch a request to the cluster (a JSP request) but my request timed out after ConnectTimeoutSecs.
               >>Looking at the wlproxy.log it seems that the cluster attempts to failover to the secondary server (after HungServerRecoverSecs) but it doesn't respond, then it retries with the primary server and so on (waiting every time HungServerRecoverSecs for a response) until the timeout "ConnectTimeoutSecs" is reached.
               >>This is very strange because the secondary server is not hung; if I launch a request directly to it (specifying in the URL host:port) it responds to me.
               >>I have also tried to specify the parameter Idempotent ON, even if the default is ON, but with no result.
               >>Can anyone help me?
              >
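               A note on the follow-up about lost session data: with in-memory replication, one common cause is session attributes that are not serializable, since only attributes implementing java.io.Serializable can be copied to the secondary server, and an attribute generally needs to be rebound with setAttribute() after it changes so the replica is refreshed. A minimal sketch, where CartBean and the "cart" attribute name are hypothetical:

               import java.io.Serializable;
               import javax.servlet.http.HttpSession;

               // Hypothetical session attribute: it must be Serializable so the
               // clustered web container can replicate it to the secondary server.
               public class CartBean implements Serializable {
                   private int itemCount;

                   public void addItem() { itemCount++; }

                   public int getItemCount() { return itemCount; }
               }

               class CartHelper {
                   static void addToCart(HttpSession session) {
                       CartBean cart = (CartBean) session.getAttribute("cart");
                       if (cart == null) {
                           cart = new CartBean();
                       }
                       cart.addItem();
                       // Rebind after every change so the replica on the other node is updated.
                       session.setAttribute("cart", cart);
                   }
               }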
              

  • Database fail over problem after we change Concurrency Strategy

    Hi, we had Concurrency Strategy: Exclusive. Now we have changed it to Database for performance
    reasons. Since we changed it, when we do an Oracle database fail-over, WebLogic
    6.1 does not detect the database fail-over and needs to be rebooted.
    How can we resolve this?

    Hi,
    It is just failing on one of the application servers. The developer wrote that when installing the CI, the local hostname is written in the database and SDM. We will have to do a homogeneous system copy to change the name.
    The problem is that I used the virtual SAP group name for the CI and DI application servers; in SCS and ASCS we used virtual hostnames, and that is OK according to the SAP developer.
    The Start and instance profiles were checked and everything was fine; just the dispatcher from the CI is having problems when coming from Node B to Node A.
    Regards

  • WLS6.1sp1 stateful EJB problem: load-balancing and fail over

               I have three problems.
               1. I have 2 clustered servers. My weblogic-ejb-jar.xml is here:
              <?xml version="1.0"?>
              <!DOCTYPE weblogic-ejb-jar PUBLIC '-//BEA Systems, Inc.//DTD WebLogic 6.0.0 EJB//EN'
              'http://www.bea.com/servers/wls600/dtd/weblogic-ejb-jar.dtd'>
              <weblogic-ejb-jar>
              <weblogic-enterprise-bean>
                   <ejb-name>DBStatefulEJB</ejb-name>
                   <stateful-session-descriptor>
                   <stateful-session-cache>
                        <max-beans-in-cache>100</max-beans-in-cache>
                        <idle-timeout-seconds>120</idle-timeout-seconds>
                   </stateful-session-cache>
                   <stateful-session-clustering>
                        <home-is-clusterable>true</home-is-clusterable>
                        <home-load-algorithm>RoundRobin</home-load-algorithm>
                        <home-call-router-class-name>common.QARouter</home-call-router-class-name>
                        <replication-type>InMemory</replication-type>
                   </stateful-session-clustering>
                   </stateful-session-descriptor>
                   <jndi-name>com.daou.EJBS.solutions.DBStatefulBean</jndi-name>
              </weblogic-enterprise-bean>
              </weblogic-ejb-jar>
               When I use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>" and deploy this EJB, an exception occurs:
               <Warning> <Dispatcher> <RuntimeException thrown by rmi server: 'weblogic.rmi.cluster.ReplicaAwareServerRef@9 - jvmid: '2903098842594628659S:203.231.15.167:[5001,5001,5002,5002,5001,5002,-1]:mydomain:cluster1', oid: '9', implementation: 'weblogic.jndi.internal.RootNamingNode@5f39bc''
               java.lang.IllegalArgumentException: Failed to instantiate weblogic.rmi.cluster.BasicReplicaHandler due to java.lang.reflect.InvocationTargetException
               at weblogic.rmi.cluster.ReplicaAwareInfo.instantiate(ReplicaAwareInfo.java:185)
               at weblogic.rmi.cluster.ReplicaAwareInfo.getReplicaHandler(ReplicaAwareInfo.java:105)
               at weblogic.rmi.cluster.ReplicaAwareRemoteRef.initialize(ReplicaAwareRemoteRef.java:79)
               at weblogic.rmi.cluster.ClusterableRemoteRef.initialize(ClusterableRemoteRef.java:28)
               at weblogic.rmi.cluster.ClusterableRemoteObject.initializeRef(ClusterableRemoteObject.java:255)
               at weblogic.rmi.cluster.ClusterableRemoteObject.onBind(ClusterableRemoteObject.java:149)
               at weblogic.jndi.internal.BasicNamingNode.rebindHere(BasicNamingNode.java:392)
               at weblogic.jndi.internal.ServerNamingNode.rebindHere(ServerNamingNode.java:142)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:362)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
               at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:296)
               So do I have to use it or not?
               2. When I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>", there's no exception, but load balancing does not happen. According to the documentation, load balancing should happen when I call the home.create() method.
               My client program goes like this:
                    DBStateful the_ejb1 = (DBStateful) PortableRemoteObject.narrow(home.create(), DBStateful.class);
                    DBStateful the_ejb2 = (DBStateful) PortableRemoteObject.narrow(home.create(3), DBStateful.class);
               The result is like this:
                    the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
                    the_ejb2 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@acf6e)/398
                    or
                    the_ejb1 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@252fdf)/380
                    the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
                    I think the result should be like the one below, shouldn't it?
                    the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
                    the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
               In this case I think the_ejb1 and the_ejb2 should have instances on different cluster servers, but they go to one server.
               3. If I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>" and "<replication-type>InMemory</replication-type>", then load balancing happens but there's no fail-over.
               So how can I get load-balancing and fail-over together?
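               A side note on question 1: the InvocationTargetException while instantiating BasicReplicaHandler often indicates that the container could not construct the class named in home-call-router-class-name. A call router generally needs a public no-argument constructor, has to be visible to every server in the cluster (for example, packaged with the EJB jar), and implements the weblogic.rmi.cluster.CallRouter interface. The following is a rough sketch, not the actual common.QARouter; the server names cluster1server1/cluster1server2 are hypothetical:

               import java.lang.reflect.Method;
               import weblogic.rmi.cluster.CallRouter;

               public class QARouter implements CallRouter {
                   // Hypothetical server names; they must match the names of the cluster members.
                   private static final String[] SERVER_A = { "cluster1server1" };
                   private static final String[] SERVER_B = { "cluster1server2" };

                   public String[] getServerList(Method m, Object[] params) {
                       // Route create(int) calls by parameter value; returning null falls back
                       // to the configured home-load-algorithm (RoundRobin here).
                       if (params != null && params.length > 0 && params[0] instanceof Integer) {
                           return ((Integer) params[0]).intValue() % 2 == 0 ? SERVER_A : SERVER_B;
                       }
                       return null;
                   }
               }

               If parameter-based routing is not actually needed, leaving home-call-router-class-name out and keeping home-is-clusterable, home-load-algorithm and replication-type InMemory is usually how load balancing of home.create() is combined with replication-based fail-over.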
              


  • Load-balancing and fail-over between web containers and EJB containers

    When web components and EJB components are run in different OC4J instances, can we achieve load-balancing and fail-over between web containers and EJB containers?

    It looks like there is clustering, but not load balancing, available for RMI from the rmi.xml configuration. The application will treat any EJBs on the cluster as one-to-one look-ups: Orion will go out and get the first EJB available on the cluster. See the docs on configuring rmi.xml (and also the note below).
    That is a kind of failover, because if machine A goes down, and the myotherAejbs.jar are on machine B too, Orion will go out and get the bean from machine B when it can't find machine A. But it doesn't go machine A then machine B for each remote instance of the bean. You could also specify the maximum number of instances of a bean, and as one machine gets "loaded", Orion would go to the next available machine... but that's not really load balancing.
    That is, you can set up your web apps with EJBs, but let all of the EJBs be remote="true" in the orion-application.xml file:
    <?xml version="1.0"?>
    <!DOCTYPE orion-application PUBLIC "-//Evermind//DTD J2EE Application runtime 1.2//EN" "http://www.orionserver.com/dtds/orion-application.dtd">
    <orion-application deployment-version="1.5.2">
    <ejb-module remote="true" path="myotherAejbs.jar" />
    <ejb-module remote="true" path="myotherBejbs.jar" />
    <ejb-module remote="true" path="myotherCejbs.jar" />
    <web-module id="mysite" path="mysite.war" />
    ... other stuff ...
    </orion-application>
    In the rmi.xml you would define your clustering:
    <cluster host="230.0.0.1" id="123" password="123abc" port="9127" username="cluster-user" />
    This tag is defined if the application is to be clustered. It is used to set up a local multicast cluster. A username and password used for the servers to intercommunicate also needs to be specified.
    host - The multicast host/IP to transmit and receive cluster data on. The default is 230.0.0.1.
    id - The id (number) of this cluster node to identify itself with in the cluster. The default is based on the local machine IP.
    password - The password configured for cluster access. Needs to match that of the other nodes in the cluster.
    port - The port to transmit and receive cluster data on. The default is 9127.
    username - The username configured for cluster access. Needs to match that of the other nodes in the cluster.

  • Configuring Fail Over Clustering

    I was assigned to configure the new DIA line for our data center for fail-over (backup).
    My existing line is - ISP ==} Router 2811 ==} Juniper SSG320 ==} Cisco Core Switch 6500
    For backup we got an additional line.
    How can I configure fail-over clustering for the new DIA lines?
    Please see my network diagram attached.
    Thank you,
    Michael

    Dear All,
    My problem is solved by disabling antivirus.
    thanks for the support
    Sunil
    SUNIL PATEL SYSTEM ADMINISTRATOR

  • Is it possible to add hyper-V fail over clustering afterwards?

    Hi,
    We are testing Windows 2012 R2 Hyper-V using only one stand-alone host, without failover clustering, with a few virtual machines. Is it possible to add failover clustering afterwards, add a second Hyper-V node and a shared disk, and move the virtual
    machines there, or do we have to install both nodes from scratch?
    ~ Jukka ~

    Hi Jukka,
    In addition, before you build a Hyper-V failover cluster, please refer to the requirements in the article below:
    http://technet.microsoft.com/en-us/library/jj863389.aspx
    Best Regards
    Elton Ji

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    in case at your site you :
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listener, their disks and so on (which are started only on one node at a time,
    and can fail over from that node and restart on another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had) because you might:
    - reach max number of diskgroups handled by an ASM instance (63 only, above which you get ORA-15068);
    - experience delays (especially in case of multipath), find fake CRS resources, etc.
    whenever you dismount disks from one node and mount to another;
    So (if both conditions are true) you might be interested in this story,
    then please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances access read / write the same disks from each node.
    So, ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are mounted also by other ASM instances on the other nodes.
    ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious that Oracle strongly recommends to put OCR and voting disks
    inside ASM: this (usually called CRS_DATA) will become diskgroup number 1
    and ASM instances will mount it before CRS starts.
    Then, additional diskgroup will be added by users, for DATA, REDO, FRA etc of each RAC db,
    and will be mounted later when a RAC db instance starts on the specific node.
    In case of fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, it is different.
    All diskgroups of db instances don't need to be mounted in Shared Mode,
    because they are used by one instance only at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet, if you follow Oracle advice and put OCR and voting inside ASM, then:
    - at installation OUI will start ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and votings, will be mounted Shared Mode;
    - all other diskgroups, used by each db instance, will be mounted Shared Mode, too,
    even if you'll take care that they'll be mounted by one ASM instance at a time.
    At our site, for our three-nodes cluster, this fact has two consequences.
    One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none of the instances on this cluster are Production (only Test, Dev, etc);
    - we planned to have usually 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups per node, for a total of 90 diskgroups (30 instances) on the cluster;
    - in case one node failed, the surviving two should get the resources of the failing node;
    in the worst case: one node with 60 diskgroups (20 instances), the other one with 30 diskgroups (10 instances);
    - in case two nodes failed, the only surviving node would not be able to mount additional diskgroups
    (because of the limit of max 63 diskgroups mounted by an ASM instance), so all the others would remain unmounted
    and their db instances stopped (they are not Production instances).
    But it didn't work, since ASM has parameter CLUSTER_DATABASE=true, so you cannot mount 90 diskgroups;
    you can mount 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and other diskgroups mounted on other nodes cannot reuse that number).
    So as a matter of fact we can mount only 21 diskgroups (about 7 instances) on each node.
    The second consequence is that, every time our handmade CRS scripts dismount diskgroups
    from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
    Also we found inside the CRS log that, whenever we mounted diskgroups (on one node only),
    additional fake resources of type ora*.dg were created on the fly behind the scenes,
    maybe to accommodate the fact that on the other nodes those diskgroups were left unmounted
    (once again, instances are single-node here, and not RAC type).
    That's all.
    Did anyone go into similar problems?
    We opened a SR to Oracle asking about what options do we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): in case the DATA dg gets corrupted, you can restore a full backup plus archived redo logs plus online redo logs (otherwise you will stop at the latest archived log).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - in case of a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you edit and change at will; you might also create additional scripts for any additional resources you need (Oracle Agents, backup agents, file systems, monitoring tools, etc).
    About our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR, voting at all), but we told them that we needed a Fail-over solution
    - then they told us to use RAC Single Node, which actually has some better features; in case of planned fail-over it might be able to migrate
    client sessions without causing a reconnect (for SELECTs only, not in case of a running transaction), but we already have a few fail-over clusters, we cannot change them all
    So we plan to move OCR and voting disks into block devices (we think that the other solution, which needs a Shared File System, will take longer).
    Thanks Marko for pointing us to OCFS2 pros / cons.
    We asked Oracle for confirmation that it is supported; they said yes, but it is discouraged (and also doesn't work with OUI or ASMCA).
    Anyway, that's the simplest approach; this is a non-Prod cluster, we'll start here and if everything is fine, after a while we'll do it also on the Prod ones.
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
    -"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar

  • Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

    Hi,
    New to 2012 and implementing a clustered environment for our File Services role.  Have got to a point where I have successfully configured the Shadow copy settings.
    Have a large (15tb) disk.  S:
    Have a VSS drive (volume shadow copy drive) V:
    Have successfully configured through Windows Explorer the Shadow copy settings.
    Created dependencies in the Failover Cluster Server console whereby S: depends on V:
    However, when I failover the resource and browse the Client Access Point share there are no entries under the "Previous Versions" tab. 
    When I visit the S: drive in windows explorer and open the Shadow copy dialogue box, there are entries showing the times and dates of the shadow copies ran when on the original node.  So the disk knows about the shadow copies that were ran on the
    original node but the "previous versions" tab has no entries to display.
    This is in a 2012 server (NOT R2 version).
    Can anyone explain what might be the reason?  Do I have an "issue" or is this by design?
    All help appreciated!
    Kathy
    Kathleen Hayhurst Senior IT Support Analyst

    Hi,
    Please first check the requirements in following article:
    Using Shadow Copies of Shared Folders in a server cluster
    http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
    Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed
    locally.
    You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
    The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
    The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

  • Multiple types of database and fail over clustering

    Hi,
    I have a few questions here.
    1) Can I have 2 types of databases (e.g. OLTP and OLAP) running at the same time on the same machine?
    2) Can I implement a cross fail-over cluster in this situation? Meaning I have 2 machines with OLAP and OLTP database instances installed on them (replicas of each other), the 1st machine running OLTP and the 2nd running OLAP. In the situation where one of the machines fails, the passive instance on the other machine takes over (back to the situation in question 1).
    Thanks
    Regards
    Lai Ling

    Dear All,
    My problem is solved by disabling antivirus.
    thanks for the support
    Sunil
    SUNIL PATEL SYSTEM ADMINISTRATOR

  • Requirements on an EJB to be eligible for a fail-over

    Hi all,
              I was reading the EJB developer guide for WebLogic Server 9.2. When talking about the fail-over feature, the guide says:
              "EJB failover requires that bean methods must be idempotent and configured as such in weblogic-ejb-jar.xml"
              There are two points in this statement.
              1) Fail-over must be configured.
              This is straightforward.
              2) The bean methods must be idempotent.
              I don't really understand this point. Does this suggest that the bean methods should conform to some guidelines? If so, what are they?
              Probably these are clarified in some other document or other resources. Being impatient and a little lazy, I would love to have this clarified in the forum.
              Thanks in advance,
              - Madhu
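              As a general illustration (not from the WebLogic guide itself): "idempotent" here means a method that leaves the system in the same state whether it runs once or is retried after a failure, so the container can safely resend the call to another server; which methods are declared idempotent is configured in weblogic-ejb-jar.xml (typically via the idempotent-methods element). A small sketch with hypothetical methods:

              // Hypothetical bean methods illustrating the idempotency guideline.
              public class AccountBean {
                  private double balance;

                  // Idempotent: setting an absolute value twice gives the same result,
                  // so a retried call after fail-over does no harm.
                  public void setBalance(double newBalance) {
                      balance = newBalance;
                  }

                  // Not idempotent: if the first call succeeded but the reply was lost,
                  // a retry would debit the account twice.
                  public void debit(double amount) {
                      balance -= amount;
                  }
              }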

    Daniel,
    I think since this will be the ONLY system running as a DC providing AD DS and the DirectAccess server, I should follow this advice from the article you sent:
    For users who never connect directly to the Contoso intranet or through a VPN, they must use the DirectAccess
    Offline Domain Join process to initially join the appropriate domain and configure DirectAccess. When this process
    is complete, the users log on normally and have the same experience as if they were directly connected to the Contoso intranet.
    Because remember, no user will ever connect directly to the subnet where the server is, so do an offline join first and then start managing. The only thing I'm worried about is: they keep saying that the DirectAccess function has significantly improved in Windows
    8, and many systems will be using Windows 7 Pro 64-bit and some Windows 8.1 Pro 64-bit. Should I worry?

  • Problems with Oracle FailSafe - Primary node not failing over the DB to the

    I am using 11.1.0.7 on a Windows 64-bit OS, two nodes clustered at the OS level. The cluster is working fine at the Windows level and the shared drive fails over. However, the database does not fail over when the primary node is shut down or restarted.
    The Oracle software is on local drive on each box. The Oracle DB files and Logs are on shared drive.

    Is the database listed in your cluster group that you are failing over?

  • Support for fail over/clustering?

    Is there any support for fail over/clustering Oracle on a two-node Linux cluster? Is it even possible to run Oracle on a two-node Linux cluster?

    To add more, I'm unsure whether an HA product is even needed once Oracle releases Parallel Server. Also, is it possible to run Oracle without Parallel Server on a cluster? If so, what does that buy you?

  • Problems with LDAP Server fail-over

    Our Xsan is installed with 12 FCP, 2 MDC Xserves and 2 LDAP Xserves for fail-over.
    The 2 MDC fail-over runs well, but the 2 LDAP fail-over has problems.
    The first time, we unplugged the power cord of one Xserve and the other LDAP took over successfully, but FCP users' re-login took 15 minutes. That's unacceptable.
    The fail-over never succeeded after that.
    That means once the LDAP is down and the backup LDAP does not take over the job, we lose everything related to user login.
    Can anybody help? Thanks a lot.

    I believe you can enter both LDAP servers in the client configuration for LDAP access (even though you shouldn't have to).
    IP failover is not the issue; your LDAP configuration is.
    Start at page 90 and work through this document to make sure you have the clients set up properly.
    http://manuals.info.apple.com/en/MacOSXSrvr10.3_OpenDirectoryAdmin.pdf

  • RAC using SUN Geo Clusters with Fail over

    Hi ,
    My customer is in the process of investigating and deploying Sun GeoClusters to fail over a RAC from one location to another; the distance between the primary and fail-over sites is 1200 km, and they are going to use TrueCopy to replicate the storage across the sites.
    I am in the process of gathering information and need to find out more detail, and whether anyone has any knowledge of this software. If anybody knows of clients who are using the same (some URLs), please let me know.
    Regards
    Manoj

    TrueCopy is a way of replicating storage offsite. RAC works using a single source for the database. That means that RAC can not be used simultaneously at both locations with the files being used locally.
    If my memory serves me well, Hitachi TrueCopy was OSCP (Oracle Storage Compatibility Program) certified, but the OSCP program seems to have been discontinued as of January 2007 (see http://www.oracle.com/technology/deploy/availability/htdocs/oscp.html).
    That means that you can use TrueCopy to replicate the storage layer to another location (according to the OSCP note), and use the replicated storage to start up the RAC database in case of failover.
