EJB clustering fail-over problem.

          I have set up a 2-server cluster, WL 5.1 patch 6 on NT. When the client makes calls to EJBs, you can see the two servers load balancing (round-robin). After I stop one server in the cluster, I get a NullPointerException at the client. What did I miss?
          Thanks in advance,
          java.lang.NullPointerException
          at weblogic.rjvm.RJVMFinder.isThisHost(RJVMFinder.java:340)
          at weblogic.rjvm.RJVMFinder.isHostedByLocalRJVM(RJVMFinder.java:314)
          at weblogic.rjvm.RJVMFinder.findOrCreate(RJVMFinder.java:151)
          at weblogic.rjvm.ServerURL.findOrCreateRJVM(ServerURL.java:200)
          at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:19
          at weblogic.jndi.WLInitialContextFactoryDelegate.getInitialContext(WLInitialContextFactoryDelegate.java:14
          at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:123)
          at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:671)
          at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:242)
          at javax.naming.InitialContext.init(InitialContext.java:218)
          at javax.naming.InitialContext.<init>(InitialContext.java:194)
          at com.avinamart.ApplicationServices.ServiceManager.getInitialContext(ServiceManager.java:66)
          at com.avinamart.ApplicationServices.ServiceManager._getApplicationService(ServiceManager.java:117)
          at com.avinamart.ApplicationServices.ServiceManager.getOrganizationService(ServiceManager.java:210)
          at com.avinamart.WebInterface.EPASSServicesReference.getOrganizationService(EPASSServicesReference.java:77
          at com.avinamart.WebInterface.EPASSReferences.getBusinessUserStruct(EPASSReferences.java:367)
          at com.avinamart.WebInterface.EPASSReferences.getUserPath(EPASSReferences.java:200)
          at jsp_servlet._en._detail_95_org._jspService(_detail_95_org.java:326)
          at weblogic.servlet.jsp.JspBase.service(JspBase.java:27)
          at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:105)
          at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:742)
          at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:686)
          at weblogic.servlet.internal.ServletContextManager.invokeServlet(ServletContextManager.java:247)
          at weblogic.socket.MuxableSocketHTTP.invokeServlet(MuxableSocketHTTP.java:361)
          at weblogic.socket.MuxableSocketHTTP.execute(MuxableSocketHTTP.java:261)
          at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)
          

What's the URL for your lookup?
          Is it mapped to one server or multiple servers?
          - Prasad
          Tony Lu wrote:
          > I have set up a 2-server cluster, WL 5.1 patch 6 on NT. When the client makes calls to EJBs, you can see the two servers load balancing (round-robin). After I stop one server in the cluster, I get a NullPointerException at the client. What did I miss?
          >
          > Thanks in advance,
          >
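          One thing worth checking, following up on the question above about the lookup URL: whether the client's JNDI provider URL names only one of the two servers. If the URL lists all cluster members (or a DNS alias that maps to both), creating the initial context can try another server when the first one is stopped. This is a minimal sketch, not a definitive fix for the NullPointerException; the hostnames node1/node2 and port 7001 are hypothetical, and it assumes this WLS version accepts a comma-separated address list in the t3 URL:

          import java.util.Hashtable;
          import javax.naming.Context;
          import javax.naming.InitialContext;
          import javax.naming.NamingException;

          public class ClusterLookup {
              public static Context getInitialContext() throws NamingException {
                  Hashtable env = new Hashtable();
                  env.put(Context.INITIAL_CONTEXT_FACTORY,
                          "weblogic.jndi.WLInitialContextFactory");
                  // List every cluster member (node1/node2 are hypothetical) so the
                  // context factory is not pinned to a single, possibly stopped, server.
                  env.put(Context.PROVIDER_URL, "t3://node1:7001,node2:7001");
                  return new InitialContext(env);
              }
          }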
          

Similar Messages

  • Servlet fail-over problem

               I'm testing WebLogic clustering of servlets with in-memory replication on the Sun platform (wls 5.1 sp9), using the Apache plug-in.
               I did this test:
               - I configured a cluster of two servers
               - I simulated a hung condition in one of the two servers by filling all execution threads with servlets doing Thread.sleep()
               - I tried to launch a request to the cluster (a JSP request), but my request timed out after ConnectTimeoutSecs.
               Looking at the wlproxy.log, it seems that the cluster attempts to fail over to the secondary server (after HungServerRecoverSecs), but it doesn't respond; then it retries with the primary server, and so on (waiting HungServerRecoverSecs each time for a response), until the "ConnectTimeoutSecs" timeout is reached.
               This is very strange because the secondary server is not hung; if I launch a request directly at it (specifying host:port in the URL), it responds.
               I have also tried setting the parameter Idempotent to ON, even though the default is ON, but with no result.
               Can anyone help me?
              

               I solved the problem by setting the parameter weblogic.system.servletThreadCount in the cluster properties file.
               Now another problem has come up.
               When one server of the cluster is hung, the cluster fails over to the second server, but session information is lost.
               Can anyone help me?
              "Mike Reiche" <[email protected]> wrote:
               >
               >Don't take this as absolute gospel - it is just my understanding of how things work.
               >
               >Since the WL server is still alive, it will accept connections. This takes ConnectTimeOutSecs out of the picture.
               >
               >Now you're just left with HungRecoverSeconds. If the response takes longer than HungRecoverSeconds, then wlproxy will deem the request to have 'timed out'. If it is not Idempotent, that's it, you're done. If it is Idempotent, wlproxy will retry - on the other WL instance. From what you describe, the second one should work - unless of course the second WL is also backed up with Thread.sleep() - then after HungRecoverSeconds, the request will be resent to an available WL instance.
               >
              >"Lucia Giraldo" <[email protected]> wrote:
               >>
               >>I'm testing WebLogic clustering of servlets with in-memory-replication in Sun platform (wls 5.1 sp9) and using Apache plug-in.
               >>I did this test:
               >>- I configured a cluster of two servers
               >>- I simulate a situation of hang, in one of the two servers, filling all execution threads with servlets doing Thread.sleep()
               >>- I tried to launch a request to the cluster (a JSP request) but my request timed out after ConnectTimeoutSecs.
               >>Looking at the wlproxy.log it seems that the cluster attempts to failover to the secondary server (after HungServerRecoverSecs) but it doesn't respond, then it retries with the primary server and so on (waiting every time HungServerRecoverSecs for a response) until the timeout "ConnectTimeoutSecs" is reached.
               >>This is very strange because the secondary server is not hung; if I launch a request directly to it (specifying in the URL host:port) it responds to me.
               >>I have also tried to specify the parameter Idempotent ON, even if the default is ON, but with no result.
               >>Can anyone help me?
              >
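               A note on the follow-up about lost session data: with in-memory replication, one common cause is session attributes that are not serializable, since only attributes implementing java.io.Serializable can be copied to the secondary server, and an attribute generally needs to be rebound with setAttribute() after it changes so the replica is refreshed. A minimal sketch, where CartBean and the "cart" attribute name are hypothetical:

               import java.io.Serializable;
               import javax.servlet.http.HttpSession;

               // Hypothetical session attribute: it must be Serializable so the
               // clustered web container can replicate it to the secondary server.
               public class CartBean implements Serializable {
                   private int itemCount;

                   public void addItem() { itemCount++; }

                   public int getItemCount() { return itemCount; }
               }

               class CartHelper {
                   static void addToCart(HttpSession session) {
                       CartBean cart = (CartBean) session.getAttribute("cart");
                       if (cart == null) {
                           cart = new CartBean();
                       }
                       cart.addItem();
                       // Rebind after every change so the replica on the other node is updated.
                       session.setAttribute("cart", cart);
                   }
               }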
              

  • Database fail over problem after we change Concurrency Strategy

    Hi, we had Concurrency Strategy: Exclusive. Now we have changed it to Database for performance
    reasons. Since we changed it, when we do an Oracle database fail-over, WebLogic
    6.1 does not detect the database fail-over and needs to be rebooted.
    How can we resolve this?

    Hi,
    It is just failing on one of the application servers. The developer wrote that when installing the CI, the local hostname is written in the database and SDM. We will have to do a homogeneous system copy to change the name.
    The problem is that I used the virtual SAP group name for the CI and DI application servers; in SCS and ASCS we used virtual hostnames, and that is OK according to the SAP developer.
    The Start and instance profiles were checked and everything was fine; just the dispatcher from the CI is having problems when coming from Node B to Node A.
    Regards

  • WLS6.1sp1 stateful EJB problem: load-balancing and fail over

               I have three problems.
               1. I have 2 clustered servers. My weblogic-ejb-jar.xml is here:
              <?xml version="1.0"?>
              <!DOCTYPE weblogic-ejb-jar PUBLIC '-//BEA Systems, Inc.//DTD WebLogic 6.0.0 EJB//EN'
              'http://www.bea.com/servers/wls600/dtd/weblogic-ejb-jar.dtd'>
              <weblogic-ejb-jar>
              <weblogic-enterprise-bean>
                   <ejb-name>DBStatefulEJB</ejb-name>
                   <stateful-session-descriptor>
                   <stateful-session-cache>
                        <max-beans-in-cache>100</max-beans-in-cache>
                        <idle-timeout-seconds>120</idle-timeout-seconds>
                   </stateful-session-cache>
                   <stateful-session-clustering>
                        <home-is-clusterable>true</home-is-clusterable>
                        <home-load-algorithm>RoundRobin</home-load-algorithm>
                        <home-call-router-class-name>common.QARouter</home-call-router-class-name>
                        <replication-type>InMemory</replication-type>
                   </stateful-session-clustering>
                   </stateful-session-descriptor>
                   <jndi-name>com.daou.EJBS.solutions.DBStatefulBean</jndi-name>
              </weblogic-enterprise-bean>
              </weblogic-ejb-jar>
               When I use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>" and deploy this EJB, an exception occurs:
               <Warning> <Dispatcher> <RuntimeException thrown by rmi server: 'weblogic.rmi.cluster.ReplicaAwareServerRef@9 - jvmid: '2903098842594628659S:203.231.15.167:[5001,5001,5002,5002,5001,5002,-1]:mydomain:cluster1', oid: '9', implementation: 'weblogic.jndi.internal.RootNamingNode@5f39bc''
               java.lang.IllegalArgumentException: Failed to instantiate weblogic.rmi.cluster.BasicReplicaHandler due to java.lang.reflect.InvocationTargetException
               at weblogic.rmi.cluster.ReplicaAwareInfo.instantiate(ReplicaAwareInfo.java:185)
               at weblogic.rmi.cluster.ReplicaAwareInfo.getReplicaHandler(ReplicaAwareInfo.java:105)
               at weblogic.rmi.cluster.ReplicaAwareRemoteRef.initialize(ReplicaAwareRemoteRef.java:79)
               at weblogic.rmi.cluster.ClusterableRemoteRef.initialize(ClusterableRemoteRef.java:28)
               at weblogic.rmi.cluster.ClusterableRemoteObject.initializeRef(ClusterableRemoteObject.java:255)
               at weblogic.rmi.cluster.ClusterableRemoteObject.onBind(ClusterableRemoteObject.java:149)
               at weblogic.jndi.internal.BasicNamingNode.rebindHere(BasicNamingNode.java:392)
               at weblogic.jndi.internal.ServerNamingNode.rebindHere(ServerNamingNode.java:142)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:362)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.BasicNamingNode.rebind(BasicNamingNode.java:369)
               at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
               at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:296)
               So do I have to use it or not?
               2. When I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>", there's no exception, but load balancing does not happen. According to the documentation, load balancing should happen when I call the home.create() method.
               My client program goes like this:
                    DBStateful the_ejb1 = (DBStateful) PortableRemoteObject.narrow(home.create(), DBStateful.class);
                    DBStateful the_ejb2 = (DBStateful) PortableRemoteObject.narrow(home.create(3), DBStateful.class);
               The result is like this:
                    the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
                    the_ejb2 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@acf6e)/398
                    or
                    the_ejb1 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@252fdf)/380
                    the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
                    I think the result should be like the one below, shouldn't it?
                    the_ejb1 = ClusterableRemoteRef(203.231.15.167 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@4695a6)/397
                    the_ejb2 = ClusterableRemoteRef(203.231.15.125 weblogic.rmi.cluster.PrimarySecondaryReplicaHandler@6a0252)/381
               In this case I think the_ejb1 and the_ejb2 should have instances on different cluster servers, but they go to one server.
               3. If I don't use "<home-call-router-class-name>common.QARouter</home-call-router-class-name>" and "<replication-type>InMemory</replication-type>", then load balancing happens but there's no fail-over.
               So how can I get load-balancing and fail-over together?
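               A side note on question 1: the InvocationTargetException while instantiating BasicReplicaHandler often indicates that the container could not construct the class named in home-call-router-class-name. A call router generally needs a public no-argument constructor, has to be visible to every server in the cluster (for example, packaged with the EJB jar), and implements the weblogic.rmi.cluster.CallRouter interface. The following is a rough sketch, not the actual common.QARouter; the server names cluster1server1/cluster1server2 are hypothetical:

               import java.lang.reflect.Method;
               import weblogic.rmi.cluster.CallRouter;

               public class QARouter implements CallRouter {
                   // Hypothetical server names; they must match the names of the cluster members.
                   private static final String[] SERVER_A = { "cluster1server1" };
                   private static final String[] SERVER_B = { "cluster1server2" };

                   public String[] getServerList(Method m, Object[] params) {
                       // Route create(int) calls by parameter value; returning null falls back
                       // to the configured home-load-algorithm (RoundRobin here).
                       if (params != null && params.length > 0 && params[0] instanceof Integer) {
                           return ((Integer) params[0]).intValue() % 2 == 0 ? SERVER_A : SERVER_B;
                       }
                       return null;
                   }
               }

               If parameter-based routing is not actually needed, leaving home-call-router-class-name out and keeping home-is-clusterable, home-load-algorithm and replication-type InMemory is usually how load balancing of home.create() is combined with replication-based fail-over.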
              


  • Load-balancing and fail-over between web containers and EJB containers

    When web components and EJB components are run in different OC4J instances, can we achieve load-balancing and fail-over between web containers and EJB containers?

    It looks like there is clustering, but not load balancing, available for RMI from the rmi.xml configuration. The application will treat any EJBs on the cluster as one-to-one look-ups: Orion will go out and get the first EJB available on the cluster. See the docs on configuring rmi.xml (and also the note below).
    That is a kind of failover, because if machine A goes down, and the myotherAejbs.jar are on machine B too, Orion will go out and get the bean from machine B when it can't find machine A. But it doesn't go machine A then machine B for each remote instance of the bean. You could also specify the maximum number of instances of a bean, and as one machine gets "loaded", Orion would go to the next available machine... but that's not really load balancing.
    That is, you can set up your web apps with EJBs, but let all of the EJBs be remote="true" in the orion-application.xml file:
    <?xml version="1.0"?>
    <!DOCTYPE orion-application PUBLIC "-//Evermind//DTD J2EE Application runtime 1.2//EN" "http://www.orionserver.com/dtds/orion-application.dtd">
    <orion-application deployment-version="1.5.2">
    <ejb-module remote="true" path="myotherAejbs.jar" />
    <ejb-module remote="true" path="myotherBejbs.jar" />
    <ejb-module remote="true" path="myotherCejbs.jar" />
    <web-module id="mysite" path="mysite.war" />
    ... other stuff ...
    </orion-application>
    In the rmi.xml you would define your clustering:
    <cluster host="230.0.0.1" id="123" password="123abc" port="9127" username="cluster-user" />
    This tag is defined if the application is to be clustered. It is used to set up a local multicast cluster. A username and password used for the servers to intercommunicate also needs to be specified.
    host - The multicast host/IP to transmit and receive cluster data on. The default is 230.0.0.1.
    id - The id (number) of this cluster node to identify itself with in the cluster. The default is based on the local machine IP.
    password - The password configured for cluster access. Needs to match that of the other nodes in the cluster.
    port - The port to transmit and receive cluster data on. The default is 9127.
    username - The username configured for cluster access. Needs to match that of the other nodes in the cluster.

  • Configuring Fail Over Clustering

    I was assigned to configure the new DIA line for our data center for fail-over (backup).
    My existing line is - ISP ==} Router 2811 ==} Juniper SSG320 ==} Cisco Core Switch 6500
    For backup we got an additional line.
    How can I configure fail-over clustering for the new DIA lines?
    Please see my network diagram attached.
    Thank you,
    Michael

    Dear All,
    My problem is solved by disabling antivirus.
    thanks for the support
    Sunil
    SUNIL PATEL SYSTEM ADMINISTRATOR

  • Is it possible to add hyper-V fail over clustering afterwards?

    Hi,
    We are testing Windows 2012 R2 Hyper-V using only one stand-alone host, without failover clustering, with a few virtual machines. Is it possible to add failover clustering afterwards, add a second Hyper-V node and a shared disk, and move the virtual
    machines there, or do we have to install both nodes from scratch?
    ~ Jukka ~

    Hi Jukka,
    In addition, before you build a Hyper-V failover cluster, please refer to the requirements in the article below:
    http://technet.microsoft.com/en-us/library/jj863389.aspx
    Best Regards
    Elton Ji

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    in case at your site you :
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listener, their disks and so on (which are started only on one node at a time,
    and can fail over from that node and restart on another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had) because you might:
    - reach max number of diskgroups handled by an ASM instance (63 only, above which you get ORA-15068);
    - experience delays (especially in case of multipath), find fake CRS resources, etc.
    whenever you dismount disks from one node and mount to another;
    So (if both conditions are true) you might be interested in this story,
    then please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances access read / write the same disks from each node.
    So, ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are mounted also by other ASM instances on the other nodes.
    ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious that Oracle strongly recommends to put OCR and voting disks
    inside ASM: this (usually called CRS_DATA) will become diskgroup number 1
    and ASM instances will mount it before CRS starts.
    Then, additional diskgroup will be added by users, for DATA, REDO, FRA etc of each RAC db,
    and will be mounted later when a RAC db instance starts on the specific node.
    In case of fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, it is different.
    All diskgroups of db instances don't need to be mounted in Shared Mode,
    because they are used by one instance only at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet, if you follow Oracle advice and put OCR and voting inside ASM, then:
    - at installation OUI will start ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and votings, will be mounted Shared Mode;
    - all other diskgroups, used by each db instance, will be mounted Shared Mode, too,
    even if you'll take care that they'll be mounted by one ASM instance at a time.
    At our site, for our three-nodes cluster, this fact has two consequences.
    One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none of the instances on this cluster are Production (only Test, Dev, etc);
    - we planned to have usually 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups per node, for a total of 90 diskgroups (30 instances) on the cluster;
    - in case one node failed, the surviving two should get the resources of the failing node;
    in the worst case: one node with 60 diskgroups (20 instances), the other one with 30 diskgroups (10 instances);
    - in case two nodes failed, the only surviving node would not be able to mount additional diskgroups
    (because of the limit of max 63 diskgroups mounted by an ASM instance), so all the others would remain unmounted
    and their db instances stopped (they are not Production instances).
    But it didn't work, since ASM has parameter CLUSTER_DATABASE=true, so you cannot mount 90 diskgroups;
    you can mount 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and other diskgroups mounted on other nodes cannot reuse that number).
    So as a matter of fact we can mount only 21 diskgroups (about 7 instances) on each node.
    The second consequence is that, every time our handmade CRS scripts dismount diskgroups
    from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
    Also we found inside the CRS log that, whenever we mounted diskgroups (on one node only),
    additional fake resources of type ora*.dg were created on the fly behind the scenes,
    maybe to accommodate the fact that on the other nodes those diskgroups were left unmounted
    (once again, instances are single-node here, and not RAC type).
    That's all.
    Did anyone go into similar problems?
    We opened a SR to Oracle asking about what options do we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): in case the DATA dg gets corrupted, you can restore a full backup plus archived redo logs plus online redo logs (otherwise you will stop at the latest archived log).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - in case of a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you edit and change at will; you might also create additional scripts for any additional resources you need (Oracle Agents, backup agents, file systems, monitoring tools, etc).
    About our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR, voting at all), but we told them that we needed a Fail-over solution
    - then they told us to use RAC Single Node, which actually has some better features; in case of planned fail-over it might be able to migrate
    client sessions without causing a reconnect (for SELECTs only, not in case of a running transaction), but we already have a few fail-over clusters, we cannot change them all
    So we plan to move OCR and voting disks into block devices (we think that the other solution, which needs a Shared File System, will take longer).
    Thanks Marko for pointing us to OCFS2 pros / cons.
    We asked Oracle for confirmation that it is supported; they said yes, but it is discouraged (and also doesn't work with OUI or ASMCA).
    Anyway, that's the simplest approach; this is a non-Prod cluster, we'll start here and if everything is fine, after a while we'll do it also on the Prod ones.
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
    -"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar

  • Failover cluster server - File Server role is clustered - Shadow copies do not seem to travel to other node when failing over

    Hi,
    New to 2012 and implementing a clustered environment for our File Services role.  Have got to a point where I have successfully configured the Shadow copy settings.
    Have a large (15tb) disk.  S:
    Have a VSS drive (volume shadow copy drive) V:
    Have successfully configured through Windows Explorer the Shadow copy settings.
    Created dependencies in the Failover Cluster Server console whereby S: depends on V:
    However, when I failover the resource and browse the Client Access Point share there are no entries under the "Previous Versions" tab. 
    When I visit the S: drive in windows explorer and open the Shadow copy dialogue box, there are entries showing the times and dates of the shadow copies ran when on the original node.  So the disk knows about the shadow copies that were ran on the
    original node but the "previous versions" tab has no entries to display.
    This is in a 2012 server (NOT R2 version).
    Can anyone explain what might be the reason?  Do I have an "issue" or is this by design?
    All help appreciated!
    Kathy
    Kathleen Hayhurst Senior IT Support Analyst

    Hi,
    Please first check the requirements in following article:
    Using Shadow Copies of Shared Folders in a server cluster
    http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
    Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed
    locally.
    You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
    The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
    The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.

  • Multiple types of database and fail over clustering

    Hi,
    I have a few questions here.
    1) Can I have 2 types of databases (e.g. OLTP and OLAP) running at the same time on the same machine?
    2) Can I implement a cross fail-over cluster in this situation? Meaning I have 2 machines with OLAP and OLTP database instances installed on them (replicas of each other), the 1st machine running OLTP and the 2nd running OLAP. In the situation where one of the machines fails, the passive instance on the other machine takes over (back to the situation in question 1).
    Thanks
    Regards
    Lai Ling

    Dear All,
    My problem is solved by disabling antivirus.
    thanks for the support
    Sunil
    SUNIL PATEL SYSTEM ADMINISTRATOR

  • Requirements on an EJB to be eligible for a fail-over

    Hi all,
              I was reading the EJB developer guide for WebLogic Server 9.2. When talking about the fail-over feature, the guide says:
              "EJB failover requires that bean methods must be idempotent and configured as such in weblogic-ejb-jar.xml"
              There are two points in this statement.
              1) Fail-over must be configured.
              This is straightforward.
              2) The bean methods must be idempotent.
              I don't really understand this point. Does this suggest that the bean methods should conform to some guidelines? If so, what are they?
              Probably these are clarified in some other document or other resources. Being impatient and a little lazy, I would love to have this clarified in the forum.
              Thanks in advance,
              - Madhu
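              As a general illustration (not from the WebLogic guide itself): "idempotent" here means a method that leaves the system in the same state whether it runs once or is retried after a failure, so the container can safely resend the call to another server; which methods are declared idempotent is configured in weblogic-ejb-jar.xml (typically via the idempotent-methods element). A small sketch with hypothetical methods:

              // Hypothetical bean methods illustrating the idempotency guideline.
              public class AccountBean {
                  private double balance;

                  // Idempotent: setting an absolute value twice gives the same result,
                  // so a retried call after fail-over does no harm.
                  public void setBalance(double newBalance) {
                      balance = newBalance;
                  }

                  // Not idempotent: if the first call succeeded but the reply was lost,
                  // a retry would debit the account twice.
                  public void debit(double amount) {
                      balance -= amount;
                  }
              }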

    Daniel,
    I think since this will be the ONLY system running as a DC providing AD DS and the DirectAccess server, I should follow this advice from the article you sent:
    For users who never connect directly to the Contoso intranet or through a VPN, they must use the DirectAccess
    Offline Domain Join process to initially join the appropriate domain and configure DirectAccess. When this process
    is complete, the users log on normally and have the same experience as if they were directly connected to the Contoso intranet.
    Because remember, no user will ever connect directly to the subnet where the server is, so do an offline join first and then start managing. The only thing I'm worried about is: they keep saying that the DirectAccess function has significantly improved in Windows
    8, and many systems will be using Windows 7 Pro 64-bit and some Windows 8.1 Pro 64-bit. Should I worry?

  • Problems with Oracle FailSafe - Primary node not failing over the DB to the

    I am using 11.1.0.7 on a Windows 64-bit OS, two nodes clustered at the OS level. The cluster is working fine at the Windows level and the shared drive fails over. However, the database does not fail over when the primary node is shut down or restarted.
    The Oracle software is on local drive on each box. The Oracle DB files and Logs are on shared drive.

    Is the database listed in your cluster group that you are failing over?

  • Support for fail over/clustering?

    Is there any support for fail over/clustering Oracle on a two-node Linux cluster? Is it even possible to run Oracle on a two-node Linux cluster?

    To add more, I'm unsure whether an HA product is even needed once Oracle releases Parallel Server. Also, is it possible to run Oracle without Parallel Server on a cluster? If so, what does that buy you?

  • Problems with LDAP Server fail-over

    Our Xsan is installed with 12 FCP, 2 MDC Xserves and 2 LDAP Xserves for fail-over.
    The 2 MDC fail-over runs well, but the 2 LDAP fail-over has problems.
    The first time, we unplugged the power cord of one Xserve and the other LDAP took over successfully, but FCP users' re-login took 15 minutes. That's unacceptable.
    The fail-over never succeeded after that.
    That means once the LDAP is down and the backup LDAP does not take over the job, we lose everything related to user login.
    Can anybody help? Thanks a lot.

    I believe you can enter both LDAP servers in the client configuration for LDAP access (even though you shouldn't have to).
    IP failover is not the issue; your LDAP configuration is.
    Start at page 90 and work through this document to make sure you have the clients set up properly.
    http://manuals.info.apple.com/en/MacOSXSrvr10.3_OpenDirectoryAdmin.pdf

  • RAC using SUN Geo Clusters with Fail over

    Hi ,
    My customer is in the process of investigating and deploying Sun GeoClusters to fail over a RAC from one location to another; the distance between the primary and fail-over sites is 1200 km, and they are going to use TrueCopy to replicate the storage across the sites.
    I am in the process of gathering information and need to find out more detail, and whether anyone has any knowledge of this software. If anybody knows of clients who are using the same (some URLs), please let me know.
    Regards
    Manoj

    TrueCopy is a way of replicating storage offsite. RAC works using a single source for the database. That means that RAC can not be used simultaneously at both locations with the files being used locally.
    If my memory serves me well, Hitachi TrueCopy was OSCP (Oracle Storage Compatibility Program) certified, but the OSCP program seems to have been discontinued as of January 2007 (see http://www.oracle.com/technology/deploy/availability/htdocs/oscp.html).
    That means that you can use TrueCopy to replicate the storage layer to another location (according to the OSCP note), and use the replicated storage to start up the RAC database in case of failover.
