Client failover troubles

Hello again
I configured client failover for OCI clients as follows.
tnsnames.ora on the application client:
DATAGUARDOCI =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = PRIMARY)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = STANDBY)(PORT = 1521))
      (LOAD_BALANCE = yes)
    )
    (CONNECT_DATA =
      (SERVICE_NAME = DATAGUARD)
    )
  )
Service created on the primary:
begin
  dbms_service.create_service(
    service_name     => 'service',
    network_name     => 'service',
    FAILOVER_METHOD  => 'BASIC',
    FAILOVER_TYPE    => 'SELECT',
    FAILOVER_RETRIES => 200,
    FAILOVER_DELAY   => 1);
end;
/
Trigger as Uwe Hesse shows us:
create trigger myapptrigg after startup on database
declare
  v_role varchar(30);
begin
  select database_role into v_role from v$database;
  if v_role = 'PRIMARY' then
    DBMS_SERVICE.START_SERVICE('myapp');
  else
    DBMS_SERVICE.STOP_SERVICE('myapp');
  end if;
end;
/
I shut down the standby and started it up again.
I make many connections using the DATAGUARDOCI tnsnames entry, and sometimes I connect to the primary and sometimes to the standby. I don't know why the trigger doesn't work on the standby.
Thanks for helping

When you run lsnrctl status on your standby system, does the service DATAGUARD show up? It shouldn't while the database there is in the standby role.
BTW, the name is a little misleading, don't you think? The service is supposed to be offered on whichever database is currently the primary, for production use by applications.
All I can say is that the setup described in the posting works exactly as listed there - because I followed our documentation :-)
LOAD_BALANCE should play no role here, because only one database (the primary) is supposed to offer the service. That seems to be exactly your problem:
You offer the service on the Standby also. Stop doing that :-)
Kind regards
Uwe Hesse
http://uhesse.wordpress.com
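Uwe's point can be shown with a short Python sketch. This is a simplified model, not Oracle Net code, and it rests on three assumptions:
- The client shuffles the address list when LOAD_BALANCE is on.
- The client skips any listener that does not currently offer the requested service, as connect-time failover does when a listener answers ORA-12514.
- The startup trigger is the only thing that starts the service.

The helper names services_offered and connect are hypothetical.

```python
import random

def services_offered(role, service):
    """Model of the startup trigger: the service is started only on a PRIMARY."""
    return {service} if role == "PRIMARY" else set()

def connect(address_list, listeners, service, load_balance=True, rng=random):
    """Return the first host whose listener offers `service`.

    With load_balance=True the addresses are tried in random order (like
    LOAD_BALANCE = yes); hosts that do not offer the service are skipped.
    """
    order = list(address_list)
    if load_balance:
        rng.shuffle(order)
    for host in order:
        if service in listeners.get(host, set()):
            return host
    raise ConnectionError(f"no listener offers service {service!r}")

hosts = ["PRIMARY", "STANDBY"]  # host aliases from the DATAGUARDOCI entry
correct = {"PRIMARY": services_offered("PRIMARY", "DATAGUARD"),
           "STANDBY": services_offered("PHYSICAL STANDBY", "DATAGUARD")}
misconfigured = {"PRIMARY": {"DATAGUARD"}, "STANDBY": {"DATAGUARD"}}
```

With the correct listener map, every call lands on PRIMARY. With the misconfigured one, where both databases offer the service, calls land on either host. That matches the intermittent behavior reported in the question.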

Similar Messages

Constantly getting "Reopen for Clustered Client Failover registered application has failed for FileObject" error in CCFFilter event log.

Hi everybody.
Hope somebody will be able to help me with the following issue.
I have the following environment configuration:
1. WFC cluster (cluster1) contains 3 nodes - sql1,sql2,sql3
2. sql1 and sql2 can run a single shared SQL Server instance.
3. Node sql3 is a standalone SQL Server.
4. AlwaysOn is enabled on both the shared instance and the standalone SQL Server, and an availability group has been configured for multiple databases, so sql3 is a replica of the shared instance.
5. All of this runs on VMware virtual machines.
I'm constantly getting the following error in the Microsoft-Windows-CCFFilter/Operational log when I run the SQL Server database/transaction log backup maintenance plan on my shared-instance SQL Server (sql1 or sql2):
Log Name:      Microsoft-Windows-CCFFilter/Operational
Source:        Microsoft-Windows-CCFFilter
Date:          10/24/2014 6:00:12 AM
Event ID:      2000
Task Category: None
Level:         Error
Keywords:
User:          DOMAIN\wfcsqlsvc
Computer:      SQL1
Description:
Reopen for Clustered Client Failover registered application has failed for FileObject 0xfffffa801cbb08a0 to \SQL3\Backups\Logs\DB1\DB1_backup_2014_10_24_060003_3960528.trn with status 0xC0000034
I get this error for every single database the backup maintenance plan runs against. The maintenance plan runs on SQL1, which hosts the shared instance.
Any ideas what could cause this and how to fix it?
Thanks in advance.
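The status in the event, 0xC0000034, is an NTSTATUS value. Microsoft's NTSTATUS reference lists it as STATUS_OBJECT_NAME_NOT_FOUND. The Python sketch below splits an NTSTATUS into its documented bit fields: severity, customer flag, facility and code. That can help when comparing several of these events. The name table holds only the one code seen in this log.

```python
# Only the code seen in this event log; other values decode as "unknown".
KNOWN_NTSTATUS = {0xC0000034: "STATUS_OBJECT_NAME_NOT_FOUND"}
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS into its bit fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],  # bits 30-31
        "customer": bool((value >> 29) & 0x1),      # bit 29
        "facility": (value >> 16) & 0xFFF,          # bits 16-27
        "code": value & 0xFFFF,                     # bits 0-15
        "name": KNOWN_NTSTATUS.get(value, "unknown"),
    }

print(decode_ntstatus(0xC0000034))
```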

Yes, I'm running the backups on the primary replica in the AlwaysOn availability group, and this primary replica is itself a WFC shared SQL Server instance.
I've double-checked the maintenance plan's history and the Agent's logs: no error, no warning, nothing. And by the way, the full database and transaction log backups get created as they should. By that I mean that the 'For availability databases, ignore Replica Priority for Backup and Backup on Primary Settings' property is turned on, which allows me to take backups on the primary replica.
As you suggested, I cleared the maintenance plan setting 'For availability databases, ignore Replica Priority for Backup and Backup on Primary Settings' and configured the availability group's AUTOMATED_BACKUP_PREFERENCE setting to allow backups on any replica for that availability group. But still nothing: I get the same error.
This is how AVG1 is configured regarding backup preferences:
For example, this subplan from the maintenance plan causes the errors mentioned above:

EJB access from C++ client / Failover+LoadBalancing

We are accessing an EJB from VisiBroker for C++. The EJB is deployed in a WLS cluster.
Trying to achieve something like 'failover', we discovered that VisiBroker supports
multiloc addresses, so we are able to start our client as follows:
./client -ORBInitRef NameService=corbaloc::server1:8001,:server2:8002/NameService
As a result, server2 is only used when NameService of server1 is not available.
After connecting to a distinct NameService in a cluster, all further IIOP calls
are routed to this cluster server only. If the server shuts down, a new NameService
connection has to be made to get access to the other server and its objects.
Is this correct so far?
For 'load balancing' our EJB accesses, we didn't find a solution, so it seems this is completely impossible. Or is there any trick we could use?
Thanks for your help
ml
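The Python sketch below contrasts two behaviors:
- Ordered failover across a corbaloc address list, as described above.
- Per-request load balancing, which is what is being asked about.

It is a generic model, not VisiBroker's or Tuxedo's implementation. The parser handles only the ':host:port' and 'iiop:host:port' address forms, with no IIOP version prefix. The is_up parameter stands in for a hypothetical reachability check.

```python
from itertools import cycle

def parse_corbaloc(url):
    """Split 'corbaloc:<addr>,<addr>/<key>' into ([(host, port), ...], key)."""
    prefix = "corbaloc:"
    if not url.startswith(prefix):
        raise ValueError("not a corbaloc URL")
    addresses, _, key = url[len(prefix):].partition("/")
    endpoints = []
    for addr in addresses.split(","):
        if addr.startswith("iiop:"):
            addr = addr[len("iiop:"):]
        elif addr.startswith(":"):
            addr = addr[1:]  # empty protocol means IIOP
        else:
            raise ValueError(f"unsupported address {addr!r}")
        host, _, port = addr.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints, key

def first_reachable(endpoints, is_up):
    """Ordered failover: later endpoints are used only if earlier ones are down."""
    for ep in endpoints:
        if is_up(ep):
            return ep
    raise ConnectionError("no endpoint reachable")

def round_robin(endpoints, is_up):
    """Per-request load balancing: rotate through endpoints, skipping dead ones."""
    ring = cycle(endpoints)

    def next_endpoint():
        for _ in range(len(endpoints)):
            ep = next(ring)
            if is_up(ep):
                return ep
        raise ConnectionError("no endpoint reachable")

    return next_endpoint
```

With first_reachable, server2 is chosen only while server1 is down, which is the behavior observed with the multiloc NameService. A round-robin picker spreads successive requests across both servers instead.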

If you use the Tuxedo 8.1 C++ client, you will get per-request load balancing and failover. The C++ client is free to WLS licensees.
andy

WebLogic migratable server JMS client failover issue

Hello.
I am experiencing a failover issue with a JMS client (I am using wlfullclient.jar).
I've set up a migratable JMS server (my cluster contains two servers, ManagedServer1 and ManagedServer2) and created a module with a subdeployment targeted to this server. In the module I created a queue and a topic.
Whenever I forcefully shut down ManagedServer2 (where the JMS server resides), the JMS client fails to reconnect with the following exception.
Why doesn't the dispatcher try to connect to ManagedServer1, instead of trying to connect to ManagedServer2 over and over?
weblogic.jms.common.JMSException: Error creating session
     at weblogic.jms.dispatcher.DispatcherAdapter.convertToJMSExceptionAndThrow(DispatcherAdapter.java:110)
     at weblogic.jms.dispatcher.DispatcherAdapter.dispatchSync(DispatcherAdapter.java:45)
     at weblogic.jms.client.JMSSession.consumerCreate(JMSSession.java:2914)
     at weblogic.jms.client.JMSSession.setupConsumer(JMSSession.java:2687)
     at weblogic.jms.client.JMSSession.createConsumer(JMSSession.java:2628)
     at weblogic.jms.client.JMSSession.createConsumer(JMSSession.java:2608)
     at weblogic.jms.client.WLSessionImpl.createConsumer(WLSessionImpl.java:880)
     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.createConsumer(AbstractPollingMessageListenerContainer.java:477)
     at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.createListenerConsumer(AbstractPollingMessageListenerContainer.java:221)
     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.initResourcesIfNecessary(DefaultMessageListenerContainer.java:1005)
     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:981)
     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:974)
     at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:876)
     at java.lang.Thread.run(Thread.java:619)
Caused by: weblogic.jms.common.JMSException: Error creating session
     at weblogic.jms.frontend.FESession.setUpBackEndSession(FESession.java:745)
     at weblogic.jms.frontend.FESession.consumerCreate(FESession.java:963)
     at weblogic.jms.frontend.FESession.invoke(FESession.java:2931)
     at weblogic.messaging.dispatcher.Request.wrappedFiniteStateMachine(Request.java:961)
     at weblogic.messaging.dispatcher.DispatcherServerRef.invoke(DispatcherServerRef.java:276)
     at weblogic.messaging.dispatcher.DispatcherServerRef.handleRequest(DispatcherServerRef.java:141)
     at weblogic.messaging.dispatcher.DispatcherServerRef.access$000(DispatcherServerRef.java:34)
     at weblogic.messaging.dispatcher.DispatcherServerRef$2.run(DispatcherServerRef.java:111)
     at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
     at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
Caused by: weblogic.messaging.dispatcher.DispatcherException: could not find Server ManagedServer2
     at weblogic.messaging.dispatcher.DispatcherManager.dispatcherCreate(DispatcherManager.java:176)
     at weblogic.messaging.dispatcher.DispatcherManager.dispatcherFindOrCreate(DispatcherManager.java:58)
     at weblogic.jms.dispatcher.JMSDispatcherManager.dispatcherFindOrCreate(JMSDispatcherManager.java:219)
     at weblogic.jms.dispatcher.JMSDispatcherManager.dispatcherFindOrCreateChecked(JMSDispatcherManager.java:230)
     at weblogic.jms.frontend.FESession.setUpBackEndSession(FESession.java:743)
     ... 9 more
Caused by: javax.naming.NameNotFoundException: Unable to resolve 'weblogic.messaging.dispatcher.S:ManagedServer2'. Resolved 'weblogic.messaging.dispatcher'; remaining name 'S:ManagedServer2'
     at weblogic.jndi.internal.BasicNamingNode.newNameNotFoundException(BasicNamingNode.java:1139)
     at weblogic.jndi.internal.BasicNamingNode.lookupHere(BasicNamingNode.java:252)
     at weblogic.jndi.internal.ServerNamingNode.lookupHere(ServerNamingNode.java:182)
     at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:206)
     at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
     at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
     at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
     at weblogic.jndi.internal.WLEventContextImpl.lookup(WLEventContextImpl.java:254)
     at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:380)
     at javax.naming.InitialContext.lookup(InitialContext.java:392)
     at weblogic.messaging.dispatcher.DispatcherManager.dispatcherCreate(DispatcherManager.java:172)
     ... 13 more

I am not shutting down both managed servers. Only the first managed server is shut down. As the portal EAR is deployed on the admin server and on all the managed servers in the cluster, I should be able to access the application through the second managed server.

Extend TCP- clients Failover

Hi,
In an attempt to fail over a TCP client, we set the following elements in our config file:
               <heartbeat-interval>50s</heartbeat-interval>
               <heartbeat-timeout>35s</heartbeat-timeout>
When we install a handler for the ServiceStopping event with the following code:
public static void InstallServiceEventHandler(string servicename, ServiceEventHandler eh) // this should be a generic handler in
{
    Tangosol.Net.IService ics = CacheFactory.GetService(servicename); // --> This line throws
    try
    {
        ics.ServiceStopping += eh;
    }
    catch (Exception e)
    {
        log.Error("Exception in the service Event handler insertion", e);
        return;
    }
}
The exception is:
{"The element 'outgoing-message-handler' in namespace 'http://schemas.tangosol.com/cache' has *invalid* child element 'heartbeat-interval' in namespace 'http://schemas.tangosol.com/cache'."}
If we comment out the heartbeat-* lines, the line above executes. That is of course useless, because without a heartbeat we cannot detect server failures.
What are we doing wrong?
Thanks,
Vipin
Given below is the cache configuration:
<cache-config xmlns="http://schemas.tangosol.com/cache">
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>dist-*</cache-name>
      <scheme-name>extend-direct</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <remote-cache-scheme>
      <scheme-name>extend-direct</scheme-name>
      <service-name>ExtendTcpCacheService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address>nycs00057388.us.net.intra</address>
              <port>8078</port>
            </socket-address>
            <socket-address>
              <address>nycs00057389.us.net.intra</address>
              <port>8078</port>
            </socket-address>
          </remote-addresses>
        </tcp-initiator>
        <outgoing-message-handler>
          <request-timeout>30s</request-timeout>
          <heartbeat-interval>50s</heartbeat-interval>
          <heartbeat-timeout>35s</heartbeat-timeout>
        </outgoing-message-handler>
      </initiator-config>
    </remote-cache-scheme>
    <remote-invocation-scheme>
      <scheme-name>extend-invocation</scheme-name>
      <service-name>ExtendTcpInvocationService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address>nycs00057388.us.net.intra</address>
              <port>8078</port>
            </socket-address>
          </remote-addresses>
        </tcp-initiator>
        <outgoing-message-handler>
        </outgoing-message-handler>
      </initiator-config>
    </remote-invocation-scheme>
  </caching-schemes>
</cache-config>
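How long a dead proxy can go unnoticed with these settings can be estimated with simple arithmetic. The Python sketch below rests on our own reading of the settings, not on Coherence's code:
- heartbeat-interval is how often the client pings the proxy.
- heartbeat-timeout is how long it waits for the reply before dropping the connection.

In the worst case the proxy dies just after a successful heartbeat. The duration parser accepts only values with an explicit unit (ms, s, m, h).

```python
import re

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

def parse_duration(text):
    """Parse '50s', '500ms', '2m' or '1h' into seconds; an explicit unit is required."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)(ms|s|m|h)\s*", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]

def worst_case_detection(interval, timeout):
    """Assumed model: next heartbeat after `interval`, connection dropped after `timeout`."""
    return parse_duration(interval) + parse_duration(timeout)

print(worst_case_detection("50s", "35s"))
```

Under this model, the values in the configuration above give 50 s + 35 s = 85 s before the client notices a dead proxy.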

Hi Vipin -
While I do not have a definite answer on the issue, the internal tracking number is COH-2534. While I cannot commit to dates, at last check it was being worked on for inclusion in 3.6, and the fix would likely be back-ported to 3.5.x.
I suggest that you open an SR with Oracle Support if you have not already done so, so that you can specifically request the resolution of this and the backport to 3.5.
I apologize for the inconvenience that this has caused you.
Peace,
Cameron Purdy | Oracle Coherence

10.6 server, first 10.7 client - having trouble

Just installed my first 10.7 client this week, and I can't seem to get things to work. On the server, I have managed preferences set up for all the Macs in the office (Energy Saver, login options, Open Directory, Time Machine, PHD syncing, etc.).
If I am not pushing managed prefs to the 10.7 machine, I can log in as myself (to the local admin account on the Mac). No other network users are shown as login options.
If I DO push managed prefs to the 10.7 machine, I can see all the network users as login options, but I can no longer log in to my local admin account on the 10.7 machine. After I enter my password, I get a little spinning pinwheel next to the password field, and it never logs in; it just sits like that forever. At that point only a reboot seems to unstick it. The only way I've been able to get past this is to turn off managed prefs for this machine on the server.
It seems my next option is a lot of trial and error to determine which of the managed prefs is causing the failure to login. I was hoping that someone here had a suggestion, just so as to cut down on my trial and error time.
Thanks!

I did find this thread: https://discussions.apple.com/message/16801064#16801064 that seems to say it isn't possible, but wondering if anyone has done this.

Questions on Client Failover and Fast Failover

I have a few questions regarding Client Failover and Fast Failover in Oracle Database HA. Before I ask them, I would like to explain my environment. Below are the details.
- We have two physical locations called 'ABC' and 'PQR'
- ABC is the primary site.
- PQR is the standby site.
- In ABC, we have Oracle RAC database (11.2.0.2) with two nodes.
- In PQR, we have a single server standalone database (11.2.0.2) with ASM. This is not a RAC.
- Data Guard has been configured between ABC and PQR and it is working as expected.
- Please note that we have a licence for Active Data Guard.
- We have Oracle Identity Management products at both ABC and PQR, and they are going to use the RAC database in ABC as their primary database.
- We did not configure Data Guard broker yet.
We want to achieve below goals:
Goal 1:
Whenever RAC primary goes down completely, standby database should become primary database AUTOMATICALLY and it should allow read/write operation.
I guess this is called 'Fast Failover'. Please let me know if I am wrong.
Questions :
- To make this happen, do I need to configure Data Guard Broker so that the standby database becomes primary when RAC goes down completely, in a planned or unplanned outage?
- Let's say RAC goes down completely: how long does Data Guard Broker take to make the standby database primary?
- What about clients/applications that are already connected to RAC?
- Let's say the standby DB has become primary, and after some time RAC comes back. Does Data Guard automatically change the role of RAC back to primary?
Goal 2:
As I explained above, all Oracle IDM products and applications speak to the RAC database; they only know about the RAC database, which is primary. They are not aware of the standby database.
- Whenever a client session is in progress with the RAC primary database and RAC goes down completely, we would expect the client session to be transferred to the standby database without losing session information. However, before this happens, the standby database should become primary, because the client session may perform write operations.
- Whenever a client tries to connect to the RAC primary while RAC is completely down, we would expect the client connection to be transferred to the standby database.
However, before this happens, the standby database should become primary, because the client session may perform write operations.
As far as I know, the above scenarios are called 'client failover'. Please let me know if I am wrong.
Questions:
1. Please shed some light on how to achieve the above features.
2. As I understand it, before client failover happens, fast failover should already have occurred and the standby should have switched to the primary role. I guess all this is controlled through TIMEOUT parameters. What are they?
Could you please help ?
Thanks

859875 wrote:
> Goal 1:
> Whenever RAC primary goes down completely, standby database should become primary database AUTOMATICALLY and it should allow read/write operation.
> I guess this is called 'Fast Failover'. Please let me know if I am wrong.
You are correct.
> Questions:
> - To make this happen, do I need to configure Data Guard Broker so that the standby database becomes primary when RAC goes down completely, in a planned or unplanned outage?
Yes (you can also use Grid Control, which uses Data Guard Broker).
> Goal 2:
> As I explained above, all Oracle IDM products and applications speak to the RAC database; they only know about the RAC database, which is primary. They are not aware of the standby database.
> - Whenever a client session is in progress with the RAC primary database and RAC goes down completely, we would expect the client session to be transferred to the standby database without losing session information.
This is not possible: it is possible only for SELECT statements and only within a single RAC database.
You can find some interesting documents on the MAA home page (best practices, case studies, ...):
http://www.oracle.com/technetwork/database/features/availability/maa-090890.html
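A rough Python model of when the observer starts a fast-start failover may help with the "how long" question. It assumes the rule as we understand it from the Data Guard Broker documentation:
- The observer and the target standby must both have lost contact with the primary.
- That loss must have lasted at least as long as the FastStartFailoverThreshold broker property (30 seconds by default).
- The observer must still reach the standby.
- The standby must be within its allowed redo lag (FastStartFailoverLagLimit in maximum performance mode).

Observation and should_start_failover are hypothetical names. The real feature is configured through the broker, not in application code.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Hypothetical snapshot of what the observer knows at one moment."""
    observer_sees_primary: bool
    standby_sees_primary: bool
    observer_sees_standby: bool
    seconds_primary_unreachable: float
    standby_within_lag_limit: bool

def should_start_failover(obs, threshold_s=30.0):
    """Simplified fast-start failover condition.

    threshold_s plays the role of the FastStartFailoverThreshold property.
    """
    primary_lost = not obs.observer_sees_primary and not obs.standby_sees_primary
    return (primary_lost
            and obs.observer_sees_standby
            and obs.standby_within_lag_limit
            and obs.seconds_primary_unreachable >= threshold_s)
```

In this model the role transition cannot begin before the threshold has elapsed. Clients then still need their own failover configuration to reach the new primary.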

Client side load balancing and server side load balancing

Hello Team,
I need to know how to set up client-side and server-side load balancing in Oracle RAC. What needs to be implemented, e.g. creating a service, tnsnames.ora settings, etc.?
Also, if I use the SCAN IP instead of the VIPs, how do the settings change?
Regards,

Hi,
please find the information in this white paper:
http://www.oracle.com/technetwork/database/features/availability/maa-wp-11gr2-client-failover-173305.pdf
kind regards
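The two kinds of balancing asked about above can be contrasted in a Python sketch. The model is a simplified assumption, not Oracle Net code:
- Client side: LOAD_BALANCE = on makes the client pick one of the listed addresses at random.
- Server side: the listener (with SCAN, the SCAN listener) sends each new connection to the least-loaded instance offering the service.

The SCAN VIP names, instance names and load figures are hypothetical.

```python
import random
from collections import Counter

def client_side_pick(addresses, rng=random):
    """LOAD_BALANCE = on: the client picks one of the listed addresses at random."""
    return rng.choice(addresses)

def server_side_assign(connections, load):
    """Listener-side balancing: each new connection goes to the least-loaded instance.

    `load` maps instance name -> current connection count (hypothetical metric).
    """
    load = dict(load)
    for _ in range(connections):
        target = min(load, key=load.get)
        load[target] += 1
    return load

if __name__ == "__main__":
    rng = random.Random(1)
    vips = ["scan-vip1", "scan-vip2", "scan-vip3"]
    print(Counter(client_side_pick(vips, rng) for _ in range(3000)))
    print(server_side_assign(7, {"rac1": 5, "rac2": 0}))  # {'rac1': 6, 'rac2': 6}
```

Random choice only evens out the number of connection attempts per address. The server-side step is what accounts for how busy each instance already is.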

Dataguard site failover scenario

Dear Gurus,
I am preparing an operations runbook covering scenarios for a production database DR event. I have workable sets of switchover and failover procedures, but I'm confused about when to trigger the failover steps. I can think of some scenarios:
1. Web/app tier failure triggers site failover: the database layer is healthy, so we should clearly do a switchover.
2. Database layer problem triggered site failover
2.a Primary site database problem
2.b Secondary site database problem
For scenario 2, I believe it can be broken down further into detailed categories. I intend to write steps to fix the problem on whichever site (primary or secondary) has it and then perform a switchover. What do you think?
We are using Oracle EE 11gR2.
Best
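The decision the runbook has to encode can be written down as a small Python rule. This is a hypothetical sketch of common practice, not guidance quoted from the thread:
- A switchover needs both databases healthy and loses no data, so it is preferred whenever the primary is still usable.
- A failover is reserved for a primary that is lost and cannot be repaired within the recovery time objective (RTO, an assumed input).

The scenario labels follow the numbering in the question.

```python
from enum import Enum

class Action(Enum):
    NONE = "no role transition"
    SWITCHOVER = "switchover"
    FAILOVER = "failover"
    REPAIR_PRIMARY = "repair the primary in place"
    REPAIR_STANDBY = "repair the standby, keep the current roles"

def choose_action(primary_ok, standby_ok, must_move_site, primary_fixable_within_rto=False):
    """Hypothetical runbook rule: prefer switchover whenever the primary is usable."""
    if not standby_ok:
        # No usable transition target (scenario 2.b): fix whichever side is broken.
        return Action.REPAIR_STANDBY if primary_ok else Action.REPAIR_PRIMARY
    if primary_ok:
        # Scenario 1: healthy database moved for other reasons -> planned switchover.
        return Action.SWITCHOVER if must_move_site else Action.NONE
    # Scenario 2.a: primary database unusable.
    return Action.REPAIR_PRIMARY if primary_fixable_within_rto else Action.FAILOVER
```

Under this rule, the plan to "fix the problem, then switch over" corresponds to the repair branches. The failover steps are triggered only when the primary cannot be brought back in time.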

Hi rac100g.
I don't really understand your question.
Do you want the steps for automatic client failover?
I think my video tutorial will be helpful for you.
Please watch : http://www.mahir-quluzade.com/2012/05/oracle-data-guard-11g-overview-client.html
And you can watch my all videos about Data Guard from : http://www.mahir-quluzade.com/p/oracle-videos.html
Regards
Mahir M. Quluzade
Edited by: Mahir M. Quluzade on Jun 6, 2012 12:13 PM

[svn] 4815: Feature: Client side load balancing.

Revision: 4815
Author: [email protected]
Date: 2009-02-03 10:47:12 -0800 (Tue, 03 Feb 2009)
Log Message:
Feature: Client side load balancing.
QA: Yes
Doc: Yes
Checkintests: Pass
Reviewer: Seth
Details: Added client side code for client side load balancing as described in 2 tier messaging spec.
Modified Paths:
flex/sdk/trunk/frameworks/projects/rpc/src/mx/messaging/Channel.as
flex/sdk/trunk/frameworks/projects/rpc/src/mx/messaging/config/ServerConfig.as

[svn] 4814: Feature: Client side load balancing.

Revision: 4814
Author: [email protected]
Date: 2009-02-03 10:44:03 -0800 (Tue, 03 Feb 2009)
Log Message:
Feature: Client side load balancing.
QA: Yes
Doc: Yes
Checkintests: Pass
Details: Added server side code for client side load balancing as described in 2 tier messaging spec.
Modified Paths:
blazeds/trunk/modules/common/src/flex/messaging/config/ClientConfigurationParser.java
blazeds/trunk/modules/common/src/flex/messaging/config/ConfigurationConstants.java
blazeds/trunk/modules/common/src/flex/messaging/config/ServicesDependencies.java
blazeds/trunk/modules/core/src/flex/messaging/endpoints/AbstractEndpoint.java

TAF (Transparent Application Failover) in OPS: Concepts and Configuration

Product: ORACLE SERVER
Date written: 2004-08-13
TAF (TRANSPARENT APPLICATION FAILOVER) in OPS: Concepts and Configuration (8.1 and later)
===================================================================
PURPOSE
Starting with Oracle8, TAF (Transparent Application Failover) is available between OPS nodes. If one OPS node fails, every session that was connected to that node is automatically reconnected to a healthy node without losing the session, so work can continue.
This document briefly reviews TAF and describes an actual configuration.
SCOPE
The Transparent Application Failover (TAF) feature is not supported in Standard Edition of 8i through 10g.
Explanation
This section describes the kinds of failures TAF covers, and the failover type and method specified when configuring TAF.
(1) Failure types:
TAF works correctly for all of the failures below. In MTS (shared server) mode there is no problem at all, but in dedicated server mode TAF works correctly only when dynamic registration is used.
instance fail: No problem with MTS, but in dedicated mode the instances must be configured with dynamic registration.
Even if the listener on the failed instance's node is still running, with dynamic registration the failed instance is deregistered from that listener, so the client sees this in the listener information and tries the listener on the other node.
Without dynamic registration, however, the listener on the failed instance's side keeps listing the failed instance among its services and tries to connect to it, which raises ORA-1034: ORACLE not available.
instance & listener down: If the listener is down as well, the reconnection attempt to the failed side's listener fails, and the connection is made through the other node's listener.
node down: TAF also works when the node itself goes down. However, the TCP keepalive parameter must be set appropriately on the client.
If work is in progress between client and server when the node fails, there is no problem. But if no work is being done on the server side, the client cannot immediately tell that the node is down. Only when the client sends its next request does it send TCP packets to a TCP endpoint that no longer exists and learn that the server node is no longer alive, which can typically take 2 to 3 minutes. Once the node has failed, the write() call on the network returns an error; the client receives it and then invokes the failover functionality.
To detect a dead server node even while the client is idle, TCP keepalive must be configured, and for Oracle connections to use keepalive, the ENABLE=BROKEN clause must be specified in the TNS service name.
An example of this ENABLE=BROKEN clause, which goes in the DESCRIPTION clause, appears in the tnsnames.ora part of configuration example (3) below.
With ENABLE=BROKEN, the network-level keepalive setting is used. Because this is usually set to 2 to 3 hours, it must be reasonably short to be meaningful for TAF.
However, if the keepalive time is too short and there are many idle sessions, network load can increase considerably, so agree on this setting with the OS or network administrator.
For details on keepalive and how to set it, see <bulletin:11323: The relationship between SQL*Net DCD (Dead Connection Detection) and KEEPALIVE>.
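The trade-off described above can be sketched in Python: ENABLE=BROKEN only helps if the operating system's keepalive fires reasonably soon. This is an illustration, not Oracle code:
- enable_keepalive turns on SO_KEEPALIVE for one socket.
- Where the platform exposes TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT (for example on Linux), it also shortens that socket's timers.
- detection_time gives the approximate worst case: idle time plus interval times probe count.

The default values are assumptions; as the text says, agree on real values with the OS or network administrator.

```python
import socket

def detection_time(idle_s, interval_s, probes):
    """Approximate worst-case seconds before keepalive declares an idle peer dead."""
    return idle_s + interval_s * probes

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Turn on TCP keepalive for one socket, tuning its timers where supported."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is None:
            continue  # not exposed on this platform; system-wide defaults apply
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            pass  # exposed but rejected by this OS; keep the system default
```

For example, Linux's usual defaults of 7200 s idle, 75 s interval and 9 probes give detection_time(7200, 75, 9) = 7875 s, a little over two hours. That is consistent with the 2 to 3 hours mentioned above.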
(2) type: session vs. select
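Two types are provided: the session type, in which the session is preserved but any SQL statements in progress fail, and the select type, in which DML statements are rolled back but SELECT statements continue.
Even with the select type, a query for information that can only be obtained from the failed instance can stop partway through the fetch with the following error. A query against gv$session for that instance is an example: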
ORA-25401: can not continue fetches
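The select type's behavior can be approximated in Python. After a failure the query is re-executed and the rows the client already received are skipped, so the caller sees one uninterrupted result set. This sketch is not Oracle's implementation. It assumes the re-executed query returns the same rows in the same order; the real feature stops with ORA-25401, as described above, when it cannot restore the cursor. ConnectionLost and run_query are hypothetical stand-ins.

```python
class ConnectionLost(Exception):
    """Hypothetical error raised when the instance serving the query fails."""

def fetch_with_select_failover(run_query, max_failovers=3):
    """Collect all rows from run_query(), replaying it after a ConnectionLost.

    run_query is called again after each failure; rows already delivered
    before the failure are skipped so no row is returned twice.
    """
    delivered = []
    failovers = 0
    while True:
        try:
            for position, row in enumerate(run_query()):
                if position < len(delivered):
                    continue  # fetched before the failure; skip on replay
                delivered.append(row)
            return delivered
        except ConnectionLost:
            failovers += 1
            if failovers > max_failovers:
                raise
```

With the session type there would be no replay: the in-flight statement simply fails and the application reissues it on the surviving node.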
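(3) method: basic vs. preconnect
There are two methods: basic, which connects the session to the other node when the failure occurs, and preconnect, which opens a backup session on the other node in advance and switches to it when a failure occurs.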
Example
Configuring TAF requires settings in init.ora, listener.ora and tnsnames.ora. Because MTS mode has no issues, the example here uses dedicated mode, which must be configured with dynamic registration.
The test was performed on Oracle 8.1.7.4 / Sun Solaris 2.8.
Two nodes, A and B, are assumed.
(1) In initSID.ora
- initSID.ora on node A
service_names=INS1, DB1
local_listener="(address=(protocol=TCP)(host=krtest1)(port=1521))"
- initSID.ora on node B
service_names=INS2, DB1
local_listener="(address=(protocol=TCP)(host=krtest2)(port=1521))"
Several service_names can be specified; what matters is that both nodes specify one common service name.
Usually db_name is used.
For host=, specify a hostname or an IP address.
(2) listener.ora
LISTENER =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = tcp)
(HOST = krtest1)(PORT= 1521)))
On node B, specify node B's hostname or IP address instead of krtest1.
(3) There are two ways to configure tnsnames.ora.
Examples of both the basic method and the preconnect method are given below. Use one of them. The preconnect method can fail over faster because it uses a session opened in advance, but it carries the overhead of keeping an unused session open on the other node, so each approach has trade-offs.
Both configurations provide connect-time failover as well as TAF: if node A fails, connections made with the same TNS service name (here opsbasic or ops1) go to node B.
Because the address= entries are tried from the top, if you want to connect to node B under normal conditions, list krtest2 first in opsbasic, or use ops2 instead of ops1.
The (enable=broken) setting used here relies on the TCP keepalive configured on the client machine; you can remove it if network load is a concern.
a. basic method
In krtest1's tnsnames.ora, define opsbasic and ops2; on node krtest2, define opsbasic and ops1, and change backup=ops2 to backup=ops1.
opsbasic =
  (description =
    (enable = broken)
    (address_list =
      (load_balance = off)
      (failover = on)
      (address = (protocol = tcp)(host = krtest1)(port = 1521))
      (address = (protocol = tcp)(host = krtest2)(port = 1521))
    )
    (connect_data =
      (service_name = DB1)
      (failover_mode =
        (type = select)
        (method = basic)
        (backup = ops2)
      )
    )
  )
ops1 =
  (description =
    (enable = broken)
    (load_balance = off)
    (failover = on)
    (address = (protocol = tcp)(host = krtest1)(port = 1521))
    (connect_data = (service_name = DB1))
  )
ops2 =
  (description =
    (enable = broken)
    (load_balance = off)
    (failover = on)
    (address = (protocol = tcp)(host = krtest2)(port = 1521))
    (connect_data = (service_name = DB1))
  )
b. preconnect method
ops1 and ops2 in the example below must both be defined in the same tnsnames.ora. Even while you are connected through ops1 and working on krtest1, a backup session to krtest2 is already open.
ops1 =
  (description =
    (enable = broken)
    (address_list =
      (load_balance = off)
      (failover = on)
      (address = (protocol = tcp)(host = krtest1)(port = 1521))
      (address = (protocol = tcp)(host = krtest2)(port = 1521))
    )
    (connect_data =
      (service_name = DB1)
      (failover_mode =
        (backup = ops2)
        (type = select)
        (method = preconnect)
      )
    )
  )
ops2 =
  (description =
    (enable = broken)
    (address_list =
      (load_balance = off)
      (failover = on)
      (address = (protocol = tcp)(host = krtest2)(port = 1521))
      (address = (protocol = tcp)(host = krtest1)(port = 1521))
    )
    (connect_data =
      (service_name = DB1)
      (failover_mode =
        (backup = ops1)
        (type = select)
        (method = preconnect)
      )
    )
  )
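The trade-off between the two methods can be sketched in Python as a hypothetical model. connect(host) stands in for opening a session, and the class records when the session to the backup node is opened. It is not Oracle code and ignores everything except the timing of the backup connection.

```python
class TafSession:
    """Hypothetical model of TAF METHOD=BASIC vs METHOD=PRECONNECT."""

    def __init__(self, connect, primary, backup, preconnect):
        self._connect = connect
        self._backup_host = backup
        self.current = connect(primary)
        # PRECONNECT opens the backup session now; BASIC waits for a failure.
        self._spare = connect(backup) if preconnect else None

    def fail_over(self):
        """Switch to the backup node, opening a session only if none is ready."""
        if self._spare is None:
            self.current = self._connect(self._backup_host)
        else:
            self.current = self._spare
            self._spare = None
        return self.current
```

With preconnect, both sessions exist from the start (the extra load mentioned above), and fail_over opens nothing new. With basic, the second connection is made only at failure time, which is why it can take longer.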

SCAN IP failover

I installed RAC 11gR2 on Windows Server. The SCAN IP fails over and falls back when I shut down and then start up one of the nodes.
However, when I unplug the public network cable, the SCAN IP fails over but doesn't fall back when I plug the cable in again, and I find that the SCAN VIP is bound to the public interface on both nodes. It seems the SCAN IP has to be relocated manually.
Is it the expected behaviour?

As we all know, the SCAN IP is for client failover. My question: suppose that, due to some network issue, node1, node2 and node3 shut down immediately.
What will happen to client failover in this case?
It depends on which failover you are talking about.
SCAN failover: if you are unlucky and all three SCAN IPs were running on node1, node2 and node3, the SCAN VIPs fail over to a surviving node, and for about a minute (the time to fail over and for PMON to register the database with the SCAN listener) clients trying to connect get ORA-12541: TNS:no listener.
If at least one SCAN IP is not on node1, node2 or node3 at the moment of failure, then no ORA-* error is raised.
Client failover: the client does not use SCAN to keep a session or to fail over; it uses the VIP to hold its connection to the database. All clients of node1, node2 and node3 will fail over or not depending on your client configuration (i.e. on whether TAF is enabled in the network configuration).
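The SCAN case in the answer can be sketched in Python as a hypothetical model. Given which node hosts each SCAN VIP and which nodes failed, it reports two things:
- which SCAN VIPs keep serving in place;
- whether new connections are blocked until relocation finishes, which is the window in which ORA-12541 appears.

Node and VIP names are illustrative. The model assumes the client tries every SCAN address it resolves.

```python
def scan_status(placement, failed_nodes):
    """Return (SCAN VIPs still served in place, SCAN VIPs that must relocate)."""
    in_place = sorted(vip for vip, node in placement.items() if node not in failed_nodes)
    relocating = sorted(vip for vip, node in placement.items() if node in failed_nodes)
    return in_place, relocating

def new_connections_blocked(placement, failed_nodes):
    """True if no SCAN VIP survives in place, so connects fail until relocation completes."""
    in_place, _ = scan_status(placement, failed_nodes)
    return not in_place
```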

Distribution Point Offline - how long before SCCM client connects to alternative DP?

Hi, I'm doing some testing on my SCCM 2012 setup and have been attempting to test distribution point resiliency and fall-back.
I have a single primary site with two distribution points. One is in the main datacentre, in the same boundary group as the clients, and tagged as a fast connection. The second DP is in the DR datacentre, in a separate boundary group (no client subnets are in this group), and the distribution point has "allow fallback" checked. Both DPs have the same content.
After shutting down the main distribution point, I kicked off an install of a package on a W7 client. It attempts to download and stays in that state for as long as I leave it. In the logs I can see both DPs returned to the client (the DR one tagged as REMOTE); it then attempts to connect to the main DP and just keeps retrying over and over.
I thought that if it couldn't connect to the main DP it would fall back to the DR DP, but it doesn't appear to do this. Is there a timeout before it falls back?
I'm also currently trying adding the DR DP to the main boundary group and tagging its connection as slow, so the main DP would still be used first. Again, both DPs are returned to the client when installing software, and the client attempts to connect to the main DP over and over without using the DR DP, which is online.
Is this normal behaviour or do I have a configuration issue?
Appreciate your help.
Carl

Hi, just to update this thread: I raised a support call with Microsoft, and the result is that SCCM 2012 clients won't fall back to an alternative DP when a DP is offline. Fallback only applies when content isn't on a DP. The 8-hour timeout no longer appears to be in effect.
What I have managed to get to work and test out is removing our production DP (also primary site server) from the production content boundary group, then clients will fall back to the DR DP as this is the only other DP available with content.
I managed to make that change (removing the site server from the boundary group) while the primary site server was offline. I used a PowerShell script that connects to the SCCM provider on the DR site server (DP/MP/SUP), since our site database is off-box. This works well: the changes replicate to the SQL replica in DR that the DR MP uses, and when clients fail over to the DR MP they begin using the DR DP, and packages can be installed.

How funny, I just fixed this at a client this week.
This is default client behavior, as MS CSS probably told you: the client thinks the distribution point is coming back online soon, so it waits, for good reasons. For some reason I keep thinking "7 days" rather than 8 hours, but I may be wrong.
I have a workaround for this; it just requires a change to the distribution point's DNS record. On the DNS server, find the record for the distribution point that is down and change its IP address to that of a different member server. When the client flushes its DNS cache, it gets the updated record and tries to connect to the distribution point at the changed IP address. The client treats that as a severe error, which makes it move on to the next distribution point in the list it got from the management point. Once you've recovered the distribution point and it is back online, change its IP address back in DNS, or just let the distribution point update its own DNS record when it boots up (if configured to do so), and voilà, you are back in business.
Test, test and test again before ever putting something from "the web" into your production environment. I just implemented this at a client to solve their issues with their DR procedure.
Rob Marshall | UK | My Blog | WMUG | File CM12 Feedback | CM12 Docs | CM12 Release Notes

Failover without VIP/SCAN

DB Version: 10gR2, 11gR2
OS : Solaris 5.10
I understand the concept of VIPs. Let's say that for a 2-node RAC I haven't used the VIPs in the TNS entry for my client.
If one node goes down, the failover will still happen, but only after a few minutes, right? I.e. the failover is done by the listener on the surviving node when it notices a connection timeout error from the dead node.

Yes. Whatever cluster IP addresses the client does or does not use, that does not affect the failover of cluster services (such as the virtual IP and SCAN listener) from the failed node to a surviving node.
Obviously, if the client only uses a static public IP, that IP fails with the failed node and the client loses all connectivity, even though the cluster failed over successfully and the database is still available.
So the client does need to use either SCAN or the VIPs to fail over successfully; you cannot use only the static public IP and expect successful client failover.
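Why the VIP matters comes down to how quickly a connection attempt fails. The Python sketch below is a generic illustration, not Oracle Net code; it tries (host, port) pairs in order:
- An address that actively refuses fails at once. A VIP relocated to a surviving node behaves this way, since nothing listens on it there.
- An address whose host is simply gone fails only when the timeout expires. That is the multi-minute delay in the question once the OS TCP timeout applies.

```python
import socket

def first_connectable(addresses, timeout=3.0):
    """Try (host, port) pairs in order; return the first that accepts a TCP connection.

    A refused connection fails immediately and the next address is tried;
    an unreachable host costs up to `timeout` seconds before moving on.
    """
    errors = []
    for addr in addresses:
        try:
            with socket.create_connection(addr, timeout=timeout):
                return addr
        except OSError as exc:
            errors.append((addr, exc))
    raise ConnectionError(f"no address accepted a connection: {errors}")
```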
