Fail-over Recovery ?
Hi All,
Can anyone tell me whether the "Fail-Over Recovery" concept is available in Hyperion Essbase 11.1.1.3?
If so, please explain how it can be done.
Regards
Rajesh Kumar wrote:
Hi
I am working on a database fail-over recovery mechanism. I am using a WebLogic 6.1 SP1 server installed on a Unix machine. Our application uses a J2EE architecture, with entity beans for database transactions.
My main objective is to allow my application to switch over to a secondary database if the primary database fails.
I have already developed a prototype that works fine for a client application's requests, but I can't use it for entity beans with container-managed persistence.
So what I want to ask is:
Is there a way to switch between databases for container-managed entity beans? If yes, how do I implement it?
Thank you
Rajesh
Easy. Define a multipool that taps a pool to the regular database first and, when that DBMS is down,
taps a second pool to the fallback DBMS. Define a TxDataSource for the multipool, and have the beans
use that DataSource.
Joe
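The ordering behaviour described in the reply can be sketched in a few lines of Python. This is only an illustration of the "try the regular pool first, then the fallback pool" idea, not WebLogic's multipool implementation or API; the names MultiPool, PoolUnavailable, primary_pool and fallback_pool are hypothetical.

```python
from typing import Callable, Sequence


class PoolUnavailable(Exception):
    """Raised by a pool when its DBMS cannot be reached."""


class MultiPool:
    """Hands out a connection from the first pool that responds, in order."""

    def __init__(self, pools: Sequence[Callable[[], str]]):
        self._pools = list(pools)

    def get_connection(self) -> str:
        errors = []
        for get_conn in self._pools:
            try:
                return get_conn()
            except PoolUnavailable as exc:
                errors.append(str(exc))  # remember why, then try the next pool
        raise PoolUnavailable(f"all {len(self._pools)} pools failed: {errors}")


def primary_pool() -> str:
    # Hypothetical stand-in for the pool on the regular database; here it is down.
    raise PoolUnavailable("primary DBMS is down")


def fallback_pool() -> str:
    # Hypothetical stand-in for the pool on the fallback DBMS.
    return "connection-to-fallback"


multipool = MultiPool([primary_pool, fallback_pool])
print(multipool.get_connection())
```

Because the beans only ever see the one DataSource on top of the multipool, the switch to the fallback DBMS needs no change in the bean code.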
Similar Messages
-
Physical standby database fail-over
Hi,
I am working on Oracle 10.2.0.3 on Solaris SPARC 64-bit.
I have a Data Guard configuration with a single physical standby database that uses real-time apply. We had a major application upgrade yesterday, and before the upgrade started we cancelled the media recovery and deferred log_archive_dest_n so that it would not ship archive logs to the standby site. We left the Data Guard configuration in this mode in case of a rollback.
Primary:
alter system set log_archive_dest_state_2='DEFER';
alter system switch logfile;
Standby:
alter database recover managed standby database cancel;
Due to problems induced by the application upgrade, we had to fail over to the physical standby, which had been out of sync with the primary since yesterday. I used the following method to fail over, since I did not want to apply any redo from yesterday.
Standby:
alter database activate physical standby database;
alter database open;
shutdown immediate;
startup
So, after this step, the database was a standalone database with no standby databases yet (it still has the log_archive_config and log_archive_dest_n parameters set, but I have set the destination pointing to the old primary to 'DEFER'). I have also changed the archive log deletion policy to NONE:
RMAN> configure archivelog deletion policy to none;
After the fail-over was completed, the log sequence started from sequence 1. We cleared the FRA to make space for the new archive logs and started a full database backup (backup incremental level 0 database plus archivelog delete input). The backup succeeded, but the backup log contained these warnings that RMAN cannot delete the archive logs:
RMAN-08137: WARNING: archive log not deleted as it is still needed
My questions are:
1) Even though I have deferred the log_archive_dest_n parameters, why is RMAN not able to delete the archive logs after backup when there is no standby database for this failed-over database?
2) Are all the old backups marked unusable after a fail-over is performed?
FYI: flashback database was not used in this case, as it did not serve our purpose.
Any information or documentation links would be greatly appreciated.
Thanks,
Harris.
Thanks for the reply.
The FINISH FORCE works in some cases, but if there is an archive gap (though none was reported in our case), it might not always work (DOCID: 846087.1). So we followed the switch-over and fail-over best practices, which mention "ACTIVATE PHYSICAL STANDBY" for a fail-over when you do not intend to apply any archive logs. The process we followed is the right one.
Anyhow, we got the issue resolved. Below is the resolution path.
1) Even if you DEFER the LOG_ARCHIVE_DEST_STATE_N parameters on the primary, there are situations in which the primary database in a Data Guard configuration will not delete the archive logs because of SCN issues. This does not arise in every fail-over scenario. If it does, do the following checks.
Follow DOCID: 803635.1, which describes a PL/SQL procedure to check for problematic SCNs in a Data Guard configuration even when the physical standby databases are no longer available (i.e., the Data Guard parameters log_archive_config and log_archive_dest_n='SERVICE=...' are still set, even though the corresponding LOG_ARCHIVE_DEST_STATE_N parameters are deferred).
If this procedure returns any rows, the primary database cannot delete the archive logs because it still thinks there is a standby database and is keeping the archive logs because of the SCN conflict.
So the best thing to do is to remove the Data Guard parameters (log_archive_config and the log_archive_dest_n parameters) from the spfile.
After I made these changes, I ran a test backup using "backup archivelog all delete input", and the archive logs were deleted after the backup without any issues.
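As a rough sketch of the "remove the Data Guard parameters from the spfile" step, the small Python helper below only builds the ALTER SYSTEM RESET statements for review; it does not connect to or change any database. The helper name reset_statements is hypothetical, and the destination number 2 is an assumption: use whichever log_archive_dest_n entries actually point at the old primary. With SCOPE=SPFILE, the reset takes effect at the next instance restart.

```python
def reset_statements(dest_numbers):
    """Build ALTER SYSTEM RESET statements for the Data Guard parameters."""
    params = ["log_archive_config"]
    params += [f"log_archive_dest_{n}" for n in dest_numbers]
    return [f"ALTER SYSTEM RESET {p} SCOPE=SPFILE SID='*';" for p in params]


# Assumption: destination 2 is the one that pointed at the old primary.
for stmt in reset_statements([2]):
    print(stmt)
```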
Thanks,
Harris.
Edited by: user11971589 on Nov 18, 2010 2:55 PM -
Replication fail-over and reconfiguration
I would like to get a conversation going on the topic of replication. I have
set up replication on several sites using the Netscape / iPlanet 4.x server
and all has worked fine so far. I now need to produce some documentation and
testing for replication fail-over for the master. I would like to hear from
anyone with some experience on promoting a consumer to a supplier. I'm
looking for the best practice on this issue. Here is what I am thinking,
please feel free to correct me or add input.
Disaster recovery plan:
1.) Select a consumer from the group of read-only replicas
2.) Change the database from Read-Only to Read-Write
3.) Delete the replication agreement (in my case I am using a SIR)
4.) Create a new agreement to reflect the supplier status of the chosen
replica (again a SIR for me)
5.) Reinitialize the consumers (Online or LDIF depending on your number of
entries)
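To make the order of the five steps concrete, here is a minimal in-memory model in Python. It does not talk to a directory server and uses no iPlanet API; the Replica class, its fields, and the promote function are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    name: str
    read_only: bool = True
    supplier: Optional[str] = None   # server this replica receives updates from
    needs_reinit: bool = False


def promote(consumers: List[Replica], chosen_name: str) -> Tuple[Replica, List[Replica]]:
    """Walk through steps 1-5 of the plan on the model and return the result."""
    # 1) select a consumer from the group of read-only replicas
    chosen = next(r for r in consumers if r.name == chosen_name and r.read_only)
    # 2) change its database from read-only to read-write
    chosen.read_only = False
    # 3) drop the old agreement that tied it to the failed master
    chosen.supplier = None
    remaining = [r for r in consumers if r is not chosen]
    for replica in remaining:
        # 4) new agreement: the chosen replica is now the supplier
        replica.supplier = chosen.name
        # 5) flag the consumer for reinitialization (online or from LDIF)
        replica.needs_reinit = True
    return chosen, remaining


replicas = [Replica("consumer1", supplier="master"), Replica("consumer2", supplier="master")]
new_supplier, others = promote(replicas, "consumer1")
print(new_supplier)
print(others)
```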
That is the general plan so far. Other questions and topics might include:
1.) What to do when the original master comes back online
2.) DNS round-robin strategies (Hardware assistance, Dynamic DNS, etc)
3.) General backup and recovery procedures when: 1.) Directory is corrupted
2.) Link is down / network is partitioned 3.) Disk / server corruption /
destruction
Well I hope that is a good basis for getting a discussion going. Feel free
to email me if you have questions or I can help you with one of your issues.
Best regards,
Ray Cormier
There is no failover in Meta-Directory 5.1. You can implement manual failover on the metaview by using multi-master replication with Directory Server. There are limitations, and this is a manual process.
- Paul -
DAG Sporadic Entire Server DB Fail Over
Hi,
I have been having this issue for a while now. I have two physical Exchange servers in a DAG, both on Exchange 2013 CU1. Randomly, every few days and at various times, Server1 will fail all of its databases over to Server2. I'll redistribute them, and then, say, Server2 will fail all databases over to Server1. In short, both servers have at times failed their databases over.
I started with this: http://technet.microsoft.com/en-us/library/dd351258(v=exchg.150).aspx which led me to set up monitoring of the Microsoft-Exchange-ManagedAvailability logs. I can tell you that replication tests work fine, and the health of all the databases is fine.
My monitoring turned up the following errors. In this example, "EX0001" was the server that failed all of its databases over to "EX0002". It seems pretty clear to me that Exchange Managed Availability is finding an issue with EWS and attempting to recycle the MSExchangeServicesAppPool app pool, but cannot because of "Throttling", so it fails the DBs over. That's my best guess. The problem is I don't know how to fix this. I've run through troubleshooting the EWS health set and nothing really turns up: http://technet.microsoft.com/en-us/library/ms.exch.scom.ews.protocol(v=exchg.150).aspx
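The timing figures in the probe trace of the second event below can be checked with simple arithmetic. The short Python sketch uses only numbers copied from that trace (the variable names are mine). It shows that the retry started with less time left than one HTTP timeout, so it was cut off by the overall probe time limit rather than by its own HTTP timeout.

```python
# Values copied from the probe trace, all in milliseconds.
PROBE_TIME_LIMIT_MS = 120_000   # "Probe time limit: 120000ms"
HTTP_TIMEOUT_MS = 59_500        # "HTTP timeout: 59500ms"
RETRY_START_MS = 64_584         # "[064.584] action iteration 1"
PROBE_END_MS = 120_011          # "[120.011] action wait timed out"

budget_left = PROBE_TIME_LIMIT_MS - RETRY_START_MS   # time left when the retry began
retry_elapsed = PROBE_END_MS - RETRY_START_MS        # how long the retry actually ran

print(budget_left)                    # 55416, matches "total time left 55416 ms"
print(retry_elapsed)                  # 55427, matches "55.427003 seconds elapsed"
print(budget_left < HTTP_TIMEOUT_MS)  # True: the probe limit expires before the HTTP timeout
```

In other words, both attempts waited close to a minute without getting a response from the EWS endpoint.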
EX0001
1011
Microsoft-Exchange-ManagedAvailability
Recovery
Microsoft-Exchange-ManagedAvailability/RecoveryActionLogs
5/22/2014 7:06:43 AM
Warning (Info)
1520183
NT AUTHORITY\SYSTEM
RecycleApplicationPool-MSExchangeServicesAppPool-EWSSelfTestRestart: Throttling rejected the operation
EX0001
4
Microsoft-Exchange-ManagedAvailability
Monitoring
Microsoft-Exchange-ManagedAvailability/Monitoring
5/22/2014 7:17:27 AM
Error (Info)
8287
NT AUTHORITY\SYSTEM
The EWS.Protocol health set has detected a problem on EX0001 beginning at 5/22/2014 10:55:12 AM (UTC). The health manager is reporting that recycling the MSExchangeServicesAppPool
app pool has failed to restore health and it has tried to fail over active copies of local databases to a healthy server. Attempts to auto-recover from this condition have failed and requires Administrator attention. Details below: <b>MachineName:</b>
EX0001 <b>ServiceName:</b> EWS.Protocol <b>ResultName:</b> EWSSelfTestProbe/MSExchangeServicesAppPool <b>Error:</b> System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) <b>Exception:</b> System.Exception: System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.ThrowError(Object key, Object exceptiondata, String logDetails) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.DoWorkInternal(CancellationToken
cancellationToken) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.RunEWSGenericProbe(CancellationToken cancellationToken) at Microsoft.Exchange.WorkerTaskFramework.WorkItem.Execute(CancellationToken joinedToken) at Microsoft.Exchange.WorkerTaskFramework.WorkItem.<>c__DisplayClass2.<StartExecuting>b__0()
at System.Threading.Tasks.Task.Execute() <b>ExecutionContext:</b> EWSGenericProbeError:Exception=System.Exception: System.Exception: >>> PRIMARY ENDPOINT VERIFICATION EwsUrl=https://localhost:444/ews/exchange.asmx
UserName/Password=HealthMailbox663889950a344102878cede289222a46@domain.local/xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt AuthMethod=CAFE ConvertId (Attempt #0) Status=The
request failed. The operation has timed out ConvertId (Attempt #0) Latency=59521.1327 ConvertId (Attempt #1) Status=iteration 1; 55.427003 seconds elapsed ConvertId (Attempt #1) Latency=55427.003 at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSCommon.RetrySoapActionAndThrow(Action
operation, String soapAction, ExchangeServiceBase service) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon.ExecuteEWSCall(String endPoint, String operation, Boolean verifyAffinity) at Microsoft.Exchange.Monitoring.ActiveMonitoring.Ews.Probes.EWSGenericProbeCommon
<b>FailureContext:</b> <b>ResultType:</b> Failed <b>IsNotified:</b> False <b>DeploymentId:</b> 0 <b>RetryCount:</b> 0 <b>ExtensionXml:</b> <b>Version:</b> <b>StateAttribute1:</b>
EWS <b>StateAttribute2:</b> Unknown <b>StateAttribute3:</b> <b>StateAttribute4:</b> <b>StateAttribute5:</b> <b>StateAttribute6:</b> 0 <b>StateAttribute7:</b> 0 <b>StateAttribute8:</b>
0 <b>StateAttribute9:</b> 0 <b>StateAttribute10:</b> 0 <b>StateAttribute11:</b> <b>StateAttribute12:</b> <b>StateAttribute13:</b> <b>StateAttribute14:</b> <b>StateAttribute14:</b>
<b>StateAttribute16:</b> 0 <b>StateAttribute17:</b> 0 <b>StateAttribute18:</b> 0 <b>StateAttribute19:</b> 0 <b>StateAttribute20:</b> 120011 <b>StateAttribute21:</b> [000.000] EWSCommon
start: 5/22/2014 11:13:13 AM [000.000] Configuring EWScommon [000.000] Probe time limit: 120000ms, HTTP timeout: 59500ms, RetryCount: 1 [000.047] using authN: CAFE
[email protected] xGAVmP[^jn{qGgOx0Jtx:4X+-j@?d%XM?@7yErsoFF[_#u[%LcX=0hPzMln#1PiQ/7z?14rJJs8Dc)AYLi0F9mU)bMpL_gj{Q3*[Yt1:UgX=:CkQc=[Xuagz%Od=|@tt
[000.047] using HTTP request timeout: 59500 ms [000.047] action iteration 0 [000.047] starting (total time left 119954 ms) [059.568] action threw Microsoft.Exchange.WebServices.Data.ServiceRequestException: The request failed. The operation has timed out [064.584]
action iteration 1 [064.584] starting (total time left 55416 ms) [120.011] action wait timed out [120.011] action threw System.TimeoutException: iteration 1; 55.427003 seconds elapsed <b>StateAttribute22:</b> <b>StateAttribute23:</b>
<b>StateAttribute24:</b> <b>StateAttribute25:</b> <b>PoisonedCount:</b> 0 <b>ExecutionId:</b> 32395373 <b>ExecutionStartTime:</b> 5/22/2014 11:13:13 AM <b>ExecutionEndTime:</b> 5/22/2014
11:15:13 AM <b>ResultId:</b> 253233015 <b>SampleValue:</b> 0 ------------------------------------------------------------------------------- States of all monitors within the health set: Note: Data may be stale. To get current data,
run: Get-ServerHealth -Identity 'EX0001' -HealthSet 'EWS.Protocol' State Name TargetResource HealthSet AlertValue ServerComponent ----- ---- -------------- --------- ---------- --------------- NotApplicable EWSSelfTestMonitor MSExchangeServicesAppPool EWS.Protocol
Unhealthy None NotApplicable EWSDeepTestMonitor DG01DB15 EWS.Protocol Unhealthy None NotApplicable PrivateWorkingSetWarningThresholdExc... msexchangeservicesapppool EWS.Protocol Healthy None NotApplicable ProcessProcessorTimeErrorThresholdEx... msexchangeservicesapppool
EWS.Protocol Healthy None NotApplicable ExchangeCrashEventErrorThresholdExce... msexchangeservicesapppool EWS.Protocol Healthy None States of all health sets: Note: Data may be stale. To get current data, run: Get-HealthReport -Identity 'EX0001' State HealthSet
AlertValue LastTransitionTime MonitorCount ----- --------- ---------- ------------------ ------------ NotApplicable Autodiscover.Protocol Healthy 3/8/2014 12:46:17 AM 4 NotApplicable ActiveSync.Protocol Healthy 3/8/2014 1:15:35 AM 7 NotApplicable ActiveSync
Healthy 3/8/2014 2:08:15 AM 3 NotApplicable EDS Healthy 5/22/2014 5:19:41 AM 13 NotApplicable ECP Healthy 3/8/2014 1:15:27 AM 3 NotApplicable EventAssistants Healthy 5/22/2014 5:48:56 AM 28 NotApplicable EWS.Protocol Unhealthy 5/22/2014 7:07:12 AM 5 NotApplicable
FIPS Healthy 5/21/2014 10:24:01 PM 18 NotApplicable AD Healthy 2/23/2014 10:42:29 PM 10 NotApplicable OWA.Protocol.Dep Healthy 5/22/2014 5:19:40 AM 1 NotApplicable Monitoring Unhealthy 5/22/2014 5:35:31 AM 9 Online HubTransport Unhealthy 5/22/2014 5:19:43
AM 138 NotApplicable DataProtection Healthy 5/22/2014 7:08:02 AM 201 NotApplicable AntiSpam Healthy 5/22/2014 5:19:40 AM 4 NotApplicable Network Healthy 5/21/2014 10:36:54 PM 1 NotApplicable OWA.Protocol Healthy 3/8/2014 1:15:34 AM 5 NotApplicable MailboxMigration
Healthy 3/8/2014 12:46:18 AM 4 NotApplicable MRS Healthy 3/8/2014 12:44:35 AM 9 NotApplicable MailboxTransport Healthy 5/22/2014 5:19:41 AM 57 NotApplicable PublicFolders Healthy 5/21/2014 10:44:15 PM 4 NotApplicable RPS Healthy 2/23/2014 11:38:33 PM 1 NotApplicable
Outlook.Protocol Healthy 4/22/2014 11:04:18 AM 3 NotApplicable UserThrottling Healthy 5/22/2014 5:51:13 AM 7 NotApplicable SiteMailbox Healthy 3/8/2014 2:10:53 AM 3 NotApplicable UM.Protocol Healthy 5/22/2014 5:19:41 AM 17 NotApplicable Store Healthy 5/22/2014
5:19:43 AM 225 NotApplicable MSExchangeCertificateDeplo... Disabled 1/1/0001 12:00:00 AM 2 NotApplicable DAL Healthy 8/2/2013 12:59:03 AM 16 NotApplicable Search Healthy 5/22/2014 5:37:18 AM 269 Online EWS.Proxy Healthy 5/5/2014 1:34:08 AM 1 Online RPS.Proxy
Healthy 5/5/2014 1:34:38 AM 13 Online OAB.Proxy Healthy 5/5/2014 1:34:37 AM 1 Online ECP.Proxy Healthy 5/5/2014 1:34:17 AM 4 Online OWA.Proxy Healthy 5/5/2014 1:34:25 AM 2 Online Outlook.Proxy Healthy 5/5/2014 1:34:08 AM 1 Online Autodiscover.Proxy Healthy
5/5/2014 1:34:08 AM 1 Online ActiveSync.Proxy Healthy 5/5/2014 1:34:35 AM 1 Online RWS.Proxy Healthy 5/5/2014 1:34:18 AM 10 NotApplicable Autodiscover Healthy 5/21/2014 10:24:01 PM 2 Online FrontendTransport Healthy 5/15/2014 12:49:31 AM 11 NotApplicable EWS
Unhealthy 5/22/2014 7:06:01 AM 2 NotApplicable OWA Healthy 2/23/2014 11:37:56 PM 1 NotApplicable Outlook Healthy 3/8/2014 12:45:14 AM 5 Online UM.CallRouter Healthy 5/22/2014 5:19:41 AM 7 NotApplicable RemoteMonitoring Healthy 8/2/2013 12:58:03 AM 1 NotApplicable
POP.Protocol Healthy 5/20/2014 9:22:12 AM 5 NotApplicable IMAP.Protocol Healthy 5/20/2014 9:22:21 AM 5 Online POP.Proxy Healthy 3/7/2014 1:31:10 PM 1 Online IMAP.Proxy Healthy 3/7/2014 1:31:10 PM 1 NotApplicable IMAP Healthy 5/20/2014 9:23:32 AM 2 NotApplicable
POP Healthy 5/20/2014 9:17:18 AM 2 NotApplicable Antimalware Healthy 5/15/2014 8:33:13 AM 8 NotApplicable FfoQuarantine Healthy 8/2/2013 12:58:20 AM 1 Online Transport Healthy 5/22/2014 5:38:00 AM 9 NotApplicable Security Healthy 3/8/2014 12:46:09 AM 3 NotApplicable
Datamining Healthy 3/8/2014 12:45:44 AM 3 NotApplicable Provisioning Healthy 3/8/2014 12:45:40 AM 3 NotApplicable ProcessIsolation Healthy 3/8/2014 12:47:05 AM 12 NotApplicable TransportSync Healthy 3/8/2014 12:45:37 AM 3 NotApplicable MessageTracing Healthy
3/8/2014 12:44:56 AM 3 NotApplicable CentralAdmin Healthy 3/8/2014 12:45:12 AM 3 NotApplicable OAB Healthy 8/2/2013 1:02:27 AM 3 NotApplicable Calendaring Healthy 8/2/2013 1:02:07 AM 3 NotApplicable PushNotifications.Protocol Healthy 2/23/2014 10:46:17 PM
3 NotApplicable Ediscovery.Protocol Healthy 5/21/2014 10:38:16 PM 1 NotApplicable HDPhoto Healthy 5/6/2014 9:36:25 AM 1 NotApplicable Clustering Healthy 3/8/2014 12:45:34 AM 4 NotApplicable DiskController Healthy 4/22/2014 2:51:30 AM 1 NotApplicable MailboxSpace
Healthy 5/22/2014 6:16:51 AM 96 NotApplicable FreeBusy Healthy 5/22/2014 5:32:54 AM 1 Note: Subsequent detected alerts are suppressed until the health set is healthy again.
Hi,
Based on the error message, throttling rejected the operation. I recommend you use the Get-ThrottlingPolicy | fl cmdlet to view the EWS settings in the throttling policy.
You can modify the default throttling policy and set the basic settings for EWS. Then restart the Microsoft Exchange Throttling service and recycle the MSExchangeServicesAppPool to check the result.
For more information about EWS throttling, you can refer to the following articles.
EWS throttling in Exchange
http://msdn.microsoft.com/en-us/library/office/jj945066(v=exchg.150).aspx
EWS Best Practices: Understand Throttling Policies
http://blogs.msdn.com/b/mstehle/archive/2010/11/09/ews-best-practices-understand-throttling-policies.aspx
Best regards,
Belinda
Belinda Ma
TechNet Community Support -
Sun Identity Manager 8.0 and fail over..
We are setting up a fail-over/recovery site for our Sun Identity Manager solution. I had pictured a seamless fail-over, but that looks nearly impossible to do with an Oracle database. I had pictured load-balanced app servers with load-balanced databases, a sort of multi-master setup like LDAP allows.
Curious what others are using for a fail-over site / setup.
Thanks
We're using 7.0. For us, failover is basically multiple servers all using the same DB repository, with a "smart" load balancer in front of them (smart meaning able to detect which back-end servers are responsive).
IdM doesn't use any inter-server temp-data synchronization; all the servers running off the same repository communicate by committing changes to the database.
So if a specific IdM instance dies, on the next page load the user will be redirected to a new server. That server will redirect to the login page and ask the user to re-auth, with the desired page placed after login.jsp as a "nextPage" argument. After (re-)logging in, the user's returned to the page they were trying to get to. However, in-progress edits that had not been committed back to the database will be lost.
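The routing behaviour described above can be sketched roughly in Python. This is a simplified model that folds the load balancer's health check and the server's login redirect into one function; it is not IdM or load-balancer code. The server names, the example page path, and the functions pick_server and route are hypothetical; only login.jsp and the nextPage argument come from the description.

```python
from urllib.parse import urlencode


def pick_server(servers, is_responsive):
    """Return the first server the health check reports as responsive."""
    for server in servers:
        if is_responsive(server):
            return server
    raise RuntimeError("no responsive IdM server")


def route(requested_page, session_server, servers, is_responsive):
    """Return (server, path) for a request, forcing a re-login after a server loss."""
    if session_server is not None and is_responsive(session_server):
        return session_server, requested_page
    server = pick_server(servers, is_responsive)
    # The session lived on the dead server, so the user goes through login first
    # and is sent on to the requested page afterwards.
    return server, "/login.jsp?" + urlencode({"nextPage": requested_page})


alive = {"idm2"}  # hypothetical: idm1 has died, idm2 is still responsive
print(route("/user/edit.jsp", "idm1", ["idm1", "idm2"], alive.__contains__))
```

Nothing in this model carries uncommitted form data across servers, which mirrors the point that in-progress edits are lost on failover.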
We looked at high availability arrangements where valid sessions are shared across a new server, but fundamentally the limitation is that the app servers still don't sync in-progress edits, so the only difference between an HA environment and a more passive fail-over environment (like ours) is that in an HA environment the user doesn't have to re-login on a server failure; they still lose in-progress edits. So HA didn't seem like it added value to us.
If you are literally talking about an off-site, completely standby, seamless failover site, I agree I don't see how you would do that. I'd expect that you'd need the offsite setup to be a cold-standby site; configured to use the replicated database, but with the apps powered down until you actually need them. Otherwise, I think you'd have problems with the standby site servers not wanting to "standby". You could ensure no users end up on the standby servers, but background processes are likely to be run across both the primary and the standby services; I don't think you can enforce an "idle but running" status for the standby servers.
Edited by: etech on Feb 4, 2009 7:37 PM -
Morning,
I'm just going through testing VSS on 2 x 4500-X. When I pull the power from the standby or active unit, the hosts see a drop-out of 6 pings.
Am I expecting too much by wanting 0 loss, or have I missed some fail-over configuration?
Many thanks for any help.
I can post config if needed.
sh run
Building configuration...
Current configuration : 16559 bytes
! Last configuration change at 06:10:42 UTC Fri Apr 11 2014
version 15.1
no service pad
service timestamps debug datetime
service timestamps log datetime
service password-encryption
service compress-config
service sequence-numbers
hostname VSScore
boot-start-marker
boot system flash bootflash:cat4500e-universalk9.SPA.03.04.02.SG.151-2.SG2.bin
boot-end-marker
vrf definition mgmtVrf
address-family ipv4
exit-address-family
address-family ipv6
exit-address-family
enable secret
username
aaa new-model
aaa authentication login CONSOLE local
aaa session-id common
clock summer-time UTC recurring last Sun Mar 2:00 last Sun Oct 2:00
switch virtual domain 10
switch mode virtual
switch 1 priority 200
mac-address use-virtual
dual-active detection pagp trust channel-group 201
dual-active recovery ip address 192.168.22.1 255.255.255.192
udld enable
ip vrf Liin-vrf
no ip domain-lookup
ip domain-name
power redundancy-mode redundant
mac access-list extended VSL-BPDU
permit any 0180.c200.0000 0000.0000.0003
mac access-list extended VSL-CDP
permit any host 0100.0ccc.cccc
mac access-list extended VSL-DOT1x
permit any any 0x888E
mac access-list extended VSL-GARP
permit any host 0180.c200.0020
mac access-list extended VSL-LLDP
permit any host 0180.c200.000e
mac access-list extended VSL-MGMT
permit any host 00ff.a3e1.f864
permit any host 00ff.f271.3e20
mac access-list extended VSL-SSTP
permit any host 0100.0ccc.cccd
port-channel load-balance src-dst-mac
spanning-tree mode rapid-pvst
no spanning-tree optimize bpdu transmission
spanning-tree extend system-id
spanning-tree vlan 2-999 priority 8192
redundancy
mode sso
main-cpu
auto-sync startup-config
auto-sync standard
vlan internal allocation policy ascending
ip ssh source-interface Vlan500
ip ssh version 2
class-map match-any VSL-MGMT-PACKETS
match access-group name VSL-MGMT
class-map match-any VSL-DATA-PACKETS
match any
class-map match-any VSL-L2-CONTROL-PACKETS
match access-group name VSL-DOT1x
match access-group name VSL-BPDU
match access-group name VSL-CDP
match access-group name VSL-LLDP
match access-group name VSL-SSTP
match access-group name VSL-GARP
class-map match-any VSL-L3-CONTROL-PACKETS
match access-group name VSL-IPV4-ROUTING
match access-group name VSL-BFD
match access-group name VSL-DHCP-CLIENT-TO-SERVER
match access-group name VSL-DHCP-SERVER-TO-CLIENT
match access-group name VSL-DHCP-SERVER-TO-SERVER
match access-group name VSL-IPV6-ROUTING
class-map match-any VSL-MULTIMEDIA-TRAFFIC
match dscp af41
match dscp af42
match dscp af43
match dscp af31
match dscp af32
match dscp af33
match dscp af21
match dscp af22
match dscp af23
class-map match-any VSL-VOICE-VIDEO-TRAFFIC
match dscp ef
match dscp cs4
match dscp cs5
class-map match-any VSL-SIGNALING-NETWORK-MGMT
match dscp cs2
match dscp cs3
match dscp cs6
match dscp cs7
policy-map VSL-Queuing-Policy
class VSL-MGMT-PACKETS
bandwidth percent 5
class VSL-L2-CONTROL-PACKETS
bandwidth percent 5
class VSL-L3-CONTROL-PACKETS
bandwidth percent 5
class VSL-VOICE-VIDEO-TRAFFIC
bandwidth percent 30
class VSL-SIGNALING-NETWORK-MGMT
bandwidth percent 10
class VSL-MULTIMEDIA-TRAFFIC
bandwidth percent 20
class VSL-DATA-PACKETS
bandwidth percent 20
class class-default
bandwidth percent 5
interface Port-channel1
description VSL Link from Switch 1
switchport
switchport mode trunk
switchport nonegotiate
switch virtual link 1
interface Port-channel2
switchport
switchport mode trunk
switchport nonegotiate
switch virtual link 2
interface Port-channel200
description Link:-Security
switchport
interface Port-channel201
description Link:01
switchport
switchport mode trunk
interface Port-channel202
description Link:02
switchport
interface Port-channel203
description Link:03
switchport
interface Port-channel204
description Link:04
switchport
interface Port-channel205
no ip address
interface Port-channel206
no ip address
interface Port-channel207
no ip address
interface Port-channel208
no ip address
interface Port-channel209
no ip address
interface Port-channel210
description Link:10
switchport
switchport mode trunk
interface Port-channel211
description Link:11
switchport
switchport mode trunk
interface Port-channel212
description Link:12
switchport
switchport mode trunk
interface Port-channel213
description Link:13
switchport
switchport mode trunk
interface FastEthernet1
vrf forwarding mgmtVrf
no ip address
speed auto
duplex auto
interface TenGigabitEthernet1/1/1
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 1 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet1/1/2
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 1 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet1/1/3
description WAN:
no switchport
no ip address
ip ospf message-digest-key 199 md5 xxxxxxx
ip ospf network point-to-point
ip ospf 1 area 0
interface TenGigabitEthernet1/1/4
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 201 mode desirable
interface TenGigabitEthernet1/1/5
description Link: 02
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 202 mode desirable
interface TenGigabitEthernet1/1/6
description Link: 03
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 203 mode desirable
interface TenGigabitEthernet1/1/7
description Link: 04
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 204 mode desirable
interface TenGigabitEthernet1/1/8
description Link: 10
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 210 mode desirable
interface TenGigabitEthernet1/1/9
description Link: 11
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 211 mode desirable
interface TenGigabitEthernet1/1/10
description Link: 12
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 212 mode desirable
interface TenGigabitEthernet1/1/11
description Link: 13
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 213 mode desirable
interface TenGigabitEthernet1/1/12
description Link: Security
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 200 mode desirable
interface TenGigabitEthernet1/1/13
interface TenGigabitEthernet1/1/14
interface TenGigabitEthernet1/1/15
interface TenGigabitEthernet1/1/16
interface TenGigabitEthernet1/2/1
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 1 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet1/2/2
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 1 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet1/2/3
description WAN:
no switchport
no ip address
ip ospf message-digest-key 199 md5
ip ospf network point-to-point
ip ospf 1 area 0
interface TenGigabitEthernet1/2/4
interface TenGigabitEthernet1/2/5
interface TenGigabitEthernet1/2/6
interface TenGigabitEthernet1/2/7
interface TenGigabitEthernet1/2/8
interface TenGigabitEthernet2/1/1
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 2 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet2/1/2
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 2 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet2/1/3
description WAN:
no switchport
no ip address
ip ospf message-digest-key 199 md5 7 08191B783E375242431A040D327C
ip ospf network point-to-point
ip ospf 1 area 0
interface TenGigabitEthernet2/1/4
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 201 mode desirable
interface TenGigabitEthernet2/1/5
description Link: 02
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 202 mode desirable
interface TenGigabitEthernet2/1/6
description Link: 03
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 203 mode desirable
interface TenGigabitEthernet2/1/7
description Link: 04
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 204 mode desirable
interface TenGigabitEthernet2/1/8
description Link: 10
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 210 mode desirable
interface TenGigabitEthernet2/1/9
description Link: 11
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 211 mode desirable
interface TenGigabitEthernet2/1/10
description Link: 12
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 212 mode desirable
interface TenGigabitEthernet2/1/11
description Link: 13
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 213 mode desirable
interface TenGigabitEthernet2/1/12
description Link: Secu
switchport mode trunk
logging event link-status
logging event trunk-status
channel-group 200 mode desirable
interface TenGigabitEthernet2/1/13
interface TenGigabitEthernet2/1/14
interface TenGigabitEthernet2/1/15
interface TenGigabitEthernet2/1/16
interface TenGigabitEthernet2/2/1
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 2 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet2/2/2
switchport mode trunk
switchport nonegotiate
no lldp transmit
no lldp receive
no cdp enable
channel-group 2 mode on
service-policy output VSL-Queuing-Policy
interface TenGigabitEthernet2/2/3
description WAN:
no switchport
no ip address
ip ospf message-digest-key 199 md5
ip ospf network point-to-point
ip ospf 1 area 0
interface TenGigabitEthernet2/2/4
interface TenGigabitEthernet2/2/5
interface TenGigabitEthernet2/2/6
interface TenGigabitEthernet2/2/7
interface TenGigabitEthernet2/2/8
router ospf 1
router-id 192.168.22.3
area 0 authentication message-digest
passive-interface default
network 10.10.0.0 0.0.255.255 area 0
network 192.168.0.0 0.0.255.255 area 0
no ip http server
no ip http secure-server
ip route 0.0.0.0 0.0.0.0 10.10.183.3
ip access-list extended VSL-BFD
permit udp any any eq 3784
ip access-list extended VSL-DHCP-CLIENT-TO-SERVER
permit udp any eq bootpc any eq bootps
ip access-list extended VSL-DHCP-SERVER-TO-CLIENT
permit udp any eq bootps any eq bootpc
ip access-list extended VSL-DHCP-SERVER-TO-SERVER
permit udp any eq bootps any eq bootps
ip access-list extended VSL-IPV4-ROUTING
permit ip any 224.0.0.0 0.0.0.255
kron occurrence DAILYat1 at 1:00 recurring
policy-list SaveConfig
kron policy-list SaveConfig
cli wr mem
logging trap debugging
logging source-interface Vlan500
snmp-server community
snmp-server community
ipv6 access-list VSL-IPV6-ROUTING
permit ipv6 any FF02::/124
module provision switch 1
chassis-type 70 base-mac B838.6121.2F90
slot 1 slot-type 401 base-mac B838.6121.2F90
slot 2 slot-type 400 base-mac 4C4E.358C.E548
module provision switch 2
chassis-type 70 base-mac B838.6121.2D50
slot 1 slot-type 401 base-mac B838.6121.2D50
slot 2 slot-type 400 base-mac 4C4E.358C.E580
end
VSScore# -
Audio Applications in Unity Fail-over
Hi all,
I am going to install Cisco Unity with fail-over and, from what I remember, I have to rebuild applications such as Auto Attendant on the secondary server because they are not part of the replication.
Am I right, or is there no need to rebuild the applications?
Hi JFV,
That is no longer the case
How Standby Redundancy Works in Cisco Unity 8.x
Cisco Unity standby redundancy uses failover functionality to provide duplicate Cisco Unity servers for disaster recovery. The primary server is located at the primary facility, and the secondary server is located at the disaster-recovery facility.
Standby redundancy functions in the following manner:
• Data is replicated to the secondary server, with the exceptions noted in the "Data That Is Not Replicated in Cisco Unity 8.x" section.
• Automatic failover is disabled.
• In the event of a loss of the primary server, the secondary server is manually activated.
Data That Is Not Replicated in Cisco Unity 8.x
Changes to the following Cisco Unity settings are not replicated between the primary and secondary servers. You must manually change values on both servers.
• Registry settings
• Recording settings
• Phone language settings
• GUI language settings
• Port settings
• Integration settings
• Conversation scripts
• Key mapping scripts (can be modified through the Custom Key Map tool)
• Media Master server name settings
• Exchange message store, when installed on the secondary server
http://www.cisco.com/en/US/docs/voice_ip_comm/unity/8x/failover/guide/8xcufg040.html#wp1099338
Cheers!
Rob -
SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur
Hi All,
I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
advice in the end was to post a question here.
SQL Server Nodes, 2014 (12.0.2480.0)
1 Share witness (on separate subnet)
1 Cluster
1 Listener
I have been testing the response time to failovers, both manual (right-click, Failover in SSMS) and automatic (shut down the primary host). To test the response, I keep an SSMS query open on my desktop, connected to the listener and querying a small table, and hit Execute.
The Query response time, from execute to receiving the result, has been between 8 and 14 seconds based on my testing. My previous experience (in a separate environment) showed around 2 second fail over times in a very similar configuration.
Availability DB is 200Mb and is not actively used. The nodes are synchronised.
SQL Server Hosts: Windows 2012, 2 cpu, 8gb RAM.
Questions:
1: It's a big question, but what should I expect for a 'normal' failover time? Keep in mind this scenario is about as simple as it gets.
2: As it stands, an 8 to 14 second 'outage' could cause some applications to time out. Or am I being unreasonable? I am seeing even the very simple query in SSMS time out with this:
Msg 983, Level 14, State 1, Line 2
Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to
an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
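On the application side, the usual way to ride out a short failover window is to retry the operation for a bounded time instead of failing on the first error. The following is only a minimal sketch of that pattern: `TransientFailoverError` and `run_with_retry` are hypothetical names, the exception stands in for whatever error your driver raises for Msg 983, and the timings are assumptions to be tuned against the measured 8 to 14 seconds.

```python
import time


class TransientFailoverError(Exception):
    """Hypothetical stand-in for the driver error raised during a failover."""


def run_with_retry(operation, max_wait_seconds=30.0, first_delay=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or max_wait_seconds would be exceeded."""
    deadline = clock() + max_wait_seconds
    delay = first_delay
    while True:
        try:
            return operation()
        except TransientFailoverError:
            # Give up once the next wait would run past the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            # Exponential backoff, capped so retries stay frequent enough.
            delay = min(delay * 2, 5.0)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting; in an application they would be left at their defaults.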
Cluster logs are long - this section accounts for 8 seconds of the 11 second outage I experienced. I can supply the full log if required. Also this log is just the 2 cluster nodes, I removed the witness share to make sure it was as simple as possible.
00001090.00002128::2015/02/25-03:05:08.255 INFO [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
00001ee4.00002130::2015/02/25-03:05:10.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
00001090.00002128::2015/02/25-03:05:11.888 INFO [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
00001090.00002698::2015/02/25-03:05:11.889 INFO [GUM] Node 2: Processing RequestLock 2:49
00001090.00002128::2015/02/25-03:05:11.890 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
00001090.00002698::2015/02/25-03:05:11.890 INFO [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
00001090.00002128::2015/02/25-03:05:12.890 INFO [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
00001ee4.00002130::2015/02/25-03:05:15.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
00001090.00002128::2015/02/25-03:05:16.988 INFO [GUM] Node 2: Processing RequestLock 1:28
Thanks in advance.
Keegan
Hi Keegan,
From these log entries, it looks like the time is being lost in the "Sending request Netname" steps.
Could you please tell us the network configuration of the cluster nodes?
If I recall correctly, the recommendation for the "Private Network" is to keep only the TCP/IP protocol, disable NetBIOS over TCP/IP, and not configure DNS, WINS or a default gateway:
https://support.microsoft.com/kb/258750?wa=wsignin1.0
After that, please test again.
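To see where the time goes in a cluster log excerpt like the one quoted in the question, the gaps between consecutive entries can be computed from the timestamps. This is a rough sketch that assumes the timestamp layout seen in that excerpt (`::YYYY/MM/DD-HH:MM:SS.mmm`); `entry_gaps` is a hypothetical helper name, and a long gap only shows where the log is silent, not necessarily which component is at fault.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the excerpt: "::2015/02/25-03:05:12.890 INFO ..."
STAMP = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+(.*)")


def entry_gaps(lines):
    """Return (gap_seconds, text_of_later_entry) for consecutive log entries."""
    entries = []
    for line in lines:
        match = STAMP.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
            entries.append((when, match.group(2)))
    return [
        ((later - earlier).total_seconds(), text)
        for (earlier, _), (later, text) in zip(entries, entries[1:])
    ]


# Three entries copied from the excerpt in the question.
sample = [
    "00001090.00002128::2015/02/25-03:05:12.890 INFO  [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node",
    "00001ee4.00002130::2015/02/25-03:05:15.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios",
    "00001090.00002128::2015/02/25-03:05:16.988 INFO  [GUM] Node 2: Processing RequestLock 1:28",
]

# Largest gaps first, with the entry that ended each quiet period.
for gap, text in sorted(entry_gaps(sample), reverse=True):
    print(f"{gap:6.3f}s before: {text[:60]}")
```

Running the same function over the full log from both nodes, before and after the network change, gives a quick way to check whether the quiet periods around the Netname requests actually shrink.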
Best Regards,
Elton JI
-
Hi all,
1. I set up a pool with three Front End servers (the pool FQDN is pool.site1.sip96x2.com and it points to the IP addresses of the three Front End servers). Everything works fine, but when I disable the network interface on FE1 and FE2, the Lync clients are disconnected. I don't clearly understand how Lync clients fail over within a pool. Please clarify.
2. I have two central sites (Root site and Primary site, with different domains: sip96x2.com and site1.sip96x2.com). The dial-in simple URL points to the Front End server at the Root site. If the link between the Root site and the Primary site goes down, how can users at the Primary site reach the dial-in URL?
3. When building the topology for the Front End pool, I checked "Override FQDN" for the internal web services and set the FQDN to "poolint.site1.sip96x2.com". I created three A records for "poolint.site1.sip96x2.com" pointing to the three IP addresses of the Front End servers. Is that right?
Thanks so much!
Ah ok, well first thing, if I am reading this correctly: pool pairing Standard with Enterprise is not supported. You should only pair Standard with Standard and Enterprise with Enterprise (even though Topology Builder won't stop you). Take a look here for support scenarios: http://technet.microsoft.com/en-us/library/jj204697.aspx
To deal with the simple URLs in the event of failover you need to add them using Powershell. Take a look at this article which explains and gives an example: http://blogs.perficient.com/microsoft/2012/01/configuring-simple-urls-for-multiple-lync-pools/
Georg Thomas | Lync MVP
-
How Front End pool deals with fail over to keep user state?
Hello to all, I searched a lot of articles to understand how Lync 2010 keeps user state if a Front End pool node fails, but didn't find anything clear.
I found this Microsoft information on the topic: "The Front End Servers maintain transient information, such as logged-on state and control information for an IM, Web, or audio/video (A/V) conference, only for the duration of a user's session. This configuration is an advantage because in the event of a Front End Server failure, the clients connected to that server can quickly reconnect to another Front End Server that belongs to the same Front End pool."
As I read it, the client uses DNS to reconnect to another Front End in the pool. When it reconnects to an available server, does the user lose what he/she was doing in the Lync client? Can the server that now hosts the session recover all of the user's session data? If so, how?
Regards, EEOC.
The presence information and other dynamic user data is stored in the RTCDYN database on the backend SQL database in a 2010 pool:
http://blog.insidelync.com/2011/04/the-lync-server-databases/
If you fail over to another pool member, that pool member has access to the same data.
Ongoing conversations and the like are cached at the workstation.
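The reconnect step asked about in the question (the client falling back to another Front End behind the same pool FQDN) can be pictured roughly as below. This is a sketch of the general DNS-based pattern only, not the Lync client's actual implementation; `resolve_pool` and `connect_to_first_available` are hypothetical helpers, and any FQDN or port passed to them is a placeholder.

```python
import socket


def resolve_pool(fqdn, port):
    """Return the distinct IPv4 addresses behind a pool FQDN (one per A record)."""
    infos = socket.getaddrinfo(fqdn, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, (ip, _port) in infos:
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_to_first_available(addresses, port,
                               connect=socket.create_connection, timeout=3.0):
    """Try each address in turn; return (ip, connection) for the first that answers."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as error:
            # This pool member is unreachable; move on to the next one.
            last_error = error
    raise ConnectionError(f"no pool member reachable: {last_error}")
```

A caller would pass the result of `resolve_pool(pool_fqdn, port)` into `connect_to_first_available`. The point of the sketch is that the transport connection moves to a surviving server, while the state that matters lives in the shared back-end database or on the workstation, as described above.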
SWC Unified Communications -
Is it possible to add hyper-V fail over clustering afterwards?
Hi,
We are currently testing Windows Server 2012 R2 Hyper-V on a single standalone host, without failover clustering, with a few virtual machines. Is it possible to add failover clustering afterwards, add a second Hyper-V node and a shared disk, and move the virtual machines there, or do we have to install both nodes from scratch?
~ Jukka ~
Hi Jukka,
In addition, before you build a Hyper-V failover cluster, please refer to the requirements in the article below:
http://technet.microsoft.com/en-us/library/jj863389.aspx
Best Regards
Elton Ji
-
OCR and voting disks on ASM, problems in case of fail-over instances
Hi everybody
if at your site you:
- have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
where you have created additional CRS resources yourself to handle single-node db instances,
their listeners, their disks and so on (which are started on only one node at a time,
and can fail on that node and restart on another);
- have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
then you might have problems (as we had) because you might:
- reach the maximum number of diskgroups handled by an ASM instance (only 63, above which you get ORA-15068);
- experience delays (especially with multipath), find fake CRS resources, etc.
whenever you dismount disks from one node and mount them on another.
So (if both conditions are true) you might be interested in this story,
in which case please keep reading for the boring details.
One step backward (I'll try to keep it simple).
Oracle Grid Infrastructure is mainly used by RAC db instances,
which means that any db you create usually has one instance started on each node,
and all instances access read / write the same disks from each node.
So, ASM instance on each node will mount diskgroups in Shared Mode,
because the same diskgroups are mounted also by other ASM instances on the other nodes.
ASM instances have a spfile parameter CLUSTER_DATABASE=true (and this parameter implies
that every diskgroup is mounted in Shared Mode, among other things).
In this context, it is quite obvious why Oracle strongly recommends putting OCR and voting disks
inside ASM: this diskgroup (usually called CRS_DATA) will become diskgroup number 1
and ASM instances will mount it before CRS starts.
Then, additional diskgroups will be added by users, for DATA, REDO, FRA etc. of each RAC db,
and will be mounted later when a RAC db instance starts on the specific node.
In case of fail-over cluster, where instances are not RAC type and there is
only one instance running (on one of the nodes) at any time for each db, it is different.
All diskgroups of db instances don't need to be mounted in Shared Mode,
because they are used by one instance only at a time
(on the contrary, they should be mounted in Exclusive Mode).
Yet, if you follow Oracle advice and put OCR and voting inside ASM, then:
- at installation OUI will start ASM instance on each node with CLUSTER_DATABASE=true;
- the first diskgroup, which contains OCR and votings, will be mounted Shared Mode;
- all other diskgroups, used by each db instance, will be mounted Shared Mode, too,
even if you'll take care that they'll be mounted by one ASM instance at a time.
At our site, for our three-node cluster, this fact has two consequences.
One consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
- none of the instances on this cluster are Production (only Test, Dev, etc.);
- we planned to have usually 10 instances on each node, each of them with 3 diskgroups (DATA, REDO, FRA),
so 30 diskgroups on each node, for a total of 90 diskgroups (30 instances) on the cluster;
- in case one node failed, the surviving two should take over the resources of the failed node,
in the worst case: one node with 60 diskgroups (20 instances), the other one with 30 diskgroups (10 instances);
- in case two nodes failed, the only surviving node would not be able to mount additional diskgroups
(because of the limit of max 63 diskgroups mounted by an ASM instance), so all the others would remain unmounted
and their db instances stopped (they are not Production instances).
But it didn't work: since ASM has the parameter CLUSTER_DATABASE=true, you cannot mount 90 diskgroups,
you can mount only 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
and other diskgroups mounted on other nodes cannot reuse that number).
So as a matter of fact we can mount only about 21 diskgroups (about 7 instances) on each node.
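The arithmetic behind these figures can be written out as a small calculation. It is only a sketch and assumes the numbering works exactly as described above: one cluster-wide pool of 63 diskgroup numbers, with number 1 taken by the OCR/voting diskgroup. All constants come from the post; the variable names are made up for illustration.

```python
ASM_MAX_DISKGROUPS = 63          # limit quoted in the post (ORA-15068 above it)
RESERVED_FOR_OCR_VOTING = 1      # diskgroup number 1 holds OCR and voting disks
NODES = 3
DISKGROUPS_PER_INSTANCE = 3      # DATA, REDO, FRA
PLANNED_INSTANCES_PER_NODE = 10

# What was planned: every node mounts its own 30 diskgroups independently.
planned_total = NODES * PLANNED_INSTANCES_PER_NODE * DISKGROUPS_PER_INSTANCE

# What is available when diskgroup numbers are shared across the cluster.
usable_global = ASM_MAX_DISKGROUPS - RESERVED_FOR_OCR_VOTING
instances_global = usable_global // DISKGROUPS_PER_INSTANCE
per_node_average = usable_global / NODES

print("planned diskgroups on the cluster:", planned_total)
print("diskgroups mountable cluster-wide:", usable_global)
print("whole instances that fit cluster-wide:", instances_global)
print("average diskgroups per node:", round(per_node_average, 1))
```

Under these assumptions the cluster fits roughly a fifth fewer than a quarter of the planned diskgroups per node short of the target, that is about 20 instances in total instead of 30, which is where the figure of about 7 instances per node comes from.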
The second consequence is that, every time our handmade CRS scripts dismount diskgroups
from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
We also found in the CRS log that, whenever we mounted diskgroups (on one node only),
additional fake resources of type ora*.dg were created on the fly behind the scenes,
maybe to accommodate the fact that on the other nodes those diskgroups were left unmounted
(once again, instances are single-node here, and not RAC type).
That's all.
Has anyone run into similar problems?
We opened an SR with Oracle asking what options we have here, and we are disappointed by their answer.
Regards
Oscar
Hi Klaas-Jan
- best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): if the DATA dg gets corrupted, you can restore the Full backup plus Archived RedoLog plus Online Redolog (otherwise you will stop at the latest Archived).
So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
- in case of a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you can edit and change at will; you can also create additional scripts for any additional resources you might need (Oracle Agents, backup agents, file systems, monitoring tools, etc.)
About our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
Oracle's answers were a little bit odd:
- first they told us to use Grid Standalone (without CRS, OCR, voting at all), but we told them that we needed a fail-over solution
- then they told us to use RAC Single Node, which actually has some better features (in case of planned fail-over it might be able to migrate
client sessions without causing a reconnect, for SELECTs only, not in case of a running transaction), but we already have a few fail-over clusters and we cannot change them all
So we plan to move OCR and voting disks onto block devices (we think that the other solution, which needs a Shared File System, will take longer).
Thanks Marko for pointing us to OCFS2 pros / cons.
We asked Oracle to confirm that it is supported; they said yes, but it is discouraged (and it also doesn't work with OUI or ASMCA).
Anyway that's the simplest approach. This is a non-Prod cluster: we'll start here and, if everything is fine, after a while we'll do it on the Prod ones too.
- Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
- Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
-"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
Oscar -
Hi,
New to 2012 and implementing a clustered environment for our File Services role. Have got to a point where I have successfully configured the Shadow copy settings.
Have a large (15tb) disk. S:
Have a VSS drive (volume shadow copy drive) V:
Have successfully configured through Windows Explorer the Shadow copy settings.
Created dependencies in the Failover Cluster Manager console whereby S: depends on V:
However, when I fail over the resource and browse the Client Access Point share, there are no entries under the "Previous Versions" tab.
When I visit the S: drive in Windows Explorer and open the Shadow Copies dialog box, there are entries showing the times and dates of the shadow copies that ran on the original node. So the disk knows about the shadow copies that ran on the original node, but the "Previous Versions" tab has no entries to display.
This is on Server 2012 (NOT the R2 version).
Can anyone explain what might be the reason? Do I have an "issue" or is this by design?
All help appreciated!
Kathy
Kathleen Hayhurst Senior IT Support Analyst
Hi,
Please first check the requirements in following article:
Using Shadow Copies of Shared Folders in a server cluster
http://technet.microsoft.com/en-us/library/cc779378(v=ws.10).aspx
Cluster-managed shadow copies can only be created in a single quorum device cluster on a disk with a Physical Disk resource. In a single node cluster or majority node set cluster without a shared cluster disk, shadow copies can only be created and managed locally.
You cannot enable Shadow Copies of Shared Folders for the quorum resource, although you can enable Shadow Copies of Shared Folders for a File Share resource.
The recurring scheduled task that generates volume shadow copies must run on the same node that currently owns the storage volume.
The cluster resource that manages the scheduled task must be able to fail over with the Physical Disk resource that manages the storage volume.
-
Which role do I need DFS or File server on fail over cluster server 2012 R2?
What I want to achieve is to share all my user data files in a central location and have them highly available all the time, whether it's a general share or folder redirection data. But I'm a bit confused: I have a failover cluster set up on Server 2012, and I would like to add DFS as a role, but then there is another role called File Server that seems to do virtually the same thing as DFS, in that it creates a namespace share that can still be accessed even if one of the nodes goes down. My understanding is that DFS replicates between two physical locations, while a failover cluster works slightly differently, and with the File Server role it does pretty much the same thing except for replicating data from one drive to another. What do you suggest I do, or did I get the concept wrong like a noob?
DFS and Failover Clustering for file shares provide a similar end result for file access, but they are significantly different implementations.
Clustering provides high availability by presenting shared access to a set of files served from a cluster. With Windows Server 2012, Microsoft added the ability to create a Scale-Out File Server, which even allows all nodes of the cluster to serve access to the files for a higher level of performance and other great things. The bottom line with Failover Clusters for files is that there is a single copy of the file presented from the cluster.
DFS, on the other hand, provides high availability by keeping copies of the file in two or more locations and presenting a namespace that allows access to the file through any of the network paths. DFS works very well for files that are primarily read-only. When there is a lot of updating of the shared files, DFS is not a very good solution. There are ways to implement DFS for read/write files, but that generally requires a good knowledge of how the files are used and how you want to manage them.
The key to answering your question comes in your first sentence: "I want to share all my user data files in a central location and to be highly available all the time". My initial reaction is that a central location means a Failover Cluster, where there is only a single copy of the file. However, "all the time" can be compromised by network failures to the central site: remote sites would not have access if they can't reach the central site. DFS provides the ability to have copies remotely, but then if you allow updating at multiple sites, you have to manage the merging of the changes, among other things.
tim -
On this system:
OS: Solaris 10 11/06 s10s_u3wos_10 SPARC
Cluster version: 3.1u4
A- Normally, after how much time is a resource moved to the other node if IPMP fails (e.g. the gateway is unreachable)?
B- What happens if IPMP fails on both servers? Are the packages kept on their nodes?
C- Is there any timeout of more than 10 minutes in the cluster configuration?
You have 2 options. You could increase the back-end timeout to a very large value, so that the server waits rather than timing out and failing over, for example with something like:
<Object name="default">
NameTrans fn=map from=/ name=reverse-proxy-/
</Object>
<Object name="reverse-proxy-/">
Route fn=set-origin-server server=server1
ObjectType fn=http-client-config timeout=600
</Object>
see - http://docs.sun.com/app/docs/doc/820-4841/gdhrg?a=view
(Or simply disable any failover and have different individual servers distribute the load across different applications.)
Split your URIs or applications so that each application goes to one back-end server. For example, say you have 2 Java applications that you would like JBoss to serve.
You could edit your obj.conf (or <vs>-obj.conf, depending on your configuration) so that it looks like this:
<Object name="default">
NameTrans fn=map from=/ name=reverse-proxy-/
</Object>
<Object name="reverse-proxy-/">
<If $uri =~ /foo1>
Route fn=set-origin-server server=<server1>
</If>
<If $uri =~ /foo2>
Route fn=set-origin-server server=<server2>
</If>
</Object>
By the way, I will file an RFE on your behalf for this feature.