Vlan HSRP states keep failing over

Hi
We have 2 core 6500 switches with a trunk between them. Then there are access-layer switch stacks; each stack has 1 link to each core.
Core 1--------------------------------trunk-----------------------------------Core2
| |
| |
**********************Switch Stacks****************************************
Lately I am having issues where the HSRP states for different VLANs keep changing. This would be because the active HSRP switch loses visibility of its neighbor on that VLAN for a short period. HSRP timers are set to 1 sec hello, 3 sec dead. My spanning tree needs work, so I believe it may be the issue, as all values are currently defaults.
So my question: the trunk between the cores only has a few VLANs running over it. Can I solve this issue by allowing all VLANs over that trunk, to ensure the HSRP cores do not lose visibility of each other? Is that good networking practice?
Thanks
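To illustrate the timers mentioned in the question: with a 1 second hello and a 3 second dead (hold) time, a peer is declared down once no hello has arrived for 3 seconds. The following is a minimal Python sketch of that rule only, not a model of the actual IOS implementation; the function name and the arrival times are made up for illustration.

```python
def hold_timer_expiries(hello_times, hold_time=3.0):
    """Return the times at which a listener would declare its peer down.

    hello_times: sorted arrival times (seconds) of hellos from the peer.
    hold_time:   the dead/hold timer; 3 s in the setup described above.
    """
    expiries = []
    for previous, current in zip(hello_times, hello_times[1:]):
        # The hold timer restarts on every hello; it fires only if the
        # next hello arrives later than hold_time after the previous one.
        if current - previous > hold_time:
            expiries.append(previous + hold_time)
    return expiries


# Hypothetical arrival times: hellos every second, then a 3.5 s gap.
arrivals = [0.0, 1.0, 2.0, 3.0, 6.5, 7.5]
print(hold_timer_expiries(arrivals))
```

With these timers, any interruption that delays three consecutive hellos on a VLAN is enough to trigger a state change for that VLAN.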

So my question: the trunk between the cores only has a few VLANs running over it. Can I solve this issue by allowing all VLANs over that trunk, to ensure the HSRP cores do not lose visibility of each other? Is that good networking practice?
So currently for a lot of the vlans you are not allowing them across the trunk link between the core switches. Is that correct ?
If so it is a common design (** see below) to have an etherchannel trunk between the core/distro switches and have all vlans running across it if you are running HSRP for those vlans on the core switches.
However please don't go and do that before you understand what the implications are.
Looking at your schematic, if you have a VLAN on the 6500s and the switch stack, and you are running HSRP for that VLAN but the VLAN is not allowed across the trunk between the cores, then both uplinks from the switch stack are forwarding because there is no loop.
If you then allow the VLAN across the 6500 interconnect, you have a loop, so STP has to block one of the links. With this design it should block one of the uplinks from the switch stack, provided your trunk between the 6500s has more bandwidth (which is why you use an EtherChannel).
Because you are running HSRP, you would not lose throughput from the switch stack to the core switches, because traffic should always use the uplink to the HSRP active core switch. Return traffic from the core switches, however, can at the moment use either link to the switch stack. If STP blocks one of those links, you may well lose throughput back to the switch stack, depending on how the rest of your network looks.
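The loop argument above can be checked with a small graph sketch. This is only an illustration in Python with made-up node names; it counts how many links a loop-free topology would have to block, and does not model which link STP actually picks (that depends on root bridge placement and port costs).

```python
def redundant_links(nodes, links):
    """Count links that a loop-free (tree) topology would have to block.

    Uses union-find: a link whose two ends are already connected
    closes a loop and is therefore redundant.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    extra = 0
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            extra += 1  # this link closes a loop
        else:
            parent[root_a] = root_b
    return extra


nodes = ["core1", "core2", "stack"]
uplinks = [("stack", "core1"), ("stack", "core2")]

# VLAN not allowed on the core interconnect: no loop, both uplinks forward.
print(redundant_links(nodes, uplinks))
# VLAN allowed on the interconnect: one loop, so one link must block.
print(redundant_links(nodes, uplinks + [("core1", "core2")]))
```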
** As I say this is a common design practice but it is gradually being replaced because you can now run MEC (Multichassis Etherchannel) to the core/distro switches providing the switches support it.
MEC would allow you to treat both uplinks from the switch stack to the cores as one logical link and so STP does not need to block any of the uplinks.
MEC is supported on 6500 switches running VSS (which yours aren't by the sounds of it). VSS would also mean you didn't need to individually configure HSRP on each switch ie. you only configure one of the 6500s with HSRP and the configuration is "active" on both.
So the general answer is yes: if you cannot run MEC, what you propose is a valid design. But if you do it you could lose throughput, and you would also need downtime, because every VLAN you newly allow across the trunk between the 6500s triggers an STP recalculation.
It may not therefore be what you really want to do.
Edit - the above does not take into account the rest of your network topology ie. if you have multiple switch stacks with the same vlans on them you would be blocking on some of those links already due to the fact you would have loops with or without the vlans going across the trunk between the 6500s.
Jon

Similar Messages

Firewall keeps failing over when IPS fails

Is there a way to prevent the firewall from failing over if the IPS fails? I do not have it selected as a failover criterion, but I've been having some issues with the IPS module and the firewall keeps failing over.

Hello Matt,
There is an enhancement request for this:
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsm81086
But there isn't an ETA yet. You can save the bug to get updates.
Regards,
Felipe
Security Team.

Download of eprint keeps failing inviting retry.

My new printer is recognized and equipped with eprint, but the download keeps failing over and over.

What exactly are you trying to download?
To use ePrint, all you need to do is email the printer's email address.
Here is a FAQ for ePrint with a lot of great information:
https://h30495.www3.hp.com/help#eprint

How Front End pool deals with fail over to keep user state?

Hello to all. I searched a lot of articles to understand how Lync 2010 keeps user state if a failure happens on a Front End pool node, but didn't find anything clear.
I found some Microsoft information about this topic: "The Front End Servers maintain transient information—such as logged-on state and control information for an IM, Web, or audio/video (A/V) conference—only for the duration of a user’s session. This configuration is an advantage because in the event of a Front End Server failure, the clients connected to that server can quickly reconnect to another Front End Server that belongs to the same Front End pool."
As I read it, the client uses DNS to reconnect to another Front End Server in the pool. When it reconnects to an available server, does the user lose what he/she was doing in the Lync client? Can the server that is now hosting the session recover all of the "user's session data"? If so, how?
   Regards, EEOC.

The presence information and other dynamic user data is stored in the RTCDYN database on the backend SQL database in a 2010 pool:
http://blog.insidelync.com/2011/04/the-lync-server-databases/
If you fail over to another pool member, this pool member has access to the same data.
Ongoing conversations and the like are cached at the workstation.
SWC Unified Communications

Failed over to an Async Replica and now previous primary replica (now Secondary) is in NOT SYNC state

Hello All,
Here is my situation :
3 nodes in an AG configuration, and it's a multi-site cluster. Sync commit between 2 nodes in one DC and async commit to a node in the DR DC.
The AG was failed over to the async replica at the DR site. All the databases come up fine and the application can connect using the listener.
When I checked the state of the secondary databases, they were in NOT SYNC mode and data movement was suspended automatically. I can resume data movement to fix the problem, but I was curious why they would be in NOT SYNC mode.
Thanks in advance.
Thank you,
Anup

Hello Anup,
This happens because moving to an async replica requires a forced failover. A forced failover causes all other replicas to become suspended, because it is never known whether data loss will occur.
It might not make sense right now, but think about a situation where the databases are not synchronized and failover is forced (it has to work in all situations). There may be a good bit of data on the primary replica that has not yet made it (or has only partially made it) to the async secondary. It wouldn't make sense to negotiate the primary back down (after all, it's the async one) and undo valid transactions. Suspension also allows a database snapshot or other method to be taken on the old sync primary, which could be used for DR purposes to get those valid transactions and data out.
BOL Doc:
http://msdn.microsoft.com/en-us/library/hh213151.aspx#ForcedFailover
Sean Gallardy

Why server-side state saving doesn't support fail over?

Hi all,
In my previous thread "ADF server-side state saving method" Frank said that it doesn't support fail over.
Re: ADF server-side state saving method
My customer is wondering the reason.
If anyone has a clear statement about it, could you share it?
Any help will be much appreciated.
Atsushi

Timo,
As I wrote in my previous thread, my customer adopted a multi-window application design because they didn't know it frequently causes viewExpiredException.
Now I'm looking for the best setting to avoid the exception and need an ADF guru's help.
Frank said that ADF is built on Sun's RI, and it seems that the state-saving parameters of Mojarra work correctly in my environment. However, none of the ADF docs clearly describe the behavior of server-side state saving. When I set the state-saving method to "server", view states are managed per logical view (≒ window). That seems better for a multi-window application than client-token-based state management from the perspective of preventing viewExpiredException.
Because fail over is not one of their requirements, they might adopt server-side state saving if we could make sure it doesn't have other side effects.
So I'd like to know in more detail about the behavior.
Thanks,
Atsushi

Movie Download stuck at 2.1GB of 3.2GB for over 24 hours, keeps failing. How do i delete it and start over?

I have been trying to download an HD movie which is 3.2GB and it has gotten as far as 2.1GB several times over the last 24 hours but it keeps failing at 2.1GB stating: DOWNLOAD ERROR. TAP TO RETRY. Should I find a way to delete it and start over? Any suggestions?
Thanks

Step 12 Stopped the duplicate contact.
Deleted iCloud from my Mac. Hope this helps someone.

I have an iPhone 4 and the iOS 4.3.5 download keeps failing - error states its corrupted.

I have an iPhone 4 and the iOS 4.3.5 download keeps failing. It downloads then I get a message stating that the download was corrupted and couldn't be installed. I've tried it with bitdefender switched off. I've tried it with it in airplane mode. My partner has tried to update their iPhone on a different computer to mine and is also getting the same problem and error message. Help!

That's exactly right. And since you have a MacBook Air (as I do), that means the synching should occur just fine (if you've set it up right in iTunes). In other words, once you actually do the update, the iTunes on your MacBook Air will re-sync your photos back to your iPhone.
Basically, the process is that your iPhone gets backed up, wiped clean, then the new iOS version gets installed, then the backup gets restored. That will restore everything in the article above, which does not include your photos. But the sync process that happens after the iOS installation would restore your photos from iPhoto given you've set it up that way in iTunes.
When you have your iPhone plugged into iTunes, while looking at the iPhone's summary page, click "Photos" on the right side of iTunes to see how you have your sync set up.
Regardless, it sounds like you'll be fine! Your photos will wind up in iPhoto once you place them there. They will be safe there.

SQL 2005 cluster rejects SQL logins when in failed over state

When SQL 2005 SP4 on a Windows 2003 server cluster is failed over from Server_A to Server_B, it rejects all SQL Server logins. Domain logins are OK. The message is "user is not associated with a trusted server connection", then the IP of the client. This is error 18452. Anyone know how to fix this? They should work fine from both servers. We think this started just after installing SP4.
DaveK

Hello,
The connection string is good, you're definitely using sql auth.
LoginMode on Server_B is REG_DWORD 0x00000001 (1)
LoginMode on Server_A is REG_DWORD 0x00000002 (2)
Looks like you are on to something. I will schedule another test failover. I assume a 2 is mixed mode? If so, why would SQL allow two different modes on each side of a cluster?
You definitely have a registry replication issue, or at the very least a registry that isn't in sync with the cluster. This could happen for various reasons, none of which we'll probably find out about now, but nevertheless...
A good test would be to set it to windows only on Node A, wait a minute and then set it to Windows Auth and see if that replicates the registry setting across nodes correctly - this is actually the windows level and doesn't have anything to do with SQL Server.
SQL Server reads this value from the registry; it is not stored inside any database (read: nothing is stored in the master database), so it's a per-machine setting. Since it's not set correctly on Node B, when SQL Server starts up it correctly reads that registry key and acts on it as it should. The culprit isn't SQL Server, it's Windows Clustering.
Hopefully this makes a little more sense now. You can actually just edit the registry setting on Node B to match Node A and fail over to B; everything should work correctly. That doesn't help you with a root cause analysis, which definitely needs to be done, as who knows what else may not be correctly in sync.
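As a rough illustration of the mismatch described above, here is a minimal Python sketch that compares the LoginMode value reported for each node. It takes the values as plain numbers (it does not read the registry itself), and the function name is made up; the mapping of 1 to Windows Authentication only and 2 to mixed mode is the standard meaning of that setting.

```python
LOGIN_MODES = {
    1: "Windows Authentication only",
    2: "Mixed mode (SQL Server and Windows Authentication)",
}


def compare_login_modes(values_by_node):
    """Describe each node's LoginMode and report whether all nodes agree."""
    described = {
        node: LOGIN_MODES.get(value, f"unknown ({value})")
        for node, value in values_by_node.items()
    }
    consistent = len(set(values_by_node.values())) <= 1
    return described, consistent


# Values as reported in the thread.
described, consistent = compare_login_modes({"Server_A": 2, "Server_B": 1})
for node, mode in described.items():
    print(node, "->", mode)
print("nodes agree:", consistent)
```

A node in Windows-only mode rejects SQL logins, which matches the error 18452 seen only after failing over to Server_B.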
Sean Gallardy

SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur

Hi All,
I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
advice in the end was to post a question here.
SQL Server Nodes, 2014 (12.0.2480.0)
1 Share witness (on separate subnet)
1 Cluster
1 Listener
I have been testing the response time to failovers – both manual (right-click, fail over in SSMS) and automatic (shut down the primary host). To test the response, I run an SSMS query from my desktop, connected to the listener, that queries a small table.
The Query response time, from execute to receiving the result, has been between 8 and 14 seconds based on my testing. My previous experience (in a separate environment) showed around 2 second fail over times in a very similar configuration.
Availability DB is 200Mb and is not actively used. The nodes are synchronised.
SQL Server Hosts: Windows 2012, 2 cpu, 8gb RAM.
Questions:
1: It’s a big question but what should I expect for a ‘normal’ fail over time. Keep in mind this scenario is about as simple as it gets.
2: As it stands an 8 to 14 second ‘outage’ could cause some applications to time out. Or am I being un-reasonable? I am seeing the very simple query in SSMS to time out with this:
Msg 983, Level 14, State 1, Line 2
Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
Cluster logs are long - this section accounts for 8 seconds of the 11 second outage I experienced. I can supply the full log if required. Also this log is just the 2 cluster nodes, I removed the witness share to make sure it was as simple as possible.
00001090.00002128::2015/02/25-03:05:08.255 INFO [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
00001ee4.00002130::2015/02/25-03:05:10.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
00001090.00002128::2015/02/25-03:05:11.888 INFO [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
00001090.00002698::2015/02/25-03:05:11.889 INFO [GUM] Node 2: Processing RequestLock 2:49
00001090.00002128::2015/02/25-03:05:11.890 INFO [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
00001090.00002698::2015/02/25-03:05:11.890 INFO [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
00001090.00002128::2015/02/25-03:05:12.890 INFO [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
00001ee4.00002130::2015/02/25-03:05:15.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
00001090.00002128::2015/02/25-03:05:16.988 INFO [GUM] Node 2: Processing RequestLock 1:28
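One way to see where the time goes in an excerpt like this is to compute the pauses between consecutive log entries. The following Python sketch assumes the line layout seen above (process and thread IDs, then "::", then a timestamp); the function name is made up, and the sample uses the last three lines of the excerpt.

```python
from datetime import datetime


def largest_gap(log_lines):
    """Return (gap_seconds, earlier_line, later_line) for the biggest pause."""
    stamped = []
    for line in log_lines:
        # Layout assumed from the excerpt: "<pid>.<tid>::<timestamp> <level> ..."
        stamp = line.split("::", 1)[1].split(" ", 1)[0]
        when = datetime.strptime(stamp, "%Y/%m/%d-%H:%M:%S.%f")
        stamped.append((when, line))

    best = None
    for (t1, line1), (t2, line2) in zip(stamped, stamped[1:]):
        gap = (t2 - t1).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, line1, line2)
    return best


excerpt = [
    "00001090.00002128::2015/02/25-03:05:12.890 INFO [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node",
    "00001ee4.00002130::2015/02/25-03:05:15.107 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios",
    "00001090.00002128::2015/02/25-03:05:16.988 INFO [GUM] Node 2: Processing RequestLock 1:28",
]

gap, before, after = largest_gap(excerpt)
print(f"largest pause: {gap:.3f} s")
print("before:", before)
print("after: ", after)
```

Running the same function over the full cluster log would show whether the pauses cluster around particular resources.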
Thanks in advance.
Keegan

Hi Keegan,
From this cluster log, it looks like the "Sending request Netname" calls account for the lost time.
Could you please tell us the network configuration of the cluster nodes?
If I recall correctly, it is recommended to keep only the TCP/IP protocol and disable NetBIOS over TCP/IP on the "Private Network", and not to configure DNS, WINS, or a default gateway on the "Private Network":
https://support.microsoft.com/kb/258750?wa=wsignin1.0
After that, please test again.
Best Regards,
Elton JI

Requirements on an EJB to be eligible for a fail-over

Hi all,
I was reading the EJB developer guide for WebLogic Server 9.2. When talking about the fail-over feature, the guide said:
"EJB failover requires that bean methods must be idempotent and configured as such in weblogic-ejb-jar.xml"
There are two points in this statement.
1) Fail-overs must be configured.
This is straightforward.
2) The bean methods must be idempotent.
I don't really understand this point. Does this suggest that the bean methods should conform to some guidelines? If so, what are they?
Probably these are clarified in some other document or other resources. Being impatient and a little lazy, I would love to have this clarified in the forum.
Thanks in advance,
- Madhu
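For readers with the same question, "idempotent" in the general sense means that repeating a call leaves the same final state as calling it once. That matters for failover because a call may be retried on another server after the first attempt already took effect. The sketch below is plain Python with made-up class and method names; it illustrates the general property only and says nothing about how WebLogic itself decides when to retry.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def set_balance(self, amount):
        """Idempotent: repeating the call leaves the same final state."""
        self.balance = amount

    def deposit(self, amount):
        """Not idempotent: every repeat changes the state again."""
        self.balance += amount


def call_with_retry(method, *args):
    """Simulate a failover retry: the first call succeeded but its reply
    was lost, so the client sends the same call a second time."""
    method(*args)
    method(*args)


safe = Account(100)
call_with_retry(safe.set_balance, 150)
print("set_balance retried:", safe.balance)

unsafe = Account(100)
call_with_retry(unsafe.deposit, 50)
print("deposit retried:", unsafe.balance)
```

The retried deposit is applied twice, which is the kind of side effect the idempotency requirement is meant to rule out.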

Daniel,
Since this will be the ONLY system running as a DC providing AD DS and acting as the DirectAccess server, I think I should follow this advice from the article you sent:
For users who never connect directly to the Contoso intranet or through a VPN, they must use the DirectAccess Offline Domain Join process to initially join the appropriate domain and configure DirectAccess. When this process is complete, the users log on normally and have the same experience as if they were directly connected to the Contoso intranet.
Remember, no user will ever connect directly to the subnet where the server is, so I would do an offline join first and then start managing. The only thing I'm worried about is that they keep saying the DirectAccess function has significantly improved in Windows 8. Many systems will be using Windows 7 Pro 64-bit, and some Windows 8.1 Pro 64-bit. Should I worry?

Connection Fail-over

My database may be replicated to multiple servers, in the event that a connection fails to the primary server, my class should try the same statement on the next context in an ordered list.
Any advice on errors to watch for to know the connection is broken (versus an error resulting from the query itself)?
Any advice on executing a query for one connection, and then if it fails, executing the same statement on a different connection? I am hoping to keep the fail-over logic encapsulated in a simple to use class so that all the classes that call stored procedures don't have to contain the logic.
I primarily use stored procedures, but have an occasional text sql query too.
Many thanks in advance!
Darcy

The best way to test this would probably be to establish a connection to the database, then stop the database, try to execute a query against it, and see what SQLException gets thrown. Then you can watch for that one.
Also, if you are using connection pooling (or even if you aren't), you can validate each connection object you create with something like "select getDate()" or another basic query that should return at least one row. If you don't get a row back, or if you get an exception, you know something is wrong with the database and you can move on to the next one.
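The encapsulation asked about above could look roughly like the following. This is a hedged sketch in Python rather than JDBC: ConnectionBroken, QueryError, FailoverExecutor, and the two stand-in backends are all hypothetical names, and in a real application you would map the driver-specific exceptions you observed in testing onto the "connection is broken" case.

```python
class ConnectionBroken(Exception):
    """Hypothetical marker for 'the server is unreachable'."""


class QueryError(Exception):
    """Hypothetical marker for 'the statement itself is wrong'."""


class FailoverExecutor:
    def __init__(self, backends):
        # backends: ordered list of callables that take a statement string.
        self.backends = list(backends)

    def execute(self, statement):
        last_error = None
        for run in self.backends:
            try:
                return run(statement)
            except ConnectionBroken as exc:
                # Only connection failures move us to the next server.
                # QueryError is not caught: retrying elsewhere won't help.
                last_error = exc
        raise ConnectionBroken("all servers failed") from last_error


# Stand-in backends for illustration only.
def primary(statement):
    raise ConnectionBroken("primary is down")


def replica(statement):
    if "bad" in statement:
        raise QueryError("syntax error")
    return f"replica ran: {statement}"


executor = FailoverExecutor([primary, replica])
print(executor.execute("select 1"))
```

Callers only see execute(), so the classes that call stored procedures do not need to carry any of the fail-over logic.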

Vlan mapping lost when fail to secondary WLC

Hello
I have two WLCs. The primary WLC is a 5508 running code 7.4.100.60; the secondary WLC is a 4402 running code 7.0.230.0.
When an AP is joined to the 5508 WLC it uses FlexConnect mode; when it is joined to the 4402 it uses H-REAP mode.
AP models: 1242, 1142.
Question:
When APs fail over to the secondary WLC (4402), some of them (not all) lose their VLAN mapping information. During failover, the APs download firmware.
Is there any way to solve this? Thanks!

I understand. Two controllers, two different code levels. The 4400 is locked in at 7.0 code and you need 7.4 for the 2600 AP.
In your original post you state that when APs fail over from one controller to the other, you lose VLANs and the APs upgrade or downgrade their code. This is not a supported design. You can't properly fail over between different code versions.
If you want them to stop failing over, and clients don't roam from APs on controller 1 to APs on controller 2, simply remove the controllers from the shared mobility group and put each controller in its own group.
"Satisfaction does not come from knowing the solution, it comes from knowing why." - Rosalind Franklin
‎"I'm in a serious relationship with my Wi-Fi. You could say we have a connection."

Replicating router partitions for Fail/Over ?

Hi everyone,
Is it really worth replicating a router partition on the same node?
In other words, has anyone ever seen a router partition fail on its own (e.g. due to running out of memory), i.e. not because of a node failure?
What I am looking for is to avoid wasting system resources on unneeded replicas.
Thanks.
Vincent Figari

You will use Active/Standby failover method to keep your fail-over configuration in secondary firewall (PIX).
Active/Standby Failover lets you use a standby security appliance to take over the functionality of a failed unit. When the active unit fails, it changes to the standby state while the standby unit changes to the active state. The unit that becomes active assumes the IP addresses (or, for a transparent firewall, the management IP address) and MAC addresses of the failed unit and begins to pass traffic. The unit that is now in standby state takes over the standby IP addresses and MAC addresses. Because network devices see no change in the MAC to IP address pairing, no ARP entries change or time out anywhere on the network. PIX Security Appliance with 7.x version and above supports failover.
For further information click this link.
http://www.cisco.com/en/US/products/hw/vpndevc/ps2030/products_configuration_example09186a00807dac5f.shtml#regu
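The role swap described above can be pictured with a toy model. This Python sketch is only an illustration of why the rest of the network sees no change in the IP-to-MAC pairing; the unit names and addresses are made up (documentation ranges), and it does not reflect how the appliance implements failover internally.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    role: str  # "active" or "standby"


# Addresses belong to the role, not to the physical unit (made-up values).
ROLE_ADDRESSES = {
    "active": ("192.0.2.1", "00:00:5e:00:53:01"),
    "standby": ("192.0.2.2", "00:00:5e:00:53:02"),
}


def fail_over(units):
    """Swap roles between the two units of a failover pair."""
    for unit in units:
        unit.role = "standby" if unit.role == "active" else "active"


def arp_view(units):
    """IP -> MAC pairing as the rest of the network would see it."""
    return {ROLE_ADDRESSES[u.role][0]: ROLE_ADDRESSES[u.role][1] for u in units}


pair = [Unit("primary", "active"), Unit("secondary", "standby")]
before = arp_view(pair)
fail_over(pair)
after = arp_view(pair)

print("roles now:", {u.name: u.role for u in pair})
print("ARP view unchanged:", before == after)
```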

ACE MIB Value for Fail over

Hi,
May I know which MIB value the ACE (Application Control Engine) generates when a fail-over occurs? Is the value the same for a context fail-over as well?
Regards
Jithesh

Hi,
I think this is the error code you're looking for:
727012
Error Message: %ACE-2-727012: HA: FT Group group ID changed state to NewState. Reason: reason str.
Table 2-2 NewState Values and Descriptions
FSM_FT_STATE_INIT: The initial state. Visible only when the configuration for the FT group exists but it is not in service.
FSM_FT_STATE_ELECT: After you enter the inservice command when you are configuring an FT group, the ACE enters the ELECT state. The redundancy state machine negotiates with its peer context in the FT group to determine the redundancy role (active or standby).
FSM_FT_STATE_ACTIVE: The active member of the FT group.
FSM_FT_STATE_STANDBY_COLD: This state can be entered if:
•FT VLAN is down but the peer device is still alive.
•Configuration or application state synchronization failure has occurred.
FSM_FT_STATE_STANDBY_CONFIG: The standby context is waiting to receive configuration information. Upon entering this state, the active context will be notified to send a copy of the running configuration.
FSM_FT_STATE_STANDBY_BULK: The standby context is waiting to receive state information. Upon entering this state, the active context will be notified to send a copy of the current state information for all applications.
FSM_FT_STATE_STANDBY_HOT: The standby context is ready to become active in a failover situation.
Values returned for the reason str variable can be one of the following:
•FSM_FT_EV_PEER_DOWN
•FSM_FT_EV_PEER_FT_VLAN_DOWN
•FSM_FT_EV_PEER_SOFT_RESET
•FSM_FT_EV_STATE
•FSM_FT_EV_TIMEOUT
•FSM_FT_EV_CFG_SYNC_STATUS
•FSM_FT_EV_BULK_SYNC_STATUS
•FSM_FT_EV_COUP
•FSM_FT_EV_RELINQUISH
•FSM_FT_EV_TRACK_STATUS
•FSM_FT_EV_UPDATE
•FSM_FT_EV_ENABLE_INSERVICE
•FSM_FT_EV_DISABLE_INSERVICE
•FSM_FT_EV_SWITCHOVER
•FSM_FT_EV_PEER_COMPATIBLE
•FSM_FT_EV_MAINT_MODE_OFF
•FSM_FT_EV_MAINT_MODE_PARTIAL
•FSM_FT_EV_MAINT_MODE_FULL
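If these messages end up in a syslog collector, the state and reason can be pulled out with a small parser. The Python sketch below is built from the message template quoted above; the sample line is constructed by substituting values into that template and is not taken from a real device, so the exact spacing on a live ACE is an assumption.

```python
import re

# Pattern derived from the template:
# "%ACE-2-727012: HA: FT Group <group ID> changed state to <NewState>. Reason: <reason str>."
PATTERN = re.compile(
    r"%ACE-2-727012: HA: FT Group (?P<group>\S+) changed state to "
    r"(?P<state>FSM_FT_STATE_\w+)\. Reason: (?P<reason>FSM_FT_EV_\w+)"
)


def parse_ft_state_change(line):
    """Return group, state and reason as a dict, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


# Constructed sample, not captured from a device.
sample = (
    "%ACE-2-727012: HA: FT Group 1 changed state to "
    "FSM_FT_STATE_ACTIVE. Reason: FSM_FT_EV_PEER_DOWN."
)
print(parse_ft_state_change(sample))
```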
Check from the above ID onwards for more details around ft status.
Cheers
Scott
