Fabric Interconnect failover behavior expectations

Dear Team,
We have two Fabric Interconnects in HA-ready mode: FI A as primary and FI B as subordinate.
We configured the ESXi host with two vNICs: vNIC1 on FI A in VLAN 7, which is connected and working, and vNIC2 on FI B in VLAN 8, which is disconnected. This setup works fine.
Then we turned off FI A, and FI B started the election process. We tried to make FI B primary using "lead" and "force"; it replied that it was already doing that by itself and asked us to wait, and FI B then emerged as "primary".
With FI A turned off, connectivity to this ESXi host was lost, as expected, because vNIC2 on FI B (VLAN 8) is not connected.
So we connected vNIC2 (on FI B) in ESXi. Since network traffic was allowed only on VLAN 7, we had to change the VLAN for vNIC2 on FI B to VLAN 7, and connectivity came back up. (This change was made in the service profile's Network tab by selecting the vNIC properties and changing the VLAN after connecting to FI B.)
Then we turned FI A back on. It took over from FI B as primary, and connectivity was lost again. When we examined it, we found that
the change made to vNIC2 on FI B (VLAN 8 to VLAN 7) had been reverted to the state it had on FI A before FI A was turned off, i.e. VLAN 8. We had to change it back to VLAN 7 manually on FI A for network connectivity to come up.
So the question is: why was the VLAN change made to vNIC2 on FI B reverted to the configuration FI A had before it was switched off?
This time it was a vNIC; is there a chance that other major configuration, or a whole service profile, could be reverted the same way?
Is there any setting we have missed so that the latest configuration (made on FI B) is kept when FI A takes over as primary from FI B?
Thanks and Regards
Jose

The release notes for 2.0(5a) list:
CSCua91672
The fcoe_mgr hap reset will no longer cause FI reboot.
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/OL_25363.html
I am never really on the Fabric Interconnects to execute CLI commands, but a memory leak for some other reason is certainly a possibility. The fact that its own high-availability process causes a double reset within its own high-availability architecture is troubling.

Similar Messages

  • Download firmware to UCS PE Fabric Interconnect fails

    I think it fails because the Local Storage Information shows that the bootflash, opt, and workspace partitions all have a size of 0 MB.
    How can I increase the size of the Local Storage in the UCS Platform Emulator?
    Thanks

    Hello,
    Upgrading through the CLI is much more complicated than doing the same process through the GUI. If you have CLI access, can you tell me which firmware version is running on the FI?
    Also, what is the specific error you receive when you try to connect through the GUI? Could you provide a screenshot of it?
    -Kenny

  • Error: Failed to parse path in URl (Cisco UCS 6100 Series Fabric Interconnect)

    I have to shut down parts of my network due to an electrical outage this weekend. I have a Cisco UCS 6100 Series Fabric Interconnect and I am trying to back up the configuration. This is the command I am using: create backup tftp://*.24.1.88 full-state enabled. The error I receive is "Error: failed to parse URI" (I believe URI means URL here).

    Hi Parakiteiz,
    It looks like the format is incorrect and you have an extra 't' in the tftp command you provided. Please take a look at the link below for configuration assistance:
    Creating a Backup Operation
    http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/sw/cli/config/guide/1-1-1/b_CLI_Config_Guide_1_1_1/CLI_Config_Guide_1_1_1_chapter38.html#task_4704260519853629974
    Please let me know if this helps.
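One more thing worth checking in the command above: as far as I can tell, the `create backup` syntax in the UCS Manager CLI configuration guide expects a URL of the form `tftp://hostname[:port]/path/filename`, so a URL with only a host and no file path may be rejected by the parser. The Python sketch below is a hypothetical pre-flight check (`check_backup_uri` is not a Cisco tool); it only inspects the shape of the URL with `urllib.parse.urlsplit`, and the set of schemes is an assumption based on the transfer protocols UCS Manager backups usually offer.

```python
from urllib.parse import urlsplit

# Assumption: transfer protocols offered for UCS Manager backups.
SUPPORTED_SCHEMES = {"ftp", "tftp", "scp", "sftp"}


def check_backup_uri(uri: str) -> list[str]:
    """Return a list of problems found in a backup URL (empty list = looks OK).

    Hypothetical pre-flight check: it only looks at the URL's shape
    (scheme://host/path/filename) and never contacts any device.
    """
    problems = []
    parts = urlsplit(uri)
    if parts.scheme.lower() not in SUPPORTED_SCHEMES:
        problems.append(f"unsupported scheme {parts.scheme!r}")
    host = parts.hostname  # None when the URL has no host part
    if not host:
        problems.append("missing host")
    elif "*" in host:
        problems.append("host contains '*' (masked or placeholder address?)")
    if parts.path in ("", "/"):
        problems.append("missing /path/filename for the backup file")
    return problems


if __name__ == "__main__":
    for candidate in ("tftp://*.24.1.88",
                      "tftp://192.0.2.88/backups/ucs-full-state.tgz"):
        print(candidate, "->", check_backup_uri(candidate) or "looks OK")
```

For the command in the question, the check reports both the missing file path and the `*` in the host, which may simply be the poster masking the real address.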

  • Failed:Error disabled - SFP vendor not supported on Fabric Interconnect 6248UP

    We are connecting an FC uplink on a Fabric Interconnect 6248UP to an MDS 9124. On the Fabric Interconnect side we use an 8 Gbps SFP (DS-SFP-FC8G-SW); on the MDS 9124 side there is a 4 Gbps SFP. I was told this should work when the speed is fixed at 4 Gbps on both sides. When the SFPs and cabling are connected, I get this error in UCS Manager on the FC uplink port we are using:
    Failed:Error disabled - SFP vendor not supported.
    I cannot change the speed on the FC uplink (I can only set a user label). The Fabric Interconnect is configured for FC on the last 8 ports, and that is where the SFP for the storage is located (port 31).

    I have a similar problem. I'm using port 1/41 and get this output:
    show system firmware expand | head lines 10
    UCSM:
       Running-Vers: 2.1(3a)
       Package-Vers: 2.1(3a)A
       Activate-Status: Ready
    Catalog:
       Running-Vers: 2.1(3a)T
       Package-Vers: 2.1(3a)A
       Activate-Status: Ready
    sh interface fc 1/41
    fc1/41 is down (Error disabled - SFP vendor not supported)
       Hardware is Fibre Channel, SFP is Unknown(0)
       Port WWN is 20:29:00:2a:6a:7e:7b:00
       Admin port mode is F, trunk mode is off
       snmp link state traps are enabled
       Port vsan is 1
       Receive data field Size is 2112
       Beacon is turned off
       1 minute input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
       1 minute output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
         0 frames input, 0 bytes
           0 discards, 0 errors
           0 CRC, 0 unknown class
           0 too long, 0 too short
         0 frames output, 0 bytes
           0 discards, 0 errors
         0 input OLS, 0 LRR, 0 NOS, 0 loop inits
         0 output OLS, 0 LRR, 0 NOS, 0 loop inits
       last clearing of "show interface" counters never
    show int fc 1/41 transceiver details
    fc1/41 sfp is present but not supported
       name is CISCO-FINISAR  
       part number is FTLX8571D3BCL-C2
       revision is A  
       serial number is FNS17401NYG    
       FC Transmitter type is Unknown(0)
       FC Transmitter supports Unknown(0) link length
       Transmission medium is Unknown(0)
       Supported speeds are - Min speed: -1 Mb/s, Max speed: -1 Mb/s
       Nominal bit rate is 10300 MBits/sec
       Link length supported for 50/125mm fiber is 80 m(s)
       Link length supported for 62.5/125mm fiber is 20 m(s)
       No tx fault, no rx loss, no sync exists, diagnostic monitoring type is 0x68
       SFP Diagnostics Information:
                                         Alarms                 Warnings
    Is this a firmware problem?
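A rough way to read the transceiver output above: the sketch below pulls a few fields out of `show interface fc ... transceiver details` and compares the reported nominal bit rate with the Fibre Channel signalling rates (1.0625, 2.125, 4.25 and 8.5 GBd). The reported 10300 MBit/s is much closer to the 10 Gigabit Ethernet SFP+ line rate than to any FC rate, which might explain the "not supported" state on an FC port; treat that as a hypothesis to confirm against Cisco's supported-optics documentation, not a diagnosis. The helper names, field names and the 10% tolerance are assumptions made for this illustration.

```python
import re

# Nominal Fibre Channel signalling rates in MBd (1GFC, 2GFC, 4GFC, 8GFC).
FC_RATES_MBD = {"1G FC": 1062.5, "2G FC": 2125.0, "4G FC": 4250.0, "8G FC": 8500.0}

# Patterns for lines printed by "show interface fc x/y transceiver details".
FIELD_PATTERNS = {
    "name": r"^\s*name is\s+(.+?)\s*$",
    "part_number": r"^\s*part number is\s+(\S+)",
    "nominal_mbps": r"^\s*Nominal bit rate is\s+(\d+)\s*MBits/sec",
}


def parse_transceiver(text: str) -> dict:
    """Extract name, part number and nominal bit rate from the output."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    if "nominal_mbps" in fields:
        fields["nominal_mbps"] = int(fields["nominal_mbps"])
    return fields


def matching_fc_rate(nominal_mbps: int, tolerance: float = 0.10):
    """Return the FC rate within `tolerance` of the reported rate, else None."""
    for label, rate in FC_RATES_MBD.items():
        if abs(nominal_mbps - rate) <= tolerance * rate:
            return label
    return None
```

With the output above, `parse_transceiver` returns part number FTLX8571D3BCL-C2 and a nominal rate of 10300, and `matching_fc_rate(10300)` returns `None`.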

  • Fabric Interconnect B, management services are unresponsive

    Hi,
    We have configured the Call Home option in UCSM, and since last Saturday we have been getting the error below from Call Home. We opened a TAC case with Cisco to troubleshoot it, but according to TAC, "The error is a transient error from which the fabric interconnects can automatically recover."
    Below are the error messages we are getting:
    E-mail-1:
    Subject:
    System Notification from System-A - diagnostic:GOLD-major - 2011-12-27 17:54:09 GMT-00:00 Fabric Interconnect B, management services are unresponsive
    Body Message:
    System Name:System-A
    Time of Event:2011-12-27 17:54:09 GMT-00:00
    Event Description:Fabric Interconnect B, management services are unresponsive
    Severity Level:6
    E-mail-2:
    Subject:
    System Notification from System-A - diagnostic:GOLD-major - 2011-12-27 17:54:09 GMT-00:00 Fabric Interconnect B, management services are unresponsive
    Body Message:
    <?xml version="1.0" encoding="UTF-8" ?>
    <soap-env:Envelope xmlns:soap-env="http://www.w3.org/2003/05/soap-envelope">
    <soap-env:Header>
    <aml-session:Session xmlns:aml-session="http://www.cisco.com/2004/01/aml-session" soap-env:mustUnderstand="true" soap-env:role="http://www.w3.org/2003/05/soap-envelope/role/next">
    <aml-session:To>http://tools.cisco.com/neddce/services/DDCEService</aml-session:To>
    <aml-session:Path>
    <aml-session:Via>http://www.cisco.com/appliance/uri</aml-session:Via>
    </aml-session:Path>
    <aml-session:From>http://www.cisco.com/appliance/uri</aml-session:From>
    <aml-session:MessageId>1058:SSI1442BFRC:4EFA0641</aml-session:MessageId>
    </aml-session:Session>
    </soap-env:Header>
    <soap-env:Body>
    <aml-block:Block xmlns:aml-block="http://www.cisco.com/2004/01/aml-block">
    <aml-block:Header>
    <aml-block:Type>http://www.cisco.com/2005/05/callhome/diagnostic</aml-block:Type>
    <aml-block:CreationDate>2011-12-27 17:54:09 GMT-00:00</aml-block:CreationDate>
    <aml-block:Builder>
    <aml-block:Name>UCS 6100 Series Fabric Interconnect</aml-block:Name>
    <aml-block:Version>4.2(1)N1(1.43q)</aml-block:Version>
    </aml-block:Builder>
    <aml-block:BlockGroup>
    <aml-block:GroupId>1059:Serial Number:4EFA0641</aml-block:GroupId>
    <aml-block:Number>0</aml-block:Number>
    <aml-block:IsLast>true</aml-block:IsLast>
    <aml-block:IsPrimary>true</aml-block:IsPrimary>
    <aml-block:WaitForPrimary>false</aml-block:WaitForPrimary>
    </aml-block:BlockGroup>
    <aml-block:Severity>6</aml-block:Severity>
    </aml-block:Header>
    <aml-block:Content>
    <ch:CallHome xmlns:ch="http://www.cisco.com/2005/05/callhome" version="1.0">
    <ch:EventTime>2011-12-27 17:54:09 GMT-00:00</ch:EventTime>
    <ch:MessageDescription>Fabric Interconnect B, management services are unresponsive</ch:MessageDescription>
    <ch:Event>
    <ch:Type>diagnostic</ch:Type>
    <ch:SubType>GOLD-major</ch:SubType>
    <ch:Brand>Cisco</ch:Brand>
    <ch:Series>UCS 6100 Series Fabric Interconnect</ch:Series>
    </ch:Event>
    <ch:CustomerData>
    <ch:UserData>
    <ch:Email>[email protected]</ch:Email>
    </ch:UserData>
    <ch:ContractData>
    <ch:CustomerId>[email protected]</ch:CustomerId>
    <ch:ContractId>ContractID</ch:ContractId>
    <ch:DeviceId>N10-S6100@C@SSI1442BFRC</ch:DeviceId>
    </ch:ContractData>
    <ch:SystemInfo>
    <ch:Name>System-A</ch:Name>
    <ch:Contact>Name</ch:Contact>
    <ch:ContactEmail>[email protected]</ch:ContactEmail>
    <ch:ContactPhoneNumber>+00-0000000000</ch:ContactPhoneNumber>
    <ch:StreetAddress>Office Address</ch:StreetAddress>
    </ch:SystemInfo>
    </ch:CustomerData>
    <ch:Device>
    <rme:Chassis xmlns:rme="http://www.cisco.com/rme/4.0">
    <rme:Model>N10-S6100</rme:Model>
    <rme:HardwareVersion>0.0</rme:HardwareVersion>
    <rme:SerialNumber>SerialNumber</rme:SerialNumber>
    </rme:Chassis>
    </ch:Device>
    </ch:CallHome>
    </aml-block:Content>
    <aml-block:Attachments>
    <aml-block:Attachment type="inline">
    <aml-block:Name>sam_content_file</aml-block:Name>
    <aml-block:Data encoding="plain">
    <![CDATA[
    <faultInst
    ack="no"
    cause="management-services-unresponsive"
    changeSet=""
    code="F0452"
    created="2011-12-27T23:24:09.681"
    descr="Fabric Interconnect B, management services are unresponsive"
    dn="sys/mgmt-entity-B/fault-F0452"
    highestSeverity="critical"
    id="2036245"
    lastTransition="2011-12-27T23:24:09.681"
    lc=""
    occur="1"
    origSeverity="critical"
    prevSeverity="critical"
    rule="mgmt-entity-management-services-unresponsive"
    severity="critical"
    status="created"
    tags=""
    type="management"/>]]>
    </aml-block:Data>
    </aml-block:Attachment>
    </aml-block:Attachments>
    </aml-block:Block>
    </soap-env:Body>
    </soap-env:Envelope>
    We want to understand what the impact of this error is and whether there is anything we can do to prevent it. We would also like to know what might cause this error.
    Let me know if anything else is needed from my side.
    The show-tech file has been uploaded.
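To pull the actual fault out of a Call Home message like the one above, the interesting part is the `faultInst` element stored as CDATA inside `aml-block:Data`. A minimal Python sketch, assuming the message body is available as a string and that the attachment carries a single `faultInst` element as in this example; `extract_fault` is a hypothetical helper, not part of any Cisco tooling. It uses only the standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

AML_BLOCK_NS = "http://www.cisco.com/2004/01/aml-block"


def extract_fault(callhome_xml: str) -> dict:
    """Return the attributes of the faultInst carried in a Call Home message.

    The outer SOAP envelope is parsed first; the fault itself is XML stored
    as CDATA text inside <aml-block:Data>, so it is parsed a second time.
    """
    # Encode to bytes so the envelope's own encoding declaration is honoured.
    root = ET.fromstring(callhome_xml.strip().encode("utf-8"))
    data = root.find(f".//{{{AML_BLOCK_NS}}}Data")
    if data is None or not (data.text or "").strip():
        raise ValueError("no aml-block:Data attachment found")
    fault = ET.fromstring(data.text.strip())
    if fault.tag != "faultInst":
        raise ValueError(f"unexpected attachment element {fault.tag!r}")
    return dict(fault.attrib)
```

For the message above this would return code F0452, severity critical and dn sys/mgmt-entity-B/fault-F0452, which is the fault to look up in the UCS fault reference.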

    Padma,
    The TAC engineer sent the mail below:
    Hi Amit,
    I’ve checked through the show tech you’ve uploaded and have not found any indicators of errors for the error message you are seeing.
    As I mentioned in the call, the error is a transient error from which the fabric interconnects can automatically recover. The recommended action is to wait a few minutes (10-15 min) to see if the error clears automatically. If the error does not clear, we will need to do further troubleshooting. This error on its own is not a cause for worry. As you have HA in your system, the management services would have failed over to the other fabric interconnect, so it would not affect your system performance.
    We can leave the system under observation for a few days to see if other errors occur concurrently with this error.
    I will upload the show-tech logs here; please find my replies below:
    Is the alert generated only for FI B or both FIs? ->> Amit: The alert is generated for FI-B only.
    Any change in cluster state corresponding to the alert timestamp? ->> Amit: Unfortunately, when this error is generated we are unable to see the cluster state because of the timing. If you can suggest another place where I can find the state, that would be helpful.
    Cluster physical link status? ->> Amit: The cluster link is OK.
    Does the FI have any core dumps? ->> Amit: I don't have any idea about this. How can I check it?
    Regards,
    Amit Vyas

  • Http cluster servlet not failing over when no answer received from server

              I am using WebLogic 510 SP9. I have a WebLogic server proxying all requests to
              a WebLogic cluster using the HttpClusterServlet.
              When I kill the WebLogic process servicing my request, I see the next request
              fail over to the secondary server, and all my session information has been
              replicated. In short, I see the behavior I expect.
              r.troon
              However, when I either disconnect the primary server from the network or just
              switch it off, I just get the message "unable to connect to servers" back
              in the browser.
              I don't really understand why the behavior should be different; I would expect
              both to fail over in the same manner. Does the cluster servlet only handle TCP
              reset failures?
              Has anybody else experienced this, or does anyone have any ideas?
              Thanks
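The difference in behavior described above is usually visible at the TCP level: when only the server process is killed, the operating system is still up and answers new connection attempts with a reset, so the client fails immediately; when the machine is switched off or unplugged, nothing answers and the client only notices when its own timeout expires. The generic Python sketch below is not WebLogic's HttpClusterServlet code; `connect_with_failover` and the 2-second timeout are assumptions for illustration. It shows a client that treats both cases as reasons to move to the next server.

```python
import socket


def connect_with_failover(servers, timeout=2.0):
    """Connect to the first reachable (host, port) in `servers`.

    Returns (address, socket). Raises ConnectionError if every server fails.
    """
    failures = []
    for address in servers:
        try:
            sock = socket.create_connection(address, timeout=timeout)
            return address, sock
        except ConnectionRefusedError:
            # Host is up but nothing listens on the port: the OS replies with
            # a TCP reset straight away (e.g. the server process was killed).
            failures.append((address, "refused"))
        except socket.timeout:
            # No reply at all before `timeout` (e.g. host powered off or
            # unplugged); without a timeout the client could hang much longer.
            failures.append((address, "timed out"))
        except OSError as exc:
            # Other network errors such as "host unreachable".
            failures.append((address, f"error: {exc}"))
    raise ConnectionError(f"all servers failed: {failures}")
```

The key design point is the explicit connect timeout: a proxy that only reacts to refused connections will appear to fail over correctly in the "kill the process" test but not in the "pull the cable" test.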
              

    I think I might have found the answer......
    The AD objects for the clusters had been moved from the Computers OU into a newly created OU. I suspect that the cluster node computer objects didn't have permissions on the cluster object within that OU, and that this was causing the issue. I know I've seen cluster object issues before when moving to a new OU.
    Everything has started working again for the moment, so I now just need to investigate which permissions I need on the new OU so that I can move the cluster object into it.

  • Fabric Interconnect 6248 & 5548 Connectivity on 4G SFP with FC

    Hi,
    Recently I came across a scenario where I connected a 4G SFP on the expansion module of a 6248 Fabric Interconnect at one end and a 4G SFP on a 5548UP at the other end. I was unable to establish FC connectivity between the two devices, and the moment I moved the 4G SFP to the fixed module of the 6248, connectivity was established between them.
    I would like to know whether I have to make any changes on the FI's expansion module to get connectivity working, or whether this behavior is expected.
    Do let me know if you need any other information on this
    Regards,
    Amit Vyas

    Yes, on FI-B ports 15-16 should be in VSAN 101 instead of 100; I have made that correction.
    Q. Are you migrating the FC ports from the fixed module to the expansion module?
         A: As of now I am not migrating FC ports, but in the near future I will have to migrate FC ports to the expansion module, and I don't want to waste time troubleshooting at that point.
    Is my understanding correct that you have 2 links from each FI to a 5548 and no FC port-channel?
         A: Yes, your understanding is correct: we have 2 links from each FI to the 5548 and no FC port-channel is configured.
    I will configure the FC port-channel later, once I am able to fix the connectivity issue.
    I will try putting the 4G SFP in the expansion module and will provide you the output of "show interface brief".
    Following is the output of "show interface brief" from both 5548UP switches:
    Primary5548_SW# show interface brief
    Interface  Vsan   Admin  Admin   Status          SFP    Oper  Oper   Port
                      Mode   Trunk                          Mode  Speed  Channel
                             Mode                                 (Gbps)
    fc1/29     100    auto   on      up               swl    F       4    --
    fc1/30     100    auto   on      up               swl    F       4    --
    fc1/31     100    auto   on      up               swl    F       4    --
    fc1/32     100    auto   on      up               swl    F       4    --
    Ethernet      VLAN    Type Mode   Status  Reason                   Speed     Port
    Interface                                                                    Ch #
    Eth1/1        1       eth  access down    Link not connected          10G(D) --
    Eth1/2        1       eth  access down    Link not connected          10G(D) --
    Eth1/3        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/4        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/5        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/6        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/7        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/8        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/9        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/10       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/11       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/12       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/13       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/14       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/15       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/16       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/17       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/18       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/19       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/20       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/21       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/22       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/23       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/24       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/25       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/26       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/27       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/28       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/1        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/2        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/3        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/4        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/5        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/6        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/7        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/8        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/9        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/10       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/11       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/12       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/13       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/14       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/15       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/16       1       eth  access down    SFP not inserted            10G(D) --
    Port   VRF          Status IP Address                              Speed    MTU
    mgmt0  --           up     172.20.10.82                            1000     1500
    Interface  Vsan   Admin  Admin   Status      Bind                 Oper    Oper
                      Mode   Trunk               Info                 Mode    Speed
                             Mode                                            (Gbps)
    vfc1       100    F     on     errDisabled Ethernet1/1              --
    Primary5548_SW#
    Secondary5548_SW# show interface brief
    Interface  Vsan   Admin  Admin   Status          SFP    Oper  Oper   Port
                      Mode   Trunk                          Mode  Speed  Channel
                             Mode                                 (Gbps)
    fc1/29     101    auto   on      up               swl    F       4    --
    fc1/30     101    auto   on      up               swl    F       4    --
    fc1/31     101    auto   on      up               swl    F       4    --
    fc1/32     101    auto   on      up               swl    F       4    --
    Ethernet      VLAN    Type Mode   Status  Reason                   Speed     Port
    Interface                                                                    Ch #
    Eth1/1        1       eth  access down    Link not connected          10G(D) --
    Eth1/2        1       eth  access down    Link not connected          10G(D) --
    Eth1/3        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/4        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/5        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/6        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/7        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/8        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/9        1       eth  access down    SFP not inserted            10G(D) --
    Eth1/10       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/11       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/12       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/13       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/14       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/15       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/16       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/17       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/18       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/19       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/20       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/21       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/22       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/23       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/24       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/25       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/26       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/27       1       eth  access down    SFP not inserted            10G(D) --
    Eth1/28       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/1        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/2        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/3        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/4        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/5        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/6        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/7        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/8        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/9        1       eth  access down    SFP not inserted            10G(D) --
    Eth2/10       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/11       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/12       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/13       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/14       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/15       1       eth  access down    SFP not inserted            10G(D) --
    Eth2/16       1       eth  access down    SFP not inserted            10G(D) --
    Port   VRF          Status IP Address                              Speed    MTU
    mgmt0  --           up     172.20.10.84                            1000     1500
    Interface  Vsan   Admin  Admin   Status      Bind                 Oper    Oper
                      Mode   Trunk               Info                 Mode    Speed
                             Mode                                            (Gbps)
    vfc1       1      F     on     errDisabled Ethernet1/1              --
    Secondary5548_SW#
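Since the fix in this thread was a VSAN mismatch (FI-B ports belonging in VSAN 101 rather than 100), a quick way to check each switch is to parse the FC rows of `show interface brief`. A small Python sketch, assuming the column layout shown above (Interface, Vsan, Admin Mode, Admin Trunk Mode, Status, ...); `fc_vsans` and `wrong_vsan` are hypothetical helper names, not NX-OS features.

```python
import re

# Matches physical FC rows such as:
# "fc1/29     100    auto   on      up               swl    F       4    --"
# Captures interface, VSAN and status; vfc rows and header lines do not match.
FC_ROW = re.compile(r"^(fc\d+/\d+)\s+(\d+)\s+\S+\s+\S+\s+(\S+)")


def fc_vsans(show_int_brief: str) -> dict:
    """Map each physical FC interface to (vsan, status)."""
    result = {}
    for line in show_int_brief.splitlines():
        match = FC_ROW.match(line.strip())
        if match:
            iface, vsan, status = match.groups()
            result[iface] = (int(vsan), status)
    return result


def wrong_vsan(show_int_brief: str, expected_vsan: int) -> dict:
    """Return {interface: vsan} for FC interfaces not in `expected_vsan`."""
    return {
        iface: vsan
        for iface, (vsan, _status) in fc_vsans(show_int_brief).items()
        if vsan != expected_vsan
    }
```

Run against each switch's output with that fabric's VSAN (100 for the primary, 101 for the secondary in the outputs above); an empty result means every FC port is in the expected VSAN. The same idea applies to the FI side, whose output format may differ.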

  • CUMP 8.5: Should Director Failover Cause the Audio Conference to Be Disconnected?

    We have a two-region CUMP 8.5.5.14 global deployment with a WebEx Type II deployment, which we are now piloting. We have a primary Director in one region and a backup Director in another. During failover testing, we establish an audio/web meeting (using the primary Director to set up the call) and then force a failover of the primary Director by disconnecting its network connection(s). After approximately 5 minutes, the audio conference is disconnected. Callers can call back in, but I'm trying to understand whether this behavior, where the audio conference is disconnected on Director failover, is expected. Otherwise, failover to the backup Director does appear to be operating OK.
    Thanks.

    Hi Michael,
    That is expected behavior.
    http://www.cisco.com/en/US/docs/voice_ip_comm/meetingplace/8_5/english/administration/topology_management.html#wpxref82514
    Configuring Meeting Director Nodes
    Your first node is automatically configured as the Primary Meeting Director node. The second node you add to your system is automatically configured as the Secondary Meeting Director node. If your Primary Meeting Director node fails, your Secondary Meeting Director node becomes the active Meeting Director. Any conferences currently running on the system are interrupted but attendants can dial back in immediately.
    -Dejan

  • SQL Server 2014 Always on HA takes 8-14 seconds to fail over. Application side timeouts occur

    Hi All,
    I have a very similar post in the SQL Server 2014 forums too (https://social.technet.microsoft.com/Forums/sqlserver/en-US/adb5e338-907e-4405-aa62-d3ea93c7a98a/sql-server-2014-always-on-ha-takes-814-seconds-to-fail-over-application-side-timeouts-occur?forum=sqldisasterrecovery) -
    advice in the end was to post a question here.
    SQL Server nodes: 2014 (12.0.2480.0)
    1 file share witness (on a separate subnet)
    1 cluster
    1 listener
    I have been testing the response time to failovers, both manual (right-click, Fail Over in SSMS) and automatic (shutting down the primary host). To test the response, I have an SSMS query running on my desktop, connected to the listener, that queries a small table, and I hit Execute.
    The query response time, from Execute to receiving the result, has been between 8 and 14 seconds in my testing. My previous experience (in a separate environment) showed failover times of around 2 seconds in a very similar configuration.
    The availability database is 200 MB and is not actively used. The nodes are synchronised.
    SQL Server Hosts: Windows 2012, 2 cpu, 8gb RAM.
    Questions:
    1: It’s a big question, but what should I expect for a ‘normal’ failover time? Keep in mind this scenario is about as simple as it gets.
    2: As it stands, an 8 to 14 second ‘outage’ could cause some applications to time out. Or am I being unreasonable? I am seeing the very simple query in SSMS time out with this:
    Msg 983, Level 14, State 1, Line 2
    Unable to access availability database 'DATABASE' because the database replica is not in the PRIMARY or SECONDARY role. Connections to an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
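On question 2, a common way to ride out a failover window of this length is for the application to retry transient errors for a bounded time instead of failing on the first one. The Python sketch below is generic and hedged: `TransientDbError` is a hypothetical wrapper that your data-access layer would raise for errors such as Msg 983, and the 30-second budget and backoff values are assumptions to tune against the 8 to 14 seconds measured here.

```python
import time


class TransientDbError(Exception):
    """Hypothetical error your data-access layer raises for retryable
    failures, e.g. Msg 983 while the availability replica changes role."""


def run_with_retry(operation, *, deadline_s=30.0, first_delay_s=0.5,
                   max_delay_s=4.0, sleep=time.sleep, clock=time.monotonic):
    """Call `operation()` until it succeeds or the retry budget is used up.

    Waits grow exponentially (0.5 s, 1 s, 2 s, 4 s, 4 s, ...) so that a
    failover of roughly 8-14 s is covered well inside a 30 s budget.
    """
    start = clock()
    delay = first_delay_s
    while True:
        try:
            return operation()
        except TransientDbError:
            # Give up if the next wait would take us past the deadline.
            if clock() - start + delay > deadline_s:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters exist only so the retry schedule can be tested without actually waiting; in production code the defaults are used.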
    The cluster logs are long; this section accounts for 8 seconds of the 11-second outage I experienced. I can supply the full log if required. Also, this log covers just the 2 cluster nodes; I removed the witness share to keep things as simple as possible.
    00001090.00002128::2015/02/25-03:05:08.255 INFO  [GEM] Node 2: Deleting [1:65 , 1:71] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:10.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:11.888 INFO  [GEM] Node 2: Deleting [1:72 , 1:73] (both included) as it has been ack'd by every node
    00001090.00002698::2015/02/25-03:05:11.889 INFO  [GUM] Node 2: Processing RequestLock 2:49
    00001090.00002128::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: Processing GrantLock to 2 (sent by 1 gumid: 67)
    00001090.00002698::2015/02/25-03:05:11.890 INFO  [GUM] Node 2: executing request locally, gumId:68, my action: /dm/update, # of updates: 1
    00001090.00002128::2015/02/25-03:05:12.890 INFO  [GEM] Node 2: Deleting [1:74 , 1:74] (both included) as it has been ack'd by every node
    00001ee4.00002130::2015/02/25-03:05:15.107 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:5b81e7bd-58fe-4be9-a68a-c48ba2aa552b:Netbios
    00001090.00002128::2015/02/25-03:05:16.988 INFO  [GUM] Node 2: Processing RequestLock 1:28
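To see where the seconds go in an excerpt like this, it helps to compute the total span and the largest gap between consecutive entries. A Python sketch, assuming every line starts with the `PID.TID::YYYY/MM/DD-HH:MM:SS.mmm` prefix seen above; `timing_summary` is a hypothetical helper.

```python
import re
from datetime import datetime

# Cluster log lines start with "PID.TID::YYYY/MM/DD-HH:MM:SS.mmm".
TIMESTAMP = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s")


def timing_summary(log_text: str):
    """Return (span in s, largest gap in s, (datetime before, datetime after))."""
    stamps = []
    for line in log_text.splitlines():
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f"))
    if len(stamps) < 2:
        raise ValueError("need at least two timestamped lines")
    span = (stamps[-1] - stamps[0]).total_seconds()
    # Largest interval between consecutive entries, with its endpoints.
    gap, before, after = max(
        ((b - a).total_seconds(), a, b) for a, b in zip(stamps, stamps[1:])
    )
    return span, gap, (before, after)
```

For the nine lines above, this gives a span of about 8.73 s, with the largest single gap (about 2.2 s) between 03:05:12.890 and 03:05:15.107.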
    Thanks in advance.
    Keegan

    Hi Keegan,
    From these event logs, what I can see is that the "Sending request Netname" steps are where the time is being spent.
    Could you please tell us the network configuration of the cluster nodes?
    If I recall correctly, it is recommended to keep only the TCP/IP protocol and disable NetBIOS over TCP/IP on the "Private Network", and not to configure DNS, WINS, or a default gateway on the "Private Network":
    https://support.microsoft.com/kb/258750?wa=wsignin1.0
    After that, please test again.
    Best Regards,
    Elton JI

  • OCR and voting disks on ASM, problems in case of fail-over instances

    Hi everybody
    if at your site you:
    - have an 11.2 fail-over cluster using Grid Infrastructure (CRS, OCR, voting disks),
    where you have yourself created additional CRS resources to handle single-node db instances,
    their listener, their disks and so on (which are started only on one node at a time,
    can fail from that node and restart to another);
    - have put OCR and voting disks into an ASM diskgroup (as strongly suggested by Oracle);
    then you might have problems (as we had) because you might:
    - reach max number of diskgroups handled by an ASM instance (63 only, above which you get ORA-15068);
    - experiment delays (especially in case of multipath), find fake CRS resources, etc.
    whenever you dismount disks from one node and mount to another;
    So (if both conditions are true) you might be interested in this story,
    then please keep reading on for the boring details.
    One step backward (I'll try to keep it simple).
    Oracle Grid Infrastructure is mainly used by RAC db instances,
    which means that any db you create usually has one instance started on each node,
    and all instances read and write the same disks from each node.
    So the ASM instance on each node will mount diskgroups in Shared Mode,
    because the same diskgroups are also mounted by the other ASM instances on the other nodes.
    ASM instances have the spfile parameter CLUSTER_DATABASE=true (and this parameter implies
    that every diskgroup is mounted in Shared Mode, among other things).
    In this context, it is quite obvious why Oracle strongly recommends putting OCR and voting disks
    inside ASM: that diskgroup (usually called CRS_DATA) becomes diskgroup number 1,
    and the ASM instances mount it before CRS starts.
    Then additional diskgroups are added by users for the DATA, REDO, FRA, etc. of each RAC db,
    and are mounted later, when a RAC db instance starts on the specific node.
    In a fail-over cluster, where instances are not RAC type and there is
    only one instance running (on one of the nodes) at any time for each db, things are different.
    The diskgroups of the db instances don't need to be mounted in Shared Mode,
    because each is used by only one instance at a time
    (on the contrary, they should be mounted in Exclusive Mode).
    Yet if you follow Oracle's advice and put OCR and voting disks inside ASM, then:
    - at installation, OUI starts the ASM instance on each node with CLUSTER_DATABASE=true;
    - the first diskgroup, which contains OCR and voting disks, is mounted in Shared Mode;
    - all other diskgroups, used by each db instance, are mounted in Shared Mode too,
    even if you take care that they are mounted by only one ASM instance at a time.
    At our site, on our three-node cluster, this has two consequences.
    The first consequence is that we hit the ORA-15068 limit (max 63 diskgroups) earlier than expected:
    - none of the instances on this cluster are Production (only Test, Dev, etc.);
    - we planned to have, usually, 10 instances on each node, each with 3 diskgroups (DATA, REDO, FRA),
    so 30 diskgroups on each node, for a total of 90 diskgroups (30 instances) on the cluster;
    - if one node failed, the two surviving nodes would take over the resources of the failed node;
    in the worst case, one node would have 60 diskgroups (20 instances) and the other 30 diskgroups (10 instances);
    - if two nodes failed, the sole surviving node would not be able to mount the additional diskgroups
    (because of the limit of 63 diskgroups mounted by an ASM instance), so the rest would remain unmounted
    and their db instances stopped (they are not Production instances).
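    The planned placement can be checked with a few lines of Python. This sketch assumes, as the plan did, that each ASM instance has its own independent budget of 63 diskgroups, with diskgroup 1 (OCR and voting disks) mounted on every node; the function and scenario names are made up for illustration.

```python
# Figures taken from the thread; the independent per-instance budget is the plan's assumption.
MAX_DG_PER_ASM = 63   # ORA-15068: max diskgroups mounted by one ASM instance
CRS_DG = 1            # diskgroup 1 (OCR + voting disks) is mounted on every node
DG_PER_INSTANCE = 3   # DATA, REDO, FRA


def fits(instances_per_node):
    """True if each surviving node can mount the diskgroups of all instances
    placed on it, assuming every ASM instance has its own 63-diskgroup budget."""
    return all(CRS_DG + n * DG_PER_INSTANCE <= MAX_DG_PER_ASM
               for n in instances_per_node)


SCENARIOS = {
    "all three nodes up": [10, 10, 10],
    "one node down (worst case)": [20, 10],
    "two nodes down": [30],
}

for name, placement in SCENARIOS.items():
    status = "fits" if fits(placement) else "exceeds ORA-15068 limit"
    print(f"{name:28} {placement!s:14} -> {status}")
```

    Under this model the first two scenarios fit (at most 1 + 60 = 61 diskgroups on a node) and only the two-nodes-down case exceeds the limit, which is exactly the behaviour the plan accepted.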
    But it didn't work: since ASM has CLUSTER_DATABASE=true, you cannot mount 90 diskgroups;
    you can mount only 62 globally (once a diskgroup is mounted on one node, it is given a number between 2 and 63,
    and diskgroups mounted on other nodes cannot reuse that number).
    So in practice we can mount only 21 diskgroups (about 7 instances) on each node.
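    The difference between the plan and what actually happened can be simulated by handing out diskgroup numbers 2 to 63 either from one pool per node or from a single cluster-wide pool. This is a simplified Python model of the behaviour described above, not of ASM internals; the class and function names are hypothetical.

```python
class DiskgroupNumbers:
    """Hands out ASM diskgroup numbers 2..63 (number 1 is CRS_DATA).
    shared=True:  one pool for the whole cluster, as observed with CLUSTER_DATABASE=true.
    shared=False: one independent pool per node, as the original plan assumed."""
    FIRST, LAST = 2, 63

    def __init__(self, shared):
        self.shared = shared
        self.used = {}  # pool key -> set of numbers currently in use

    def mount(self, node):
        key = "cluster" if self.shared else node
        taken = self.used.setdefault(key, set())
        for n in range(self.FIRST, self.LAST + 1):
            if n not in taken:
                taken.add(n)
                return n
        raise RuntimeError(f"node {node}: no free diskgroup number (ORA-15068)")


def mountable(shared, nodes=3, dgs_per_node=30):
    """Mount the planned diskgroups round-robin; return how many succeed per node."""
    pool = DiskgroupNumbers(shared)
    mounted = [0] * nodes
    for _ in range(dgs_per_node):
        for node in range(nodes):
            try:
                pool.mount(node)
                mounted[node] += 1
            except RuntimeError:
                pass  # in the simulation this diskgroup simply stays unmounted
    return mounted


print("independent pools:", mountable(shared=False))  # [30, 30, 30]
print("shared pool:      ", mountable(shared=True))   # [21, 21, 20]
```

    With one shared pool only 62 of the 90 planned diskgroups can be mounted, roughly 21 per node, matching the figure reported above.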
    The second consequence is that every time our handmade CRS scripts dismount diskgroups
    from one node and mount them on another, there are delays in the range of seconds (especially with multipath).
    We also found in the CRS log that whenever we mounted diskgroups (on one node only),
    additional fake resources of type ora*.dg were created on the fly behind the scenes,
    maybe to accommodate the fact that those diskgroups were left unmounted on the other nodes
    (once again, instances are single-node here, and not RAC type).
    That's all.
    Did anyone run into similar problems?
    We opened an SR with Oracle asking what options we have here, and we are disappointed by their answer.
    Regards
    Oscar

    Hi Klaas-Jan
    - best practices require that online redo log files also be in a separate diskgroup, in case of ASM logical corruption (we are a little bit paranoid): if the DATA dg gets corrupted, you can restore the full backup plus the archived redo logs plus the online redo logs (otherwise you will stop at the latest archived log).
    So we have 3 diskgroups for each db instance: DATA, REDO, FRA.
    - in a fail-over cluster (active-passive), Oracle provides some templates of CRS scripts (in $CRS_HOME/crs/crs/public) that you can edit and change at will; you might also create additional scripts for any additional resources you need (Oracle agents, backup agents, file systems, monitoring tools, etc.)
    As for our problem, the only solution is to move OCR and voting disks out of ASM and change the pfile of all ASM instances (parameter CLUSTER_DATABASE from true to false).
    Oracle's answers were a little bit odd:
    - first they told us to use Grid Standalone (without CRS, OCR or voting disks at all), but we told them that we needed a fail-over solution;
    - then they told us to use RAC Single Node, which actually has some better features: in case of a planned fail-over it might be able to migrate
    client sessions without causing a reconnect (for SELECTs only, not in case of a running transaction); but we already have a few fail-over clusters and cannot change them all.
    So we plan to move OCR and voting disks to block devices (we think the other solution, which needs a shared file system, would take longer).
    Thanks Marko for pointing us to the OCFS2 pros / cons.
    We asked Oracle to confirm that it is supported; they said yes, but it is discouraged (and it also doesn't work with OUI or ASMCA).
    Anyway, it's the simplest approach; this is a non-Prod cluster, so we'll start here and, if everything is fine, after a while we'll do it on the Prod ones too.
    - Note 605828.1, paragraph 5, Configuring non-raw multipath devices for Oracle Clusterware 11g (11.1.0, 11.2.0) on RHEL5/OL5
    - Note 428681.1: OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE)
    -"Grid Infrastructure Install on Linux", paragraph 3.1.6, Table 3-2
    Oscar

  • Firefox Proxy Fail-over is not working correctly

    I am in a corporate environment where we must use a complex auto-proxy, configured through the automatic proxy configuration URL http://proxyconf/proxy.pac. I am seeing an intermittent failure with Firefox 3.6.13, while the same site loads (after a delay) in IE (e.g. it works for half an hour, then fails for a while, etc.).
    By using Wireshark to trace the packets, I have identified that one proxy server is intermittently failing and that Firefox is not trying the second proxy. The auto-proxy rule being invoked is:
    if (!isResolvable(host)) return "PROXY 172.16.39.201:8080; PROXY 10.241.32.28:8080";
    The problem is that Firefox never fails over: it tries the 172 address 6 times in a row, then gives up and displays the "The proxy server is refusing connections" "Firefox is configured to use a proxy server that is refusing connections." "* Check the proxy settings to make sure that they are correct." "* Contact your network administrator to make sure the proxy server is working." error message. It keeps doing this no matter how many attempts, reloads or restarts are tried.
    IE, on the other hand, tries and fails with the 172 address and then starts using the 10. address (which works correctly). Several other applications, such as IRC clients, also work correctly.
    Obviously the failing corporate proxy must be fixed; however, Firefox is failing to use the second proxy after the first one fails.
    Seems like a bug.
    Is there some easy way for me to replace the proxy file with my own file? E.g. replace http://proxyconf/proxy.pac with file://c:\..., or use some add-on?
    It must be an autoproxy script, as there is no single proxy that I can use for all addresses.
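    The fail-over the poster expected from a `;`-separated PAC return value (try each entry in order and move on when one refuses the connection) can be modeled roughly as follows. This is a simplified Python sketch, not Firefox's implementation; `try_connect` is a hypothetical stand-in for a real connection attempt.

```python
from typing import Callable, List, Optional, Tuple


def parse_pac_result(result: str) -> List[Tuple[str, Optional[str]]]:
    """Split a FindProxyForURL() return value such as
    "PROXY a:8080; PROXY b:8080; DIRECT" into ordered (type, target) entries."""
    entries = []
    for part in result.split(";"):
        tokens = part.split()
        if tokens:
            entries.append((tokens[0].upper(), tokens[1] if len(tokens) > 1 else None))
    return entries


def connect_with_failover(result: str,
                          try_connect: Callable[[str], bool]) -> Optional[str]:
    """Try each entry in order and return the first one that accepts the
    connection, or None if all fail. try_connect (hypothetical) receives
    either "host:port" or "DIRECT"."""
    for kind, target in parse_pac_result(result):
        label = "DIRECT" if kind == "DIRECT" else target
        if label and try_connect(label):
            return label
    return None


# The rule from the thread, with the 172.16.39.201 proxy refusing connections.
rule = "PROXY 172.16.39.201:8080; PROXY 10.241.32.28:8080"
refusing = {"172.16.39.201:8080"}
print(connect_with_failover(rule, lambda proxy: proxy not in refusing))
```

    In this model the second proxy is used as soon as the first refuses, which is the IE behaviour described above; the Firefox behaviour reported in the thread corresponds to never advancing past the first entry.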

    You can correct this issue by forcing the file blocklist.xml to update, or by waiting until Firefox updates the file.
    That update will remove the severity="0" flags in the file that cause the problem.
    See:
    * /questions/832793?page=2#answer-198407
    * http://forums.mozillazine.org/viewtopic.php?p=10899869#p10899869
    * Bug 663722 - The blocklist output is including severity="0" where it shouldn't be: https://bugzilla.mozilla.org/show_bug.cgi?id=663722

  • Why doesn't server-side state saving support fail-over?

    Hi all,
    In my previous thread, "ADF server-side state saving method", Frank said that it doesn't support fail-over.
    Re: ADF server-side state saving method
    My customer is wondering why.
    If anyone has a clear statement about it, could you share it?
    Any help will be much appreciated.
    Atsushi

    Timo,
    As I wrote in my previous thread, my customer adopted a multi-window application design without knowing that it frequently causes ViewExpiredException.
    Now I'm looking for the best settings to avoid the exception and need an ADF guru's help.
    Frank said that ADF is built on Sun's RI, and the Mojarra state-saving parameters seem to work correctly in my environment. However, none of the ADF docs clearly describe the behavior of server-side state saving. When I set the state-saving method to "server", view states are managed per logical view (≒ window), which seems better for a multi-window application than client-token-based state management from the perspective of preventing ViewExpiredException.
    Because fail-over is not one of their requirements, they might adopt server-side state saving if we can make sure it has no other side effects.
    So I'd like to know more about its behavior in detail.
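    To reason about the per-window behaviour described above, here is a toy Python model of server-side view state kept in one HTTP session: an LRU map of logical views (roughly one per window), each holding an LRU map of saved views. It is a simplified sketch, not Mojarra's code; the two limits are only modeled loosely on Mojarra's `com.sun.faces.numberOfLogicalViews` and `com.sun.faces.numberOfViewsInSession` context parameters, and the default of 15 used in the constructor is an assumption.

```python
from collections import OrderedDict


class ServerStateStore:
    """Toy model: LRU of logical views (windows), each an LRU of view states."""

    def __init__(self, logical_views=15, views_per_logical=15):
        self.max_logical = logical_views
        self.max_views = views_per_logical
        self.logical = OrderedDict()  # logical view id -> OrderedDict(view id -> state)

    def save(self, logical_id, view_id, state):
        views = self.logical.pop(logical_id, OrderedDict())
        self.logical[logical_id] = views          # re-insert: now most recently used
        views[view_id] = state
        views.move_to_end(view_id)
        if len(views) > self.max_views:
            views.popitem(last=False)             # evict this window's oldest state
        if len(self.logical) > self.max_logical:
            self.logical.popitem(last=False)      # evict the least recently used window

    def restore(self, logical_id, view_id):
        views = self.logical.get(logical_id)
        if views is None or view_id not in views:
            # In JSF this is the situation reported as ViewExpiredException.
            raise LookupError(f"view {logical_id}/{view_id} expired")
        self.logical.move_to_end(logical_id)
        views.move_to_end(view_id)
        return views[view_id]


# Tiny limits for illustration: three windows but room for only two logical views.
store = ServerStateStore(logical_views=2, views_per_logical=2)
for window in ("A", "B", "C"):
    store.save(window, 1, f"state of {window}")
try:
    store.restore("A", 1)
except LookupError as exc:
    print(exc)  # window A was evicted when window C saved its state
print(store.restore("C", 1))
```

    In this model a window loses its state only when more windows than the logical-view limit are active, or when one window makes more requests than its own limit without posting back to an older page.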
    Thanks,
    Atsushi

  • BGP in Dual Homing setup not failing over correctly

    Hi all,
    we have dual-homed BGP connections to our sister company's network, but the failover testing is failing.
    If I shut down the WAN interface on the primary router, after about 5 minutes everything converges and fails over fine.
    But if I shut down the LAN interface on the primary router, we never regain connectivity to the sister network.
    Our two ASRs have an iBGP relationship, and I can see that after a certain amount of time the BGP routes with a next hop of the primary router get flushed from BGP and the preferred exit path is through the secondary router. This bit works OK, but I believe that the return traffic is still attempting to come back over the primary link...
    To add to this, we have two inline firewalls on each link which are only performing IPS, no packet filtering.
    Any pointers would be great.
    thanks
    Mario                

    Hi John,
    Right... please look at the output below, which is the partial BGP table during a link failure...
    10.128.0.0/9 is the problematic summary that keeps getting advertised when we do not want it to be during a failure...
    Now, there are prefixes in the BGP table that fall within that large summary address space, but I am sure they are all routes being advertised to us by the eBGP peer...
    *> 10.128.0.0/9     0.0.0.0                            32768 i
    s> 10.128.56.16/32  172.17.17.241                 150      0 2856 64619 i
    s> 10.128.56.140/32 172.17.17.241                 150      0 2856 64619 i
    s> 10.160.0.0/21    172.17.17.241                 150      0 2856 64611 i
    s> 10.160.14.0/24   172.17.17.241                 150      0 2856 64611 i
    s> 10.160.16.0/24   172.17.17.241                 150      0 2856 64611 i
    s> 10.200.16.8/30   172.17.17.241                 150      0 2856 65008 ?
    s> 10.200.16.12/30  172.17.17.241                 150      0 2856 65006 ?
    s> 10.255.245.0/24  172.17.17.241                 150      0 2856 64548 ?
    s> 10.255.253.4/32  172.17.17.241                 150      0 2856 64548 ?
    s> 10.255.253.10/32 172.17.17.241                 150      0 2856 64548 ?
    s> 10.255.255.8/30  172.17.17.241                 150      0 2856 6670 ?
    s> 10.255.255.10/32 172.17.17.241                 150      0 2856 ?
    s> 10.255.255.12/30 172.17.17.241                 150      0 2856 6670 ?
    s> 10.255.255.14/32 172.17.17.241                 150      0 2856 ?
    I would not expect summary addresses to still be advertised if the specific prefixes are coming from eBGP... am I wrong?
    Thanks for everything so far...
    Mario De Rosa
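    For what it's worth, on Cisco IOS an `aggregate-address` entry (the `s>` markers suggest it is configured with `summary-only`) is generated whenever at least one more-specific prefix is present in the BGP table, regardless of where that prefix was learned. Since 10.128.0.0/9 spans 10.128.0.0 to 10.255.255.255, every suppressed prefix in the table above falls inside it. A small Python model of that rule, using the standard `ipaddress` module (a sketch of the logic, not router code):

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("10.128.0.0/9")

# Prefixes shown as suppressed ("s>") in the thread's BGP table, learned via eBGP.
SUPPRESSED = """
10.128.56.16/32 10.128.56.140/32 10.160.0.0/21 10.160.14.0/24 10.160.16.0/24
10.200.16.8/30 10.200.16.12/30 10.255.245.0/24 10.255.253.4/32 10.255.253.10/32
10.255.255.8/30 10.255.255.10/32 10.255.255.12/30 10.255.255.14/32
""".split()


def contributors(aggregate, prefixes):
    """More-specific prefixes that fall inside the aggregate (and so keep it alive)."""
    nets = (ipaddress.ip_network(p) for p in prefixes)
    return [n for n in nets if n != aggregate and n.subnet_of(aggregate)]


def aggregate_advertised(aggregate, bgp_table):
    """Modeled rule: the aggregate exists while any more-specific prefix is in
    the BGP table, whether learned via eBGP, iBGP or redistribution."""
    return bool(contributors(aggregate, bgp_table))


print(len(contributors(AGGREGATE, SUPPRESSED)), "contributing prefixes")
print("aggregate advertised:", aggregate_advertised(AGGREGATE, SUPPRESSED))
```

    Under this rule the eBGP-learned prefixes alone are enough to keep 10.128.0.0/9 in the table during the failure.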

  • How to add a cloud machine as a node to an existing Windows failover cluster with an on-premises node on Windows Server 2008 R2

    Hi All,
    We have a Windows failover cluster with one Windows machine on the local network as one of its nodes.
    I want to add a virtual machine on Microsoft Azure as another node to this existing cluster.
    Please suggest how to do this.
    Thanking all in advance,
    Raghvendra

    Before you even start working on the SQL side, you will need to create a Windows Server 2008 R2 cluster with no shared storage. You can actually test that in-house. Create a VM running 2008 R2 and cluster it with your physical (from your description, I am assuming physical) 2008 R2 machine. Create it with a file share witness (FSW) for quorum. Then configure your environment and check that it works as expected.
    Once you know how to configure the cluster between the physical machine and the VM with a file share witness, extend it to Azure. The location of the FSW becomes an interesting choice. Having an FSW in Azure means that you will need another VM in Azure to host the file share, so you have two quorum votes in Azure and one in-house. Or you could create a file share witness on an in-house system, giving you two quorum votes in-house and one in Azure.
    In the FSW-in-Azure scenario, if you lose the in-house server, automatic failover occurs because two quorum votes exist in Azure. With the FSW in-house, depending on what you lose in-house, you might have to force quorum to get the single-node Azure cluster to run. Loss of access to Azure reverses those scenarios. Neither option is optimal, but each provides some level of recoverability.
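    The vote arithmetic behind those two FSW placements can be tabulated with a small Python model. It assumes three votes (in-house node, Azure node, FSW) and the strict-majority rule of the Node and File Share Majority quorum mode, and it only covers in-house failures, not loss of access to Azure; the function names are hypothetical.

```python
TOTAL_VOTES = 3  # in-house node + Azure node + file share witness (FSW)


def has_quorum(votes_up, total_votes=TOTAL_VOTES):
    """Node and File Share Majority: the cluster keeps running only while it
    can count a strict majority of all configured votes."""
    return votes_up > total_votes // 2


def azure_side_votes(fsw_site, lost):
    """Votes the Azure node can still count after an in-house failure.
    fsw_site: "azure" or "in-house"; lost: "server" (only the in-house node)
    or "site" (everything in-house, including an in-house FSW host)."""
    votes = 1                      # the Azure node itself
    if fsw_site == "azure":
        votes += 1                 # FSW on a second Azure VM is unaffected
    elif lost == "server":
        votes += 1                 # in-house FSW host survived and is reachable
    return votes


for fsw_site in ("azure", "in-house"):
    for lost in ("server", "site"):
        votes = azure_side_votes(fsw_site, lost)
        print(f"FSW {fsw_site:8} | in-house {lost:6} lost -> "
              f"{votes}/{TOTAL_VOTES} votes, quorum: {has_quorum(votes)}")
```

    Only the "FSW in-house, whole in-house site lost" case leaves the Azure node without a majority, which is when forcing quorum would be needed.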
    . : | : . : | : . tim

  • Why would Fabric Interconnects show up in the fabric topology?

    If the Fabric Interconnects are configured in End-Host mode, why would they be sending management traffic through an F port link, making them show up in our fabric topology? We don’t have this with our existing blade centers, and we would expect to be able to suppress it for these UCS interconnect devices. These devices are not managed by the SAN teams, and they would prefer them NOT to appear as though they are.
    Any advice?

    The release notes for 2.0(5a) list:
    CSCua91672
    The fcoe_mgr hap reset will no longer cause FI reboot.
    http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/OL_25363.html
    I am practically never on the Fabric Interconnects executing CLI commands, but a memory leak for some other reason would certainly be a possibility. The fact that its own High Availability process causes a double reset within its own High Availability architecture is troubling.
