TCP Probe failure on CSM

I have had a customer raise an issue with me. Unfortunately I am not too hot on the CSM.
Switch Type - Cisco WS-C6513
IOS VSN - 12.2(18)SXF4
CSM Module details
Card Type - SLB Application Processor Module
Model - WS-X6066-SLB-APC
Hardware VSN - 1.8
Software VSN - 4.2(5)
I have spotted bug CSCsc38892, but cannot tell whether the Catalyst is running VRF; this is all the info I have been given.
We have 6500 chassis each with a CSM module in fault tolerance (ft) mode.
The current standby module (at the time it was the active one) reports as per output below that probes failed:
sh mod csm 1 real sf SAPCCP
real                  server farm  weight  state         conns/hits
10.xxx.xxx.xxx:8000   SAPCCP       8       PROBE_FAILED  0
10.xxx.xxx.xxx:8000   SAPCCP       8       PROBE_FAILED  0
10.xxx.xxx.xxx:8000   SAPCCP       8       PROBE_FAILED  0
A probe failure could have a number of causes; however, the inconsistency is that the failed probes follow the CSM module, while the other module has working probes.
Note that the command “ping module csm 1 10.xxx.xxx.xxx” reports the server as reachable, which is a good indicator that there is connectivity from the CSM to the server.
The following is the output from the currently active CSM (which used to be the standby one):
sh mod csm 1 real sf SAPCCP
real                  server farm  weight  state         conns/hits
10.xxx.xxx.xxx:8000   SAPCCP       8       OPERATIONAL   51
10.xxx.xxx.xxx:8000   SAPCCP       8       OPERATIONAL   41
10.xxx.xxx.xxx:8000   SAPCCP       8       OPERATIONAL   52
For info:
FUNCTIONING CSM
sh mod 1
Mod Ports Card Type Model Serial No.
1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094803UC
Mod MAC addresses Hw Fw Sw Status
1 0015.f998.a94a to 0015.f998.a951 1.8 4.2(5) Ok
Mod Online Diag Status
1 Pass
Relevant config lines:
serverfarm SAPCCP
 nat server
 no nat client
 failaction purge
 real 10.xxx.xxx.xxx 8000
  inservice
 real 10.xxx.xxx.xxx 8000
  inservice
 real 10.xxx.xxx.xxx 8000
  inservice
 probe ABAP
vserver SAPCCP-VIP
 virtual 10.xxx.xxx.xxx tcp www
 vlan xxx
 serverfarm SAPCCP
 sticky 15 group 117
 replicate csrp sticky
 replicate csrp connection
 no persistent rebalance
 parse-length 4000
 inservice
probe ABAP tcp
 interval 2
 retries 2
 failed 6
 open 3
 port 8000

Continued:
NON-FUNCTIONING CSM
sh mod 1
Mod Ports Card Type Model Serial No.
1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094609ZZ
Mod MAC addresses Hw Fw Sw Status
1 0015.f998.8386 to 0015.f998.838d 1.8 4.2(5) Ok
Mod Online Diag Status
1 Pass
Relevant config lines:
serverfarm SAPCCP
 nat server
 no nat client
 failaction purge
 real 10.xxx.xxx.xxx 8000
  inservice
 real 10.xxx.xxx.xxx 8000
  inservice
 real 10.xxx.xxx.xxx 8000
  inservice
 probe ABAP-CCP
vserver SAPCCP-VIP
 virtual 10.xxx.xxx.xxx tcp www
 vlan xxx
 serverfarm SAPCCP
 sticky 15 group 117
 replicate csrp sticky
 replicate csrp connection
 no persistent rebalance
 parse-length 4000
 inservice
probe ABAP-CCP tcp
 recover 2
 interval 6
 retries 2
 failed 6
 open 5
 port 8000
Thanks for any pointers on where to look,
Paul.
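
The two probe definitions posted above are not identical: the probe names differ, and so do some of the timers. Whether that has anything to do with the failure is not established in this thread, but when comparing an active and a standby module it can help to diff the settings mechanically. The following is only a sketch in Python; the helper names `parse_probe` and `diff_settings` are made up for illustration, and the input lines are copied from the two configs above.

```python
def parse_probe(lines):
    """Turn 'keyword value' config lines into a dict (hypothetical helper)."""
    settings = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


def diff_settings(a, b):
    """Return {keyword: (value_in_a, value_in_b)} for keywords that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Probe timers as posted for the functioning and the non-functioning CSM.
functioning = parse_probe(
    ["interval 2", "retries 2", "failed 6", "open 3", "port 8000"]
)
non_functioning = parse_probe(
    ["recover 2", "interval 6", "retries 2", "failed 6", "open 5", "port 8000"]
)

print(diff_settings(functioning, non_functioning))
# {'interval': ('2', '6'), 'open': ('3', '5'), 'recover': (None, '2')}
```

A value of `None` means the keyword is absent on that side.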

Similar Messages

  • ACE Probe Failure Error Messages

    Hi,
    I'm looking for the difference between the below error messages for a probe failure:
    Server open timeout (no SYN ACK)
    Server reply timeout (no reply)
    I guess what I do not understand is this: if the ACE sends a TCP probe, it sends a SYN and expects a SYN/ACK back. If there is no SYN/ACK, then there is no reply, right? Any feedback on these errors would be greatly appreciated.
    /r
    Rob

    Hi Rob,
    By default, when the ACE sends a probe, it expects a response within 10 seconds. For example, for an HTTP probe, the timeout period is the number of seconds to receive an HTTP reply for a GET or HEAD request. If the server fails to respond to the probe, the ACE marks the server as failed.
    This is where "Server reply timeout (no reply)" comes into play: the server did not reply to the ACE once the content request was made (sequence: SYN, SYN/ACK, ACK, GET or HEAD, ... no reply). That is the difference from "Server open timeout (no SYN ACK)".
    With "Server open timeout (no SYN ACK)", the ACE is only opening the connection and performing the TCP three-way handshake.
    By the logic you describe, yes, we could say that if there is no SYN/ACK, there is no reply from the server. But in ACE terms, a missing SYN/ACK produces its own error so that we know the problem occurred during the three-way handshake; the cause could be that the port is not open on the server, or on a firewall if there is one in between, and so on. A reply timeout, on the other hand, tells us that the server is having trouble replying to the ACE; possible causes are that the server is overloaded and unable to reply within the 10 seconds, that there is heavy congestion in the network, and so on.
    So, as you can see, this helps to decide which troubleshooting to apply depending on the error message.
    Hope this helps.
    Rod.
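
The distinction Rod describes (failure while opening the connection versus failure while waiting for a reply) can be reproduced from any host with a short script. This is a rough sketch in Python, not how the ACE implements its probes; the function name `classify_probe`, the result strings, and the default request are assumptions chosen for illustration.

```python
import socket


def classify_probe(host, port, open_timeout=10.0, recv_timeout=10.0,
                   request=b"HEAD / HTTP/1.0\r\n\r\n"):
    """Return 'open-failed', 'reply-timeout', 'reply-error' or 'ok'."""
    try:
        # Phase 1: TCP three-way handshake (SYN, SYN/ACK, ACK).
        sock = socket.create_connection((host, port), timeout=open_timeout)
    except OSError:
        # No SYN/ACK within open_timeout, or the connection was refused.
        return "open-failed"
    with sock:
        try:
            # Phase 2: send the request and wait for any reply bytes.
            sock.settimeout(recv_timeout)
            sock.sendall(request)
            data = sock.recv(1024)
        except socket.timeout:
            # Handshake succeeded but nothing came back in time.
            return "reply-timeout"
        except OSError:
            # Connection was reset or otherwise broke while waiting.
            return "reply-error"
    # An empty read means the peer closed without sending anything.
    return "ok" if data else "reply-error"
```

Running it against a port that is filtered or closed should give `open-failed`, while a server that accepts the connection but never answers should give `reply-timeout`, which mirrors the two ACE messages in the question.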

  • Function of health probe timers on CSM

    Hi,
    we use the following configuration on a csm to monitor a server farm and I'm wondering how exactly the probe timers work.
    ===
    serverfarm sf
    nat server
    nat client natpool1
    failaction purge
    real name serv1
    weight 1
    inservice
    real name serv2
    weight 1
    inservice
    probe probe1
    probe probe1 script
    script LDAP_PROBE
    interval 5
    retries 2
    receive 1
    port 389
    ===
    So in my eyes the probes are sent every 5 seconds. When a probe isn't answered within one second it's marked as failed. If two probes are failed (retries 2) the real server is marked as down.
    Is this correct?
    In a network trace I see a different behaviour: Probes are sent every 5 seconds. If a real server goes out-of-service I see a probe which is not answered and the next probe is sent after 10 seconds (I expected 5 seconds). 5 seconds later the real server is marked down in the switch log.
    It would be great if anybody could help me.
    Best Regards,
    Thorsten Steffen

    Hi,
    Here is the meaning of the parameters:
    Router(config-slb-probe)# interval seconds
    Sets the interval between probes in seconds (from the end of the previous probe to the beginning of the next probe) when the server is healthy.
    Range = 2-65535 seconds
    Default = 120 seconds
    Router(config-slb-probe)# retries retry-count
    Sets the number of failed probes that are allowed before marking the server as failed.
    Range = 0-65535
    Default = 3
    Router(config-slb-probe)# failed failed-interval
    Sets the time between health checks when the server has been marked as failed. The time is in seconds.
    Range = 2-65535
    Default = 300 seconds
    Router(config-slb-probe)# open open-timeout
    Sets the maximum time to wait for a TCP connection. This command is not used for any non-TCP health checks (ICMP or DNS).
    Range = 1-65535
    Default = 10 seconds
    There are two different timeout values: open and receive. The open timeout specifies how many seconds to wait for the connection to open (that is, how many seconds to wait for SYN ACK after sending SYN). The receive timeout specifies how many seconds to wait for data to be received (that is, how many seconds to wait for an HTTP reply after sending a GET/HEAD request). Because TCP probes close as soon as they open without sending any data, the receive timeout is not used.
    When sniffing, you should see a probe every 5 seconds. When a probe fails for the first time, a second probe should be sent after 5 seconds. When this probe fails too, the server is put out of service.
    That is the behaviour you should see.
    HTH,
    Dario
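
As a back-of-the-envelope check of the timers discussed above, the following Python sketch estimates how long it takes to mark a server down. It is a simplified model and not a statement of how the CSM schedules probes: it assumes each failed probe runs until its timeout expires, that the interval is counted from the end of one probe to the start of the next (as the parameter description says), and that the server is marked failed after `retries` consecutive failures (as Dario describes). The function name is hypothetical.

```python
def seconds_until_marked_down(interval, retries, timeout):
    """Time from the start of the first failed probe until the server is
    marked failed, under the simplified model described above."""
    failures_needed = max(retries, 1)  # assumption: at least one failure
    # Each failed probe lasts `timeout`; consecutive probes are separated
    # by `interval`, measured from the end of the previous probe.
    return failures_needed * timeout + (failures_needed - 1) * interval


# Values from the question: interval 5, retries 2, receive 1.
print(seconds_until_marked_down(interval=5, retries=2, timeout=1))  # 7
```

Under this model the server would be marked down about 7 seconds after the first failing probe starts. It does not account for the 10-second gap Thorsten saw in his trace, so the real scheduling evidently involves something the model leaves out.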

  • IOS SLB and probe failure

    Hello,
    We use server load balancing with IOS 12.1(19)E1.
    We have a problem: when the server receives more connections, the error message “REAL 192.168.197.8 (HSSAT1-LX) has changed to PROBE_FAILED” appears, followed a few seconds later by “REAL 192.168.197.8 (HSSAT1-LX) has changed to OPERATIONAL”, and so on.
    We checked the servers and they work properly.
    What could be the reason for the probe failures?
    My configuration:
    ip slb probe HS-PROBE tcp
    interval 5
    ip slb serverfarm HSSAT1-LX
    nat server
    predictor leastconns
    failaction purge
    probe HS-PROBE
    real 192.168.197.8 99
    reassign 2
    inservice
    real 192.168.197.9 99
    reassign 2
    inservice
    ip slb vserver HS.SAT1.DE
    virtual xxx.xxx.xxx.xxx tcp www
    serverfarm HSSAT1-LX
    advertise active
    inservice standby allvips
    How does a TCP probe work? I could not find more precise information in the documentation on configuring probes.
    Is it better to use another probe type (ICMP), or no probe at all?
    When does it make sense to use probes?
    Best regards
    Stefan

    Hi Stefan,
    TCP probes do a complete TCP three-way handshake and normally then terminate the session. A problem I have had a few times is that the timeout for establishing a session can be too short if the server is heavily loaded.
    Probing with a specific method (TCP, HTTP, ...) is usually the better solution. Imagine a web server that responds to ping but whose httpd has died due to some internal error. If you probed by ping only, the load balancer would never notice; but if you monitor TCP port 80 with a TCP probe, or better an HTTP probe, you will notice, and the server will be taken out of the serverfarm. Even better, but as far as I know not possible in IOS SLB, is to probe a specific page, e.g. index.html, because then you know that the httpd is up and running and that pages can be served.
    Regarding the probing issue, it might be useful to read the following link, which describes health monitoring with the CSM:
    http://www.cisco.com/en/US/products/hw/switches/ps708/products_installation_and_configuration_guide_chapter09186a00801c5899.html#1024967
    Hope that helped.
    Best Regards,
    Joerg

  • Ace probe failure after IIS app pool recycle?

    Windows Server 2003 SP2
    ACE Module A2(1.6a)
    I suspect this is caused by an IIS6 setting, but posting here in case anyone has seen this.  For this one particular site, we have 4 servers in the farm.  2 of those servers are fine.  The other 2 (new) servers will generate probe failure after the site's app pool recycles.  I then remove the 2 servers from service and re-activate (no inservice, then inservice) and the probe comes back as operational.  It appears that the app pool recycle somehow is resetting the hash on the default page, though I'm not sure how.  Any ideas are very much appreciated. 

    Yeah, the hash is inside the probe.  Here's the config for the serverfarm and the probe.  Public-007 and Public-008 are new servers...the other 6 have been in the farm for the last 2.5 years and they don't have this issue.  It's only the 2 new boxes that the probe fails when the app pool is recycled.
    serverfarm host PUBLIC
      probe URL-DEFAULT-ASPX
      rserver PUBLIC-001
        inservice
      rserver PUBLIC-002
        inservice
      rserver PUBLIC-003
        inservice
      rserver PUBLIC-004
        inservice
      rserver PUBLIC-005
        inservice
      rserver PUBLIC-006
        inservice
      rserver PUBLIC-007
        inservice
      rserver PUBLIC-008
        inservice
    probe http URL-DEFAULT-ASPX
      interval 2
      faildetect 2
      passdetect interval 2
      passdetect count 2
      request method get url /default.aspx
      expect status 200 200
      hash
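
To illustrate why a hash-based HTTP probe can start failing when page content changes, here is a small Python sketch. It assumes the probe learns an MD5 digest from the first response it sees and compares later responses against it; the class name `HashProbe` and the `reset` method (standing in for the "no inservice, then inservice" workaround from the post) are hypothetical and only model the behaviour described in this thread.

```python
import hashlib


class HashProbe:
    """Toy model of a probe that compares response bodies by MD5 digest."""

    def __init__(self):
        self.reference = None  # digest learned from the first response

    def check(self, body: bytes) -> bool:
        digest = hashlib.md5(body).hexdigest()
        if self.reference is None:
            self.reference = digest  # learn the reference on first use
            return True
        return digest == self.reference

    def reset(self):
        """Forget the learned digest so that the next response is relearned."""
        self.reference = None


probe = HashProbe()
print(probe.check(b"<html>build 41</html>"))  # True (reference learned)
print(probe.check(b"<html>build 41</html>"))  # True (same content)
print(probe.check(b"<html>build 42</html>"))  # False (content changed)
probe.reset()
print(probe.check(b"<html>build 42</html>"))  # True (relearned)
```

If the default page on the two new servers contains anything that changes after an app pool recycle, a byte-for-byte hash would no longer match, which would be consistent with the symptom; the thread does not confirm that this is the cause.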

  • ACE show serverfarm - failure counter does not incremented on Probe-Failure event

    Hi,
    Despite the probe failure, the failure counter is not incremented. Is there any correlation between the configured probe and the failure counter?
    (Custom script probe is used for this serverfarm)
    # sh serverfarm xxxxxSt
    serverfarm     : xxxxxSt, type: HOST
    total rservers : 2
                                                    ----------connections-----------
           real                  weight state        current    total      failures
       ---+---------------------+------+------------+----------+----------+---------
       rserver: xxxxx6
           10.222.0.90:8000      8      OPERATIONAL  13         157        0
       rserver: xxxxx7
           10.222.0.92:8000      8      PROBE-FAILED 0          0          0
    Thanks,
    Attila

    Hi Attila,
    The Connection Failure counter under show serverfarm is for Loadbalanced Connections which are failing.
    If Probes are failing, this counter will not increment.
    The Connection failure counter can increment for various reasons some of them are,
    - Server not responding to the SYN packet sent by ACE for Loadbalanced connection
    - Server sending Reset to the SYN packet sent by ACE for Loadbalanced connection
    To check on stats for Probe, you can run "show probe detail" command.
    Hope this helps,
    Best Regards,
    Rahul

  • Monitoring for probe failure to real servers

    Hi All,
    I'm working on a task to send an alert to an SNMP server when a probe to a real server fails.
    Is there any way to generate an SNMP trap for a probe failure on any real server?
    Regards,
    Thiyagu

    Thiyagu,
    This is possible; refer to the link below for ACE management features.
    https://supportforums.cisco.com/docs/DOC-22543
    Regards,
    Siva

  • GSLB probes in redundant CSM setup

    Hi -
    When using leastloaded in a GSLB setup, a probe is needed to get load data from the remote CSM. Is it possible to initiate probes from a specific interface on the CSM?
    Does the secondary unit in an ft setup send its own probes, or is it updated on the load by the primary?
    Right now I have a situation where probes from a CSM are sent out of one VLAN with a source IP address belonging to another VLAN; there is no bridging between these VLANs.
    Any help would be appreciated.

    Hi Gilles -
    Many thanks for your fast answer.
    Yes, the way to control it is to define routing within the VLAN that I want to source the address from. I came to the same conclusion, and it works. What really bothered me was discovering traffic sourced from one VLAN interface in another VLAN (especially because it is a DMZ setup). My problem was that I had defined the gateway command on several client VLAN interfaces. Is there a way to see the routing table of the CSM?
    Rgds Peter

  • ACE initiating TCP RST causing probe failure

    I have seen so many users reporting this issue on the Cisco Support Community, and yet no one has posted any resolution or follow-up.
    I am having the same issue and had to back out the cutover.
    I see in the traces that the ACE is initiating the RST.
    Is there any solution to this, or timers to tweak?
    Thanks

    Good afternoon,
    There are multiple reasons why an ACE may reset a connection, so it's impossible to give a universal solution for it. I would recommend you to open a TAC service request to have your specific issue investigated further.
    Regards
    Daniel

  • ACE - TCP probe goes into INVALID state

    Hello,
    I have a problem with the following configuration of a sticky serverfarm with a backup serverfarm
    (this setup is of course used only for failover purposes, not load balancing):
    probe tcp tcp-8888-probe
      port 8888
      interval 5
      faildetect 2
      passdetect interval 3
      passdetect count 1
    rserver host rsrv1
      ip address 10.1.2.10
      inservice
    rserver host rsrv2
      ip address 10.1.2.11
      inservice
    serverfarm host rfarm-primary
      predictor leastconns
      probe tcp-8888-probe
      rserver rsrv1 8888
        inservice
    serverfarm host rfarm-backup
      predictor leastconns
      probe tcp-8888-probe
      rserver rsrv2 8888
       inservice
    sticky http-cookie RFARM-COOKIE sticky-rfarm-1
      cookie insert browser-expire
      serverfarm rfarm-primary backup rfarm-backup
    etc....
    The problem is that every time the probe state changes (from SUCCESS to FAIL or vice versa), the tcp-8888-probe on the server whose service state changed goes into INVALID state:
    #show probe tcp-8888-probe detail
    probe       : tcp-8888-probe
    type        : TCP
    state       : ACTIVE
    description :
       port      : 8888    address     : 0.0.0.0         addr type  : -
       interval  : 5       pass intvl  : 3               pass count : 1
       fail count: 2       recv timeout: 10
       conn termination : GRACEFUL
       expect offset    : 0         , open timeout     : 10
       expect regex     : -
       send data        : -
                           --------------------- probe results --------------------
       probe association   probed-address  probes     failed     passed     health
       ------------------- ---------------+----------+----------+----------+-------
       serverfarm  : rfarm-backup
         real      : rsrv2[8888]
                           10.1.2.11    291        0          291        SUCCESS
       Socket state        : CLOSED
       No. Passed states   : 1         No. Failed states : 0
       No. Probes skipped  : 0         Last status code  : 0
       No. Out of Sockets  : 0         No. Internal error: 0
       Last disconnect err :  -
       Last probe time     : Thu Jun 17 22:12:31 2010
       Last fail time      : Never
       Last active time    : Thu Jun 17 21:48:21 2010
       serverfarm  : rfarm-primary
         real      : rsrv1[8888]
                           10.1.2.10    0          0          0          INVALID
       Socket state        : CLOSED
       No. Passed states   : 0         No. Failed states : 0
       No. Probes skipped  : 0         Last status code  : 0
       No. Out of Sockets  : 0         No. Internal error: 0
       Last disconnect err :  -
       Last probe time     : Never
       Last fail time      : Never
       Last active time    : Never
    I managed to get the probe back into FAIL state for a moment by removing it from the serverfarm and then reapplying it, but within a few seconds it goes from FAIL to INVALID again and stays in that state regardless of the availability of the probed TCP port. Only when I reapply it while the port is available/up does it stay in SUCCESS state and keep working until the service fails, at which point the INVALID state reappears.
    What can be the cause of such behavior?
    thanks,
    WM

    Hello,
    It looks very similar to this bug: CSCsh74871
    You may need to collect a #show tech-support and do the following:
    -remove the serverfarm in question
    -reboot the ACE module during a maintenance window.
    You may also want to upgrade to a higher version, since your version is fairly old.
    Jorge

  • What is TCP Splicing on a CSM Vserver

    I'm having an issue with a service on my CSM where the server log is showing "An error occurred receiving data from (10.129.53.250) over TCP/IP. This may be due to a communications failure". That address is the CSM NAT address. When I do a packet capture, I see a good number of lost segments and retransmissions (TCP segment of a reassembled PDU) between the CSM and the server. When the CSM is removed from the equation and the server is accessed directly, the issue goes away. We are not seeing issues with other VIPs. What is the TCP splicing feature, and could it help with this issue? The manual has no real explanation of this feature. If this can't help, does anyone have any other ideas?
    Thanks,
    Dave

    Hi Carlsond,
    IP splicing/hijacking is an attack in which a hacker predicts a session number and uses it to take over a legitimate session (usually TCP). The target station will not know that the peer has changed.
    TCP splicing is a technique to splice two TCP connections by segment translation, so that data relaying between the two connections can run at near router speeds. This technique can be used to speed up layer-7 switching, web proxies and application firewalls running in user space.
    TCP splicing is a technique to interconnect two separate TCP connections for fast data relay. A TCP splicer changes values in the IP and TCP headers: source and destination IP addresses, port numbers, sequence and acknowledgement numbers, and checksums.
    TCP splicing has been commonly used for increasing the performance of serving web content through proxies. Web server architectures built using TCP splicing suffer from two limitations: all traffic between clients and servers typically passes through the proxy, thus making the proxy a scalability and performance bottleneck; and this architecture cannot tolerate proxy failures.
    The CSM provides support for fragmented TCP packets. The TCP fragment feature only works with VIPs that have Level 4 policies defined and will not work for SYN packets or for Layer 7 policies. To support fragmented TCP packets, the CSM matches the TCP fragments to existing data flows or by matching the bridging VLAN ID. The CSM will not reassemble fragments for Layer 7 parsing. Because the CSM has a finite number of buffers and fragment ID buckets, packet resending is required when there are hash collisions.
    When enabling TCP splicing, you must designate a virtual server as a Layer 7 device even when it does not have a Layer 7 policy. This option is only valid for the TCP protocol.
    To configure TCP splicing, perform this task:
    Step 1: Router(config-module-csm)# vserver virtserver-name
    Purpose: Identifies the virtual server and enters the virtual server configuration mode.
    Step 2: Router(config-slb-vserver)# vserver tcp-protect
    Purpose: Designates the virtual server for TCP splicing.
    Step 3: Router(config-slb-vserver)# virtual 100.100.100.100 tcp any service tcp-termination
    Purpose: Enables TCP splicing.
    Kindly see the reference for this as follows:
    http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/csm/3.2/configuration/guide/mapolcy.html#wp1038073
    Sachin Garg

  • ACE TCP probe

    My customers ask for different TCP port probes for different applications. Is there such a thing as a standard TCL probe, so that each time I only need to adapt the standard TCL script and apply it to the serverfarm? That would avoid a long probe config for the different ports.
    Thank you in advance,
    June Hu

    Could the solution be to configure the probe to terminate the TCP connection by sending a RST, using the connection term command?
    It seems that this makes the probe pass the health check.
    Br
    Geir

  • TCP Probe on ACE 4710

    Hi,
    I am trying to configure a probe on an ACE device and I have a few queries:
    1. I want to probe 10 different TCP ports for a serverfarm. Is there any way to give a port range on a probe? If not, and I have to configure a probe per port and add each to the serverfarm, how would it behave? I want the server to fail only when all the configured ports have failed.
    2. I am trying to configure a probe for a particular TCP port, but I suspect the server is not sending a RST on that port, so the probe is failing. However, if I telnet to that port from any other location, it connects. How can I configure the probe for that port in this case?
    Please suggest.
    Thanks
    Pawan

    You will need to configure a probe for each port.
    Add all the probes to the serverfarm.
    Use the command "fail-on-all" under the serverfarm.
    http://www.cisco.com/en/US/partner/docs/app_ntwk_services/data_center_app_services/ace_appliances/vA3_1_0/command/reference/servfarm.html#wp1106543
    Gilles.
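
The effect of "fail-on-all" can be thought of as switching how several probe results are combined: without it, one failing probe is enough to fail the real server; with it, the server fails only when every probe fails. The Python sketch below models just that combining logic; the function name and the dictionary of per-port results are illustrative assumptions, not ACE internals.

```python
def server_is_up(probe_results, fail_on_all=False):
    """Combine per-probe results (True = probe passed) into one server state."""
    if not probe_results:
        return True  # assumption: no probes configured means no health check
    results = probe_results.values()
    if fail_on_all:
        # Server fails only if ALL probes fail, so any passing probe keeps it up.
        return any(results)
    # Default: a single failing probe takes the server down.
    return all(results)


# Hypothetical results for three per-port TCP probes.
results = {"tcp-8001": True, "tcp-8002": False, "tcp-8003": True}
print(server_is_up(results))                    # False
print(server_is_up(results, fail_on_all=True))  # True
```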

  • ACE Module - HTTP Probe failure

    Hi,
    I have configured an HTTP probe with expect status 200 202, but the probe fails even though the port on the rserver is available.
    I tried the HEAD/GET methods to see the return code, and it came back with HTTP/1.1 302. How can I configure an HTTP probe to treat an HTTP 302 code as success?
    Thanks.

    I changed the expect status value as below
    probe http TEST-HTTP
    interval 30
    passdetect interval 10
    request method head
    expect status 302 302
    The probe is still failing with the log message
    Apr 20 2009 12:04:35 : %ACE-3-251010: Health probe failed for server 192.168.1.10 on port 80, received invalid status code
    On 'show probe detail' it shows the last status code as 400 which means Bad Request
    --------------------- probe results --------------------
    probe association probed-address probes failed passed health
    ------------------- ---------------+----------+----------+----------+-------
    serverfarm : TEST-APP
    real : TEST-SERVER1[80]
    192.168.1.10 27 27 0 FAILED
    Socket state : CLOSED
    No. Passed states : 0 No. Failed states : 1
    No. Probes skipped : 0 Last status code : 400
    No. Out of Sockets : 0 No. Internal error: 0
    Last disconnect err : Received invalid status code
    Last probe time : Mon Apr 20 12:05:33 2009
    Last fail time : Mon Apr 20 12:00:53 2009
    Last active time : Never
    The HTTP page displays perfectly in a web browser. Also, using the HTTP HEAD/GET tool, I can see that 302 is returned.
    What could be the problem?
    Regards.
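
The "expect status" check compares the code the server returns to the probe itself against the configured range, and that code is not necessarily the one a browser or another tool receives. A minimal Python sketch of that comparison follows; the function name and the list-of-ranges argument are assumptions made for illustration.

```python
def status_matches(status_line, expected_ranges):
    """Return True if the status code in an HTTP status line falls inside
    any (low, high) range, both ends inclusive."""
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        return False  # not a parseable status line
    code = int(parts[1])
    return any(low <= code <= high for low, high in expected_ranges)


print(status_matches("HTTP/1.1 302 Found", [(302, 302)]))        # True
print(status_matches("HTTP/1.1 302 Found", [(200, 202)]))        # False
print(status_matches("HTTP/1.1 400 Bad Request", [(302, 302)]))  # False
```

In the output above, the last status code recorded by the probe is 400, so changing the expected range to 302 cannot make it pass; the request sent by the probe would have to be one that the server answers with 302.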

  • ACE 4700 ssl probe failure

    We recently updated one of our servers to a new SSL certificate with a 4096-bit key. Now the ACE probe to that server fails.
    We have the SSL version set to any and the SSL cipher set to any. Is there a problem with ACE HTTPS probes not supporting keys longer than 1024 bits?

    Hello DLance,
    The ACE supports SSL certificates up to 2048 bits.
    If you refer to the following guide, there is mention of the 2048 limit:
    http://www.cisco.com/en/US/partner/docs/app_ntwk_services/data_center_app_services/ace_appliances/vA4_1_0/configuration/ssl/guide/certkeys.html
    HTH. Regards.
