TCP Probe failure on CSM

I have had a customer raise an issue with me. Unfortunately I am not too hot on the CSM.
Switch Type - Cisco WS-C6513
IOS VSN - 12.2(18)SXF4
CSM Module details
Card Type - SLB Application Processor Module
Model - WS-X6066-SLB-APC
Hardware VSN - 1.8
Software VSN - 4.2(5)
I have spotted bug CSCsc38892, but I cannot tell whether the Catalyst is running VRF; this is all the information I have been given.
We have 6500 chassis, each with a CSM module, in fault tolerance (FT) mode.
The current standby module (which was the active one at the time) reports, as per the output below, that the probes failed:
sh mod csm 1 real sf SAPCCP
real server farm weight state conns/hits
10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
A PROBE_FAILED state could have a number of causes; however, the inconsistency is that the failed probes follow the CSM module, while the other module's probes work.
It should be noted that the command "ping module csm 1 10.xxx.xxx.xxx" reports the server as reachable, which is a good indication that there is connectivity from the CSM to the server.
The following is the output from the currently active CSM (which used to be the standby one):
sh mod csm 1 real sf SAPCCP
real server farm weight state conns/hits
10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 51
10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 41
10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 52
For info:
FUNCTIONING CSM
sh mod 1
Mod Ports Card Type Model Serial No.
1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094803UC
Mod MAC addresses Hw Fw Sw Status
1 0015.f998.a94a to 0015.f998.a951 1.8 4.2(5) Ok
Mod Online Diag Status
1 Pass
relevant config lines
serverfarm SAPCCP
nat server
no nat client
failaction purge
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
probe ABAP
vserver SAPCCP-VIP
virtual 10.xxx.xxx.xxx tcp www
vlan xxx
serverfarm SAPCCP
sticky 15 group 117
replicate csrp sticky
replicate csrp connection
no persistent rebalance
parse-length 4000
inservice
probe ABAP tcp
interval 2
retries 2
failed 6
open 3
port 8000

Continued:
NON-FUNCTIONING CSM
sh mod 1
Mod Ports Card Type Model Serial No.
1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094609ZZ
Mod MAC addresses Hw Fw Sw Status
1 0015.f998.8386 to 0015.f998.838d 1.8 4.2(5) Ok
Mod Online Diag Status
1 Pass
relevant config lines
serverfarm SAPCCP
nat server
no nat client
failaction purge
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
probe ABAP-CCP
vserver SAPCCP-VIP
virtual 10.xxx.xxx.xxx tcp www
vlan xxx
serverfarm SAPCCP
sticky 15 group 117
replicate csrp sticky
replicate csrp connection
no persistent rebalance
parse-length 4000
inservice
probe ABAP-CCP tcp
recover 2
interval 6
retries 2
failed 6
open 5
port 8000
Thanks for any pointers on where to look,
Paul.
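The two "relevant config lines" outputs are easy to misread side by side. The minimal Python sketch below (the helper names parse_probe and diff_probes are hypothetical, and it assumes the probe blocks are pasted exactly as shown above) lists every setting that differs between the two modules. On these pastes it flags the probe name (ABAP vs ABAP-CCP), interval, open timeout and the recover setting; whether any of those explains the failure is a separate question.

```python
# Probe definitions copied from the two "relevant config lines" outputs above.
FUNCTIONING = """\
probe ABAP tcp
interval 2
retries 2
failed 6
open 3
port 8000
"""

NON_FUNCTIONING = """\
probe ABAP-CCP tcp
recover 2
interval 6
retries 2
failed 6
open 5
port 8000
"""


def parse_probe(text):
    """Turn a flat CSM probe block into {keyword: value}, including name and type."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()  # e.g. ["probe", "ABAP", "tcp"]
    settings = {"name": header[1], "type": header[2]}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        settings[key] = value
    return settings


def diff_probes(a, b):
    """Return {keyword: (value_in_a, value_in_b)} for every setting that differs."""
    sa, sb = parse_probe(a), parse_probe(b)
    keys = sorted(set(sa) | set(sb))
    return {k: (sa.get(k), sb.get(k)) for k in keys if sa.get(k) != sb.get(k)}


if __name__ == "__main__":
    for key, (good, bad) in diff_probes(FUNCTIONING, NON_FUNCTIONING).items():
        print(f"{key:10} functioning={good!s:8} non-functioning={bad}")
```

A missing keyword shows up as None, so a setting present on only one module (here, recover) is reported too.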

Similar Messages

ACE Probe Failure Error Messages

Hi,
I'm trying to understand the difference between the following probe failure error messages:
Server open timeout (no SYN ACK)
Server reply timeout (no reply)
What I don't understand is this: when the ACE sends a TCP probe, it sends a SYN and expects a SYN ACK back. If no SYN ACK comes back, then there is no reply, right? Any feedback on these errors would be greatly appreciated.
/r
Rob

Hi Rob,
By default, when the ACE sends a probe, it expects a response within a time period of 10 seconds. For example, for an HTTP probe, the timeout period is the number of seconds to receive an HTTP reply for a GET or HEAD request. If the server fails to respond to the probe, the ACE marks the server as failed.
This is where "Server reply timeout (no reply)" comes into play: the server does not reply to the ACE after the content request has been made (sequence: SYN, SYN/ACK, ACK, GET or HEAD, ... no reply). That is the difference from "Server open timeout (no SYN ACK)".
With "Server open timeout (no SYN ACK)", the ACE is still opening the connection, that is, performing the TCP three-way handshake.
In the logic you describe, yes, we could say that if there is no SYN/ACK, there is no reply from the server. But in ACE terms, a missing SYN/ACK produces an error telling you that the problem occurred during the three-way handshake; the reason could be that the port is not open on the server, or on a firewall if there is one in between, etc. A reply timeout tells you that the issue is more likely the server having trouble replying to the ACE; possible reasons are that the server is overloaded and unable to reply within the 10 seconds, heavy congestion in the network, etc.
So, as you can see, this helps distinguish which troubleshooting steps to apply depending on the error message.
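To make the two phases concrete, here is a minimal Python sketch of a client that separates them the way the two messages do. It is not the ACE's implementation: the function name classify_probe, the HEAD request and the returned strings are illustrative assumptions, and the 10-second defaults simply mirror the default mentioned above.

```python
import socket


def classify_probe(host, port, open_timeout=10.0, recv_timeout=10.0):
    """Rough client-side imitation of the two ACE failure messages (illustrative only)."""
    # Phase 1: three-way handshake (SYN, SYN/ACK, ACK), bounded by the open timeout.
    try:
        sock = socket.create_connection((host, port), timeout=open_timeout)
    except socket.timeout:
        return "Server open timeout (no SYN ACK)"
    except ConnectionRefusedError:
        return "Connection refused (RST in reply to SYN)"
    with sock:
        # Phase 2: send the content request, then wait for data with the receive timeout.
        sock.settimeout(recv_timeout)
        sock.sendall(f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return "Server reply timeout (no reply)"
    return "Reply received" if data else "Connection closed without reply"
```

A listener that completes the handshake but never answers produces the reply timeout, while a silently dropped SYN (for example behind a firewall) produces the open timeout; a closed port on a reachable host usually answers with a RST, which is a third, distinct case.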
Hope this helps.
Rod.

Function of health probe timers on CSM

Hi,
We use the following configuration on a CSM to monitor a server farm, and I'm wondering exactly how the probe timers work.
===
serverfarm sf
nat server
nat client natpool1
failaction purge
real name serv1
weight 1
inservice
real name serv2
weight 1
inservice
probe probe1
probe probe1 script
script LDAP_PROBE
interval 5
retries 2
receive 1
port 389
===
As I understand it, the probes are sent every 5 seconds. When a probe isn't answered within one second, it's marked as failed. If two probes fail (retries 2), the real server is marked as down.
Is this correct?
In a network trace I see different behaviour: probes are sent every 5 seconds. If a real server goes out of service, I see a probe that is not answered, and the next probe is sent after 10 seconds (I expected 5 seconds). Five seconds later, the real server is marked down in the switch log.
I would appreciate any help.
Best Regards,
Thorsten Steffen

Hi,
Here is the meaning of the parameters:
Router(config-slb-probe)#
interval seconds
Sets the interval between probes in seconds (from the end of the previous probe to the beginning of the next probe) when the server is healthy.
Range = 2-65535 seconds
Default = 120 seconds
Router(config-slb-probe)#
retries retry-count
Sets the number of failed probes that are allowed before marking the server as failed.
Range = 0-65535
Default = 3
Router(config-slb-probe)#
failed failed-interval
Sets the time between health checks when the server has been marked as failed. The time is in seconds.
Range = 2-65535
Default = 300 seconds
Router(config-slb-probe)#
open open-timeout
Sets the maximum time to wait for a TCP connection. This command is not used for any non-TCP health checks (ICMP or DNS).
Range = 1-65535
Default = 10 seconds
There are two different timeout values: open and receive. The open timeout specifies how many seconds to wait for the connection to open (that is, how many seconds to wait for the SYN ACK after sending a SYN). The receive timeout specifies how many seconds to wait for data to be received (that is, how many seconds to wait for an HTTP reply after sending a GET/HEAD request). Because TCP probes close as soon as they open, without sending any data, the receive timeout is not used for them.
When sniffing, you should see a probe every 5 seconds. When a probe fails for the first time, a second probe should be sent 5 seconds later. When this probe fails too, the server is taken out of service.
That is the behaviour you should see.
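Because the interval is documented as running from the end of the previous probe to the beginning of the next, a probe that has to wait out a timeout pushes the next probe later than the nominal interval. The Python sketch below simulates that timeline under loudly stated assumptions: an answered probe takes no time, an unanswered one lasts exactly `timeout` seconds, the server is marked failed after `retries` consecutive failures (following the reading in the answer above), and any success returns it to service (the CSM's recover setting is ignored). The function name probe_schedule is hypothetical.

```python
def probe_schedule(outcomes, interval, retries, failed_interval, timeout):
    """Simulate probe start times and server state for a sequence of probe outcomes.

    outcomes        -- list of booleans, True meaning the probe was answered
    interval        -- seconds between probes while the server is healthy
    retries         -- consecutive failures after which the server is marked failed
                       (assumption: follows the reading in the answer above)
    failed_interval -- seconds between probes once the server is marked failed
    timeout         -- seconds an unanswered probe waits before giving up
    """
    t = 0.0
    consecutive_failures = 0
    state = "OPERATIONAL"
    timeline = []
    for answered in outcomes:
        start = t
        if answered:
            duration = 0.0  # simplification: an answered probe takes no time
            consecutive_failures = 0
            state = "OPERATIONAL"  # simplification: ignores any recover count
        else:
            duration = timeout
            consecutive_failures += 1
            if consecutive_failures >= retries:
                state = "PROBE_FAILED"
        timeline.append((start, answered, state))
        # The gap runs from the END of this probe, per the interval description above.
        gap = failed_interval if state == "PROBE_FAILED" else interval
        t = start + duration + gap
    return timeline
```

With the values from the question (interval 5, retries 2, receive 1, and the default failed interval of 300), `probe_schedule([True, True, False, False], 5, 2, 300, 1)` gives probe start times of 0, 5, 10 and 16 seconds, with the server marked PROBE_FAILED at the fourth probe. This is a model for reasoning about the trace, not a reproduction of the CSM's scheduler.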
HTH,
Dario

IOS SLB and probe failure

Hello,
We use server load balancing with IOS 12.1(19)E1.
We have a problem: when the server receives more connections, the following error message appears: "REAL 192.168.197.8 (HSSAT1-LX) has changed to PROBE_FAILED", and a few seconds later "REAL 192.168.197.8 (HSSAT1-LX) has changed to OPERATIONAL", and so on.
We checked the servers and they work properly.
What could be the reason for the probe failures?
My configuration:
ip slb probe HS-PROBE tcp
interval 5
ip slb serverfarm HSSAT1-LX
nat server
predictor leastconns
failaction purge
probe HS-PROBE
real 192.168.197.8 99
reassign 2
inservice
real 192.168.197.9 99
reassign 2
inservice
ip slb vserver HS.SAT1.DE
virtual xxx.xxx.xxx.xxx tcp www
serverfarm HSSAT1-LX
advertise active
inservice standby allvips
How does a TCP probe work? I could not find more detailed information on configuring probes in the documentation.
Is it better to use another probe (ICMP), or no probe at all?
When does it make sense to use probes?
Best regards
Stefan

Hi Stefan,
TCP probes complete the TCP three-way handshake and then normally terminate the session. A problem I have run into several times is that the timeout for establishing the session might be too short if the server is heavily loaded.
Probing with a specific method (TCP, HTTP, ...) is, most of the time, the better solution. Imagine a web server that answers ping properly, but whose httpd has died due to some internal error. If you probe with ping, the load balancer will never notice this; but if you monitor TCP port 80 with a TCP probe, or better with an HTTP probe, you will notice it and the server will be taken out of the server farm. Even better, but AFAIK not possible in IOS SLB, is to probe a specific page, e.g. index.html; then you know that the httpd is up and running and that pages can be displayed.
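The difference between a port-level check and a page-level check can be sketched in a few lines of Python. This is an illustration of the idea, not IOS SLB code: tcp_probe and page_probe are hypothetical names, the timeouts are arbitrary, and /index.html is simply the example page from the answer.

```python
import http.client
import socket


def tcp_probe(host, port, open_timeout=5.0):
    """TCP probe: succeed if the three-way handshake completes, then close at once."""
    try:
        with socket.create_connection((host, port), timeout=open_timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def page_probe(host, port, path="/index.html", timeout=5.0):
    """HTTP probe: succeed only if a specific page comes back with status 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A server whose listener is up but whose application is broken passes tcp_probe and fails page_probe, which is exactly the case the answer describes.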
Regarding the probing issue, it might be useful to read the following link describing health monitoring with the CSM:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_installation_and_configuration_guide_chapter09186a00801c5899.html#1024967
Hope that helped.
Best Regards,
Joerg

Ace probe failure after IIS app pool recycle?

Windows Server 2003 SP2
ACE Module A2(1.6a)
I suspect this is caused by an IIS6 setting, but I'm posting here in case anyone has seen this. For this particular site, we have 4 servers in the farm. 2 of those servers are fine. The other 2 (new) servers generate probe failures after the site's app pool recycles. I then take the 2 servers out of service and reactivate them (no inservice, then inservice), and the probe comes back as operational. It appears that the app pool recycle somehow resets the hash on the default page, though I'm not sure how. Any ideas are very much appreciated.

Yeah, the hash is inside the probe. Here's the config for the serverfarm and the probe. Public-007 and Public-008 are the new servers; the other 6 have been in the farm for the last 2.5 years and don't have this issue. Only on the 2 new boxes does the probe fail when the app pool is recycled.
serverfarm host PUBLIC
probe URL-DEFAULT-ASPX
rserver PUBLIC-001
    inservice
rserver PUBLIC-002
    inservice
rserver PUBLIC-003
    inservice
rserver PUBLIC-004
    inservice
rserver PUBLIC-005
    inservice
rserver PUBLIC-006
    inservice
rserver PUBLIC-007
    inservice
rserver PUBLIC-008
    inservice
probe http URL-DEFAULT-ASPX
interval 2
faildetect 2
passdetect interval 2
passdetect count 2
request method get url /default.aspx
expect status 200 200
hash
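One plausible model of the `hash` check, and it is an assumption here that should be verified against the ACE HTTP probe documentation, is that `hash` configured without a value makes the ACE store the MD5 digest of the first good response and compare later responses against it. Under that model, any change in the bytes of /default.aspx after a recycle fails the probe until the probe state is reset, which would fit the no inservice/inservice workaround. The Python sketch below (the class HashCheckingProbe is hypothetical) shows only that model.

```python
import hashlib


class HashCheckingProbe:
    """Hypothetical model of an HTTP probe with 'expect status' and 'hash' checks."""

    def __init__(self, expect_min=200, expect_max=200, expected_md5=None):
        self.expect_min = expect_min
        self.expect_max = expect_max
        # None models "hash" with no value: the reference is learned from the first good reply.
        self.expected_md5 = expected_md5

    def check(self, status, body):
        """Return True if the reply passes; body is the raw response body as bytes."""
        if not self.expect_min <= status <= self.expect_max:
            return False
        digest = hashlib.md5(body).hexdigest()
        if self.expected_md5 is None:
            self.expected_md5 = digest  # first good response becomes the reference
            return True
        return digest == self.expected_md5
```

Creating a fresh HashCheckingProbe plays the role of re-enabling the rserver: the reference digest is learned again from whatever the page now returns.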

ACE show serverfarm - failure counter is not incremented on Probe-Failure event

Hi,
Despite the probe failure, the failure counter is not incremented. Is there any correlation between the configured probe and the failure counter?
(Custom script probe is used for this serverfarm)
# sh serverfarm xxxxxSt
serverfarm     : xxxxxSt, type: HOST
total rservers : 2
                                                ----------connections-----------
       real                  weight state        current    total      failures
   ---+---------------------+------+------------+----------+----------+---------
   rserver: xxxxx6
       10.222.0.90:8000      8      OPERATIONAL 13         157        0
   rserver: xxxxx7
       10.222.0.92:8000      8      PROBE-FAILED 0          0          0
Thanks,
Attila

Hi Attila,
The connection failure counter under show serverfarm counts load-balanced connections that fail.
If probes are failing, this counter will not increment.
The connection failure counter can increment for various reasons, including:
- The server not responding to the SYN packet sent by the ACE for a load-balanced connection
- The server sending a reset in response to the SYN packet sent by the ACE for a load-balanced connection
To check probe statistics, you can run the "show probe detail" command.
Hope this helps,
Best Regards,
Rahul
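Since the failures column tracks load-balanced connections rather than probes, a script that watches for probe trouble should key on the state column instead. The minimal Python sketch below parses rows shaped like the show serverfarm output above; the function name parse_serverfarm is hypothetical, and it assumes the column layout is exactly as pasted (address:port, weight, state, current, total, failures).

```python
import re

# One rserver data row: address:port, weight, state, current, total, failures.
ROW = re.compile(
    r"^\s*(?P<addr>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)\s+(?P<weight>\d+)\s+"
    r"(?P<state>[A-Z-]+)\s+(?P<current>\d+)\s+(?P<total>\d+)\s+(?P<failures>\d+)\s*$"
)


def parse_serverfarm(output):
    """Return one dict per rserver row; the name comes from the preceding 'rserver:' line."""
    rows, name = [], None
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("rserver:"):
            name = stripped.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["name"] = name
            for key in ("port", "weight", "current", "total", "failures"):
                row[key] = int(row[key])
            rows.append(row)
    return rows
```

On the output in the question, the second rserver comes back with state PROBE-FAILED and failures 0, which is the combination Attila observed.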

Monitoring for probe failure to real servers

Hi All,
I'm working on a task to send an alert to an SNMP server when a probe to a real server fails.
Is there any way to generate an SNMP trap for probe failures on any real server?
Regards,
Thiyagu

Thiyagu,
This is possible; refer to the link below for ACE management features.
https://supportforums.cisco.com/docs/DOC-22543
Regards,
Siva

GSLB probes in redundant CSM setup

Hi -
When using leastloaded in a GSLB setup, a probe is needed to get load data from the remote CSM. Is it possible to initiate probes from a specific interface on the CSM?
Does the secondary unit in an FT setup send its own probes, or is it updated with the load from the primary?
Right now I have a situation where probes from a CSM are sent with a source IP address belonging to one VLAN out of another; there is no bridging between these VLANs.
Any help would be appreciated.

Hi Gilles -
Many thanks for your fast answer.
Yes, the way to control it is to define routing within the VLAN whose address I want to use as the source. I came to the same conclusion, and it works. What really bothered me was discovering traffic sourced from one VLAN interface in another VLAN (especially because it is a DMZ setup). My problem was that I had defined the gateway command on several client VLAN interfaces. Is there a way to see the routing table of the CSM?
Rgds Peter

ACE initiating TCP RST causing probe failure

I have seen many users report this issue on the Cisco Support Community, yet no one has posted any resolution or follow-up.
I am having the same issue and had to back out the cutover.
I see in the traces that the ACE is initiating the RST.
Is there any solution to this, or timers to tweak?
Thanks

Good afternoon,
There are multiple reasons why an ACE may reset a connection, so it's impossible to give a universal solution. I would recommend opening a TAC service request so that your specific issue can be investigated further.
Regards
Daniel

ACE - TCP probe goes into INVALID state

Hello,
I have a problem with the following configuration of a sticky serverfarm with a backup serverfarm
(this setup is of course used only for failover purposes, not load balancing):
probe tcp tcp-8888-probe
port 8888
interval 5
faildetect 2
passdetect interval 3
passdetect count 1
rserver host rsrv1
ip address 10.1.2.10
inservice
rserver host rsrv2
ip address 10.1.2.11
inservice
serverfarm host rfarm-primary
predictor leastconns
probe tcp-8888-probe
rserver rsrv1 8888
    inservice
serverfarm host rfarm-backup
predictor leastconns
probe tcp-8888-probe
rserver rsrv2 8888
   inservice
sticky http-cookie RFARM-COOKIE sticky-rfarm-1
cookie insert browser-expire
serverfarm rfarm-primary backup rfarm-backup
etc....
The problem is that every time the probe state changes (from SUCCESS to FAIL or vice versa), the tcp-8888-probe on the server whose service changed state goes into the INVALID state:
#show probe tcp-8888-probe detail
probe       : tcp-8888-probe
type        : TCP
state       : ACTIVE
description :
   port      : 8888    address     : 0.0.0.0         addr type : -
   interval : 5       pass intvl : 3               pass count : 1
   fail count: 2       recv timeout: 10
   conn termination : GRACEFUL
   expect offset    : 0         , open timeout     : 10
   expect regex     : -
   send data        : -
                       --------------------- probe results --------------------
   probe association   probed-address probes     failed     passed     health
   ------------------- ---------------+----------+----------+----------+-------
   serverfarm : rfarm-backup
     real      : rsrv2[8888]
                       10.1.2.11    291        0          291        SUCCESS
   Socket state        : CLOSED
   No. Passed states   : 1         No. Failed states : 0
   No. Probes skipped : 0         Last status code : 0
   No. Out of Sockets : 0         No. Internal error: 0
   Last disconnect err : -
   Last probe time     : Thu Jun 17 22:12:31 2010
   Last fail time      : Never
   Last active time    : Thu Jun 17 21:48:21 2010
   serverfarm : rfarm-primary
     real      : rsrv1[8888]
                       10.1.2.10    0          0          0          INVALID
   Socket state        : CLOSED
   No. Passed states   : 0         No. Failed states : 0
   No. Probes skipped : 0         Last status code : 0
   No. Out of Sockets : 0         No. Internal error: 0
   Last disconnect err : -
   Last probe time     : Never
   Last fail time      : Never
   Last active time    : Never
I have managed to get the probe back into the FAIL state for a moment by removing it from the serverfarm and then reapplying it, but within a few seconds it goes from FAIL to INVALID again and stays in that state regardless of the availability of the probed TCP port. Only when I reapply it while the port is available/up does it stay in the SUCCESS state, and it then works until the service fails, at which point the INVALID state reappears.
What can be the cause of such behavior ?
thanks,
WM

Hello,
It looks very similar to this bug: CSCsh74871
You may need to collect a "show tech-support" and then do the following:
-remove the serverfarm in question
-reboot the ACE module during a maintenance window.
You may also want to upgrade to a later version, since yours is fairly old.
Jorge

What is TCP Splicing on a CSM Vserver

I'm having an issue with a service on my CSM where the server log shows "An error occurred receiving data from (10.129.53.250) over TCP/IP. This may be due to a communications failure". That address is the CSM NAT address. When I do a packet capture, I see a good number of lost segments and retransmissions (TCP segment of a reassembled PDU) between the CSM and the server. When the CSM is removed from the equation and the server is accessed directly, the issue goes away. We are not seeing issues with other VIPs. What is the TCP splicing feature, and could it help with this issue? The manual has no real explanation of this feature. If it can't help, does anyone have any other ideas?
Thanks,
Dave

Hi Carlsond,
IP splicing/hijacking is an attack in which a hacker predicts a session's sequence number and uses it to take over a legitimate session (usually TCP). The target station does not know that its peer has changed.
TCP splicing is a technique that splices two TCP connections together by segment translation, so that data relaying between the two connections can run at near router speeds. This technique can be used to speed up Layer 7 switching, web proxies and application firewalls running in user space.
In other words, TCP splicing interconnects two separate TCP connections for fast data relay. A TCP splicer changes values in the IP and TCP headers: source and destination IP addresses, port numbers, sequence and acknowledgement numbers, and checksums.
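The sequence and acknowledgement rewriting is the least obvious part of that list, so here is a minimal Python sketch of it. It is a generic illustration of the technique, not how the CSM implements splicing; the function name make_splicer is hypothetical, and address, port and checksum rewriting are deliberately left out. The idea: each side chose its own initial sequence number (ISN), so the splicer shifts sequence numbers by a fixed offset per direction and shifts acknowledgements back by the opposite direction's offset, modulo 2^32.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def make_splicer(client_isn, proxy_isn_to_client, proxy_isn_to_server, server_isn):
    """Return (client_to_server, server_to_client) translators for (seq, ack) pairs.

    client_isn          -- ISN the client used on the client-side connection
    proxy_isn_to_client -- ISN the splicer used on the client-side connection
    proxy_isn_to_server -- ISN the splicer used on the server-side connection
    server_isn          -- ISN the server used on the server-side connection
    """
    up = (proxy_isn_to_server - client_isn) % MOD    # client -> server byte offset
    down = (proxy_isn_to_client - server_isn) % MOD  # server -> client byte offset

    def client_to_server(seq, ack):
        # seq moves into the server-side space; ack moves back into the server's own space.
        return (seq + up) % MOD, (ack - down) % MOD

    def server_to_client(seq, ack):
        # seq moves into the client-side space; ack moves back into the client's own space.
        return (seq + down) % MOD, (ack - up) % MOD

    return client_to_server, server_to_client
```

For example, with a client ISN of 1000, splicer ISNs of 5000 (client side) and 2^32 - 10 (server side) and a server ISN of 300, the client's first data segment (seq 1001, ack 5001) leaves toward the server as seq 2^32 - 9, ack 301, which is exactly what the server expects after its own handshake.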
TCP splicing has been commonly used to increase the performance of serving web content through proxies. Web server architectures built using TCP splicing suffer from two limitations: all traffic between clients and servers typically passes through the proxy, making the proxy a scalability and performance bottleneck; and the architecture cannot tolerate proxy failures.
The CSM provides support for fragmented TCP packets. The TCP fragment feature only works with VIPs that have Level 4 policies defined and will not work for SYN packets or for Layer 7 policies. To support fragmented TCP packets, the CSM matches the TCP fragments to existing data flows or by matching the bridging VLAN ID. The CSM will not reassemble fragments for Layer 7 parsing. Because the CSM has a finite number of buffers and fragment ID buckets, packet resending is required when there are hash collisions.
When enabling TCP splicing, you must designate a virtual server as a Layer 7 device even when it does not have a Layer 7 policy. This option is only valid for the TCP protocol.
To configure TCP splicing, perform this task:
Step 1 Router(config-module-csm)# vserver virtserver-name
Purpose
Identifies the virtual server and enters the virtual server configuration mode.
Step 2 Router(config-slb-vserver)# vserver tcp-protect
Purpose
Designates the virtual server for TCP splicing.
Step 3 Router(config-slb-vserver)# virtual 100.100.100.100 tcp any service tcp-termination
Purpose
Enables TCP splicing.
Kindly see the reference for this as follows:
http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/csm/3.2/configuration/guide/mapolcy.html#wp1038073
Sachin Garg

ACE TCP probe

My customers ask for TCP probes on different ports for different applications. Is there such a thing as a standard probe TCL script, so that each time I just need to work on the standard TCL script and apply it to the serverfarm? That way I can avoid a long probe configuration for each port.
Thank you in advance,
June Hu

Could the solution be to configure the probe to terminate the TCP connection by sending a RST, using the connection term command?
It seems that this makes the probe pass the health check.
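The two termination styles map to the connection termination field in the `show probe detail` output earlier (`conn termination : GRACEFUL`): a graceful close sends a FIN, while a forced close aborts the connection with a RST. On a host, the usual way to get the RST behaviour is to set SO_LINGER with a zero linger time before closing. The Python sketch below shows that generic socket technique; tcp_probe_rst is a hypothetical name, and the "ii" layout of struct linger is an assumption that holds on Linux and BSD/macOS but not on Windows.

```python
import socket
import struct


def tcp_probe_rst(host, port, open_timeout=5.0):
    """Connect, then abort with a RST instead of a FIN (SO_LINGER on, linger time 0)."""
    try:
        sock = socket.create_connection((host, port), timeout=open_timeout)
    except OSError:
        return False  # handshake did not complete: probe fails
    # struct linger {int l_onoff; int l_linger;} as laid out on Linux and BSD/macOS.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()  # with a zero linger time the kernel resets the connection
    return True
```

Capturing this against a server on both settings is a quick way to see which close behaviour the application tolerates.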
Br
Geir

TCP Probe on ACE 4710

Hi,
I am trying to configure probes on an ACE device, and I have a few questions:
1. I want to probe 10 different TCP ports for a serverfarm. Is there any way I can give a port range in the probe? If not, and I have to probe each port individually and then configure the probes in the serverfarm, how would it behave? I want the probe to fail only when all the configured ports have failed.
2. I am trying to configure a probe for a particular TCP port, but I suspect the server is not sending a RST on that port, so the probe is failing. However, if I telnet to that port from any other location, it connects. How can I configure the probe for that port in this case?
Please advise.
Thanks
Pawan

You will need to configure a probe for each port.
Add all the probes to the serverfarm.
Use the command "fail-on-all" under the serverfarm.
http://www.cisco.com/en/US/partner/docs/app_ntwk_services/data_center_app_services/ace_appliances/vA3_1_0/command/reference/servfarm.html#wp1106543
Gilles.
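The effect of fail-on-all can be expressed as a one-line decision rule. The Python sketch below assumes, as the question implies, that by default a single failing probe takes the rserver out, while with fail-on-all every probe must fail first; rserver_failed is a hypothetical name, and the rule should be confirmed against the serverfarm command reference linked above.

```python
def rserver_failed(probe_results, fail_on_all=False):
    """probe_results maps probe name -> True if that probe passed.

    Returns True if the rserver should be marked failed under the assumed rule:
    any failing probe by default, every probe failing with fail-on-all.
    """
    if not probe_results:
        return False  # no probes attached: nothing to fail on
    failures = [not passed for passed in probe_results.values()]
    return all(failures) if fail_on_all else any(failures)
```

For the ten-port case in the question, one probe per port feeds probe_results, and only the fail_on_all=True variant keeps the rserver in service while at least one port still answers.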

ACE Module - HTTP Probe failure

Hi,
I have configured an HTTP probe with expect status 200 202, but the probe fails even though the port is available on the rserver.
I tried the HEAD/GET methods to see the return code, and it came back with HTTP/1.1 302. How can I configure an HTTP probe to treat an HTTP 302 code as a successful return?
Thanks.

I changed the expect status value as below:
probe http TEST-HTTP
interval 30
passdetect interval 10
request method head
expect status 302 302
The probe is still failing with the log message
Apr 20 2009 12:04:35 : %ACE-3-251010: Health probe failed for server 192.168.1.10 on port 80, received invalid status code
In 'show probe detail', the last status code is shown as 400, which means Bad Request:
--------------------- probe results --------------------
probe association probed-address probes failed passed health
------------------- ---------------+----------+----------+----------+-------
serverfarm : TEST-APP
real : TEST-SERVER1[80]
192.168.1.10 27 27 0 FAILED
Socket state : CLOSED
No. Passed states : 0 No. Failed states : 1
No. Probes skipped : 0 Last status code : 400
No. Out of Sockets : 0 No. Internal error: 0
Last disconnect err : Received invalid status code
Last probe time : Mon Apr 20 12:05:33 2009
Last fail time : Mon Apr 20 12:00:53 2009
Last active time : Never
The HTTP page displays perfectly in a web browser. Also, using the HTTP HEAD/GET tool, I can see that 302 is returned.
What could be the problem?
Regards.
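The expect status check compares the code the probe itself received against the configured min/max range, so with `expect status 302 302` the 400 recorded in `show probe detail` fails regardless of what a browser sees. The minimal Python sketch below shows that range check on a raw status line; status_matches is a hypothetical name. Since the browser and the probe get different codes, one thing worth comparing is the exact request each one sends (method, URL and headers).

```python
import re

# Status line of an HTTP response, e.g. "HTTP/1.1 302 Found".
STATUS_LINE = re.compile(r"^HTTP/(\d)\.(\d) (\d{3})")


def status_matches(status_line, expect_ranges):
    """expect_ranges: list of (min, max) pairs, like 'expect status 302 302'."""
    match = STATUS_LINE.match(status_line)
    if not match:
        return False  # not a parseable status line
    code = int(match.group(3))
    return any(low <= code <= high for low, high in expect_ranges)
```

For instance, "HTTP/1.1 302 Found" passes a (302, 302) range but fails the original (200, 202) range, and "HTTP/1.1 400 Bad Request" fails both.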

ACE 4700 ssl probe failure

We recently updated one of our servers to a new SSL certificate with a 4096-bit key. Now the ACE probe to that server fails.
We have SSL version set to any and SSL cipher set to any. Is there a problem with the ACE HTTPS probe not supporting keys longer than 1024 bits?

Hello DLance,
The ACE supports SSL certificates up to 2048 bits.
The 2048-bit limit is mentioned in the following guide:
http://www.cisco.com/en/US/partner/docs/app_ntwk_services/data_center_app_services/ace_appliances/vA4_1_0/configuration/ssl/guide/certkeys.html
HTH. Regards.
