Function of health probe timers on CSM

Hi,
We use the following configuration on a CSM to monitor a server farm, and I'm wondering how exactly the probe timers work.
===
serverfarm sf
nat server
nat client natpool1
failaction purge
real name serv1
weight 1
inservice
real name serv2
weight 1
inservice
probe probe1
probe probe1 script
script LDAP_PROBE
interval 5
retries 2
receive 1
port 389
===
As I understand it, probes are sent every 5 seconds. If a probe is not answered within one second, it is marked as failed, and after two failed probes (retries 2) the real server is marked down.
Is this correct?
In a network trace I see a different behaviour: Probes are sent every 5 seconds. If a real server goes out-of-service I see a probe which is not answered and the next probe is sent after 10 seconds (I expected 5 seconds). 5 seconds later the real server is marked down in the switch log.
I would appreciate any help.
Best Regards,
Thorsten Steffen

Hi,
Here is the meaning of the parameters:
Router(config-slb-probe)#
interval seconds
Sets the interval between probes in seconds (from the end of the previous probe to the beginning of the next probe) when the server is healthy.
Range = 2-65535 seconds
Default = 120 seconds
Router(config-slb-probe)#
retries retry-count
Sets the number of failed probes that are allowed before marking the server as failed.
Range = 0-65535
Default = 3
Router(config-slb-probe)#
failed failed-interval
Sets the time between health checks when the server has been marked as failed. The time is in seconds.
Range = 2-65535
Default = 300 seconds
Router(config-slb-probe)#
open open-timeout
Sets the maximum time to wait for a TCP connection. This command is not used for any non-TCP health checks (ICMP or DNS).
Range = 1-65535
Default = 10 seconds
There are two different timeout values: open and receive. The open timeout specifies how many seconds to wait for the connection to open (that is, how many seconds to wait for the SYN ACK after sending the SYN). The receive timeout specifies how many seconds to wait for data to be received (that is, how many seconds to wait for an HTTP reply after sending a GET/HEAD request). Because TCP probes close as soon as they open, without sending any data, the receive timeout is not used for them.
When sniffing, you should see a probe every 5 seconds. When a probe fails for the first time, a second probe should be sent after 5 seconds. If this probe also fails, the server is taken out of service.
That is the behaviour you should see.
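The timing question can be checked with a small model. The sketch below is a hypothetical Python model, not CSM code. It assumes the server fails just before a probe starts, that `interval` is counted from the end of one probe to the start of the next (as in the interval definition above), that a failed probe lasts `fail_duration` seconds until its timeout fires, and that the server is marked down after `retries` consecutive failures (Dario's reading of `retries`).

```python
def probe_timeline(interval, retries, fail_duration):
    """Model when failed probes start and when the real server is marked down.

    Hypothetical model (not CSM source code):
    - the server fails just before the first probe, at t = 0
    - each failed probe lasts fail_duration seconds (until its timeout fires)
    - the next probe starts `interval` seconds after the previous one ENDS
    - the server is marked down after `retries` consecutive failed probes
    """
    if retries < 1:
        raise ValueError("this model needs retries >= 1")
    starts = []
    t = 0.0
    marked_down = 0.0
    for _ in range(retries):
        starts.append(t)
        t += fail_duration      # probe waits until its timeout expires
        marked_down = t         # end of the most recent failed probe
        t += interval           # interval is measured from the end of a probe
    return starts, marked_down


# A failed probe that times out after 1 s versus one that takes 5 s to fail
print(probe_timeline(interval=5, retries=2, fail_duration=1))  # ([0.0, 6.0], 7.0)
print(probe_timeline(interval=5, retries=2, fail_duration=5))  # ([0.0, 10.0], 15.0)
```

With a 1-second failure the model predicts probes at 0 and 6 seconds. If the failing probe instead takes about 5 seconds to time out, it predicts a 10-second gap and a down state 5 seconds after the second probe, which matches the trace in the question. That is only a hypothesis; the model does not show where such a 5-second delay would come from.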
HTH,
Dario

Similar Messages

SSLM Health Probe?

I have two 6509s, each with a CSM and an SSLM. One CSM is active, and both SSLMs are active. I load balance encrypted requests to the SSLMs.
The SSLM decrypts the incoming HTTPS requests and sends the request back to the CSM using HTTP (clear text). The CSM serverfarm then load balances the session to one of the web servers. Because the web server responds in clear text, I have implemented a health probe within the serverfarm that monitors the web page for a specific string of characters. If a web server displays the page incorrectly, the probe fails for that server.
Now I have a new requirement, where I must re-encrypt the traffic (backend encryption) and send the requests to the server encrypted (HTTPS).
My questions are:
1. Can I implement health probes on the SSLM?
2. Can I implement an effective health probe from the CSM so that I can still poll for a string of characters?
Thank you.
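The string check described in the question can be sketched as a pure function over a raw HTTP response. This is a hypothetical Python illustration of what a content-matching probe evaluates, not the CSM's own probe logic; the function name, the default status range, and the sample responses are assumptions.

```python
def content_probe_passes(raw_response: bytes, expected: bytes,
                         status_range=(200, 299)) -> bool:
    """Return True if the status code is in range and the body contains `expected`.

    Hypothetical check; a real probe would first fetch raw_response over HTTP.
    """
    head, sep, body = raw_response.partition(b"\r\n\r\n")
    if not sep:
        return False                          # no end of headers: treat as failure
    status_line = head.split(b"\r\n", 1)[0]   # e.g. b"HTTP/1.1 200 OK"
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        return False                          # malformed status line
    code = int(parts[1])
    low, high = status_range
    return low <= code <= high and expected in body


ok = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Welcome</html>"
broken = b"HTTP/1.1 200 OK\r\n\r\n<html>Error</html>"
print(content_probe_passes(ok, b"Welcome"))      # True
print(content_probe_passes(broken, b"Welcome"))  # False
```

A status of 200 alone is not enough here: the page must also contain the expected string, which is what lets such a probe catch a server that returns a broken page.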

SSLM should only be probed with ICMP

CSM 4.2(5): Recurring failed health probes

Hi all
I've finally started to investigate an issue I have with our CSM setup. Several times a day I get the below syslog message from the 6500
10:49:11: %CSM_SLB-6-RSERVERSTATE: Module 4 server state changed: SLB-NETMGT: TCP health probe failed for server
Then, 30 seconds later:
10:49:41: %CSM_SLB-6-RSERVERSTATE: Module 4 server state changed: SLB-NETMGT: TCP health probe re-activated server
I never seem to catch the event in action, so I cannot verify whether the real server has actually failed or whether this is only a probe timeout. I have both layer 2 and layer 3 server farms in operation, and this problem occurs on all of my server farms a few times a day.
There is no pattern, and I have no other indication of any problems. Most of my probes are set to 1 retry and a 30-second timeout. Should I perhaps increase the probe timeouts?
Regards
Fredrik

Those error messages are related to probing the CSM does when determining server health. For a TCP probe, this means that the CSM either gets a TCP RST from the server or it does not see a SYN-ACK coming from the server.
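The two failure modes can be told apart from a host with a small sketch. This is a hypothetical Python check, not the CSM's implementation: it assumes a refused connection corresponds to the TCP RST case and an expired connect timeout to the missing SYN-ACK case, and `open_timeout` plays the role of the probe's `open` setting.

```python
import socket


def tcp_probe(host: str, port: int, open_timeout: float) -> str:
    """Open a TCP connection and close it right away, like a plain TCP probe.

    Returns "ok" if the handshake completes, "rst" if the server refuses the
    connection (TCP RST), "timeout" if no SYN-ACK arrives within open_timeout,
    or "error" for any other socket error (for example, no route to host).
    """
    try:
        with socket.create_connection((host, port), timeout=open_timeout):
            return "ok"            # handshake done; the with-block closes it
    except ConnectionRefusedError:
        return "rst"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Running such a check in a loop from a host on the same VLAN as the CSM could help show whether a flapping server really refuses or ignores connections, or whether only the CSM's probes are affected.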

CSM health probe for server farm with multiple vservers

Is there a way to specify the vserver port that a health probe monitors when multiple vservers are configured for the same serverfarm? Let's say I have a serverfarm named farm1. farm1 services two ports, www and https, so two vservers, vserver_www and vserver_https, are configured and bound to farm1. I would like to enable an HTTP health probe on farm1 with the intention of monitoring only the vserver_www HTTP port. Instead, the health probe monitors both www and https, and since an HTTP probe on https fails, it takes the farm1 reals and both vservers, vserver_www and vserver_https, out of service. Is there a way to configure a health probe to monitor a specific port? Or should I create two duplicate serverfarms, farm1 bound to vserver_www and farm2 bound to vserver_https, and only enable the HTTP health probe on farm1? Any other ideas are welcome.

I appreciate the feedback. I also found what I was looking for in the configuration examples. To summarize, I've borrowed the comment from the URL below:
# The port for the probe is inherited from the vservers.
# The port is necessary in this case, since the same farm
# is serving a vserver on port 80 and one on port 23.
# If the "port 80" parameter is removed, the HTTP probe
# will be sent out on both ports 80 and 23, thus failing
# on port 23 which does not serve HTTP requests.
http://www.cisco.com/univercd/cc/td/doc/product/lan/cat6000/mod_icn/csm/csm_4_2/config/cfgxpls.htm
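The inheritance rule in that comment can be summarised as a small function. This is a hypothetical Python sketch of the rule as quoted, not CSM code; the function name and the 10.1.1.10 address are made up for illustration.

```python
def probe_targets(real_ip, vserver_ports, probe_port=None):
    """Return the (ip, port) pairs a probe would be sent to for one real server.

    Hypothetical model of the behaviour quoted above: with no explicit port the
    probe inherits every vserver port bound to the farm; with one set, only
    that port is probed.
    """
    if probe_port is not None:
        return [(real_ip, probe_port)]
    return [(real_ip, port) for port in sorted(set(vserver_ports))]


# farm1 is bound to vserver_www (80) and vserver_https (443)
print(probe_targets("10.1.1.10", [80, 443]))      # [('10.1.1.10', 80), ('10.1.1.10', 443)]
print(probe_targets("10.1.1.10", [80, 443], 80))  # [('10.1.1.10', 80)]
```

In farm1's case, setting `port 80` on the HTTP probe keeps it off port 443, which is why a second, duplicate serverfarm is not needed.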

CSM HTTPS or SSL Health Probe

We are currently using a TCP probe for HTTPS web server health checking. Is there an HTTPS or SSL probe available on the CSM that sends a URL to detect whether the HTTPS Apache web server is up?
Many Thx, Q.Xie

You can download the TCL script file from the same location as the CSM software.
In this TCL file you should find the following scripts:
[root@linux-1 cisco]# cat /tftpboot/c6slb-apc.4-2-1.tcl | grep -i "name ="
#!name = CHECKPORT_STD_SCRIPT
#!name = ECHO_PROBE_SCRIPT
#!name = FINGER_PROBE_SCRIPT
#!name = FTP_PROBE_SCRIPT
#!name = HTTPCONTENT_PROBE
#!name = HTTPHEADER_PROBE
#!name = HTTPPROXY_PROBE
#!name = HTTP_PROBE_SCRIPT
#!name = IMAP_PROBE
#!name = LDAP_PROBE
#!name = MAIL_PROBE
#!name = POP3_PROBE
#!name = PROBENOTICE_PROBE
#!name = RTSP_PROBE
#!name = SSL_PROBE_SCRIPT
#!name = TFTP_PROBE
There is an SSL_PROBE_SCRIPT that verifies that the SSL server responds to a client SSL HELLO message.
It does not verify whether you can send an HTTP request.
It only sends a HELLO as a client and waits for the server HELLO.
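For comparison, a host-side check along similar lines can be sketched with Python's standard `ssl` module. It is not the same as SSL_PROBE_SCRIPT: instead of stopping at the server HELLO, `wrap_socket` completes the full TLS handshake, so it is a slightly stronger check. Like the script, it sends no HTTP request. Skipping certificate validation is an assumption made so that self-signed server certificates do not fail the check.

```python
import socket
import ssl


def ssl_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host:port completes within timeout.

    Certificates are not validated: this only checks that something answers
    as an SSL/TLS server. No HTTP request is sent.
    """
    context = ssl.create_default_context()
    context.check_hostname = False          # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True                 # wrap_socket performs the handshake
    except OSError:                         # also covers ssl.SSLError and timeouts
        return False
```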
With the SSLM for the CSM, there might be a way to achieve an HTTPS probe.
I have never tried it, but the solution I see would be to create an HTTP probe on the CSM and direct it to the SSLM, which would do the encryption and forward it to the server.
Regards,
Gilles

CSM Health Probe source IP

Can anyone tell me what IP address health probes are sourced from on the CSM? I've got a simple ICMP health probe set up, but I'm trying to figure out what the source of those probes will be.
Is it the VLAN IP, the VIP, or possibly the router interface IP?
Thanks,
Bob

It is the VLAN IP address.
Gilles.

Multiple health probes on CSM

We have a CSM blade in a 6509, IOS 12.2(18)SXF7, CSM software version 4.2(7).
We'd like to create a serverfarm, where servers are checked for several ports and only considered as working when all probes succeed.
Although Cisco docs state that there should be a possibility to associate multiple probes with a serverfarm, I haven't managed to do so.
Here's what I've tried:
probe PING icmp
interval 5
failed 10
receive 4
probe TCP-1234 tcp
interval 10
retries 2
failed 25
port 1234
real PROBE-TEST-R
address 1.2.3.4
serverfarm PROBE-TEST-SF
real name PROBE-TEST-R
health probe PING
health probe TCP-1234
but when trying to add the second probe, I get:
% You must first disassociate from probe PING.
Any ideas how multiple probes could be implemented?

Configure them with the probe command under the serverfarm, not with health probe:
serverfarm PROBE-TEST-SF
probe PING
probe TCP-1234
Gilles.
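The requirement in the question (a server counts as working only when all of its probes succeed) amounts to a logical AND over the probe results. A hypothetical Python sketch; the state names are borrowed from the `sh mod csm ... real` output elsewhere on this page, and whether the CSM combines probe results exactly this way is an assumption taken from the question.

```python
def server_state(probe_results: dict) -> str:
    """Combine several probe results (probe name -> passed?) for one real server.

    Hypothetical illustration: the server is OPERATIONAL only if every probe
    passed. In this model an empty dict (no probes attached) also yields
    OPERATIONAL, because all() of an empty collection is True.
    """
    return "OPERATIONAL" if all(probe_results.values()) else "PROBE_FAILED"


print(server_state({"PING": True, "TCP-1234": True}))   # OPERATIONAL
print(server_state({"PING": True, "TCP-1234": False}))  # PROBE_FAILED
```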

CSM HTTP Health Probe

Is there any way to configure an HTTP health probe that will test a web page and fail if the server takes too long to respond? I have attempted to do this (see below), but the "receive" parameter doesn't seem to help. We are currently having a problem where one of the web servers, for whatever reason, gets really slow while the other works fine with about the same number of users. I'd like to fail the slow one when this occurs.
Here is my probe config:
probe HTTP-SERVERASP http
request method get url /server.asp
expect status 200 299
interval 5
failed 30
receive 5
Thanks...Jeff

Jeff,
receive seems to be the solution for what you need.
Did you verify how fast or slow the server is responding?
Currently you allow 5 seconds for the response to come back, and 3 consecutive probes must fail before the server is brought down, so if your server responds fast enough even once in a while, it stays up.
So, use a sniffer trace to verify the response time.
Send me the trace if you want.
Gilles.
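Gilles' point about consecutive failures can be made concrete with a small model. This is a hypothetical Python sketch, not CSM code; it assumes the default of three consecutive failures he mentions and that a reply arriving after the `receive` timeout counts as a failure. The response times are invented for illustration.

```python
def down_after(response_times, receive_timeout, retries=3):
    """Return the index of the probe that marks the server down, or None.

    Hypothetical model: a probe fails when the reply takes longer than
    receive_timeout (None means no reply at all); the server is marked down
    only after `retries` consecutive failures, and any probe answered in time
    resets the count.
    """
    consecutive = 0
    for i, rt in enumerate(response_times):
        if rt is None or rt > receive_timeout:
            consecutive += 1
            if consecutive >= retries:
                return i
        else:
            consecutive = 0          # one fast enough reply keeps the server up
    return None


slow_but_flapping = [6.2, 7.0, 4.1, 6.5, 8.0, 3.9]   # seconds, made-up values
always_slow = [6.2, 7.0, 6.5]
print(down_after(slow_but_flapping, receive_timeout=5))  # None
print(down_after(always_slow, receive_timeout=5))        # 2
```

This is why a server that is slow most of the time can stay in service: a single reply inside the 5-second window starts the count again.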

ACE: probe timers

Hi,
I have a general question about ACE probe timers. I have the following probe setup:
probe https probe:1061
port 1061
interval 34
passdetect interval 17
open 1
ACE# sh probe probe:1061 detail
probe       : probe:1061
type        : HTTPS
state       : ACTIVE
description :
   port      : 1061   address     : 0.0.0.0         addr type : -
   interval : 34      pass intvl : 17              pass count : 3
   fail count: 3       recv timeout: 10
===
For the above probe, when will the ACE declare the server down? Will it declare it down after 85 seconds (17*3 + 34), or after 115 seconds (adding the 10-second receive timeout three times, i.e. 30 seconds)?
Please help.
========
We did a test and brought the server down manually. The ACE declared the server down after 91 seconds (measured from the time the server was brought down).

Hi Gavin, Krishna,
The explanation for all these parameters can be found in the health monitoring section of the configuration guide (http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/vA2_3_0/configuration/slb/guide/probe.html#wp1031040).
Below are the definitions quoted from the guide:
Interval:
The time interval between probes is the frequency that the ACE sends probes to a server marked as passed. You can change the time interval between probes by using the interval command
Faildetect:
Before the ACE marks a server as failed, it must detect that probes have failed a consecutive number of times. By default, when three consecutive probes have failed, the ACE marks the server as failed. You can configure this number of failed probes by using the faildetect command
Passdetect interval/count:
To configure the time interval after which the ACE sends a probe to a failed server and the number of consecutive successful probes required to mark the server as passed, use the passdetect command.
So, to summarize, taking Gavin's configuration as an example: a server failure would be detected between 78 seconds (2x34 + 10) and 112 seconds (3x34 + 10). Once it's down, it will become operational between 34 (2x17) and 51 (3x17) seconds after it comes back up.
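Daniel's arithmetic generalises to any values shown by `show probe ... detail`. The sketch below simply encodes his formulas in Python; the function name is hypothetical, and the formulas are his summary rather than documented ACE internals (the spread of one interval presumably reflects where in the probe cycle the failure happens).

```python
def ace_windows(interval, fail_count, recv_timeout, pass_interval, pass_count):
    """Return (detect_min, detect_max) and (recover_min, recover_max) in seconds.

    Encodes the formulas from the reply above:
    detection: (fail_count - 1) * interval + recv_timeout
               up to fail_count * interval + recv_timeout
    recovery:  (pass_count - 1) * pass_interval up to pass_count * pass_interval
    """
    detect = ((fail_count - 1) * interval + recv_timeout,
              fail_count * interval + recv_timeout)
    recover = ((pass_count - 1) * pass_interval,
               pass_count * pass_interval)
    return detect, recover


# Values from "sh probe probe:1061 detail" above
detect, recover = ace_windows(interval=34, fail_count=3, recv_timeout=10,
                              pass_interval=17, pass_count=3)
print(detect, recover)               # (78, 112) (34, 51)
print(detect[0] <= 91 <= detect[1])  # True
```

The 91 seconds measured in the test falls inside the 78 to 112 second detection window.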
I hope this helps
Daniel

TCP Probe failure on CSM

I have had a customer raise an issue with me. Unfortunately I am not too hot on the CSM.
Switch Type - Cisco WS-C6513
IOS VSN - 12.2(18)SXF4
CSM Module details
Card Type - SLB Application Processor Module
Model - WS-X6066-SLB-APC
Hardware VSN - 1.8
Software VSN - 4.2(5)
I have spotted bug CSCsc38892, but cannot tell whether the Catalyst is running VRF; this is all the information I have been given.
We have two 6500 chassis, each with a CSM module, in fault-tolerance (ft) mode.
The current standby module (at the time it was the active one) reports, as per the output below, that its probes failed:
sh mod csm 1 real sf SAPCCP
real server farm weight state conns/hits
10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
A probe failure could have a number of causes; however, the inconsistency is that the failed probes follow the CSM module, while the other module has working probes.
It should be noted that the command "ping module csm 1 10.xxx.xxx.xxx" reports the server as reachable, which is a good indicator that there is connectivity from the CSM to the server.
The following is the output from the currently active CSM (which used to be the standby one):
sh mod csm 1 real sf SAPCCP
real server farm weight state conns/hits
10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 51
10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 41
10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 52
For info:
FUNCTIONING CSM
sh mod 1
Mod Ports Card Type Model Serial No.
1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094803UC
Mod MAC addresses Hw Fw Sw Status
1 0015.f998.a94a to 0015.f998.a951 1.8 4.2(5) Ok
Mod Online Diag Status
1 Pass
relevant config lines
serverfarm SAPCCP
nat server
no nat client
failaction purge
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
probe ABAP
vserver SAPCCP-VIP
virtual 10.xxx.xxx.xxx tcp www
vlan xxx
serverfarm SAPCCP
sticky 15 group 117
replicate csrp sticky
replicate csrp connection
no persistent rebalance
parse-length 4000
inservice
probe ABAP tcp
interval 2
retries 2
failed 6
open 3
port 8000

Continued:
NON-FUNCTIONING CSM
sh mod 1
Mod Ports Card Type Model Serial No.
1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094609ZZ
Mod MAC addresses Hw Fw Sw Status
1 0015.f998.8386 to 0015.f998.838d 1.8 4.2(5) Ok
Mod Online Diag Status
1 Pass
relevant config lines
serverfarm SAPCCP
nat server
no nat client
failaction purge
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
real 10.xxx.xxx.xxx 8000
inservice
probe ABAP-CCP
vserver SAPCCP-VIP
virtual 10.xxx.xxx.xxx tcp www
vlan xxx
serverfarm SAPCCP
sticky 15 group 117
replicate csrp sticky
replicate csrp connection
no persistent rebalance
parse-length 4000
inservice
probe ABAP-CCP tcp
recover 2
interval 6
retries 2
failed 6
open 5
port 8000
Thanks for any pointers on where to look,
Paul.

ACE http health probes - best practice for interval and passdetect interval?

Hi,
Is there a recommended standard for HTTP health probes in terms of interval and passdetect interval timings, i.e. should the passdetect interval always be less than the interval, or vice versa? Can an HTTP probe be 'misconfigured', i.e. return a 'false positive', by configuring an interval timeout that's 'incompatible' with the device it's polling?
I have an HTTP probe for a serverfarm consisting of two Apache HTTP servers and get intermittent 'server reply timeout' probe failures. I'm keen to ensure that the configuration of the probe isn't at fault, so that I can be confident that a failed probe indicates a problem with the server and not with my configuration.
The probe is currently configured as below:-
probe http http-apache
interval 30
passdetect interval 15
passdetect count 6
request method get url /cs/images/ACE.html
expect status 200 304
Any advice on the subject would be gratefully received.
thanks
Matthew

Hi Gilles,
Thanks for the advice. In another discussion (found here https://supportforums.cisco.com/message/462397#462397) a poster has stated that:
"(The) "Probe interval" should always be less than (open+receive) timeout value. Default open & receive timeouts are 10 seconds."
Are you able to advise whether the above is correct and, if so, why? I currently have an interval value of 30, which obviously goes against the advice above (which I've interpreted to mean that if you leave the open and receive timeouts at their default settings, your probe interval should be less than 20 seconds).
thanks
Matthew

ACE failing server out using TCP health probe

We currently have a mix of ACE20s and ACE30s, and I am seeing the ACE on both hardware platforms sporadically failing out our servers after a successful TCP handshake. Here is the configuration:
probe tcp TCP-25
   port 25
   interval 25
   faildetect 2
   passdetect interval 90
   open 10
When I do a show probe TCP-25 detail I see the default recv timeout is 10.
I captured a trace between the ACE and the server. When the health probes pass, I see a good three-way TCP handshake; then, 50 ms later, the server sends an SMTP 220, followed by an ACK from the ACE, a FIN ACK from the ACE, and a graceful TCP termination. When the probe fails, I see a successful TCP handshake, but the ACE sends a FIN ACK 47 ms after it sends the ACK for the TCP connection. The server then sends an ACK, and the ACE sends an RST.
Shouldn't ACE wait 10 seconds in this example for server to respond after TCP handshake?

TAC/Martin Nash was very helpful in explaining this. The TCP three-way handshake was successful, and the ACE sent a FIN ACK as expected, but after the server sent an ACK, it did not send a FIN ACK, so the ACE marked it down. The health check requires not only a three-way handshake but also a clean teardown of the TCP session.

Configuring Health Probe for Server Farm

If I have a server farm with real servers listening on port 8888 and I apply an HTTP-type health probe with no port number specified, will the ACE know to probe the servers at 8888 or will it try to probe port 80?

Hi,
Yes, it should inherit the port from the real servers defined in the serverfarm. This gives you the flexibility to associate the same probe with different serverfarms, probing different servers on different ports. This is the probe port inheritance feature of the ACE.
Regards,
Kanwal

Health probe for RDP farm

I have an RDP server farm that lost a disk. The RDP service was still running but users were unable to log in. I'd like to create a health probe that does maybe a combination of TCP probe for port 3389 and something that can determine if the drive that stores user profiles is available.
I cannot add any new service (http or ftp) to the server.
Can anyone think of another way to do this? Is there any way I can check SNMP mibs on the windows server or maybe WMI through TCL?
Thanks.

Can you drop me an email offline? I can share what I have. Matthew

ACE Health probe using get URL

Hello,
We are trying to create a health probe for our Google Search Appliances. The URL in the GET contains a question mark, but the ACE doesn't like that. Is there a way around this, or should it be done differently?
request method get url /searchq? (This is what we want the URL to be)
request method get url /searchq (This is where it thinks I'm asking it for help)
Thanks in Advance.

Hello,
You need to type CTRL+V before entering the ?.
That's the Control key, then lowercase v, then your question mark.
Hope this helps,
Sean
