Function of health probe timers on CSM

Hi,
we use the following configuration on a CSM to monitor a server farm, and I'm wondering how exactly the probe timers work.
===
serverfarm sf
nat server
nat client natpool1
failaction purge
real name serv1
weight 1
inservice
real name serv2
weight 1
inservice
probe probe1
probe probe1 script
script LDAP_PROBE
interval 5
retries 2
receive 1
port 389
===
As I understand it, the probes are sent every 5 seconds. When a probe isn't answered within one second, it is marked as failed. If two probes fail (retries 2), the real server is marked as down.
Is this correct?
In a network trace I see a different behaviour: probes are sent every 5 seconds. If a real server goes out of service, I see a probe which is not answered, and the next probe is sent after 10 seconds (I expected 5 seconds). 5 seconds later the real server is marked down in the switch log.
It would be great if anybody could help me.
Best Regards,
Thorsten Steffen

Hi,
here is the meaning of the parameters:
Router(config-slb-probe)#
interval seconds
Sets the interval between probes in seconds (from the end of the previous probe to the beginning of the next probe) when the server is healthy.
Range = 2-65535 seconds
Default = 120 seconds
Router(config-slb-probe)#
retries retry-count
Sets the number of failed probes that are allowed before marking the server as failed.
Range = 0-65535
Default = 3
Router(config-slb-probe)#
failed failed-interval
Sets the time between health checks when the server has been marked as failed. The time is in seconds.
Range = 2-65535
Default = 300 seconds
Router(config-slb-probe)#
open open-timeout
Sets the maximum time to wait for a TCP connection. This command is not used for any non-TCP health checks (ICMP or DNS).
Range = 1-65535
Default = 10 seconds
There are two different timeout values: open and receive. The open timeout specifies how many seconds to wait for the connection to open (that is, how many seconds to wait for SYN ACK after sending SYN). The receive timeout specifies how many seconds to wait for data to be received (that is, how many seconds to wait for an HTTP reply after sending a GET/HEAD request). Because TCP probes close as soon as they open without sending any data, the receive timeout is not used.
When sniffing, you should see a probe every 5 seconds. When a probe fails for the first time, a second probe should be sent after 5 seconds. When this probe fails too, the server is put out of service.
That should be the behaviour you should see.
HTH,
Dario
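
The timeline Dario describes can be sketched in a few lines of Python. This is only a simplified model, not CSM code: it assumes every failed probe uses up the full receive timeout, that the next probe starts `interval` seconds after the previous one ended (as in the `interval` definition above), and that the server is marked failed once `retries` probes have failed. The function name is made up for illustration, and the model does not explain the 10-second gap seen in the trace.

```python
def csm_failure_timeline(interval, retries, receive):
    """Model the documented probe timing after a server stops answering.

    Assumptions (not taken from CSM internals): the first failing probe
    starts at t=0, each failed probe lasts `receive` seconds, and the
    server is marked failed when `retries` probes have failed.
    Returns (probe_start_times, time_marked_failed) in seconds.
    """
    if retries < 1:
        raise ValueError("this sketch only models retries >= 1")
    starts = []
    t = 0
    marked_failed_at = None
    for _ in range(retries):
        starts.append(t)          # probe is sent
        t += receive              # probe times out after `receive` seconds
        marked_failed_at = t      # time at which this probe counts as failed
        t += interval             # wait before the next probe starts
    return starts, marked_failed_at


# Values from the configuration in the question: interval 5, retries 2, receive 1
starts, down_at = csm_failure_timeline(interval=5, retries=2, receive=1)
print(starts, down_at)
```

Under these assumptions the second probe goes out 6 seconds after the first and the server is marked down 7 seconds after the first unanswered probe, which is what you could compare against the sniffer trace.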

Similar Messages

  • SSLM Health Probe?

    I have two 6509s, each with a CSM and SSLM. One CSM is active and both SSLMs are active. I load balance encrypted requests to the SSLMs.
    The SSLM decrypts the incoming HTTPS requests and sends the request back to the CSM using HTTP (clear text). The CSM serverfarm then load balances the session to one of the web servers. Because the web server responds back in clear text, I have implemented a health probe to monitor the web page for a specific string of characters within the serverfarm. If a web page displays the page incorrectly, the probe fails for that server.
    Now I have a new requirement, where I must re-encrypt the traffic (backend encryption) and send the requests to the server encrypted (HTTPS).
    My questions are:
    1. Can I implement health probes on the SSLM?
    2. Can I implement an effective health probe from the CSM so that I can still poll for a string of characters?
    Thank you.

    SSLM should only be probed with ICMP

  • CSM 4.2(5): Recurring failed health probes

    Hi all
    I've finally started to investigate an issue I have with our CSM setup. Several times a day I get the below syslog message from the 6500
    10:49:11: %CSM_SLB-6-RSERVERSTATE: Module 4 server state changed: SLB-NETMGT: TCP health probe failed for server
    Then a few seconds later
    10:49:41: %CSM_SLB-6-RSERVERSTATE: Module 4 server state changed: SLB-NETMGT: TCP health probe re-activated server
    I never seem to catch the event in action, so I can never verify whether the real server has indeed failed or whether this is only a probe timeout. I have both layer 2 and layer 3 server farms in operation, and this problem occurs on all of my server farms a few times a day.
    There is no pattern, and I have no other indications of any problems. I have most of the probes set on 1 repeat and 30 sec timeout. Should I perhaps increase the probe timeouts?
    Regards
    Fredrik

    Those messages are related to the probing the CSM does when determining server health. For a TCP probe, a failure means that the CSM either gets a TCP RST from the server or does not see a SYN-ACK coming from the server.

  • CSM health probe for server farm with multiple vservers

    Is there a way to specify the vserver port that a health probe monitors when multiple vservers are configured for the same serverfarm? Let's say I have a serverfarm named farm1. farm1 services two ports www and https so two vservers vserver_www and vserver_https are configured and bound to farm1. I would like to enable http health probe on farm1 with the intention of only monitoring vserver_www http port but, instead, the health probe monitors both www and https and since a http probe on https fails it takes farm1 reals and both vservers vserver_www and vserver_https out-of-service. Is there a way to configure a health probe to monitor a specific port? Or, should I create two duplicate serverfarms farm1 bound to vserver_www and farm2 bound to vserver_https and only enable http health probe on farm1? Any other ideas welcomed.

    Appreciate the feedback. I also found what I was looking for in configuration examples. To summarize I've borrowed the comment from the URL below:
    # The port for the probe is inherited from the vservers.
    # The port is necessary in this case, since the same farm
    # is serving a vserver on port 80 and one on port 23.
    # If the "port 80" parameter is removed, the HTTP probe
    # will be sent out on both ports 80 and 23, thus failing
    # on port 23 which does not serve HTTP requests.
    http://www.cisco.com/univercd/cc/td/doc/product/lan/cat6000/mod_icn/csm/csm_4_2/config/cfgxpls.htm
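
The inheritance rule in the quoted comment can be written down as a tiny Python sketch. It only models the rule as stated in that comment (an explicit probe port wins, otherwise the probe goes to every vserver port bound to the farm); the function name and arguments are hypothetical and nothing here is CSM code.

```python
def probe_target_ports(probe_port, vserver_ports):
    """Return the ports a probe would be sent to under the quoted rule.

    probe_port: port configured on the probe, or None if not configured.
    vserver_ports: ports of the vservers bound to the same serverfarm.
    """
    if probe_port is not None:
        return [probe_port]               # explicit "port" on the probe wins
    return sorted(set(vserver_ports))     # otherwise inherited from the vservers


# Farm serving vservers on ports 80 and 23, as in the quoted example
print(probe_target_ports(None, [80, 23]))  # probe goes to both ports
print(probe_target_ports(80, [80, 23]))    # probe restricted to port 80
```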

  • CSM HTTPS or SSL Health Probe

    We are currently using TCP probe for HTTPS webServer health checking. Is there a HTTPS or SSL probe available on CSM to send a url to detect if the HTTPS Apache WebServer is up or not?
    Many Thx, Q.Xie

    You can download the TCL script file from the same location as the CSM software.
    In this TCL file you should find the following scripts
    [root@linux-1 cisco]# cat /tftpboot/c6slb-apc.4-2-1.tcl | grep -i "name ="
    #!name = CHECKPORT_STD_SCRIPT
    #!name = ECHO_PROBE_SCRIPT
    #!name = FINGER_PROBE_SCRIPT
    #!name = FTP_PROBE_SCRIPT
    #!name = HTTPCONTENT_PROBE
    #!name = HTTPHEADER_PROBE
    #!name = HTTPPROXY_PROBE
    #!name = HTTP_PROBE_SCRIPT
    #!name = IMAP_PROBE
    #!name = LDAP_PROBE
    #!name = MAIL_PROBE
    #!name = POP3_PROBE
    #!name = PROBENOTICE_PROBE
    #!name = RTSP_PROBE
    #!name = SSL_PROBE_SCRIPT
    #!name = TFTP_PROBE
    There is an SSL_PROBE_SCRIPT that verifies that the SSL server responds to a client SSL HELLO message.
    It does not verify whether you can send an HTTP request.
    It only sends a HELLO as a client and waits for the server HELLO.
    With the SSLM for the CSM, there might be a way to achieve HTTPS probe.
    I never tried it, but the solution I see would be to create an HTTP probe on the CSM and direct to the SSLM which will do the encryption and forward it to the server.
    Regards,
    Gilles

  • CSM Health Probe source IP

    Can anyone tell me what IP address health probes are sourced from on the CSM? I've got a simple ICMP health probe setup but I'm trying to figure out what the source of those probes will be.
    Is it the VLAN IP, or maybe the VIP, or possibly the router interface IP?
    Thanks,
    Bob

    this is the vlan ip.
    Gilles.

  • Multiple health probes on CSM

    We have a CSM blade in a 6509, IOS 12.2(18)SXF7, CSM software version 4.2(7).
    We'd like to create a serverfarm, where servers are checked for several ports and only considered as working when all probes succeed.
    Although Cisco docs state that there should be a possibility to associate multiple probes with a serverfarm, I haven't managed to do so.
    Here's what I've tried:
    probe PING icmp
    interval 5
    failed 10
    receive 4
    probe TCP-1234 tcp
    interval 10
    retries 2
    failed 25
    port 1234
    real PROBE-TEST-R
    address 1.2.3.4
    serverfarm PROBE-TEST-SF
    real name PROBE-TEST-R
      health probe PING
      health probe TCP-1234
    but when trying to add the second probe, I get:
    % You must first disassociate from probe PING.
    Any ideas, how multiple probes could be implemented?

    Configure them with probe under the serverfarm, not with health probe under the real.
    serverfarm PROBE-TEST-SF
      probe PING
      probe TCP-1234
    Gilles.

  • CSM HTTP Health Probe

    Is there any way to configure an HTTP health probe that will test a web page and fail if it takes too long for the server to respond? I have attempted to do this (see below), but the "receive" parameter doesn't seem to help. We are currently having a problem where one of the web servers, for whatever reason, gets really slow while the other works fine with about the same number of users. I'd like to fail the slow one when this occurs.
    Here is my probe config:
    probe HTTP-SERVERASP http
    request method get url /server.asp
    expect status 200 299
    interval 5
    failed 30
    receive 5
    Thanks...Jeff

    Jeff,
    receive seems to be the solution for what you need.
    Did you verify how fast or slow the server is responding?
    Currently you allow 5 seconds for the response to come back, and 3 consecutive probes must fail before the server is brought down, so if your server responds fast enough even once, the server stays up.
    So, use a sniffer trace to verify the response time.
    Send me the trace if you want.
    Gilles.
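
To see why one fast reply keeps the server up, here is a small Python sketch of the consecutive-failure counting Gilles describes. It is an illustrative model only: the function name is made up, the response times are invented sample data, and the threshold of 3 is the value mentioned in the reply.

```python
def first_failure_index(response_times, receive=5, fail_threshold=3):
    """Return the index of the probe at which the server would be marked
    failed, or None if it stays up.

    response_times: seconds each probe took to get a reply (None = no reply).
    A probe counts as failed when the reply takes longer than `receive`.
    Any successful probe resets the consecutive-failure counter.
    """
    consecutive = 0
    for i, rt in enumerate(response_times):
        if rt is None or rt > receive:
            consecutive += 1
            if consecutive >= fail_threshold:
                return i
        else:
            consecutive = 0   # one fast reply is enough to start over
    return None


# Slow server that answers quickly once in a while: never marked failed
print(first_failure_index([6, 7, 1, 6, 6]))
# Three slow replies in a row: marked failed at the third probe
print(first_failure_index([6, 7, 8]))
```

Comparing real response times from a sniffer trace against this kind of counter is one way to check whether the probe settings can ever trip for a server that is slow only part of the time.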

  • ACE: probe timers

    Hi,
    I've general question about ACE probe timers. I've following probe setup:
    probe https probe:1061
      port 1061
      interval 34
      passdetect interval 17
      open 1
    ACE# sh probe probe:1061 detail
    probe       : probe:1061
    type        : HTTPS
    state       : ACTIVE
    description :
       port      : 1061   address     : 0.0.0.0         addr type  : -
       interval  : 34      pass intvl  : 17              pass count : 3
       fail count: 3       recv timeout: 10
    ===
    For the above probe: when will the ACE declare the server down? Will it declare it down after 85 seconds (17*3+34), or after 115 seconds (adding the 10-second recv timeout 3 times = 30 seconds)?
    Please help.
    ========
    We did a test and brought down the server manually. The ACE declared the server down after 91 seconds (from the time when the server was brought down).

    Hi Gavin, Krishna,
    The explanation for all these parameters can be found in the health monitoring section of the configuration guide:
    http://www.cisco.com/en/US/docs/interfaces_modules/services_modules/ace/vA2_3_0/configuration/slb/guide/probe.html#wp1031040
    Below are the definitions quoted from the guide:
    Interval:
    The time interval between probes is the frequency that the ACE sends probes to a server marked as passed. You can change the time interval between probes by using the interval command.
    Faildetect:
    Before the ACE marks a server as failed, it must detect that probes have failed a consecutive number of times. By default, when three consecutive probes have failed, the ACE marks the server as failed. You can configure this number of failed probes by using the faildetect command.
    Passdetect interval/count:
    To configure the time interval after which the ACE sends a probe to a failed server and the number of consecutive successful probes required to mark the server as passed, use the passdetect command.
    So, to summarize, taking Gavin's configuration as example. A server failure would be detected in a time between 78 seconds (2x34 +10) and 112 (3x34 +10). Once it's down, it will become operational between 34 (2x17) and 51 (3x17) seconds after it comes back up.
    I hope this helps
    Daniel
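
Daniel's arithmetic can be reproduced with a short Python sketch. The formulas are simply the ones from his summary (the lower bound uses one interval fewer than the fail or pass count); the function names are made up for illustration and this is not how the ACE itself computes anything.

```python
def ace_failure_window(interval, fail_count, recv_timeout):
    """(min, max) seconds until a dead server is marked failed,
    following the summary above: (count-1 or count) intervals
    plus one receive timeout."""
    return ((fail_count - 1) * interval + recv_timeout,
            fail_count * interval + recv_timeout)


def ace_recovery_window(pass_interval, pass_count):
    """(min, max) seconds until a recovered server is marked passed."""
    return ((pass_count - 1) * pass_interval, pass_count * pass_interval)


# Values from the "sh probe" output: interval 34, fail count 3,
# recv timeout 10, pass interval 17, pass count 3
low, high = ace_failure_window(interval=34, fail_count=3, recv_timeout=10)
print(low, high)
print(ace_recovery_window(pass_interval=17, pass_count=3))

# The 91 seconds measured in the test falls inside the failure window
print(low <= 91 <= high)
```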

  • TCP Probe failure on CSM

    I have had a customer raise an issue with me. Unfortunately I am not too hot on the CSM.
    Switch Type - Cisco WS-C6513
    IOS VSN - 12.2(18)SXF4
    CSM Module details
    Card Type - SLB Application Processor Module
    Model - WS-X6066-SLB-APC
    Hardware VSN - 1.8
    Software VSN - 4.2(5)
    I have spotted bug CSCsc38892, but cannot tell if the Catalyst is running VRF. This is all the info I have been given.
    We have 6500 chassis each with a CSM module in fault tolerance (ft) mode.
    The current standby module (at the time it was the active one) reports as per output below that probes failed:
    sh mod csm 1 real sf SAPCCP
    real server farm weight state conns/hits
    10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
    10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
    10.xxx.xxx.xxx:8000 SAPCCP 8 PROBE_FAILED 0
    Probe failed could be for a number of reasons however the inconsistency is that the failed probes follow the CSM module and that the other module has working probes.
    It is to be noted that the command “ping module csm 1 10.xxx.xxx.xxx” reports to be reachable as a good indicator that there is connectivity from the CSM to the server.
    The following is the output from the currently active CSM (which used to be the standby one):
    sh mod csm 1 real sf SAPCCP
    real server farm weight state conns/hits
    10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 51
    10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 41
    10.xxx.xxx.xxx:8000 SAPCCP 8 OPERATIONAL 52
    For info:
    FUNCTIONING CSM
    sh mod 1
    Mod Ports Card Type Model Serial No.
    1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094803UC
    Mod MAC addresses Hw Fw Sw Status
    1 0015.f998.a94a to 0015.f998.a951 1.8 4.2(5) Ok
    Mod Online Diag Status
    1 Pass
    relevant config lines
    serverfarm SAPCCP
    nat server
    no nat client
    failaction purge
    real 10.xxx.xxx.xxx 8000
    inservice
    real 10.xxx.xxx.xxx 8000
    inservice
    real 10.xxx.xxx.xxx 8000
    inservice
    probe ABAP
    vserver SAPCCP-VIP
    virtual 10.xxx.xxx.xxx tcp www
    vlan xxx
    serverfarm SAPCCP
    sticky 15 group 117
    replicate csrp sticky
    replicate csrp connection
    no persistent rebalance
    parse-length 4000
    inservice
    probe ABAP tcp
    interval 2
    retries 2
    failed 6
    open 3
    port 8000

    Continued:
    NON-FUNCTIONING CSM
    sh mod 1
    Mod Ports Card Type Model Serial No.
    1 4 SLB Application Processor Complex WS-X6066-SLB-APC SAD094609ZZ
    Mod MAC addresses Hw Fw Sw Status
    1 0015.f998.8386 to 0015.f998.838d 1.8 4.2(5) Ok
    Mod Online Diag Status
    1 Pass
    relevant config lines
    serverfarm SAPCCP
    nat server
    no nat client
    failaction purge
    real 10.xxx.xxx.xxx 8000
    inservice
    real 10.xxx.xxx.xxx 8000
    inservice
    real 10.xxx.xxx.xxx 8000
    inservice
    probe ABAP-CCP
    vserver SAPCCP-VIP
    virtual 10.xxx.xxx.xxx tcp www
    vlan xxx
    serverfarm SAPCCP
    sticky 15 group 117
    replicate csrp sticky
    replicate csrp connection
    no persistent rebalance
    parse-length 4000
    inservice
    probe ABAP-CCP tcp
    recover 2
    interval 6
    retries 2
    failed 6
    open 5
    port 8000
    Thanks for any pointers on where to look,
    Paul.

  • ACE http health probes - best practice for interval and passdetect interval?

    Hi,
    Is there a recommended standard for http health probes in terms of interval and passdetect interval timings, i.e. should the passdetect interval always be less than the interval or vice versa? Can a http probe be 'mis-configured', i.e. return a 'false positive', by configuring an interval timeout that's 'incompatible' with the device it's polling?
    I have a http probe for a serverfarm consisting of two Apache http servers and get intermittent 'server reply timeout' probe failures. I'm keen to ensure that the configuration of the probe isn't at fault so I can be confident that a failed probe indicates a problem with the server and not my configuration.
    The probe is currently configured as below:-
    probe http http-apache
      interval 30
      passdetect interval 15
      passdetect count 6
      request method get url /cs/images/ACE.html
      expect status 200 304
    Any advice on the subject would be gratefully received.
    thanks
    Matthew

    Hi Gilles,
    Thanks for the advice. In another discussion (found here https://supportforums.cisco.com/message/462397#462397) a poster has stated that:-
    "(The) "Probe interval" should always be less then (open+recieve) timeout  value. Default open & receive timeouts are 10 seconds."
    Are you able to advise on whether the above is correct and, if so, why? I currently have an interval value of 30, which obviously goes against the advice above (which I've interpreted to mean that if you leave the open & receive timeouts at their default settings, your probe interval should be less than 20 seconds?).
    thanks
    Matthew

  • ACE failing server out using TCP health probe

    We have a mix of ACE20s and ACE30s currently, and I am seeing the ACE on both HW platforms failing out our servers sporadically after a successful TCP handshake. Here is the configuration:
    probe tcp TCP-25
       port 25
       interval 25
       faildetect 2
       passdetect interval 90
       open 10
    When I do a show probe TCP-25 detail, I see the default recv timeout is 10.
    I captured a trace between the ACE and the server. When the health probes pass, I see a good 3-way TCP handshake, then 50 ms later the server sends an SMTP 220, followed by an ACK from the ACE, a FIN ACK from the ACE, and a graceful TCP termination. When the probe fails, I see a successful TCP handshake, but the ACE sends a FIN ACK 47 ms after it sends the ACK for the TCP connection. The server then sends an ACK and the ACE sends an RST.
    Shouldn't ACE wait 10 seconds in this example for server to respond after TCP handshake?

    TAC/Martin Nash was very helpful in explaining this. The TCP 3-way handshake was successful and the ACE sent a FIN ACK as expected, but after the server sent an ACK, it did not send a FIN ACK of its own, so the ACE marked it down. The health check requires not only a 3-way handshake but also a clean teardown of the TCP session.
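
The idea of a probe that passes only on a clean teardown can be sketched with Python's standard `socket` module. This is a rough illustration of the behaviour described above, not the ACE's implementation: the function names and default timeouts are assumptions, and the close timeout here applies to each read rather than to the teardown as a whole.

```python
import socket


def wait_for_peer_close(sock, timeout):
    """Send our FIN, then wait for the peer to close its side as well.

    Returns True if the peer closes cleanly within the timeout,
    False on timeout, reset, or any other socket error.
    """
    sock.settimeout(timeout)
    try:
        sock.shutdown(socket.SHUT_WR)      # our FIN
        while True:
            data = sock.recv(4096)         # drain a banner such as "220 ..."
            if not data:                   # empty read: peer sent its FIN
                return True
    except OSError:                        # includes timeouts and resets
        return False


def tcp_probe(host, port, open_timeout=10.0, close_timeout=10.0):
    """Pass only if the handshake succeeds AND the teardown is clean."""
    try:
        sock = socket.create_connection((host, port), timeout=open_timeout)
    except OSError:
        return False                       # no SYN-ACK, or connection refused
    with sock:
        return wait_for_peer_close(sock, close_timeout)
```

A server that acknowledges the FIN but never closes its own side would make `tcp_probe` return False even though the handshake succeeded, which mirrors the failure seen in the trace.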

  • Configuring Health Probe for Server Farm

    If I have a server farm with real servers listening on port 8888 and I apply an HTTP-type health probe with no port number specified, will the ACE know to probe the servers at 8888 or will it try to probe port 80?

    Hi,
    Yes, it should inherit the port from the real servers defined in the serverfarm. This gives you the flexibility to associate the same probe with different serverfarms, probing different servers on different ports. This is the probe port inheritance feature of the ACE.
    Regards,
    Kanwal

  • Health probe for RDP farm

    I have an RDP server farm that lost a disk. The RDP service was still running but users were unable to log in. I'd like to create a health probe that does maybe a combination of TCP probe for port 3389 and something that can determine if the drive that stores user profiles is available.
    I cannot add any new service (http or ftp) to the server.
    Can anyone think of another way to do this? Is there any way I can check SNMP mibs on the windows server or maybe WMI through TCL?
    Thanks.

    Can you drop me a mail offline ([email protected]) and I can share what I have. Matthew

  • ACE Health probe using get URL

    Hello,
    We are trying to create a health probe for our Google search appliances, and as part of the URL in the GET there is a question mark, but the ACE doesn't like that. Is there a way around this, or should it be done differently?
    request method get url /searchq? (This is what we want the URL to be)
    request method get url /searchq (This is where it thinks I'm asking it for help)
    Thanks in Advance.

    Hello,
    You need to type Ctrl+V prior to entering the ?
    That's the Control key then lowercase v, then your question mark.
    Hope this helps,
    Sean
