[f-nsp] stateless load balancing unbalanced after a real server goes down, and comes back up

Mon Aug 11 17:41:15 EDT 2014

Hi,

I have an older Foundry ServerIron WSM6 (with switch firmware) which I'm 
using to balance http and https traffic to 5 web servers.  The problem 
is that when one of the web servers goes down briefly, the ServerIron 
stops sending traffic to it, and doesn't resume when the web server is 
back online.  I have to resort to issuing the 'clear server 
slb-stateless-hash-table' command.

What's strange is that the SI does seem to acknowledge that the 
web server is back up, at least according to the logs.  Here's a snippet 
of the log:

Dynamic Log Buffer (50 lines):
Aug  8 15:34:33:N:L4 server <ip> hostname port 80 is up
Aug  8 15:34:32:N:L4 server <ip> hostname port 443 is up
Aug  8 15:32:38:I:Security: SSH login by root from src IP <ip>, src MAC 0024.388e.d1f1 to USER EXEC mode
Aug  8 15:31:53:N:L4 server <ip> hostname port 443 is down due to healthcheck
Aug  8 15:31:04:N:L4 server <ip> hostname port 80 is down due to healthcheck

As you can see, it went down at 15:31 and came back up 3 minutes later, 
but traffic is still not sent to this server.
Is there a configuration setting I'm missing that would fix this 
behavior?  Is there any way to tell the SI to clear the hash table 
whenever a real server comes back up?

I've been considering writing a cron job that would log into the SI and 
issue the 'clear server slb-stateless-hash-table' command once every 
hour or so.  However, I'm not sure what the consequences would be of 
clearing the hash table while one of the real servers is actually down. 
Would it force the SI to send traffic to a non-responding server, thus 
losing part of the traffic?  Or does it calculate the hash table based 
on the real servers that are online at that moment?
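
To make the cron idea a bit more concrete, here is a minimal Python 
sketch of the decision step only: given the text of the log buffer, it 
reports which server ports went down and later came back up, so the 
clear could be issued only when needed rather than blindly every hour. 
It assumes the log has already been fetched as text and that each event 
sits on a single line in the format shown above.  The names 
recovered_ports and needs_clear are made up for illustration, and 
nothing here logs into the SI or issues any command.

```python
import re

# Matches health-check events such as:
#   Aug  8 15:34:33:N:L4 server <ip> hostname port 80 is up
# (the field layout is taken from the log snippet above)
EVENT_RE = re.compile(
    r"L4 server (?P<ip>\S+) (?P<name>\S+) port (?P<port>\d+) is (?P<state>up|down)"
)


def recovered_ports(log_lines):
    """Return the set of (ip, port) pairs that went down and later came back up.

    The log buffer lists the newest entry first, so the lines are walked
    in reverse to process events in chronological order.
    """
    was_down = set()
    recovered = set()
    for line in reversed(list(log_lines)):
        m = EVENT_RE.search(line)
        if not m:
            continue  # unrelated entry, e.g. an SSH login
        key = (m["ip"], int(m["port"]))
        if m["state"] == "down":
            was_down.add(key)
            recovered.discard(key)  # it is down again, so not recovered
        elif key in was_down:
            recovered.add(key)
    return recovered


def needs_clear(log_lines):
    """True if at least one port has recovered since it was last seen down."""
    return bool(recovered_ports(log_lines))
```

A wrapper run from cron would call needs_clear() on the fetched log and 
only then issue the clear command.  It would also have to remember which 
events it has already acted on, which this sketch does not do.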

I appreciate any help or insight that you guys could offer.

Thanks,
Edward.