[c-nsp] Re-thinking (remembering) how a switch operates

Greg Schwimer gschwimer at godaddy.com
Wed Apr 27 23:20:30 EDT 2005


I've seen this scenario from time to time: hosts in the same VLAN on
different switches see a stream of traffic destined for a host on a
different switch.  I've seen it occur for a number of reasons, but
have generally been able to blame this behaviour on Windows Load
Balancing Service (WLBS).  The product works by "floating" a virtual
IP amongst several Windows servers, which communicate amongst
themselves to determine which should respond to a given request.  The
problem is that, because the MAC address behind the virtual IP is
constantly floating around from switch to switch within the VLAN
(assuming the WLBS servers are distributed somewhat), the switches can
never build a stable MAC table entry.  This can have some pretty
serious consequences.

For example, I once saw someone decide to "load test" a WLBS cluster
of servers on a production network.  The resulting inflow of traffic
to the VIP was flooded as unknown unicast to all ports within the VLAN,
effectively creating a denial of service.  Not pretty.  My suggestion
is that if you have WLBS or anything like it in your network, get rid
of it fast.

I'd expect that traffic to your syslog server could cause the same
problems.  As you mentioned, the problem could be solved by having a
monitoring system ping the syslog server every minute or so; each
reply refreshes the MAC table entry before it ages out.
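
As a rough illustration of why the periodic ping helps, here is a toy
model of a learning switch with MAC aging.  It is only a sketch and not
how any particular Catalyst implements forwarding: the class, the port
numbers, the host names and the one-frame-every-10-seconds syslog rate
are all made up for the example.  Only the 300 second aging time comes
from the thread.

```python
ROUTER, SYSLOG, MONITOR = "router", "syslog", "monitor"  # hypothetical hosts


class LearningSwitch:
    """Toy transparent bridge with MAC aging (not a vendor implementation)."""

    def __init__(self, ports, aging_time=300):
        self.ports = set(ports)
        self.aging_time = aging_time  # seconds; 300 is the default quoted in the thread
        self.table = {}               # mac -> (port, time last seen as a source)

    def receive(self, now, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the set of egress ports."""
        self.table[src_mac] = (in_port, now)
        entry = self.table.get(dst_mac)
        if entry is not None and now - entry[1] < self.aging_time:
            out_port = entry[0]
            return set() if out_port == in_port else {out_port}
        # Unknown or aged-out destination: flood to every port but the ingress.
        return self.ports - {in_port}


def flooded_frames(keepalive_every=None, duration=3600, log_every=10):
    """Count syslog frames that get flooded during `duration` seconds."""
    sw = LearningSwitch(ports=[1, 2, 3, 4])
    flooded = 0
    for now in range(duration):
        if keepalive_every and now % keepalive_every == 0:
            # The syslog server (port 2) answers a ping from the monitor (port 3),
            # so the switch sees its MAC as a source address.
            sw.receive(now, 2, SYSLOG, MONITOR)
        if now % log_every == 0:
            # A syslog frame arrives from the router side (port 1).
            if len(sw.receive(now, 1, ROUTER, SYSLOG)) > 1:
                flooded += 1
    return flooded


print("silent server:     ", flooded_frames())
print("ping every 60 s:   ", flooded_frames(keepalive_every=60))
print("ping every 600 s:  ", flooded_frames(keepalive_every=600))
```

In this model a server that never transmits has every frame flooded,
a reply every 60 seconds stops the flooding entirely, and a reply every
600 seconds only helps for the first 300 seconds of each interval.  The
keepalive interval has to stay below the aging time to be effective.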

I agree on the oddity of MAC table entry timeouts vs. ARP timeouts.  It
seems they should at least be somewhat similar.  I suspect problems
might arise from a longer MAC aging time in the case of bonded NICs
connected to two different switches, and in cases where hosts are
moved from one switch to another, etc.  I've not tried testing any of
this.  YMMV.  I'd be glad to hear of any successes or failures.
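
To put a rough number on the mismatch, the arithmetic below assumes a
host whose only transmissions are its ARP exchanges, one per ARP cache
timeout.  The 300 second MAC aging time is the figure quoted in the
thread; the 4 hour ARP timeout is an assumption (the IOS default), and
the function name is made up for the example.

```python
MAC_AGING_S = 300         # default MAC aging time quoted in the thread
ARP_TIMEOUT_S = 4 * 3600  # assumption: IOS default ARP cache timeout (4 hours)


def learned_fraction(refresh_interval_s, aging_s=MAC_AGING_S):
    """Fraction of time a MAC entry exists if the host transmits
    exactly once per refresh interval."""
    return min(1.0, aging_s / refresh_interval_s)


frac = learned_fraction(ARP_TIMEOUT_S)
print(f"entry present {frac:.1%} of the time, flooding the other {1 - frac:.1%}")
```

Under those assumptions the switches know where the host is for only
about 2% of the time, which is why an otherwise silent host ends up
flooded almost continuously.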


> -------- Original Message --------
> Subject: [c-nsp] Re-thinking (remembering) how a switch operates
> From: "Jeff Kell" <jeff-kell at utc.edu>
> Date: Wed, April 27, 2005 7:21 pm
> To: cisco-nsp at puck.nether.net
> 
> I had a most "enlightening" discovery today of a rather serious traffic
> leak that has gone by unnoticed for, uh, well, an embarrassingly long
> time.  After discovering the underlying reason, I thought this just
> might rattle a few of your heads as it did mine :-)
> 
> One of our engineers was investigating very erratic ssh response times
> on one of our central (network administration) servers.  This server
> resides in our management vlan, which reaches far and wide across campus
> to isolate our management traffic and access.
> 
> He reported to me seeing a very high stream of UDP traffic, which was
> soon identified as syslog traffic to our logging servers.  I thought he
> was nuts, the traffic was unicast and several hops (switches) away from
> either the source or destination of the syslog traffic.  So I cranked up
> ethereal on one of my boxes, and lo and behold, syslog traffic, LOTS of
> it.  Proper source and destination IPs, proper ports, verified MAC
> addresses; what the heck?
> 
> Ran back to our corner of the server farm, to our KVM switch, checked
> the logging servers, core switches, everything is up and healthy.
> Checked the routers, and the syslog server has a proper ARP entry.
> Everybody involved has the correct ARP entry.
> 
> To make a long story short, I started checking mac-address-tables (and
> cam on the lone CatOS Catalyst in the mix).  NOBODY has a mac entry!
> 
> The syslog server just sits and logs traffic.  As a general rule, it
> never transmits anything.  The switches, therefore, only very rarely see
> its MAC as a source address, so they never keep a MAC entry for it.  So
> we go back to basic switch operation:  when they are sent a frame with a
> destination MAC of the syslog server, they don't know where it is, so
> they flood it out every port on the vlan (and every trunk carrying the
> vlan), and for the management vlan, which spans the whole campus, that's
> a lot of noise.  And a lot of traffic -- 5-10 gigs/day.
> 
> As a workaround, I added static mac table entries for the server, and
> the problem went away.  And the traffic graphs for uplink trunks across
> campus took a rather pleasing dip (not that it was all that significant
> in the big picture, but it was a lot of unnecessary "noise" that was
> previously going everywhere).
> 
> Now afterwards, it has me thinking philosophically about the relatively
> short default mac-address table aging time (300 secs is default in IOS
> and CatOS, IIRC) versus the relatively long ARP cache timeout (4 hours
> by default in IOS, a real long time relative to mac-address aging).
> Having the ARP cache saves you from having to do frequent ARPs, but if
> you *did* ARP a little more frequently, the replies would keep the
> mac-address tables loaded up.  And if the device is down, but still in
> the ARP cache, anything sent to the device will still be forwarded
> (layer-3) and then flooded (layer-2, since the switches have no entry).
> 
> And adding the syslog servers to our polling list would at least
> generate a periodic response from them and refresh the mac table.
> 
> Well, enough for now.  Food for thought.
> 
> Jeff