[c-nsp] Switches/nodes drop off the network

Wed Jan 10 19:19:29 EST 2007

Hi Paul,

On Wed, Jan 10, 2007 at 03:48:50PM -0000, Paul Davies wrote:
> Trent,
> 
> Yes, agreed the issue still stands - not denying that ;) 
> 
> It just doesn't really apply to this current situation that is all (our
> trunks are quite well load balanced and in general the circuits always have
> 20Mbps+).

It's not so much a function of the traffic volume as of whether there is
traffic to/from that particular MAC passing over that trunk.

I guess the real question is, did your configuration include the
"switchport block unicast" commands?
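
To illustrate why that matters, here is a minimal Python sketch of a toy
switch. It is not a model of IOS internals: the class, the port numbers, the
MAC strings and the 300-second aging time are all assumptions made up for
illustration, and broadcast frames are not modelled. It shows a host behind a
port with unknown-unicast blocking becoming unreachable once its MAC entry
ages out, and reachable again only after it transmits something itself:

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Switch:
    """Toy layer-2 switch; all names and values here are illustrative."""
    ports: Set[int]
    # Ports configured with "switchport block unicast" (assumed for the demo).
    blocked: Set[int] = field(default_factory=set)
    # MAC aging time in seconds (assumed value).
    aging: float = 300.0
    # MAC address -> (port it was learned on, time it was last seen).
    table: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def forward(self, src: str, dst: str, in_port: int, now: float) -> Set[int]:
        """Learn src, then return the ports a unicast frame is sent out of."""
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] <= self.aging:
            return {entry[0]}
        # Unknown destination: flood, except on ingress and blocked ports.
        return self.ports - {in_port} - self.blocked


ROUTER, SERVER = "aa:aa", "bb:bb"
sw = Switch(ports={1, 2, 3}, blocked={2})

sw.forward(SERVER, ROUTER, in_port=2, now=0)                   # server talks, MAC learned
reachable = sw.forward(ROUTER, SERVER, in_port=1, now=10)      # entry fresh
after_aging = sw.forward(ROUTER, SERVER, in_port=1, now=400)   # entry aged out
sw.forward(SERVER, ROUTER, in_port=2, now=410)                 # server talks again
back = sw.forward(ROUTER, SERVER, in_port=1, now=420)

print(reachable, after_aging, back)
```

In this sketch the frame sent at now=400 is flooded only to port 3, so the
server on port 2 never sees it, regardless of how much other traffic the
switch is carrying.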

Cheers,
Trent

> 
> I will read a bit more into the "block unicast" command and its effects.
> Thank you for your assistance so far.
>  
> Regards,
>  
> Paul Davies
> 
> -----Original Message-----
> From: Trent Lloyd [mailto:lathiat at bur.st] 
> Sent: 10 January 2007 15:39
> To: Paul Davies
> Cc: cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Switches/nodes drop off the network
> 
> Hi Paul,
> 
> On Wed, Jan 10, 2007 at 02:45:31PM -0000, Paul Davies wrote:
> > Trent
> > 
> > > One issue that comes to mind is using 
> > >	"switchport block unicast"
> > 
> > > This essentially stops the switch from flooding packets when the MAC's
> > > port is unknown; each device has to send a packet *out* before it is
> > > contactable, which can cause things to "disappear" and "come back".
> > 
> > Thank you for your response. This is interesting, but things have a
> > habit of "not coming back", as it were: it takes the hosts hours, as
> > opposed to seconds/minutes, to come back, and sometimes they don't come
> > back at all. The second time this happened I migrated all switches back
> > to our 3550 platform because no matter what I did it would not come
> > back (yet putting a laptop online, on the same port, as any of the IPs
> > in question worked fine).
> > 
> > I found that when it did occur, the distribution's (3750 stack) ARP
> > table showed all the IP entries as incomplete for the affected servers
> > (naturally, with the laptop connected the ARP entry was complete).
> > 
> > I had thought that this was some form of bug (it is as if the stack has
> > run out of resources and cannot store any more entries, yet we are
> > nowhere near the limit of the SDM template in question).
> > 
> > > Furthermore, a 3750 stack will share the MAC table between switches in
> > > the same stack, but multiple stacks will not share it between the
> > > stacks, so you may end up in situations with trunks (especially
> > > when the trunks aren't as busy) where one stack can get to it but
> > > the other cannot.
> > 
> > Yes, this I understand, but these distributions are completely separate
> > and the stacks do not require one another for the network to operate
> > (i.e. different customer base for different stacks). Each customer edge
> > switch is connected to only one stack (there are no edge switches where
> > one port goes to one stack and another port goes to another stack),
> > which I believe is what you are suggesting could cause problems (which
> > is understandable).
> 
> If the two stacks are not connected to each other in a way that requires
> layer-2 traffic to be passed, then yes, this "sub-part" wouldn't affect
> you, but the general issue still stands.
> 
> The above case just serves to "amplify" the issue, because on the one
> stack in a busy environment you may never have your ARPs time out, but
> over a trunk that isn't very busy they may be more likely to time out.
> 
> Cheers,
> Trent
> 
> > 
> > Regards,
> >  
> > Paul Davies
> > -----Original Message-----
> > From: Trent Lloyd [mailto:lathiat at bur.st] 
> > Sent: 10 January 2007 14:26
> > To: Paul Davies
> > Cc: cisco-nsp at puck.nether.net
> > Subject: Re: [c-nsp] Switches/nodes drop off the network
> > 
> > One issue that comes to mind is using 
> > 	"switchport block unicast"
> > 
> > This essentially stops the switch from flooding packets when the MAC's
> > port is unknown; each device has to send a packet *out* before it is
> > contactable, which can cause things to "disappear" and "come back".
> > 
> > Furthermore, a 3750 stack will share the MAC table between switches in
> > the same stack, but multiple stacks will not share it between the
> > stacks, so you may end up in situations with trunks (especially
> > when the trunks aren't as busy) where one stack can get to it but
> > the other cannot.
> > 
> > Cheers, 
> > Trent
> > 
> > On Wed, Jan 10, 2007 at 10:53:10AM -0000, Paul Davies wrote:
> > > We currently operate 2 separate stacks of Cisco 3750 switches at our
> > > distribution layer.
> > >
> > > Stack 1 - Distribution 3
> > >
> > > Switch   Ports  Model              SW Version    SW Image
> > > ------   -----  -----              ----------    ----------
> > >      1   28     WS-C3750G-24TS     12.2(25)SED   C3750-ADVIPSERVICESK
> > > *    2   28     WS-C3750G-24TS     12.2(25)SED   C3750-ADVIPSERVICESK
> > >
> > > Stack 2 - Distribution 4
> > >
> > > Switch   Ports  Model              SW Version    SW Image
> > > ------   -----  -----              ----------    ----------
> > >      1   28     WS-C3750G-24TS-1U  12.2(25)SED   C3750-ADVIPSERVICESK
> > > *    2   28     WS-C3750G-24TS-1U  12.2(25)SED   C3750-ADVIPSERVICESK
> > >
> > > All our customer edge switches are WS-C2950T-24, of which there are
> > > between 25 and 30; they use port channel configurations (2 x 1000Mbps)
> > > to connect to the distribution switches. The distribution switches
> > > contain VLANs for our customers: some private VLANs, some just standard
> > > VLANs, all similarly configured (nothing special).
> > >
> > > Prior to utilising the 3750G series, we were utilising the 3550 series
> > > at our distribution (which are still in service for certain customers).
> > > We migrated all these switches recently, but have had to migrate them
> > > back due to these problems. During the migration period the
> > > distribution stack was initially configured with the SDM template
> > > "ipv4/ipv6 default"; however, once the migration reached the 13th
> > > switch, we instantly saw between 35ms and 60ms of latency when tracing
> > > through the distribution to any node (apart from nodes on the switch we
> > > had just migrated, which were still 0.x ms as expected). Initially I
> > > thought the stack was running out of resources due to the SDM template
> > > chosen (we have approximately 180 VLANs active and about 18 port
> > > channels, storing 800 - 900 MAC addresses; the CPU was constantly high,
> > > along with memory usage), so we changed it to "ipv4/ipv6 vlan" and
> > > experienced similar issues. We then changed the template to "desktop
> > > default" and all seemed to work fine: all VLANs were active, all port
> > > channels were active, no latency, routing was fine, no problems in
> > > general, and CPU load and memory usage were low.
> > >
> > > Then a day later (after all had been working fine) some very strange
> > > behaviour started: random server nodes seemed to be falling off the
> > > network, and on some occasions whole VLANs disappeared. During this
> > > period, the gateway of the VLAN is reachable globally (including from
> > > the customer edge switch); the VLAN is up, and the VLAN trunk is up and
> > > functioning on the port channel. The nodes remain down for significant
> > > periods (i.e. 3 to 4 hours), and on some occasions they come back
> > > online on their own (very random). However, if we remove the server
> > > from the equation, put a laptop on the port and configure the IP on the
> > > laptop, it often works fine and can gain access to the rest of the
> > > world (once the old machine is put back it still does not work,
> > > though). I have reconfigured VLANs (i.e. changed VLAN numbers); this
> > > does not help. All our switches/routers send all logs to a Syslog
> > > server, which during this period shows nothing out of the ordinary. I
> > > enabled debugging for various sections; however, this did at one point
> > > crash one of the switches, and a search of the output did not show
> > > anything out of the ordinary (although I only analysed a fraction of
> > > the data).
> > >
> > > Due to the continued issues we decided to move all our switches back to
> > > the 3550 series until we figured out the problems. We have another 3750
> > > stack:
> > >
> > > Stack 3 - Distribution 5
> > >
> > > Switch   Ports  Model              SW Version    SW Image
> > > ------   -----  -----              ----------    ----------
> > >      1   26     WS-C3750-24TS      12.2(25)SED   C3750-ADVIPSERVICESK
> > > *    2   26     WS-C3750-24TS      12.2(25)SED   C3750-ADVIPSERVICESK
> > >
> > > This runs the IPv4/IPv6 default SDM template and works fine (however,
> > > it does not have that many VLANs or customers on it at this time). It
> > > does, however, occasionally have randomly high CPU load (95% - 100%),
> > > which is not due to routing updates, topology changes or anything such
> > > as that, since this switch sees very little action in terms of changes.
> > > Again I have enabled logging, and during this period I cannot see
> > > anything out of the ordinary. "show processes cpu history" shows the
> > > issues (and at the time access becomes sluggish), yet "show processes
> > > cpu sorted" never shows anything out of the ordinary, or any process
> > > that is using a lot of CPU; however, the overview figures at the head
> > > of the table show the same high figures as the "cpu history" graph.
> > >
> > > I believe some of these issues relate to a bug in the IOS somehow. Can
> > > anyone confirm whether they have had any similar issues?
> > >
> > > Please advise with any information anyone has!
> > >
> > > Regards,
> > >  
> > > Paul Davies
> > > 
> > >  
> > > 