[c-nsp] Switches/nodes drop off the network
Paul Davies
P.Davies at coreix.net
Wed Jan 10 09:45:31 EST 2007
Trent,
> One issue that comes to mind is using
> "switchport block unicast"
> This essentially stops the switch from flooding packets when the MAC's
> port is unknown, each device has to send a packet *out* before it is
> contactable, which can cause things to "disappear" and "come back"
Thank you for your response. This is interesting, but things have a habit of
"not coming back", as it were: it takes the hosts hours rather than
seconds/minutes to come back, and sometimes they don't come back at all. The
second time this happened I migrated all switches back to our 3550 platform
because no matter what I did they would not come back (yet putting a laptop
online, on the same port, with any of the IPs in question worked fine).
I found that when it did occur, the distribution's (3750 stack) ARP table
showed all the IP entries as incomplete for the affected servers (naturally,
when the laptop was connected the ARP entry was complete).
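For anyone wanting to check for the same symptom, something along these
lines would pick the incomplete entries out of captured "show ip arp"
output. This is only a sketch: the function name is made up, the sample text
is invented (the addresses, MACs and VLAN number are not from our network),
and it assumes the usual IOS column layout in which an unresolved entry
shows "Incomplete" in the hardware address column.

```python
def incomplete_arp_entries(show_ip_arp_output):
    """Return the IP addresses whose hardware address column reads 'Incomplete'."""
    incomplete = []
    for line in show_ip_arp_output.splitlines():
        fields = line.split()
        # Data rows start with the protocol name; skip the header and blank lines.
        if len(fields) < 4 or fields[0] != "Internet":
            continue
        address, hardware_addr = fields[1], fields[3]
        if hardware_addr.lower() == "incomplete":
            incomplete.append(address)
    return incomplete


# Invented sample in the usual IOS layout; not captured from a real device.
SAMPLE = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.0.2.1               -   0011.2233.4455  ARPA   Vlan100
Internet  192.0.2.10              0   Incomplete      ARPA
Internet  192.0.2.11             12   0011.2233.4466  ARPA   Vlan100
Internet  192.0.2.12              0   Incomplete      ARPA
"""

if __name__ == "__main__":
    print(incomplete_arp_entries(SAMPLE))
```

Comparing that list against the servers that have dropped off would show
whether every affected host is stuck at the ARP stage or only some of them.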
I had thought that this was some form of bug (it is as if the stack has run
out of resources and cannot store any more entries, yet we are nowhere
near the limit of the SDM template in question).
> Furthermore, a 3750 stack will share the mac table between switches in
> the same stack, but multiple stacks will not share it between the
> stacks, so you may end up in situations with trunks (especially
> when the trunks aren't as busy) where one stack can get to it but
> the other can not.
Yes, this I understand, but these distributions are completely separate and
the stacks do not require one another for the network to operate (i.e.
different customer base for different stacks). Each customer edge switch is
connected to only one stack (there are no edge switches where one port goes
to one stack and another port goes to another stack), which I believe is
the situation you are suggesting could cause problems (which is understandable).
Regards,
Paul Davies
-----Original Message-----
From: Trent Lloyd [mailto:lathiat at bur.st]
Sent: 10 January 2007 14:26
To: Paul Davies
Cc: cisco-nsp at puck.nether.net
Subject: Re: [c-nsp] Switches/nodes drop off the network
One issue that comes to mind is using
"switchport block unicast"
This essentially stops the switch from flooding packets when the MAC's
port is unknown, each device has to send a packet *out* before it is
contactable, which can cause things to "disappear" and "come back"
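To make the behaviour concrete, here is a minimal Python sketch of a
learning switch with and without unknown-unicast blocking. It is a toy
model, not IOS: the ToySwitch class, the port numbers and the MAC strings
are all made up for illustration, blocking is modelled as one switch-wide
flag rather than a per-port setting, and aging is a simple explicit call.

```python
class ToySwitch:
    """Toy model of a learning switch (hypothetical, not how IOS is implemented)."""

    def __init__(self, ports, block_unknown_unicast=False):
        self.ports = set(ports)
        self.block_unknown_unicast = block_unknown_unicast
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Handle one unicast frame; return the set of ports it is sent out of."""
        # Learn (or refresh) the sender's location from the source address.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is not None:
            return {out_port} if out_port != in_port else set()
        # Destination unknown: normally flood, but not when blocking is on.
        if self.block_unknown_unicast:
            return set()
        return self.ports - {in_port}

    def age_out(self, mac):
        """Simulate the MAC entry expiring after a quiet period."""
        self.mac_table.pop(mac, None)


if __name__ == "__main__":
    sw = ToySwitch(ports=[1, 2, 3], block_unknown_unicast=True)
    # The server on port 2 has been silent, so the switch does not know it.
    print(sw.receive(1, "aa:aa", "bb:bb"))   # set(): frame is dropped
    # Once the server sends anything, it is learned and reachable again.
    sw.receive(2, "bb:bb", "aa:aa")
    print(sw.receive(1, "aa:aa", "bb:bb"))   # {2}
```

In this model a quiet host becomes unreachable as soon as its entry ages
out, and stays that way until it transmits something of its own accord.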
Furthermore, a 3750 stack will share the mac table between switches in
the same stack, but multiple stacks will not share it between the
stacks, so you may end up in situations with trunks (especially
when the trunks aren't as busy) where one stack can get to it but
the other can not.
Cheers,
Trent
On Wed, Jan 10, 2007 at 10:53:10AM -0000, Paul Davies wrote:
> We currently operate 2 separate stacks of Cisco 3750 switches at our
> distribution layer.
>
> Stack 1 - Distribution 3
>
> Switch  Ports  Model              SW Version   SW Image
> ------  -----  -----              ----------   ----------
>      1  28     WS-C3750G-24TS     12.2(25)SED  C3750-ADVIPSERVICESK
> *    2  28     WS-C3750G-24TS     12.2(25)SED  C3750-ADVIPSERVICESK
>
> Stack 2 - Distribution 4
>
> Switch  Ports  Model              SW Version   SW Image
> ------  -----  -----              ----------   ----------
>      1  28     WS-C3750G-24TS-1U  12.2(25)SED  C3750-ADVIPSERVICESK
> *    2  28     WS-C3750G-24TS-1U  12.2(25)SED  C3750-ADVIPSERVICESK
>
> All our customer edge switches are WS-C2950T-24, of which there are between
> 25 and 30, and which use port channel configurations (2 x 1000Mbps) to
> connect to the distribution switches. The distribution switches contain
> VLANs for our customers: some private VLANs, some just standard VLANs. All
> are similarly configured (nothing special).
>
> Prior to utilising the 3750G series, we were utilising the 3550 series at
> our distribution (which are still in service for certain customers). We
> migrated all these switches recently, but have had to migrate them back
> due to these problems. During the migration period the distribution stack
> was initially configured with the SDM template "ipv4/ipv6 default";
> however, when we started migration, once we got to the 13th switch we
> instantly saw between 35ms and 60ms of latency when tracing through the
> distribution to any node (apart from nodes on the actual switch we had just
> migrated, which were still 0.x ms as expected). Initially I thought the
> stack was running out of resources due to the SDM template chosen (we have
> approximately 180 VLANs active and about 18 port channels, storing 800 - 900
> MAC addresses; the CPU was constantly high, along with memory usage), so
> we changed it to "ipv4/ipv6 vlan" and experienced similar issues. We then
> changed the template to "desktop default" and all seemed to work fine: all
> VLANs were active, all port channels were active, no latency, routing was
> fine, no problems in general, and CPU load and memory usage were low.
>
> Then a day later (after all had been working fine) some very strange
> behaviour started: random server nodes seemed to be falling off the
> network, and on some occasions whole VLANs disappeared. During this period
> the gateway of the VLAN is reachable globally (including from the customer
> edge switch); the VLAN is up, and the VLAN trunk is up and functioning on
> the port channel. The nodes remain down for significant periods (i.e. 3 to
> 4 hours); on some occasions they come back online on their own (very
> randomly). However, if we remove the server from the equation, put a laptop
> on the port and configure the IP on the laptop, it often works fine and can
> gain access to the rest of the world (once the old machine is put back it
> still does not work, though). I have reconfigured VLANs (i.e. changed VLAN
> numbers); this does not help. All our switches/routers send all logs to a
> Syslog server, which during this period shows nothing out of the ordinary.
> I enabled debugging for various sections; however, this did at one point
> crash one of the switches, and did not show anything out of the ordinary
> upon search (however, I did only analyse a fraction of the data).
>
> Due to the continued issues we decided to move all our switches back to the
> 3550 series until we figured out the problems. We have another 3750 stack:
>
> Stack 3 - Distribution 5
>
> Switch  Ports  Model              SW Version   SW Image
> ------  -----  -----              ----------   ----------
>      1  26     WS-C3750-24TS      12.2(25)SED  C3750-ADVIPSERVICESK
> *    2  26     WS-C3750-24TS      12.2(25)SED  C3750-ADVIPSERVICESK
>
> This runs the IPv4/IPv6 default SDM template and works fine (however, it
> does not have that many VLANs or customers on it at this time). It does,
> however, occasionally have randomly high CPU load (95% - 100%), which is not
> due to routing updates, topology changes or anything of that kind, since
> this switch sees very little action in terms of changes. Again I have
> enabled logging, and during this period I cannot see anything out of the
> ordinary. "show processes cpu history" shows the issues (and at the time
> access becomes sluggish), yet "show processes cpu sorted" never shows
> anything out of the ordinary, or any process that is using a lot of CPU;
> however, the overview figures at the head of the table show the same high
> figures as the "cpu history" graph.
>
> I believe some of these issues relate to a bug in the IOS somehow. Can
> anyone confirm whether they have had any similar issues?
>
> Please advise with any information anyone has!
>
> Regards,
>
> Paul Davies
>
> _______________________________________________
> cisco-nsp mailing list cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/