[f-nsp] How to protect a foundry device from layer2-loops
Stephen Wilcox
steve at telecomplete.co.uk
Sat Feb 24 05:19:27 EST 2007
Hi,
A port security MAC limit or a packet storm limit would fix this. Looped ports always produce broadcast storms and loops of data, and receiving many more MACs than you are supposed to is a sure sign that something is wrong.
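Here is a minimal Python sketch of the two checks suggested above: a per-port cap on learned source MACs and a per-second cap on broadcasts. The class name PortGuard, its thresholds, and the choice to disable a port on a MAC-limit violation but only drop excess broadcasts are hypothetical illustrations. They are not Foundry CLI features or defaults. On a real switch these are configured features, not user code.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class PortGuard:
    """Hypothetical per-port guard combining a MAC limit and a broadcast limit."""

    def __init__(self, max_macs, max_bcast_per_sec):
        self.max_macs = max_macs                    # port-security style MAC cap
        self.max_bcast_per_sec = max_bcast_per_sec  # storm-limit style rate cap
        self.macs = {}      # port -> set of source MACs seen on that port
        self.window = {}    # port -> (current second, broadcasts counted in it)
        self.shut = set()   # ports disabled for exceeding the MAC limit

    def frame(self, port, src, dst, ts):
        """Process one frame; return True to forward it, False to drop it."""
        if port in self.shut:
            return False
        seen = self.macs.setdefault(port, set())
        seen.add(src.lower())
        if len(seen) > self.max_macs:
            # Far more MACs than expected on one port: treat it as a loop
            # and disable the port entirely.
            self.shut.add(port)
            return False
        if dst.lower() == BROADCAST:
            second = int(ts)
            start, count = self.window.get(port, (second, 0))
            if start != second:
                start, count = second, 0  # a new one-second window begins
            count += 1
            self.window[port] = (start, count)
            if count > self.max_bcast_per_sec:
                return False  # drop broadcasts above the per-second limit
        return True
```

For example, with `PortGuard(max_macs=3, max_bcast_per_sec=2)` the third broadcast within one second on a port is dropped, and a fourth distinct source MAC on that port disables it while other ports keep forwarding.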
Steve
On Fri, Feb 23, 2007 at 02:58:34PM -0800, Brent Van Dussen wrote:
> Hi Gunther,
>
> Great question/problem you describe below... I think it's one that many
> people run into.
>
> The tricky part about BigIrons is that they have no effective way to deal
> with floods of broadcasts/multicasts/unknown unicasts other than sending
> them to the management CPU. Our JetCore M4 modules can handle about
> 100,000 pps of this type of traffic before the CPU reaches 100% and routing
> protocols start to fail.
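To put the 100,000 pps figure in perspective, this short Python calculation compares it with the line rate of a single Gigabit Ethernet port. It assumes minimum-size 64-byte frames plus the 20 bytes of preamble/SFD and inter-frame gap each frame occupies on the wire. Real loop traffic is a mix of frame sizes, so treat the result as an upper bound on packet rate, not a measurement.

```python
CPU_LIMIT_PPS = 100_000  # figure quoted above for the JetCore M4 modules


def line_rate_pps(link_bps, frame_bytes=64):
    """Maximum Ethernet frames per second on a link.

    Each frame also occupies 8 bytes of preamble/SFD and a 12-byte
    inter-frame gap on the wire, hence the extra 20 bytes.
    """
    return link_bps / ((frame_bytes + 20) * 8)


if __name__ == "__main__":
    gige = line_rate_pps(1e9)
    # prints "1,488,095 pps, 14.9x the CPU ceiling"
    print(f"{gige:,.0f} pps, {gige / CPU_LIMIT_PPS:.1f}x the CPU ceiling")
    # prints "CPU budget per punted packet: 10 us"
    print(f"CPU budget per punted packet: {1e6 / CPU_LIMIT_PPS:.0f} us")
```

On these assumptions, one looped gigabit port can offer roughly 15 times what the management CPU can process, leaving it about 10 microseconds per punted packet.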
>
> As you have learned, running spanning tree does absolutely nothing to
> protect your equipment from floods of packets, so some kind of hardware
> filtering is needed before packets get punted to the management CPU for
> processing. Why do you believe that a broadcast/multicast limiting feature
> would not have helped in this situation? If you tried that command on a
> BigIron, I could understand why you would feel that way, but on the MLX it
> should be no problem. What kind of switch does the leased line come in on?
>
> Loop storms are caused by broadcast/multicast/unknown-unicast frames being
> generated somewhere on a network and then kept on the wire indefinitely by
> endless forwarding. If you had a controlled lab environment with a loop,
> the switches would be fine; even after you inject a single ARP broadcast,
> everything would still be fine. It's not until enough ARP broadcasts or VRRP
> heartbeats compound themselves in this type of environment that traffic
> levels in the loop start to become a problem.
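A toy Python model of the compounding described above. It assumes two switches joined by several parallel links with spanning tree broken, and counts circulating copies per forwarding step. The function name and parameters are hypothetical, and the model ignores edge ports, mixed frame sizes and hardware drops, which real loops involve.

```python
from typing import List, Optional


def frames_in_loop(parallel_links: int, steps: int, injected_per_step: int = 1,
                   capacity: Optional[int] = None) -> List[int]:
    """Return the number of copies circulating after each forwarding step.

    Every copy arriving on one link is flooded out the other
    (parallel_links - 1) links, and because Ethernet frames carry no TTL,
    no copy ever expires. `injected_per_step` new broadcasts (ARP, VRRP, ...)
    enter each step; `capacity` optionally caps the count to mimic links
    running at line rate and dropping the excess.
    """
    if parallel_links < 2:
        raise ValueError("a loop needs at least two parallel links")
    counts = []
    in_flight = 0
    for _ in range(steps):
        in_flight = in_flight * (parallel_links - 1) + injected_per_step
        if capacity is not None:
            in_flight = min(in_flight, capacity)
        counts.append(in_flight)
    return counts


if __name__ == "__main__":
    print(frames_in_loop(2, 6))  # [1, 2, 3, 4, 5, 6]
    print(frames_in_loop(3, 6))  # [1, 3, 7, 15, 31, 63]
```

With two parallel links each broadcast circulates forever but never multiplies, so the load grows with every new ARP or VRRP frame. With three or more links each copy is re-flooded onto several links, and the count roughly doubles per step until the links saturate.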
>
> What is most disheartening is the apparent vulnerability of the XMR. The
> linecard should have been handling any bogus traffic and acting as a filter
> in front of the control plane that talks to the main management process.
> Did you get any snapshots of the LC CPU on the affected XMR interface, or
> was it just the main CPU that was showing signs of stress?
>
> Did you happen to get a dm raw from the BigIron to see what type of packets
> it was receiving? That information would be very helpful in putting
> preventative measures in place against future floods.
>
> Thanks,
> -Brent
>
>
>
> At 11:00 AM 2/18/2007, Gunther Stammwitz wrote:
> >Hello colleagues,
> >
> >
> >We're using spanning tree and VLANs in our internal network, and everything
> >has worked fine so far: layer-2 loops are resolved by spanning tree, and we
> >get redundancy this way.
> >
> >A few days ago a disturbing event happened: one of our leased-line
> >providers, who provides us an untagged VLAN between our site and a remote
> >location, had a failed switch in his network, which caused spanning tree to
> >stop working and created a layer-2 loop.
> >What we saw then was frightening: our network got "flooded" although we
> >have only ONE port to the leased-line provider and the loop was somewhere
> >in his network. The link from the LL provider comes in on a switch that
> >connects to our BigIron 4000 core switch with two links in the same
> >untagged VLAN and runs spanning tree.
> >
> >Our BigIron 4000 (SW: Version 07.8.01dT53) started melting down: the CLI
> >got really slow and traffic was no longer switched, or at least there was
> >huge packet loss.
> >The log file showed something like this:
> >W:System: Slot 1 Free Queue decreases less than the desirable values 3 consecutive times.
> >I:System: Slot 1 Write Sequence Drop 14177005 within 5 minutes.
> >I:System: Slot 1 Write Sequence Drop 14170290 within 5 minutes.
> >And so on..
> >
> >
> >Another thing we saw was that a NetIron MLX (software 3.2.x) connected to
> >the very same VLAN got slow on the CLI too. The CPU load seemed to be very
> >high, and the device started losing BGP sessions because the BGP hold
> >timers expired; it obviously wasn't answering its peers in time.
> >N:BGP: Peer x.x.x.x DOWN (Rcv Notification:Hold Timer Expired)
> >
> >
> >Any idea how one can protect the network in such a situation?
> >MAC limits and multicast limits wouldn't help, and I guess broadcast storm
> >protection/broadcast limits wouldn't help either :-(
> >Would limiting unknown unicasts help in such a situation? Is there some
> >sort of intelligence we can use on the switch to detect such situations
> >and take appropriate countermeasures?
> >
> >How can a loop in the LL provider's network affect our switches so badly?
> >Not only was the VLAN the LL port was connected to down, but all other
> >VLANs on the switch too, because the switch itself started failing.
> >
> >
> >
> >
> >And what exactly happens on a network when there is a layer-2 loop? As far
> >as I understand, a packet sent to the network is copied again and again
> >forever and floods everything.
> >
> >
> >
> >_______________________________________________
> >foundry-nsp mailing list
> >foundry-nsp at puck.nether.net
> >http://puck.nether.net/mailman/listinfo/foundry-nsp