[F10-nsp] LACP drops simultaneously across multiple switches/products/versions

Mon Mar 5 15:37:52 EST 2012

You don't say which log messages you see during the failure, so it's
hard to diagnose with anything but anecdote. That said, here's my
anecdote.

I have encountered similar sporadic LACP issues across numerous
switches at very large scale. The best Force10 could suggest was to
try 30-second LACP heartbeat timers, presumably so their control
plane had sufficient time to reply to heartbeat messages. To be
honest, that workaround was not acceptable, so I never bothered to
validate whether it actually "fixed" anything.
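
For anyone weighing the timer suggestion, here is a minimal sketch of
the arithmetic, assuming the standard IEEE 802.1AX LACP timer values
(1 s PDUs with a 3 s timeout at the fast rate, 30 s PDUs with a 90 s
timeout at the slow rate). The function name is made up for
illustration, and whether a given FTOS release uses exactly these
defaults is an assumption.

```python
# Standard LACP timer constants from IEEE 802.1AX, in seconds.
FAST_PERIODIC_TIME = 1    # LACPDU interval at the fast rate
SLOW_PERIODIC_TIME = 30   # LACPDU interval at the slow rate
SHORT_TIMEOUT_TIME = 3    # peer expires after this long at the fast rate
LONG_TIMEOUT_TIME = 90    # peer expires after this long at the slow rate


def lacp_tolerance(rate):
    """Return (pdu_interval, timeout, intervals_per_timeout) for a rate.

    Hypothetical helper, not a vendor API. `rate` is "fast" or "slow".
    """
    if rate == "fast":
        interval, timeout = FAST_PERIODIC_TIME, SHORT_TIMEOUT_TIME
    elif rate == "slow":
        interval, timeout = SLOW_PERIODIC_TIME, LONG_TIMEOUT_TIME
    else:
        raise ValueError("rate must be 'fast' or 'slow'")
    # The timeout is a whole multiple of the PDU interval.
    return interval, timeout, timeout // interval


for rate in ("fast", "slow"):
    interval, timeout, multiple = lacp_tolerance(rate)
    print(f"{rate}: PDU every {interval}s, peer expires after {timeout}s "
          f"({multiple} intervals)")
```

Under those assumptions, the slow rate lets a stalled control plane go
silent for roughly 90 seconds before the peer tears the bundle down,
instead of roughly 3. The cost is that a genuinely dead peer also takes
that long to detect.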

This is pretty much why we dropped all our layer 2 link aggregation
and moved to L3 ECMP load balancing across links.
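
As a rough illustration of what ECMP load balancing does per flow, here
is a minimal sketch, assuming a simple hash over the flow 5-tuple. Real
switches hash in hardware with vendor-specific fields and algorithms;
the function name, the CRC32 choice, and the addresses below are all
assumptions for illustration only.

```python
import zlib


def pick_next_hop(flow, next_hops):
    """Map a flow 5-tuple to one of several equal-cost next hops.

    Hypothetical helper: hashes the flow fields and takes the result
    modulo the number of next hops, so every packet of a given flow
    follows the same link.
    """
    if not next_hops:
        raise ValueError("need at least one next hop")
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Example values are invented: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)
uplinks = ["192.0.2.1", "192.0.2.5", "192.0.2.9", "192.0.2.13"]
print(pick_next_hop(flow, uplinks))
```

With this approach each link is an independent routed adjacency, so a
failure is handled by the routing protocol withdrawing one next hop
rather than by LACP state on a shared bundle.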

In my opinion, a lot of these problems are fundamental design issues
with regard to control plane management.

On Thu, Mar 1, 2012 at 6:34 AM, Doug Warner <doug at warner.fm> wrote:
> We're having a strange issue where LACP will bounce on multiple switches
> simultaneously, typically several times in a row.
>
> We previously would see this on our S50n stack when it was our core switch,
> but it hadn't happened in over a year.  Now that we have a C300 in addition to
> the S50n stack we've seen it 5 times in 4 days.
>
> What we've seen so far is two LACP groups from the C300 to our only two S55s
> will bounce, then all the LACP groups on the C300 will bounce as well as all
> the LACP groups on the S50n stack.
>
> We don't get any CPU watchdog notices, and traces don't show that the LACP
> process has restarted.
>
> Has anyone experienced these types of problems?  I have an open TAC case
> currently but want to get others' experiences here.
>
> -Doug
>
>
> _______________________________________________
> force10-nsp mailing list
> force10-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/force10-nsp
>