[F10-nsp] LACP drops simultaneously across multiple switches/products/versions

Doug Warner doug at warner.fm
Mon Mar 5 16:11:47 EST 2012


So far we've received the same suggestion from F10 to increase the LACP timers
and I agree that it basically means losing the feature we're trying to use.
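For context on why longer timers give up the feature: IEEE 802.1AX defines two LACP rates. At the fast rate a partner sends LACPDUs every 1 s and the link times out after 3 s without one. At the slow ("30 second") rate those become 30 s and 90 s. The sketch below just does that arithmetic, assuming the standard values with no vendor-specific tuning. `would_time_out` is a hypothetical helper, and treating a control-plane stall as "no LACPDUs sent for N seconds" is a deliberate simplification.

```python
# IEEE 802.1AX LACP timer values, in seconds (standard defaults, no vendor tuning).
FAST_PERIODIC_TIME = 1    # LACPDU transmit interval at the fast rate
SLOW_PERIODIC_TIME = 30   # LACPDU transmit interval at the slow rate
TIMEOUT_MULTIPLIER = 3    # partner info expires after 3 periods without an LACPDU


def timeout_seconds(fast_rate: bool) -> int:
    """How long a link may go without an LACPDU before LACP expires the partner."""
    period = FAST_PERIODIC_TIME if fast_rate else SLOW_PERIODIC_TIME
    return TIMEOUT_MULTIPLIER * period


def would_time_out(stall_seconds: float, fast_rate: bool) -> bool:
    """Hypothetical helper (simplified model): does a control-plane stall of
    this length, with no LACPDUs sent, exceed the LACP timeout?"""
    return stall_seconds >= timeout_seconds(fast_rate)


if __name__ == "__main__":
    for fast in (True, False):
        label = "fast" if fast else "slow"
        print(f"{label} rate: timeout after {timeout_seconds(fast)} s")
    # A 5 s stall drops a fast-rate bundle but not a slow-rate one.
    print(would_time_out(5, fast_rate=True), would_time_out(5, fast_rate=False))
```

Under this model, the slow rate hides a short control-plane stall, but it also means a genuinely dead link can take up to 90 s to be removed from the bundle instead of about 3 s.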

Unfortunately I don't have much additional logging; I see the LACP groups
ungroup, RSTP topology changes, then LACP regroups, more RSTP changes, etc.
I *finally* got a CPU interrupt watchdog notice on my S50n stack, but I've
seen this more than half a dozen times now with no other error messages.

I appreciate the anecdotal support that others are seeing the same thing.

-Doug

On 03/05/2012 03:37 PM, Matt Hite wrote:
> You don't go into detail as to the log messages you see during the
> failure, so it's certainly hard to diagnose with anything but
> anecdote. However, here's my anecdote....
> 
> I have encountered similar sporadic LACP issues across numerous
> switches on an extremely large scale. The best Force10 could suggest
> was to try using 30 second LACP heartbeat timers, presumably so their
> control plane had sufficient time to reply to heartbeat messages. To
> be honest, this particular scenario was not acceptable so I didn't
> even bother to validate if this actually "fixed" anything.
> 
> This is pretty much why we dropped all our layer 2 link aggregation
> and moved to L3 ECMP load balancing across links.
> 
> In my opinion, a lot of these problems are fundamental design issues
> with regard to control plane management.
> 
> On Thu, Mar 1, 2012 at 6:34 AM, Doug Warner <doug at warner.fm> wrote:
>> We're having a strange issue where LACP will bounce on multiple switches
>> simultaneously, typically several times in a row.
>>
>> We previously would see this on our S50n stack when it was our core switch,
>> but it hadn't happened in over a year.  Now that we have a C300 in addition to
>> the S50n stack we've seen it 5 times in 4 days.
>>
>> What we've seen so far is two LACP groups from the C300 to our only two S55s
>> will bounce, then all the LACP groups on the C300 will bounce as well as all
>> the LACP groups on the S50n stack.
>>
>> We don't get any CPU watchdog notices, and traces don't show that the LACP
>> process has restarted.
>>
>> Has anyone experienced these types of problems?  I have an open TAC case
>> currently but want to hear about others' experiences here.
>>
>> -Doug

