Re: [nsp] dCEF repair pressure group (was: Re: performance impact on rate-limit)

From: Tony Tauber (ttauber@genuity.net)
Date: Fri Aug 18 2000 - 12:56:28 EDT


On Fri, 18 Aug 2000, Christian Panigl, ACOnet/VIX/UniVie wrote:

> George et al.,
>
> I've been singing these blues for two years now: upgraded to RSP4 in the
> meantime, changed all VIPs to VIP2-50 with 128MB DRAM, changed from
> 11.1CC to 12.0S, but still, enabling dCEF cumulatively disconnects my
> customers from the Internet. In our environment, running
> (non-distributed) CEF on our 7507 is fine; only dCEF is the problem.
> I get the impression that configurations with many sub-interfaces
> (ATM, Frame Relay, VLANs) in particular have problems with dCEF.

A year ago (exactly!) I noted on this list a dCEF problem, but
one that was certainly not crippling:

++> Date: Wed, 18 Aug 1999 11:06:32 -0400 (EDT)
++> From: Tony Tauber <ttauber@bbnplanet.com>
++> To: Ron Buchalski <rbuchals@hotmail.com>
++> cc: rogerio@embratel.net.br, cisco-nsp@puck.nether.net,
++> Subject: Re: [nsp] Distributed CEF x Distributed Fast Switching
++>
++> I'm not trying to spread FUD, but a depressing observation has been
++> that in many cases where dCEF is needed to take the load off the CPU
++> (high interrupt util due to lotsa traffic), it results in some kind
++> of blocking state where one full-ish interface can result in input
++> "ignores" accruing on another interface.
++> Disabling dCEF for regular CEF fixes the problem.
++>
++> Still looking for insight into the distributed switching architecture
++> to understand this phenomenon.
++>
++> Tony

I should've followed up sooner on what I found out.
Seems the problem was that on the VIPs, the SRAM (not DRAM)
is carved up into outbound queues, one per outbound interface
on the box. The problem arose when there were many outbound
interfaces (e.g. CT3 channels or, as you mention, PVCs or VLANs)
and we had only 2MB of SRAM (the max on VIP2-40s).
The SRAM is carved into equal-sized chunks for these queues.

Take 2048K of SRAM on a CT3IP (which is VIP2-40-based).
Divide that by 28 channels and each gets about 73K.
Divide that by 114 (if you've got 4xCT3 + 2xPOS on the box, hypothetically)
and you get about 640 bytes of SRAM buffers per queue.
Ten 64-byte packets (or one big packet) and you're done.
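To make the arithmetic above concrete, here's a quick back-of-the-envelope
sketch. The carving scheme (even split per local channel, then per outbound
interface) is just my reading of the behavior, not anything from Cisco docs,
and the function name is mine:

```python
def per_queue_bytes(sram_kb, local_channels, box_interfaces):
    """Hypothetical even split of VIP SRAM: first across this card's
    channels, then across every outbound interface on the box."""
    return (sram_kb * 1024) // local_channels // box_interfaces

# 2048K of SRAM on a CT3IP, 28 T1 channels, 114 outbound interfaces
# on the box (4 x CT3 = 112 channels, plus 2 x POS):
buf = per_queue_bytes(2048, 28, 114)
print(buf)        # 657 -- about the 640 bytes figured above
print(buf // 64)  # 10  -- ten 64-byte packets and the queue is full
```

Swapping in a VIP2-50's 4096K or 8192K of SRAM shows why more SRAM
helps: each queue's share scales linearly with it.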

Perhaps some 7500 geek will correct me if I've gotten it wrong.

This strikes me as a poor design decision for an aggregation box,
where almost all traffic from the little guys exits over a few
high-speed interfaces.
Not sure this is the problem you're seeing since, as I said, the
symptom in our case was some amount of input ignores on some
interfaces, not complete loss of connectivity.

> Well, is there any interest in forming a pressure group to increase the
> motivation @ Ci$co to finally throw some manpower onto solving those more
> than annoying (d)CEF problems ?

One solution is to throw more SRAM at the problem by upgrading
to VIP2-50s, which start with 4MB and can go to 8MB.

A knob to do the buffering differently might be nice if possible,
but I'm guessing Cisco may be unlikely to put the effort into this
given that the 7500 platform is past its prime.

Tony



This archive was generated by hypermail 2b29 : Sun Aug 04 2002 - 04:12:15 EDT