Re: [nsp] first big cef issue

From: Tony Tauber (ttauber@genuity.net)
Date: Fri Mar 30 2001 - 17:07:57 EST


On Tue, 27 Mar 2001, Ramin K wrote:

> We had our first definite/serious cef issue and I realized that I really
> don't know jack about dealing with it.
>
> In this case packets sourced from one /23 routed fine over the OC-3. The
> other /23 did not work at all. Bounced the OC-3, cleared BGP, cleared the
> route, etc. no change. Decided it really was cef. I shut the OC-3 down
> till the maint window. Talked to Cisco and the peer, cleared cef on the
> line card that night and the problems went away.
>
> Am I limited to mostly just clear cef on the line card and hope the router
> does not explode? The adjacencies seemed fine from what I could tell. Is
> that also a symptom, everything looks okay? When I did the clear cef slot
> #, all the other linecards kicked out cef tracebacks and CPUHOG messages.
> Is that standard too? I'm looking for more real world practical experience
> since I'm also talking to Cisco to get an idea on how to handle it better.

I hate CEF FUD, so I'm chiming in.

It's been a while since I was in the thick of things, but problems can
arise when linecard memory is insufficient for everything demanded
of it (e.g. image footprint, CEF tables, etc.).
Insufficient resources are a generic computing issue, not a CEF problem.

Looking at "show cef linecard" can point up problems with
synchronization of CEF between the LCs and the RP.
If the table version numbers in the second column (CEF-ver) don't agree
with the RP's table version on the first line,
you've got some IPC problem, and it may well be related to a
resource shortage (e.g. DRAM).
There are also some plain old bugs that cause table inconsistencies.
Checking a particular problem CEF entry on the RP vs. the one on the
LC should show up this problem.

snjpca1-cr1>sh cef line
CEF table version 642126, 108126 routes
Slot  CEF-ver  MsgSent  XdrSent   Seq   MaxSeq  LowQ  MedQ  HighQ  Flags
1     642126   2682836  37347369  4230  4254    0     0     0      up, sync
2     642126   2682804  37347367  4199  4223    0     0     0      up, sync
3     642126   2682770  37347358  4165  4189    0     0     0      up, sync
4     642126   2682826  37347382  4221  4245    0     0     0      up, sync
5     642126   2682821  37347384  4216  4240    0     0     0      up, sync
6     642126   2682815  37347395  4210  4234    0     0     0      up, sync
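
If you want to automate the version check described above, here is a
minimal Python sketch. It assumes you have already captured the text of
"show cef linecard" by some other means, and that the layout matches the
sample above (first line carries the RP table version, second column of
each slot row is CEF-ver). The function name find_stale_linecards is
made up for illustration; it is not part of any Cisco tool.

```python
import re

# Sample copied from the capture above. The column layout is assumed from
# this one capture and may differ on other IOS releases.
SAMPLE = """\
CEF table version 642126, 108126 routes
Slot CEF-ver MsgSent XdrSent Seq MaxSeq LowQ MedQ HighQ Flags
1 642126 2682836 37347369 4230 4254 0 0 0 up, sync
2 642126 2682804 37347367 4199 4223 0 0 0 up, sync
3 642126 2682770 37347358 4165 4189 0 0 0 up, sync
4 642126 2682826 37347382 4221 4245 0 0 0 up, sync
5 642126 2682821 37347384 4216 4240 0 0 0 up, sync
6 642126 2682815 37347395 4210 4234 0 0 0 up, sync
"""


def find_stale_linecards(output):
    """Return (rp_version, stale), where stale lists (slot, lc_version)
    for every linecard whose CEF-ver differs from the RP table version."""
    rp_version = None
    slots = []
    for line in output.splitlines():
        header = re.match(r"\s*CEF table version (\d+)", line)
        if header:
            rp_version = int(header.group(1))
            continue
        fields = line.split()
        # Slot rows start with two integers: slot number, then CEF-ver.
        # The column header row ("Slot CEF-ver ...") fails this test.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            slots.append((int(fields[0]), int(fields[1])))
    if rp_version is None:
        raise ValueError("no 'CEF table version' line found")
    stale = [(slot, ver) for slot, ver in slots if ver != rp_version]
    return rp_version, stale


if __name__ == "__main__":
    rp_version, stale = find_stale_linecards(SAMPLE)
    print("RP CEF table version:", rp_version)
    for slot, ver in stale:
        print("slot %d is at version %d" % (slot, ver))
    if not stale:
        print("all linecards match the RP")
```

A one-off mismatch on a busy box may just be an update in flight (that
part is my assumption), so it is probably worth sampling a few times
before blaming IPC.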

>
> We're approaching a Gb/s total at peak now and from what I've heard cef
> problems will likely increase with traffic. Is there a "Recognizing and
> Killing a CEF Issue During Production Without Causing Serious Downtime Best
> Practices" doc? :-)
>

I don't know that traffic load has any bearing on the problems above.
It's more likely tied to table size; however, larger amounts of traffic
can tickle bugs as new memory locations get touched that weren't
touched before.

Some ideas.

Tony


