[c-nsp] N7K, SUP1, M1/M2/F2E, 6.2(10)

Phil Mayers p.mayers at imperial.ac.uk
Tue Dec 2 07:45:25 EST 2014


On 02/12/14 08:45, Saku Ytti wrote:

> For what problematic old fashioned architecture/design JunOS has, I've only
> ever seen similar programming issues due to ISSU in JunOS.
>
> I also don't see these issues in other Cisco kits, CRS1, ASR1k, ASR9k. I
> wonder if CSCO has recognized the same, or are these issues just treated as
> independent bugs rather than indication of some larger problem.
> Or am I seeing pattern where none exists?

I think the pattern is real. Like you, I've seen a surprisingly high
number of FIB misprogramming incidents on this platform over the years.

I'm assuming EARL7/8 have some specific characteristics that trigger
these, for example strict timing or ordering requirements when
programming, and that perhaps the IOS architecture - cooperative
(yielding) multitasking - makes bugs in this area likely.

No idea why it happens on NX-OS, but I assume they're re-using the HAL
and the bugs probably live in there.

What I find most frustrating is that you can't "clear [mls|hardware]
..." when these occur. There seems to be no way of resetting it to a
known-good state and reprogramming from scratch short of a reload; I
would rather have a 10-second outage whilst the PFC is cleared and
reprogrammed than a 180-second one as the box is reloaded :o/


FWIW I have seen FIB misprogramming on Juniper SRX high-end boxes where
the tnp messages only propagate to 3 of 4 FPCs, which causes odd
problems. Those are typically easier to clear though.
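
To illustrate the "3 of 4 FPCs" failure mode, here is a rough sketch of
the kind of consistency check I mean: compare each line card's
forwarding table against what the control plane expects and report
which cards differ. Everything in it is an assumption for illustration;
the function name, the FPC labels, the prefixes and the next hops are
made up, and it does not talk to any real Junos or NX-OS API.

```python
# Hypothetical sketch: the tables below are invented sample data, not
# output from a real device, and find_fib_mismatches is not a vendor API.

def find_fib_mismatches(rib, fibs):
    """Return {fpc: {prefix: (expected, actual)}} for each FPC whose
    forwarding table differs from the expected table in rib."""
    report = {}
    for fpc, fib in fibs.items():
        diffs = {}
        # Check prefixes present on either side, so that both missing
        # and stale entries are reported.
        for prefix in rib.keys() | fib.keys():
            expected = rib.get(prefix)
            actual = fib.get(prefix)
            if expected != actual:
                diffs[prefix] = (expected, actual)
        if diffs:
            report[fpc] = diffs
    return report


# Expected state: prefix -> next-hop interface.
rib = {"10.0.0.0/24": "ge-0/0/1", "10.0.1.0/24": "ge-0/0/2"}

# Three cards received both updates; the fourth missed the second one.
fibs = {
    "fpc0": dict(rib),
    "fpc1": dict(rib),
    "fpc2": dict(rib),
    "fpc3": {"10.0.0.0/24": "ge-0/0/1"},
}

mismatches = find_fib_mismatches(rib, fibs)
print(mismatches)
```

With the sample data above, only fpc3 is reported, with 10.0.1.0/24
missing. On a real box the hard part is getting trustworthy dumps of
the hardware tables in the first place, which this sketch simply
assumes.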

