[nsp] Stable 6500 hybrid code?
Steve Francis
steve@expertcity.com
Wed, 20 Nov 2002 20:47:22 -0800
What are the current recommendations anyone has for stable 6500 code,
for hybrid mode SupII/MSFC2?
(Fairly vanilla BGP, OSPF, HSRP, with some PBR)
We have been running 6.3(6) CatOS, 12.1(8b)E9 IOS.
However, this morning we saw inconsistencies between the CEF tables on the
switch and the router. At first it looked like an RPF error (the switch
would inconsistently drop packets, but only if the source address was routed
out one particular peering), yet the RPF counters did not increment.
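For anyone following the RPF theory: a strict-mode unicast RPF check accepts a packet only if the best route back to its source leaves via the interface the packet arrived on. The sketch below assumes that model (it is not how the Sup2/PFC2 hardware implements it) and uses hypothetical prefixes and interface names. It shows how a switch holding a stale copy of one FIB entry would drop traffic sourced from a prefix routed out one peering, while the router's own table says the same packet is fine.

```python
import ipaddress

# Hypothetical FIBs: prefix -> egress interface. Prefixes use documentation
# ranges and interface names are illustrative, not taken from the post.
router_fib = {
    "0.0.0.0/0": "Gi1/1",      # default via one peering
    "192.0.2.0/24": "Gi1/1",   # source prefix also routed via that peering
}
switch_fib = {
    "0.0.0.0/0": "Gi1/1",
    "192.0.2.0/24": "Gi2/1",   # inconsistent copy in the switch
}


def lookup(fib, addr):
    """Longest-prefix match: return the egress interface for addr, or None."""
    ip = ipaddress.ip_address(addr)
    best_net, best_iface = None, None
    for prefix, iface in fib.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_iface = net, iface
    return best_iface


def strict_rpf_ok(fib, src, ingress):
    """Strict-mode uRPF: accept only if the route back to src uses the ingress interface."""
    return lookup(fib, src) == ingress


if __name__ == "__main__":
    for name, fib in (("router", router_fib), ("switch", switch_fib)):
        # Packet sourced from 192.0.2.10 arriving on Gi1/1
        print(name, strict_rpf_ok(fib, "192.0.2.10", "Gi1/1"))
```

With these tables the router accepts the packet and the switch rejects it, which matches the "only sources routed out one peering" pattern even though a real RPF drop would normally show up in the counters.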
To get around that, we reloaded the router; after that basically nothing
worked, and we had to admin down almost all interfaces to get a working network.
(While you could ping an interface of the router from a router on a local
subnet, and things like the router's loopback were being advertised in
OSPF, you could not ping the loopback even from a router on an adjacent,
shared interface.) An ACL entry with the log keyword made individual IPs
work, by forcing their traffic to be CPU switched.
At this point the TAC engineer on the router tried "no mls ip unicast",
which caused the whole switch to crash with a TLB Exception. (Even more
fun: it did not respond on the console except with garbled hex, and needed
a power cycle.)
I cannot find any bugs matching what we experienced, so I can't tell which
versions fix them.
Most importantly, does anyone have recommendations for stable CatOS and IOS?
Does anyone recognize the above bugs?
Does anyone have any idea how to get a 6500 running again if it crashes and
outputs this:
TLB Exception (load/instruction fetch) occurred.
Software version = 6.3(6)
Process ID #1b, Name = Fib
EPC: 809EFC54
{stack trace}
GDB: TLB Exception (load/instruction fetch)
GDB: The system has trapped into the debugger.
GDB: It will hang until examined with gdb.
Please use normal gdb. special gdb will not work on this apollo+ board
||||$S10#b4
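The last line of that dump may not be pure garbage. Assuming the switch's debugger stub uses standard GDB remote serial protocol framing ("$payload#checksum", where the checksum is the payload bytes summed modulo 256 in two hex digits), "$S10#b4" checks out as a valid stop-reply packet reporting signal 0x10. The sketch below treats the leading "||||" as console noise; what signal 16 means on this particular board is not something I can confirm.

```python
def rsp_checksum(payload: str) -> str:
    """GDB remote serial protocol checksum: the sum of the payload bytes
    modulo 256, written as two lowercase hex digits."""
    return f"{sum(payload.encode('ascii')) % 256:02x}"


def parse_packet(text: str):
    """Extract '$<payload>#<xx>' from a line of console output and verify it.
    Returns (payload, checksum, valid)."""
    start = text.index("$")
    end = text.index("#", start)
    payload = text[start + 1:end]
    checksum = text[end + 1:end + 3]
    return payload, checksum, rsp_checksum(payload) == checksum.lower()


def stop_signal(payload: str):
    """An 'S' stop-reply packet carries the signal number as two hex digits."""
    if len(payload) == 3 and payload[0] == "S":
        return int(payload[1:], 16)
    return None


if __name__ == "__main__":
    payload, checksum, valid = parse_packet("||||$S10#b4")
    print(payload, checksum, valid, stop_signal(payload))
```

Run against the console line, this prints the payload "S10", checksum "b4", a successful checksum match, and signal number 16, which suggests the stub was waiting for a gdb session on the console port rather than emitting random bytes.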
Getting remote staff to power cycle the remote core switches was not the
quickest way to restore service. (Incidentally, the switch failed in an
interesting way: I could still talk to some nodes attached to it, but it
seemed to take out most nodes on its functionally paired switch.)
Thx