[c-nsp] IS-IS Emergency

Justin Shore justin at justinshore.com
Wed Jun 20 14:40:12 EDT 2007


Many thanks to everyone for the replies.  Your suggestions were very
helpful.  TAC found the problem in the end.  The overload-bit was
causing LSPs to be ignored in a very bad place (the backbone).  This
caused widespread issues throughout the network (loss of the default
route, for example) that made us look in all directions at once.  Once
the TAC engineer found the problem the fix was simple, and it was taken
care of less than an hour into the business day.  I'm still at a loss
as to how this happened in the first place.  My RANCID archives show
set-overload-bit being used on the other 7600 many weeks prior to this
morning's incident.  I'm pretty sure I didn't add it this morning when
I removed a number of old nets.  We did an SRB1 upgrade at the same
time.  Surely the upgrade wouldn't have added it when the box first
booted; that doesn't make any sense.  I'm not sure how it got into the
config, but it was definitely there after the reboot.  Unfortunately
I've already overrun my scrollback buffer.
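
For anyone chasing the same symptom, a rough sketch of how the
overload bit can be spotted is below.  It assumes plain IOS show
commands on the 7600s and is not the exact sequence TAC ran.

```
! Is the overload bit configured at all under the IS-IS process?
show running-config | include set-overload-bit

! Look at the ATT/P/OL column of the Level-2 database.
! A router advertising the overload bit shows 1 in the last field
! (for example 0/0/1), and other routers avoid it for transit.
show isis database level-2
```

In the database output quoted further down, every LSP shown has
ATT/P/OL at 0/0/0, so the router with the bit set would be one of the
LSPs not included in that paste.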

That brings up a related question.  Does anyone have any recommendations 
for using the overload-bit with a startup delay or BGP hold?  We set it 
to 5 minutes on boot.  I figure that's just enough time for BGP to 
settle down.  These 720-3BXLs are in an iBGP mesh with both border 
routers and get a full Internet table from both.  The RIB Update and BGP 
Scanner processes don't usually settle down for 3-4 minutes.  Should I 
set a hard timeframe or should I just set it up to wait for BGP?
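
To make the two options concrete, here is a minimal sketch of each.
It assumes an untagged "router isis" process and that the SRB image
accepts the wait-for-bgp keyword; both are assumptions, so check the
command reference for your release.  Only one of the two lines would
be configured at a time.

```
router isis
 ! Option 1: hard timer.  Advertise the overload bit for 300 seconds
 ! after boot, then clear it regardless of BGP state.
 set-overload-bit on-startup 300
 !
 ! Option 2 (instead of option 1): keep the overload bit set until
 ! BGP signals that it has converged.
 set-overload-bit on-startup wait-for-bgp
```

Either form differs from a bare "set-overload-bit" with no on-startup
keyword, which leaves the bit set permanently until it is removed from
the config.  As far as I know, wait-for-bgp also has a built-in
fallback that clears the bit after about 10 minutes if BGP never
signals convergence, but that is worth confirming for the release in
use.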

Thanks again for the assistance this morning.

Justin


Justin Shore wrote:
> During maintenance last night we somehow lost the IS-IS default route on 
> all our access edge routers.  The default is originated on 2 borders 
> which are connected to a pair of 7613s.  The 7613s were propagating that 
> default route to all the other IS-IS devices and to another POP.  We're 
> one big L2 domain on all devices.  This worked perfectly until last 
> night.  The edge devices are seeing the default (this can be seen in the 
> database) but it is not being installed in the RIB for some reason.  Any 
> ideas?
> 
> IS-IS Level-2 Link State Database:
> LSPID                 LSP Seq Num  LSP Checksum  LSP Holdtime      ATT/P/OL
> 7206-1.clr.00-00      0x00000157   0x2DF5        59275             0/0/0
>    Area Address: 49.0010
>    NLPID:        0xCC
>    Hostname: 7206-1.clr
>    IP Address:   10.64.0.1
>    Metric: 100        IP 10.64.0.128/31
>    Metric: 100        IP 10.64.0.130/31
>    Metric: 100        IS-Extended 7206-1.clr.04
>    Metric: 100        IS-Extended 7206-1.clr.03
>    Metric: 0          IP 0.0.0.0/0
>    Metric: 10         IP 10.64.0.1/32
>    Metric: 10         IP 64.71.98.59/32
> 7206-1.clr.03-00      0x0000006C   0x809E        14112             0/0/0
>    Metric: 0          IS-Extended 7206-1.clr.00
>    Metric: 0          IS-Extended 7613-1.clr.00
> 7206-1.clr.04-00      0x0000006E   0xB655        10927             0/0/0
>    Metric: 0          IS-Extended 7206-1.clr.00
>    Metric: 0          IS-Extended 7613-2.clr.00
> 
> I'm manually installing static defaults to work around the problem.  I 
> also noticed that the Lo0 addresses of the 2 borders are no longer being 
> installed in each other's RIB.  There is general flakiness all around.  
> We're running SRB on the Sup720-3BXLs.
> 
> Thanks
>   Justin
> 


