[c-nsp] IS-IS LSP Generation/Expiry + Database Optimization - Issue

Sun Feb 22 00:57:02 EST 2009

Hello all.

We have a query that begs operational feedback from folk 
here re: IS-IS, particularly, the 'max-lsp-lifetime' and 
'lsp-refresh-interval' features that Cisco recommend as good 
practice for IS-IS deployments.

In our experience using 'max-lsp-lifetime 65535' and 
'lsp-refresh-interval 65000' as encouraged, we have 
encountered a couple of issues with recovering/restarting 
routers, and wonder whether this is a bug or a feature of 
IOS's implementation of the IS-IS protocol.
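
For reference, a minimal sketch of how these two commands sit 
under the IS-IS process (the process tag and the rest of the 
IS-IS configuration are left out; this is illustrative, not a 
copy of any running config):

```
router isis
 max-lsp-lifetime 65535
 lsp-refresh-interval 65000
```

With these values an LSP is refreshed roughly every 18 hours, 
535 seconds before it would age out; the IOS defaults are 
1200 and 900 seconds respectively.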

We have seen that routers that return to operational status 
(either from a software crash or normal reload) may have 
some of their v4/v6 Loopback addresses not present in the 
IS-IS routing tables of the other routers in the network. 
This in turn prevents iBGP sessions from establishing.

What's more interesting is that as all routers in the 
network are dual-homed to the core, each with 2x iBGP 
sessions to 2x route reflectors, we have found that both 
sessions may be up for v4, but only one for v6, where the v6 
session that's down is because the other route reflector 
doesn't see the recovered router's v6 Loopback address in 
its IS-IS routing table. In other cases, the reverse is 
true, i.e., both v6 sessions are up, but only one or none of 
the v4 sessions is up - this issue can occur in various 
permutations, but you get the point.

To resolve this issue, we have seen that resetting the IS-IS 
process on the recovered router fixes the problem. In other 
cases, doing this on the DIS also solves the problem, but 
since the DIS originates the pseudonode LSP for the rest of 
the network, we try to avoid doing it there unless really 
necessary.
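
As a rough sketch of the checks and the reset described above 
(assuming standard IOS exec commands; which exact commands were 
used is not stated here, and 'clear isis *' is disruptive since 
it clears all IS-IS state on the router):

```
show isis database detail
show ip route isis
show ipv6 route isis
clear isis *
```

The first three would be run on a router that is missing the 
Loopback, to see whether the recovered router's LSP is in the 
database and carries the v4/v6 prefixes; the last is the reset 
on the recovered router.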

Further, we have seen a somewhat similar issue with our 
backup DIS, where updates current on our DIS are sometimes 
not seen on the backup DIS.

We are wondering whether this is a function of the 
'max-lsp-lifetime' and 'lsp-refresh-interval' values we have 
configured, or whether this is a bug. We are inclined to use 
more aggressive values than what Cisco recommend, because we 
can afford the "chatter" in our Level-1 areas (Gig-E or 
10-Gig-E backbone), and CPU really isn't a big problem (the 
IS-IS database is very lean, Loopbacks + infrastructure 
only, and the CPUs are very fast).

We are running 12.2(33)SRC3 on all ISes, and our DIS and 
backup DIS are running 12.2(33)SXH3.

We've opened a case with TAC to figure out whether this is 
still recommended practice. 

For comparison, we've also had JunOS-based routers recover, 
and haven't seen this issue with them (they talk to the same 
SUP720-3BXL-based DISes).

Appreciate any (operational) feedback.

Cheers,

Mark.