[c-nsp] IS-IS max-area-addresses
Justin Shore
justin at justinshore.com
Wed Apr 18 19:47:19 EDT 2007
I had to up the maximum number of IS-IS areas across our network last
night. Apparently max-area-addresses is one of those things that must
be common across IS-IS neighbors for adjacencies to build, or so it
seemed. I upped it to 254 though I only need a couple dozen. I did
this on 3 3800s, 2 2800s, 1 7206VXR and 2 Sup720-3BXLs in separate
chassis. All devices are running the latest greatest 12.4T except for
the 3BXLs which are running SRB.
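
For readers unfamiliar with why the value has to match: ISO 10589 puts a one-byte "Maximum Area Addresses" field at the end of the 8-byte common header of every IS-IS PDU. A value of 0 is treated on reception as 3, and a PDU whose value differs from the local setting is discarded. The Python below is a minimal sketch of that check, not how IOS implements it. The function names and the sample header bytes are hypothetical, made up for illustration.

```python
import struct

ISIS_DISCRIMINATOR = 0x83  # Intradomain Routing Protocol Discriminator for IS-IS


def effective_max_areas(raw_value):
    """A header value of 0 is treated on reception as 3 (ISO 10589)."""
    return 3 if raw_value == 0 else raw_value


def check_max_area_addresses(pdu, local_max_areas):
    """Return (accepted, received_raw_value) for an IS-IS PDU.

    Reads byte 7 of the 8-byte common header and compares it with the
    locally configured value. Illustrative only.
    """
    if len(pdu) < 8:
        raise ValueError("PDU shorter than the 8-byte common header")
    (discriminator, _length, _version_ext, _id_length,
     _pdu_type, _version, _reserved, max_areas) = struct.unpack("!8B", pdu[:8])
    if discriminator != ISIS_DISCRIMINATOR:
        raise ValueError("not an IS-IS PDU")
    accepted = effective_max_areas(max_areas) == effective_max_areas(local_max_areas)
    return accepted, max_areas


# Hypothetical L1 LSP header (PDU type 18, header length 27) carrying the
# default value 0 in the Maximum Area Addresses byte.
header = bytes([0x83, 27, 1, 0, 18, 1, 0, 0])
print(check_max_area_addresses(header, 254))  # (False, 0)
print(check_max_area_addresses(header, 3))    # (True, 0)
```

A receiver configured for 254 would reject the sample header, while one left at the default of 3 would accept it.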
Our NMS alerted me to a problem about 18 hours after the maintenance
window. 3 of the routers dropped off the network. As it turns out 2 of
the 3800s that provide SSL VPN termination and one of the 2800s stopped
sending IS-IS routes. At least I think they stopped *sending* them.
The routes may have been filtered on the 7600s; I'm not sure. Can
anyone refresh my memory on how to see what IS-IS routes are being
advertised? I remember OSPF and BGP but not IS-IS. Back to my story.
I checked the IS-IS neighbors on the 7600s and found that the System ID for
each of the affected routers was no longer the value of the hostname
command like normal but was instead their NSAP. For example:
7613-2.tld      L1  Vl4005  10.64.130.3  UP  9  7613-2.tld.0F
0100.6400.0033  L1  Gi9/1   10.64.0.176  UP  8  0100.6400.0033.02
The Circuit ID was also affected. None of the routes that were supposed
to be advertised were in the 7600's RIB. The affected routers did
however have the routes from the 7600s. This is why my NMS thought the
hosts were down. The routes for Lo0 weren't being propagated. The
reason one of the 2800s was still being hit by the NMS was because I had
a static route to Lo0 on that 7600. Removing this demonstrates the problem.
All 4 affected routers had errors similar to this:
022564: Apr 17 19:51:41 CDT: %CLNS-3-BADPACKET: ISIS: L1 LSP, bad
max-area-addresses 0, ID 0100.6400.0034.00-00, seq 85, ht 0 from
0018.7425.6500 (GigabitEthernet0/0)
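
When several routers log this message, it can help to pull the fields out of the syslog text and compare them side by side. The Python below is a rough sketch that parses only the message format quoted above. The regular expression, the group names, and the function name are assumptions made for illustration, and other IOS releases may word the message differently.

```python
import re

# Pattern written against the single message quoted in the post; the field
# names (level, pdu, max_areas, ...) are illustrative, not Cisco terminology.
BADPACKET_RE = re.compile(
    r"%CLNS-3-BADPACKET: ISIS: (?P<level>L[12]) (?P<pdu>\w+), "
    r"bad max-area-addresses (?P<max_areas>\d+), "
    r"ID (?P<lsp_id>[0-9A-Fa-f.]+-[0-9A-Fa-f]{2}), "
    r"seq (?P<seq>\d+), ht (?P<ht>\d+) "
    r"from (?P<snpa>[0-9A-Fa-f.]+) \((?P<interface>[^)]+)\)"
)


def parse_badpacket(message):
    """Return a dict of fields from a BADPACKET message, or None if no match."""
    # The message is wrapped across lines in the post; collapse whitespace.
    flat = " ".join(message.split())
    match = BADPACKET_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("max_areas", "seq", "ht"):
        fields[key] = int(fields[key])
    return fields


sample = """022564: Apr 17 19:51:41 CDT: %CLNS-3-BADPACKET: ISIS: L1 LSP, bad
max-area-addresses 0, ID 0100.6400.0034.00-00, seq 85, ht 0 from
0018.7425.6500 (GigabitEthernet0/0)"""

parsed = parse_badpacket(sample)
print(parsed["max_areas"], parsed["lsp_id"], parsed["interface"])
```

Grouping the parsed records by `lsp_id` and `snpa` would show which originator and which sending neighbor are behind the rejected LSPs.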
The 7206 and other 3845 that perform border functions were not affected
at all; they are also L1-2. To resolve this issue on the 2 3845s I
rebooted them both. The problem went away and has not returned. I have
not rebooted the 2800s yet. I opened a TAC case instead.
I do not believe this is a one-off problem. Not when 4 routers show the
problem. One common element is that all 4 routers are in a common VLAN
that's trunked between the 7600s. Another common element is that they
are all L1. The only routers with more than one area configured are the
7600s in the core. For the change I started with the 2800s, then the
pair of 3800s, the single 3800 and the corresponding 7600. After the
adjacencies re-established I did the other 7600 and 7206.
I held off on the reboot to give Cisco an opportunity to get in and look
at the routers when they are hosed. This sounds like a bug to me and
I'd like to see it fixed. I will have to reboot at least one 2800
tomorrow though. I have to restore service to that site. I'm trying to
get my TAC engineer to do whatever he needs to do to get someone's eyes
on the problem before I reboot.
Any ideas? Thanks
Justin