[c-nsp] Reasons for "random" ISIS flapping?
Peter Rathlev
peter at rathlev.dk
Wed Aug 7 05:55:11 EDT 2013
On Wed, 2013-08-07 at 11:47 +0300, Saku Ytti wrote:
> a) are you seeing input drops in the hold-queue? (try 1k or even 4k
> hold-queue input)
Only one of the interfaces shows any relevant number of input queue
drops (122 drops), and the interface that has experienced the most lost
adjacencies has only 1 drop.
ROUTR-A#sh int te5/4 | incl Input queue
Input queue: 0/256/1/0 (size/max/drops/flushes); Total output drops: 0
ROUTR-A#sh int gi4/2 | incl Input queue
Input queue: 0/256/1/0 (size/max/drops/flushes); Total output drops: 309614
ROUTR-A#sh int gi5/1 | incl Input queue
Input queue: 1/256/122/0 (size/max/drops/flushes); Total output drops: 107672
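To compare these counters across many interfaces, the "Input queue" line can be parsed mechanically. This is only a minimal Python sketch: the function name parse_input_queue and the regular expression are my own illustrative assumptions, and it handles nothing beyond the line format shown above.

```python
import re

# Matches the line format shown above, for example:
# "Input queue: 1/256/122/0 (size/max/drops/flushes); Total output drops: 107672"
INPUT_QUEUE_RE = re.compile(
    r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)\s*\(size/max/drops/flushes\);"
    r"\s*Total output drops:\s*(\d+)"
)


def parse_input_queue(line):
    """Return the counters as a dict, or None if the line does not match."""
    match = INPUT_QUEUE_RE.search(line)
    if match is None:
        return None
    size, maximum, drops, flushes, output_drops = (int(g) for g in match.groups())
    return {
        "size": size,
        "max": maximum,
        "drops": drops,
        "flushes": flushes,
        "output_drops": output_drops,
    }
```

Feeding it the gi5/1 line above should give drops=122 and output_drops=107672.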
The device in question has a rather large-ish number of SPD drops (42
ppm) according to "show ibc", but other devices in the network have much
higher values and no comparable problems. Is ISIS eligible for SPD or
prioritized? Top 5 devices with IBC drops among the C6k's in the
network:
Actual-paks     Drops    ppm  SPD-drops  ppm
-----------  --------  -----  ---------  ---
 1807024142   1247536    690    1050616  581
 1985318421  90561796  45616     321319  162
  133750298   2974141  22237       9896   74
 3633687626  57832307  15916     150898   42
 2244275284  13140977   5855      90766   40
The "ROUTR-A" is number 4 on this list.
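For reference, the ppm columns are just drops per million packets. The sketch below assumes ppm = drops / Actual-paks * 1,000,000, rounded to the nearest integer; the function name drops_ppm is hypothetical and only there for illustration.

```python
def drops_ppm(actual_paks, drops):
    """Drops per million packets, rounded to the nearest integer."""
    if actual_paks <= 0:
        raise ValueError("actual_paks must be positive")
    return round(drops * 1_000_000 / actual_paks)


# ROUTR-A (row 4 of the list): Actual-paks, Drops, SPD-drops
paks, drops, spd_drops = 3633687626, 57832307, 150898
print(drops_ppm(paks, drops))      # 15916
print(drops_ppm(paks, spd_drops))  # 42
```

With round-to-nearest this reproduces both ppm columns for all five rows of the list.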
> b) is it busy running some other process? (try process-max-time 60)
We actually use "process-max-time 50" generally on all these devices.
The affected device is no different from the others in that regard. On
the other hand, might this be too low? Maybe the ISIS process needs more
than 50 ms to parse the hello packets in some strange instance, and the
voluntary yielding (if applicable) means the packets are left unparsed.
Just blind guessing of course. :-)
> c) is it software defect
We're planning on upgrading to SXJ in the near future and might go with
15.1SY since others (e.g. Phil) seem to like it.
> I also couldn't help noticing you're running L1, why is this? It seems to
> be quite rare these days, you really have separate core L2 and various L1
> islands?
We only have one area and should actually be using L2 only. We hadn't
thought it through when we decided on L1 many years ago. I'm thinking
that L1-only or L2-only is better than L1+L2 everywhere, and the only
practical drawback of using L1 seems to be the inability to inject a
default route. Any other gotchas we should be worrying about?
--
Peter