[c-nsp] CEF inconsistency with 12.2(25)S1

Mon Nov 1 17:14:14 EST 2004

On Mon, Nov 01, 2004 at 12:29:20PM -0500, Rodney Dunn wrote:

> It's something pretty new so I don't have a lot of
> flying time with it yet but there are event traces
> for CEF in new code.
> 
> See if you can catch a route that is broken and see
> if it shows up changing in the event log.
> 
> 103_#sh monitor event-trace cef ipv6 ?

Okay. I didn't dump the output of "... latest" regularly, since the
entries are timestamped and are not deleted.

Here we go:

BACK1-C7206-OTN-SATIP#sh bgp ipv6 unicast 2001:8E8::/32
BGP routing table entry for 2001:8E8::/32, version 2359
Paths: (5 available, best #3, table Global-IPv6-Table)
  Advertised to update-groups:
     1          3          4          6
  6453 10566 15589 5609 15469
    2001:5A0:200::5 from 2001:5A0:200::5 (66.110.0.14)
      Origin IGP, metric 20, localpref 100, valid, external
      Community: 29259:2100 29259:2160 29259:2161
  6453 10566 15589 5609 15469, (received-only)
    2001:5A0:200::5 from 2001:5A0:200::5 (66.110.0.14)
      Origin IGP, localpref 100, valid, external
  8767 5539 3257 5609 15469
    2001:A60:0:201::1:1 from 2001:A60:0:201::1:1 (62.245.135.1)
      Origin IGP, metric 5, localpref 110, valid, external, best
      Community: 0:110 0:1000 5539:100 8767:2000 29259:2100 29259:2170 29259:2171
  8767 5539 3257 5609 15469, (received-only)
    2001:A60:0:201::1:1 from 2001:A60:0:201::1:1 (62.245.135.1)
      Origin IGP, localpref 100, valid, external
      Community: 0:110 0:1000 5539:100 8767:2000
  8767 5539 3257 5609 15469
    2001:1B10::12 (metric 20) from 2001:1B10::12 (83.170.0.2)
      Origin IGP, metric 5, localpref 110, valid, internal
      Community: 0:110 0:1000 5539:100 8767:2000 29259:2100 29259:2170 29259:2171

BACK1-C7206-OTN-SATIP#sh ipv6 route 2001:8E8::/32
IPv6 Routing Table - 580 entries
[...]
B   2001:8E8::/32 [20/5]
     via 2001:A60:0:201::1:1, GigabitEthernet0/1.7

BACK1-C7206-OTN-SATIP#sh ipv6 cef 2001:8E8::/32
2001:8E8::/32
  nexthop FE80::20C:86FF:FE9A:3819 GigabitEthernet0/3

BACK1-C7206-OTN-SATIP#sh monitor event-trace cef IPv6 2001:8E8:: all detail

Nov  1 11:39:45.922:  [Default] 2001:8E8::/32         NDB up                   [OK]
Nov  1 18:12:12.391:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:12:16.475:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:25:21.259:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:25:21.275:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:28:02.687:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:28:02.703:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:29:01.627:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:29:01.639:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:29:28.359:  [Default] 2001:8E8::/32         NDB not default          [Fail]
Nov  1 18:29:28.375:  [Default] 2001:8E8::/32         NDB modified             [OK]
Nov  1 19:03:46.875:  [Default] 2001:8E8::/32         NDB modified             [OK]
Nov  1 19:03:46.892:  [Default] 2001:8E8::/32         NDB not default          [Fail]

There has been some severe BGP flapping for this prefix. Unfortunately,
I do not have any monitored destination in this prefix, so I can't tell
exactly when connectivity was lost. For that I would probably have
to dump "sh ipv6 cef" every minute :-\

 2004-11-01 19:04:16 | 29259 8767 5539 3257 5609 15469
 2004-11-01 19:03:46 | 29259 8767 5539 3257 5609 15469
 2004-11-01 18:30:53 | 29259 6453 10566 15589 5609 15469
 2004-11-01 18:29:57 | 29259 6453 10566 13944 4555 6830 3320 1275 5609 15469
 2004-11-01 18:29:27 | 29259 8767 5539 8472 1752 2110 12702 5424 1275 5609 15469
 2004-11-01 18:28:33 | 29259 8767 5539 3320 15589 1275 5609 15469
 2004-11-01 18:28:02 | 29259 8767 5539 3320 1275 5609 15469
 2004-11-01 18:25:50 | 29259 8767 5539 3320 1275 5609 15469
 2004-11-01 18:25:22 | 29259 8767 5539 3320 5609 15469
 2004-11-01 18:21:25 | 29259 8767 5539 3320 5609 15469
 2004-11-01 18:17:56 | 29259 8767 5539 3257 5609 15469
 2004-11-01 18:12:42 | 29259 8767 5539 3320 5609 15469
 2004-11-01 18:12:12 | 29259 8767 4589 3257 5609 15469

Note that these BGP dumps were taken from an eBGP peer daemon connected to
an iBGP peer of the two misbehaving routers, so the updates above are
definitely not all the updates the problematic router received.
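
For the once-a-minute "sh ipv6 cef" dump mentioned above, the comparison
itself could be sketched roughly as follows. This is only an illustration
under assumptions: the outputs of "sh ipv6 route <prefix>" and
"sh ipv6 cef <prefix>" are assumed to be already captured as text (how they
are collected from the router is left out), the function names are
hypothetical, and only the egress interface is compared, which is a
heuristic that fits the single-path output shown in this post and may not
hold for multipath or recursive entries.

```python
import re


def rib_interface(route_output):
    """Return the egress interface from 'sh ipv6 route <prefix>' text, or None.

    Assumes a line of the form '  via <next-hop>, <interface>' as in this post.
    """
    m = re.search(r"^\s*via\s+\S+,\s*(\S+)\s*$", route_output, re.MULTILINE)
    return m.group(1) if m else None


def cef_interface(cef_output):
    """Return the egress interface from 'sh ipv6 cef <prefix>' text, or None.

    Assumes a line of the form '  nexthop <address> <interface>' as in this post.
    """
    m = re.search(r"^\s*nexthop\s+\S+\s+(\S+)\s*$", cef_output, re.MULTILINE)
    return m.group(1) if m else None


def is_consistent(route_output, cef_output):
    """True only if both outputs name an interface and the two agree."""
    rib = rib_interface(route_output)
    cef = cef_interface(cef_output)
    return rib is not None and rib == cef
```

Fed with the two outputs for 2001:8E8::/32 shown earlier, this would report
an inconsistency, since the routing table points at GigabitEthernet0/1.7
while CEF points at GigabitEthernet0/3.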

The other backbone router shows no problem for 2001:8E8::/32, but shows the
same problem for 2001:16A8::/32, which also had a mass of updates (BGP
count-to-infinity, prefix withdrawn, came back 45 minutes later). The last
lines of the event-trace look like this:

Nov  1 17:00:04.740:  [Default] 2001:16A8::/32        NDB not default          [Fail]
Nov  1 17:00:57.048:  [Default] 2001:16A8::/32        NDB not default          [Fail]
Nov  1 17:00:57.096:  [Default] 2001:16A8::/32'00     FIB remove (flagged)     [OK]
Nov  1 17:00:57.096:  [Default] 2001:16A8::/32'00     FIB remove (deleted)     [OK]
Nov  1 17:00:57.094:  [Default] 2001:16A8::/32        NDB down                 [OK]
Nov  1 17:38:36.478:  [Default] 2001:16A8::/32        NDB up                   [OK]
Nov  1 17:42:26.374:  [Default] 2001:16A8::/32        NDB modified             [OK]
Nov  1 17:42:36.158:  [Default] 2001:16A8::/32        NDB not default          [Fail]
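
Both traces in this post end with an "NDB not default [Fail]" record. A
minimal sketch for spotting such prefixes in a saved trace is below. The
line layout is inferred only from the output pasted here, the function
names are hypothetical, and the "'00"-style suffix on some prefixes is
stripped purely for grouping without assuming what it means. Whether a
trailing [Fail] really corresponds to a stale CEF entry is not established
here; the script only lists candidates worth checking by hand.

```python
import re

# Layout assumed from the trace lines in this post:
#   <timestamp>:  [<table>] <prefix>   <event text>   [OK|Fail]
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:.]+):\s+"
    r"\[(?P<table>[^\]]+)\]\s+"
    r"(?P<prefix>\S+)\s+"
    r"(?P<event>.*?)\s+"
    r"\[(?P<status>OK|Fail)\]\s*$"
)


def parse_trace(text):
    """Parse event-trace text into a list of dicts, kept in log order."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip prompts, blank lines, anything unrecognised
        rec = m.groupdict()
        # Drop a "'00"-style suffix so all records group under one prefix.
        rec["prefix"] = rec["prefix"].split("'", 1)[0]
        records.append(rec)
    return records


def last_event_per_prefix(records):
    """Map each prefix to its last record as it appears in the log."""
    last = {}
    for rec in records:
        last[rec["prefix"]] = rec
    return last


def prefixes_ending_in_fail(text):
    """Return the sorted prefixes whose final trace record is marked [Fail]."""
    last = last_event_per_prefix(parse_trace(text))
    return sorted(p for p, rec in last.items() if rec["status"] == "Fail")
```

Records are deliberately taken in log order rather than sorted by
timestamp, since the output above contains at least one line whose
timestamp is slightly earlier than the line before it.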

> This way maybe we could catch some information prior to
> it happening.  The problem is you don't know when the event
> happens to collect the event logs so maybe have a script
> that runs the command over and over periodically with the
> latest option.
> 
> 103_#sh monitor event-trace cef ipv6 latest 

As far as I can see, that doesn't provide any additional information. The
router keeps the timestamp and everything else for the last 10000 records
anyway.

Bernhard