[j-nsp] bfd = busted failure detection :)

Hoogen hoogen82 at gmail.com
Mon Dec 14 13:23:48 EST 2009


Thanks for all the great info, Richard...

-Hoogen

On Mon, Dec 14, 2009 at 1:23 AM, Richard A Steenbergen <ras at e-gerbil.net> wrote:

> On Sun, Dec 13, 2009 at 03:11:29AM -0600, Richard A Steenbergen wrote:
> > That one is pretty different from the usual slowness issue that seems to
> > be affecting most people. I just cleared BGP sessions on a router to
> > demonstrate the issue, portions of which you can see any time you make a
> > major routing change. Unfortunately (for my demonstration) this router
> > was pretty small and didn't exhibit any stalls in processing FIB
> > updates. The performance was pretty acceptable, fully syncing in under a
> > minute. I'm sure the simultaneous loss of IGP routes and having more
> > complex routing protocol configurations have something to do with it too.
>
> Oh what good timing, just had to reboot a router tonight to recover from
> a different Juniper bug (enabling graceful-switchover on a 9.5R3 box
> caused blackholing of traffic, disabling it didn't fix it, had to reboot
> the box to clear the issue which of course blew away all the state, so
> there will be no finding the root cause). But it did provide a perfect
> example of the FIB blocking issue, with the vast majority of the routing
> table blocking for over 13 minutes before finally installing within a
> few seconds.
>
> Here we are at just past the 13 minute mark, BGP fully synchronized, but
> the vast majority of the routing table not actually installed to FIB:
>
> Groups: 65 Peers: 92 Down peers: 15
> Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
> inet.0           2793497     333891          0          0          0     292429
> inetflow.0            27         27          0          0          0          4
> inet6.0             9438       2075          0          0          0        811
>
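To put a number on "the vast majority", the Pending column can be compared against Act Paths for each table. The following is a minimal Python sketch, assuming the whitespace-separated column layout of the summary excerpt above; `pending_ratios` is a hypothetical helper, not a Junos tool, and it only computes the raw ratio without making any further claim about what Junos counts in that column.

```python
# Hypothetical helper (not a Junos tool): the column layout is assumed
# from the "show bgp summary" excerpt quoted above.
SUMMARY = """\
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0           2793497     333891          0          0          0     292429
inetflow.0            27         27          0          0          0          4
inet6.0             9438       2075          0          0          0        811
"""


def pending_ratios(text: str) -> dict:
    """Map each table name to Pending / Act Paths."""
    ratios = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 7:
            continue  # not a data row
        active, pending = int(fields[2]), int(fields[6])
        ratios[fields[0]] = pending / active if active else 0.0
    return ratios


if __name__ == "__main__":
    for table, ratio in pending_ratios(SUMMARY).items():
        print(f"{table:<11} {ratio:6.1%} of active paths pending")
```

For inet.0 this works out to roughly 87.6% (292429 of 333891 active paths), which matches the "vast majority" description.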
> Here is the show krt queue output from the same time, showing almost
> nothing in the queue. A follow-up command a second later showed completely
> different items in the queue, leading one to believe that the krt queue
> was not stuck.
>
> Routing table add queue: 0 queued
> Interface add/delete/change queue: 0 queued
> Indirect next hop add/change: 0 queued
> MPLS add queue: 0 queued
> Indirect next hop delete: 2 queued
>             DELETE index 1048789
>             DELETE index 1048790
> High-priority deletion queue: 0 queued
> High-priority change queue: 0 queued
> High-priority add queue: 0 queued
> Normal-priority indirect next hop queue: 0 queued
> Normal-priority deletion queue: 0 queued
> Normal-priority composite next hop deletion queue: 0 queued
> Normal-priority change queue: 0 queued
> Normal-priority add queue: 7 queued
>                ADD gf 1 inst id 0 173.164.0.0/19 type 3
>         (20)
>                ADD gf 1 inst id 0 173.162.16.0/20 type 3
>         (20)
>                ADD gf 1 inst id 0 173.160.64.0/19 type 3
>         (20)
>                ADD gf 1 inst id 0 217.168.224.0/20 type 3
>         (20)
>                ADD gf 1 inst id 0 209.211.136.0/24 nexthop
>         x.x.x.x, xe-7/1/0.0
>         (19)
>                ADD gf 1 inst id 0 208.45.191.0/24 nexthop
>         x.x.x.x, xe-7/1/0.0
>         (19)
>                ADD gf 1 inst id 0 208.45.190.0/24 nexthop
>         x.x.x.x, xe-7/1/0.0
>         (19)
> Routing table delete queue: 0 queued
>
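The contrast that makes this output interesting is that the queue holds only a handful of entries while hundreds of thousands of routes are still pending. Summing the per-queue counters makes that explicit. The following is a minimal Python sketch, assuming every queue header has the form "<name>: <N> queued" as in the excerpt above; `queue_depths` is a hypothetical helper, and most of the per-entry lines are trimmed from the sample.

```python
import re

# "show krt queue" output quoted above; per-entry lines are trimmed
# except for two, which the header regex is expected to skip.
KRT_QUEUE = """\
Routing table add queue: 0 queued
Interface add/delete/change queue: 0 queued
Indirect next hop add/change: 0 queued
MPLS add queue: 0 queued
Indirect next hop delete: 2 queued
            DELETE index 1048789
High-priority deletion queue: 0 queued
High-priority change queue: 0 queued
High-priority add queue: 0 queued
Normal-priority indirect next hop queue: 0 queued
Normal-priority deletion queue: 0 queued
Normal-priority composite next hop deletion queue: 0 queued
Normal-priority change queue: 0 queued
Normal-priority add queue: 7 queued
               ADD gf 1 inst id 0 173.164.0.0/19 type 3
Routing table delete queue: 0 queued
"""

# Assumed header format: "<queue name>: <N> queued"
QUEUE_HEADER = re.compile(r"^(?P<name>[^:]+):\s+(?P<count>\d+) queued$")


def queue_depths(text: str) -> dict:
    """Return {queue name: queued count}, ignoring per-entry lines."""
    depths = {}
    for line in text.splitlines():
        match = QUEUE_HEADER.match(line.strip())
        if match:
            depths[match.group("name")] = int(match.group("count"))
    return depths


if __name__ == "__main__":
    depths = queue_depths(KRT_QUEUE)
    print("total queued:", sum(depths.values()))
    for name, count in depths.items():
        if count:  # only show non-empty queues
            print(f"  {name}: {count}")
```

Across all 14 queues the total is 9 entries. One reading, consistent with the observation above, is that the stall sits upstream of the krt queue rather than in it.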
> Here is an example of a route which has been stuck trying to install for
> over 8 minutes (first entry in a show route, the rest all look roughly
> the same though):
>
> 2.0.0.0/16         +[BGP/170] 00:08:40, MED 0, localpref 200, from xx.xx.xxx.xxx
>                      AS path: 5413 12654 I
>                    > to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
>                      to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
>                      to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
>                      to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
>                      to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
>                      to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
>                      to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
>                      to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
>
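When checking many routes like this one, it helps to pull the route age and the next-hop spread out of the text. The following is a minimal Python sketch, assuming the HH:MM:SS age format and the "to ... via <interface>," next-hop lines shown above; `summarize` is a hypothetical helper. Note that the age in show route is the RIB age. On its own it shows how long the route has been active, not that it is missing from the FIB.

```python
import re
from collections import Counter

# The "show route" entry quoted above, with wrapped lines rejoined.
ROUTE = """\
2.0.0.0/16         +[BGP/170] 00:08:40, MED 0, localpref 200, from xx.xx.xxx.xxx
                     AS path: 5413 12654 I
                   > to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
                     to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
                     to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
                     to xx.xx.xxx.xx via xe-3/2/0.0, label-switched-path XXXXX
                     to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
                     to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
                     to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
                     to xx.xx.xxx.xx via ae0.50, label-switched-path Bypass->xx.xx.xxx.xx->xx.xx.xxx.xx
"""

# Assumes an HH:MM:SS age; longer ages (with day/week prefixes such as
# "1w2d 03:04:05") are not handled by this sketch.
HEADER = re.compile(
    r"^(?P<prefix>\S+)\s+[*+-]?\[(?P<proto>\w+)/(?P<pref>\d+)\]\s+"
    r"(?P<age>\d+:\d{2}:\d{2})"
)
NEXT_HOP = re.compile(r"^\s*>?\s*to \S+ via (?P<ifl>[^,\s]+),")


def summarize(text: str):
    """Return (prefix, age in seconds, Counter of next-hop interfaces)."""
    lines = text.splitlines()
    header = HEADER.match(lines[0])
    if header is None:
        raise ValueError("unrecognized route header")
    hours, minutes, seconds = (int(x) for x in header.group("age").split(":"))
    age = hours * 3600 + minutes * 60 + seconds
    interfaces = Counter(
        m.group("ifl") for m in map(NEXT_HOP.match, lines[1:]) if m
    )
    return header.group("prefix"), age, interfaces


if __name__ == "__main__":
    prefix, age, ifls = summarize(ROUTE)
    print(f"{prefix}: active for {age}s, next hops per interface {dict(ifls)}")
```

For the route above this gives an age of 520 seconds and eight next hops, four via xe-3/2/0.0 (the LSPs) and four via ae0.50 (the bypasses).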
> The above is pretty representative of the issue, which has been going on
> in one form or another since around the mid 7.x's (confirmed by dozens
> of people I've talked to who saw the same behavior beginning at around
> the same time).
>
> --
> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>

