[j-nsp] bfd = busted failure detection :)
Richard A Steenbergen
ras at e-gerbil.net
Wed Dec 9 01:27:55 EST 2009
On Tue, Dec 08, 2009 at 07:54:49PM -0500, Ross Vandegrift wrote:
> On Fri, Dec 04, 2009 at 02:40:14PM -0600, Richard A Steenbergen wrote:
> > FYI I found the root problem and hereby take back any comments impugning
> > BFD's reputation. It turns out there actually WAS some kind of pfe bug
> > which was causing intermittent blackholing of traffic for a few seconds
> > at a time at seemingly random intervals several times a day. Ping from
> > the affected devices didn't catch the issue because of the re->pfe
> > forwarding path, only traffic routed entirely via pfe was being
> > affected. BFD was actually doing its job and detected the failures that
> > were too short to be noticed by normal routing protocols. I discovered
> > the issue on several MX960s (mostly running 9.2R4, but one pair was
> > running 9.4R3), and upgrading them to 9.5R3 seems to solve it (or
> > perhaps it was just the pfe restart that did it, remains to be seen).
>
> Is there a PR number for this issue? Sounds like it could be a drag.
Maybe, but I'm not sufficiently motivated to try and explain the issue
to JTAC to find out. :) FWIW we've now upgraded 3 of the MX960s that had
the issue from 9.2R4->9.5R3 and it resolved things completely. 9.2R4 is
a giant buggy mess for a lot of other reasons anyway; there is really
no good reason to still be running it. The only reason we still had it
deployed anywhere was the grief caused by the logical-routers ->
logical-systems rename in 9.3 (which broke backwards compatibility for
commit scripts). This was just extra motivation to get some long overdue
upgrades done.
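To illustrate why BFD flagged blackholes that the routing protocols never noticed: an asynchronous-mode BFD session is declared down once roughly (interval x multiplier) passes with no control packet received (RFC 5880), which is typically well under a second, while a BGP session only drops after its hold timer expires. The sketch below compares the two for an outage of a few seconds. The timer values (300 ms x 3 for BFD, a 90 s BGP hold time, a 3 s outage) are assumptions for illustration only, not the configuration from this thread, and the model ignores packet timing phase.

```python
# Minimal sketch: does a transient outage trip a given failure detector?
# All timer values below are illustrative assumptions, not taken from this thread.

def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Asynchronous-mode BFD declares the session down after `multiplier`
    consecutive intervals pass with no control packet received (RFC 5880)."""
    return interval_ms * multiplier

def outage_detected(outage_ms: int, detection_time_ms: int) -> bool:
    """Simplified model: a detector notices an outage only if the outage lasts
    at least as long as its detection time (ignores packet timing phase)."""
    return outage_ms >= detection_time_ms

# Assumed timers: BFD at 300 ms x 3 versus a 90 s BGP hold time.
BFD_MS = bfd_detection_time_ms(300, 3)  # 900 ms
BGP_HOLD_MS = 90_000

outage_ms = 3_000  # assumed "few seconds" blackhole
print("BFD notices:", outage_detected(outage_ms, BFD_MS))       # True
print("BGP notices:", outage_detected(outage_ms, BGP_HOLD_MS))  # False
```

Under these assumed timers, a 3 s blackhole is about three times BFD's detection window but only a small fraction of the BGP hold time. That is consistent with BFD being the only thing that reacted.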
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)