[j-nsp] bfd = busted failure detection :)
Richard A Steenbergen
ras at e-gerbil.net
Wed Dec 16 15:28:25 EST 2009
On Tue, Dec 15, 2009 at 11:03:08PM -0600, Kevin Day wrote:
>
> I went back and forth on this forever (pestering you while doing it),
> because it was affecting us badly on old M20s. My "lab" boxes would
> never ever show the problem, but it would happen on the production
> routers. I finally gave up and decided to figure out what the
> difference was between my production configuration and the lab
> simulation by slowly changing my production config to match the nearly
> identical lab config.
>
> The problem went away when I removed a BGP session with a peer that
> was extremely slow to accept routes, and we were exchanging full
> tables with each other. I think it was some kind of deadlock where the
> peer wasn't accepting routes because it was blocked trying to send me
> stuff, and I was in the same boat. Snooping at the TCP layer, I didn't
> see anything unusual except both peers ended up in a state where they
> were advertising 0 window size to each other. The moment the KRT queue
> cleared up, they finished exchanging routes and all was happy.
>
> I can't say for certain that was the problem, but shutting down that
> peer was a pretty reliable way to clear the KRT queue problem whenever
> it happened.
What code was this? In theory shouldn't the routes be in a BGP queue
regardless of what's happening with the TCP layer? Should see if we can
reproduce this with modern hardware and code.
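
For anyone trying to picture the mutual zero-window state described
above, here is a toy Python model of it. This is only a sketch under
stated assumptions: it is not how rpd or any real BGP implementation
works, and the names (Peer, simulate, read_while_blocked) are made up
for illustration. It assumes each side has a fixed-size receive buffer
and, in the "blocking" case, refuses to read from its socket while it
still has data of its own that it cannot write.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    to_send: int          # units of route data still waiting to be written
    rx_capacity: int      # receive buffer size (largest window we can advertise)
    rx_buffered: int = 0  # units received by TCP but not yet read by the app
    received: int = 0     # units the application has actually read

    @property
    def window(self) -> int:
        # Advertised window = free space left in the receive buffer.
        return self.rx_capacity - self.rx_buffered


def send(sender: Peer, receiver: Peer) -> int:
    # Write as much as the receiver's advertised window allows.
    n = min(sender.to_send, receiver.window)
    sender.to_send -= n
    receiver.rx_buffered += n
    return n


def drain(peer: Peer, read_while_blocked: bool) -> int:
    # Hypothetical blocking writer: it will not read while it has unsent data.
    if peer.to_send > 0 and not read_while_blocked:
        return 0
    n = peer.rx_buffered
    peer.received += n
    peer.rx_buffered = 0
    return n


def simulate(units: int, capacity: int, read_while_blocked: bool,
             max_ticks: int = 1000):
    a = Peer(to_send=units, rx_capacity=capacity)
    b = Peer(to_send=units, rx_capacity=capacity)
    for _ in range(max_ticks):
        progress = send(a, b) + send(b, a)
        progress += drain(a, read_while_blocked) + drain(b, read_while_blocked)
        if progress == 0:  # nothing moved this tick: finished or stuck
            break
    done = a.received == units and b.received == units
    return done, a.window, b.window


if __name__ == "__main__":
    print(simulate(100, 10, read_while_blocked=False))  # (False, 0, 0)
    print(simulate(100, 10, read_while_blocked=True))   # (True, 10, 10)
    print(simulate(5, 10, read_while_blocked=False))    # (True, 10, 10)
```

In this model the stall only appears when both sides have more to send
than the other side's buffer can hold, and both stop reading while
blocked on writes. Whether that is what actually happened on the M20s
is still an open question.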
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)