[j-nsp] KRT Queue issue (was: Re: bfd = busted failure detection :)
Jeff.Richmond at frontiercorp.com
Thu Jan 7 13:29:19 EST 2010
David, thank you, much appreciated. Our biggest issue was recovery time when using logical systems (LSs) together with GRES/NSR. We don't use LSs in production, but at the time I had only a single MX960 in the lab to start testing with (along with other routers: M320s, M40es, etc.), so I carved it up into a few LSs. That is when we started seeing extremely long recovery times during link and RE failure testing. It turns out that combination isn't really supported (at least not in 9.5), but since we don't use LSs in production I hadn't researched it ahead of time like I normally would.

The other major issue we have had with 9.5R3.7 is with FRR or node/link protection. We ran into it with a POC lab setup back in November; ATAC now has it in their lab and can reproduce it, but they haven't figured out why it happens. Basically, when a failure occurs, the forwarding entries for the prefixes carried on the LSP are withdrawn and never re-advertised, so traffic is blackholed instantly. I am hoping to have another update by the end of the week with more specifics on what triggers the issue and the exact behavior. There is still a lot of confusion surrounding it, but it is good that they can see it, I suppose.
From: David Ball [davidtball at gmail.com]
Sent: Thursday, January 07, 2010 10:12 AM
To: Richmond, Jeff
Cc: Felix Schueren; juniper-nsp at puck.nether.net; Richard A Steenbergen
Subject: Re: [j-nsp] KRT Queue issue (was: Re: bfd = busted failure detection :)
Hi Jeff. They initially figured they had this solved a while ago,
and provided the associated PRs and fixes (PR291407, fixed in 9.3R3,
9.4R3, 9.5R2/3, 9.6R1). As such, we too were in the process of
eval'ing 9.5R3, though we were just getting going with it this week in
the lab (on Ts and MXs). I'm now waiting to hear back as to whether
we were hitting the same PR this time around (the trigger appeared to
be different) before finalizing an upgrade path. I'd be interested
to hear about your 9.5R3 testing and associated problems. We
don't currently use logical systems, but you never know...
2010/1/7 Richmond, Jeff <Jeff.Richmond at frontiercorp.com>:
> David, did ATAC give any indication whether this is a widespread issue across all JUNOS releases, or do they think it is localized to one or more specific versions? I am still doing a 9.5R3.7 lab evaluation with GRES and NSR on MX960s, and we have seen a couple of strange issues that looked to be related to logical systems in conjunction with GRES/NSR, but I want to go back and take a closer look (it seems Richard specifically mentioned 9.5R3.7 in one of his earlier emails).
> From: juniper-nsp-bounces at puck.nether.net [juniper-nsp-bounces at puck.nether.net] On Behalf Of David Ball [davidtball at gmail.com]
> Sent: Thursday, January 07, 2010 9:31 AM
> To: Felix Schueren
> Cc: juniper-nsp at puck.nether.net; Richard A Steenbergen
> Subject: Re: [j-nsp] KRT Queue issue (was: Re: bfd = busted failure detection :)
> GRES with NSR, yes. Apparently the stalling has to do with the
> master RE not receiving an 'ok' from the backup RE when it says it has an
> update. It won't install the new route in the forwarding table until
> the 'ok' is received from the backup, or something similar, based on ATAC's
> information. At any rate, taking the running core dumps went fine last night,
> but the entire box reset (including SIBs) when GRES and NSR were
> deactivated (deactivating/reactivating was supposed to flush the
> queue; the outage was obviously not anticipated).
> 2010/1/7 Felix Schueren <felix.schueren at hosteurope.de>:
>>> I'm working with ATAC tonight to get them a running kernel core dump
>>> so they can look for the root cause, but apparently disabling GRES,
>>> committing, re-enabling GRES, and committing again can somehow
>>> temporarily resolve the issue (get the routes installed, I guess
>>> ?!?!). Don't ask me how GRES has anything to do with it...
>> Is that GRES with or without NSR (nonstop routing)? If it's with NSR, then I
>> could potentially see how this "stalling" might happen.
>> Kind regards,
>> Felix Schüren
>> Head of Network
>> Host Europe GmbH - http://www.hosteurope.de
> juniper-nsp mailing list juniper-nsp at puck.nether.net