[j-nsp] ISSU timeouts on MX upgrades due to large routing tables?

Wed May 22 18:44:03 EDT 2013

On Tue, May 21, 2013 at 09:01:57PM -0400, Clarke Morledge wrote:
> I was curious to know if anyone has run into any issues with large 
> routing tables on an MX causing ISSU upgrades to fail?
> 
> On several occasions, I have been able to successfully do an 
> In-Service Software Upgrade (ISSU) in a lab environment, but then it 
> fails to work in production.
> 
> I find it difficult to replicate the issue in a lab, since in 
> production I am dealing with lots of routes as compared to a small 
> lab.  Has anyone seen a case where the backup RE gets its new 
> software and reboots, but then appears to time out because it takes 
> so long to populate the routing kernel database on the newly 
> upgraded RE?
> 
> I have seen behavior like this with upgrades moving from 10.x to a 
> newer 10.y and from 10.x to 11.y.

We had that issue for many years. There is a hard-coded timeout in the 
NSR process which is very easy to hit if you have a box with a large 
number of routes.

We had a case open on it for about 1.5 years, but Juniper refused to 
actually fix it ("it works fine in the lab"), and eventually we just 
gave up and declared ISSU to be dead. There are way too many other bugs 
with it anyway; even turning on NSR caused nothing but problems.

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)