[j-nsp] EX4200 VC PFE crashes

Thu Jan 17 09:33:05 EST 2013

On Jan 17, 2013, at 2:38 PM, David Siebörger <drs at sieborger.nom.za> wrote:

> Hi,
> 
> I've experienced something at least slightly similar.  We have VC pairs of
> EX4200s as campus distribution, acting as the default gateways for end-user
> subnets ranging from /27s to (in one case) a /21, also with OSPF + OSPFv3
> and LAGs down to access switches.
> 
> The symptoms I've seen most often are different to yours: one of the VCs
> will suddenly stop responding to ARP/NDP requests from some users' PCs any
> time after two weeks of uptime.  Digging in the pfe shows that the affected
> PCs have nhdb entries in the "hold" state.  (The other VCs also do the same
> thing, though much less regularly.)  Rebooting the master fixes the problem
> -- for another two weeks.  I've experienced the same thing while running
> JUNOS 10.4R9, 11.1R2, and 12.1R1.
> 
> However, on one occasion pfem crashed and left a core dump, as you've
> described.  pfem restarted and traffic returned to normal within a minute or
> two.  JTAC analysed the core dump resulting in PR790201, for which a fix is
> in recent releases of 12.x:
> 
> https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR790201
> 
> JTAC have now told me that both sets of symptoms are addressed by that fix.
> I've deployed 12.1R4 on the worst-affected VC and it's now at 28 days uptime
> without incident.  I'm not celebrating yet because our university is still
> on summer vacation so network usage is lower than normal, but so far so
> good….

Hello David,

Thank you very much for your response. I came across that PR but didn't think it would apply to older releases. That's good to know, thanks!

We're still waiting for a response from Juniper engineering. Hope the VC holds up until we get word back.

Regards,

--
Dennis Krul
Tilaa