[j-nsp] some bugs to avoid

Jeff Wheeler jsw at inconcepts.biz
Mon May 16 19:14:49 EDT 2011

We have had our first instance of serious filesystem corruption on an
EX4200 running 10.3R1.9.  I am hopeful that the new-fangled stuff in
10.4 will stop these incidents from causing switches to reboot into an
unusable state that requires a reinstall from USB. :-/

In other news, we also observed an M160 with two REs (one in the
process of upgrading from JUNOS 6.2) exhibit an interesting new
failure mode.  The second RE incorrectly reported its CPU temperature
as about 800 million degrees, which caused the master RE's chassisd to
spawn child processes to emit warnings, about 3 times each minute.
Unfortunately, chassisd was not wait(2)ing on these children after
they exited, so each one lingered as a zombie.  The accumulating
zombies then triggered additional console warnings about the maximum
number of processes for uid 0.  After about 15 minutes, there were
enough <defunct> children of chassisd that the kernel process table
was full, resulting in a kernel panic and automatic reboot.  We had to
remove the second routing engine to prevent it from happening again.
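The chassisd failure above is the classic unreaped-child problem: a process that exits stays in the process table as a zombie (shown as <defunct>) until its parent collects its exit status with wait(2) or waitpid(2).  The sketch below is a generic POSIX C illustration, not chassisd's actual code, which the post does not describe; the function names spawn_and_reap() and reap_exited_children() are hypothetical.  It forks short-lived children and reaps them with a non-blocking waitpid(-1, ..., WNOHANG) loop, so no <defunct> entries are left behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already exited,
 * without blocking.  Returns the number of children reaped. */
static int reap_exited_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    /* waitpid(-1, ..., WNOHANG) returns a child's PID for each exited
     * child, 0 if children exist but none has exited yet, and -1 with
     * errno == ECHILD once no children remain at all. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped++;

    /* Loop ends on "nothing exited yet", "no children left", or a
     * signal interruption; anything else would indicate a bug. */
    assert(pid == 0 || errno == ECHILD || errno == EINTR);
    return reaped;
}

/* Hypothetical stand-in for a daemon that forks short-lived helpers
 * (e.g. to emit a warning).  Every child is reaped by the parent, so
 * none lingers as a zombie.  Returns the number of children reaped. */
static int spawn_and_reap(int n)
{
    int spawned = 0;
    int reaped = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child: the real work would go here; exit immediately. */
            _exit(0);
        }
        spawned++;
    }

    /* Poll until every child we created has been collected. */
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    while (reaped < spawned) {
        reaped += reap_exited_children();
        if (reaped < spawned)
            nanosleep(&tick, NULL);
    }
    return reaped;
}
```

A long-running daemon would more typically call reap_exited_children() from its event loop whenever it sees SIGCHLD, rather than polling on a timer as this self-contained demo does.  Calling spawn_and_reap(8) should return 8 and leave no children behind for the process table to fill up with.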

IRB on MX80 10.4R4.5 appears badly broken, too.  Configuring a
bridge-domain with one untagged ("access") interface and one
dot1q-tagged sub-interface, plus an IRB interface for layer 3, is a
pretty good way to waste a couple of hours troubleshooting the router.
It works fine most of the time, and everything looks normal in the PFE
console, but a few times per hour the bridge-domain simply stops
forwarding all traffic while the IRB interface loses its ARP entries.
This fault sometimes lasts long enough for BGP sessions to drop.

Jeff S Wheeler <jsw at inconcepts.biz>
Sr Network Operator  /  Innovative Network Concepts
