[j-nsp] some bugs to avoid

Patrick Tye ptye at iprimus.com.au
Mon May 16 19:40:13 EDT 2011


Hi Jeff,
Could you provide a little more information on your last fault, the one
with the MX80? I am in the process of rolling out 10.4R4.5 to MX480s. If
you have a case number, sending it off-list would be great, as I will be
asking JTAC about this as well.

Thanks

Patrick

---------------------------------------------------
Patrick Tye
Senior IP Engineer
Primus Telecommunications Aust.
Email: ptye at primustel.com.au
-----------------------------------------------------

----- Original Message ----- 
From: "Jeff Wheeler" <jsw at inconcepts.biz>
To: <juniper-nsp at puck.nether.net>
Sent: Tuesday, May 17, 2011 9:14 AM
Subject: [j-nsp] some bugs to avoid


We have had our first instance of serious filesystem corruption on an
EX4200 running 10.3R1.9.  I am hopeful that the new-fangled stuff in
10.4 will stop these incidents from causing switches to reboot into an
unusable state requiring a reinstall from USB. :-/

In other news, we also observed an M160 with two REs (one in the
process of upgrading from JUNOS 6.2) exhibit an interesting new
failure mode.  The second RE incorrectly reported its CPU Temperature
as about 800 million degrees, which caused the master RE's chassisd to
spawn children emitting warnings about 3 times each minute.
Unfortunately, chassisd was not wait(2)ing on these children after
they exited.  This produced additional console warnings about the
maximum number of processes for uid 0.  After about 15 minutes, there
were enough <defunct> children of chassisd that the kernel process
table was full, resulting in a kernel panic and automatic reboot.  We
had to remove the second routing engine to prevent it from happening a
second time.
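
For anyone unfamiliar with this failure mode: a child that has exited
stays in the process table as <defunct> until its parent calls wait(2)
on it, so a daemon that forks helpers and never waits will eventually
fill the table. Below is a minimal Python sketch of the missing reaping
step. It is not chassisd's actual code; the function names are
hypothetical, and it assumes a POSIX system where os.fork() is available.

```python
import os
import time


def spawn_short_lived_children(count):
    """Fork `count` children that exit immediately; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, skipping the parent's cleanup handlers.
            os._exit(0)
        pids.append(pid)
    return pids


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns the list of reaped PIDs. Until this runs, each exited child
    remains a zombie (<defunct>) entry in the process table.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            # This process has no children left at all.
            break
        if pid == 0:
            # Children exist, but none has exited yet.
            break
        reaped.append(pid)
    return reaped


def reap_until(expected_count, timeout=5.0):
    """Poll reap_exited_children() until `expected_count` PIDs are
    collected or `timeout` seconds have passed."""
    reaped = []
    deadline = time.monotonic() + timeout
    while len(reaped) < expected_count and time.monotonic() < deadline:
        reaped.extend(reap_exited_children())
        time.sleep(0.01)
    return reaped
```

A long-running daemon would typically call something like
reap_exited_children() from its main loop or in response to SIGCHLD,
so that exited helpers never accumulate.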

IRB on MX80 10.4R4.5 appears badly broken, too.  Configuring a
bridge-domain with one untagged/"access" interface and one
dot1q-tagged sub-interface, plus an IRB interface for layer-3, is a
pretty good way to waste a couple of hours troubleshooting the router.
It works fine most of the time, and all looks well in the PFE console;
but a few times per hour the bridge-domain simply stops forwarding any
traffic, while the IRB loses its ARP entries.  This fault sometimes
lasts long enough for BGP to drop.

-- 
Jeff S Wheeler <jsw at inconcepts.biz>
Sr Network Operator / Innovative Network Concepts

_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp 


