[j-nsp] current JunOS versions for MX80, EX8200?

Richard A Steenbergen ras at e-gerbil.net
Fri Jan 7 17:18:03 EST 2011


On Thu, Jan 06, 2011 at 01:08:16PM -0800, Steve Feldman wrote:
> Does anyone have recommendations for stable JunOS versions for the  
> MX80 and EX8200?
> 
> The MX80s will be doing primarily BGP (a couple of full transit feeds  
> and some peering), OSPF, and some limited packet filtering.
> 
> The EX8200s are doing lots of L2 switching, along with some BGP, OSPF  
> and packet filtering.
> 
> Also desirable on both are IPv6 and flow export.

Well, we're currently running 10.3R2 on our MX80s, and so far nothing 
amazingly bad has happened on them. I've heard far worse things about 
10.4R1 on Trio, but we haven't actually touched that code yet. We hit a 
lot of serious Trio-specific issues in earlier 10.2 branches on MX960s, 
some (but not all) of which were supposed to be fixed in 10.2R3, but we 
haven't had a chance to test that release on MX either.

The biggest issue we've had with 10.3R2 so far was during an attempt to 
deploy it on an MX960 during a DPC->MPC migration, which involved adding 
MPCs, moving ports, then pulling DPCs. All of the DPC cards booted with 
some serious-looking PFE errors, but we were still able to successfully 
migrate to the MPCs one port at a time, and things settled down after 
the DPCs were pulled. Next runner up would probably be this bug on "show 
interface ae# extensive", which doesn't show correct lacp/logical int 
counters, or even show the correct logical subints, like so:

  Logical interface ae1.16 (Index 79) (SNMP ifIndex 502) (Generation 144)
    Description: blah
    Flags: SNMP-Traps 0x4000 VLAN-Tag [ 0x8100.16 ]  Encapsulation: ENET2
    Statistics        Packets        pps         Bytes          bps
    Bundle:
        Input : 4742420284061    3077307 4903848257976971  26087741336
        Output: 4225342397941    2774101 3213147811040925  16815729632
    Link:
      xe-4/1/1.3003 <------ erm what?
        Input :             0          0             0            0
        Output:             0          0             0            0
      xe-3/1/0.16   ^^^^^^^^^^^^^^^^^^^^ Where is my traffic?
        Input :             0          0             0            0
        Output:        558416          0     135765498            0
      xe-3/3/1.16
        Input :             0          0             0            0
        Output:             0          0             0            0
      xe-4/1/0.16                       
        Input :             0          0             0            0
        Output:             0          0             0            0
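
If you want to check a pile of ae bundles for this without eyeballing 
each one, something like the following rough Python sketch might help. 
It assumes the "show interface ae# extensive" layout pasted above; the 
function name, the regexes, and the trimmed sample text are my own 
illustration, not anything Juniper ships. It only flags member links 
whose unit number differs from the bundle's logical unit.

```python
import re

def mismatched_members(text):
    """Return (logical_interface, member) pairs whose unit numbers differ.

    Assumes the output layout shown above; not a general Junos parser.
    """
    problems = []
    logical = None
    unit = None
    for line in text.splitlines():
        # e.g. "Logical interface ae1.16 (Index 79) ..."
        m = re.match(r"\s*Logical interface (\S+)\.(\d+)\b", line)
        if m:
            logical = f"{m.group(1)}.{m.group(2)}"
            unit = m.group(2)
            continue
        # e.g. "      xe-4/1/1.3003" under the "Link:" heading
        m = re.match(r"\s+((?:xe|ge|et)-\d+/\d+/\d+)\.(\d+)\b", line)
        if m and unit is not None and m.group(2) != unit:
            problems.append((logical, f"{m.group(1)}.{m.group(2)}"))
    return problems

# Trimmed sample taken from the output above.
SAMPLE = """\
  Logical interface ae1.16 (Index 79) (SNMP ifIndex 502) (Generation 144)
    Link:
      xe-4/1/1.3003
        Input :             0          0             0            0
      xe-3/1/0.16
        Input :             0          0             0            0
"""

print(mismatched_members(SAMPLE))
```

This says nothing about the zeroed per-link counters; it only catches 
the wrong-subinterface symptom.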

Overall I'd say 10.3R2 has treated us "less badly" than 10.2 did, but 
it really doesn't take much to clear that bar. :) We had previously been 
chasing our MX bugs all through 10.1, and 10.1R4 was supposed to be the 
magic release that fixed everything, but our first deployment of it 
resulted in rpd coredumps when rsvp autobandwidth params were changed 
(still no progress on figuring out why), so that put a quick halt to 
further deployments.

As for EX8200 recommendations... 10.2R3 was supposed to be our near term 
"fix all the really massively horrible disaster level bugs" code, but we 
deployed it on a couple of EX8200s and discovered some kind of an SNMP 
glitch which has a tendency to break polling. Essentially the SNMP 
system "stalls" for several seconds at a time, failing to respond to 
queries, which can trick your poller into thinking that the box is dead. 
Eventually the stalls clear and all the pending requests get answered 
(often with duplicates, so you might want to try snmp filter-duplicates 
if you do end up hacking this into working), but you will probably need 
to hack your poller to extend the timeout beyond what would be 
considered normal. Still haven't found a cause of this one, but you can 
tell if you'd be affected by it by looking at the time it takes to query 
snmp, even locally. The first is a 10.2R3 EX8200, the second is a 10.1S6 
box with a similar number of interfaces:

> repeat 10 time cli -c "show snmp mib walk ifHCInOctets | no-more" | grep %
0.105u 0.148s 0:16.03 1.4%      4827+3069k 0+0io 0pf+0w
0.107u 0.154s 0:18.06 1.3%      5100+3235k 0+0io 0pf+0w
0.098u 0.162s 0:13.82 1.8%      4663+3030k 0+0io 0pf+0w
0.102u 0.160s 0:09.02 2.8%      4460+2780k 0+0io 0pf+0w
0.104u 0.157s 0:08.16 3.0%      4825+3186k 0+0io 0pf+0w
0.102u 0.159s 0:09.05 2.7%      4501+2864k 0+0io 0pf+0w
0.112u 0.151s 0:11.68 2.2%      4644+3006k 0+0io 0pf+0w
0.104u 0.168s 0:11.82 2.1%      4760+3024k 0+0io 0pf+0w
0.101u 0.158s 0:08.34 2.9%      5033+3203k 0+0io 0pf+0w
0.099u 0.167s 0:08.54 2.9%      4929+3195k 0+0io 0pf+0w
              ^^^^^^^

> repeat 10 time cli -c "show snmp mib walk ifHCInOctets | no-more" | grep %
0.088u 0.155s 0:02.92 7.8%      4473+2967k 0+0io 0pf+0w
0.086u 0.123s 0:00.56 35.7%     4434+3052k 0+0io 0pf+0w
0.102u 0.167s 0:00.74 35.1%     4587+3118k 0+0io 0pf+0w
0.091u 0.128s 0:00.62 33.8%     4154+2710k 0+0io 0pf+0w
0.106u 0.155s 0:00.91 27.4%     4408+2995k 0+0io 0pf+0w
0.096u 0.169s 0:02.90 8.6%      4804+3277k 0+0io 0pf+0w
0.088u 0.137s 0:00.61 34.4%     4345+2926k 0+0io 0pf+0w
0.104u 0.159s 0:00.68 36.7%     4491+3047k 0+0io 0pf+0w
0.114u 0.159s 0:03.94 6.5%      4425+3080k 0+0io 0pf+0w
0.111u 0.147s 0:00.80 31.2%     4441+3026k 0+0io 0pf+0w
              ^^^^^^^
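
If you'd rather have a script tell you whether a box is affected than 
read the elapsed column by eye, here is a rough Python sketch. It 
assumes the csh-style "time" lines shown above (user, system, elapsed 
as m:ss.ss, then CPU percent). The 5-second threshold and the function 
names are hypothetical choices of mine, so tune them to whatever your 
poller's timeout actually is.

```python
import re

# Matches e.g. "0.105u 0.148s 0:16.03 1.4%  ..." and captures the elapsed time.
TIME_LINE = re.compile(r"^\S+u \S+s (\d+):(\d+\.\d+) ")

def elapsed_seconds(line):
    """Return elapsed wall-clock seconds from one csh 'time' line, or None."""
    m = TIME_LINE.match(line.strip())
    if m is None:
        return None
    return int(m.group(1)) * 60 + float(m.group(2))

def slow_walks(lines, threshold=5.0):
    """Return the elapsed times (seconds) that exceed the threshold."""
    times = [elapsed_seconds(l) for l in lines]
    return [t for t in times if t is not None and t > threshold]

# First three samples from each box above.
EX8200_10_2R3 = [
    "0.105u 0.148s 0:16.03 1.4%      4827+3069k 0+0io 0pf+0w",
    "0.107u 0.154s 0:18.06 1.3%      5100+3235k 0+0io 0pf+0w",
    "0.098u 0.162s 0:13.82 1.8%      4663+3030k 0+0io 0pf+0w",
]
EX8200_10_1S6 = [
    "0.088u 0.155s 0:02.92 7.8%      4473+2967k 0+0io 0pf+0w",
    "0.086u 0.123s 0:00.56 35.7%     4434+3052k 0+0io 0pf+0w",
    "0.102u 0.167s 0:00.74 35.1%     4587+3118k 0+0io 0pf+0w",
]

print(slow_walks(EX8200_10_2R3))
print(slow_walks(EX8200_10_1S6))
```

With these samples every 10.2R3 walk is over the threshold and none of 
the 10.1S6 walks are.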

I forget if 10.1R4 had most of the fixes for the issues we've been 
dealing with, but it might be your best runner up if you need something 
that works immediately (and if you don't mind that sflow doesn't work at 
all on routed interfaces in pre-10.2 code). If you can wait a little 
while, we're currently targeting 10.3R3 as our new "try to fix 
everything" release for EX8200, and it's due out at the end of Jan.

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

