[j-nsp] Rock-solid JUNOS for QFX5100

Philippe Girard philippe.girard at metrooptic.com
Mon Aug 12 11:31:33 EDT 2019


Hi Ross

We've recently switched our 5100s to 18.1R3-S5. 18.1 is stable with BGP/OSPF/LDP/RSVP/MPLS and LACP LAG in general. We don't use STP of any kind with the QFXs so I can't really help there.

I was hesitant to upgrade to 18.X since the 5100 was still the only QFX without an 18.X version recommended in KB21476, but they recently updated the KB to include that model, so I'd say it's pretty safe now. They pushed out S6 in July; if I had to redo it now, I'd use that one instead of S5.

The kind of problem you're describing sounds like what we lived through with 14.X and VCF when we first started using these. We'd commit a change and some random ports would stop passing traffic; we'd then have to delete the port config and re-provision it for traffic to resume. Lots of weird stuff like that kept happening until we got fed up with the architecture and moved to routed MPLS with almost no Layer 2 switching.

Good luck.

-phil




-----Original Message-----
From: juniper-nsp <juniper-nsp-bounces at puck.nether.net> On Behalf Of Ross Halliday
Sent: August 12, 2019 9:20 AM
To: juniper-nsp at puck.nether.net
Subject: [j-nsp] Rock-solid JUNOS for QFX5100

Dear List,

I'm curious if anybody can recommend a JUNOS release for QFX5100 that is seriously stable. Right now we're on the previously-recommended version 17.3R3-S1.5. Everything's been fine in testing, and suddenly out of the blue there will be weird issues when I make a change. I suspect maybe they are related to VSTP or LAG, or both.

1. Added a VLAN to a trunk port, and all the access ports on that VLAN completely stopped moving packets. Running disable/delete disable on all of the broken interfaces restored function. This happened during the day. I opened a JTAC ticket and they'd never heard of an issue like this; of course, we couldn't reproduce it. I no longer recall with confidence, but I think the trunk port may have been a one-member LAG (replacement of a downstream switch).

2. New trunk port (a two-port LACP LAG) not sending VSTP BPDUs for some VLANs. I'm not sure if it was coincidence or always broken, as I had recently begun feeding new VSTP BPDUs (thus the root bridge changed) before I even looked at this. Other trunk ports did not exhibit the same issue. Completely deleted the LAG and rolled back to fix. This was on a fresh turnup and luckily wasn't in a topology that could form a loop.
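
For reference, the disable/delete disable workaround from item 1 looks roughly like this for a single port. This is only a sketch: xe-0/0/10 is a placeholder interface name, not one of the ports involved, and the two commits are how I'd assume it is normally done.

```
# Configuration mode. xe-0/0/10 is a placeholder (hypothetical) interface name.
set interfaces xe-0/0/10 disable
commit
delete interfaces xe-0/0/10 disable
commit
```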

Features I'm using include:

- BGP
- OSPF
- PIM
- VSTP
- LACP
- VRRP
- IGMPv2 and v3
- Routing-instance
- CoS for multicast
- CoS for unicast
- CoS classification by ingress filter
- IPv4-only
- ~7k routes in FIB (total of all tables)
- ~1k multicast groups


There are no automation features, no MPLS, no MC-LAG, no EVPN, VXLAN, etc. These switches are L3 boxes that hand off IP to an MX core. Management is in the default instance/table, everything else is in a routing instance.

These boxes have us scared to touch them outside of a window as seemingly basic changes risk blowing the whole thing up. Is this a case where an ancient version might be a better choice or is this release a lemon? I recall that JTAC used to recommend two releases, one being for if you didn't require "new features". I find myself stuck between the adages of "If it ain't broke, don't fix it" and "Software doesn't age like wine". Given how poorly multicast seems to be understood by JTAC I'm very hesitant to upgrade to significantly newer releases.

If anybody can give advice or suggestions I would appreciate it immensely!

Thanks
Ross

_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net https://puck.nether.net/mailman/listinfo/juniper-nsp

