[j-nsp] Suggestions for Edge/Peering Router..

Karl Gerhard karl_gerh at gmx.at
Tue Oct 1 05:44:03 EDT 2019


Hello Kamal

thanks for sharing your valuable insight. Very interesting to see what happens when one tries to upgrade hundreds of Juniper devices.
Upon reading your mail one could get the impression that the upgrade woes and all the bugs+regressions are a structural problem related to how Junos is being developed.

Two more things:
*The two bugs I mentioned in my last mail affected Junos 17.4 and Junos 18.1. I wish QA was bad just for Junos 14/15, but no, the problems seem to persist.
*We've solved the problem you described, Kamal, by never upgrading any of our access switches (EX4300 and EX4200) at all. We install a tried and tested firmware before deploying them to prod and then we're done, hopefully forever. After lots of experimentation, we use our EX4200 and EX4300 only for switching; me0 is the only interface with an IP address. However, I do understand that this is only feasible if you run a very uniform environment as we do.

Regards
Karl



On 01.10.19 00:40, Kamal Dissanayaka wrote:
> Hi,
>
>
> Upgrade issues!
>
> This is a very valid point; we are totally fed up with them.
>
> Upgrading is horrible.
>
> It took us around two and a half years to upgrade around 200 EX4200 devices.
> Initially, one in every three devices failed to boot after the upgrade.
> Juniper kept preaching that we should go to the site and do a format install.
> They don’t seem to understand how impractical it is to upgrade a large number
> of devices on console.
>
> Finally, after a long chase-up, they agreed they had an issue and provided a
> long working process, which took lots of time to script. Still, more than 5%
> failed, causing incidents.
>
> The worst thing is that six months after all this hard work, we found some
> nasty bugs, and now they recommend a new version.
>
>
> Two months back I started upgrading around 120 ACX2100 devices.
> The process is very complicated. First we found that RE filters are broken in
> the interim version, causing loss of remote access. For some devices, someone
> had to go to site and remove the RE filters.
> That was fixed by removing the RE filters before the upgrade. Then we found
> that the ssh folders are deleted during the request system software add
> process; this causes access issues when the upgrade fails due to validation
> errors. There were also many incompatible configs between versions.
> It took two solid months and many sleepless nights to get a smoothly running
> process. JTAC's responses to these issues are sometimes not very helpful;
> their response is always a workaround, not the root cause.
>
> On MX upgrades we have horror stories too. A large number of DPC cards
> didn't boot up after the upgrades. They cost about as much as a luxury car
> when we bought them; I can't imagine why they are so delicate. Our upgrades
> were put on hold a few times due to the large number of failures. We still
> have one MX960 on an old version that no one likes to touch because it has
> many DPC cards in it.
>
> I agree that Junos is a very mature OS, but these issues make the network
> hard to maintain.
>
> Thank you
>
> Kamal
>
> On Monday, September 23, 2019, Karl Gerhard <karl_gerh at gmx.at> wrote:
>
>> Hi,
>>
>> I'd like to point out one more thing because I feel that this point hasn't
>> been stressed enough:
>> Upgrading Junos might be more time consuming than many people expect it to
>> be.
>>
>> The reason for this is that quite often, things that previously worked in
>> Junos will break in a new release. This affects very, very basic things
>> too. Even if you don't do magic fairy-stuff like EVPN-MPLS to EVPN-VXLAN
>> handoff or any MPLS at all, you might get bitten by this at some point.
>>
>> One example:
>> After an upgrade we noticed that local-preference just stopped working.
>> After investigating this, we realized that traffic for some subnets was
>> being diverted, while for others it was still following the old path.
>> Reason for this was that we had BGP PIC enabled. We had to downgrade to
>> another version because BGP PIC is important to us.
>>
>> Another, more nasty example of very basic things failing after an upgrade
>> is this:
>> We had packet loss on an IBGP link due to a dirty fiber. No problem, we
>> run LAGs everywhere, so we just disabled the link and sent a technician to
>> the datacenter to clean the fiber. However, at the moment when we disabled
>> the dirty fiber some customers went completely offline. At this point a
>> minor incident had turned into a major one because customers were feeling
>> the impact and we started receiving calls from them. However, the routers
>> did not show any errors on either end. LACP was still working fine on the
>> three other links, BGP was still fine, lots of traffic was being forwarded.
>> But we were dropping packets somewhere, some customers were offline and
>> network engineers couldn't tell their boss what was wrong. Not a very
>> pleasant situation to be in.
>> In the end, we found out that the last upgrade introduced a new bug: If
>> you disable an interface that is part of a LAG, Junos would still hash
>> traffic on to that interface, so with a LAG consisting of 4 interfaces
>> where you disabled one, a quarter of the traffic would just be blackholed.
>>
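The blackholing arithmetic in the LAG example above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the interface names, the flow tuples, and the SHA-256-modulo hash are all made up for illustration and are not what Junos or the forwarding hardware actually uses. It compares a "buggy" LAG that keeps hashing over all four members after one is disabled with a LAG that hashes only over the members that are still up.

```python
import hashlib


def pick_member(flow, members):
    """Map a flow tuple onto one LAG member with a simple modulo hash.

    Assumption: real hardware uses its own hash; SHA-256 is used here only
    because it spreads flows evenly.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return members[int.from_bytes(digest[:4], "big") % len(members)]


def blackholed_share(flows, hash_members, up_members):
    """Fraction of flows hashed onto a member that is not actually up."""
    lost = sum(1 for f in flows if pick_member(f, hash_members) not in up_members)
    return lost / len(flows)


# Hypothetical 4-member LAG with one member administratively disabled.
members = ["xe-0/0/0", "xe-0/0/1", "xe-0/0/2", "xe-0/0/3"]
up = set(members) - {"xe-0/0/3"}

# Synthetic flows: (source IP, destination IP, source port, destination port).
flows = [
    ("10.0.%d.%d" % (i // 256, i % 256), "192.0.2.1", 1024 + i % 5000, 443)
    for i in range(20000)
]

# Buggy behaviour: the disabled member is still part of the hash set.
buggy = blackholed_share(flows, members, up)
# Expected behaviour: only members that are up are part of the hash set.
correct = blackholed_share(flows, [m for m in members if m in up], up)

print("share of flows blackholed (disabled member still hashed): %.3f" % buggy)
print("share of flows blackholed (disabled member excluded):     %.3f" % correct)
```

Because the hash is per flow, the affected customers are the ones whose flows happen to land on the disabled member, which matches the symptom of some customers being completely offline while LACP, BGP and most traffic look healthy.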
>> Be ready to put up with shit like this if you buy Juniper. The officially
>> recommended Juniper releases won't save you either: They too contain newly
>> introduced bugs that break things that previously worked flawlessly. This
>> is the main problem I have with Juniper: If you upgrade, you might spend
>> days debugging stuff that used to work flawlessly for years. Even the most
>> basic things like LAGs aren't safe. You might think that you found a
>> version that works for you, but then, weeks later, you find something that
>> got broken with the upgrade and then you need to schedule a new
>> upgrade/downgrade.
>>
>> The big companies have fancy and expensive labs and employees that spend
>> weeks testing new releases. However, we're a small hosting provider running
>> a bunch of MX480ies and other Juniper stuff. I need routers that I can
>> upgrade without fearing that my network will explode. Can't have that with
>> Juniper.
>>
>> If I were to rebuild our network again, I'd take a very good look at
>> Arista/ANET. As Saku already mentioned, they're the ones that have the best
>> practices in developing software. They might not have all the bells and
>> whistles that Juniper have, but at least I might get more sleep and peace
>> of mind when upgrading those than I got with my Juniper gear.
>>
>> Regards
>> Karl
>>
>>
>>
>>
>> On 19.09.19 09:09, Gert Doering wrote:
>>> Hi,
>>>
>>> On Thu, Sep 19, 2019 at 05:04:54PM +1200, Phil Reilly wrote:
>>>> MX104's are the dual brain unit of the 204. Though a 204 has 40/100G
>>>> capabilities. If I read your original request correctly about ip
>>>> routing. Not sure the 104/204 is grunty enough to deal with multiple
>>>> internet tables. That's a demanding task these days best left to the
>>>> larger chassis.
>>> You can't really compare MX104 to MX204.
>>>
>>> The MX104 is ppc based and *slow*, and should have never ever shipped.
>>>
>>> The MX204 is a really really nice box, with a fast intel RE and 40/100G
>>> ports (though some - documented - restrictions on how they can be combined),
>>> and from a RE/BGP point of view, on par with the larger MXes.
>>>
>>> And given the price point of the MX204, if the amount of interfaces is
>>> sufficient, just get two of them :-)
>>>
>>> gert
>>>
>>> _______________________________________________
>>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>>> https://puck.nether.net/mailman/listinfo/juniper-nsp