[j-nsp] How to pick JUNOS Version

Saku Ytti saku at ytti.fi
Wed Sep 2 03:37:12 EDT 2020


On Wed, 2 Sep 2020 at 10:23, Andrew Alston
<Andrew.Alston at liquidtelecom.com> wrote:

>   2.  Start looking at the new features - decide what may be useful - if anything - and start testing that to death - again preferably before release so that the fixes can be in when it is released

How do people measure this? Vendors spend tens or hundreds of millions
annually on testing and still deliver an absolutely trash NOS; this is
true of every vendor, and in 20+ years I have observed no change in
quality. Basic things are broken, and everyone keeps finding new
fundamental bugs.

I think NOSes are shit because a shit NOS is a good business case and
a good NOS is a bad business case. I know it sounds outrageous, but
let me explain. Vendor revenue comes from support contracts, not HW
sales. A lot of us don't need help configuring or troubleshooting; a
lot of us have access to a community which outperforms TAC at getting
that box working. But none of us has access to the code; we can't
commit and push a fix. If the NOS worked like Windows, macOS or Linux,
where you rarely find bugs, a lot of us would opt out of support
contracts and just buy spare HW, destroying the vendor's business
case.

I don't think vendors sit in scary skull towers and plan for a shit
NOS; I think it's emergent behaviour from how the market is modelled.
There are ways I think the market could change, but I'm already
venturing too far from the question to explore that topic.



Now when it comes to testing, many claim it is important and it
matters. I'm not convinced. I don't think people are looking at this
with any formality; it's more like religion, and its utility is to
improve comfort-to-deploy in the organisation. It doesn't do much for
probability-of-success, in my mind. I've worked for companies that
don't test at all, companies that boot the release in the lab, and
companies that had a team dedicated to testing, and I can't say I've
seen different numbers of TAC cases on software issues.

For people who invest a lot in testing and are not comfortable with
the idea that the value is just 'comfort-to-deploy' (which may be a
sufficiently important value on its own), I recommend looking at the
TAC cases you had which actually did cause customer outages, then
trying to evaluate 'was this reasonable to catch in the lab'. Try to
be honest.
The problem I see is this: while the whole NOS is poor quality, it's
not so broken that it fails constantly; the problems that manifest
usually require more than one condition to line up. If you then start
to do back-of-the-envelope math on testing everything in every
permutation, you will notice that no amount of money can fix the fact
that you're limited by the heat death of the universe on the wall
clock. So you're forced into choosing what to test and what not to
test, and you don't have nearly enough outages and faults to apply
statistical analysis, so you're actually just guessing.
It's religion, which has some utility, but not the utility we think it has.
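
To make that back-of-the-envelope math concrete, here is a minimal
sketch; every number in it (feature count, states per feature, lab
throughput) is a made-up assumption for illustration, not real data
from any vendor:

    # Back-of-the-envelope: why exhaustive permutation testing is hopeless.
    # All numbers are illustrative assumptions, not real product data.
    features = 200            # assume a NOS ships 200 independent knobs
    states_per_feature = 2    # assume each knob is merely on/off (generous)
    combinations = states_per_feature ** features   # 2**200, ~1.6e60

    tests_per_second = 1_000_000          # wildly optimistic lab throughput
    seconds_per_year = 60 * 60 * 24 * 365

    years = combinations / tests_per_second / seconds_per_year
    print(f"{combinations:.2e} combinations, {years:.2e} years to test")
    # -> ~5e46 years; the universe is ~1.4e10 years old. The wall clock,
    # not the budget, is the binding constraint.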


Note I'm not saying testing is wholesale useless; I'm saying it has
exponentially (or worse) diminishing returns. I would say push one
packet through every product you run in the lab, and you're done;
that's about as far as you're reasonably going to get.
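
As a sketch of what that minimal bar could look like, assuming a
hypothetical list of lab loopback addresses and plain ICMP via the OS
ping binary (Linux flags shown):

    # 'One packet per product' smoke test sketch. Device names and
    # addresses are hypothetical placeholders (TEST-NET-1 range).
    import subprocess

    LAB_DEVICES = {
        "mx-lab-1": "192.0.2.1",
        "qfx-lab-1": "192.0.2.2",
    }

    def forwards_one_packet(addr: str) -> bool:
        # Send a single ICMP echo: -c 1 = one packet, -W 2 = 2s timeout.
        result = subprocess.run(["ping", "-c", "1", "-W", "2", addr],
                                capture_output=True)
        return result.returncode == 0

    for name, addr in LAB_DEVICES.items():
        status = "forwards" if forwards_one_packet(addr) else "BROKEN"
        print(f"{name}: {status}")
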
Then start thinking in terms of 'the NOS is shit and I exercise no
power over it': what actions work in that world? A staging POP with
real but outage-insensitive subscribers?



-- 
  ++ytti
