[j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

Saku Ytti saku at ytti.fi
Tue Jan 28 02:45:31 EST 2020


On Mon, 27 Jan 2020 at 22:30, <adamv0025 at netconsultings.com> wrote:

> Then nowadays there's also the possibility to enable tons upon tons
> of streaming telemetry, where I could see it all landing in a common
> data lake where some form of deep convolutional neural network could
> be used for unsupervised pattern/feature learning. The reason being
> I'd like the system to tell me: look, if this counter is high and that
> one too and this one is low, then this usually happens. But I'd rather
> wait to see what the industry offers in this area than develop such
> solutions internally. For now I'm glad I have automation projects
> going; when I asked whether we should have AI in the network strategy
> for 2020, I got awkward silence in response.

We should learn to crawl before we take a rocket to Proxima Centauri.

You don't need ML/AI to find problems in your network. Use algorithms
like 'this counter which increments at rate X stopped incrementing or
started to increment 100 times slower' and 'this counter which does
not increment started to increment', and you'll find a lot of
problems in your network. But do you care about every problem in your
network, or only the problems that customers care about?
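
To make the two rules above concrete, here is a minimal Python sketch.
The function names, the 100x threshold parameter and the idea of
comparing a long-term baseline rate against the latest rate are
assumptions for illustration, not something any particular monitoring
system ships with.

```python
from typing import Optional


def rate(prev_value: int, curr_value: int, interval_s: float) -> float:
    """Per-second rate between two samples of an increasing counter.

    Simplification: a counter reset or wrap is treated as a rate of 0.
    """
    return max(curr_value - prev_value, 0) / interval_s


def classify_rate_change(baseline_rate: float, current_rate: float,
                         slowdown_factor: float = 100.0) -> Optional[str]:
    """Return 'stopped', 'slowed' or 'started', or None if nothing stands out.

    baseline_rate is the rate the counter normally shows (hypothetically a
    long-term average); current_rate is the most recent measurement.
    """
    if baseline_rate > 0:
        if current_rate == 0:
            return "stopped"   # used to increment, no longer does
        if current_rate * slowdown_factor <= baseline_rate:
            return "slowed"    # increments at least slowdown_factor times slower
        return None
    if current_rate > 0:
        return "started"       # used to be idle, now increments
    return None
```

Running this over every counter you collect will flag a lot of events,
which is exactly why the question of which ones customers care about
still has to be answered separately.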

Juniper once had some really smart academics at an EBC explaining to
us their ML/AI project, which predicts resource needs on a given
system. They quoted how close they got to the real numbers. Then I
asked how it performs against a naive system, explaining that by
naive system I mean something like 'my box has 1M FIB entries, so one
FIB entry uses RLDRAM/1M', used to extrapolate FIB usage in an
arbitrary config. They hadn't tried this and couldn't tell how well
the ML/AI performs against it.
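
The naive baseline is just linear extrapolation, roughly like the
Python sketch below. The function names are made up for illustration,
and any numbers you feed it are your own measurements, not figures
from Juniper or from the ML/AI project.

```python
def naive_per_entry_cost(memory_used_bytes: float, fib_entries: int) -> float:
    """Memory attributed to one FIB entry: used memory divided by entry count."""
    return memory_used_bytes / fib_entries


def naive_predict(per_entry_cost: float, planned_entries: int) -> float:
    """Linearly extrapolate memory use for a different number of FIB entries."""
    return per_entry_cost * planned_entries


def relative_error(predicted: float, actual: float) -> float:
    """Relative prediction error, for comparing any model against the baseline."""
    return abs(predicted - actual) / actual
```

If a model's relative_error is not clearly lower than what this
baseline gets on the same boxes, the extra complexity has not bought
anything yet.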

Can you really train today's ML/AI to determine what actually matters?
I don't think you can, because what actually matters is something that
impacted a customer, and you simply cannot put enough learning data
in. You don't have nearly enough customer trouble tickets to be able
to correlate them with the network data you're collecting and start
predicting which complex counter combinations precede a customer
ticket.

But are you at least monitoring how many packets are lost inside your
network? Delta of input/output? That is a fairly trivial way to cover
_all reasons for packet loss_. Of course latency/jitter are not
covered, but still, it covers a lot of ground fast. Do you have a
single system where you collect all data? Have you enriched the data
with stuff like npu, linecard, city, country, region? Almost no one
is doing even the very basic stuff, so I think ML/AI isn't going to be
the low-hanging fruit any time soon. If you have a single system with
a lot of labels for every counter, you can do a lot with very naive
analytics. If you don't have the data, you can't do anything even
with the smartest possible system. And I think almost no one is
collecting data in such a manner that it's actually capitalisable,
because we can keep running the network the way we did in the 90s:
IF-MIB and netflow, in separate systems, with no enrichment at all.
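
As a rough illustration of the input/output delta combined with
labels, here is a Python sketch. The InterfaceSample type, the
inventory layout and the label names are hypothetical. It also
assumes the counters are already per-interval packet counts, and it
ignores traffic that is legitimately originated or terminated on the
box and multicast replication, so a non-zero delta is a hint to look
closer, not proof of loss.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class InterfaceSample(NamedTuple):
    device: str
    interface: str
    packets_in: int    # packets received during the interval
    packets_out: int   # packets transmitted during the interval


Labels = Dict[str, str]


def enrich(sample: InterfaceSample, inventory: Dict[str, Labels]) -> Labels:
    """Attach inventory labels (city, country, region, ...) to a sample."""
    labels = {"device": sample.device, "interface": sample.interface}
    labels.update(inventory.get(sample.device, {}))
    return labels


def in_out_delta_by(samples: Iterable[InterfaceSample],
                    inventory: Dict[str, Labels],
                    label: str) -> Dict[str, int]:
    """Sum (input - output) packets per value of a device-level label."""
    delta: Dict[str, int] = defaultdict(int)
    for s in samples:
        key = enrich(s, inventory).get(label, "unknown")
        delta[key] += s.packets_in - s.packets_out
    return dict(delta)
```

Grouping by "device", "city" or "region" only makes sense because
every sample carries those labels. Interface-level labels such as npu
or linecard need more care, since packets normally enter on one and
leave on another.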

-- 
  ++ytti

