[j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)
mark.tinka at seacom.mu
Sun Feb 2 07:44:37 EST 2020
On 28/Jan/20 09:45, Saku Ytti wrote:
> We should learn to crawl before we take a rocket to Proxima Centauri.
> You don't need ML/AI to find problems in your network. Use algorithms like
> 'this counter which increments at rate X stopped incrementing or
> started to increment 100 times slower' and 'this counter which does
> not increment, started to increment', and you'll find a lot of
> problems in your network. But do you care about every problem in your
> network, or only problems that customers care about?
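
The two counter rules quoted above can be sketched in a few lines of Python. This is only an illustration: the function names, the label strings and the default factor of 100 are assumptions made for the example, not something from the thread, and counter wraps or resets are deliberately ignored.

```python
def rate(prev_value, curr_value, interval_s):
    """Per-second rate between two counter samples (wraps/resets ignored)."""
    return (curr_value - prev_value) / interval_s


def classify(prev_rate, curr_rate, slowdown_factor=100.0):
    """Compare a counter's previous and current rate using the naive rules.

    All names and the threshold here are hypothetical, for illustration.
    """
    if prev_rate > 0 and curr_rate == 0:
        return "stopped"   # was incrementing, now flat
    if prev_rate > 0 and curr_rate <= prev_rate / slowdown_factor:
        return "slowed"    # incrementing at least 100 times slower
    if prev_rate == 0 and curr_rate > 0:
        return "started"   # was flat, now incrementing
    return "ok"
```

Feeding it the rates of every counter from two consecutive polls gives a crude list of candidates. Deciding which of those customers actually care about is a separate question.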
> Juniper once had some really smart academics in an EBC explaining to us
> their ML/AI project, which predicts resource needs on a given system.
> They quoted how close they got to real numbers. I then asked how it
> performs against a naive system, explaining that by naive system I
> mean something like 'my box has 1M FIB entries, so a FIB entry uses
> RLDRAM/1M', used to extrapolate FIB usage in an arbitrary config. They
> hadn't tried this and couldn't tell how well the ML/AI performs against it.
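
The naive baseline described above is a linear extrapolation: divide the memory used by the number of FIB entries, then scale to the target entry count. A minimal sketch, assuming memory grows linearly with entries; the function names are hypothetical, and `relative_error` is added only to show how any predictor could be scored against a measured value.

```python
def predict_memory(memory_used, fib_entries, target_entries):
    """Naive estimate: per-entry cost observed on one box, scaled linearly."""
    per_entry = memory_used / fib_entries
    return per_entry * target_entries


def relative_error(predicted, actual):
    """Fraction by which a prediction misses the measured value."""
    return abs(predicted - actual) / actual
```

Running `relative_error` on both the naive estimate and a model's estimate against the same measured figure shows whether the extra complexity buys anything.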
> Can you really train today's ML/AI to determine what actually matters? I
> don't think you can, because what actually matters is something that
> impacted a customer, and you simply cannot put enough learning data in.
> You don't have nearly enough customer trouble tickets to be able to
> correlate them to the network data you're collecting and start predicting
> which complex counter combinations predict customer tickets.
> But are you at least monitoring how many packets are lost inside your
> network? Delta of input/output? That is a fairly trivial way to cover _all
> reasons for packet loss_. Of course latency/jitter are not covered,
> but still, it covers a lot of ground fast. Do you have a single system
> where you collect all data? Have you enriched the data with stuff like
> npu, linecard, city, country, region? Almost no one is doing even very
> basic stuff, so I think ML/AI isn't going to be the low hanging fruit
> any time soon. If you have a single system with a lot of labels for
> every counter, you can do a lot with very naive analytics. If you
> don't have the data, you can't do anything with the smartest possible
> system. And I think almost no one is collecting data in such a manner
> that it's actually capitalisable, because we can keep running the
> network the way we did in the 90s, with IF-MIB and netflow in separate
> systems, with no enrichment at all.
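
To make the input/output delta and the labelling idea concrete, here is a rough sketch. It assumes every sample already carries its enrichment labels and a pair of packet counts for the same interval. The record layout, the field names (`labels`, `in_pkts`, `out_pkts`) and the sample values are all invented for the example, and locally originated or terminated traffic is ignored.

```python
from collections import defaultdict


def loss_by_label(samples, label):
    """Sum input minus output packets, grouped by one enrichment label."""
    totals = defaultdict(int)
    for sample in samples:
        key = sample["labels"][label]
        totals[key] += sample["in_pkts"] - sample["out_pkts"]
    return dict(totals)


# Hypothetical enriched samples; the values are made up.
samples = [
    {"labels": {"city": "JNB", "linecard": "fpc0"}, "in_pkts": 1000, "out_pkts": 990},
    {"labels": {"city": "JNB", "linecard": "fpc1"}, "in_pkts": 500, "out_pkts": 500},
    {"labels": {"city": "CPT", "linecard": "fpc0"}, "in_pkts": 200, "out_pkts": 195},
]
per_city = loss_by_label(samples, "city")
per_linecard = loss_by_label(samples, "linecard")
```

With the labels stored next to the counters, the same few lines answer the question per city, per linecard or per NPU without another collection system.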
For us, the combination of Iris (a South African-written NMS), Kentik and
Blue Planet ROA (formerly Packet Design) gives us plenty of insight into
what our network is doing, what it did, and what it may do. We pay Iris for
their NMS, and this gives us quite a bit of flexibility in what we can
monitor and alert on, provided there is a way we can get the data off the box.
We don't believe spending too much time and effort in building ML/AI
engines will solve a real problem in our specific network, today.
I'd rather spend time upgrading Iris to support telemetry streaming, as
this has immediate and tangible benefits.