[j-nsp] Automation - The Skinny (Was: Re: ACX5448 & ACX710)

Mon Jan 27 15:30:51 EST 2020

> From: Robert Raszuk <robert at raszuk.net>
> Sent: Sunday, January 26, 2020 10:18 PM
> 
> Hi Adam,
> 
> I would almost agree entirely with you except that there are two completely
> different reasons for automation.
> 
> One, as you described, is related to service provisioning; here we have full
> agreement.
> 
> The other one is actually keeping your network running. Imagine a router
> maintaining its entire control plane perfectly fine, imagine BFD to the box
> from its peers working fine, but the fabric between line cards dropping
> anywhere from 20% to 80% of traffic. Unfortunately this is not theory but the real world :(
> 
Very good point, Robert.
There are indeed two parts to the whole automation story
(it's obvious that this theme deserves a series of blog posts, but I keep finding excuses).

The analogy I usually use in presentations is the left-brain/right-brain analogy,
where the left brain is responsible for logical thinking and the right brain for creative thinking and intuition.
A complete automation solution is built similarly:
The left brain is responsible for routine, automated service provisioning
- it contains models of resources, services, devices, workflows and policies, and you can teach it by loading new or additional models.
The right brain, on the other hand, is responsible for "self-driving" the network (yeah, I know, I can't think of a better term)
- it collects data from the network, acts on distributed policies, and also performs trending, analytics, correlation, arbitration, etc.
The left brain and the right brain obviously talk to each other:
policies are defined in the left brain and distributed to the right brain to act on them,
and the right brain can in turn trigger workflows in the left brain.
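To make the interplay a bit more concrete, here is a minimal sketch, assuming a very simplified model: the left brain stores policies and runs workflows, it hands a copy of the policies to the right brain, and the right brain evaluates telemetry against them and calls back into the left brain on a breach. All class, policy, metric and workflow names (Policy, LeftBrain, RightBrain, fabric_drop_ratio, drain-router) are hypothetical illustrations, not any product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Policy:
    """A health policy authored in the left brain (hypothetical structure)."""
    name: str
    metric: str        # telemetry metric the policy watches
    threshold: float   # breached when the observed value exceeds this
    workflow: str      # left-brain workflow to trigger on breach


class LeftBrain:
    """Model/policy store plus workflow engine (the provisioning side)."""

    def __init__(self) -> None:
        self.policies: List[Policy] = []
        self.workflow_log: List[str] = []

    def define_policy(self, policy: Policy) -> None:
        self.policies.append(policy)

    def run_workflow(self, workflow: str, device: str) -> None:
        # A real system would hand this to its workflow engine; here we only record it.
        self.workflow_log.append(f"{workflow}:{device}")


class RightBrain:
    """Evaluates distributed policies against incoming telemetry."""

    def __init__(self, trigger: Callable[[str, str], None]) -> None:
        self.policies: List[Policy] = []
        self.trigger = trigger  # callback into the left brain

    def load_policies(self, policies: List[Policy]) -> None:
        # "Distribution": the right brain gets its own copy of the policy set.
        self.policies = list(policies)

    def observe(self, device: str, metrics: Dict[str, float]) -> List[str]:
        """Check one telemetry sample; trigger a workflow for every breached policy."""
        breached = []
        for policy in self.policies:
            value = metrics.get(policy.metric)
            if value is not None and value > policy.threshold:
                breached.append(policy.name)
                self.trigger(policy.workflow, device)
        return breached


left = LeftBrain()
left.define_policy(Policy("fabric-loss", "fabric_drop_ratio", 0.05, "drain-router"))
right = RightBrain(trigger=left.run_workflow)
right.load_policies(left.policies)
right.observe("pe1", {"fabric_drop_ratio": 0.2})
print(left.workflow_log)  # ['drain-router:pe1']
```

The callback keeps the right brain ignorant of how workflows are implemented; it only knows which workflow name a policy asks for.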

The major paradigm shift for our service designers here will be that they are now responsible not only for putting the individual service building blocks together in terms of config (and service lifecycle workflow, TBD), but also in terms of policies: determining the health of the provisioned service (including thresholds, post-checks, ongoing checks, etc.).
But following the MDE (Model-Driven Engineering) theme, it's not just service designers contributing to the policy library; it's also Ops teams, Security teams, etc.
The main advantage I see is that some of the policies created for the soon-to-be-automated service certification testing could then be reused for the post-provisioning tests of a particular service and for service lifecycle monitoring, and vice versa.
Then, obviously, there are policies defining what to do in various DDoS scenarios (and I consider the vendor solutions that actually do the analytics, correlation and arbitration all part of the right brain).
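A minimal sketch of the reuse argument, assuming a service health policy can be expressed as a list of min/max thresholds on named metrics: the same policy object is evaluated during certification testing, as a post-provisioning check and during ongoing monitoring, with only the phase label changing. The L3VPN policy, its metric names and its limits are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    metric: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def violated_by(self, value: float) -> bool:
        if self.min_value is not None and value < self.min_value:
            return True
        if self.max_value is not None and value > self.max_value:
            return True
        return False


@dataclass(frozen=True)
class ServiceHealthPolicy:
    service: str
    thresholds: List[Threshold]

    def evaluate(self, sample: Dict[str, float]) -> List[str]:
        """Return the failing metrics; a missing metric also counts as a failure."""
        return [t.metric for t in self.thresholds
                if t.metric not in sample or t.violated_by(sample[t.metric])]


# Hypothetical policy; metric names and limits are illustrative only.
L3VPN_POLICY = ServiceHealthPolicy(
    service="l3vpn",
    thresholds=[
        Threshold("latency_ms", max_value=30.0),
        Threshold("packet_loss_pct", max_value=0.1),
        Threshold("bgp_prefixes_received", min_value=1),
    ],
)


def run_check(policy: ServiceHealthPolicy, sample: Dict[str, float], phase: str) -> dict:
    """Same policy, any phase: certification, post-provisioning or ongoing."""
    failures = policy.evaluate(sample)
    return {"phase": phase, "service": policy.service,
            "passed": not failures, "failures": failures}


sample = {"latency_ms": 12.0, "packet_loss_pct": 0.0, "bgp_prefixes_received": 42}
for phase in ("certification", "post-provisioning", "ongoing"):
    print(run_check(L3VPN_POLICY, sample, phase))
```

Because the policy is data rather than code baked into one test harness, Ops or Security teams could contribute additional thresholds to the same library without touching the checking logic.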

> Without proper automation in place going way beyond basic IGP, BGP, LDP,
> BFD etc., you need a bit of clever automation to detect it and either alarm
> the NOC or, if it is really smart, take such a router out of the SPF network-wide.
> If not, you sit and wait until pissed-off customers call, which is already a failure.
> 
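A minimal sketch of the kind of check Robert describes, assuming per-line-card counters of packets sent into and received from the switch fabric over one polling interval are available (the counter model and names here are hypothetical, not any vendor's schema). It looks only at data-plane counters, which is the point: the control plane and BFD can look perfectly healthy while this ratio climbs. The alarm and drain thresholds are illustrative; "drain" stands for something like setting the IS-IS overload bit or advertising OSPF max-metric so that SPF routes around the box.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FabricCounters:
    """Fabric counters for one line card over a single polling interval (hypothetical model)."""
    to_fabric: int    # packets this card handed to the switch fabric
    from_fabric: int  # packets this card received from the switch fabric


def fabric_loss_ratio(cards: Dict[str, FabricCounters]) -> float:
    """Chassis-wide fabric loss: 1 - delivered / injected.

    Assumes every packet injected into the fabric is destined to a card in the
    same chassis and that all cards were sampled over the same interval
    (packets in flight at the interval boundary are ignored).
    """
    injected = sum(c.to_fabric for c in cards.values())
    delivered = sum(c.from_fabric for c in cards.values())
    if injected == 0:
        return 0.0
    return max(0.0, 1.0 - delivered / injected)


def decide_action(loss: float, alarm_at: float = 0.01, drain_at: float = 0.05) -> str:
    """Map a loss ratio to an action; both thresholds are illustrative."""
    if loss >= drain_at:
        # e.g. set the IS-IS overload bit / OSPF max-metric so SPF avoids the box
        return "drain"
    if loss >= alarm_at:
        return "alarm"  # notify the NOC
    return "ok"


cards = {"fpc0": FabricCounters(1000, 500), "fpc1": FabricCounters(1000, 700)}
print(decide_action(fabric_loss_ratio(cards)))  # loss = 0.4, prints "drain"
```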
Nowadays there's also the possibility to enable tons upon tons of streaming telemetry. I could see it all landing in a common data lake where some form of deep convolutional neural network could be used for unsupervised pattern/feature learning, the reason being that I'd like the system to tell me: look, if this counter is high and that one too, and this one is low, then this usually happens. But I'd rather wait to see what the industry offers in this area than develop such solutions internally. For now I'm glad I have automation projects going; when I asked whether we should have AI in the network strategy for 2020, I got awkward silence in response.
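The kind of statement I'd like the system to produce ("if this counter is high and that one too, and this one is low, then this usually happens") can be illustrated with something far simpler than a neural network. The sketch below deliberately swaps in a basic association-rule style count: it discretizes counters into high/low against per-counter cutoffs and measures the support and confidence of one hand-written rule over labeled history. It is not unsupervised learning and not what a vendor product would do; the counter names, cutoffs and history are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]


def discretize(sample: Sample, cutoffs: Dict[str, float]) -> Dict[str, str]:
    """Map each counter to 'high' or 'low' relative to its cutoff."""
    return {k: ("high" if v >= cutoffs[k] else "low")
            for k, v in sample.items() if k in cutoffs}


def rule_confidence(history: List[Tuple[Sample, bool]],
                    pattern: Dict[str, str],
                    cutoffs: Dict[str, float]) -> Tuple[int, float]:
    """Support and confidence of 'pattern => event' over labeled history.

    support    = number of samples matching the pattern
    confidence = fraction of those samples after which the event occurred
    """
    matched = 0
    with_event = 0
    for sample, event_followed in history:
        states = discretize(sample, cutoffs)
        if all(states.get(k) == v for k, v in pattern.items()):
            matched += 1
            if event_followed:
                with_event += 1
    return matched, (with_event / matched if matched else 0.0)


cutoffs = {"crc_errors": 100, "queue_drops": 50, "optical_rx_dbm": -10}
history = [
    ({"crc_errors": 500, "queue_drops": 80, "optical_rx_dbm": -20}, True),
    ({"crc_errors": 300, "queue_drops": 90, "optical_rx_dbm": -18}, True),
    ({"crc_errors": 400, "queue_drops": 70, "optical_rx_dbm": -19}, False),
    ({"crc_errors": 5, "queue_drops": 1, "optical_rx_dbm": -3}, False),
]
pattern = {"crc_errors": "high", "queue_drops": "high", "optical_rx_dbm": "low"}
print(rule_confidence(history, pattern, cutoffs))
```

A learning system would have to discover such patterns (and the cutoffs) on its own across thousands of counters, which is exactly the part I'd rather buy than build.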

> Sure, not everyone needs to be a great coder ... but network engineers with
> skills sufficient to understand code, debug it, or at minimum
> design the functional blocks of the automation routines are really a must-have
> today.
> 
I don't know; my experience is that working in tandem with a DevOps person (as opposed to trying to figure it out myself) gets me the desired results much faster (and in line with whatever their system-architecture guidelines or coding principles are), while I can focus on the WHAT (from the network perspective), not the HOW (from the coding/system perspective). Although, yes, for some of the PoC stuff I wish I had some coding skills.
But to give you a concrete example from my work: when I had the choice between reading some Python books or some more microservice architecture books, I chose the latter. It was more important for me to understand, for instance, the difference between orchestration and choreography (among other aspects of microservice architectures) and to assess the pros and cons of each, so that I could make an educated argument for the architecture of the service workflow engine, one that lines up with what I had in mind for the flexibility/agility of service-layer workflows.
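For readers who haven't run into the two terms, a minimal sketch of the difference, assuming three hypothetical provisioning steps (allocate_ip, push_config, run_post_checks) and a toy in-memory event bus standing in for a real message broker. With orchestration, one coordinator owns the sequence and calls each step; with choreography, there is no coordinator and each service reacts to an event and emits the next one.

```python
from collections import defaultdict
from typing import Callable, Dict, List


# Hypothetical provisioning steps; each just records that it ran.
def allocate_ip(order: dict) -> None:
    order.setdefault("log", []).append("allocate_ip")


def push_config(order: dict) -> None:
    order.setdefault("log", []).append("push_config")


def run_post_checks(order: dict) -> None:
    order.setdefault("log", []).append("run_post_checks")


# Orchestration: a central coordinator owns and executes the sequence.
def orchestrate(order: dict) -> dict:
    for step in (allocate_ip, push_config, run_post_checks):
        step(order)
    return order


# Choreography: services subscribe to events and publish the next event themselves.
class EventBus:
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, order: dict) -> None:
        for handler in self._subscribers[event]:
            handler(order)


def choreograph(order: dict) -> dict:
    bus = EventBus()

    def ipam_service(o: dict) -> None:
        allocate_ip(o)
        bus.publish("ip.allocated", o)

    def config_service(o: dict) -> None:
        push_config(o)
        bus.publish("config.pushed", o)

    bus.subscribe("service.ordered", ipam_service)
    bus.subscribe("ip.allocated", config_service)
    bus.subscribe("config.pushed", run_post_checks)
    bus.publish("service.ordered", order)
    return order


print(orchestrate({})["log"])
print(choreograph({})["log"])
```

Both produce the same sequence here. The trade-off is where the workflow lives: orchestration keeps the end-to-end flow explicit in one place, which makes it easier to audit and change, while choreography decouples the services so that, say, a monitoring service can subscribe to "config.pushed" without touching anything else, at the cost of the overall flow being implicit in the subscriptions.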

> And I am not even mentioning all of the new OEM platforms with an OS
> coming from a completely different part of the world :) That's when the real
> fun starts and the rubber hits the road, when a network engineer cannot run gdb on a
> daily basis.
> 
Well, we are already starting to get a glimpse of it with a Route-Reflector VM running on a server: who owns the host (HW & SW), the sysadmins or IP-ops? Mark could shed some light on that based on his experience running vRRs.
But I guess my argument stands in this area as well: once you develop a successful OEM HW and NOS combination that gets traction within the company, you're no longer a network engineer, as you've been promoted to full-time vendor of that product (dealing with bugs, new features and the overall support of this in-house-built platform).
So I stand by my MDE mantra: I'd rather stay the SME for the networking side of things on the project and let DevOps/sysadmins do what they do best.

adam