[j-nsp] Junos Telemetry Interface

Jeffrey Haas jhaas at juniper.net
Wed Apr 22 14:50:10 EDT 2020


Note that I'm not speaking for my employer's implementation here.  I don't know enough of the code in question to give a deep-dive answer on the issue off the top of my head.

That said, I've spent a number of years working on data aggregation tools in a prior life.

Getting a device that has a lot of varying workloads in a non-realtime OS to spit out something on strict timers is often challenging at the best of times.  Telemetry in particular often has to be serviced at a rate that makes it prone to slipping into the next sample's bin.  This is usually apparent when you see a zero-bin directly adjacent to a bin that appears to be doubly counted.
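
A minimal sketch of that slippage, not taken from any Junos code.  The function name, the 1 second bin width, and the 15000 bytes-per-report figure (10 packets of 1500 bytes, loosely based on the test quoted below) are all assumptions made for illustration.  One report is serviced late and lands in the next bin, which leaves a zero-bin next to a doubled one:

```python
def bin_samples(samples, bin_width, n_bins):
    """Sum byte deltas into fixed-width bins by arrival time (hypothetical helper)."""
    bins = [0] * n_bins
    for arrival_time, byte_delta in samples:
        idx = int(arrival_time // bin_width)
        if 0 <= idx < n_bins:
            bins[idx] += byte_delta
    return bins


# Steady traffic: one report of 15000 bytes per second (assumed values).
on_time = [(0.5, 15000), (1.5, 15000), (2.5, 15000), (3.5, 15000), (4.5, 15000)]

# Same traffic, but the third report is serviced 0.6 s late (2.5 -> 3.1).
slipped = [(0.5, 15000), (1.5, 15000), (3.1, 15000), (3.5, 15000), (4.5, 15000)]

print(bin_samples(on_time, 1.0, 5))  # [15000, 15000, 15000, 15000, 15000]
print(bin_samples(slipped, 1.0, 5))  # [15000, 15000, 0, 30000, 15000]
```

The total number of bytes is the same in both cases; only the bin it is attributed to changes.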

Longer sampling windows help with this.  On the producer, sadly, this sometimes means that you simply have more data binned with similar slippage depending on when timers fire.  This typically means that the smoothing operations are best done on the consumer of the data.
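
As a rough illustration of consumer-side smoothing, the sketch below applies a trailing moving average to per-second byte counts that contain one slipped report.  The function name and the sample data are hypothetical, and a moving average is only one possible smoothing choice:

```python
def rolling_rate_kbps(byte_bins, bin_seconds, window):
    """Average rate in kbps over a trailing window of `window` bins (hypothetical helper)."""
    rates = []
    for end in range(window, len(byte_bins) + 1):
        total_bytes = sum(byte_bins[end - window:end])
        rates.append(total_bytes * 8 / (window * bin_seconds) / 1000)
    return rates


# Assumed per-second byte counts for a steady 120 kbps flow, with one
# report slipped into the following bin (zero-bin next to a doubled bin).
per_second = [15000, 15000, 0, 30000, 15000, 15000]

print(rolling_rate_kbps(per_second, 1.0, 1))  # [120.0, 120.0, 0.0, 240.0, 120.0, 120.0]
print(rolling_rate_kbps(per_second, 1.0, 3))  # [80.0, 120.0, 120.0, 160.0]
print(rolling_rate_kbps(per_second, 1.0, 6))  # [120.0]
```

With a 3-bin window the error shrinks but does not vanish, because the slipped report can still straddle a window edge; only the window that covers both the zero-bin and the doubled bin recovers the flat 120 kbps.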

---

Generalities aside, I'll see if I can find someone to comment on this thread, because I've been the person trying to consume this state, and it'd drive me nuts too!

If I had to guess, we're seeing timer and bin size mismatches between the PFE exporter and the RE local aggregator.

-- Jeff


> On Apr 22, 2020, at 9:06 AM, Martin Tonusoo <martin at jumation.com> wrote:
> 
> Hi Aaron,
> 
>> I tried decimals and zero to see what would happen, seems that 1 is the
> lowest.
> 
> Looks like it is possible to configure 0 as a reporting-rate
> using ephemeral database, but then the device simply does not send any
> telemetry data.
> 
> 
> I also did some further testing with Grafana and it looks like the 5 second
> aggregation mentioned in my previous e-mail is too short a time interval for
> Junos telemetry data. I built a small setup where the server sent exactly
> 10 ICMP "echo request" packets with a 1472 byte payload each second to the vMX
> router using the ge-0/0/1.88 interface. There was no other traffic on that
> link. Telemetry data was exported from the vMX over another link using the
> shortest possible "reporting-rate". 30 second screencast with 5 second
> aggregation can be seen here: https://urldefense.com/v3/__https://i.imgur.com/Cfn6Lwp.gif__;!!NEt6yMaO-gk!V6SwreFZy_Qf-A4rbW2HmKNyNClNOcJfKpcmTUiwVIEDW-W8DTh0y9vX4ebhJec$  ..and with 30
> second aggregation: https://urldefense.com/v3/__https://i.imgur.com/OSKPSYr.gif__;!!NEt6yMaO-gk!V6SwreFZy_Qf-A4rbW2HmKNyNClNOcJfKpcmTUiwVIEDW-W8DTh0y9vXxxcz-ho$  With 5 second
> aggregation the graph is clearly way too choppy, and while it's better with
> a 30 second interval, ideally the graph should be a flat line at 120
> kbps.
> There are probably technical reasons for this, but it's weird that PFE
> sensors telemetry data in Junos is exported that infrequently. Especially
> native sensors which are exported by PFE directly.
> 
> 
> WBR,
> Martin