[j-nsp] ACL for lo0 template/example comprehensive list of 'things to think about'?

Thu Jul 12 13:17:50 EDT 2018

Hey,

> This one I was not aware of actually, so you say that theoretically aggregate from all LPTS policers can be more than what a single worker queue can handle resulting in tail-drops (well assuming that the hashing is imperfect congesting this one worker queue), is that right?

I'm saying that in practice the traffic LPTS admits on a single NPU can
be more than a single XIPC worker can handle. The fix for this is rather
involved, whereas the Juniper approach is 'if you punt it, you
shouldn't drop it'. I think Cisco should adopt a similar strategy and
investigate why XIPC worker performance is so variable (because the
worker doesn't have scheduling priority, and they do not dare to use
kernel scheduling priorities because they've been hurt before).

> But what is the theoretical probability of that happening in production? I mean the hash and packet keys would need just line up to result in very bad distribution resulting in congestion of one of the 8 queues.

It has happened about twice a month, for years.
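
A rough sketch of the arithmetic, in Python. Everything here is an
assumption made for illustration: the rates, the flow keys, CRC32 as
the hash, and the idea that one heavy flow is what tips a queue over.
None of it reflects how LPTS or XIPC actually hash or schedule. It only
shows that an aggregate the policer admits can sit well under total
worker capacity while one of the 8 queues is still overloaded.

```python
import zlib

NUM_WORKERS = 8                # from the thread: 8 worker queues
WORKER_CAPACITY_PPS = 10_000   # hypothetical per-worker service rate
POLICER_LIMIT_PPS = 25_000     # hypothetical aggregate the policer admits


def queue_for(flow_key):
    """Map a flow to a worker queue (CRC32 is a stand-in hash, not the real one)."""
    return zlib.crc32(flow_key.encode()) % NUM_WORKERS


def per_queue_load(flows):
    """Sum the offered rate (pps) landing on each worker queue."""
    load = [0] * NUM_WORKERS
    for key, pps in flows.items():
        load[queue_for(key)] += pps
    return load


# Hypothetical traffic mix: 50 small flows at 100 pps plus one heavy flow.
flows = {f"10.0.0.{i}:179": 100 for i in range(1, 51)}
flows["192.0.2.1:179"] = 15_000

load = per_queue_load(flows)
overloaded = [q for q, pps in enumerate(load) if pps > WORKER_CAPACITY_PPS]

print("total admitted pps:", sum(load), "of policer limit", POLICER_LIMIT_PPS)
print("per-queue pps:", load)
print("queues over capacity:", overloaded)
```

The total (20,000 pps) is inside the policer limit and far below the
combined capacity of 8 workers, yet the queue that receives the heavy
flow is offered more than one worker can drain, so it tail-drops.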

> > Both A and C are being fixed, thanks CSCO. But I'm not very happy how they
> > chose to fix it.
> >
> How do they plan on fixing that please?

I'm not sure I'm at liberty to tell. I don't agree with it, but I
understand the rationale, and the rationale applies to all vendors, not
just Cisco. The problem with fixing things correctly is that in the
short term you break more, and the dividends are paid over time.
Commercial software is driven to ever-increasing technical debt,
because it's safe and simple to spot-fix a specific issue's symptoms
rather than to address the architectural shortcomings that lead to it.
I don't have a solution. If I provided commercial software, I'd almost
certainly end up in the same situation once the software was mature
enough.

-- 
  ++ytti