[j-nsp] BGP output queue priorities between RIBs/NLRIs

Tue Nov 10 15:34:44 EST 2020

On Tue, 10 Nov 2020, Robert Raszuk wrote:

> But what seems weird is the last statement:
> 
> "This has problems with blackholing traffic for long periods in several
> cases,..." 
> 
> We as the industry have solved this problem many years ago, by clearly
> decoupling connectivity restoration term from protocol convergence term. 

Fundamentally, yes -- but not for EVPN DF elections.  Each PE making its 
own decisions about who wins without any round-trip handshake agreement is 
the root of the problem, at least when coupled with all of the fun that 
comes with layer 2 flooding.
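
To make that concrete, here is a minimal sketch of the default modulo-based 
DF election from RFC 7432 (section 8.5): sort the candidate PE addresses, 
then pick index (VLAN mod N).  The PE addresses, the VLAN number and the 
"views" dictionaries are made up for illustration, and the sketch ignores 
the election wait timer entirely.  The point is only that each PE runs the 
calculation over whatever Ethernet Segment routes it happens to have 
received, so nothing forces the answers to agree.

```python
import ipaddress

def elect_df(candidate_pes, vlan_id):
    """Modulo-based DF election: sort PE addresses, pick index (VLAN mod N)."""
    ordered = sorted(candidate_pes, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]

def self_declared_dfs(views, vlan_id):
    """Return the PEs that consider themselves DF, given each PE's own view."""
    return sorted(pe for pe, seen in views.items()
                  if elect_df(seen, vlan_id) == pe)

# Hypothetical PEs on one Ethernet Segment (documentation addresses).
PE1, PE2, PE3 = "192.0.2.1", "192.0.2.2", "192.0.2.3"
VLAN = 100

# Everyone has received every ES route: exactly one DF.
converged = {PE1: [PE1, PE2, PE3], PE2: [PE1, PE2, PE3], PE3: [PE1, PE2, PE3]}

# PE1 has not yet learned PE2's ES route: PE1 and PE2 both claim DF.
stale_pe1 = {PE1: [PE1, PE3], PE2: [PE1, PE2, PE3], PE3: [PE1, PE2, PE3]}

# PE2 has not yet learned PE3's ES route: nobody claims DF.
stale_pe2 = {PE1: [PE1, PE2, PE3], PE2: [PE1, PE2], PE3: [PE1, PE2, PE3]}

if __name__ == "__main__":
    for name, views in [("converged", converged),
                        ("stale_pe1", stale_pe1),
                        ("stale_pe2", stale_pe2)]:
        print(name, self_declared_dfs(views, VLAN))
```

Two self-declared DFs means duplicate or looped BUM traffic; zero means a 
blackhole until the slow PE catches up.  Neither state is visible to the 
other PEs, because there is no handshake in which to notice it.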

There's also no binding between whether a PE has actually converged and 
when it brings up IRBs and starts announcing those routes, which leads to 
a different sort of blackholing.  Or in the single-active case, whether 
the IRB should even be brought up at all, which leads to some really dumb 
traffic paths.  (Think layer 3 via P -> inactive PE -> same P, different 
encapsulation -> active PE -> layer 2 segment, for an example.)

> I think this would be the recommended direction, rather than mangling BGP code
> to optimize here and at the same time causing new, maybe more severe, issues
> somewhere else. Sure, per-SAFI refresh should be the norm, but I don't think
> this is the main issue here.

Absolutely.  The reason for the concern here is that the output queue 
priorities would be sufficient to work around the more fundamental flaws, 
if not for the fact that they're largely ineffective in this exact case.

-Rob