[j-nsp] MX punting packets to RE - why?

Tue Feb 2 19:18:42 EST 2016

Hello,

> > If I am understanding what you guys are saying correctly, this would cause everything to get punted to the CPU until a new hardware shortcut is created, and in the meantime - since our entire channel lineup is in there - this would hammer the DoS protection mechanism?
>
> Yes, if ingress interface does not match, they will be punted.

Okay, thanks

> > Can the rate at which the joins are sent out be slowed? I can live with a bit of a delay on the channels coming back to life, but not with the entire slot getting blackholed...
>
> What do you mean 'entire slot being blackholed', do you mean losing unrelated control-plane stuff, like BGP/ARP etc?

Yes, across the entire MPC I see unrelated control-plane protocols bounce, e.g. spanning-tree. If I recall correctly, some protocols are handled by the Trio chips themselves, right? I ask because I don't see any of my BFD-managed IS-IS adjacencies drop.

> If yes, just limit the mcast resolve to something reasonable; 100pps should be plenty, provided we're not competing with actual attack traffic.
>
> I would start with ddos-protection fixes and see if it behaves better with more restricted punting.

I assume you're referring to "set forwarding-options multicast resolve-rate", right?
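
For reference, here is a rough sketch of the two candidate knobs as I understand them, untested on this box. The 100 pps value is taken from the suggestion above. The ddos-protection statement names (protocol group "resolve", packet type "mcast-v4") and the burst value are my assumptions from memory and may differ by Junos release, so please check them against the CLI before committing:

    # Limit how fast multicast resolve requests are sent up to the RE
    set forwarding-options multicast resolve-rate 100
    # Assumed syntax: tighten the DDoS policer for multicast resolve punts
    set system ddos-protection protocols resolve mcast-v4 bandwidth 100
    set system ddos-protection protocols resolve mcast-v4 burst 100

I would try one at a time so it is clear which knob actually changes the behaviour.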

> Further research might involve figuring out if both MX boxes have multicast state with source towards the local EX port, and clients subscribed. So that no convergence is needed.

Interesting concept! Doesn't bother me at all, we like the idea of having our multicast available everywhere anyway.

> It wasn't obvious to me what kind of negative impact you observe when the EX-EX link goes down. How are you stopping loops in the EX network right now? You have a direct physical link between them, and then you have the l2circuit as well? But it looks like you're not carrying BPDUs over the l2circuit? If you rely on STP, I'm not entirely sure how the L2 redundancy works. Which port is normally blocked: the physical link between the switches, or the link via the l2circuit? My first guess would be that there is an L2 loop in the topology and nothing to stop it, so I'm not sure I understand why it works at all.

I'm in the habit of running VSTP on everything that isn't a two-port affair (the Cisco half of my brain keeps trying to type rapid-pvst+). BPDUs are definitely making it through; everything checks out. The paths over the l2circuits are normally blocked by giving those interfaces a higher cost.
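
To illustrate the cost approach, a minimal sketch of what this looks like on the EX side. The VLAN ID, interface names and cost values below are hypothetical placeholders, not my actual config, and the exact interface naming (with or without the unit) depends on the EX software flavour:

    # Direct EX-EX link: lower cost, so VSTP prefers it
    set protocols vstp vlan 100 interface ge-0/0/1.0 cost 2000
    # Port facing the MX l2circuit: higher cost, so it normally blocks
    set protocols vstp vlan 100 interface ge-0/0/10.0 cost 20000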

One of the VLANs carried as an l2circuit by the MXes between the EXes is actually not spanning-tree controlled; it is a "backup" PIM interface, essentially a clone of the EX-EX direct link but with a higher metric. Unlike the other VLANs, this one always has PIM and BGP adjacencies sending traffic over it. The ddos-protection resolve-mcast4 action trips when multicast is slammed over that VLAN, or when one of the VSTP-managed VLANs transitions to a forwarding state.

I can do up a diagram if that would help. I'm really not sure how I'd explain this to JTAC and wanted to get some real-world experience from guys who are working with this stuff.

Thanks for all your help!

Ross