[j-nsp] MX punting packets to RE - why?

Mon Feb 1 22:18:45 EST 2016

Ross-

Change 'fpc0' to 'afeb0' in your failed command.  I got goose eggs, but this lab chassis isn't doing multicast which may play a part.

$user at mx104-lab-re0> request pfe execute target afeb0 command "show nhdb mcast resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:    ID      Type    Protocol    Resolve-Rate
GOT: -----  --------  ----------  ---------------
LOCAL: End of file

-Michael

-----Original Message-----
From: juniper-nsp [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of Ross Halliday
Sent: Monday, February 1, 2016 2:38 PM
To: Dragan Jovicic <draganj84 at gmail.com>; Saku Ytti <saku at ytti.fi>
Cc: juniper-nsp at puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Hi Saku and Dragan,

Thank you for the responses, and apologies for the ambiguity.

The EXes are our video source switches. PIM RP is shared with MSDP to an anycast address. The MXes connect to the EXes at L3 via BGP - MX1/EX1 link is de-prioritized with a metric. Most of our receivers ride off of MX2, with a few further downstream.

Due to some interop issues and our use of VBR we've settled on a single MDT for this VRF. Being the default MDT it is of course joined on all PEs with this VRF. During normal operation, MX1, which doesn't have any active traffic for this VRF, has a full list of mcast routes with the source interface of the MDT.

So, in the first failure scenario - let's say EX2 or MX2 totally dies - MX1 will lose a preferred BGP route to the RP and sources and see everything over the MX1/EX1 link, so all of the S,G entries will need to be updated from mt-0/0/0.1081344 to xe-0/3/0.3812.

If I am understanding what you guys are saying correctly, this would cause everything to get punted to the CPU until a new hardware shortcut is created, and in the meantime - since our entire channel lineup is in there - this would hammer the DoS protection mechanism?

Can the rate at which the joins are sent out be slowed? I can live with a bit of a delay on the channels coming back to life, but not with the entire slot getting blackholed... I am also open to tweaking the DoS protection settings but it seems to me that a 10x increase would be opening myself up to really slamming the RE and causing even bigger problems. I come from SUP720 world, and I rather like having a box that can process BFD and BGP updates at the same time LOL

The other failure scenario is when the EX1/EX2 link goes down. When this happens, all devices are still up, so as far as BGP or really anything on the MX "knows", nothing has changed. Metric and next-hops are identical to the PEs. Instead of pulling video from the direct link, EX1 & EX2 can only see each other through VLANs that the MXes carry as EoMPLS l2circuits. This is what truly baffles me, as none of what you guys mentioned should apply to an l2circuit.

Also,
	> request pfe execute target fpc0 command "show nhdb mcast resolve"
	error: command is not valid on the mx104

:(

Thanks for your help guys!

Ross

From: Dragan Jovicic [mailto:draganj84 at gmail.com] 
Sent: Sunday, January 31, 2016 7:44 AM
To: Saku Ytti
Cc: Ross Halliday; juniper-nsp at puck.nether.net
Subject: Re: [j-nsp] MX punting packets to RE - why?

Correct me if I'm wrong, but it looks like the MX doesn't have a multicast cache entry for all those S,G routes (in inet.1).
So the first packet of each S,G entry must first be resolved by the kernel and downloaded to the PFE.
The DDoS feature is activated because a large influx of unresolved packets is passing through the router. You could change the default DDoS setting for this type of traffic on your FPC.
Another thing that comes to mind is that the kernel itself has a limited number of resolves per second, which is 66. That is, 66 different NH S,G entries will be resolved per second.

dj at mx-re0> request pfe execute target fpc0 command "show nhdb mcast resolve" 
SENT: Ukern command: show nhdb mcast resolve
GOT:
GOT: Nexthop Info:
GOT:    ID      Type    Protocol    Resolve-Rate
GOT: -----  --------  ----------  ---------------
GOT:  1927   Resolve        IPv6               66
GOT:  1962   Resolve        IPv4               66
LOCAL: End of file
This is modified by a (hidden) knob:

dj at mx-re0# set forwarding-options multicast resolve-rate ?  
Possible completions:
  <resolve-rate>       Multicast resolve rate (100..1000 per second)
{master}[edit]
Mind you, I haven't tested this.
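
As a rough back-of-the-envelope sketch of what that rate means for convergence, the Python below divides the number of S,G entries by the resolve rate. The lineup size of 500 channels is a made-up number (it is not from this thread), and the model assumes entries resolve at exactly the stated rate with nothing else in the way, which real boxes will not honour precisely.

```python
def seconds_to_resolve(sg_entries: int, resolves_per_second: int) -> float:
    """Time to resolve every (S,G) entry at a fixed per-second rate."""
    if resolves_per_second <= 0:
        raise ValueError("resolve rate must be positive")
    return sg_entries / resolves_per_second


if __name__ == "__main__":
    channels = 500  # hypothetical lineup size, not taken from the thread
    # 66 is the rate shown in the PFE output; 100 and 1000 are the bounds
    # the knob's help text advertises.
    for rate in (66, 100, 1000):
        print(f"{rate:>4}/s -> {seconds_to_resolve(channels, rate):.1f} s")
```

The point is only the order of magnitude: at 66 resolves per second a few hundred groups take several seconds, during which unresolved packets keep hitting the policer.
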
HTH,
Regards

On Sat, Jan 30, 2016 at 12:04 PM, Saku Ytti <saku at ytti.fi> wrote:
Hey Ross,

It's not clear to me if the mcast is only inside the EX or if it's
also on the MX's. And it's not clear to me how the faults impact the
multicast distribution tree. On stable state, do both MX80's have
mcast states for groups? Or only one of them?

Trio maps each multicast group to an input interface. If a mismatch
occurs, that is, the group ingresses on an input interface other than
the specified one, I believe this causes a host punt.

Alas, DDoS-protection limits are quite insane, like 20kpps for many
protocols. That's more than NPU=>LC_CPU punting allows for, so it'll
kill pretty much everything. I'd set protocols I don't need to
10-100pps, non-critical protocols I need to 4kpps and critical
protocols I need to 8kpps.
And yes, configure each and every ddos-protocol. It'll inflate the
config quite a bit, but there is always 'set apply-flags omit', which
can be a useful way to reduce config cruft from standard configs you
don't really want to review normally.
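
Since configuring every ddos-protocol by hand is tedious, here is a
minimal Python sketch that turns a tier assignment into 'set' lines.
The pps values are the ones suggested in this message. The tier names,
the function name and the choice of putting resolve mcast-v4 in the
non-critical tier are assumptions for illustration only, and the
generated 'set system ddos-protection protocols <group> <packet-type>
bandwidth <pps>' form should be checked against your Junos release
before use.

```python
# pps per tier, using the figures suggested in this message (upper bound of
# the 10-100pps range for protocols that are not needed).
TIER_PPS = {"unneeded": 100, "non_critical": 4000, "critical": 8000}


def ddos_set_commands(assignments):
    """Return Junos 'set' lines for {(group, packet_type): tier} assignments."""
    lines = []
    for (group, packet_type), tier in sorted(assignments.items()):
        pps = TIER_PPS[tier]  # raises KeyError on an unknown tier name
        lines.append(
            f"set system ddos-protection protocols {group} {packet_type} "
            f"bandwidth {pps}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical assignment; decide the tier for each protocol yourself.
    example = {("resolve", "mcast-v4"): "non_critical"}
    for line in ddos_set_commands(example):
        print(line)
```

The output can be pasted into configure mode or loaded with 'load set
terminal' once the values have been reviewed.
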

On 29 January 2016 at 23:36, Ross Halliday
<ross.halliday at wtccommunications.ca> wrote:
> Hi list,
>
> I've run into an oddity that's been causing us some issues. First, a diagram!
>
> EX1----EX2
>  |      |
>  |      |
> MX1----MX2
>
> EX1 and EX2 are independent switches (not VC) that run a ton of video traffic. EX4200 on 12.3R8.7
> MX1 and MX2 are MPLS PEs that ingest video and send it out to our network. MX104 on 13.3R4.6
> Several VLANs span EX1 and EX2 as each switch has a server that requires Layer 2 to the other unit. (active/active middleware)
> EX1-EX2 link is direct fiber carrying VLANs
> MX1-MX2 link is MPLS
>
> The MX ports facing the EXes terminate L3 as well as hauling L2:
>
> MX1:
>
>     xe-0/3/0 {
>         description "EX1 xe-3/1/0";
>         flexible-vlan-tagging;
>         hold-time up 5000 down 0;
>         encapsulation flexible-ethernet-services;
>         unit 3810 {
>             description "Backup link between TV switches";
>             encapsulation vlan-ccc;
>             vlan-id-list [ 304 810-811 3810 3813 3821-3822 ];
>         }
>         unit 3812 {
>             description "Video feed 2/2 from head end switch";
>             vlan-id 3812;
>             family inet {
>                 address MX1/31;
>             }
>         }
>     }
>     l2circuit {
>         neighbor MX2 {
>             interface xe-0/3/0.3810 {
>                 virtual-circuit-id 3810;
>                 description "IPTV switch redundant link";
>                 no-control-word;
>             }
>         }
>     }
>
> MX2:
>
>     xe-0/3/0 {
>         description "EX1 xe-0/1/0";
>         flexible-vlan-tagging;
>         hold-time up 5000 down 0;
>         encapsulation flexible-ethernet-services;
>         unit 3810 {
>             description "Backup link between TV switches";
>             encapsulation vlan-ccc;
>             vlan-id-list [ 304 810-811 3813 3821-3822 ];
>         }
>         unit 3811 {
>             description "Video feed 1/2 from head end switch";
>             vlan-id 3811;
>             family inet {
>                 address MX2/31;
>             }
>         }
>     }
>     l2circuit {
>         neighbor MX1 {
>             interface xe-0/3/0.3810 {
>                 virtual-circuit-id 3810;
>                 description "IPTV switch redundant link";
>                 no-control-word;
>             }
>         }
>     }
>
> We have dual L3 feeds from "the switches" to "the routers", and VLANs are carried over an l2circuit should the direct link between EX1 & EX2 bite the dust. It should be noted that MX1 is basically a "backup" - traffic normally flows EX1-EX2-MX2. The goal of this setup is so that we can take out any link and still have our video working.
>
> It works... eventually.
>
> The problem I am running into is that when a failure occurs, or I simply pull a VLAN from the EX1-EX2 link, multicast is suddenly slammed either across or into the MXes. When that happens, I get this lovely message:
>
> jddosd[1527]: DDOS_PROTOCOL_VIOLATION_SET: Protocol resolve:mcast-v4 is violated at fpc 0 for 38 times, started at 2016-01-27 04:59:55 EST
> jddosd[1527]: DDOS_PROTOCOL_VIOLATION_CLEAR: Protocol resolve:mcast-v4 has returned to normal. Violated at fpc 0 for 38 times, from 2016-01-27 04:59:55 EST to 2016-01-27 04:59:55 EST
>
> ...and traffic (maybe of just offending class) on that slot is dumped for a little while.
>
>> show ddos-protection protocols resolve statistics
>
>   Packet type: mcast-v4
>     System-wide information:
>       Bandwidth is no longer being violated
>         No. of FPCs that have received excess traffic: 1
>         Last violation started at: 2016-01-27 04:59:55 EST
>         Last violation ended at:   2016-01-27 04:59:55 EST
>         Duration of last violation: 00:00:00 Number of violations: 38
>       Received:  4496939             Arrival rate:     0 pps
>       Dropped:   2161644             Max arrival rate: 45877 pps
>     Routing Engine information:
>       Policer is never violated
>       Received:  130584              Arrival rate:     0 pps
>       Dropped:   0                   Max arrival rate: 1 pps
>         Dropped by aggregate policer: 0
>     FPC slot 0 information:
>       Policer is no longer being violated
>         Last violation started at: 2016-01-27 04:59:57 EST
>         Last violation ended at:   2016-01-27 04:59:57 EST
>         Duration of last violation: 00:00:00 Number of violations: 38
>       Received:  4496939             Arrival rate:     0 pps
>       Dropped:   2161644             Max arrival rate: 45877 pps
>         Dropped by this policer:      2161644
>         Dropped by aggregate policer: 0
>         Dropped by flow suppression:  0
>       Flow counts:
>         Aggregation level     Current       Total detected   State
>         Subscriber            0             0                Active
>
> Once the thing recovers, everything works again. But I cannot change a VLAN, a spanning tree topology, or work on anything without risking serious impact to my network!
>
> I understand that the 'resolve' protocol means these packets are being sent to the RE.
>
> ...why the hell are they being sent to the RE? Even when there's a change on traffic that gets sent into that l2circuit - shouldn't this just be punted? Who gives a crap what the content is!
>
> Please tell me I am doing something wrong and that the MX104 can actually handle multicast without temporarily disabling an entire slot.
>
> ANY feedback is appreciated!
>
> Thank you
> Ross
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp

--
  ++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
