[j-nsp] MX punting packets to RE - why?

Sat Jan 30 06:04:05 EST 2016

Hey Ross,

It's not clear to me whether the mcast stays inside the EXes or also
runs on the MXes, nor how the faults affect the multicast distribution
tree. In steady state, do both MX104s hold mcast state for the groups,
or only one of them?

Trio maps each multicast group to an expected input interface. If a
mismatch occurs, that is, the group arrives on an interface other than
the expected one, I believe this causes a host punt.
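
To check this on the MXes, a few operational commands should help.
This is only a rough sketch: I'm assuming PIM runs on the L3 units,
and the exact output fields vary by release.

    show multicast route extensive
    show pim join extensive
    show multicast statistics
    show ddos-protection protocols resolve statistics

Compare the upstream interface recorded for each group against the
interface the traffic actually arrives on during the fault, and watch
whether the mismatch counters in 'show multicast statistics' climb
while the resolve:mcast-v4 policer is being violated.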

Alas, the DDoS-protection limits are quite insane, like 20kpps for
many protocols. That's more than the NPU=>LC_CPU punt path allows for,
so a single violating protocol can kill pretty much everything else.
I'd set protocols I don't need to 10-100pps, non-critical protocols I
need to 4kpps, and critical protocols I need to 8kpps.
And yes, configure each and every ddos-protocol. It'll inflate the
config quite a bit, but there is always 'set apply-flags omit', which
can be a useful way to hide standard config that you don't normally
want to review.
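
As a minimal sketch of what that could look like in set format: the
choice of which protocol lands in which tier is purely my assumption
for illustration, and the burst value is a placeholder, so adjust both
to what your network actually needs.

    # Assumption: resolve mcast-v4 treated as needed but non-critical (4kpps)
    set system ddos-protection protocols resolve mcast-v4 bandwidth 4000
    set system ddos-protection protocols resolve mcast-v4 burst 4000
    # Assumption: BGP treated as critical (8kpps)
    set system ddos-protection protocols bgp aggregate bandwidth 8000
    # Assumption: NTP treated as not needed from the network (100pps)
    set system ddos-protection protocols ntp aggregate bandwidth 100
    # Hide the whole stanza from normal 'show configuration' output
    set system ddos-protection apply-flags omit

With apply-flags omit in place the stanza is still active, and
'show configuration | display omit' shows it again when you do want
to review it.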

On 29 January 2016 at 23:36, Ross Halliday
<ross.halliday at wtccommunications.ca> wrote:
> Hi list,
>
> I've run into an oddity that's been causing us some issues. First, a diagram!
>
> EX1----EX2
>  |      |
>  |      |
> MX1----MX2
>
> EX1 and EX2 are independent switches (not VC) that run a ton of video traffic. EX4200 on 12.3R8.7
> MX1 and MX2 are MPLS PEs that ingest video and send it out to our network. MX104 on 13.3R4.6
> Several VLANs span EX1 and EX2 as each switch has a server that requires Layer 2 to the other unit. (active/active middleware)
> EX1-EX2 link is direct fiber carrying VLANs
> MX1-MX2 link is MPLS
>
> The MX ports facing the EXes terminate L3 as well as hauling L2:
>
> MX1:
>
>     xe-0/3/0 {
>         description "EX1 xe-3/1/0";
>         flexible-vlan-tagging;
>         hold-time up 5000 down 0;
>         encapsulation flexible-ethernet-services;
>         unit 3810 {
>             description "Backup link between TV switches";
>             encapsulation vlan-ccc;
>             vlan-id-list [ 304 810-811 3810 3813 3821-3822 ];
>         }
>         unit 3812 {
>             description "Video feed 2/2 from head end switch";
>             vlan-id 3812;
>             family inet {
>                 address MX1/31;
>             }
>         }
>     }
>     l2circuit {
>         neighbor MX2 {
>             interface xe-0/3/0.3810 {
>                 virtual-circuit-id 3810;
>                 description "IPTV switch redundant link";
>                 no-control-word;
>             }
>         }
>     }
>
> MX2:
>
>     xe-0/3/0 {
>         description "EX1 xe-0/1/0";
>         flexible-vlan-tagging;
>         hold-time up 5000 down 0;
>         encapsulation flexible-ethernet-services;
>         unit 3810 {
>             description "Backup link between TV switches";
>             encapsulation vlan-ccc;
>             vlan-id-list [ 304 810-811 3813 3821-3822 ];
>         }
>         unit 3811 {
>             description "Video feed 1/2 from head end switch";
>             vlan-id 3811;
>             family inet {
>                 address MX2/31;
>             }
>         }
>     }
>     l2circuit {
>         neighbor MX1 {
>             interface xe-0/3/0.3810 {
>                 virtual-circuit-id 3810;
>                 description "IPTV switch redundant link";
>                 no-control-word;
>             }
>         }
>     }
>
> We have dual L3 feeds from "the switches" to "the routers", and VLANs are carried over an l2circuit should the direct link between EX1 & EX2 bite the dust. It should be noted that MX1 is basically a "backup" - traffic normally flows EX1-EX2-MX2. The goal of this setup is so that we can take out any link and still have our video working.
>
> It works... eventually.
>
> The problem I am running into is that when a fail occurs, or I simply pull a VLAN from the EX1-EX2 link, multicast is suddenly slammed either across or into the MXes. When that happens, I get this lovely message:
>
> jddosd[1527]: DDOS_PROTOCOL_VIOLATION_SET: Protocol resolve:mcast-v4 is violated at fpc 0 for 38 times, started at 2016-01-27 04:59:55 EST
> jddosd[1527]: DDOS_PROTOCOL_VIOLATION_CLEAR: Protocol resolve:mcast-v4 has returned to normal. Violated at fpc 0 for 38 times, from 2016-01-27 04:59:55 EST to 2016-01-27 04:59:55 EST
>
> ...and traffic (maybe of just offending class) on that slot is dumped for a little while.
>
>> show ddos-protection protocols resolve statistics
>
>   Packet type: mcast-v4
>     System-wide information:
>       Bandwidth is no longer being violated
>         No. of FPCs that have received excess traffic: 1
>         Last violation started at: 2016-01-27 04:59:55 EST
>         Last violation ended at:   2016-01-27 04:59:55 EST
>         Duration of last violation: 00:00:00 Number of violations: 38
>       Received:  4496939             Arrival rate:     0 pps
>       Dropped:   2161644             Max arrival rate: 45877 pps
>     Routing Engine information:
>       Policer is never violated
>       Received:  130584              Arrival rate:     0 pps
>       Dropped:   0                   Max arrival rate: 1 pps
>         Dropped by aggregate policer: 0
>     FPC slot 0 information:
>       Policer is no longer being violated
>         Last violation started at: 2016-01-27 04:59:57 EST
>         Last violation ended at:   2016-01-27 04:59:57 EST
>         Duration of last violation: 00:00:00 Number of violations: 38
>       Received:  4496939             Arrival rate:     0 pps
>       Dropped:   2161644             Max arrival rate: 45877 pps
>         Dropped by this policer:      2161644
>         Dropped by aggregate policer: 0
>         Dropped by flow suppression:  0
>       Flow counts:
>         Aggregation level     Current       Total detected   State
>         Subscriber            0             0                Active
>
> Once the thing recovers, everything works again. But I cannot change a VLAN, a spanning tree topology, or work on anything without risking serious impact to my network!
>
> I understand that the 'resolve' protocol means these packets are being sent to the RE.
>
> ...why the hell are they being sent to the RE? Even when there's a change on traffic that gets sent into that l2circuit - shouldn't this just be punted? Who gives a crap what the content is!
>
> Please tell me I am doing something wrong and that the MX104 can actually handle multicast without temporarily disabling an entire slot.
>
> ANY feedback is appreciated!
>
> Thank you
> Ross
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp

-- 
  ++ytti