[j-nsp] QFX DDOS Violations

Saku Ytti saku at ytti.fi
Wed Nov 30 08:54:31 EST 2022


Heh,

That makes sense. So on QFX5k the 'VXLAN' classifier can contain anything
inside the VXLAN, like ARP? Instead of being classified as ARP, it all
shares the VXLAN classifier?

So this could also be VXLAN TTL exceeded? That would happen every time
you have some kind of convergence event and get a microloop. Then all
of these are competing for access to the same VXLAN policer?

This is very broken behaviour :(

22.3R1 appears to address this:
New ARP and NDP packet classification: We've introduced two CP classes
for ARP and NDP packets received over VTEP interface. When your device
identifies a packet as ARP or NDP, it performs an ingress port check
which verifies whether the VTEP interface receives these packets. If
VTEP interface receives the packet, datapath re-writes the CP class to
the newly defined values. Based on this new CP class, the system
performs the remaining packet processing and forwards the packets
toward the host path. The system adds a separate DDoS policer to this
ARP traffic, which ensures that the ARP traffic is not triggering underlay
ARP DDoS violation.


I think in order of least to most broken:
a) have separate underlay + overlay policers for every protocol, (ttl,
reject, arp, nd, resolve....)
b) use a single policer per protocol (ttl, reject, arp, nd, resolve),
so that e.g. ARP in and out of VXLAN shares the same ARP policer
c) don't have any policer
d) collapse multiple different punt reasons under single VXLAN policer



On Wed, 30 Nov 2022 at 15:44, Roger Wiklund <roger.wiklund at gmail.com> wrote:
>
> Hi John
>
> The default DDoS values on QFX5k for EVPN-VXLAN are way too low.
> I recommend these values + very tight storm-control on each applicable port.
>
> RSVP and LDP are not used but share the same queue as BGP, so you will see strange triggers if you omit these.
>
> set system ddos-protection protocols rsvp aggregate bandwidth 10000
> set system ddos-protection protocols rsvp aggregate burst 1000
> set system ddos-protection protocols ldp aggregate bandwidth 10000
> set system ddos-protection protocols ldp aggregate burst 1000
> set system ddos-protection protocols bgp aggregate bandwidth 10000
> set system ddos-protection protocols bgp aggregate burst 1000
> set system ddos-protection protocols arp aggregate bandwidth 50000
> set system ddos-protection protocols arp aggregate burst 5000
> set system ddos-protection protocols vxlan aggregate bandwidth 50000
> set system ddos-protection protocols vxlan aggregate burst 5000
>
> You're seeing VXLAN violations because of EVPN ARP/ND suppression.
> Have a look at the VXLAN queue here:
> Detailed information about DDOS queues on QFX5K switches (juniper.net)
>
> Every single ARP/ND packet is proxied by the QFX. If you have an ARP storm, the VXLAN DDoS policer will kick in and rate limit it.
> If this is sustained, the ARP cache will time out on each client and eventually break connectivity.
>
> You mitigate this by having a very tight storm-control on each L2 interface.
> We use 1000kbps, which translates roughly into 2000pps of ARP.
>
> set forwarding-options storm-control-profiles sc-1000kbps all bandwidth-level 1000
> set forwarding-options storm-control enhanced
>
> Because storm-control is distributed and mitigated on each port, while the DDoS is aggregated to the RE, this combo works fine and VXLAN is never triggered.
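
As a rough sanity check of the "1000kbps is roughly 2000pps of ARP"
figure: the sketch below assumes ARP travels in minimum-size 64-byte
Ethernet frames. The thread does not say how storm-control counts bytes,
so the 84-byte variant (frame plus preamble and inter-frame gap) is
shown only as an alternative assumption. kbps_to_pps is a made-up helper
name.

```python
def kbps_to_pps(kbps, frame_bytes=64):
    """Packets per second that fit into `kbps` at a fixed frame size."""
    return kbps * 1000 / (frame_bytes * 8)


# Assumption: storm-control counts the 64-byte frame only.
arp_pps_frame_only = kbps_to_pps(1000)

# Assumption: preamble (8 B) and inter-frame gap (12 B) are counted too.
arp_pps_on_wire = kbps_to_pps(1000, frame_bytes=84)

print(round(arp_pps_frame_only), round(arp_pps_on_wire))
```

Under the first assumption the result is a little under 2000 pps, which
matches the figure above.
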
>
> Hope this helps.
>
> Regards
>
> On Wed, Nov 30, 2022 at 2:43 PM Cristian Cardoso via juniper-nsp <juniper-nsp at puck.nether.net> wrote:
>>
>> Hi Johan
>>
>> I experienced a similar issue in my evpn-vxlan environment on QFX5120-48y
>> switches. The DDOS alert occurred whenever a large number of VM migrations
>> happened simultaneously in my environment. Sometimes there were 20 VMs
>> migrating at once and the DDOS alarm was raised.
>>
>> To solve this, I set the following value in the configuration:
>>
>> qfx5120> show configuration system ddos-protection protocols
>> vxlan {
>>     aggregate {
>>         bandwidth 10000;
>>         burst 12000;
>>     }
>> }
>>
>>
>>
>> On Wed, 30 Nov 2022 at 07:16, john doe via juniper-nsp <
>> juniper-nsp at puck.nether.net> wrote:
>>
>> > Hi!
>> >
>> > The leaf switches are QFX5k and they seem to be lacking some of the commands
>> > you mentioned. We don't have any problem with bgp sessions going down; the
>> > impact is only on the payload inside vxlan.
>> >
>> > Protocol Group: VXLAN
>> >
>> >   Packet type: aggregate (Aggregate for vxlan control packets)
>> >     Aggregate policer configuration:
>> >       Bandwidth:        500 pps
>> >       Burst:            200 packets
>> >       Recover time:     300 seconds
>> >       Enabled:          Yes
>> >     Flow detection configuration:
>> >       Flow detection system is off
>> >       Detection mode: Automatic  Detect time:  0 seconds
>> >       Log flows:      Yes        Recover time: 0 seconds
>> >       Timeout flows:  No         Timeout time: 0 seconds
>> >       Flow aggregation level configuration:
>> >         Aggregation level   Detection mode  Control mode  Flow rate
>> >         Subscriber          Automatic       Drop          0  pps
>> >         Logical interface   Automatic       Drop          0  pps
>> >         Physical interface  Automatic       Drop          500 pps
>> >     System-wide information:
>> >       Aggregate bandwidth is no longer being violated
>> >         No. of FPCs that have received excess traffic: 1
>> >         Last violation started at: 2022-11-30 09:08:02 CET
>> >         Last violation ended at:   2022-11-30 09:09:32 CET
>> >         Duration of last violation: 00:01:40 Number of violations: 1508
>> >       Received:  3548252144          Arrival rate:     201 pps
>> >       Dropped:   49294329            Max arrival rate: 160189 pps
>> >     Routing Engine information:
>> >       Bandwidth: 500 pps, Burst: 200 packets, enabled
>> >       Aggregate policer is never violated
>> >       Received:  0                   Arrival rate:     0 pps
>> >       Dropped:   0                   Max arrival rate: 0 pps
>> >         Dropped by individual policers: 0
>> >     FPC slot 0 information:
>> >       Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled
>> >       Hostbound queue 255
>> >       Aggregate policer is no longer being violated
>> >         Last violation started at: 2022-11-30 09:08:02 CET
>> >         Last violation ended at:   2022-11-30 09:09:32 CET
>> >         Duration of last violation: 00:01:40 Number of violations: 1508
>> >       Received:  3548252144          Arrival rate:     201 pps
>> >       Dropped:   49294329            Max arrival rate: 160189 pps
>> >         Dropped by individual policers: 0
>> >         Dropped by aggregate policer:   50294227
>> >         Dropped by flow suppression:    0
>> >       Flow counts:
>> >         Aggregation level     Current       Total detected   State
>> >         Subscriber            0             0                Active
>> >
>> > vty)# show ddos scfd proto-states vxlan
>> > (sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps)
>> > op-mode: a=automatic, o=always-on, x=disabled
>> > fc-mode: d=drop-all, k=keep-all, p=police
>> > d-t: detect time, r-t: recover time, t-t: timeout time
>> > aggr-t: last aggregated/deaggreagated time
>> > idx prot       group        proto mode detect agg flags state   sub-cfg   ifl-cfg   ifd-cfg  d-t  r-t  t-t   aggr-t
>> > --- ----    --------     -------- ---- ------ --- ----- ----- --------- --------- ---------  ---  ---  ---   ------
>> >  23 6400       vxlan    aggregate auto     no   1     2     0 a:d:    0 a:d:    0 a:d:  500    0    0    0        0
>> >
>> >
>> > Johan
>> >
>> > On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti <saku at ytti.fi> wrote:
>> >
>> > > Hey,
>> > >
>> > > Before any potential trashing, I'd like to say that as far as I am
>> > > aware Juniper (MX) is the only platform on the market which isn't
>> > > trivial to DoS off the network, despite any protection users may have
>> > > tried to configure.
>> > >
>> > > > How do you identify the source problem of DDOS violations that junos logs
>> > > > for QFX? For example, which interface is causing the problem?
>> > >
>> > > I assume you are talking about QFX10k with the Paradise (PE) chipset. I'm
>> > > not very familiar with it, but I know something about it as sold in
>> > > PTX10k guise, though there are significant differences. Answers are from
>> > > the PTX10k perspective. If you are talking about QFX5k, many of the
>> > > answers won't apply, but the ukern side answers should help
>> > > troubleshoot it further. Certainly with QFX5k the situation is worse
>> > > than it would be on QFX10k.
>> > >
>> > > > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for
>> > > > protocol/exception  VXLAN:aggregate exceeded its allowed bandwidth at fpc 0
>> > > > for 30 times, started at...
>> > > >
>> > > > The configured rate for VXLAN is 500pps, ddos protection is seeing rates
>> > > > over 150 000pps
>> > >
>> > > Do you mean you've configured:
>> > > 'set system ddos-protection protocols vxlan aggregate bandwidth 500'.
>> > > What exactly are you seeing? What does 'show ddos-protection protocols
>> > > vxlan' say? Also 'start shell pfe network fpcX' + 'show ddos scfd
>> > > proto-states vxlan'
>> > >
>> > > Paradise (unlike Triton and Trio) does not support PPS policing at
>> > > all. So when you configure a PPS policer, what actually gets
>> > > programmed is 500pps*1500B bps. I've tried to argue this is a poor
>> > > default, 64B being superior choice.
>> > > On Paradise, 500pps would therefore admit 500*(1500/64), or about 12kpps, per
>> > > Paradise if those VXLAN packets were small. These would then be
>> > > policed by the LC CPU ukern into 500 pps for all the Paradise chips
>> > > living inside that LC CPU, before sending to RE over bme0.
>> > > After DDoS but before Paradise admits packet to the LC_CPU it goes
>> > > through VoQ, where most packets are classified as VoQ#2 which is
>> > > 10Mbps wide with no burstability (classification, width and
>> > > burstability is being changed on later images). So extremely trivial
>> > > rates will cause congestion on the VoQ#2 and a lot of protocols will
>> > > be competing for 10Mbps access to LC CPU, like BGP, ISIS, OSPF, LDP,
>> > > ND, ARP.
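
The pps-to-bps conversion described above works out as follows. This is
only a sketch of the arithmetic as stated in the thread (pps multiplied
by an assumed 1500-byte packet); the function names are made up and this
is not how the chip is actually programmed.

```python
def programmed_bps(pps, assumed_packet_bytes=1500):
    """Bit rate programmed for a configured pps value."""
    return pps * assumed_packet_bytes * 8


def admitted_pps(pps, actual_packet_bytes, assumed_packet_bytes=1500):
    """Packets per second admitted by that bit rate at a given packet size."""
    return programmed_bps(pps, assumed_packet_bytes) / (actual_packet_bytes * 8)


rate_bps = programmed_bps(500)            # what a 500 pps policer becomes in bps
small_packet_pps = admitted_pps(500, 64)  # 64-byte packets let through

print(rate_bps, small_packet_pps)
```

So a policer configured as 500 pps ends up as 6 Mbps, which admits
roughly 11.7k 64-byte packets per second, the "about 12kpps" above.
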
>> > >
>> > > > This is a spine/leaf setup. One theory is that the vxlan traffic that most
>> > > > of our QFX boxes are activating ddos protection for is actually vxlan
>> > > > services running inside the vxlans; for example, we have kubernetes clusters
>> > > > using vxlan. Is that a sane theory?
>> > >
>> > > Not enough information to speculate.
>> > > In many cases ddos classification is wrong. You can review it in the PFE:
>> > > 'show filter' => HOSTBOUND_IPv4_FILTER, then 'show filter index X
>> > > program'. You can also capture punted packets on the interface where the RE
>> > > meets the FPC (I think bme0 here). On the bme0 interface, TNP headers sit
>> > > on top of the punted packets, and in the TNP headers you will see which
>> > > ddos classification was used. You can turn the number into a name by
>> > > looking at 'show ddos scfd proto-states'.
>> > >
>> > >
>> > > I naively wish I could set my ddos-protocol classification and voq
>> > > classification manually in 'lo0 filter', because the infrastructure
>> > > allows for great protection, but particularly when choosing which VoQ
>> > > packets share there is no obvious single best solution, it depends on
>> > > the environment. Like I could put RSVP, ISIS, LDP on single VoQ, as
>> > > they never compete with customers, BGP in another as they will compete
>> > > with customers and operators for me, and so forth. But of course this
>> > > wish is naive, as the solution the vendor offers is already too
>> > > complex for customers to use and giving more rope would just make the
>> > > mean config worse.
>> > >
>> > > --
>> > >   ++ytti
>> > >
>> > _______________________________________________
>> > juniper-nsp mailing list juniper-nsp at puck.nether.net
>> > https://puck.nether.net/mailman/listinfo/juniper-nsp
>> >



-- 
  ++ytti

