[j-nsp] QFX DDOS Violations
Cristian Cardoso
cristian.cardoso11 at gmail.com
Wed Nov 30 08:41:42 EST 2022
Hi Johan
I experienced a similar issue in my EVPN-VXLAN environment on QFX5120-48Y
switches. The DDoS alert fired whenever a large number of VMs migrated
simultaneously. Sometimes 20 VMs were migrating at once, and the DDoS
alarm triggered.
To solve this, I set the following values in the configuration:
qfx5120> show configuration system ddos-protection protocols
vxlan {
    aggregate {
        bandwidth 10000;
        burst 12000;
    }
}
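This change raises the sustained rate 20x (500 to 10000 pps) and the burst 60x (200 to 12000 packets) compared with the values shown in Johan's output. The sketch below illustrates why both knobs matter for a short burst of punted packets. It is a simplified token-bucket model and an assumption for illustration only, not how the QFX PFE actually implements the DDoS policer; the `police` function and the 5000 pps, one-second burst are hypothetical.

```python
def police(arrivals_per_ms, rate_pps, burst_pkts):
    """Simplified token-bucket policer (illustrative model, not Junos internals).

    arrivals_per_ms: packets arriving in each 1 ms tick.
    rate_pps: sustained rate, analogous to the DDoS 'bandwidth' knob (pps).
    burst_pkts: bucket depth, analogous to the DDoS 'burst' knob (packets).
    Returns (admitted, dropped).
    """
    tokens = float(burst_pkts)      # bucket starts full
    refill = rate_pps / 1000.0      # tokens earned per 1 ms tick
    admitted = dropped = 0
    for n in arrivals_per_ms:
        tokens = min(float(burst_pkts), tokens + refill)
        take = min(n, int(tokens))  # only whole packets can be admitted
        tokens -= take
        admitted += take
        dropped += n - take
    return admitted, dropped


# Hypothetical burst: 5000 pps of punted VXLAN packets for one second.
burst = [5] * 1000
old = police(burst, rate_pps=500, burst_pkts=200)      # values from Johan's output
new = police(burst, rate_pps=10000, burst_pkts=12000)  # values configured above
print("old (admitted, dropped):", old)
print("new (admitted, dropped):", new)
```

In this model the old values drop most of the burst once the 200-packet bucket drains, while the new values absorb it entirely because the refill rate exceeds the arrival rate.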
On Wed, Nov 30, 2022 at 07:16, john doe via juniper-nsp <
juniper-nsp at puck.nether.net> wrote:
> Hi!
>
> The leaf switches are QFX5k, and they seem to lack some of the commands
> you mentioned. We don't have any problem with BGP sessions going down; the
> impact is only on the payload inside VXLAN.
>
> Protocol Group: VXLAN
>
> Packet type: aggregate (Aggregate for vxlan control packets)
> Aggregate policer configuration:
> Bandwidth: 500 pps
> Burst: 200 packets
> Recover time: 300 seconds
> Enabled: Yes
> Flow detection configuration:
> Flow detection system is off
> Detection mode: Automatic Detect time: 0 seconds
> Log flows: Yes Recover time: 0 seconds
> Timeout flows: No Timeout time: 0 seconds
> Flow aggregation level configuration:
> Aggregation level Detection mode Control mode Flow rate
> Subscriber Automatic Drop 0 pps
> Logical interface Automatic Drop 0 pps
> Physical interface Automatic Drop 500 pps
> System-wide information:
> Aggregate bandwidth is no longer being violated
> No. of FPCs that have received excess traffic: 1
> Last violation started at: 2022-11-30 09:08:02 CET
> Last violation ended at: 2022-11-30 09:09:32 CET
> Duration of last violation: 00:01:40 Number of violations: 1508
> Received: 3548252144 Arrival rate: 201 pps
> Dropped: 49294329 Max arrival rate: 160189 pps
> Routing Engine information:
> Bandwidth: 500 pps, Burst: 200 packets, enabled
> Aggregate policer is never violated
> Received: 0 Arrival rate: 0 pps
> Dropped: 0 Max arrival rate: 0 pps
> Dropped by individual policers: 0
> FPC slot 0 information:
> Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled
> Hostbound queue 255
> Aggregate policer is no longer being violated
> Last violation started at: 2022-11-30 09:08:02 CET
> Last violation ended at: 2022-11-30 09:09:32 CET
> Duration of last violation: 00:01:40 Number of violations: 1508
> Received: 3548252144 Arrival rate: 201 pps
> Dropped: 49294329 Max arrival rate: 160189 pps
> Dropped by individual policers: 0
> Dropped by aggregate policer: 50294227
> Dropped by flow suppression: 0
> Flow counts:
> Aggregation level Current Total detected State
> Subscriber 0 0 Active
>
> vty)# show ddos scfd proto-states vxlan
> (sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps)
> op-mode: a=automatic, o=always-on, x=disabled
> fc-mode: d=drop-all, k=keep-all, p=police
> d-t: detect time, r-t: recover time, t-t: timeout time
> aggr-t: last aggregated/deaggreagated time
> idx prot group proto mode detect agg flags state sub-cfg ifl-cfg ifd-cfg d-t r-t t-t aggr-t
> --- ---- -------- -------- ---- ------ --- ----- ----- --------- --------- --------- --- --- --- ------
> 23 6400 vxlan aggregate auto no 1 2 0 a:d: 0 a:d: 0 a:d: 500 0 0 0 0
>
>
> Johan
>
> On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti <saku at ytti.fi> wrote:
>
> > Hey,
> >
> > Before any potential trashing, I'd like to say that, as far as I am
> > aware, Juniper (MX) is the only platform on the market that isn't
> > trivial to DoS off the network, whatever protection users may have
> > tried to configure.
> >
> > > How do you identify the source of the DDoS violations that Junos logs
> > > for QFX? For example, which interface is causing the problem?
> >
> > I assume you are talking about QFX10k with the Paradise (PE) chipset. I'm
> > not very familiar with it, but I know something about it when sold in
> > PTX10k guise, though there are significant differences. My answers are
> > from the PTX10k perspective. If you are talking about QFX5k, many of the
> > answers won't apply, but the ukern-side answers should still help you
> > troubleshoot further. Certainly the situation on QFX5k is worse than it
> > would be on QFX10k.
> >
> > > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for
> > > protocol/exception VXLAN:aggregate exceeded its allowed bandwidth at fpc 0
> > > for 30 times, started at...
> > >
> > > The configured rate for VXLAN is 500 pps, but DDoS protection is seeing
> > > rates of over 150,000 pps.
> >
> > Do you mean you've configured
> > 'set system ddos-protection protocols vxlan aggregate bandwidth 500'?
> > What exactly are you seeing? What does 'show ddos-protection protocols
> > vxlan' say? Also run 'start shell pfe network fpcX' and then 'show ddos
> > scfd proto-states vxlan'.
> >
> > Paradise (unlike Triton and Trio) does not support PPS policing at
> > all. So when you configure a PPS policer, what actually gets
> > programmed is a bit rate of 500 pps * 1500 B (500 * 1500 * 8 = 6 Mbps).
> > I've tried to argue this is a poor default, 64 B being a superior choice.
> > In Paradise, 500 pps would admit 500 * (1500/64), or about 12 kpps, per
> > Paradise chip if those VXLAN packets were small. These would then be
> > policed by the LC CPU ukern down to 500 pps for all the Paradise chips
> > served by that LC CPU, before being sent to the RE over bme0.
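The arithmetic above can be checked with a small sketch. It assumes the conversion works exactly as described (a fixed 1500 B per packet, ignoring any L2/L1 overhead) and models only the first-stage Paradise policer, not the second-stage ukern policing to 500 pps; the function names are hypothetical.

```python
ASSUMED_PKT_BYTES = 1500  # assumption: pps limit is programmed as pps * 1500 B


def programmed_bps(pps, assumed_pkt_bytes=ASSUMED_PKT_BYTES):
    """Bit rate a Paradise DDoS policer would be programmed with for a pps limit."""
    return pps * assumed_pkt_bytes * 8


def effective_pps(pps, actual_pkt_bytes, assumed_pkt_bytes=ASSUMED_PKT_BYTES):
    """Packets/s that actually fit through that bit rate at a given packet size."""
    return programmed_bps(pps, assumed_pkt_bytes) / (actual_pkt_bytes * 8)


print(programmed_bps(500))                           # 6000000
print(effective_pps(500, 64))                        # 11718.75
print(effective_pps(500, 64, assumed_pkt_bytes=64))  # 500.0
```

The last line shows why a 64 B assumption would be stricter: small packets would then be admitted at the configured 500 pps instead of roughly 23x that.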
> > After the DDoS policer, but before Paradise admits a packet to the LC CPU,
> > the packet goes through a VoQ. Most packets are classified into VoQ#2,
> > which is 10 Mbps wide with no burstability (the classification, width,
> > and burstability are being changed in later images). So even trivially
> > low rates will congest VoQ#2, and many protocols, such as BGP, IS-IS,
> > OSPF, LDP, ND, and ARP, will be competing for 10 Mbps of access to the
> > LC CPU.
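To put the 10 Mbps VoQ#2 figure in packet terms, the sketch below computes how many packets per second fit through it at a few sizes, and what share a single protocol policed at 500 pps * 1500 B would take. It assumes the 10 Mbps width quoted for older images and ignores per-packet overhead; the constant and function names are hypothetical.

```python
VOQ2_BPS = 10_000_000  # assumption: VoQ#2 width on older images, per the text


def voq_capacity_pps(pkt_bytes, voq_bps=VOQ2_BPS):
    """Upper bound on packets/s through the VoQ at a fixed packet size."""
    return voq_bps / (pkt_bytes * 8)


def voq_share(offered_bps, voq_bps=VOQ2_BPS):
    """Fraction of the VoQ consumed by one offered load, capped at 1.0."""
    return min(1.0, offered_bps / voq_bps)


for size in (64, 512, 1500):
    print(size, "B:", round(voq_capacity_pps(size)), "pps")

# One protocol admitted at 500 pps * 1500 B = 6 Mbps takes 60% of VoQ#2,
# leaving 4 Mbps for BGP, IS-IS, OSPF, LDP, ND, ARP and everything else.
print(voq_share(500 * 1500 * 8))  # 0.6
```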
> >
> > > This is a spine/leaf setup. One theory is that the VXLAN traffic for
> > > which most of our QFX boxes are activating DDoS protection is actually
> > > VXLAN services running inside the VXLANs; for example, we have
> > > Kubernetes clusters using VXLAN. Is that a sane theory?
> >
> > Not enough information to speculate.
> > In many cases the DDoS classification is wrong. You can review it in the
> > PFE: run 'show filter', find HOSTBOUND_IPv4_FILTER, then run 'show filter
> > index X program'. You can also capture punted packets on the interface
> > where the RE meets the FPC (I think bme0 here). On bme0, TNP headers sit
> > on top of the punted packets, and the TNP headers show which DDoS
> > classification was used. You can turn that number into a name by looking
> > at 'show ddos scfd proto-states'.
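Turning the number into a name can be scripted once the proto-states table has been saved as text. This is a minimal sketch, assuming the number seen in the TNP header corresponds to the `prot` column and is printed in the same base (neither is confirmed above), and assuming each table row sits on one line rather than being wrapped; `parse_proto_states` is a hypothetical helper, and the sample row is the VXLAN row from Johan's output.

```python
def parse_proto_states(text):
    """Map the 'prot' column of 'show ddos scfd proto-states' to (group, proto)."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric idx followed by a numeric prot ID;
        # legend, header, and dash lines do not.
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            # Keep the ID as a string: the output does not say whether it is hex.
            table[fields[1]] = (fields[2], fields[3])
    return table


SAMPLE = """\
idx prot group proto mode detect agg flags state sub-cfg ifl-cfg ifd-cfg d-t r-t t-t aggr-t
--- ---- -------- -------- ---- ------ --- ----- ----- --------- --------- --------- --- --- --- ------
23 6400 vxlan aggregate auto no 1 2 0 a:d: 0 a:d: 0 a:d: 500 0 0 0 0
"""

states = parse_proto_states(SAMPLE)
print(states.get("6400"))  # ('vxlan', 'aggregate')
```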
> >
> >
> > I naively wish I could set my DDoS protocol classification and VoQ
> > classification manually in the 'lo0 filter', because the infrastructure
> > allows for great protection. Particularly when choosing which packets
> > share a VoQ, there is no obvious single best solution; it depends on
> > the environment. For example, I could put RSVP, IS-IS, and LDP on a
> > single VoQ, as they never compete with customers, and BGP on another, as
> > for me it competes with customers and operators, and so forth. But of
> > course this wish is naive: the solution the vendor offers is already too
> > complex for customers to use, and giving more rope would just make the
> > average config worse.
> >
> > --
> > ++ytti
> >
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>