[j-nsp] QFX DDOS Violations
john doe
johan.borch at gmail.com
Wed Nov 30 05:14:51 EST 2022
Hi!
The leaf switches are QFX5k, and they seem to lack some of the commands
you mentioned. We don't have any problems with BGP sessions going down; the
impact is only on the payload inside VXLAN.
Protocol Group: VXLAN
Packet type: aggregate (Aggregate for vxlan control packets)
Aggregate policer configuration:
Bandwidth: 500 pps
Burst: 200 packets
Recover time: 300 seconds
Enabled: Yes
Flow detection configuration:
Flow detection system is off
Detection mode: Automatic Detect time: 0 seconds
Log flows: Yes Recover time: 0 seconds
Timeout flows: No Timeout time: 0 seconds
Flow aggregation level configuration:
Aggregation level Detection mode Control mode Flow rate
Subscriber Automatic Drop 0 pps
Logical interface Automatic Drop 0 pps
Physical interface Automatic Drop 500 pps
System-wide information:
Aggregate bandwidth is no longer being violated
No. of FPCs that have received excess traffic: 1
Last violation started at: 2022-11-30 09:08:02 CET
Last violation ended at: 2022-11-30 09:09:32 CET
Duration of last violation: 00:01:40 Number of violations: 1508
Received: 3548252144 Arrival rate: 201 pps
Dropped: 49294329 Max arrival rate: 160189 pps
Routing Engine information:
Bandwidth: 500 pps, Burst: 200 packets, enabled
Aggregate policer is never violated
Received: 0 Arrival rate: 0 pps
Dropped: 0 Max arrival rate: 0 pps
Dropped by individual policers: 0
FPC slot 0 information:
Bandwidth: 100% (500 pps), Burst: 100% (200 packets), enabled
Hostbound queue 255
Aggregate policer is no longer being violated
Last violation started at: 2022-11-30 09:08:02 CET
Last violation ended at: 2022-11-30 09:09:32 CET
Duration of last violation: 00:01:40 Number of violations: 1508
Received: 3548252144 Arrival rate: 201 pps
Dropped: 49294329 Max arrival rate: 160189 pps
Dropped by individual policers: 0
Dropped by aggregate policer: 50294227
Dropped by flow suppression: 0
Flow counts:
Aggregation level Current Total detected State
Subscriber 0 0 Active
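
As a rough sanity check on the counters above, the following sketch just divides the reported numbers. The values are copied from the system-wide section of the output; the variable names are made up for the example and nothing here talks to a device.

```python
# Values copied from the 'System-wide information' section above.
CONFIGURED_PPS = 500        # Aggregate policer bandwidth
MAX_ARRIVAL_PPS = 160189    # Max arrival rate
RECEIVED = 3548252144       # Received packet counter
DROPPED = 49294329          # Dropped packet counter

# How far the peak arrival rate exceeded the configured policer rate.
overshoot = MAX_ARRIVAL_PPS / CONFIGURED_PPS

# Share of all received packets that were dropped over the counter lifetime.
drop_fraction = DROPPED / RECEIVED

print(f"peak rate is about {overshoot:.0f}x the configured rate")
print(f"dropped share of received packets: {drop_fraction:.2%}")
```

So the peak is roughly 320 times the configured rate, while the dropped share over the lifetime of the counters is about 1.4%.
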
vty)# show ddos scfd proto-states vxlan
(sub|ifl|ifd)-cfg: op-mode:fc-mode:bwidth(pps)
op-mode: a=automatic, o=always-on, x=disabled
fc-mode: d=drop-all, k=keep-all, p=police
d-t: detect time, r-t: recover time, t-t: timeout time
aggr-t: last aggregated/deaggreagated time
idx prot group    proto     mode detect agg flags state sub-cfg   ifl-cfg   ifd-cfg   d-t r-t t-t aggr-t
--- ---- -------- --------- ---- ------ --- ----- ----- --------- --------- --------- --- --- --- ------
 23 6400 vxlan    aggregate auto no       1     2     0 a:d:    0 a:d:    0 a:d:  500   0   0   0      0
Johan
On Wed, Nov 30, 2022 at 8:53 AM Saku Ytti <saku at ytti.fi> wrote:
> Hey,
>
> Before any potential trashing, I'd like to say that as far as I am
> aware Juniper (MX) is the only platform on the market which isn't
> trivial to DoS off the network, despite any protection users may have
> tried to configure.
>
> > How do you identify the source problem of DDOS violations that junos logs
> > for QFX? For example what interface that is causing the problem?
>
> I assume you are talking about QFX10k with the Paradise (PE) chipset. I'm
> not very familiar with it, but I know something about it as sold in
> PTX10k guise, though there are significant differences. Answers are from
> the PTX10k perspective. If you are talking about QFX5k, many of the
> answers won't apply, but the ukern-side answers should help
> troubleshoot it further. Certainly with QFX5k the situation is worse
> than it would be on QFX10k.
>
> > DDOS_PROTOCOL_VIOLATION_SET: Warning: Host-bound traffic for
> > protocol/exception VXLAN:aggregate exceeded its allowed bandwidth at
> > fpc 0 for 30 times, started at...
> >
> > The configured rate for VXLAN is 500pps, ddos protection is seeing rates
> > over 150 000pps
>
> Do you mean you've configured:
> 'set system ddos-protection protocols vxlan aggregate bandwidth 500'.
> What exactly are you seeing? What does 'show ddos-protection protocols
> vxlan' say? Also 'start shell pfe network fpcX' + 'show ddos scfd
> proto-states vxlan'.
>
> Paradise (unlike Triton and Trio) does not support PPS policing at
> all. So when you configure a PPS policer, what actually gets
> programmed is 500pps*1500B bps. I've tried to argue this is a poor
> default, 64B being the superior choice.
> In Paradise, 500pps would admit 500*(1500/64) or about 12kpps per
> Paradise if those VXLAN packets were small. These would then be
> policed by the LC CPU ukern into 500 pps for all the Paradise chips
> living inside that LC CPU, before sending to the RE over bme0.
> After DDoS but before Paradise admits a packet to the LC CPU, it goes
> through VoQ, where most packets are classified as VoQ#2, which is
> 10Mbps wide with no burstability (classification, width and
> burstability are being changed in later images). So extremely trivial
> rates will cause congestion on VoQ#2, and a lot of protocols will
> be competing for 10Mbps access to the LC CPU, like BGP, ISIS, OSPF, LDP,
> ND, ARP.
>
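
The policer arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the quoted description, assuming the policer really is programmed as the configured pps multiplied by a fixed 1500 B packet size. The function and constant names are made up for the example, and the description was given from the PTX10k/Paradise perspective, so it may not carry over to QFX5k.

```python
# Assumption taken from the quoted description, not verified on hardware:
# a pps policer is programmed as a byte rate using a fixed packet size.
ASSUMED_PACKET_BYTES = 1500


def programmed_bytes_per_sec(configured_pps: int,
                             assumed_bytes: int = ASSUMED_PACKET_BYTES) -> int:
    """Byte rate the chip would police to for a configured pps value."""
    return configured_pps * assumed_bytes


def admitted_pps(configured_pps: int, actual_bytes: int,
                 assumed_bytes: int = ASSUMED_PACKET_BYTES) -> float:
    """Packets per second that fit in the programmed byte rate
    when the packets are actually `actual_bytes` long."""
    return programmed_bytes_per_sec(configured_pps, assumed_bytes) / actual_bytes


# 500 pps configured, but the traffic consists of 64 B packets.
small_packet_rate = admitted_pps(500, 64)

# 500 pps configured and the traffic really is 1500 B packets.
full_size_rate = admitted_pps(500, 1500)

print(f"64 B packets admitted: {small_packet_rate:.0f} pps")
print(f"1500 B packets admitted: {full_size_rate:.0f} pps")
```

With 64 B packets this gives the "about 12kpps" figure from the paragraph; only full-size packets would be held to the configured 500 pps.
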
> > This is a spine/leaf setup. One theory is that the vxlan traffic that most
> > of our QFX boxes are activating ddos protection for is actually vxlan
> > services running inside the vxlans, for example we have kubernetes clusters
> > using vxlan. Is that a sane theory?
>
> Not enough information to speculate.
> In many cases ddos classification is wrong. You can review it in the PFE:
> 'show filter' => HOSTBOUND_IPv4_FILTER, then 'show filter index X
> program'. You can also capture punted packets on the interface where the RE
> meets the FPC (I think bme0 here). On the bme0 interface, TNP headers sit
> on top of the punted packets, and in the TNP headers you will see what
> ddos classification was used. You can turn the number into a name by
> looking at the 'show ddos scfd proto-states' output.
>
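
The number-to-name lookup described above could be scripted roughly like this. It is a minimal sketch that assumes the proto-states output has the column order shown earlier in this message (idx, prot, group, proto, ...). The function name is hypothetical, and which of the two numbers (idx or prot) actually appears in the TNP header is not something this message establishes, so both are indexed.

```python
# Sample copied from the 'show ddos scfd proto-states vxlan' output
# earlier in this message (wrapped exactly as it was pasted).
SAMPLE = """\
idx prot group proto mode detect agg flags state sub-cfg
--- ---- -------- -------- ---- ------ --- ----- ----- ---------
23 6400 vxlan aggregate auto no 1 2 0 a:d: 0
a:d: 0 a:d: 500 0 0 0 0
"""


def parse_proto_states(text: str) -> tuple[dict, dict]:
    """Return (by_idx, by_prot) lookups mapping each number to 'GROUP:proto'.

    Only lines whose first token is numeric are treated as data rows, so
    headers, separators and wrapped continuation lines are skipped.
    """
    by_idx = {}
    by_prot = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or not tokens[0].isdigit():
            continue
        idx, prot, group, proto = tokens[:4]
        # Same naming style as the syslog message, e.g. VXLAN:aggregate.
        label = f"{group.upper()}:{proto}"
        by_idx[idx] = label
        by_prot[prot] = label  # kept as a string; it may be a hex value
    return by_idx, by_prot


by_idx, by_prot = parse_proto_states(SAMPLE)
print(by_idx.get("23"), by_prot.get("6400"))
```

Feeding it the full 'show ddos scfd proto-states' output (without the protocol argument) should give a table for every protocol group, assuming the layout is the same.
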
>
> I naively wish I could set my ddos-protocol classification and voq
> classification manually in 'lo0 filter', because the infrastructure
> allows for great protection, but particularly when choosing which VoQ
> packets share there is no obvious single best solution, it depends on
> the environment. Like I could put RSVP, ISIS, LDP on single VoQ, as
> they never compete with customers, BGP in another as they will compete
> with customers and operators for me, and so forth. But of course this
> wish is naive, as the solution the vendor offers is already too
> complex for customers to use and giving more rope would just make the
> mean config worse.
>
> --
> ++ytti
>