[j-nsp] MX/SCB-E fabric saturation ?

Tue Mar 26 07:51:29 EDT 2013

Hi!

During a DDoS attack targeting one of our customers, we experienced
serious drops on one of our MX960/SCB-E routers.

Topology: the DDoS traffic mostly came into this router via one of our
backbone links (a 10x10Gbit aggregated Ethernet bundle whose "ingress"
ports are distributed among six MPC-3D-16XGE cards) and had to egress
towards the destination router via a port on one of the MPCs (let's
call it the "egress" FPC). Well, packet drops on the egress port are
expected, maybe even on the fabric towards the egress FPC, but what I
did not expect is that fabric drops were observed between _every_
pair of FPCs.

For example, the fabric drop counters on FPC0, which handles one of
the "ingress" links, show drops not only towards FPC7 (egress) but
towards other FPCs too:

Destination FPC Index: 0, Source FPC Index: 0
 Drop statistics:    High priority           Low priority
    Packets:                     0              100640048
    Bytes  :                     0            44394325857
    Pps    :                     0                      0
    Bps    :                     0                      0
Destination FPC Index: 2, Source FPC Index: 0
 Drop statistics:    High priority           Low priority
    Packets:                     0              168074394
    Bytes  :                     0            77508987723
    Pps    :                     0                      0
    Bps    :                     0                      0
Destination FPC Index: 7, Source FPC Index: 0
 Drop statistics:    High priority           Low priority
    Packets:                     0             1419251793
    Bytes  :                     0          1287328070805
    Pps    :                     0                      0
    Bps    :                     0                      0
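
To compare the counters above more easily, here is a rough Python
sketch that parses the pasted text and computes, per destination FPC,
the share of low-priority dropped packets and the average dropped
packet size (bytes / packets). The function names and the SAMPLE
string are only illustrative; it assumes the output format is exactly
as pasted here and has not been checked against other Junos releases.

```python
import re

# Drop counters copied from the output above (Pps/Bps lines omitted).
SAMPLE = """\
Destination FPC Index: 0, Source FPC Index: 0
 Drop statistics:    High priority           Low priority
    Packets:                     0              100640048
    Bytes  :                     0            44394325857
Destination FPC Index: 2, Source FPC Index: 0
 Drop statistics:    High priority           Low priority
    Packets:                     0              168074394
    Bytes  :                     0            77508987723
Destination FPC Index: 7, Source FPC Index: 0
 Drop statistics:    High priority           Low priority
    Packets:                     0             1419251793
    Bytes  :                     0          1287328070805
"""

HEADER = re.compile(r"\s*Destination FPC Index: (\d+), Source FPC Index: (\d+)")
COUNTER = re.compile(r"\s*(Packets|Bytes)\s*:\s*(\d+)\s+(\d+)")


def parse_fabric_drops(text):
    """Return {(src, dst): {"packets": n, "bytes": n}} for low-priority drops."""
    drops = {}
    key = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            # Header lists destination first, then source.
            key = (int(m.group(2)), int(m.group(1)))
            drops[key] = {"packets": 0, "bytes": 0}
            continue
        m = COUNTER.match(line)
        if m and key is not None:
            # Group 3 is the low-priority column.
            drops[key][m.group(1).lower()] = int(m.group(3))
    return drops


def summarize(drops):
    """Per (src, dst): share of all dropped packets and average packet size."""
    total = sum(d["packets"] for d in drops.values())
    rows = []
    for (src, dst), d in sorted(drops.items()):
        share = d["packets"] / total if total else 0.0
        avg_size = d["bytes"] / d["packets"] if d["packets"] else 0.0
        rows.append({"src": src, "dst": dst, "share": share, "avg_size": avg_size})
    return rows


if __name__ == "__main__":
    for row in summarize(parse_fabric_drops(SAMPLE)):
        print(f"FPC{row['src']} -> FPC{row['dst']}: "
              f"{row['share']:.1%} of dropped packets, "
              f"avg {row['avg_size']:.0f} bytes/packet")
```

On these numbers, roughly 84% of the dropped packets were headed to
FPC7, and the average dropped packet towards FPC7 was about twice the
size of those dropped towards FPC0 and FPC2. Note that these are
cumulative counters; the Pps/Bps rates were already zero when the
output was captured.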

During this incident, show pfe statistics traffic reported only about
29Mpps forwarded by this router, while the normal load is about 55Mpps.
After we isolated the DDoS destination and filtered the traffic out at
our borders, the situation returned to normal.

Questions are obvious: are we missing something in the configuration?
(CoS settings for fabric are the default ones, Scheduler: default-fabric,
drop-profile: default-drop-profile.)
Can 11.4R7 with chassis fabric redundancy increased-bandwidth
configured help in such situations?

JunOS version: 11.4R6, if that matters. 

-- 
In theory, there is no difference between theory and practice. 
But, in practice, there is.