[j-nsp] FPC<->SFM capacity

Richard A Steenbergen ras at e-gerbil.net
Wed Feb 22 18:11:06 EST 2006


Question: If you have an M160 with fewer than 4 SFMs running, is there a 
reduced capacity to any one individual FPC, or is the capacity reduction 
only spread across the entire system?

I've asked this about a dozen times, of a dozen different people at 
Juniper, and they have always said "the capacity reduction is only across 
the entire system; if you lose 1 SFM you go down from 160Mpps to 120Mpps 
and that's it, the DX chip takes care of everything else". I never really 
believed that answer, since it didn't seem to jibe with how the switch 
fabric should work, but I heard it enough times that I stopped arguing.

But today I saw a box where a third SFM had failed (if you have enough 
M160s you'll see the SRAM/SDRAM on the SFMs go bad on an alarmingly 
regular basis), leaving only one SFM online, and traffic off a single 
FPC2 was being bottlenecked. After replacing an SFM and bringing a second 
one back online, traffic immediately shot up, confirming the problem. 
There were 5 ports (two 2xGE PICs and one 1xOC48 PIC) on the affected 
FPC2, which had the following traffic utilization after the SFM 
restoration:

Port 1:  600M in /  250M out
Port 2:   50M in /  300M out
Port 3:  750M in /  280M out
Port 4:  100M in /  800M out
Port 5:  550M in /  250M out
        -----      -----
        2050M in / 1880M out = 3930M in+out

The way I would have figured this works is: an FPC1 has a single 
3.2Gbps channel to the switch fabric, and an FPC2 has 4 channels to 4 
individual switch fabric modules. Thus, while each SFM would have 40Mpps 
of lookup capacity and 25.6Gbps (3.2Gbps * 8 slots, carried over from the 
original M40) of switching capacity, each individual SFM<->FPC link would 
have a limit of 3.2Gbps, and an M160 running 1 active SFM would have only 
as much switching capacity to an FPC2 as it does to an FPC1 (which would 
explain why 3 of the slots on an M40e FPC2 are forcibly blocked).
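
To put some numbers on that theory, here is a quick back-of-the-envelope 
sketch (Python, purely illustrative) of the capacity model I'm assuming. 
The constants are the ones discussed above (3.2Gbps per channel, 40Mpps 
per SFM, 8 FPC slots), none of which is confirmed by Juniper:

    # Back-of-the-envelope model of M160 fabric capacity as I understand it.
    # All constants are assumptions taken from the discussion above.
    CHANNEL_GBPS = 3.2       # one FPC<->SFM channel
    SLOTS = 8                # FPC slots (carried over from the original M40)
    SFM_LOOKUP_MPPS = 40     # lookup capacity per SFM

    def system_capacity(active_sfms):
        """Aggregate (lookup Mpps, switching Gbps) with N working SFMs."""
        return (active_sfms * SFM_LOOKUP_MPPS,
                active_sfms * CHANNEL_GBPS * SLOTS)

    def fpc_fabric_gbps(fpc_type, active_sfms):
        """Fabric bandwidth available to one FPC: an FPC1 has a single
        channel, an FPC2 has one channel per SFM (up to 4)."""
        channels = 1 if fpc_type == "FPC1" else min(active_sfms, 4)
        return channels * CHANNEL_GBPS

    print(system_capacity(4))            # 4 SFMs: 160 Mpps, 102.4 Gbps
    print(system_capacity(3))            # 3 SFMs: 120 Mpps,  76.8 Gbps
    print(fpc_fabric_gbps("FPC2", 1))    # 3.2, no better than an FPC1
    print(fpc_fabric_gbps("FPC2", 4))    # 12.8

With only one SFM online, this model says an FPC2 gets no more fabric 
bandwidth than an FPC1, which is exactly what the box appeared to be doing.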

After examining the profile of the traffic that was being bottlenecked 
while only one SFM was online, the only traffic that appeared to be really 
limited (a hard flatline, as opposed to just reduced utilization because 
of the reduction in traffic coming into the system) was the INGRESS on 
the GE ports (ports 1 and 3 in this case). During the bottleneck, these 
ports were not accepting one bit past 500Mbps, with a corresponding 
dropoff in outbound traffic that brought the in+out total to roughly 
3.2Gbps. One of these ports was in the middle of a normal traffic slope 
down from 1000M to 600M during the bottlenecked period, so you can really 
see that the only bottleneck was on the ingress of the GEs.

So, this would appear to confirm the theory about the 3.2Gbps per 
FPC<->SFM channel limitation, but it doesn't answer every question. It was 
my understanding that the 3.2Gbps channel was bidirectional (3.2Gbps each 
way), in which case the bottleneck should not have been hit with the above 
traffic configuration. Is the reality just that, since the packets have to 
be sprayed across all the FPCs for buffering, the limitation realistically 
WILL be in+out? Since there isn't any active way to measure the 
utilization on these channels, it would be nice if someone could explain 
exactly what the limitations are in a little more detail. It would be 
nicer still if those answers were accurate. :)
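
For what it's worth, here is the same sort of back-of-the-envelope check 
against the numbers above (again just a sketch assuming the 3.2Gbps 
figure; the per-port rates are the post-restoration ones from the table):

    # Check the observed traffic against a single 3.2Gbps FPC<->SFM channel
    # under the two interpretations discussed above. Per-port figures are
    # the post-restoration numbers from the table (Mbps in, Mbps out).
    ports = {
        "port1": (600, 250),
        "port2": (50,  300),
        "port3": (750, 280),
        "port4": (100, 800),
        "port5": (550, 250),
    }
    total_in  = sum(i for i, o in ports.values())    # 2050 Mbps
    total_out = sum(o for i, o in ports.values())    # 1880 Mbps

    CHANNEL_MBPS = 3200
    # If the channel were 3.2Gbps each way, neither direction alone exceeds
    # the limit, so no bottleneck would be expected:
    print(total_in > CHANNEL_MBPS, total_out > CHANNEL_MBPS)    # False False
    # If the channel is 3.2Gbps in+out combined, the offered load exceeds
    # it, which matches the roughly 3.2Gbps flatline seen with one SFM up:
    print(total_in + total_out,
          total_in + total_out > CHANNEL_MBPS)                  # 3930 True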

And just to add mystery to the whole thing, here is the show bchip output 
from an SFM for an FPC2 (pic 2 == 2xGE, pic 3 == OC48):

    Pic 2: 8 bit stream @ 125 MHz.
           Stream  8: 155520 Kbits/sec, enabled.
           Stream  9: 155520 Kbits/sec, enabled.
           Stream 10: Not present, disabled.
           Stream 11: Not present, disabled.
    Pic 3: 8 bit stream @ 125 MHz.
           Stream 12: 622080 Kbits/sec, enabled.
           Stream 13: Not present, disabled.
           Stream 14: Not present, disabled.
           Stream 15: Not present, disabled.

And from an FPC1 (pic 1 == 1xGE):

    Pic 1: 8 bit stream @ 125 MHz.
           Stream  4: 622080 Kbits/sec, enabled.
           Stream  5: Not present, disabled.
           Stream  6: Not present, disabled.
           Stream  7: Not present, disabled.

Is the bandwidth limitation actually per PIC rather than per FPC? How the 
heck do those numbers jibe with the actual capacity of the PICs? Someone, 
please shed some light on this. :)
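
Just to quantify how far off those numbers look, here is the comparison 
spelled out (the line rates are the standard ones, 1Gbps for GE and 
2.48832Gbps for OC-48c; the stream rates are straight out of the bchip 
output above, and for whatever it's worth, 155520 and 622080 Kbits/sec 
are the OC-3 and OC-12 rates):

    # Compare the per-stream rates reported by "show bchip" (above) with
    # the nominal line rates of the PICs. This just puts numbers on the
    # mismatch; it doesn't explain it.
    streams_kbps = {
        "FPC2 pic2 (2xGE)": [155520, 155520],   # streams 8-9
        "FPC2 pic3 (OC48)": [622080],           # stream 12
        "FPC1 pic1 (1xGE)": [622080],           # stream 4
    }
    line_rate_kbps = {
        "FPC2 pic2 (2xGE)": 2 * 1000000,        # two GE ports
        "FPC2 pic3 (OC48)": 2488320,            # OC-48c
        "FPC1 pic1 (1xGE)": 1000000,
    }
    for pic, rates in streams_kbps.items():
        print("%-17s bchip %7d kbps    line rate %7d kbps" %
              (pic, sum(rates), line_rate_kbps[pic]))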

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

