[j-nsp] routing updates between PFEs and Kernel
Richard A Steenbergen
ras at e-gerbil.net
Wed Nov 3 15:31:37 EDT 2010
On Wed, Nov 03, 2010 at 11:34:59PM +0500, Good One wrote:
>
> Thanks for the useful information, Richard. Well, a DPC has 1G of RAM
> inside, and if each PFE has a complete copy of the routing table (even
> just the best routes) and you are receiving a full Internet feed plus
> thousands of your own routes, then all 4 PFEs should fill up the 1G of
> RAM (I assume all 4 PFEs are using/sharing the DPC's 1G of RAM to store
> the routing table) ... not sure how to connect to a PFE individually;
> all I can do is 'start shell pfe network fpc0', which connects you to
> a DPC and not to the PFEs sitting somewhere on the DPC :)
Not quite. There are 3 different types of memory on the DPC:
Slot 0 information:
  State                   Online
  Temperature             29 degrees C / 84 degrees F
  Total CPU DRAM          1024 MB      <--- 1
  Total RLDRAM            256 MB       <--- 2
  Total DDR DRAM          4096 MB      <--- 3
The CPU DRAM is just general purpose RAM like you'd find on any PC. This
is where the microkernel runs (which is what you're talking to when you
do a "start shell pfe network fpc#"), on a 1.2GHz PowerPC embedded
processor. It also handles things like IP options, TTL expiration, ICMP
generation from the data plane, and the like.
The RLDRAM (reduced latency DRAM) is the "lookup memory": this is where
the final copy of the routing table used for forwarding (called the FIB)
is stored, along with information about firewall rules, etc. This memory
is directly accessed by the forwarding ASICs, and needs to be low
latency in order to keep up with the number of lookups/sec required on a
high speed router.
On older platforms this would typically have been done with SRAM, which
is very fast but also very expensive. On an old M/T box you might have
seen 8MB or 16MB of SRAM per PFE, which was enough for a 20Gbps PFE
handling 2xOC192 (50Mpps+ of lookups/sec), but with a FIB capacity of
well under 500k routes. The MX (and M120) introduced a new model for doing
routing lookups using RLDRAM, which is much cheaper, and thus you can
put a lot more of it on the PFE.
Each DPC PFE actually has 4x32MB RLDRAM chips, but they run as 2 banks
of 2x32MB mirrored blocks. The first bank holds your routing
information, the second bank holds your firewall information. The
mirroring of 2x32MB is necessary to meet the performance requirements
using the slower RLDRAM, since you can do twice as many lookups/sec if
you have 2 banks to query from.
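
To get a feel for what that buys you in FIB capacity, here's a quick
back-of-envelope in Python. The bytes-per-entry figure is purely an
illustrative assumption on my part (real FIB structures are trees, not
flat arrays), but it shows the scale:

  # Rough FIB capacity math. The per-entry size is an illustrative
  # assumption, not Juniper's actual data structure cost.
  MB = 1024 * 1024

  sram_per_pfe = 16 * MB           # old M/T-series lookup SRAM (upper end)
  rldram_bank_per_pfe = 32 * MB    # one usable (mirrored 2x32MB) DPC bank

  assumed_bytes_per_entry = 32     # assumption, for scale only

  print("SRAM entries:  ", sram_per_pfe // assumed_bytes_per_entry)
  print("RLDRAM entries:", rldram_bank_per_pfe // assumed_bytes_per_entry)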
The MX architecture also makes this easier, since it uses a larger
number of relatively low speed PFEs (4 PFEs of 10G/ea), and is ethernet
only. To support 10GE or 10x1GE you only need to do 14.8Mpps per PFE,
which is a lot easier than the older 20G PFEs on T-series which needed
to do 50Mpps+ to support 2xOC192. This is how the MX is implemented
economically, and still manages to deliver support for well over 1
million FIB entries. The 256MB being reported in the show chassis fpc
output is your 4 PFEs * 64MB worth of available memory, which is really
mirrored banks of 2x32MB each.
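
Those lookup-rate numbers are just minimum-packet wire-rate arithmetic,
nothing Juniper-specific. A quick sketch (the POS framing overhead I've
assumed here is approximate):

  # Wire-rate packets/sec at minimum packet size.
  # 10GE: 64-byte frame + 8-byte preamble + 12-byte inter-frame gap.
  ge10_pps = 10e9 / ((64 + 8 + 12) * 8)
  print("10GE: %.2f Mpps" % (ge10_pps / 1e6))           # ~14.88 Mpps

  # OC192 POS: ~9.95 Gbps line rate, 40-byte minimum IP packet plus an
  # assumed ~9 bytes of PPP/HDLC framing overhead.
  oc192_pps = 9.953e9 / ((40 + 9) * 8)
  print("2xOC192: %.1f Mpps" % (2 * oc192_pps / 1e6))   # ~50 Mpps

  # And the "Total RLDRAM 256 MB" from show chassis fpc:
  print("RLDRAM: %d MB" % (4 * 2 * 32))                 # 4 PFEs x 2 banks x 32MB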
Finally, the DDR DRAM is the "packet buffering" memory, which holds the
copy of the packet as it moves through the system. When you receive a
packet, its contents are stored in the packet buffer memory while the
headers of the packet are sent to the I-chip for routing/firewall
lookups. After the result is returned, the egress interface actually
goes out and gathers up all the fragments of the packet necessary to
reassemble and transmit it.
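
As a toy model (all the names and classes here are mine for
illustration, not actual Junos/PFE code, and the exact-match dict is
standing in for a real longest-prefix-match lookup), the flow looks
roughly like this:

  # Toy model of the store-and-lookup forwarding path described above.
  class PacketBuffer:                  # stands in for the DDR DRAM
      def __init__(self):
          self.cells = {}
          self.next_handle = 0
      def store(self, payload):
          handle = self.next_handle
          self.next_handle += 1
          # Hardware chops the packet body into fixed-size chunks;
          # 64 bytes here is just a stand-in cell size.
          self.cells[handle] = [payload[i:i+64] for i in range(0, len(payload), 64)]
          return handle
      def gather(self, handle):
          return b"".join(self.cells.pop(handle))

  class LookupAsic:                    # stands in for the I-chip + RLDRAM FIB
      def __init__(self, fib):
          self.fib = fib               # destination -> egress interface
      def lookup(self, dst):
          return self.fib.get(dst, "discard")

  def forward(headers, payload, buf, asic):
      handle = buf.store(payload)            # body parked in packet buffer memory
      egress = asic.lookup(headers["dst"])   # header-only lookup
      return egress, buf.gather(handle)      # egress gathers the cells back up

  buf = PacketBuffer()
  asic = LookupAsic({"192.0.2.1": "xe-0/0/0"})
  print(forward({"dst": "192.0.2.1"}, b"x" * 200, buf, asic))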
So, your kernel pushes the selected FIB down to the DPC CPU, which in
turn programs all 4 PFEs on the DPC, and then each PFE has its own copy
of the routing table (in a highly optimized form that is directly
accessed by the ASICs) to make decisions from. Also this is completely
different from how the new MX (Trio/3D) cards work. :)
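
In sketch form (the names here are purely illustrative, not Junos
internals), that fan-out looks like:

  # Illustrative model of the FIB push: RE kernel -> DPC microkernel -> 4 PFEs.
  class Pfe:
      def __init__(self, pfe_id):
          self.pfe_id = pfe_id
          self.fib = {}                # stands in for the RLDRAM lookup memory
      def program(self, prefix, next_hop):
          self.fib[prefix] = next_hop

  class DpcMicrokernel:
      def __init__(self, num_pfes=4):
          self.pfes = [Pfe(i) for i in range(num_pfes)]
      def install_route(self, prefix, next_hop):
          # One update from the RE kernel fans out to every PFE on the
          # DPC, so each PFE ends up with its own complete copy.
          for pfe in self.pfes:
              pfe.program(prefix, next_hop)

  dpc = DpcMicrokernel()
  dpc.install_route("192.0.2.0/24", "xe-0/0/0.0")
  print([len(pfe.fib) for pfe in dpc.pfes])   # every PFE has the route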
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)