[j-nsp] routing updates between PFEs and Kernel
Richard A Steenbergen
ras at e-gerbil.net
Wed Nov 3 15:31:37 EDT 2010
On Wed, Nov 03, 2010 at 11:34:59PM +0500, Good One wrote:
>
> Thanks for the useful information, Richard. Well, a DPC has 1G of RAM
> inside, and if each PFE has a complete copy of the routing table (even
> just the best routes) and you are receiving a full Internet feed plus
> thousands of your own routes, then all 4 PFEs should fill up the 1G of
> RAM (I assume all 4 PFEs are using/sharing the DPC's 1G of RAM to store
> the routing table) ... not sure how to connect to a PFE individually;
> all I can do is 'start shell pfe network fpc0', which connects you to
> a DPC and not to the PFEs sitting somewhere on the DPC :)
Not quite. There are 3 different types of memory on the DPC:
Slot 0 information:
  State                   Online
  Temperature             29 degrees C / 84 degrees F
  Total CPU DRAM          1024 MB      <--- 1
  Total RLDRAM            256 MB       <--- 2
  Total DDR DRAM          4096 MB      <--- 3
The CPU DRAM is just general purpose RAM like you'd find on any PC. This
is where the microkernel runs (which is what you're talking to when you
do a "start shell pfe network fpc#"), on a 1.2GHz PowerPC embedded
processor. It also handles things like IP options, TTL expiration, ICMP
generation from the data plane, and the like.
The RLDRAM (reduced latency DRAM) is the "lookup memory": this is where
the final copy of the routing table used for forwarding (called the FIB)
is stored, along with information about firewall rules, etc. This memory
is directly accessed by the forwarding ASICs, and needs to be low
latency in order to keep up with the number of lookups/sec required on a
high speed router.
On older platforms this would typically have been done with SRAM, which
is very fast but also very expensive. On an old M/T box you might have
seen 8MB or 16MB of SRAM per PFE, which was enough for a 20Gbps PFE
handling 2xOC192 (50Mpps+ of lookups/sec), but with a FIB capacity of
well under 500k routes. The MX (and M120) introduced a new model for doing
routing lookups using RLDRAM, which is much cheaper, and thus you can
put a lot more of it on the PFE.
Each DPC PFE actually has 4x32MB RLDRAM chips, but they run as 2 banks
of 2x32MB mirrored blocks. The first bank holds your routing
information, the second bank holds your firewall information. The
mirroring of 2x32MB is necessary to meet the performance requirements
using the slower RLDRAM, since you can do twice as many lookups/sec if
you have 2 banks to query from.
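
To get a feel for what that buys you in FIB capacity, here's a quick
back-of-envelope in Python. The bytes-per-entry figure is purely an
illustrative assumption on my part (real FIB structures are trees, not
flat arrays), but it shows the scale:

  # Rough FIB capacity math. The per-entry size is an illustrative
  # assumption, not Juniper's actual data structure cost.
  MB = 1024 * 1024

  sram_per_pfe = 16 * MB           # old M/T-series lookup SRAM (upper end)
  rldram_bank_per_pfe = 32 * MB    # one usable (mirrored 2x32MB) DPC bank

  assumed_bytes_per_entry = 32     # assumption, for scale only

  print("SRAM entries:  ", sram_per_pfe // assumed_bytes_per_entry)
  print("RLDRAM entries:", rldram_bank_per_pfe // assumed_bytes_per_entry)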
The MX architecture also makes this easier, since it uses a larger
number of relatively low speed PFEs (4 PFEs of 10G/ea), and is ethernet
only. To support 10GE or 10x1GE you only need to do 14.8Mpps per PFE,
which is a lot easier than the older 20G PFEs on T-series which needed
to do 50Mpps+ to support 2xOC192. This is how the MX is implemented
economically, and still manages to deliver support for well over 1
million FIB entries. The 256MB being reported in the show chassis fpc
output is your 4 PFEs * 64MB worth of available memory, which is really
mirrored banks of 2x32MB each.
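
Those lookup-rate numbers are just minimum-packet wire-rate arithmetic,
nothing Juniper-specific. A quick sketch (the POS framing overhead I've
assumed here is approximate):

  # Wire-rate packets/sec at minimum packet size.
  # 10GE: 64-byte frame + 8-byte preamble + 12-byte inter-frame gap.
  ge10_pps = 10e9 / ((64 + 8 + 12) * 8)
  print("10GE: %.2f Mpps" % (ge10_pps / 1e6))           # ~14.88 Mpps

  # OC192 POS: ~9.95 Gbps line rate, 40-byte minimum IP packet plus an
  # assumed ~9 bytes of PPP/HDLC framing overhead.
  oc192_pps = 9.953e9 / ((40 + 9) * 8)
  print("2xOC192: %.1f Mpps" % (2 * oc192_pps / 1e6))   # ~50 Mpps

  # And the "Total RLDRAM 256 MB" from show chassis fpc:
  print("RLDRAM: %d MB" % (4 * 2 * 32))                 # 4 PFEs x 2 banks x 32MB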
Finally, the DDR DRAM is the "packet buffering" memory, which holds the
copy of the packet as it moves through the system. When you receive a
packet, its contents are stored in the packet buffer memory while the
headers of the packet are sent to the I-chip for routing/firewall
lookups. After the result is returned, the egress interface actually
goes out and gathers up all the fragments of the packet necessary to
reassemble and transmit it.
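
As a toy model (all the names and classes here are mine for
illustration, not actual Junos/PFE code, and the exact-match dict is
standing in for a real longest-prefix-match lookup), the flow looks
roughly like this:

  # Toy model of the store-and-lookup forwarding path described above.
  class PacketBuffer:                  # stands in for the DDR DRAM
      def __init__(self):
          self.cells = {}
          self.next_handle = 0
      def store(self, payload):
          handle = self.next_handle
          self.next_handle += 1
          # Hardware chops the packet body into fixed-size chunks;
          # 64 bytes here is just a stand-in cell size.
          self.cells[handle] = [payload[i:i+64] for i in range(0, len(payload), 64)]
          return handle
      def gather(self, handle):
          return b"".join(self.cells.pop(handle))

  class LookupAsic:                    # stands in for the I-chip + RLDRAM FIB
      def __init__(self, fib):
          self.fib = fib               # destination -> egress interface
      def lookup(self, dst):
          return self.fib.get(dst, "discard")

  def forward(headers, payload, buf, asic):
      handle = buf.store(payload)            # body parked in packet buffer memory
      egress = asic.lookup(headers["dst"])   # header-only lookup
      return egress, buf.gather(handle)      # egress gathers the cells back up

  buf = PacketBuffer()
  asic = LookupAsic({"192.0.2.1": "xe-0/0/0"})
  print(forward({"dst": "192.0.2.1"}, b"x" * 200, buf, asic))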
So, your kernel pushes the selected FIB down to the DPC CPU, which in
turn programs all 4 PFEs on the DPC, and then each PFE has its own copy
of the routing table (in a highly optimized form that is directly
accessed by the ASICs) to make decisions from. Also this is completely
different from how the new MX (Trio/3D) cards work. :)
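
In sketch form (the names here are purely illustrative, not Junos
internals), that fan-out looks like:

  # Illustrative model of the FIB push: RE kernel -> DPC microkernel -> 4 PFEs.
  class Pfe:
      def __init__(self, pfe_id):
          self.pfe_id = pfe_id
          self.fib = {}                # stands in for the RLDRAM lookup memory
      def program(self, prefix, next_hop):
          self.fib[prefix] = next_hop

  class DpcMicrokernel:
      def __init__(self, num_pfes=4):
          self.pfes = [Pfe(i) for i in range(num_pfes)]
      def install_route(self, prefix, next_hop):
          # One update from the RE kernel fans out to every PFE on the
          # DPC, so each PFE ends up with its own complete copy.
          for pfe in self.pfes:
              pfe.program(prefix, next_hop)

  dpc = DpcMicrokernel()
  dpc.install_route("192.0.2.0/24", "xe-0/0/0.0")
  print([len(pfe.fib) for pfe in dpc.pfes])   # every PFE has the route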
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)