[j-nsp] TCAM full on EX8200?

Paul WALL pauldotwall at gmail.com
Thu Oct 13 15:53:54 EDT 2011


On Wed, Oct 12, 2011 at 11:40 AM, Michele Bergonzoni <bergonz at labs.it> wrote:
> THE SHORT QUESTION:
>
> How can I see how full my IPv4 FIB is, on an EX8200 with EX8200-40XS
> linecards and 11.3R2.4 ? I can connect to fpc and give the show commands,
> but I need help interpreting the results.
>
> If it actually turns out to be full, is there something I should do to
> support the 500K routes that I see in the data sheet? I mean e.g. something
> like the sdm templates or "mls cef maximum-routes" in the cisco boxes?

You're looking in the wrong place, TCAM is only used for firewall
filters on the EX. The command you're looking for is "show shim route
lpm-dmm-stats".

On EX8200 the FIB is split across three 2MB blocks of SRAM. The IPv6
table is split across all 3, but the IPv4 table is only split over the
last 2. The problem you're seeing is with the distribution of the routes
across these different blocks, because they're using the stupidest route
distribution algorithm you could possibly imagine. "How stupid?", you
ask? Well, it's really quite simple:

* Prefixes of length /0 through /16 go into the first of those two blocks
* Prefixes of length /17 through /32 go into the second one

Guess which one it turns out there are more of on the Internet? Ding
ding ding.

Yes, because more than 50% of the global routing table is /24s, the
second block fills up MUCH quicker than the first one. The "512k route
capacity" numbers they quote came from testing on artificially generated
prefixes spread evenly across all prefix lengths; they never bothered to test
it against a real routing table. Oh and by the way, all of your directly
connected hosts with ARP entries are installed as /32s, which further
fills up the block in question.
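To make the imbalance concrete, here's a toy model of the length-split rule described above. It's only a sketch: the block names are made up, it counts entries rather than bytes of SRAM, and it assumes every ARP host lands in the table as a /32, as described above.

```python
from collections import Counter
from ipaddress import ip_network

# Hypothetical labels for the two IPv4 SRAM blocks described in the post.
LOW_BLOCK = "block_low"    # prefixes /0 through /16
HIGH_BLOCK = "block_high"  # prefixes /17 through /32


def block_for(prefix: str) -> str:
    """Return the IPv4 block a prefix lands in under the length-split rule."""
    length = ip_network(prefix, strict=False).prefixlen
    return LOW_BLOCK if length <= 16 else HIGH_BLOCK


def occupancy(prefixes, arp_hosts=()):
    """Count entries per block; each ARP host is installed as a /32 host route."""
    counts = Counter({LOW_BLOCK: 0, HIGH_BLOCK: 0})
    for prefix in prefixes:
        counts[block_for(prefix)] += 1
    for host in arp_hosts:
        counts[block_for(f"{host}/32")] += 1
    return counts


if __name__ == "__main__":
    # Toy table using documentation/private ranges, not real Internet data.
    toy_table = [
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.0.2.0/24",
        "198.51.100.0/24",
        "203.0.113.0/24",
        "192.0.2.128/25",
    ]
    print(occupancy(toy_table, ["192.0.2.10", "192.0.2.11"]))
```

Even on this tiny table the low block holds 2 entries and the high block 6. Feed it a real table, where /24s dominate, and the skew gets much worse. Doubling the SRAM per block doesn't change the ratio, only where the wall is.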

They can't fix this, the distribution algorithm is integrated into the
ASIC and can't be updated. The EX ASIC isn't actually made by Juniper;
it's third-party silicon made by Marvell that JUNOS just talks to via an
API, so they can't get it re-spun. The only "fix" is to
increase the amount of SRAM, which they did when they released the -ES
cards a few months ago. These bring you up to 3 blocks of 4MB of SRAM,
with the same absurdly broken distribution algorithm.

When the SRAM is full, you'll start to get your logs flooded with the
"cannot add route to the fib" messages that you just pointed out. Also,
because the ARP entries that become /32s only get installed on the
linecard they map to, it is very possible for one linecard to fill up
sooner than the others. This will very quickly cause routing loops and
customer blackholing. Oh and because RE-generated packets don't have an
"ingress PFE" to hit, you won't be able to debug any of this from CLI
commands.
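The per-linecard effect can be sketched the same way. This is a hypothetical model, not anything JUNOS exposes: the capacity figure is invented (real capacity depends on how much SRAM each entry takes), and it assumes the /17-/32 routes are programmed on every card while ARP /32s stay on the card that owns the host.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Linecard:
    name: str
    # Hypothetical entry capacity of the /17-/32 block on this card.
    high_block_capacity: int
    # Hosts learned via ARP on this card's interfaces (installed only here).
    arp_hosts: List[str] = field(default_factory=list)


def high_block_usage(card: Linecard, shared_high_routes: int) -> int:
    """Shared /17-/32 routes live on every card; ARP /32s only on the local one."""
    return shared_high_routes + len(card.arp_hosts)


def overflowing_cards(cards: List[Linecard], shared_high_routes: int) -> List[str]:
    """Names of cards whose /17-/32 block would exceed its capacity."""
    return [
        card.name
        for card in cards
        if high_block_usage(card, shared_high_routes) > card.high_block_capacity
    ]


if __name__ == "__main__":
    cards = [
        Linecard("fpc0", 1000, [f"10.0.0.{i}" for i in range(10)]),
        Linecard("fpc1", 1000, [f"10.0.1.{i}" for i in range(60)]),
    ]
    print(overflowing_cards(cards, shared_high_routes=950))
```

With the same routing table on both cards, only the one with more directly connected hosts tips over, so its FIB diverges from the others and traffic crossing it can loop or get blackholed.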

You'll never be able to get a full table on your current cards; they're
defective and will never be able to perform as advertised. The only
solution is to buy all new (and more expensive) cards, or stop carrying
full tables.

Oh and by the way, Juniper has known about all of this for around 1.5
years now, and has been replacing the cards of only a select few 'big
customers' in order to buy their silence. If they didn't replace yours,
or promised to do so but then backed out of their promise, you're now
fucked.

Enjoy.

