[c-nsp] N7K TCAM utilization WRT full tables
Ryan Rawdon
ryan at u13.net
Tue Sep 4 13:05:25 EDT 2018
Hello list,
With IPv4 and IPv6 full table sizes increasing at a steady pace, installing full routes on a Nexus 7K now requires over 800,000 single-width TCAM entries (roughly 700k-715k for IPv4 and 110k for IPv6) out of the 1M entries available.
We have recently seen increasing hardware programming (prefix TCAM insertion) failures on 7009s with M224 (and other) line cards. 800k-850k entries is below the platform's stated capacity; however, TAC confirms that actual utilization may not reach the theoretical maximum because of the allocation mechanism used.
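For rough headroom math (ignoring multicast and other TCAM consumers, and using the approximate counts above):

    # Back-of-the-envelope TCAM headroom, using the approximate counts above.
    ipv4_entries = 715_000      # ~700k-715k single-width entries for IPv4
    ipv6_entries = 110_000      # IPv6, expressed as single-width entries
    tcam_size    = 1_000_000    # advertised single-width capacity

    used = ipv4_entries + ipv6_entries
    print(f"used:     {used} entries ({used / tcam_size:.1%})")   # ~825k, ~82.5%
    print(f"headroom: {tcam_size - used} entries")                # ~175k

On paper that leaves plenty of room, which is why the allocation behavior below matters.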
The allocation mechanism (“Spanslogic”) seems to work approximately as follows:
- TCAM is allocated in segments (blocks of TCAM entries) to a given resource need (IPv4, IPv6, mcast, etc)
- As segments fill up, additional segments are allocated to that purpose
- Entries are hashed across segments, so utilization is not even
- Due to hashing, entries cannot be forced into less-utilized segments
This leads to the following failure mode (sketched in the toy model after this list):
- Once no unallocated segments remain, any attempt to program a TCAM entry into a full segment results in a hardware programming failure, which is reported in the logs on the default VDC
- This shows up in the forwarding inconsistency checker ("show forwarding inconsistency" after "test forwarding inconsistency"), which reports the affected prefixes as "[…] missing in FIB Software"
- IPv6 is affected in the same way once any of its segments runs out of entries and no segments remain available for allocation
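To make the mechanism and failure mode concrete, here is a toy model. It is not Cisco's actual Spanslogic implementation, and the segment size and count are made-up numbers; it only illustrates how hashing entries across a fixed set of segments causes inserts to start failing before total utilization reaches 100%.

    # Toy model of hash-distributed TCAM segments (NOT the real Spanslogic
    # implementation; segment size and count are illustrative guesses).
    import random

    NUM_SEGMENTS = 400       # hypothetical: every allocatable segment is in use
    SEGMENT_SIZE = 2048      # hypothetical entries per segment
    ROUTES       = 800_000   # just under the toy capacity of 819,200 entries

    random.seed(1)
    fill = [0] * NUM_SEGMENTS
    failures = 0

    for _ in range(ROUTES):
        seg = random.randrange(NUM_SEGMENTS)   # stand-in for the per-prefix hash
        if fill[seg] < SEGMENT_SIZE:
            fill[seg] += 1
        else:
            failures += 1   # target segment full, no free segments left to split

    print(f"capacity:   {NUM_SEGMENTS * SEGMENT_SIZE}")
    print(f"programmed: {sum(fill)}")
    print(f"failed:     {failures}")
    # Hashing fills some segments early, so failures appear even though the
    # overall load is only ~98% of the toy capacity.

In this model, as in the behavior TAC described, an entry cannot be steered into a less-utilized segment.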
The problem is exacerbated by these additional facets, as we understand them:
- We began filtering some specific routes from IPv4 transit. However, the freed entries are scattered across all segments according to where they were originally allocated, so no segment ends up completely empty (see the back-of-the-envelope calculation after this list)
- There is no way to force TCAM repopulation (e.g. free up all IPv4 TCAM segments and reallocate according to the original allocation strategy, so that more restrictive import policies can lead to more efficient utilization)
- Linecards must be rebooted to force this reprogramming
- Default routes can fail to program just like any other prefix - and like any other prefix, there is no way to force them into a segment with free entries.
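The back-of-the-envelope below shows why filtering on the order of 100k prefixes is extremely unlikely to empty even a single segment, assuming the freed entries are spread essentially at random across segments (which is what the hashing suggests). The entry counts are made-up numbers consistent with the toy model above.

    # Chance that one given segment is completely emptied when ~100k of ~800k
    # entries are removed at hash-random positions (made-up toy numbers).
    import math

    total   = 800_000   # entries programmed before filtering
    removed = 100_000   # prefixes filtered away
    seg     = 2_000     # hypothetical entries sitting in one segment

    def log10_comb(n, k):
        """log10 of C(n, k), via log-gamma, to avoid enormous integers."""
        return (math.lgamma(n + 1) - math.lgamma(k + 1)
                - math.lgamma(n - k + 1)) / math.log(10)

    # Hypergeometric probability that every entry in the segment happens to be
    # among the removed prefixes:
    log10_p = log10_comb(total - seg, removed - seg) - log10_comb(total, removed)
    print(f"P(one whole segment freed) ~ 10^{log10_p:.0f}")
    # on the order of 10**-1800 -- effectively zero

So the freed capacity exists, but it stays spread thinly across segments that remain allocated, which is consistent with needing a linecard reload to get it back.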
For now, our fix is to take a v4 default route, filter ~100k specific prefixes at our edges, then reboot all affected line cards in the region; afterwards we re-run "test forwarding inconsistency" and verify that there are no failures.
We are not seeing this on our 7004s with M224s yet, although they are close. We have not yet dug into why the 7004s would hit this slightly later. It may be related to the physical topology of the chassis leading to a slightly different allocation strategy (number of modules, supervisors, fabric, etc.), or to our slightly different VDC strategy on the 7004s (although entire modules are allocated per VDC, so there should not be increased TCAM contention).
Key commands:
show forwarding system internal forwarding ipv4 route summary (look at Insert Fail)
show system internal forwarding info spans summary (look at the Util Summary section for pools 0 and 1. If the Free column in the Total row is 0, then no segments are available for allocation once any existing segment is full. The histogram at the bottom, "Graph of segments use distribution", is also interesting, as it shows the number of entries/keys per segment. A bimodal distribution seems to correlate with the failure mode, due to segments filling up and not being able to be split into new segments)
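If it is useful to anyone, here is a small helper for eyeballing those numbers once you have transcribed the per-segment key counts from that output. The segment size is a guess (substitute whatever your platform actually reports), and the counts in the example are made up.

    # Summarize per-segment key counts transcribed from the spans summary /
    # "Graph of segments use distribution" output (SEGMENT_SIZE is a guess).
    SEGMENT_SIZE = 2048

    def summarize(per_segment_keys, free_segments):
        full = sum(1 for n in per_segment_keys if n >= SEGMENT_SIZE)
        nearly_empty = sum(1 for n in per_segment_keys if n < SEGMENT_SIZE // 4)
        print(f"allocated: {len(per_segment_keys)}  full: {full}  "
              f"nearly empty: {nearly_empty}  unallocated: {free_segments}")
        if full and free_segments == 0:
            print("WARNING: full segment(s) with nothing left to allocate -- "
                  "inserts hashed to a full segment will fail")
        if full and nearly_empty:
            print("note: bimodal fill (full plus nearly-empty segments), which "
                  "seems to correlate with the failure mode")

    # Example with made-up counts:
    summarize([2048, 2048, 1900, 300, 250, 2048, 120], free_segments=0)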
Can anyone else using 7Ks for edge routing share their experience with these constraints, particularly as 2018's table growth starts to test the limits of these systems? Please mention the chassis, modules and number of VDCs you are using in your reply.
We are uncertain whether our N77Ks with M3s would be affected. Theoretically they have 2M hardware entries available; however, various show utilization commands indicate 1M entries currently supported (and therefore utilization percentages similar to our M224s).
Thanks,
Ryan