[nsp] MSFC2 128,000 route limitation

Ian Cox icox@cisco.com
Thu, 22 Aug 2002 17:15:25 -0700


The bug you cite has nothing to do with filling up the FIB TCAM; it has to 
do with filling up the memory that contains the adjacency rewrite 
information. Filling up either of these two resources will cause problems. 
If the problem you are running into is the DDTs you refer to, then an 
accurate description of the problem is:

[snip]

only with a large network configuration in which many prefixes have 
multiple paths to them, resulting in an adjacency exception condition 
(a condition where the NMP runs out of adjacency table space) on the NMP. 
This happens because we do not share adjacencies among prefixes even if 
they have the same multiple paths ... Because of constant network updates, 
some of the adjacencies get deleted, and when the NMP comes out of the adj 
exception we issue a reload of the FIB/ADJ table. This cycle goes on and 
on, resulting in high CPU on the NMP.

[end snip]

If you only have 50k routes, then they only consume 50k TCAM entries. The 
structure of the forwarding information looks like this:

FIB TCAM               Adjacency Table
+-----------+          +------------------+
| 1.0.0.0/8 | --1:N--< | path1  rewrite 1 |
| 2.1.0.0/16|          | path2  rewrite 2 |
|           |          | path3  rewrite 3 |
+-----------+          +------------------+
256k entries            256k entries

If you have 50k prefixes each with two parallel paths, with CatOS you would 
consume 50k x 2 = 100k entries in the adjacency table. In IOS for the 
Catalyst 6000 this is done differently: you would consume just 2 entries, 
since the prefixes can share adjacency table entries.
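
To make the arithmetic concrete, here is a rough back-of-the-envelope 
sketch in Python of the two programming models. This is nothing that ships 
with the switch; the function names are mine, and the numbers are the ones 
used above:

# Back-of-the-envelope sketch of the two adjacency programming models
# described above. Illustrative only.

ADJ_TABLE_SIZE = 256000  # adjacency table capacity from the diagram


def catos_adjacency_usage(prefixes, paths):
    """CatOS model: each prefix carries its own copy of its path
    adjacencies, so usage grows with the number of prefixes."""
    return prefixes * paths


def ios_adjacency_usage(prefixes, paths):
    """IOS (Catalyst 6000) model: prefixes with the same set of paths
    share adjacency entries, so usage depends only on the paths."""
    return paths


for name, usage in (("CatOS", catos_adjacency_usage(50000, 2)),
                    ("IOS  ", ios_adjacency_usage(50000, 2))):
    print("%s: %6d of %d adjacency entries" % (name, usage, ADJ_TABLE_SIZE))

# CatOS: 100000 of 256000 adjacency entries
# IOS  :      2 of 256000 adjacency entries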


I have not dealt with CatOS on the platform for over 18 months, only IOS 
for the Catalyst 6000. IOS for the Catalyst 6000 does not have this 
problem; it handles the adjacency table programming by allowing prefixes to 
share adjacency entries. The way to show how much FIB TCAM has been 
consumed on IOS for the Catalyst 6000 is:

tromso#sh mls cef summary

tromso-sp#
Total CEF switched packets:  0000019424896725
Total CEF switched bytes:    0001857776306791
Total routes:                110568
     IP unicast routes:       110555
     IPX routes:              0
     IP multicast routes:     13
tromso#


Looking at the manual, the equivalent command to get this information under 
CatOS is "show mls cef".

To get the number of entries used in the adjacency table under IOS for the 
Catalyst 6000, use:

tromso#sh mls cef adjacency count

tromso-sp#
Total adjacencies:           24
tromso#


The only similar count of adjacency usage I can see for a system running 
CatOS is under "sh polaris fibmgr usage".
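
If you want to script the check Matt asks about below, a minimal sketch 
follows. It assumes you have already captured the output of the two IOS 
commands above by whatever means you normally collect CLI output (the 
filenames in the example are hypothetical); the regular expressions just 
match the field names shown in the captures, and the limits are the 
figures quoted in this thread (244k unicast FIB entries, halved when 
unicast RPF checking is enabled, and a 256k adjacency table):

# Hedged sketch: compute headroom from captured CLI output. Not a Cisco
# tool; field names come from the example output above, limits from the
# figures quoted in this thread.

import re

FIB_UNICAST_LIMIT = 244000   # unicast FIB TCAM entries without uRPF
ADJ_TABLE_SIZE = 256000      # adjacency table entries


def fib_headroom(summary_output, urpf_enabled=False):
    """Parse 'sh mls cef summary' output; return unicast entries left.
    With uRPF the usable route count is halved, per the quote below."""
    routes = int(re.search(r"IP unicast routes:\s+(\d+)",
                           summary_output).group(1))
    limit = FIB_UNICAST_LIMIT // 2 if urpf_enabled else FIB_UNICAST_LIMIT
    return limit - routes


def adj_headroom(adjacency_output):
    """Parse 'sh mls cef adjacency count' output; return entries left."""
    used = int(re.search(r"Total adjacencies:\s+(\d+)",
                         adjacency_output).group(1))
    return ADJ_TABLE_SIZE - used


# Example, using the captures shown above:
# >>> fib_headroom(open("tromso-summary.txt").read())
# 133445
# >>> adj_headroom(open("tromso-adjacency.txt").read())
# 255976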


Ian



At 06:24 PM 8/22/2002 -0400, Matt Buford wrote:
>On Wed May 15 2002 - 07:53:32 EDT, Ian Cox wrote:
>
> > The TCAM that holds the FIB table is capable of holding 256,000 entries.
> > Without unicast RPF checking turned on the maximum number of unicast
> > entries that can be held in the hardware FIB table is 244,000. The
> > remaining 12,000 entries are reserved for multicast routes. If unicast RPF
> > checking is enabled then the number of routes that are held in the TCAM is
> > halved.
>
> > You can exceed the capacity of the hardware forwarding table, and the
> > consequences are that the routes that are not programmed into the TCAM
>that
> > holds the FIB table will be switched in software by the MSFC2 / RP.
>
>I have apparently run into this limitation, with much worse consequences
>(running Sup2/MSFC2 hybrid).  The supervisor CPU shot up to 100%, and all
>updates from the MSFC to the supervisor/PFC stopped.  This happened in both
>of a pair of redundant 6500s, bringing both down and leaving me unable to
>bring them back up with a full routing table.
>
>Cisco TAC found bug CSCdw89942, and said the internal notes recommend using
>the "set mls cef per-prefix-stats disable" to reduce the number of entries.
>
>It appears that at this point the limitation is not something to take
>lightly.  Reaching it (at least under Hybrid) apparently brings everything
>down.  There is no software yet available that fixes this, and the only
>workaround is to take measures to reduce your CEF table size (such as
>turning off per-prefix-stats).
>
>For perspective, the routers that failed each see two BGP feeds of full
>Internet routes, as well as about 12 OSPF routes (each of which has 2 or 3
>paths to get there).  This doesn't seem like a particularly large number of
>routes to me, however it certainly passes the limit listed in the bug of
>50,000 routes with dual paths.
>
>Is there anywhere I can get a count of the actual current number of entries
>and/or the space free?  Or is the only way to tell to look at the CEF table
>size, double it if unicast RPF is on, and make sure the result is less than
>244,000?  I want to go through all my 6500s and make sure I'm not about to
>hit the limit on any of them (some are hybrid and some are native).  The
>thought of all my 6500s falling over at once and staying down because I
>reached the maximum limit on routes scares me greatly.