[nsp] MSFC2 128,000 route limitation

Matt Buford matt@overloaded.net
Mon, 26 Aug 2002 15:27:39 -0400


Thanks again.  Your detailed technical explanations have been invaluable,
and are exactly what I have been trying to get out of TAC.  If only
cisco.com had this detail on the web...  :)

I now have a much better handle on what happened.  As I brought up the new
network, there were 3 equal-cost OSPF paths to the BGP next hop.  Load in
the BGP routes and you have ~110,000 routes * 3 paths = ~330,000 adjacencies.
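For anyone who wants to play with the numbers, here's a quick sketch of the arithmetic.  The function names and the 262,144-entry table size (from the counters further down) are mine, and the native IOS side is simplified to the ideal sharing case Ian describes later in the thread:

```python
# Sketch of why per-prefix adjacencies (CatOS) blow up the adjacency table
# while shared adjacencies (native IOS) do not.  Illustrative only.

def catos_adjacencies(routes: int, paths: int) -> int:
    """CatOS programs a separate adjacency entry per prefix per path."""
    return routes * paths

def native_adjacencies(routes: int, paths: int) -> int:
    """Native IOS lets prefixes with identical path sets share adjacency
    entries, so usage tracks the number of paths, not prefixes."""
    return paths

ADJ_TABLE_SIZE = 262_144  # total adjacency entries the hardware reports

catos = catos_adjacencies(110_000, 3)    # 330,000 -> over the table size
native = native_adjacencies(110_000, 3)  # 3 -> trivial

print(catos, catos > ADJ_TABLE_SIZE)     # 330000 True
print(native, native > ADJ_TABLE_SIZE)   # 3 False
```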

So, between the large routing efficiency gain and the CEF consistency
problems I've been having, I've decided to convert these switches to native
this weekend.  I do have a couple of major native bug cases open as well, but
it sounds to me like the native architecture is better suited to a full BGP
view with redundant paths.  Hopefully this will give us reasonably reliable
routing, and we'll just live without the advanced HA features for now.

For those that asked, the supervisors in question here have 512 megs of RAM,
are sup2/pfc2/msfc2, and are running CatOS 7.3(1) and IOS 12.1(11b)E4.

Here are some current stats (all taken recently - after running the "set mls
cef per-prefix-stats disable" command):

Total FIB entries:        262144
Allocated FIB entries:    112600
Free FIB entries:         149544
FIB entries used for IP ucast:  112599
FIB entries used for IPX     :       1
FIB entries used for IP mcast:       0

Total adjacencies:        262144
Allocated adjacencies:      1140
Free adjacencies:         261004

Looks like I'm nowhere near filling the adjacency or FIB tables at this
point.  Oh well, I'll convert anyway.
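If anyone else wants to sweep their 6500s for headroom, here's a rough sketch of scripting it.  The parser is mine, not a Cisco tool; it assumes counter output shaped exactly like the sample (the "sh polaris fibmgr usage" style above):

```python
import re

# Hypothetical helper: pull the Total/Allocated counters out of captured
# "sh polaris fibmgr usage"-style output and report percent utilization,
# so a cron job can flag switches creeping toward the FIB/adjacency limit.

SAMPLE = """\
Total FIB entries:        262144
Allocated FIB entries:    112600
Free FIB entries:         149544
Total adjacencies:        262144
Allocated adjacencies:      1140
Free adjacencies:         261004
"""

def utilization(text: str) -> dict:
    """Return {counter name: percent used} for each table."""
    percent = {}
    for label in ("FIB entries", "adjacencies"):
        total = int(re.search(rf"Total {label}:\s+(\d+)", text).group(1))
        used = int(re.search(rf"Allocated {label}:\s+(\d+)", text).group(1))
        percent[label] = round(100.0 * used / total, 1)
    return percent

print(utilization(SAMPLE))
# {'FIB entries': 43.0, 'adjacencies': 0.4}
```

Remember the caveat from earlier in the thread: with unicast RPF enabled the usable FIB capacity is halved, so a raw percentage against the total understates how close you really are.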

----- Original Message -----
From: "Ian Cox" <icox@cisco.com>
To: "Matt Buford" <matt@overloaded.net>; <cisco-nsp@puck.nether.net>
Sent: Friday, August 23, 2002 7:56 PM
Subject: Re: [nsp] MSFC2 128,000 route limitation


> At 04:36 PM 8/23/2002 -0400, Matt Buford wrote:
> >Excellent.  Thank you.  This type of detailed technical explanation is
> >exactly what I needed.  TAC always gave me descriptions that weren't
> >detailed enough for me to figure out exactly what was happening.  It was
> >clear that something was filling up related to too many routes, but that
> >was as detailed as it got.
> >
> >Is there any way to see how much memory is used (or free) for the
> >adjacency table?  As I mentioned before, these boxes didn't really have
> >anything but dual BGP feeds (of full Internet routes) and OSPF providing
> >two equal cost paths to each BGP next-hop.  This doesn't seem like a
> >particularly large or unusual situation to me.  If anything I see this
> >as more of a starting point configuration for initial deployment - that
> >will only get larger with time and additional customers.  I understand
> >that the command "set mls cef per-prefix-stats disable" has reduced this
> >(by half?) but I'd sleep better if I knew how close I was to hitting
> >this limitation again.
>
> It was at the end of my last email: "sh polaris fibmgr usage" is the CatOS
> command that has this information.
>
> [snip]
>
> ...
> Total FIB entries:        262144
> Allocated FIB entries:    107840
> Free FIB entries:         154304
> FIB entries used for IP ucast:  107839
>
> ...
>
> Total adjacencies:        262144
> Allocated adjacencies:    216332
> Free adjacencies:          45812
>
> ...
>
> [end snip]
>
>
>
> >These specific switches are running hybrid.  This decision was made
> >because this situation calls for mostly switching - with very few router
> >interfaces.  Basically there are just two upstream VLAN interfaces (two
> >backbones) and then a downstream customer interface.  Other than those
> >interfaces, it's all switching.  CatOS's "hitless" upgrades plus the
> >ability to upgrade the routers without affecting switching seemed a big
> >plus.  This allows router upgrades without affecting even customers
> >single-homed directly to these switches.  Would you say the architectural
> >improvements (such as the change in adjacency table handling you
> >mentioned) are such that you recommend native IOS in all situations?
>
> IOS for Catalyst 6000 handles large routing tables with parallel paths
> much more efficiently than CatOS does today. HA is much better in CatOS,
> but over the next year IOS will end up at the same level. Asking me which
> one to use is not really fair because I'm totally biased towards IOS for
> Catalyst 6000 :)
>
>
> Ian
>
> >----- Original Message -----
> >From: "Ian Cox" <icox@cisco.com>
> >To: "Matt Buford" <matt@overloaded.net>; <cisco-nsp@puck.nether.net>
> >Sent: Thursday, August 22, 2002 8:15 PM
> >Subject: Re: [nsp] MSFC2 128,000 route limitation
> >
> >
> > >
> > > The bug you cite has nothing to do with filling up the FIB TCAM; it
> > > has to do with filling up the memory that contains the adjacency
> > > rewrite information. Filling up either of these two resources will
> > > cause problems. If the problem you are running into is the DDTS you
> > > refer to then an accurate description of the problem is:
> > >
> > > [snip]
> > >
> > > only with a large network configuration in which many prefixes have
> > > multiple paths to them, resulting in an adjacency exception condition
> > > (a condition where the NMP runs out of adjacency table entries) on the
> > > NMP. This happens because we do not share adjacencies among prefixes
> > > even if they have the same multiple paths ... Because of constant
> > > network updates, some of the adjacencies get deleted, and when the NMP
> > > comes out of adj exception, we issue a reload of the FIB/ADJ table.
> > > This cycle goes on and on, resulting in the high CPU on the NMP.
> > >
> > > [end snip]
> > >
> > > If you only have 50k routes, then they only consume 50k TCAM entries.
> > > The structure of the forwarding information looks like this:
> > >
> > > FIB TCAM               Adjacency Table
> > > +-----------+          +------------------+
> > > | 1.0.0.0/8 | --1:N--< | path1  rewrite 1 |
> > > | 2.1.0.0/16|          | path2  rewrite 2 |
> > > |           |          | path3  rewrite 3 |
> > > +-----------+          +------------------+
> > > 256k entries            256k entries
> > >
> > > If you have two parallel paths and you had 50k entries with CatOS, you
> > > would consume 50k x 2 entries in the adjacency table. In IOS for the
> > > Catalyst 6000 this is done differently and you would consume just 2
> > > entries, since the prefixes can share adjacency table entries.
> > >
> > >
> > > I have not dealt with CatOS on the platform for over 18 months, only
> > > IOS for the Catalyst 6000. IOS for the Catalyst 6000 does not have
> > > this problem; it handles the adjacency table programming by allowing
> > > prefixes to share adjacency entries. The way to show how much FIB TCAM
> > > has been consumed on IOS for the Catalyst 6000 is:
> > >
> > > tromso#sh mls cef summary
> > >
> > > tromso-sp#
> > > Total CEF switched packets:  0000019424896725
> > > Total CEF switched bytes:    0001857776306791
> > > Total routes:                110568
> > >      IP unicast routes:       110555
> > >      IPX routes:              0
> > >      IP multicast routes:     13
> > > tromso#
> > >
> > >
> > > Looking at the manual, the command for CatOS is "show mls cef" to get
> > > the equivalent information.
> > >
> > > To get the number of entries used in the adjacency table for IOS for
> > > Catalyst 6000 use:
> > >
> > > tromso#sh mls cef adjacency count
> > >
> > > tromso-sp#
> > > Total adjacencies:           24
> > > tromso#
> > >
> > >
> > > The only similar count for adjacency usage I can see for a system
> > > running CatOS is under "sh polaris fibmgr usage"
> > >
> > >
> > > Ian
> > >
> > >
> > >
> > > At 06:24 PM 8/22/2002 -0400, Matt Buford wrote:
> > > >On Wed May 15 2002 - 07:53:32 EDT, Ian Cox wrote:
> > > >
> > > > The TCAM that holds the FIB table is capable of holding 256,000
> > > > entries.  Without unicast RPF checking turned on, the maximum number
> > > > of unicast entries that can be held in the hardware FIB table is
> > > > 244,000. The remaining 12,000 entries are reserved for multicast
> > > > routes. If unicast RPF checking is enabled then the number of routes
> > > > that are held in the TCAM is halved.
> > > >
> > > > You can exceed the capacity of the hardware forwarding table, and
> > > > the consequences are that the routes that are not programmed into
> > > > the TCAM that holds the FIB table will be switched in software by
> > > > the MSFC2 / RP.
> > > >
> > > >I have apparently run into this limitation, with much worse
> > > >consequences (running Sup2/MSFC2 hybrid).  The supervisor CPU shot up
> > > >to 100%, and all updates from the MSFC to the supervisor/PFC stopped.
> > > >This happened in both of a pair of redundant 6500s, bringing both
> > > >down and leaving me unable to bring them back up with a full routing
> > > >table.
> > > >
> > > >Cisco TAC found bug CSCdw89942, and said the internal notes recommend
> > > >using the "set mls cef per-prefix-stats disable" command to reduce
> > > >the number of entries.
> > > >
> > > >It appears that at this point the limitation is not something to
> > > >take lightly.  Reaching it (at least under hybrid) apparently brings
> > > >everything down.  There is no software yet available that fixes this,
> > > >and the only workaround is to take measures to reduce your CEF table
> > > >size (such as turning off per-prefix stats).
> > > >
> > > >For perspective, the routers that failed each see two BGP feeds of
> > > >full Internet routes, as well as about 12 OSPF routes (each of which
> > > >has 2 or 3 paths to get there).  This doesn't seem like a
> > > >particularly large number of routes to me; however, it certainly
> > > >passes the limit listed in the bug of 50,000 routes with dual paths.
> > > >
> > > >Is there anywhere I can get a count of the actual current number of
> > > >entries and/or space free, or is the only way to tell to show the CEF
> > > >table size, manually double it if you have unicast RPF on, and then
> > > >make sure that is less than 244,000?  I want to go through all my
> > > >6500s and make sure I'm not about to hit the limit on any of them
> > > >(some are hybrid and some are native).  The thought of all my 6500s
> > > >falling over at once and staying down because I reached the maximum
> > > >limit on routes scares me greatly.
> > >
>