[c-nsp] Hardware limitations on SUP32 with LDP and full routing table

Marcus.Gerdon Marcus.Gerdon at versatel.de
Fri Jan 23 03:55:44 EST 2009


Hi Rodney,

It may well be a problem that one customer wants it to behave one way and the next customer another.

I tend to think about this a bit differently.

If the RIB and FIB differ, so that forwarding does not follow the paths the routing protocols selected, that is a bug.
Either there's a 'fail-over' mode into software when the TCAM is full, which is what the documentation and the error message say, or there isn't.
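
To illustrate the mismatch: a minimal Python sketch of the lookup order described in the quoted mail below (TCAM first, software only on a TCAM miss). It is a toy model, not how the PFC actually works; the prefixes, the next-hop labels and the function names are made up for the example.

```python
import ipaddress

def longest_match(table, addr):
    """Return (network, next_hop) for the longest prefix in `table` covering addr, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best

def forward(tcam, software_fib, addr):
    """Toy two-stage lookup: any TCAM hit wins, software is consulted only on a miss."""
    hit = longest_match(tcam, addr)
    if hit is not None:
        return hit[1]
    hit = longest_match(software_fib, addr)
    return hit[1] if hit else None

# Hypothetical tables: the /24 arrived after the TCAM region for /24s was full,
# so it exists in software only.
software_fib = {"10.1.0.0/16": "nh-A", "10.1.2.0/24": "nh-B"}
tcam = {"10.1.0.0/16": "nh-A"}

rib_choice = longest_match(software_fib, "10.1.2.5")[1]  # what the routing table selects
hw_choice = forward(tcam, software_fib, "10.1.2.5")      # what the toy forwarder does
print(rib_choice, hw_choice)
```

In this model the /16 in the TCAM shadows the software-only /24, so the two choices differ for any address inside the /24.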

Reorganizing the TCAM seriously affects traffic forwarding; I've already verified that using the various 'cef inconsistency' options. So that's not a real option.

Compressing the TCAM entries... that might provide a way to put more prefixes into the TCAM.
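
Purely as an illustration of what 'compressing' could mean, and not something IOS does: a minimal Python sketch that merges two sibling prefixes into their parent when both point to the same next hop and the parent is not already in the table. The function name, the prefixes and the next-hop labels are made up. As noted in the quoted thread, doing this in a real forwarding engine is considerably more complex.

```python
import ipaddress

def merge_siblings(routes):
    """Merge sibling prefixes that share a next hop into their parent prefix.

    A pair is merged only if the parent is not already present, so a
    longest-prefix lookup gives the same next hop before and after.
    """
    table = {ipaddress.ip_network(p): nh for p, nh in routes.items()}
    changed = True
    while changed:
        changed = False
        # Work from the longest prefixes upwards; sorted() copies the keys,
        # so the table can be modified inside the loop.
        for net in sorted(table, key=lambda n: n.prefixlen, reverse=True):
            if net not in table or net.prefixlen == 0:
                continue
            parent = net.supernet()
            if parent in table:
                continue
            left, right = parent.subnets()
            sibling = right if net == left else left
            if table.get(sibling) == table[net]:
                table[parent] = table.pop(net)
                del table[sibling]
                changed = True
    return {str(n): nh for n, nh in table.items()}

# Hypothetical routes: four entries collapse to two.
routes = {
    "10.0.0.0/25": "nh-A",
    "10.0.0.128/25": "nh-A",
    "10.0.1.0/24": "nh-A",
    "10.0.2.0/24": "nh-B",
}
compressed = merge_siblings(routes)
print(compressed)
```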

But in my opinion that's not really necessary. The TCAM holds roughly 250k entries. That's what it is designed and built for. All I expect is for the fallback to work correctly.

What about this: create a global configuration command like 'mls cef min-prefix-len' that sets the minimum prefix length that is entered into the TCAM. With the TCAM limited to /16, for example, all the blocks required for /15 and shorter would be available for more-specifics, and all prefixes of /15 and shorter would be handled in software. That way the most-specific-first ordering could be kept.
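
A rough Python sketch of how such a split could behave. 'mls cef min-prefix-len' is only the command proposed above, not an existing one, and the helper names, prefixes and next-hop labels are hypothetical. It assumes every prefix at or above the minimum length fits into the TCAM.

```python
import ipaddress

def longest_match(table, addr):
    """Return the next hop of the longest prefix in `table` covering addr, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

def split_by_min_len(rib, min_len):
    """Prefixes of length >= min_len go to the TCAM, shorter ones to software only."""
    tcam, software = {}, {}
    for prefix, next_hop in rib.items():
        target = tcam if ipaddress.ip_network(prefix).prefixlen >= min_len else software
        target[prefix] = next_hop
    return tcam, software

def forward(tcam, software, addr):
    """TCAM first; fall back to the software table only on a TCAM miss."""
    hit = longest_match(tcam, addr)
    return hit if hit is not None else longest_match(software, addr)

# Hypothetical routing table
rib = {
    "10.0.0.0/8": "nh-A",
    "10.1.0.0/16": "nh-B",
    "10.1.2.0/24": "nh-C",
    "192.0.2.0/24": "nh-D",
}
tcam, software = split_by_min_len(rib, 16)
probes = ["10.9.9.9", "10.1.9.9", "10.1.2.5", "192.0.2.1", "198.51.100.1"]
results = {addr: forward(tcam, software, addr) for addr in probes}
print(results)
```

Because every TCAM entry is at least as long as any software-only entry, a TCAM hit is always the longest match overall, so in this model the results equal a plain longest-prefix lookup on the full table.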

Most customers aren't complaining that hardware forwarding is unable to hold a DFZ table, are they? They complain that the fallback into software doesn't work in a stable manner. How many cases have already been raised with Cisco because of that?


kind regards,

Marcus
 

> -----Original Message-----
> From: Rodney Dunn [mailto:rodunn at cisco.com] 
> Sent: Thursday, 22 January 2009 21:51
> To: Oliver Dewdney
> Cc: Rodney Dunn; Marcus.Gerdon; cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Hardware limitations on SUP32 with LDP 
> and full routing table
> 
> On Thu, Jan 22, 2009 at 07:07:13PM +0000, Oliver Dewdney wrote:
> > 
> > 
> > >-----Original Message-----
> > >From: cisco-nsp-bounces at puck.nether.net [mailto:cisco-nsp-
> > >bounces at puck.nether.net] On Behalf Of Rodney Dunn
> > >
> > >I'm by no means a TCAM expert but it seems you are asking for the
> > >exception state when the TCAM gets full to allow the more specific
> > >in somehow so that the longest prefix is matched?
> > 
> > I believe this is what the documentation says will happen.
> 
> Can you point me to that so I can learn what it says?
> 
> > 
> > 
> > >These exception cases come with all kinds of problems. One customer
> > >wants it one way and the next customer wants it another so you never
> > >win.
> > 
> > If the FIB says that it will be routed one way then the implementation
> > in the RIB should do that.
> > 
> > >I suspect the punt is coming from the next hop not being resolved
> > >to a valid adjacency or if there is no hit at all. If the search
> > >is done in hardware and it has no knowledge, due to TCAM full, of
> > >the less specific it can't forward based on that.
> > 
> > So the problem is inserting a new TCAM entry into an area that is already
> > full. I guess it needs to rebuild the whole TCAM, allocating more space to
> > the area where the insert needs to be. Maybe loading a whole TCAM means
> > that there is a time when there is an incomplete RIB, so what does the
> > hardware forwarder do with the packets? Punt them all to the CPU!?
> >
> 
> Some customers want it to punt on TCAM full, some don't. If it's a single
> point of failure they sometimes want it to do the best it can; the next
> customer says bring it down so I know about it; some say bring it down
> to reroute in a dual-path scenario. There is no "best" answer.
> 
>  
> > 
> > >I'm not sure there is a way to solve the problem you describe without
> > >more TCAM space. We don't do any kind of RIB compression.
> > 
> > Moderate compression will make a few full internet feeds fit into the
> > smaller TCAM.
> >
> 
> I pushed for that once years ago, but what was found was that the
> code to do it was so complex that it was decided against, given that the
> newer hw forwarding engines could handle it. One can always say that is a
> self-serving answer, but after a number of threads about it I myself
> realized it's much more complex than it appears on the surface.
> 
> Rodney
> 
>  
> > >Rodney
> > >
> > >
> > >On Thu, Jan 22, 2009 at 06:15:13PM +0100, Marcus.Gerdon wrote:
> > >> Hi Jose,
> > >> Hi Marek,
> > >>
> > >> I've been facing the same symptom, losing connectivity on a couple of
> > >> machines, for quite some time. With a DFZ table the TCAMs are simply
> > >> overloaded.
> > >>
> > >> I've been able to track that down, but for some weeks now Cisco hasn't
> > >> been able to provide any solution.
> > >>
> > >> The problem itself isn't that complex:
> > >>
> > >> When the FIB is built (at power-up or when the routing protocols come up;
> > >> 'clear ip route *' also works, no reload required) the forwarding entries
> > >> are created in the TCAM in ordered sequence, longest prefixes first.
> > >>
> > >> Each packet's destination is first looked up in the TCAM. As the TCAM is
> > >> walked sequentially, the longest match is found first and the next hop is
> > >> successfully determined. Only if no TCAM entry is found does the lookup
> > >> switch over to the software-only tables.
> > >>
> > >> Think about a /16 being available at startup and entered into the TCAM.
> > >> At some later time a more-specific /24 shows up in the routing table.
> > >> While trying to create a forwarding entry, it is determined that the TCAM
> > >> can't hold the additional /24 because the area organized for /24s at the
> > >> time of the initial population is full ('sh mls cef masks' and some
> > >> investigation show this, and it's even reproducible).
> > >>
> > >> Because of that, only a software entry is created (you can check with
> > >> 'sh ip cef'), but none in the TCAM (check with 'sh mls cef').
> > >>
> > >> Now we have a TCAM and a software entry for the /16, and the overlapping
> > >> /24 only in software.
> > >>
> > >> When looking up an address within the /24, the TCAM is queried first and
> > >> finds the /16 record. As a match is found, software isn't queried at all.
> > >>
> > >> It seems that somewhere the process that inserts a prefix into the middle
> > >> of the TCAM, reordering it if needed, is broken. I've tried to work around
> > >> this using the CEF consistency checks, but although they work by and
> > >> large, a few hundred ms of jitter is produced each time the TCAM is
> > >> reordered. I disabled them again soon afterwards, as customers began to
> > >> complain about applications disconnecting due to the introduced jitter.
> > >>
> > >> If someone has an idea, or even better a solution (or gets a definitive
> > >> answer from Cisco; my case has been open for some time now and the
> > >> engineer said he is going to reproduce this in the lab), please let me
> > >> know.
> > >>
> > >>
> > >>
> > >>
> > >> kind regards,
> > >>
> > >> Marcus
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: cisco-nsp-bounces at puck.nether.net
> > >> > [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Marek Tyban
> > >> > Sent: Thursday, 22 January 2009 15:25
> > >> > To: Jose
> > >> > Cc: cisco-nsp at puck.nether.net
> > >> > Subject: Re: [c-nsp] Hardware limitations on SUP32 with LDP
> > >> > and full routing table
> > >> >
> > >> >
> > >> > Hi Jose,
> > >> >
> > >> > I think that generally SUP32 isn't suitable for today's full internet
> > >> > routing table. It's due to the hardware limitations (as you wrote).
> > >> >
> > >> > When you have full routes on SUP32 you should see log output as below
> > >> >
> > >> > %MLSCEF-SP-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for IPv4 unicast protocol.
> > >> >
> > >> > %CFIB-SP-7-CFIB_EXCEPTION: FIB TCAM exception, Some entries will be software switched
> > >> >
> > >> > I have seen similar trouble, with some sites/networks not being
> > >> > reachable through SUP720-3B (non-XL) routers even though the routing
> > >> > and CEF tables were correct.
> > >> >
> > >> > Regards,
> > >> > Marek
> > >> >
> > >> > On Wed, 21 Jan 2009, Jose wrote:
> > >> >
> > >> > > I was wondering if I could get some additional opinions on a case I have
> > >> > > open with Cisco.  We have recently started turning up LDP on various
> > >> > > links out towards some routers that are being converted to act as PEs.
> > >> > > The core is all connected together and has been running LDP on those
> > >> > > particular links for over 8 months.
> > >> > >
> > >> > > This past weekend we turned up LDP on a link to one of our remote cities
> > >> > > and we received sporadic complaints that some customers couldn't access
> > >> > > any sites/addresses if the path was via one of our P routers.  If traffic
> > >> > > took any other path through the network it was fine.  Traceroutes to and
> > >> > > from this P router were unsuccessful even though the routing table and
> > >> > > LFIB all showed the correct information.  Turning off LDP across this
> > >> > > link resolved the problem for the customers.
> > >> > >
> > >> > > After opening a TAC case and lots of troubleshooting they showed us
> > >> > > this:
> > >> > >
> > >> > > Without LDP on:
> > >> > > frort04#sh ip cef exact-route 172.17.0.254 68.179.73.86
> > >> > > 172.17.0.254    -> 68.179.73.86   : Vlan2210 (next hop 67.226.181.110)
> > >> > > <<<<<< the next hop is correct
> > >> > >
> > >> > > With LDP on:
> > >> > > frort04#sh mls cef exact-route 172.17.0.254 68.179.73.86
> > >> > > Interface: Vl2210, Next Hop: 224.0.0.168, Vlan: 2210, Destination Mac: 00b0.4a5e.7419
> > >> > > <<<<<<<< next hop can't be a multicast IP
> > >> > >
> > >> > > The CEF entry and the MLS CEF entry are different. After consulting the
> > >> > > LAN-SW team, it was found that this router had an issue with overloaded
> > >> > > routes causing the mls cef table to become corrupted.
> > >> > >
> > >> > > So basically we were told that because the SUP32 has a hardware limit of
> > >> > > 250K routes that it can hardware-CEF, we were getting corruption in our
> > >> > > tables, which in turn corrupted how LDP was building its forwarding
> > >> > > table.  The core P routers currently hold the entire internet routing
> > >> > > table, so yes, they technically are pretty full in terms of the number of
> > >> > > routes they can hold.  They want us to reload our router to clear the
> > >> > > tables, but they can't guarantee that this problem won't resurface down
> > >> > > the road or right away.  I'm more curious whether there is some kind of
> > >> > > IOS bug we might be hitting, which I'm hoping one of you might know
> > >> > > about, but they're supposed to be doing a bug scrub as well.
> > >> > >
> > >> > > Any thoughts on what we're experiencing?  Should we bite the bullet and
> > >> > > upgrade to SUP720-3BXLs?
> > >> > >
> > >> > > Thanks.
> > >> > >
> > >> > > Jose
> > >> > >
> > >> > > _______________________________________________
> > >> > > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > >> > > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > >> > > archive at http://puck.nether.net/pipermail/cisco-nsp/
> > >> > >
> > 
> 

