[c-nsp] Hardware limitations on SUP32 with LDP and full routing table

Rodney Dunn rodunn at cisco.com
Thu Jan 22 15:51:25 EST 2009


On Thu, Jan 22, 2009 at 07:07:13PM +0000, Oliver Dewdney wrote:
> 
> 
> >-----Original Message-----
> >From: cisco-nsp-bounces at puck.nether.net [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Rodney Dunn
> >
> >I'm by no means a TCAM expert but it seems you are asking for the
> >exception state when the TCAM gets full to allow the more specific
> >in somehow so that the longest prefix is matched?
> 
> I believe this is what the documentation says will happen.

Can you point me to that so I can learn what it says?

> 
> 
> >These exception cases come with all kinds of problems. One customer
> >wants it one way and the next customer wants it another so you never
> >win.
> 
> If the FIB says that it will be routed one way, then the implementation in the RIB should do that.
> 
> >I suspect the punt is coming from the next hop not being resolved
> >to a valid adjacency, or from there being no hit at all. If the search
> >is done in hardware and it has no knowledge, due to TCAM full, of
> >the less specific it can't forward based on that.
> 
> So the problem is inserting a new TCAM entry into an area that is already full. I guess it needs to rebuild the whole TCAM, allocating more space to the area where the insert needs to be. Maybe loading the whole TCAM means there is a window where the RIB is incomplete, so what does the hardware forwarder do with the packets? Punt them all to the CPU?
>

Some customers want it to punt on TCAM full; some don't. If it's a single
point of failure, some want it to do the best it can; the next customer
says bring it down so I know about it; others say bring it down so
traffic reroutes in a dual-path scenario. There is no "best" answer.

 
> 
> >I'm not sure there is a way to solve the problem you describe without
> >more TCAM space. We don't do any kind of RIB compression.
> 
> Moderate compression will make a few full internet feeds fit into the smaller TCAM.
>

I pushed for that once, years ago, but what we found was that the
code to do it was so complex that it was decided against, given that the
newer hardware forwarding engines could handle it. One can always say that
is a self-serving answer, but after a number of threads about it I myself
realized it's much more complex than it appears on the surface.

Rodney
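For reference, the simplest lossless form of the FIB compression discussed above drops a more-specific prefix whenever its closest covering prefix already points at the same next hop. The Python below is a minimal sketch under that assumption only: it is not how any Cisco platform works (the thread states no RIB compression is done), the function name `compress_fib` is hypothetical, and it recomputes the whole table from scratch rather than handling incremental BGP updates, which is where much of the real complexity lies.

```python
import ipaddress


def compress_fib(fib):
    """Drop prefixes whose closest covering prefix has the same next hop.

    `fib` maps prefix strings (e.g. "10.1.0.0/16") to next hops. The result
    gives the same longest-prefix-match answer for every destination address.
    """
    nets = {ipaddress.ip_network(p): nh for p, nh in fib.items()}
    kept = {}
    for net, nh in nets.items():
        # Every other prefix that fully contains this one.
        covering = [p for p in nets if p != net and net.subnet_of(p)]
        if covering:
            closest = max(covering, key=lambda p: p.prefixlen)
            if nets[closest] == nh:
                continue  # redundant: the covering route already sends traffic there
        kept[str(net)] = nh
    return kept


fib = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "A",     # same next hop as the /8, so it can be dropped
    "10.1.2.0/24": "B",     # different next hop, must stay
    "10.1.2.128/25": "A",   # differs from its closest parent (the /24 -> B), must stay
}
print(compress_fib(fib))
# {'10.0.0.0/8': 'A', '10.1.2.0/24': 'B', '10.1.2.128/25': 'A'}
```

Even this naive version shows why the problem is harder than it looks: a single update to the /8 can change which of its descendants are redundant, so a real implementation would have to track those dependencies incrementally.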

 
> >Rodney
> >
> >
> >On Thu, Jan 22, 2009 at 06:15:13PM +0100, Marcus.Gerdon wrote:
> >> Hi Jose,
> >> Hi Marek,
> >>
> >> I've been facing the same symptom, losing connectivity on a couple of
> >> machines, for quite some time. With a DFZ table the TCAMs are simply
> >> overloaded.
> >>
> >> I've been able to track it down, but for some weeks now Cisco hasn't
> >> been able to provide any solution.
> >>
> >> The problem itself isn't that complex:
> >>
> >> When the FIB is built (at power-up or when routing protocols come up;
> >> 'clear ip route *' also works, no reload required) the forwarding
> >> entries are created in ordered sequence in the TCAM, longest prefixes
> >> first.
> >>
> >> Each packet's destination is first looked up in the TCAM. Only if the
> >> TCAM doesn't provide a next hop is the software FIB queried. As the TCAM
> >> is walked sequentially, the longest match is found first and the next hop
> >> is successfully determined. Only if no TCAM entry is found does it switch
> >> over to the software-only tables.
> >>
> >> Think about a /16 being available at startup and entered into the
> >> TCAM. At some later time a more-specific /24 shows up in the routing
> >> tables. While trying to create a forwarding entry, it is determined that
> >> the TCAM can't hold the additional /24, as the area organized for /24s
> >> at the time of the initial population is full ('sh mls cef masks' and
> >> some investigation show this, and it's even reproducible).
> >>
> >> Due to that only a software entry (you can check with 'sh ip cef') is
> >created, but none in the TCAM (check with 'sh mls cef').
> >>
> >> Now we have a TCAM and software entry for /16 and the overlapping /24
> >only in software.
> >>
> >> When looking up an address within the /24, the TCAM is queried first and
> >> finds the /16 record. As a match is found, software isn't queried at
> >> all.
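The failure can be reproduced in a toy model where each prefix length gets a fixed-size region, loosely following what the 'sh mls cef masks' observation above suggests. The class name, the per-length capacities and the addresses are all made up for illustration. The point is only that a /24 which cannot get a TCAM slot lives in software alone, and the hardware lookup then hits the covering /16 and never falls through to software.

```python
import ipaddress


class MaskRegionTcam:
    """Toy TCAM with a fixed number of slots per prefix length (hypothetical sizes)."""

    def __init__(self, capacity_per_length):
        self.capacity = dict(capacity_per_length)
        self.regions = {length: [] for length in self.capacity}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        region = self.regions.get(net.prefixlen)
        if region is None or len(region) >= self.capacity[net.prefixlen]:
            return False  # region full: no hardware entry is created
        region.append((net, next_hop))
        return True

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        for length in sorted(self.regions, reverse=True):  # longest masks first
            for net, nh in self.regions[length]:
                if addr in net:
                    return nh
        return None  # only a miss here would send the packet to software


def software_lookup(software_fib, dst):
    """Longest-prefix match over the software FIB (dict of prefix -> next hop)."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), nh) for p, nh in software_fib.items()
               if addr in ipaddress.ip_network(p)]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None


software_fib = {}
tcam = MaskRegionTcam({16: 4, 24: 1})


def add_route(prefix, next_hop):
    software_fib[prefix] = next_hop        # the software FIB always gets the route
    return tcam.insert(prefix, next_hop)   # the hardware may not


add_route("10.1.0.0/16", "nh-16")              # present at startup
add_route("192.0.2.0/24", "nh-other")          # takes the only /24 slot
installed = add_route("10.1.5.0/24", "nh-24")  # more-specific arrives later

print(installed)                                  # False
print(software_lookup(software_fib, "10.1.5.9"))  # nh-24 (software view)
print(tcam.lookup("10.1.5.9"))                    # nh-16 (hardware view)
```

The two views disagree only for destinations inside the missing /24; everything else still forwards correctly, which matches the "only some destinations break" symptom reported in the thread.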
> >>
> >> It seems that somewhere the process that inserts a prefix in the middle
> >> of the TCAM, reordering it if needed, is broken. I've tried to work
> >> around it using the CEF consistency checks, but although they work by
> >> and large, a few hundred ms of jitter is produced each time the TCAM is
> >> reordered. I disabled them again soon after, as customers started
> >> complaining about applications disconnecting due to the introduced jitter.
> >>
> >> If someone has an idea or, even better, a solution (or gets a
> >> definitive answer from Cisco; my case has been open for some time now
> >> and the engineer said they are going to try to reproduce this in the
> >> lab), please let me know.
> >>
> >>
> >>
> >>
> >> kind regards,
> >>
> >> Marcus
> >>
> >>
> >> > -----Original Message-----
> >> > From: cisco-nsp-bounces at puck.nether.net
> >> > [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Marek Tyban
> >> > Sent: Thursday, 22 January 2009 15:25
> >> > To: Jose
> >> > Cc: cisco-nsp at puck.nether.net
> >> > Subject: Re: [c-nsp] Hardware limitations on SUP32 with LDP
> >> > and full routing table
> >> >
> >> >
> >> > Hi Jose,
> >> >
> >> > I think that, generally, the SUP32 isn't suitable for today's full
> >> > Internet routing table. It's due to the hardware limitations (as you wrote).
> >> >
> >> > When you have full routes on a SUP32 you should see log output like this:
> >> >
> >> > %MLSCEF-SP-4-FIB_EXCEPTION_THRESHOLD: Hardware CEF entry usage is at 95% capacity for IPv4 unicast protocol.
> >> >
> >> > %CFIB-SP-7-CFIB_EXCEPTION: FIB TCAM exception, Some entries will be software switched
> >> >
> >> > I have seen similar trouble, where some sites/networks weren't
> >> > reachable through SUP720-3B (non-XL) routers even though the routing
> >> > and CEF tables were correct.
> >> >
> >> > Regards,
> >> > Marek
> >> >
> >> > On Wed, 21 Jan 2009, Jose wrote:
> >> >
> >> > > I was wondering if I could get some additional opinions on a case I have
> >> > > open with Cisco. We have recently started turning up LDP on various links
> >> > > out towards some routers that are being converted to act as PEs. The core
> >> > > is all connected together and has been running LDP on those particular
> >> > > links for over 8 months.
> >> > >
> >> > > This past weekend we turned up LDP on a link to one of our remote cities,
> >> > > and we received sporadic complaints that some customers couldn't access
> >> > > any sites/addresses if the path was via one of our P routers. If traffic
> >> > > went through any other path on the network it was fine. Traceroutes to and
> >> > > from this P router were unsuccessful even though the routing table and
> >> > > LFIB all showed the correct information. Turning off LDP across this link
> >> > > resolved the problem for the customers.
> >> > >
> >> > > After we opened the TAC case and did lots of troubleshooting, they showed
> >> > > us this:
> >> > >
> >> > > Without LDP on:
> >> > > frort04#sh ip cef exact-route 172.17.0.254 68.179.73.86
> >> > > 172.17.0.254    -> 68.179.73.86   : Vlan2210 (next hop 67.226.181.110)
> >> > > <<<<<< the next hop is correct
> >> > >
> >> > > With LDP on:
> >> > > frort04#sh mls cef exact-route 172.17.0.254 68.179.73.86
> >> > > Interface: Vl2210, Next Hop: 224.0.0.168, Vlan: 2210, Destination Mac: 00b0.4a5e.7419
> >> > > <<<<<<<< next hop can't be a multicast IP
> >> > >
> >> > > The CEF entry and the MLS CEF entry are different. After consulting the
> >> > > LAN-SW team, they found that this router had an issue with overloaded
> >> > > routes causing the MLS CEF table to become corrupted.
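A mismatch of this kind can be spotted by comparing, per destination, the next hop the software CEF table gives with the one the hardware table gives, which is roughly the idea behind a CEF consistency check. The sketch below assumes the two views have already been collected into Python dicts keyed by destination address; collecting and parsing the real command output is left out, and the function name is hypothetical. It also flags a multicast hardware next hop such as 224.0.0.168, which can never be a valid unicast next hop.

```python
import ipaddress


def find_inconsistencies(software_cef, hardware_cef):
    """Compare per-destination next hops from the software and hardware views.

    Both arguments are dicts mapping a destination address to a next-hop
    address (hypothetical pre-parsed data, not real CLI output).
    """
    problems = []
    for dst, sw_nh in sorted(software_cef.items()):
        hw_nh = hardware_cef.get(dst)
        if hw_nh is None:
            problems.append((dst, sw_nh, None, "missing in hardware"))
        elif hw_nh != sw_nh:
            if ipaddress.ip_address(hw_nh).is_multicast:
                reason = "hardware next hop is multicast"
            else:
                reason = "next hop mismatch"
            problems.append((dst, sw_nh, hw_nh, reason))
    return problems


# Values taken from the exact-route output above.
software = {"68.179.73.86": "67.226.181.110"}
hardware = {"68.179.73.86": "224.0.0.168"}
for problem in find_inconsistencies(software, hardware):
    print(problem)
# ('68.179.73.86', '67.226.181.110', '224.0.0.168', 'hardware next hop is multicast')
```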
> >> > >
> >> > > So basically we were told that because the SUP32 has a hardware limitation
> >> > > of 250K routes that it can CEF-switch in hardware, we were getting
> >> > > corruption in our tables, which in turn corrupted how LDP was building its
> >> > > forwarding table. The core P routers currently hold the entire Internet
> >> > > routing table, so yes, they technically are pretty full in terms of the
> >> > > number of routes they can hold. They want us to reload our router to clear
> >> > > the tables, but they can't guarantee that this problem won't resurface
> >> > > down the road or right away. I'm more curious whether there is some kind
> >> > > of IOS bug we might be hitting, which I'm hoping one of you might know
> >> > > about, but they're supposed to be doing a bug scrub as well.
> >> > >
> >> > > Any thoughts on what we're experiencing? Should we bite the bullet and
> >> > > upgrade to SUP720-3BXLs?
> >> > >
> >> > > Thanks.
> >> > >
> >> > > Jose
> >> > >
> >> > > _______________________________________________
> >> > > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> >> > > https://puck.nether.net/mailman/listinfo/cisco-nsp
> >> > > archive at http://puck.nether.net/pipermail/cisco-nsp/
> >> > >
> >> >

