[c-nsp] MPLS "Tag Control" process - what does this do?

Fri Aug 3 08:44:15 EDT 2007

On 3/08/2007 10:33 PM, Rodney Dunn wrote:
>>> 'sh ip route summ'.
>> 370flinders-r200.32-pe01#show ip route summary
>> IP routing table name is Default-IP-Routing-Table(0)
>> Route Source    Networks    Subnets     Overhead    Memory (bytes)
>> connected       0           58          4228        8816
>> static          1           79          5808        14200
>> eigrp 9176      0           0           0           0
>> bgp 9176        42568       8972        6430852     7838160
>>   External: 51540 Internal: 0 Local: 0
>> ospf 9176       24          573         38336       94824
>>   Intra-area: 8 Inter-area: 5 External-1: 583 External-2: 1
>>   NSSA External-1: 0 NSSA External-2: 0
>> internal        92                                  107824
>> Total           42685       9682        6479224     8063824
>> 370flinders-r200.32-pe01#
> 
> Ok...not that many IGP routes that labels would have to be allocated
> for.
> 
> Now, if you learned the full BGP feed over an MPLS path you would
> have to recurse all those BGP prefixes to the IGP next hop, which
> would put them in the MPLS path. That recursion can sometimes drive
> some of the CEF and Tag Control processes up.
> But if the BGP feed is over an IP path it would not impact the MPLS side.
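A rough Python sketch of the recursion described above. All prefixes, next hops, interface names and labels here are made up for illustration, and none of them come from this network. Each BGP prefix carries a next hop that is resolved by longest match against the IGP table, and the prefix inherits whatever outgoing interface and label that IGP route has. This illustrates the idea only; it is not how CEF actually stores or resolves entries.

```python
import ipaddress

# Hypothetical IGP table: prefix -> (outgoing interface, MPLS label or None for plain IP).
IGP = {
    ipaddress.ip_network("10.0.0.1/32"): ("ATM1/0.1", 17),
    ipaddress.ip_network("10.0.0.0/8"): ("GigabitEthernet0/1", None),
}

# Hypothetical BGP table: prefix -> BGP next hop (not directly connected).
BGP = {
    ipaddress.ip_network("203.0.113.0/24"): ipaddress.ip_address("10.0.0.1"),
    ipaddress.ip_network("198.51.100.0/24"): ipaddress.ip_address("10.9.9.9"),
}


def longest_match(table, addr):
    """Return the most specific prefix in table that contains addr, or None."""
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def resolve(prefix):
    """Recurse a BGP prefix to its IGP next hop and return (interface, label)."""
    next_hop = BGP[prefix]
    igp_route = longest_match(IGP, next_hop)
    if igp_route is None:
        return None  # next hop unreachable: the BGP route would not be usable
    return IGP[igp_route]


for prefix in BGP:
    print(prefix, "->", resolve(prefix))
```

Because every BGP prefix that shares a next hop depends on the same IGP entry, a change to that one IGP route can mean re-resolving many thousands of BGP prefixes, which is one plausible way the recursion ends up costing CPU.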
> 
>> There's a full BGP internet feed coming in, but it's filtered to the 42000
>> routes seen above (I don't know why; it's an ISP network whose management
>> we have recently taken over, so there are still some unanswered
>> questions).  The 7200 has 1G of DRAM.
> 
> Ok..
> 
>> LDP.  There are two routers attached to the far-end peer which are still
>> running TDP, but these are not directly connected to the router I sent the
>> logs from.
>>
>> I am intending to change these over to LDP soon so that the whole network 
>> is just using LDP throughout and not a mixture of the two.
>>
>>> What is the peer and code?
>> It is also a 7200 NPE-G1 with 12.3(19).  There are three core routers - all 
>> the same - linked via ATM in a triangle/meshed MPLS topology.  The other 
>> two seemed to be OK.
> 
> 
> hmm...I wonder if they changed it in LDP and not TDP to make sure a withdraw
> is sent before a new advertisement. But the above message seems to imply
> you got a label and the route was missing.
> 
> Sounds like the routes may possibly have been flapping.
> 'sh ip route' would tell you how old they were.

>>> I looked around a bit to try and understand that message.
>>> Seems it has to do with getting a label for a prefix we don't have.
>> One of the support guys said something about the router dropping its CEF
>> table. Unfortunately I didn't check that at the time; my concern was more
>> about why so much CPU was being burnt up on that one process.
>>
>>
>>> Problem Description:
>>> ====================
>>> Problem: LDP does not withdraw the previous label before announcing a
>>>          new label for the same FEC.
>>>
>>> Solution:
>>> To fix this problem a new config command is introduced:
>>>  [no] mpls ldp neighbor A.B.C.D implicit-withdraw-label
>>>
>>> Default behavior:
>>> It will follow the LDP standard i.e. LDP will withdraw previously
>>> advertised label before advertising a new label for a FEC. When
>>> "mpls ldp neighbor A.B.C.D implicit-withdraw-label" is configured
>>> LDP will not withdraw the previous label before advertising a new 
>>> label.
>>>
>>> The default behavior is changed. Now, when there is a need to change the
>>> label for a FEC:
>>> 1. LDP will send a Label Withdraw and then, after receiving a Label
>>> Release, it will advertise the new binding with a Label Mapping. If
>>> after sending Label Withdraw, no Label Release is received from a peer(s)
>>> and sufficient time (Currently set to 5 minutes) has passed, then LDP
>>> will assume that peer(s) is not capable of sending a label release and it
>>> will send Label Mapping to the peer.
>>>
>>> 2. LDP maintains a list of previous labels for which a Label Release is
>>> awaited from any peer.
>>> A new Label Mapping for a FEC is not announced to a peer if a Label
>>> Release for the same FEC is pending from that peer.
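The two numbered rules above amount to a small per-peer state machine. Below is a minimal Python sketch of that behavior as the bug text describes it; the class and method names are hypothetical and this is not IOS code. The clock is injectable so the 5-minute fallback can be exercised without waiting.

```python
import time

RELEASE_TIMEOUT = 5 * 60  # seconds; the bug text says "currently set to 5 minutes"


class LdpPeerState:
    """Hypothetical model of withdraw-before-advertise towards one LDP peer."""

    def __init__(self, now=time.monotonic):
        self.now = now
        self.advertised = {}  # FEC -> label currently advertised to the peer
        self.pending = {}     # FEC -> (withdrawn label, time the withdraw was sent)
        self.queued = {}      # FEC -> new label held back until release/timeout
        self.sent = []        # log of messages "sent" to the peer

    def change_label(self, fec, new_label):
        old = self.advertised.pop(fec, None)
        if old is None and fec not in self.pending:
            self._map(fec, new_label)  # nothing to withdraw: advertise at once
            return
        if old is not None:
            # Rule 1: withdraw the old label first.
            self.sent.append(("withdraw", fec, old))
            self.pending[fec] = (old, self.now())
        # Rule 2: hold the new mapping while a release is pending.
        self.queued[fec] = new_label

    def on_release(self, fec, label):
        """Peer released a label; send the held mapping if it was the one awaited."""
        if fec in self.pending and self.pending[fec][0] == label:
            del self.pending[fec]
            self._flush(fec)

    def tick(self):
        """Fallback: assume the peer cannot send releases once the timeout passes."""
        for fec, (_, sent_at) in list(self.pending.items()):
            if self.now() - sent_at >= RELEASE_TIMEOUT:
                del self.pending[fec]
                self._flush(fec)

    def _flush(self, fec):
        if fec in self.queued:
            self._map(fec, self.queued.pop(fec))

    def _map(self, fec, label):
        self.advertised[fec] = label
        self.sent.append(("mapping", fec, label))
```

For example, calling `change_label()` twice for the same FEC first sends a mapping and then a withdraw; the second mapping only goes out once `on_release()` is called with the old label, or once `tick()` runs 300 seconds after the withdraw.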
>>>
>>> Those were the changes that went in under:
>>>
>>> CSCdv74248
>>> Externally found enhancement defect: Resolved (R)
>>> LDP session drop after receiving a new label for the same FEC
>>>
>>> that you would have in 12.3(19). But what about the peering router?
>> See above.
>>
>>> Rodney
>> I've also just noticed that the 3550 behind this router, and another 3550
>> connected to that switch via an Ethernet link, are also pretty sick: both
>> nearly ran out of memory today.  Both have logged a lot of messages like:
>>
>>   Aug  3 11:51:39: %FIB-2-FIBDOWN: CEF has been disabled due to a low memory condition. It can be re-enabled by configuring "ip cef [distributed]"
> 
> Oops... that means you are probably low on memory or have a leak. And
> during the convergence event you used enough transient memory to tip it
> over the edge. That's a problem you need to get fixed.

One of the 3550s has 12.2(25)SED on it, the other has 12.1(22)EA8a.

Incidentally, is there a way to limit the number of routes learnt via OSPF? 
Much like the bgp neighbor 'maximum-prefix' command?

>> 'show ip cef' on these switches shows that CEF is now not running.  My
>> thinking is that this is a rather big problem in itself (!) and I will try
>> to get these switches reloaded later tonight.
> 
> Yeah..after you reload look at the free memory. If it's low right after
> a reload you don't have enough memory. If it decreases over time you have
> a leak that needs to be debugged.
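That rule of thumb can be written down as a small check over free-memory readings (for example the Free(b) column of 'show memory summary') taken at intervals after the reload. A minimal sketch: the 10% and 5% thresholds are hypothetical values picked for illustration, not Cisco guidance, and comparing only the first and last reading can be fooled by a single transient dip.

```python
def diagnose(free_samples, total, low_fraction=0.10, min_drop_fraction=0.05):
    """Rough classification of free-memory readings taken after a reload.

    free_samples: free bytes, oldest first; the first reading is taken just
        after the reload.
    total: pool size in bytes.
    low_fraction, min_drop_fraction: hypothetical thresholds for illustration.
    """
    if not free_samples:
        raise ValueError("need at least one sample")
    if free_samples[0] < low_fraction * total:
        # Already low straight after a reload: not enough memory for the load.
        return "not enough memory"
    if free_samples[0] - free_samples[-1] > min_drop_fraction * total:
        # Noticeably less free memory at the end of the window than at the
        # start: suspect a leak (a dip in the last reading also triggers this).
        return "possible leak"
    return "looks stable"
```

In practice you would want many readings over hours or days before calling it a leak.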

It hasn't been reloaded yet, but:

199city-rB2-dcs01#show mem summary
                 Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor     CA17A0    53854304     8625208    45229096       39756     2489720
       I/O   80000000     8388608     3832080     4556528     4551892     4544080

>> The reason I am bringing this up is that I'm wondering whether the problem
>> was caused not directly by this 7200 but by some other external factor.
>> The only possibility that strikes me is that someone or something
>> flooded/redistributed an entire BGP feed into OSPF.
>> Does that sound like a possibility?
> 
> Yep. Been there seen that more than once. :)
> 
> Without snapshots of the routing table it's hard to say.

I know :-(

> If you didn't reboot this 72xx, what does the Lowest column show in 'sh mem stat'?

370flinders-r200.32-pe01#show mem statistics
                 Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor   63739FC0   948723776   167593200   781130576   623354028   278449760
       I/O    C000000    67108864     6060664    61048200    59514824    58316348
370flinders-r200.32-pe01#

> That would show you if you ran it really low at some point.
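The Lowest(b) column is a low-water mark, so comparing it with Total(b) shows how close a pool came to running out. A small Python sketch that does that comparison for output in the column layout posted here (pool name, Head, Total, Used, Free, Lowest, Largest); it assumes exactly that seven-column layout and skips any other line. Applied to the figures in this thread, the 3550's Processor pool bottomed out at 39,756 bytes of 53,854,304 (under 0.1%), while the 7200's Processor pool never dropped below 623,354,028 of 948,723,776 bytes (about 66%).

```python
def low_water_marks(output):
    """Return {pool: (lowest_bytes, lowest_as_percent_of_total)}.

    Assumes the 'show memory statistics' / 'show memory summary' header layout
    shown in this thread: pool, Head, Total(b), Used(b), Free(b), Lowest(b),
    Largest(b). Header and prompt lines do not have seven fields with a
    numeric Total and are skipped.
    """
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        pool, _head, total, _used, _free, lowest, _largest = fields
        total, lowest = int(total), int(lowest)
        result[pool] = (lowest, round(100.0 * lowest / total, 2))
    return result
```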
> 
>> Still doesn't quite answer why the 7200 was chewing so much CPU though
>> <scratches head>.
> 
> Either you got a ton of routes or the routes were churning a lot. Or, most
> likely, you got a slew of label advertisements from the TDP/LDP peers.

Looks like all the OSPF routes are only 7h:50m old.  That's about when it was
all going on.  The external/BGP routes are all relatively old, several days
old in every case.

Looking inside one of the VRFs I can see that most, but not all, routes are of 
about that age too.

Reuben