[c-nsp] MPLS "Tag Control" process - what does this do?
Reuben Farrelly
reuben-cisco-nsp at reub.net
Fri Aug 3 08:44:15 EDT 2007
On 3/08/2007 10:33 PM, Rodney Dunn wrote:
>>> 'sh ip route summ'.
>> 370flinders-r200.32-pe01#show ip route summary
>> IP routing table name is Default-IP-Routing-Table(0)
>> Route Source    Networks    Subnets     Overhead    Memory (bytes)
>> connected       0           58          4228        8816
>> static          1           79          5808        14200
>> eigrp 9176      0           0           0           0
>> bgp 9176        42568       8972        6430852     7838160
>>   External: 51540 Internal: 0 Local: 0
>> ospf 9176       24          573         38336       94824
>>   Intra-area: 8 Inter-area: 5 External-1: 583 External-2: 1
>>   NSSA External-1: 0 NSSA External-2: 0
>> internal        92                                  107824
>> Total           42685       9682        6479224     8063824
>> 370flinders-r200.32-pe01#
>
> Ok...not that many IGP routes that labels would have to be allocated
> for.
>
> Now, if you learned the full BGP feed over an MPLS path you would
> have to recurse all those BGP prefixes to the IGP next hop which
> would put it in the MPLS path. That recursion can sometimes drive
> some of the CEF and Tag Control processes up.
> But if the BGP feed is over an IP path it would not impact the MPLS side.
>
>> There's a full BGP internet feed coming in but it's filtered to the 42000
>> routes as seen above (I don't know why, it's an ISP network we have
>> recently taken over management of so there are still some unanswered
>> questions). The 7200 has 1G of DRAM.
>
> Ok..
>
>> LDP. There are two routers attached to the far end peer which are running
>> TDP still but these are not directly connected to this router I sent the
>> logs from.
>>
>> I am intending to change these over to LDP soon so that the whole network
>> is just using LDP throughout and not a mixture of the two.
>>
>>> What is the peer and code?
>> It is also a 7200 NPE-G1 with 12.3(19). There are three core routers - all
>> the same - linked via ATM in a triangle/meshed MPLS topology. The other
>> two seemed to be OK.
>
>
> hmm...I wonder if they changed it in LDP and not TDP to make sure a withdraw
> is sent before a new advertisement. But the above message seems to imply
> you got a label and the route was missing.
>
> Sounds like the routes may possibly have been flapping.
> 'sh ip route' would tell you how old they were.
>>> I looked around a bit to try and understand that message.
>>> Seems it has to do with getting a label for a prefix we don't have.
>> One of the support guys said something about the router dropping its CEF
>> table. Unfortunately I didn't check that at the time, my concern was more
>> on why so much CPU was being burnt up on that one process.
>>
>>
>>> Problem Description:
>>> ====================
>>> Problem: LDP does not withdraw label before announcing a new label for
>>> the same FEC.
>>>
>>> Solution:
>>> To fix this problem a new config command is introduced:
>>> [no] mpls ldp neighbor A.B.C.D implicit-withdraw-label
>>>
>>> default Behavior:
>>> It will follow the LDP standard i.e. LDP will withdraw previously
>>> advertised label before advertising a new label for a FEC. When
>>> "mpls ldp neighbor A.B.C.D implicit-withdraw-label" is configured
>>> LDP will not withdraw the previous label before advertising a new
>>> label.
>>>
>>> default behavior is changed. Now when there is a need to change
>>> label for a FEC:
>>> 1. LDP will send a Label withdraw and then after receiving Label
>>> Release, it will advertise the new binding with a Label Mapping. If
>>> after sending Label Withdraw, no Label Release is received from a peer(s)
>>> and sufficient time (Currently set to 5 minutes) has passed, then LDP
>>> will assume that peer(s) is not capable of sending a label release and it
>>> will send Label Mapping to the peer.
>>>
>>> 2. LDP maintains a list of previous labels for which a Label Release is
>>> awaited from any peer.
>>> A new Label Mapping for a FEC is not announced to a peer if a Label
>>> release for the same FEC is pending from the peer.
>>>
>>> Those were the changes that went in under:
>>>
>>> CSCdv74248
>>> Externally found enhancement defect: Resolved (R)
>>> LDP session drop after receiving a new label for the same FEC
>>>
>>> that you would have in 12.3(19). But what about the peering router?
>> See above.
>>
>>> Rodney
>> I've also just noticed that the 3550 behind this router, and another one
>> connected to that switch via an ethernet link, are both pretty sick,
>> and both nearly ran out of memory today. Both have logged a lot of
>> messages about "Aug 3 11:51:39: %FIB-2-FIBDOWN: CEF has been disabled due
>> to a low memory condition. It can be re-enabled by configuring "ip cef
>> [distributed]""
>
> Oops....that means you probably are low or have a leak. And during the
> convergence event you used enough transient memory to tip it over the
> edge. That's a problem you need to get fixed.
One of the 3550s has 12.2(25)SED on it, the other has 12.1(22)EA8a.
Incidentally, is there a way to limit the number of routes learnt via OSPF?
Much like the bgp neighbor 'maximum-prefix' command?
>> 'show ip cef' on these switches shows that CEF is now not running. My
>> thinking is that this is a rather big problem in itself (!) and will try
>> to get these switches reloaded later tonight.
>
> Yeah..after you reload look at the free memory. If it's low right after
> a reload you don't have enough memory. If it decreases over time you have
> a leak that needs to be debugged.
It hasn't been reloaded yet, but:
199city-rB2-dcs01#show mem summary
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor     CA17A0    53854304     8625208    45229096       39756     2489720
      I/O   80000000     8388608     3832080     4556528     4551892     4544080
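
To make the Lowest(b) column easier to read at a glance, here is a rough sketch that parses the two pool rows above and expresses the low-water mark as a percentage of the pool. The function names and the 5% threshold are my own assumptions for illustration, not anything IOS defines.

```python
def parse_mem_row(line):
    """Parse one pool row: name, head address, then five byte counters."""
    fields = line.split()
    total, used, free, lowest, largest = (int(x) for x in fields[2:7])
    return {
        "pool": fields[0],
        "head": fields[1],
        "total": total,
        "used": used,
        "free": free,
        "lowest": lowest,    # least free memory seen since boot
        "largest": largest,  # largest contiguous free block right now
    }


def low_water_pct(row):
    """Low-water mark as a percentage of the pool size."""
    return 100.0 * row["lowest"] / row["total"]


def ran_low(row, threshold_pct=5.0):
    """True if free memory ever dropped below threshold_pct of the pool.

    The 5% default is an arbitrary, hypothetical cut-off.
    """
    return low_water_pct(row) < threshold_pct


if __name__ == "__main__":
    rows = [
        "Processor CA17A0 53854304 8625208 45229096 39756 2489720",
        "I/O 80000000 8388608 3832080 4556528 4551892 4544080",
    ]
    for raw in rows:
        row = parse_mem_row(raw)
        print(row["pool"], "ran low:", ran_low(row))
```

For the Processor pool on this switch the low-water mark works out to well under 0.1% of the pool, even though most of it is free right now, so it did come very close to running out at some point since boot.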
>> The reason I am bringing this up is that I'm considering if the problem may
>> not have been directly caused by this 7200, but may be caused by some other
>> external factor. The only thing which strikes me as a possibility is that
>> someone or something flooded/redistributed an entire BGP feed into OSPF.
>> Does that sound like a possibility?
>
> Yep. Been there seen that more than once. :)
>
> Without snapshots of the routing table it's hard to say.
I know :-(
> If you didn't reboot this 72xx, what does the lowest show in 'sh mem stat'?
370flinders-r200.32-pe01#show mem statistics
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor   63739FC0   948723776   167593200   781130576   623354028   278449760
      I/O    C000000    67108864     6060664    61048200    59514824    58316348
370flinders-r200.32-pe01#
> That would show you if you ran it really low at some point.
>
>> Still doesn't answer quite why the 7200 was chewing so much cpu though
>> <scratches head>.
>
> Got a ton of routes or the routes were churning a lot. Or you got a slew
> of label advertisements from the tdp/ldp peers most likely.
Looks like all the OSPF routes are only 7h:50m old. That's about the time it
was all going on. All the exterior/BGP routes are relatively old, all some days
old.
Looking inside one of the VRFs I can see that most, but not all, routes are of
about that age too.
Reuben