[c-nsp] slow convergence for full bgp table on a Cisco7613/SUP720-3BXL

Tue Mar 13 14:12:24 EST 2007

We have noticed that peers contributing many bestpath prefixes to the FIB 
take a LONG time to converge after a reset, while peers with few 
bestpaths converge rapidly.  Most of the time is spent in the BGP and CEF 
processes.  I assume this is because the router computes the bestpath 
for what it has learned so far, then does it again, and again, etc.  I 
thought BGP read-only mode was supposed to help, but I can't find much 
documentation on it.

Thanks
Chris

Rodney Dunn wrote:
> Get a sniffer trace when you clear the session.
> 
> It's a very hard problem to debug without extensive work because
> it could be in so many places.
> 
> You could run a debug ip packet against an ACL for the peers to
> match up with the sniffer trace. That would eliminate CoPP if you
> see all the packets in the debug that are in the trace.
> 
> Or span the port going to the RP and compare to the trace (I forgot
> how to do that).
> 
> On Tue, Mar 13, 2007 at 07:41:17PM +0200, Emanuel Popa wrote:
>> We can only clear the BGP session tomorrow morning, when the traffic
>> level is low, about 14 hours from now. We will monitor SPD drops
>> in the morning, but I don't think we are going to notice anything
>> interesting.
>>
>> Regarding TCP stats, do you mean:
>>
>> br01.frankfurt#sh tcp stat
>> Rcvd: 71476208 Total, 2530 no port
>>       385 checksum error, 18 bad offset, 0 too short
>>       44865801 packets (1625121834 bytes) in sequence
>>       1113216 dup packets (38655517 bytes)
>>       982 partially dup packets (341189 bytes)
>>       153829 out-of-order packets (131849235 bytes)
>>       2 packets (1882 bytes) with data after window
>>       145 packets after close
>>       1 window probe packets, 73202 window update packets
>>       3955 dup ack packets, 0 ack packets with unsend data
>>       24945059 ack packets (1360941754 bytes)
>> Sent: 71782281 Total, 1 urgent packets
>>       2023467 control packets (including 1014567 retransmitted)
>>       25824879 data packets (1360984359 bytes)
>>       287631 data packets (19095511 bytes) retransmitted
>>       244 data packets (93857 bytes) fastretransmitted
>>       43188396 ack only packets (38453293 delayed)
>>       7 window probe packets, 457732 window update packets
>> 337116 Connections initiated, 4909 connections accepted, 3852 connections established
>> 342321 Connections closed (including 946 dropped, 336762 embryonic dropped)
>> 1302198 Total rxmt timeout, 0 connections dropped in rxmt timeout
>> 99 Keepalive timeout, 9488 keepalive probe, 0 Connections dropped in keepalive
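To put the counters above in proportion, a small Python sketch can turn a few of them into ratios. The numbers are copied from Emanuel's `sh tcp stat` output; the dictionary keys and the `pct` helper are illustrative names only. Note that these are router-wide TCP totals, not figures for the BGP sessions alone.

```python
# Counters copied from the "sh tcp stat" output on br01.frankfurt above.
# They are router-wide totals since the counters were last cleared,
# not figures for the slow-converging BGP sessions alone.
counters = {
    "data_sent": 25824879,
    "data_retransmitted": 287631,
    "control_sent": 2023467,
    "control_retransmitted": 1014567,
    "connections_initiated": 337116,
    "embryonic_dropped": 336762,
}


def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


ratios = {
    "data retransmitted": pct(counters["data_retransmitted"], counters["data_sent"]),
    "control retransmitted": pct(counters["control_retransmitted"], counters["control_sent"]),
    "embryonic drops vs. initiated": pct(counters["embryonic_dropped"],
                                         counters["connections_initiated"]),
}

for name, value in ratios.items():
    print(f"{name}: {value:.1f}%")
```

With these figures, about half of all control packets (SYN/FIN/RST) were retransmissions, and the embryonic-drop count is almost as large as the number of connections initiated. That is consistent with repeated connection attempts that never completed, but it says nothing specific about the slow-converging sessions; per-session data such as `show ip bgp neighbors` would be needed for that.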
>>
>> Both peers changed everything on their ends: equipment, vendor,
>> interface, etc. One of them changed from Juniper to Cisco, which makes
>> this pretty confusing. It would be a hell of a coincidence if
>> they both had the same problem with their config towards our machine.
>> I'm positive the issue originates on our gear. I just don't
>> know how to deal with it. My colleagues and I have tried everything.
>> Now we are waiting for the case to reach Cisco TAC.
>>
>> Good evening,
>> Emanuel
>>
>>
>> On 3/13/07, Oliver Boehmer (oboehmer) <oboehmer at cisco.com> wrote:
>>> Can you find out whether you actually see any SPD drops when you
>>> converge, or whether those SPD drops were from something else (e.g.
>>> Internet background noise)?
>>> But I don't think this is an input/SPD drop issue; if you had this
>>> problem, you would already have noticed it with 2x1GE.
>>> Can you check the TCP stats on both sides? Did your peer change
>>> anything on his end other than the interface? It's really weird.
>>>
>>>         oli
>>>
>>> Emanuel Popa <mailto:emanuel.popa at gmail.com> wrote on Tuesday, March 13, 2007 6:03 PM:
>>>
>>>> The headroom has the default value.
>>>>
>>>> br01.frankfurt#sh ip spd
>>>> Current mode: normal.
>>>> Queue min/max thresholds: 73/74, Headroom: 1000, Extended Headroom: 10
>>>> IP normal queue: 1, priority queue: 0.
>>>> SPD special drop mode: none
>>>>
>>>> Please tell me: in what scenario would your commands help with my
>>>> issue?
>>>>
>>>> regards,
>>>> emanuel
>>>>
>>>> On 3/13/07, Oliver Boehmer (oboehmer) <oboehmer at cisco.com> wrote:
>>>>> Emanuel Popa <> wrote on Tuesday, March 13, 2007 3:33 PM:
>>>>>
>>>>>> Ytti,
>>>>>>
>>>>>> Here is the output:
>>>>>> br01.frankfurt#sh int te 10/3 | i Input queue
>>>>>>   Input queue: 0/75/109/109 (size/max/drops/flushes); Total output drops: 0
>>>>>>
>>>>>> But:
>>>>>>
>>>>>> - routing protocol packets are not dropped when the default hold queue
>>>>>> of 75 is full; they are treated as priority packets and are only
>>>>>> dropped once the headroom of 1000 is full; please see
>>>>>>
>>>>>> http://www.cisco.com/en/US/products/hw/routers/ps167/products_tech_note09186a008012fb87.shtml
>>>>>> for more details
>>>>>>
>>>>> How's your headroom? What does "show ip spd" tell you?
>>>>>
>>>>> ip spd queue max-threshold 999
>>>>> ip spd queue min-threshold 998
>>>>>
>>>>> might help.
>>>>>
>>>>>         oli
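The SPD behaviour Emanuel describes (normal packets limited by the thresholds and hold queue, routing-protocol packets allowed into the headroom) can be sketched as a per-packet decision. This is a simplified Python model, not IOS code: the defaults mirror the `sh ip spd` and `sh int` output in this thread, `spd_verdict` and `Verdict` are hypothetical names, and the rule that priority packets may occupy `hold_queue + headroom` slots is an assumption based on the description above.

```python
from enum import Enum


class Verdict(Enum):
    ENQUEUE = "enqueue"
    RANDOM_DROP = "eligible for random drop"
    DROP = "drop"


def spd_verdict(queue_len: int, is_priority: bool,
                min_threshold: int = 73, max_threshold: int = 74,
                hold_queue: int = 75, headroom: int = 1000) -> Verdict:
    """Simplified SPD admission decision for one arriving packet.

    queue_len is the current input queue depth when the packet arrives.
    Assumption: priority packets (e.g. BGP) bypass the thresholds and
    are dropped only when hold_queue + headroom slots are in use.
    """
    if is_priority:
        if queue_len < hold_queue + headroom:
            return Verdict.ENQUEUE
        return Verdict.DROP
    # Normal packets: treat the hold queue as a hard limit.
    if queue_len >= hold_queue:
        return Verdict.DROP
    if queue_len < min_threshold:
        return Verdict.ENQUEUE
    if queue_len < max_threshold:
        return Verdict.RANDOM_DROP
    return Verdict.DROP


if __name__ == "__main__":
    for depth in (10, 73, 74, 80, 1074, 1075):
        print(depth, spd_verdict(depth, False).value, spd_verdict(depth, True).value)
```

The model only shows where each packet class starts being dropped; it does not reproduce the drop probability IOS applies between the two thresholds.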
>> _______________________________________________
>> cisco-nsp mailing list  cisco-nsp at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/

-- 
Chris Griffin                           cgriffin at ufl.edu
Sr. Network Engineer - CCNP             Phone: (352) 392-2061
CNS - Network Services                  Fax:   (352) 392-9440
University of Florida/FLR               Gainesville, FL 32611