[c-nsp] slow convergence for full bgp table on a Cisco 7613/SUP720-3BXL

Emanuel Popa emanuel.popa at gmail.com
Wed Mar 14 04:25:58 EST 2007


Hi everybody,

Thanks for your feedback on this issue. It appears we have found a
solution to our problem: the TCP session was indeed being throttled, and
removing the CEF receive rate-limiter did it for us:

"no mls rate-limit unicast cef receive 1500"

We installed this rate-limiter about a year ago, when we set up
control-plane policing on our Cisco 7600 gear, because we often had
problems with IGP and EGP sessions flapping. We are also using TTL-failure
rate-limiting.
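
For the archive, the hardware rate-limiters involved looked roughly like
this (the TTL-failure values below are illustrative, not our exact config):

  ! limits punted "receive" traffic (packets addressed to the router
  ! itself, which includes the BGP TCP sessions) to 1500 pps
  mls rate-limit unicast cef receive 1500

  ! rate-limit TTL-expired punts (illustrative values)
  mls rate-limit all ttl-failure 500 50

  ! the fix: stop throttling receive-adjacency traffic
  no mls rate-limit unicast cef receive 1500

"show mls rate-limit" confirms which hardware rate-limiters remain active
afterwards.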

Thanks again,
Emanuel


On 3/13/07, lee.e.rian at census.gov <lee.e.rian at census.gov> wrote:
> Are you sure the 10gig port is working correctly?  Have you tried a
> different port on the card, or moved the card to a different slot in the
> chassis?
>
> I've had at least two TAC cases where the switch was dropping packets and
> the interface counters showed hardly any errors.  The explanation I got
> from TAC was that only errors attributable to a specific port increment
> the port error counters.  Errors that happened elsewhere - like going
> between ASICs or to the backplane - couldn't be attributed to any specific
> port, so they were basically ignored.
>
> A 7600 is about the same as a 6500 - right?  If so, and your 10gig port is
> on a WS-X6704-10GE card, try doing:
>
>   remote command switch show platform hardware asicreg titan slot 10 port 3 error
>   remote command switch show platform hardware asicreg super slot 10 port 3 error
>
> (slot 10, port 3 here matches your Te10/3 interface - adjust if yours
> differs.)
>
> If anything comes back with a non-zero value, include the output from the
> command in your TAC case.
>
> Regards,
> Lee
>
>
> "Church, Charles" <cchurch at multimax.com> wrote on 03/13/2007 03:12:23 PM:
>
> > Are you sure the problem isn't on the other end?  You've sent about 2
> > million control packets, and roughly half of those (1,014,567 of
> > 2,023,467 in your stats below) have been retransmits.  Almost all of the
> > ACKs you're sending are delayed, too (38,453,293 of 43,188,396).  It sure
> > seems like TCP is the issue.  Are these 'bad' counters still
> > incrementing?  Is the other side doing any policing?
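> >
> > If the far end is a Cisco too, something along these lines would show a
> > policer or CoPP policy eating control-plane traffic (just a sketch - the
> > exact commands depend on their platform):
> >
> >   show policy-map control-plane input
> >   show mls rate-limit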
> >
> > Chuck
> >
> > -----Original Message-----
> > From: cisco-nsp-bounces at puck.nether.net
> > [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Emanuel Popa
> > Sent: Tuesday, March 13, 2007 1:41 PM
> > To: Oliver Boehmer (oboehmer)
> > Cc: cisco-nsp at puck.nether.net
> > Subject: Re: [c-nsp] slow convergence for full bgp table on a Cisco
> > 7613/SUP720-3BXL
> >
> > We can clear the BGP session only tomorrow morning, when the traffic
> > level is pretty low - that's 14 hours from now. We will monitor SPD
> > drops in the morning, but I don't think we are going to notice anything
> > interesting.
> >
> > Regarding TCP stats, do you mean:
> >
> > br01.frankfurt#sh tcp stat
> > Rcvd: 71476208 Total, 2530 no port
> >       385 checksum error, 18 bad offset, 0 too short
> >       44865801 packets (1625121834 bytes) in sequence
> >       1113216 dup packets (38655517 bytes)
> >       982 partially dup packets (341189 bytes)
> >       153829 out-of-order packets (131849235 bytes)
> >       2 packets (1882 bytes) with data after window
> >       145 packets after close
> >       1 window probe packets, 73202 window update packets
> >       3955 dup ack packets, 0 ack packets with unsend data
> >       24945059 ack packets (1360941754 bytes)
> > Sent: 71782281 Total, 1 urgent packets
> >       2023467 control packets (including 1014567 retransmitted)
> >       25824879 data packets (1360984359 bytes)
> >       287631 data packets (19095511 bytes) retransmitted
> >       244 data packets (93857 bytes) fastretransmitted
> >       43188396 ack only packets (38453293 delayed)
> >       7 window probe packets, 457732 window update packets
> > 337116 Connections initiated, 4909 connections accepted, 3852 connections established
> > 342321 Connections closed (including 946 dropped, 336762 embryonic dropped)
> > 1302198 Total rxmt timeout, 0 connections dropped in rxmt timeout
> > 99 Keepalive timeout, 9488 keepalive probe, 0 Connections dropped in keepalive
> >
> > Both peers changed everything on their ends: equipment, vendor,
> > interface, etc. One of them even changed from Juniper to Cisco, which
> > makes this pretty confusing. It would be a hell of a coincidence if they
> > both had the same problem with the config towards our machine.
> > I'm positive that the issue is generated on our gear; I just don't know
> > how to deal with it. My colleagues and I have tried everything.
> > Now we are waiting for the case to reach Cisco TAC.
> >
> > Good evening,
> > Emanuel
> >
> >
> > On 3/13/07, Oliver Boehmer (oboehmer) <oboehmer at cisco.com> wrote:
> > > Can you find out if you indeed see any SPD drops when you converge, or
> > > if those SPD drops were from something else (e.g. Internet background
> > > noise or something like this)?
> > > But I don't think this is an input/SPD drop issue; if you had this
> > > problem, you would have noticed it with 2x1GE already.
> > > Can you check the TCP stats at both sides? Did your peer change
> > > something on his end except the interface? It's really weird.
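> > >
> > > To catch it in the act, something like this during the next convergence
> > > should do (the "flushes" counter in the input queue line counts SPD
> > > flushes):
> > >
> > >   show ip spd
> > >   show interface te10/3 | include Input queue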
> > >
> > >         oli
> > >
> > > Emanuel Popa <mailto:emanuel.popa at gmail.com> wrote on Tuesday,
> > > March 13, 2007 6:03 PM:
> > >
> > > > The headroom has the default value.
> > > >
> > > > br01.frankfurt#sh ip spd
> > > > Current mode: normal.
> > > > Queue min/max thresholds: 73/74, Headroom: 1000, Extended Headroom: 10
> > > > IP normal queue: 1, priority queue: 0.
> > > > SPD special drop mode: none
> > > >
> > > > Please tell me: in what scenario would your commands help with my
> > > > issue?
> > > >
> > > > regards,
> > > > emanuel
> > > >
> > > > On 3/13/07, Oliver Boehmer (oboehmer) <oboehmer at cisco.com> wrote:
> > > >> Emanuel Popa <> wrote on Tuesday, March 13, 2007 3:33 PM:
> > > >>
> > > >>> Ytti,
> > > >>>
> > > >>> Here is the output:
> > > >>> br01.frankfurt#sh int te 10/3 | i Input queue
> > > >>>   Input queue: 0/75/109/109 (size/max/drops/flushes); Total output drops: 0
> > > >>>
> > > >>> But:
> > > >>>
> > > >>> - routing protocol packets are not dropped when the default hold
> > > >>> queue of 75 is full; they are considered priority packets and are
> > > >>> only dropped once the headroom of 1000 is full; please see
> > > >>> http://www.cisco.com/en/US/products/hw/routers/ps167/products_tech_note09186a008012fb87.shtml
> > > >>> for more details
> > > >>>
> > > >>
> > > >> How's your headroom? What does "show ip spd" tell you?
> > > >>
> > > >> ip spd queue max-threshold 999
> > > >> ip spd queue min-threshold 998
> > > >>
> > > >> might help..
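> > > >>
> > > >> (these raise the queue depth at which SPD begins its random early
> > > >> drops of normal-priority packets, so the input queue can fill much
> > > >> further before anything is dropped - a sketch, assuming input/SPD
> > > >> drops are really the problem)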
> > > >>
> > > >>         oli
> > >
> > _______________________________________________
> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/

