[c-nsp] "Keepalives are temporarily in throttle due to closed TCP window" ASR1002 - IOS-XE

Florin Florian florin.florian at gmail.com
Tue Nov 6 16:02:37 EST 2018


Hi Florin

Use the "show ip bgp neighbors | include max data" command to display the
MSS in use towards your BGP peers.

I suspect the max data segment is 536 bytes, which is the IOS default when
path MTU discovery is not enabled.
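
For example, something like this (the neighbor address is just a
placeholder, and the exact wording of the output line may differ slightly
between releases):

  router#show ip bgp neighbors 192.0.2.2 | include max data
  Datagrams (max data segment is 536 bytes):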

The disadvantage is that smaller segments increase the share of bandwidth
spent on TCP/IP header overhead.

Since BGP builds a TCP connection to every peer, a 536-byte MSS also hurts
BGP convergence times.
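
As a rough back-of-the-envelope illustration (the 20 MB of UPDATE data is
an arbitrary figure, not from your setup):

  20,000,000 bytes / 536-byte MSS  ~= 37,300 TCP segments
  20,000,000 bytes / 1460-byte MSS ~= 13,700 TCP segments

Every extra segment carries its own headers and has to be acknowledged, so
the small MSS nearly triples the per-packet work for the same amount of
routing data.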

The solution is to enable Path MTU Discovery (PMTUD), using the
ip tcp path-mtu-discovery command. This feature dynamically determines how
large the MSS can be without creating packets that need to be fragmented:
PMTUD lets TCP discover the smallest MTU along the path of a TCP session.
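
A minimal configuration sketch (the AS number and neighbor address are
made up; on recent IOS-XE the per-neighbor option should already be on by
default, so it is mostly a matter of verifying it has not been disabled):

  router(config)#ip tcp path-mtu-discovery
  router(config)#router bgp 65000
  router(config-router)#neighbor 192.0.2.2 transport path-mtu-discovery

Keep in mind the MSS is negotiated when the TCP session is built, so the
BGP session has to be cleared before a larger value takes effect.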

The increase in MSS from 536 to 1460 or 4430 bytes reduces TCP/IP
overhead, which helps BGP converge faster.

You can also configure BFD, which can detect a failure on the access link
in milliseconds, much faster than the BGP hold timer.
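
A minimal BFD sketch, assuming OSPF is the IGP carrying the loopbacks and
using made-up interface names, timers and process numbers (pick intervals
the platform can comfortably sustain):

  router(config)#interface TenGigabitEthernet0/0/0
  router(config-if)#bfd interval 300 min_rx 300 multiplier 3
  router(config-if)#exit
  router(config)#router ospf 1
  router(config-router)#bfd all-interfaces

Since the iBGP session rides on loopbacks learned from the IGP, fast IGP
failover already moves the session off a dying link quickly; BGP itself
also supports "neighbor <ip> fall-over bfd", which for loopback peering
needs the multi-hop variant.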

Regards

On Fri, Nov 2, 2018 at 8:46 AM Florin Vlad Olariu <
florinvlad.olariu at gmail.com> wrote:

> Hi,
>
> I've recently had a very odd problem. We have two different POPs which are
> connected via an Equinix Cross Connect at L2. These two boxes, two ASR1002-X
> routers running 15.6(1)S1, have an iBGP session established between their
> loopback interfaces. The routers have many ways to reach each other
> indirectly and two ways to reach each other directly via the ECX.
>
> So at the moment, from each router's perspective, there are two equal-cost
> paths to the other's loopback, so the iBGP session will establish over one
> of the links, depending on which link the router picks.
>
> The session between these two peers started flapping continuously, at an
> interval as long as the hold timer (15s). In the output of "show ip bgp
> neighbor <ip>" I saw the "Keepalives are temporarily in throttle due to
> closed TCP window" message. Googling suggested this is an MTU-related issue
> (which is weird, since this session has been in place for some time now, so
> why an MTU problem now?). Just to check, I pinged from both ends with the
> DF bit set and with packets of 7000+ bytes (we have jumbos) and there were
> no problems getting packets through.
>
> When doing a more prolonged ping, though, we did notice several packets
> lost at sporadic intervals. The MTU size had no bearing on whether we
> experienced packet loss, so I am assuming something is wrong with the
> circuit itself. The interesting discovery, though, is that the BGP session
> was going via this specific "bad" link. Once we took that link out of the
> IGP, the session established via another link with no issues whatsoever.
>
> I couldn't remember whether TCP would close the window when it experiences
> packet loss, and from my research it seems that the TCP window size is not
> affected by packet loss; it's just an indication of the size of the buffer
> at the receiving end. The congestion window is what actually shrinks in
> cases of packet loss. Considering I lost roughly 1 packet out of every
> 10-15, let's say the congestion window never got that big; it still
> wouldn't explain the message I saw on the router.
>
> Why would the window size be 0 due to packet loss? I don't get it. There's
> either a problem with that message or with my understanding of how
> everything works. Again, MTU is not at fault as I have tested this.
>
> Any info or insight you might be able to provide would be deeply
> appreciated.
>
> Thanks,
>
> Vlad.

