[c-nsp] "Keepalives are temporarily in throttle due to closed TCP window" ASR1002 - IOS-XE

Florin Vlad Olariu florinvlad.olariu at gmail.com
Fri Nov 2 08:45:38 EDT 2018


Hi,

I've recently had a very odd problem. We have two different POPs which are connected at L2 via Equinix Cross Connect (ECX). The two boxes, both ASR1002-X running 15.6(1)S1, have an iBGP session established between their loopback interfaces. The routers have many ways to reach each other indirectly and 2 ways to reach each other directly via the ECX.

So at the moment, from each router's perspective, there are 2 equal-cost paths to the other's loopback, so the iBGP session is going to establish over one of the links, depending on how the router picks the link.

The session between these two peers started flapping continuously, at intervals equal to the hold timer (15s). In the output of "show ip bgp neighbor <ip>" I saw the "Keepalives are temporarily in throttle due to closed TCP window" message. Googling this suggested an MTU-related issue (which is weird: this session has been in place for some time, so why would an MTU problem show up now?). Just to check, I pinged from both ends with the DF bit set and with packets of 7000+ bytes (we have jumbos), and there were no problems getting packets through.

When doing a more prolonged ping, though, we did notice several packets lost at sporadic intervals. The packet size had no bearing on whether we experienced packet loss, so I am assuming something is wrong on the circuit itself. The interesting discovery is that the BGP session was going via this specific "bad" link. Once we took that link out of the IGP, the session established via another link with no issues whatsoever.

I couldn't remember whether TCP would close the window when it experiences packet loss, and from my research it seems that the TCP (receive) window size is not affected by packet loss; it's just an indication of how much buffer space is free at the receiver end. The congestion window (cwnd) is what actually changes when packets are lost. Considering I lost in general 1 packet out of 10-15 or so, let's say that the cwnd never got that big; it still wouldn't explain the message I saw on the router.
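
To make the distinction concrete, here is a minimal Python sketch of the textbook model, in which the amount a sender may have in flight is min(cwnd, rwnd). It is a simplified illustration and not how IOS-XE implements TCP: the function names, the 1460-byte MSS, the 65535-byte receive window and the "one loss every N rounds" pattern are all assumptions made up for the example. It only shows that, in this model, loss shrinks the congestion window while the receiver's advertised window stays where it was.

```python
def effective_window(cwnd: int, rwnd: int) -> int:
    """Bytes the sender may have in flight: limited by both windows."""
    return min(cwnd, rwnd)


def simulate_cwnd(loss_every: int, rounds: int,
                  mss: int = 1460, rwnd: int = 65535):
    """Toy slow-start / congestion-avoidance loop (hypothetical helper).

    Every `loss_every`-th round is treated as a timeout-style loss.
    `rwnd` is never modified here: in this model it depends only on
    how fast the receiving application drains its buffer.
    """
    cwnd = mss
    ssthresh = rwnd
    history = []
    for rnd in range(1, rounds + 1):
        if rnd % loss_every == 0:
            # Loss: halve the threshold and restart from one segment.
            ssthresh = max(cwnd // 2, 2 * mss)
            cwnd = mss
        elif cwnd < ssthresh:
            # Slow start: double per round, capped at the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by one segment per round.
            cwnd += mss
        history.append((cwnd, rwnd, effective_window(cwnd, rwnd)))
    return history


if __name__ == "__main__":
    for cwnd, rwnd, eff in simulate_cwnd(loss_every=12, rounds=36):
        print(f"cwnd={cwnd:6d}  rwnd={rwnd:6d}  usable={eff:6d}")
```

In this toy model the usable window drops to a single segment after each loss but never reaches zero, and the advertised window does not move at all, which is exactly why the "closed TCP window" wording puzzles me.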

Why would the window size be 0 due to packet loss? I don't get it. There's either a problem with that message or with my understanding of how everything works. Again, MTU is not at fault as I have tested this.

Any info or insight you might be able to provide would be deeply appreciated.

Thanks,

Vlad.

