[c-nsp] Random BGP Drops

Fri Jul 24 09:33:51 EDT 2015

I checked this and the MSS matches on both sides:

Juniper side:
   sndsbcc:          0 sndsbmbcnt:          0  sndsbmbmax:     262144
sndsblowat:       2048 sndsbhiwat:      32768
   rcvsbcc:          0 rcvsbmbcnt:          0  rcvsbmbmax:     262144
rcvsblowat:          1 rcvsbhiwat:      32768
   proc id:       3283  proc name:        rpd
       iss: 1163062337      sndup: 1163062397
    snduna: 1163097242     sndnxt: 1163097242      sndwnd:      15130
    sndmax: 1163097242    sndcwnd:      65535 sndssthresh: 1073725440
       irs: 3033053077      rcvup: 3033087402
    rcvnxt: 3033087402     rcvadv: 3033069519      rcvwnd:      16384
       rtt:          0       srtt:          0        rttv:      12000
    rxtcur:       3000   rxtshift:          0       rtseq:          0
    rttmin:       1000  mss:       1460
     flags: ACKNOW [0x1]

Cisco Side:

Enqueued packets for retransmit: 0, input: 0  mis-ordered: 0 (0 bytes)

Event Timers (current time is 0xEAB00ACB8):
Timer          Starts    Wakeups            Next
Retrans          1813         10             0x0
TimeWait            0          0             0x0
AckHold          1821       1788             0x0
SendWnd             0          0             0x0
KeepAlive           0          0             0x0
GiveUp              0          0             0x0
PmtuAger       156412     156411     0xEAB00ADCB
DeadWait            0          0             0x0

iss: 3033053077  snduna: 3033087421  sndnxt: 3033087421     sndwnd:  16384
irs: 1163062337  rcvnxt: 1163097261  rcvwnd:      15111  delrcvwnd:   1273

SRTT: 300 ms, RTTO: 303 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 8700 ms, ACK hold: 200 ms
Flags: higher precedence, nagle, path mtu capable

Datagrams (max data segment is 1460 bytes):
Rcvd: 3611 (out of order: 0), with data: 1821, total data bytes: 34923
Sent: 3616 (retransmit: 10), with data: 1803, total data bytes: 34343
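
As a sanity check on the two dumps above, the sequence counters can be turned
into byte counts. This is only a rough sketch: payload_bytes is a helper name
made up for illustration, and it assumes the SYN is the only thing besides
data that has consumed sequence space so far. The numbers are copied from the
Juniper and Cisco output above.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def payload_bytes(iss: int, seq_pointer: int) -> int:
    """Bytes of data between the initial sequence number and a pointer.

    The SYN consumes one sequence number, so it is subtracted
    (assumption: no FIN has been sent on this connection yet).
    """
    return (seq_pointer - iss - 1) % SEQ_SPACE


# Cisco view: iss/snduna for its own stream, irs/rcvnxt for the peer's stream
cisco_sent = payload_bytes(3033053077, 3033087421)
cisco_rcvd = payload_bytes(1163062337, 1163097261)

# Juniper view: iss/snduna for its own stream, irs/rcvnxt for the peer's stream
juniper_sent = payload_bytes(1163062337, 1163097242)
juniper_rcvd = payload_bytes(3033053077, 3033087402)

print(cisco_sent, cisco_rcvd)      # 34343 34923
print(juniper_sent, juniper_rcvd)  # 34904 34324

# Gap between the two snapshots, per direction
print(cisco_rcvd - juniper_sent, cisco_sent - juniper_rcvd)  # 19 19

# Average payload per data segment sent by the Cisco (1803 segments with data)
print(round(cisco_sent / 1803, 2))  # 19.05
```

The Cisco totals line up with its own "total data bytes" counters, and the two
snapshots differ by 19 bytes in each direction, which is the size of one BGP
KEEPALIVE. The average payload is also about 19 bytes per segment, so this
session has probably carried little besides keepalives so far, and nothing
close to a full 1460-byte segment.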

Another thing: path MTU discovery is enabled, so TCP should end up using the
correct MSS. Am I wrong?
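
For reference, the 1460 figure in both dumps is what a 1500-byte IPv4 MTU
yields once the headers are subtracted. The sketch below shows that arithmetic
and a simple check of whether a full-size segment would fit a smaller path
MTU. The function names and the 1492-byte example path are illustrative
assumptions, not values taken from these routers, and the header sizes assume
IPv4 with no IP or TCP options.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits_path(mss: int, path_mtu: int) -> bool:
    """True if a full-size segment crosses the path without fragmentation."""
    return mss + IPV4_HEADER + TCP_HEADER <= path_mtu


print(mss_for_mtu(1500))      # 1460, matches both dumps
print(fits_path(1460, 1500))  # True
print(fits_path(1460, 1492))  # False (hypothetical smaller hop in the path)
```

Each side advertises its MSS in the SYN based on its own MTU. Path MTU
discovery only lowers the segment size afterwards, and only if the ICMP
"fragmentation needed" messages make it back to the sender, so matching MSS
values on both ends do not by themselves rule out a smaller MTU in between.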

Kind regards,

Catalin Dominte
Senior Network Consultant
+44(0)1628302007
Nocsult Ltd
www.nocsult.net

On Fri, Jul 24, 2015 at 1:56 PM, Mark Tinka <mark.tinka at seacom.mu> wrote:

>
>
> On 24/Jul/15 14:48, Catalin Dominte wrote:
>
> Hi Mark,
>
>  Thanks for getting back to me.
>
>  This affects only a handful of customers and a handful of LINX peers.
> They have always been stable; I have not had any issues with them.
>
>  As far as I know they have not changed much in terms of hardware, but they
> could have changed their software configuration. I can control what
> advertisements I receive from a customer and BGP policies, but I don't have
> a lot of visibility into what the customers are doing during their normal
> day to day operations.
>
>  Besides, it would be too much of a coincidence if, say, 5 peering sessions
> got disconnected at random times, but always all of them each time.
>
>
> We've had issues like this across LINX peering sessions, where it turns
> out to be an MTU issue.
>
> We have a standard TCP MSS of 1,500 bytes. We've generally solved this by
> having the peer fix their MTU or MSS accordingly.
>
> I've always found it strange, especially when the peer is physically
> terminated on the LINX switch, and not coming in via a remote partner. But
> fixing the MTU/MSS always works.
>
> Mark.
>