[c-nsp] Random BGP Drops

Mark Tinka mark.tinka at seacom.mu
Fri Jul 24 08:43:14 EDT 2015



On 24/Jul/15 12:49, Catalin Dominte wrote:
> Hi everyone,
>
> Over the past two weeks we have been experiencing a few instances where
> some BGP sessions drop randomly.
>
> The router on our side is a 6500 Sup 2T XL version, with 1 x Full BGP
> Transit, a few downstream customers and 30 BGP sessions at LINX, and OSPF
> as the IGP. The setup has been very stable for the last couple of years
> without any issues.
>
> Looking at the logs on our side we can see the hold time expired.
>
> On the customer side we can see the following message in their logs, in
> particular the "hold timer remain" messages. The customer logs come from a
> Juniper router and it clearly shows that there is still lots of hold time
> remaining before the session should be torn down.
>
> Jul 24 00:33:04  rt1 rpd[1396]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
> A.B.C.D (External AS *****) changed state from Established to Idle (event
> RecvNotify) (instance master)
> Jul 24 00:33:04  rt1 rpd[1396]: bgp_read_v4_message:10656: NOTIFICATION
> received from A.B.C.D (External AS *****): code 4 (Hold Timer Expired
> Error), socket buffer sndcc: 57 rcvcc: 0 TCP state: 4, snd_una: 3040466763
> snd_nxt: 3040466801 snd_wnd: 16194 rcv_nxt: 3738492361 rcv_adv: 3738508724,
> hold timer out 90s, hold timer remain 1:07.779687s
> Jul 24 00:33:12  rt1 rpd[1396]: bgp_pp_recv: rejecting connection from
> A.B.C.D (External AS *****), peer in state Idle
> Jul 24 00:33:12  rt1 rpd[1396]: bgp_pp_recv:3286: NOTIFICATION sent to
> A.B.C.D+29266 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
> Jul 24 00:33:36  rt1 rpd[1396]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
> A.B.C.D (External AS *****) changed state from OpenConfirm to Established
> (event RecvKeepAlive) (instance master)
>
> Has anyone else seen the same error before?
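
For what it is worth, the NOTIFICATION line quoted above can be unpacked a
little. The sketch below (Python, purely illustrative) pulls out the hold
timer and TCP counters. It assumes that "hold timer remain" is printed as
minutes:seconds, and that sndcc is the number of bytes still sitting in the
send socket buffer, as in BSD-style stacks; neither is confirmed by the log
itself. The name parse_notification is made up for this example.

```python
import re

# Log line copied from the quoted Juniper output, unwrapped onto one line.
# The peer address and AS number are masked in the original post.
LOG = (
    "NOTIFICATION received from A.B.C.D (External AS *****): code 4 "
    "(Hold Timer Expired Error), socket buffer sndcc: 57 rcvcc: 0 TCP state: 4, "
    "snd_una: 3040466763 snd_nxt: 3040466801 snd_wnd: 16194 "
    "rcv_nxt: 3738492361 rcv_adv: 3738508724, "
    "hold timer out 90s, hold timer remain 1:07.779687s"
)

KEEPALIVE_BYTES = 19  # a BGP KEEPALIVE is just the 19-byte header (RFC 4271)


def parse_notification(line):
    """Extract hold-timer and TCP sequence counters from one log line."""

    def num(name):
        # Counters appear as "name: <digits>" in the log line.
        return int(re.search(rf"{name}:\s*(\d+)", line).group(1))

    hold = int(re.search(r"hold timer out (\d+)s", line).group(1))
    # Assumption: "remain" is formatted as minutes:seconds.
    m = re.search(r"hold timer remain (\d+):(\d+(?:\.\d+)?)s", line)
    remain = int(m.group(1)) * 60 + float(m.group(2))
    return {
        "hold_s": hold,
        "remain_s": remain,
        # Time since the local side last reset its hold timer.
        "since_last_rx_s": hold - remain,
        # Assumption: sndcc counts bytes held in the send socket buffer.
        "sndcc": num("sndcc"),
        # Sent but not yet acknowledged bytes (modulo 32-bit sequence space).
        "in_flight": (num("snd_nxt") - num("snd_una")) % 2**32,
    }


info = parse_notification(LOG)
print(info)
print("in-flight keepalives:", info["in_flight"] / KEEPALIVE_BYTES)
```

With the values in that line, roughly 22 seconds had passed since the Juniper
last heard from the peer, and snd_nxt minus snd_una is 38 bytes, which is two
19-byte KEEPALIVEs of unacknowledged data. If the sndcc assumption holds, 57
bytes would be three KEEPALIVEs still queued. That would be consistent with
keepalives from the Juniper not being acknowledged by the 6500, but it is only
one possible reading.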

Does this affect some or all of your BGP sessions?

Does it affect any of your LINX peers or just your downstream customers?
If it affects your customers, have those sessions been stable in the past
or have they always flapped? Furthermore, do you know whether your
customers have changed hardware, software version, configuration, etc.?
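
If it helps to answer the some-or-all question from logs, something like the
following could tally drops per peer. It is only a sketch: it assumes
unwrapped syslog lines in the Juniper format quoted above (logs from the 6500
would need a different pattern), and the names DROP and count_drops are
invented for this example.

```python
import re
from collections import Counter

# Matches only transitions *out of* Established, i.e. session drops.
# Pattern is modelled on the quoted RPD_BGP_NEIGHBOR_STATE_CHANGED lines.
DROP = re.compile(
    r"RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer (\S+) \((\w+) AS [^)]*\) "
    r"changed state from Established to (\w+) \(event (\w+)\)"
)

# Two lines from the quoted log, unwrapped; addresses are masked as posted.
SAMPLE = [
    "Jul 24 00:33:04  rt1 rpd[1396]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer "
    "A.B.C.D (External AS *****) changed state from Established to Idle "
    "(event RecvNotify) (instance master)",
    "Jul 24 00:33:36  rt1 rpd[1396]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer "
    "A.B.C.D (External AS *****) changed state from OpenConfirm to Established "
    "(event RecvKeepAlive) (instance master)",
]


def count_drops(lines):
    """Count Established -> other transitions, keyed by (peer, type, event)."""
    drops = Counter()
    for line in lines:
        m = DROP.search(line)
        if m:
            peer, kind, _new_state, event = m.groups()
            drops[(peer, kind, event)] += 1
    return drops


for key, n in count_drops(SAMPLE).most_common():
    print(key, n)
```

If only one or two peers show up, that points at those sessions or the path
to them; if most peers show up at around the same time, the router itself is
the more likely place to look.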

Mark.

