[c-nsp] Random BGP Drops

Catalin Dominte catalin.dominte at nocsult.net
Fri Jul 24 08:48:04 EDT 2015


Hi Mark,

Thanks for getting back to me.

This affects only a handful of customers and a handful of LINX peers. They
have always been stable; we have not had any issues with them before.

As far as I know they have not changed much in terms of hardware, but they
could have changed their software configuration. I can control which
advertisements I accept from a customer and my own BGP policies, but I don't
have a lot of visibility into what the customers do in their normal
day-to-day operations.

Besides, it would be too much of a coincidence if, say, 5 peering sessions
got disconnected at random times, but all of them every time.
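
As a rough sanity check on the Juniper NOTIFICATION line quoted below, the
timer and TCP counters can be worked through in a few lines of Python. This
is only a sketch: it assumes the "hold timer remain" field is in
minutes:seconds, that the Juniper restarts its hold timer whenever it
receives a KEEPALIVE or UPDATE, and that snd_nxt minus snd_una is the amount
of sent but unacknowledged data. The function name is made up for
illustration.

```python
import re

# Fields copied from the Juniper NOTIFICATION log line quoted below,
# rejoined onto one line (the mail client had wrapped them).
LOG = ("snd_una: 3040466763 snd_nxt: 3040466801 snd_wnd: 16194 "
       "hold timer out 90s, hold timer remain 1:07.779687s")


def parse_notification(line):
    """Hypothetical helper: pull timer and TCP sequence fields from the line."""
    timers = re.search(
        r"hold timer out (\d+)s, hold timer remain (\d+):(\d+(?:\.\d+)?)s", line)
    seqs = re.search(r"snd_una: (\d+) snd_nxt: (\d+)", line)
    if timers is None or seqs is None:
        raise ValueError("expected fields not found in log line")
    hold = int(timers.group(1))
    # Assumption: the 'remain' value is minutes:seconds.
    remaining = int(timers.group(2)) * 60 + float(timers.group(3))
    snd_una, snd_nxt = int(seqs.group(1)), int(seqs.group(2))
    return hold, remaining, snd_una, snd_nxt


hold, remaining, snd_una, snd_nxt = parse_notification(LOG)

# Time since the Juniper last restarted its hold timer.
elapsed = hold - remaining

# Bytes the Juniper has sent that the far end has not yet acknowledged.
unacked = snd_nxt - snd_una

print(f"hold {hold}s, remaining {remaining:.3f}s, elapsed {elapsed:.3f}s")
print(f"unacknowledged bytes: {unacked}")
```

Under those assumptions the Juniper had heard from us about 22 seconds
before it received the NOTIFICATION, and it had 38 bytes outstanding, which
would match two 19-byte KEEPALIVEs. That would suggest, though it does not
prove, that traffic from us to the customer was arriving while traffic from
the customer to us was not.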

Kind regards,

Catalin Dominte
Senior Network Consultant
+44(0)1628302007
Nocsult Ltd
www.nocsult.net


On Fri, Jul 24, 2015 at 1:43 PM, Mark Tinka <mark.tinka at seacom.mu> wrote:

>
>
> On 24/Jul/15 12:49, Catalin Dominte wrote:
> > Hi everyone,
> >
> > Over the past two weeks we have been experiencing a few instances where
> > some BGP sessions drop randomly.
> >
> > The router on our side is a 6500 Sup 2T XL version, with 1 x Full BGP
> > Transit, a few downstream customers and 30 BGP sessions at LINX, and OSPF
> > as the IGP. The setup has been very stable for the last couple of years
> > without any issues.
> >
> > Looking at the logs on our side we can see the hold time expired.
> >
> > On the customer side we can see the following messages in their logs, in
> > particular the "hold timer remain" value. The customer logs come from a
> > Juniper router and clearly show that there is still plenty of hold time
> > remaining before the session should be torn down.
> >
> > Jul 24 00:33:04  rt1 rpd[1396]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
> > A.B.C.D (External AS *****) changed state from Established to Idle (event
> > RecvNotify) (instance master)
> > Jul 24 00:33:04  rt1 rpd[1396]: bgp_read_v4_message:10656: NOTIFICATION
> > received from A.B.C.D (External AS *****): code 4 (Hold Timer Expired
> > Error), socket buffer sndcc: 57 rcvcc: 0 TCP state: 4, snd_una: 3040466763
> > snd_nxt: 3040466801 snd_wnd: 16194 rcv_nxt: 3738492361 rcv_adv: 3738508724,
> > hold timer out 90s, hold timer remain 1:07.779687s
> > Jul 24 00:33:12  rt1 rpd[1396]: bgp_pp_recv: rejecting connection from
> > A.B.C.D (External AS *****), peer in state Idle
> > Jul 24 00:33:12  rt1 rpd[1396]: bgp_pp_recv:3286: NOTIFICATION sent to
> > A.B.C.D+29266 (proto): code 6 (Cease) subcode 5 (Connection Rejected)
> > Jul 24 00:33:36  rt1 rpd[1396]: RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
> > A.B.C.D (External AS *****) changed state from OpenConfirm to Established
> > (event RecvKeepAlive) (instance master)
> >
> > Has anyone else seen the same error before?
>
> Does this affect some or all of your BGP sessions?
>
> Does it affect any of your LINX peers or just your downstream customers?
> If it affects your customers, have these sessions been stable in the past
> or have they always flapped? Furthermore, do you know whether your
> customers have changed hardware, changed software version, changed
> configurations, etc.?
>
> Mark.
>

