[j-nsp] BGP error messages

Daniel Roesen dr@cluenet.de
Mon, 14 Oct 2002 06:36:05 +0200


On Mon, Oct 14, 2002 at 09:55:10AM +0800, Tay Chee Yong wrote:
> My Juniper router is facing an ERX router via ATM using IP addresses 192.168.1.1
> and .2, and we are running BGP over this link.
> 
> However, our BGP session is resetting pretty frequently. One of the recent causes
> of the resets is a TCP error on both ends. In addition, what does the
> statement "connection collision prefers 192.168.1.2+1926 (proto)" mean?
> 
> Error messages from Juniper (JUNOS 5.0R5.1):
> 
> Oct 14 07:36:01 juniper2 rpd[15846]: bgp_traffic_timeout: NOTIFICATION sent to
> 192.154.24.74 (External AS 9989): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 192.168.1.2 (External AS xxxx)
> Oct 14 07:36:01 juniper2 rpd[15846]: bgp_event: peer 192.168.1.2 (External AS
> xxxx) old state Established event HoldTime new state Idle
> Oct 14 07:37:07 juniper2 rpd[15846]: bgp_pp_recv: NOTIFICATION sent to
> 192.168.1.2 (External AS xxxx): code 6 (Cease), Reason: dropping 192.168.1.2
> (External AS xxxx), connection collision prefers 192.168.1.2+1926 (proto)
> Oct 14 07:37:07 juniper2 rpd[15846]: bgp_recv: peer 192.168.1.2 (External AS
> xxxx): received unexpected EOF
> Oct 14 07:37:17 juniper2 rpd[15846]: bgp_event: peer 192.168.1.2 (External AS
> xxxx) old state OpenConfirm event RecvKeepAlive new state Established

What happens is this: the BGP (TCP) session times out (hold timer
expired), and the router then tries to bring the session up again. But
one side has not yet timed out the old session, so the TCP state of the
original session is still there and that side gets confused. The new
attempt is dropped ("connection collision"). The second
re-establishment attempt then works.
(Sorry if that's wrong, I'm pre-caffeine.)
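
To make the "connection collision" part more concrete, here is a
minimal sketch of the tie-break described in the BGP spec (RFC 1771,
section 6.8), in its common summary form: when two TCP connections
exist between the same pair of speakers, the one initiated by the
speaker with the higher BGP Identifier is kept and the other is closed
with a Cease NOTIFICATION. The function name and the router IDs are
made up for illustration (the real router IDs are not in the logs), and
this is not how rpd or the ERX actually implement it.

```python
import ipaddress


def surviving_connection(local_id: str, remote_id: str) -> str:
    """Say which of two colliding BGP connections is kept.

    Illustrative helper, not a real router API. BGP Identifiers are
    compared as 4-octet unsigned integers, not as strings.
    """
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))
    if local < remote:
        # The lower-ID side gives up the connection it opened itself
        # and keeps the one the peer opened.
        return "remote-initiated"
    # Otherwise the locally opened connection is kept. Equal IDs should
    # not occur between two distinct speakers.
    return "local-initiated"


# Hypothetical router IDs, for illustration only.
print(surviving_connection("10.0.0.1", "10.0.0.2"))
```

With a rule like this, a "prefers 192.168.1.2+1926" message would mean
that the connection to the peer's TCP port 1926 won the tie-break and
the other one was dropped; that is my reading of the log line, not
something taken from Juniper documentation.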

We once had the very same problem: Junipers running JUNOS 5.x (not
seen with 4.4!) frequently dropped BGP sessions to ERXes. After long
weeks of investigation we were able to prove that the TCP stack the
ERX uses for BGP had an implementation bug in its handling of
retransmits. This bug was fixed in later ERX software releases.

So try newer ERX software; I know of no workaround. Drop me
a mail if you need further info (bug ID etc.) and
I'll try to find out.


Best regards,
Daniel