[j-nsp] bgp flap

Rasheeduddin, Tariq Tariq.Rasheeduddin at allstream.com
Fri Feb 18 20:29:09 EST 2005


Janto

Could you make sure that, from one end of the path to the other:
- IP MTU > TCP-MSS (less IP and TCP header lengths) ... TCP-MSS should be 40 bytes less than the IP MTU (20-byte IP header + 20-byte TCP header)
- in the case of BGP MD5, add 20 bytes of TCP options in the TCP header... so TCP-MSS should be corrected to 60 bytes less than the IP MTU
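
The arithmetic in the two points above can be sketched as a quick check. This is only an illustration: the function names are made up for this example, and it assumes IPv4 and TCP headers without any options other than the MD5 signature (18 bytes per RFC 2385, padded to 20).

```python
IP_HEADER = 20    # IPv4 header, assuming no IP options
TCP_HEADER = 20   # TCP header, assuming no options other than MD5
MD5_OPTION = 20   # RFC 2385 signature option: 18 bytes, padded to 20


def max_safe_mss(ip_mtu: int, md5: bool = False) -> int:
    """Largest TCP-MSS whose full-sized segments still fit in ip_mtu."""
    overhead = IP_HEADER + TCP_HEADER + (MD5_OPTION if md5 else 0)
    if ip_mtu <= overhead:
        raise ValueError("IP MTU is too small to carry any TCP payload")
    return ip_mtu - overhead


def mss_fits(ip_mtu: int, mss: int, md5: bool = False) -> bool:
    """True if a segment carrying mss bytes of payload fits in ip_mtu."""
    return mss <= max_safe_mss(ip_mtu, md5)


if __name__ == "__main__":
    for mtu in (1500, 1476):
        print(mtu, max_safe_mss(mtu), max_safe_mss(mtu, md5=True))
```

Run against the smallest IP MTU on the path, not just the MTU of the local interface; a tunnel or a lower-MTU hop in the middle is the usual culprit.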

If this is not fulfilled, and any router on the path suppresses IP unreachables (which path MTU discovery depends on), your BGP keepalives will be dropped (except the keepalives that are not sent together with BGP updates), and the session flaps once the 3rd consecutive keepalive is missed.
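
To see why a run of lost keepalives ends in a "Hold Timer Expired" notification, here is a minimal sketch of the hold-timer logic. It assumes a hold time of 90 seconds with keepalives every 30 seconds (one third of the hold time), which is a common choice but not necessarily what your routers are configured with; the function name is hypothetical.

```python
from typing import Iterable, Optional


def hold_timer_expiry(hold_time: float,
                      arrivals: Iterable[float],
                      end: float) -> Optional[float]:
    """Return when the hold timer expires, or None if it survives until end.

    arrivals are the times (in seconds) at which a KEEPALIVE or UPDATE
    actually reached the router; the timer is started at t = 0 and is
    restarted by every message that arrives.
    """
    deadline = hold_time
    for t in sorted(arrivals):
        if t > deadline:
            return deadline          # timer ran out before this message
        deadline = t + hold_time     # message arrived in time: restart
    return deadline if deadline <= end else None


if __name__ == "__main__":
    # Keepalives at 30 s and 60 s arrive, then everything is dropped.
    print(hold_timer_expiry(90, [30, 60], end=300))
    # Every keepalive arrives: no expiry within the observed window.
    print(hold_timer_expiry(90, [30, 60, 90, 120], end=150))
```

With these assumed timers, the session survives two lost keepalives in a row and drops when the third one fails to arrive.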

Also note that, starting with JunOS 6.0, Juniper negotiates a BGP TCP-MSS of 1440 (instead of 1460) to allow for the RFC 2385 MD5 option, so this is usually safe now. Cisco probably negotiates a TCP-MSS of 536 bytes, which should work but is a little slow.

If for any reason you can't figure this out, packet sniffing is a good tool. Hope this helps!

Tariq

-----Original Message-----
From: juniper-nsp-bounces at puck.nether.net
[mailto:juniper-nsp-bounces at puck.nether.net]On Behalf Of Hyunseog Ryu
Sent: February 17, 2005 1:40 AM
To: Janto Cin
Cc: Juniper-Nsp
Subject: Re: [j-nsp] bgp flap


I think you first need to gather data from the router and other 
sources to find what causes this problem.
It may be a problem with the physical circuit.
It may be circuit over-utilization.
It may be a software bug in buffer handling.
If you don't know the real root cause of this problem,
you can't tell how to fix it.

Hyun


Janto Cin wrote:
> Hi,
> Is the BGP keepalive packet classified to the network control queue by default?
> Should I change the priority of the default network control queue to high 
> priority?
> But all of this only makes sure the router can send the keepalive packet, 
> right?
> Thanks and Regards,
> Janto
> ----- Original Message ----- From: "Hyunseog Ryu" <r.hyunseog at ieee.org>
> To: "Janto Cin" <jantocin at datacomm.co.id>
> Cc: "Juniper-Nsp" <juniper-nsp at puck.nether.net>
> Sent: Thursday, February 17, 2005 12:06 PM
> Subject: Re: [j-nsp] bgp flap
> 
> 
>> Hi there,
>>
>> It seems too vague.
>> I think the first step in preventing the problem is to find what causes 
>> it.
>> A hold timer expiry can happen when there is a keepalive packet 
>> delivery problem between the two peering routers.
>>
>> If I were you, I would check the interface to see whether there are 
>> packet drops, CRC errors, or other errors.
>> You can use "show interfaces extensive" to see the detailed 
>> interface counters.
>> You may check the link utilization, too.
>> I believe that from the JUNOS CLI you can use "monitor interface traffic".
>> Sometimes, because of buffer congestion and/or bursty data packets, 
>> the circuit may run at 100%, so keepalive packets may be 
>> dropped and the hold timer expires.
>>
>> If the interface counters and link utilization on both peering 
>> routers are o.k., you may check each router's CPU utilization history.
>>
>> If the receiver's CPU is too busy to handle keepalive packets, it may not 
>> reply to them within the allowed period.
>>
>> I think this gives you a brief idea of where you can start.
>>
>> Hyun
>>
>> It is well-known
>>
>> Janto Cin wrote:
>>
>>> Dear All,
>>>
>>> I need help finding the causes of a BGP flap with a hold timer 
>>> expired error.
>>> The link itself didn't flap.
>>> How can I prevent this?
>>>
>>> Feb 17 01:27:32  GW rpd[2342]: bgp_traffic_timeout: NOTIFICATION sent 
>>> to 10.1.1.1 (External AS 9999): code 4 (Hold Timer Expired Error), 
>>> Reason: holdtime expired for 10.1.1.1 (External AS 9999)
>>> Feb 17 02:41:54  GW rpd[2342]: bgp_read_v4_update: NOTIFICATION 
>>> received from 10.1.1.1 (External AS 9999): code 4 (Hold Timer Expired 
>>> Error)
>>> Feb 17 03:02:00  GW rpd[2342]: bgp_traffic_timeout: NOTIFICATION sent 
>>> to 10.1.1.1 (External AS 9999): code 4 (Hold Timer Expired Error), 
>>> Reason: holdtime expired for 10.1.1.1 (External AS 9999)
>>> Feb 17 03:04:07  GW rpd[2342]: bgp_traffic_timeout: NOTIFICATION sent 
>>> to 10.1.1.1 (External AS 9999): code 4 (Hold Timer Expired Error), 
>>> Reason: holdtime expired for 10.1.1.1 (External AS 4761)
>>> Feb 17 03:49:27  GW rpd[2342]: bgp_read_v4_update: NOTIFICATION 
>>> received from 10.1.1.1 (External AS 9999): code 4 (Hold Timer Expired 
>>> Error)
>>>
>>> Many Thanks and Regards,
>>> Janto
>>> _______________________________________________
>>> juniper-nsp mailing list juniper-nsp at puck.nether.net
>>> http://puck.nether.net/mailman/listinfo/juniper-nsp
>>>
>>>
>>
> 
> 
> 

