[j-nsp] MX960 JunOS recommendations
Tima Maryin
tima at transtelecom.net
Mon Nov 23 05:59:48 EST 2009
Hello!
(In case anyone is interested)
It was PR463989.
p.s.
It took me almost a month(!!) to extract the _existing_ PR number from JTAC!
/angry
Krzysztof Szarkowicz wrote:
> JNPR sends the notification because the hold timer expired (meaning no BGP messages are received from the
> neighbor) - this is correct behavior from the BGP perspective.
>
> Do you have logs on the CSCO side for the same event? I assume you will see retransmission of an UPDATE
> message (not a Keepalive message). This UPDATE message is dropped somewhere on the path between CSCO
> and JNPR, and CSCO retransmits it. Since the UPDATE message is sent within the Keepalive timer, no
> Keepalives are sent.
>
> The most common cause of dropping is a mismatched MPLS MTU, or an L2 device with misconfigured MTUs
> somewhere in between.
>
> You have to figure out (debugs, traceoptions, tcpdumps, whatever) which device on the path is
> dropping.
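>
> For example, on the JNPR side something like this would at least show whether the retransmitted UPDATE
> ever reaches the box (just a sketch, adjust the interface name):
>
> monitor traffic interface xe-1/0/0 matching "tcp port 179" size 1500 no-resolve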
>
> //Krzysztof
>
> -----Original Message-----
> From: Tima Maryin [mailto:tima at transtelecom.net]
> Sent: Thursday, 12 November, 2009 9:07
> To: kszarkowicz at gmail.com
> Cc: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] MX960 JunOS recommendations
>
> First of all, thanks to all who care :)
> I'll reply one by one
>
>
> Derick Winkworth wrote:
> > How about some debugs or traceoptions?
> >
> >
>
> Traceoptions, at least: Jun says that the box doesn't receive BGP notifications some of the
> time. Haven't tried anything more yet.
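>
> (The traceoptions were roughly the following, from memory - adjust the file name:)
>
> set protocols bgp traceoptions file bgp-trace size 10m files 5
> set protocols bgp traceoptions flag keepalive detail
> set protocols bgp traceoptions flag update detail
> set protocols bgp traceoptions flag timer detail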
>
>
>
> sthaug at nethelp.no wrote:
> >
> > Make sure that your IP MTU is the same on both Cisco and Juniper sides.
> > If you run IS-IS, make sure your CLNS MTU is the same on both Cisco and
> > Juniper sides.
>
>
> IP MTUs are the same, otherwise OSPF would not come up.
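>
> (Another quick check would be a full-size ping with the DF bit set towards the BGP neighbor, something
> like this from the MX side; 8972 = 9000 minus 20 bytes of IP and 8 of ICMP header:)
>
> ping 10.136.0.13 size 8972 do-not-fragment count 100 rapid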
>
>
> > People have been running interoperable Cisco and Juniper networks for
> > many years. This is not rocket science.
>
>
> Yeah, we installed several Juns into our network several months ago, and this is
> the only problem which we couldn't solve, so we rolled back to the previous software.
>
> (Well, I don't count some rpd crashes on a box with aggregated interfaces, which we
> can avoid for now. JTAC eventually said that it's PR439627. I can't read this
> hidden PR, but it's supposed to be fixed in 10.x and 9.3Rnextrelease.)
>
>
>
> Krzysztof Szarkowicz wrote:
>> With MTUs around 9000 configured on ALL links in the network there should be no problem with BGP,
>> since as per RFC4271, section 4:
>>
>> The maximum message size is 4096 octets. All implementations are required to support this maximum
>> message size.
>>
>> So even if the MPLS and IP MTUs differ slightly, with sizes around 9000 it doesn't matter from a BGP
>> perspective.
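>>
>> (A rough worst case, assuming three labels: 4096 max BGP message + 20 TCP + 20 IP + 12 of label stack
>> = 4148 bytes, nowhere near 9000.)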
>>
>> The only thing that comes to my mind is that there are some L2 switches in between and there is
>> something wrong with the MTU on those switches. Worth checking.
>
>
> There are no switches between them.
> It's:
> 7301 - GE optic - 7606 - TenGig - T1600 - TenGig - MX960
>
> It's a lab setup. On the real network it was slightly different, but it's actually
> the same from this problem's point of view.
>
>
>> Could you paste from the log the Notification message generated when the BGP session is torn down?
>
>
> I didn't find any dependence on interface load or anything else.
> It can be a 3-4 gig load (like it was on the real network) or empty (like it is in the
> lab); the BGP session may drop once per minute or stay up for 30-60 minutes.
> The Cisco can be either a GSR or a 7301, the Juniper can be an MX or a T.
>
> There is nothing special in the logs.
> That's the one from the mx960:
> Nov 12 06:18:31 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 307818660 snd_nxt: 307818660 snd_wnd: 16230
> rcv_nxt: 614682635 rcv_adv: 614699019, hold timer 0
> Nov 12 06:20:48 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 1301747029 snd_nxt: 1301747029 snd_wnd: 16211
> rcv_nxt: 732160622 rcv_adv: 732177006, hold timer 0
> Nov 12 06:22:53 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 2024212109 snd_nxt: 2024212109 snd_wnd: 16230
> rcv_nxt: 3950965686 rcv_adv: 3950982070, hold timer 0
> Nov 12 06:24:56 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 2363347692 snd_nxt: 2363347692 snd_wnd: 16230
> rcv_nxt: 1449362513 rcv_adv: 1449378897, hold timer 0
> Nov 12 06:59:09 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 3704141975 snd_nxt: 3704141975 snd_wnd: 15985
> rcv_nxt: 2261397920 rcv_adv: 2261414304, hold timer 0
> Nov 12 07:01:19 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 1379635866 snd_nxt: 1379635866 snd_wnd: 16230
> rcv_nxt: 612357774 rcv_adv: 612374158, hold timer 0
> Nov 12 07:04:06 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 3377139997 snd_nxt: 3377139997 snd_wnd: 16211
> rcv_nxt: 544711184 rcv_adv: 544727568, hold timer 0
> Nov 12 07:20:37 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 3633708680 snd_nxt: 3633708680 snd_wnd: 16175
> rcv_nxt: 1216109422 rcv_adv: 1216125806, hold timer 0
> Nov 12 07:22:54 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 4034247055 snd_nxt: 4034247055 snd_wnd: 16211
> rcv_nxt: 2010186633 rcv_adv: 2010203017, hold timer 0
> Nov 12 07:25:00 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 38
> rcvcc: 0 TCP state: 4, snd_una: 3122195868 snd_nxt: 3122195868 snd_wnd: 16268
> rcv_nxt: 209999860 rcv_adv: 210016244, hold timer 0
>
>
>> Thanks,
>> Krzysztof
>>
>>
>>
>> -----Original Message-----
>> From: Tima Maryin [mailto:tima at transtelecom.net]
>> Sent: Wednesday, 11 November, 2009 15:12
>> To: kszarkowicz at gmail.com
>> Cc: juniper-nsp at puck.nether.net
>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>
>> Uhm, I see your point here.
>> We indeed have a Cisco - Cisco - Jun - Jun setup.
>>
>>
>> My Cisco interface mtu = ip mtu = mpls mtu = 9000.
>> But I really doubt that a BGP keepalive packet size can come close to that MTU.
>>
>>
>> On the Juniper I set interface mtu = Cisco mtu + 14 and it works fine!
>> And, as you say, it reports a different mpls mtu value:
>>
>> > show interfaces xe-1/0/0 | match MTU
>> Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback:
>> None, Source filtering: Disabled,
>> Protocol inet, MTU: 9000
>> Protocol mpls, MTU: 8988
>> Protocol multiservice, MTU: Unlimited
>>
>>
>> As far as I understand the "default mpls mtu" term (not sure that I _fully_
>> understand it though), it seems Juniper assumes 3 labels maximum.
>> I don't see any reason for the device to drop packets which have 1 or 2 labels and are
>> bigger than the mpls mtu, but are still OK from the interface mtu point of view.
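>>
>> (If I read the numbers above correctly: 9014 link MTU minus 14 bytes of Ethernet header = 9000 for inet,
>> and 9000 minus 3 x 4 bytes for three labels = 8988 for the default mpls MTU.)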
>>
>> As per your logic, the device should drop all traffic that matches such criteria, but it seems
>> to hit only the BGP session keepalives, and I didn't see any other problems.
>>
>>
>>
>> But still, I did an experiment on a Juniper and a Cisco which have a BGP session
>> between them.
>>
>> cisco:
>> #sh mpls interfaces g 0/0 detail | i MTU
>> MTU = 9000
>> #sh ip int g 0/0 | i MTU
>> MTU is 9000 bytes
>> #sh run int g 0/0
>> Building configuration...
>>
>> Current configuration : 212 bytes
>> !
>> interface GigabitEthernet0/0
>> description --- to 7606-2 ---
>> mtu 9000
>> ip address 10.3.13.2 255.255.255.0
>> load-interval 30
>> duplex full
>> speed 1000
>> media-type gbic
>> no negotiation auto
>> tag-switching ip
>> end
>>
>>
>> If I set mtu 9000 under family mpls and commit it, it looks like this:
>>
>> > show interfaces xe-1/0/0 | match MTU
>> Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback:
>> None, Source filtering: Disabled,
>> Protocol inet, MTU: 9000
>> Protocol mpls, MTU: 9000
>> Flags: Is-Primary, User-MTU
>> Protocol multiservice, MTU: Unlimited
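>>
>> (For the record, that is just this single statement; something like the following, unit 0 assumed:)
>>
>> set interfaces xe-1/0/0 unit 0 family mpls mtu 9000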
>>
>>
>>
>> and the problem still persists
>>
>>
>>
>> please let me know if you have any other ideas :)
>>
>>
>>
>> P.S. It's the same effect if I set tag-sw mtu 8988 on the Cisco and leave it at the
>> 'default' (=8988) on the Juniper.
>>
>>
>> Krzysztof Szarkowicz wrote:
>>> Let me guess.
>>>
>>> Your network is a multivendor network (JNPR and CSCO) and some transit devices are CSCO?
>>>
>>> CSCO and JNPR use different algorithms to calculate the default MPLS MTU (if the MPLS MTU is not
>>> explicitly configured), which results in a 4-byte difference between the CSCO side and the JNPR side of
>>> the same link for the MPLS MTU (the IP MTU is equal on both ends, so no problem with OSPF).
>>>
>>> If on the JNPR side your MPLS MTU is, say, 1500 and on the CSCO side the MPLS MTU is 1504, then when the
>>> CSCO device sends a BGP update packet towards the JNPR device with size 1502, this packet is dropped by
>>> the JNPR device (as it is too big), and no TCP ACK is sent back. CSCO keeps resending this 1502-byte
>>> packet, and JNPR keeps dropping it. Thus, after the hold timer expires, the Notification message is
>>> sent.
>>>
>>> I assume that with 9.3R3.8 you didn't catch the '1502' packet sizes.
>>>
>>> Could you check with some show commands what the MPLS MTU is on both ends of the link (which is
>>> terminated on CSCO on one side and JNPR on the other)?
>>>
>>> //Krzysztof
>>>
>>> -----Original Message-----
>>> From: Tima Maryin [mailto:tima at transtelecom.net]
>>> Sent: Wednesday, 11 November, 2009 9:57
>>> To: kszarkowicz at gmail.com
>>> Cc: juniper-nsp at puck.nether.net
>>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>>
>>> What did you mean by "inappropriately configured"?
>>>
>>> The MTU settings are the same everywhere and traffic passes quite well.
>>> And the OSPF session comes up without problems.
>>>
>>> And how come "inappropriately configured IP and MPLS MTUs" work well on
>>> 9.3R3.8?
>>>
>>>
>>> Krzysztof Szarkowicz wrote:
>>>> It is not a nasty bug, but a problem of inappropriately configured IP and MPLS MTUs on transit
>>>> nodes.
>>>> //Krzysztof
>>>>
>>>> -----Original Message-----
>>>> From: juniper-nsp-bounces at puck.nether.net [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of
>>>> Tima Maryin
>>>> Sent: Wednesday, 11 November, 2009 8:28
>>>> To: juniper-nsp at puck.nether.net
>>>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>>>
>>>> 9.3R4.4 has a nasty bug which occurs in a setup where you have a BGP session over a
>>>> chain of a few routers/links with OSPF/LDP.
>>>>
>>>> The BGP session occasionally goes down with a notification timeout, even when there is
>>>> no traffic at all and no physical errors.
>>>>
>>>> A rollback to 9.3R3 helps, though.
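>>>>
>>>> (The downgrade itself is the usual package install; from memory, something along these lines -
>>>> the exact package name depends on your platform:)
>>>>
>>>> request system software add /var/tmp/jinstall-9.3R3.8-domestic-signed.tgz reboot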
>>>>
>>>>
>>>> JTAC still hasn't confirmed it, but it can easily be reproduced in the lab.
>>
>>
>
>
>