[j-nsp] MX960 JunOS recommendations
Tima Maryin
tima at transtelecom.net
Mon Nov 23 05:59:48 EST 2009
Hello!
(In case anyone is interested)
It was PR463989.
p.s.
It took me almost a month(!!) to extract the _existing_ PR number from JTAC!
/angry
Krzysztof Szarkowicz wrote:
> JNPR sends the notification because the hold timer expired (meaning no BGP messages are received from the
> neighbor) - this is correct behavior from the BGP perspective.
>
> Do you have logs on the CSCO side for the same event? I assume you will see retransmission of an UPDATE
> message (not a Keepalive message). This UPDATE message is dropped somewhere on the path between CSCO
> and JNPR, and CSCO retransmits it. Since the UPDATE message is sent within the Keepalive timer, no
> Keepalives are sent.
>
> The most common cause of dropping is a mismatched MPLS MTU, or an L2 device with misconfigured MTUs
> somewhere in between.
>
> You have to figure out (debugs, traceoptions, tcpdumps, whatever) which device on the path is
> dropping.
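>
> For example, on the JNPR side something like this would at least show whether the retransmitted UPDATE
> ever reaches the box (just a sketch, adjust the interface name):
>
> monitor traffic interface xe-1/0/0 matching "tcp port 179" size 1500 no-resolve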
>
> //Krzysztof
>
> -----Original Message-----
> From: Tima Maryin [mailto:tima at transtelecom.net]
> Sent: Thursday, 12 November, 2009 9:07
> To: kszarkowicz at gmail.com
> Cc: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] MX960 JunOS recommendations
>
> First of all, thanks to all who care :)
> I'll reply one by one
>
>
> Derick Winkworth wrote:
> > How about some debugs or traceoptions?
> >
> >
>
> Traceoptions, at least: Jun says that the box doesn't receive BGP notifications some of the
> time. Haven't tried anything more yet.
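>
> (The traceoptions were roughly the following, from memory - adjust the file name:)
>
> set protocols bgp traceoptions file bgp-trace size 10m files 5
> set protocols bgp traceoptions flag keepalive detail
> set protocols bgp traceoptions flag update detail
> set protocols bgp traceoptions flag timer detail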
>
>
>
> sthaug at nethelp.no wrote:
> >
> > Make sure that your IP MTU is the same on both Cisco and Juniper sides.
> > If you run IS-IS, make sure your CLNS MTU is the same on both Cisco and
> > Juniper sides.
>
>
> IP MTUs are the same, otherwise OSPF would not come up.
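>
> (Another quick check would be a full-size ping with the DF bit set towards the BGP neighbor, something
> like this from the MX side; 8972 = 9000 minus 20 bytes of IP and 8 of ICMP header:)
>
> ping 10.136.0.13 size 8972 do-not-fragment count 100 rapid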
>
>
> > People have been running interoperable Cisco and Juniper networks for
> > many years. This is not rocket science.
>
>
> Yeah, we installed several Juns into our network several months ago, and this is
> the only problem which we couldn't solve, so we rolled back to the previous software.
>
> (Well, I don't count some rpd crashes on a box with aggregated interfaces, which we
> can avoid for now. JTAC eventually said that it's PR439627. I can't read this
> hidden PR, but it's supposed to be fixed in 10.x and 9.3Rnextrelease.)
>
>
>
> Krzysztof Szarkowicz wrote:
>> With MTUs around 9000 configured on ALL links in the network there should be no problem with BGP,
>> since as per RFC4271, section 4:
>>
>> The maximum message size is 4096 octets. All implementations are required to support this maximum
>> message size.
>>
>> So even if the MPLS and IP MTUs differ slightly, with sizes around 9000 it doesn't matter from a BGP
>> perspective.
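>>
>> (A rough worst case, assuming three labels: 4096 max BGP message + 20 TCP + 20 IP + 12 of label stack
>> = 4148 bytes, nowhere near 9000.)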
>>
>> The only thing that comes to my mind is that there are some L2 switches in between and there is
>> something wrong with the MTU on those switches. Worth checking.
>
>
> There are no switches between them.
> It's:
> 7301 - GE optic - 7606 - TenGig - T1600 - TenGig - MX960
>
> It's a lab setup. On the real network it was slightly different, but it's actually
> the same from this problem's point of view.
>
>
>> Could you paste from the log the Notification message generated when the BGP session is torn down?
>
>
> I didn't find any dependence on interface load or anything else.
> It can be a 3-4 gig load (like it was on the real network) or empty (like it is in the
> lab); the BGP session may drop once per minute or stay up for 30-60 minutes.
> The Cisco can be either a GSR or a 7301, the Juniper can be an MX or a T.
>
> There is nothing special in the logs.
> That's the one from the mx960:
> Nov 12 06:18:31 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 307818660 snd_nxt: 307818660 snd_wnd: 16230
> rcv_nxt: 614682635 rcv_adv: 614699019, hold timer 0
> Nov 12 06:20:48 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 1301747029 snd_nxt: 1301747029 snd_wnd: 16211
> rcv_nxt: 732160622 rcv_adv: 732177006, hold timer 0
> Nov 12 06:22:53 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 2024212109 snd_nxt: 2024212109 snd_wnd: 16230
> rcv_nxt: 3950965686 rcv_adv: 3950982070, hold timer 0
> Nov 12 06:24:56 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 2363347692 snd_nxt: 2363347692 snd_wnd: 16230
> rcv_nxt: 1449362513 rcv_adv: 1449378897, hold timer 0
> Nov 12 06:59:09 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 3704141975 snd_nxt: 3704141975 snd_wnd: 15985
> rcv_nxt: 2261397920 rcv_adv: 2261414304, hold timer 0
> Nov 12 07:01:19 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 1379635866 snd_nxt: 1379635866 snd_wnd: 16230
> rcv_nxt: 612357774 rcv_adv: 612374158, hold timer 0
> Nov 12 07:04:06 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 3377139997 snd_nxt: 3377139997 snd_wnd: 16211
> rcv_nxt: 544711184 rcv_adv: 544727568, hold timer 0
> Nov 12 07:20:37 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 3633708680 snd_nxt: 3633708680 snd_wnd: 16175
> rcv_nxt: 1216109422 rcv_adv: 1216125806, hold timer 0
> Nov 12 07:22:54 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0
> rcvcc: 0 TCP state: 4, snd_una: 4034247055 snd_nxt: 4034247055 snd_wnd: 16211
> rcv_nxt: 2010186633 rcv_adv: 2010203017, hold timer 0
> Nov 12 07:25:00 mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to
> 10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 38
> rcvcc: 0 TCP state: 4, snd_una: 3122195868 snd_nxt: 3122195868 snd_wnd: 16268
> rcv_nxt: 209999860 rcv_adv: 210016244, hold timer 0
>
>
>> Thanks,
>> Krzysztof
>>
>>
>>
>> -----Original Message-----
>> From: Tima Maryin [mailto:tima at transtelecom.net]
>> Sent: Wednesday, 11 November, 2009 15:12
>> To: kszarkowicz at gmail.com
>> Cc: juniper-nsp at puck.nether.net
>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>
>> Uhm, I see your point here.
>> We indeed have a Cisco - Cisco - Jun - Jun setup.
>>
>>
>> My Cisco interface mtu = ip mtu = mpls mtu = 9000.
>> But I really doubt that a BGP keepalive packet size can come close to that MTU.
>>
>>
>> On the Juniper I set interface mtu = Cisco mtu + 14 and it works fine!
>> And, as you say, it reports a different mpls mtu value:
>>
>> > show interfaces xe-1/0/0 | match MTU
>> Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback:
>> None, Source filtering: Disabled,
>> Protocol inet, MTU: 9000
>> Protocol mpls, MTU: 8988
>> Protocol multiservice, MTU: Unlimited
>>
>>
>> As far as I understand the "default mpls mtu" term (not sure that I _fully_
>> understand it though), it seems Juniper assumes 3 labels maximum.
>> I don't see any reason for the device to drop packets which have 1 or 2 labels and are
>> bigger than the mpls mtu, but are still OK from the interface mtu point of view.
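>>
>> (If I read the numbers above correctly: 9014 link MTU minus 14 bytes of Ethernet header = 9000 for inet,
>> and 9000 minus 3 x 4 bytes for three labels = 8988 for the default mpls MTU.)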
>>
>> As per your logic, the device should drop all traffic that matches such criteria, but it seems
>> to hit only the BGP session keepalives, and I didn't see any other problems.
>>
>>
>>
>> But still, I did an experiment on a Juniper and a Cisco which have a BGP session
>> between them.
>>
>> cisco:
>> #sh mpls interfaces g 0/0 detail | i MTU
>> MTU = 9000
>> #sh ip int g 0/0 | i MTU
>> MTU is 9000 bytes
>> #sh run int g 0/0
>> Building configuration...
>>
>> Current configuration : 212 bytes
>> !
>> interface GigabitEthernet0/0
>> description --- to 7606-2 ---
>> mtu 9000
>> ip address 10.3.13.2 255.255.255.0
>> load-interval 30
>> duplex full
>> speed 1000
>> media-type gbic
>> no negotiation auto
>> tag-switching ip
>> end
>>
>>
>> If I set mtu 9000 under family mpls and commit it, it looks like this:
>>
>> > show interfaces xe-1/0/0 | match MTU
>> Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback:
>> None, Source filtering: Disabled,
>> Protocol inet, MTU: 9000
>> Protocol mpls, MTU: 9000
>> Flags: Is-Primary, User-MTU
>> Protocol multiservice, MTU: Unlimited
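>>
>> (For the record, that is just this single statement; something like the following, unit 0 assumed:)
>>
>> set interfaces xe-1/0/0 unit 0 family mpls mtu 9000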
>>
>>
>>
>> and the problem still persists
>>
>>
>>
>> please let me know if you have any other ideas :)
>>
>>
>>
>> P.S. It's the same effect if I set tag-sw mtu 8988 on the Cisco and leave it at the
>> 'default' (=8988) on the Juniper.
>>
>>
>> Krzysztof Szarkowicz wrote:
>>> Let me guess.
>>>
>>> Your network is a multivendor network (JNPR and CSCO) and some transit devices are CSCO?
>>>
>>> CSCO and JNPR use different algorithms to calculate the default MPLS MTU (if the MPLS MTU is not
>>> explicitly configured), which results in a 4-byte difference between the CSCO side and the JNPR side of
>>> the same link for the MPLS MTU (the IP MTU is equal on both ends, so no problem with OSPF).
>>>
>>> If on the JNPR side your MPLS MTU is, say, 1500 and on the CSCO side the MPLS MTU is 1504, then when the
>>> CSCO device sends a BGP update packet towards the JNPR device with size 1502, this packet is dropped by
>>> the JNPR device (as it is too big), and no TCP ACK is sent back. CSCO keeps resending this 1502-byte
>>> packet, and JNPR keeps dropping it. Thus, after the hold timer expires, the Notification message is
>>> sent.
>>>
>>> I assume that with 9.3R3.8 you didn't catch the '1502' packet sizes.
>>>
>>> Could you check with some show commands what the MPLS MTU is on both ends of the link (which is
>>> terminated on CSCO on one side and JNPR on the other)?
>>>
>>> //Krzysztof
>>>
>>> -----Original Message-----
>>> From: Tima Maryin [mailto:tima at transtelecom.net]
>>> Sent: Wednesday, 11 November, 2009 9:57
>>> To: kszarkowicz at gmail.com
>>> Cc: juniper-nsp at puck.nether.net
>>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>>
>>> What did you mean by "inappropriately configured"?
>>>
>>> The MTU settings are the same everywhere and traffic passes quite well.
>>> And the OSPF session comes up without problems.
>>>
>>> And how come "inappropriately configured IP and MPLS MTUs" work well on
>>> 9.3R3.8?
>>>
>>>
>>> Krzysztof Szarkowicz wrote:
>>>> It is not a nasty bug, but a problem of inappropriately configured IP and MPLS MTUs on transit
>>>> nodes.
>>>> //Krzysztof
>>>>
>>>> -----Original Message-----
>>>> From: juniper-nsp-bounces at puck.nether.net [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of
>>>> Tima Maryin
>>>> Sent: Wednesday, 11 November, 2009 8:28
>>>> To: juniper-nsp at puck.nether.net
>>>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>>>
>>>> 9.3R4.4 has a nasty bug which occurs in a setup where you have a BGP session over a
>>>> chain of a few routers/links with OSPF/LDP.
>>>>
>>>> The BGP session occasionally goes down with a notification timeout, even when there is
>>>> no traffic at all and no physical errors.
>>>>
>>>> A rollback to 9.3R3 helps, though.
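>>>>
>>>> (The downgrade itself is the usual package install; from memory, something along these lines -
>>>> the exact package name depends on your platform:)
>>>>
>>>> request system software add /var/tmp/jinstall-9.3R3.8-domestic-signed.tgz reboot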
>>>>
>>>>
>>>> JTAC still hasn't confirmed it, but it can easily be reproduced in the lab.
>>
>>
>
>
>