[j-nsp] MX960 JunOS recommendations

Thu Nov 12 07:45:22 EST 2009

JNPR sends the NOTIFICATION because the hold timer expired (meaning no BGP messages were received
from the neighbor) - this is correct behavior from a BGP perspective.

Do you have logs on the CSCO side for the same event? I assume you will see retransmission of an
UPDATE message (not a Keepalive message). This UPDATE message is dropped somewhere on the path
between CSCO and JNPR, and CSCO keeps retransmitting it. Since the UPDATE message is sent within the
Keepalive timer, no Keepalives are sent.

The most common cause of such drops is a mismatch of MPLS MTU, or an L2 device with a misconfigured
MTU somewhere in between.
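
To make the arithmetic concrete, here is a rough Python sketch (not taken from any vendor
documentation). It assumes the default MPLS MTU is simply the inet MTU minus room for three 4-byte
labels, which is what the 9000 / 8988 values in the show output quoted further down suggest. The
function names are made up for illustration.

```python
LABEL_BYTES = 4  # size of one MPLS label stack entry


def default_mpls_mtu(inet_mtu, reserved_labels=3):
    """Assumed rule: inet MTU minus room for a fixed number of labels."""
    return inet_mtu - reserved_labels * LABEL_BYTES


def stuck_sizes(sender_mtu, receiver_mtu):
    """Packet sizes the sender will emit but the receiver will not accept."""
    return range(receiver_mtu + 1, sender_mtu + 1)


print(default_mpls_mtu(9000))          # 8988, as in the show output in this thread
print(list(stuck_sizes(1504, 1500)))   # [1501, 1502, 1503, 1504]
print(len(stuck_sizes(9000, 8988)))    # 12
```

Only packets whose size falls inside that small window get stuck, which would explain why most
traffic passes while an occasional packet is retransmitted forever.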

You have to figure out (debugs, traceoptions, tcpdumps, whatever) which device on the path is
dropping the packets.
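
One way to narrow down the dropping device is to probe each hop with do-not-fragment packets of
varying size and binary-search the largest size that still gets through. A minimal sketch of the
search logic only: probe() is a hypothetical stand-in for whatever DF-bit ping you run towards a
hop, and the per-hop limits below are invented test values, not measurements.

```python
def largest_passing_size(probe, low, high):
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: if a size passes, every smaller size passes.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def make_fake_probe(limit):
    """Hypothetical probe: pretends the hop passes anything up to `limit` bytes."""
    return lambda size: size <= limit


hops = {"hop-a": 9000, "hop-b": 8988}  # invented per-hop limits
for name, limit in hops.items():
    print(name, largest_passing_size(make_fake_probe(limit), 1000, 9200))
```

The first hop whose result is smaller than that of the hop before it is the place to look at more
closely.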

//Krzysztof

-----Original Message-----
From: Tima Maryin [mailto:tima at transtelecom.net] 
Sent: Thursday, 12 November, 2009 9:07
To: kszarkowicz at gmail.com
Cc: juniper-nsp at puck.nether.net
Subject: Re: [j-nsp] MX960 JunOS recommendations

First of all, thanks to all who care :)
I'll reply one by one

Derick Winkworth wrote:
 > How about some debugs or traceoptions?
 >
 >

traceoptions on the last Jun say that the box doesn't receive bgp notifications 
sometimes. haven't tried anything more yet

sthaug at nethelp.no wrote:
 >
 > Make sure that your IP MTU is the same on both Cisco and Juniper sides.
 > If you run IS-IS, make sure your CLNS MTU is the same on both Cisco and
 > Juniper sides.

IP mtus are the same, otherwise ospf would not come up

 > People have been running interoperable Cisco and Juniper networks for
 > many years. This is not rocket science.

Yeah, we installed several Juns into our network several months ago and this is 
the only problem which we couldn't solve, so we rolled back to the previous software

(well, i do not count some rpd crashes on a box with aggregated interfaces which we 
can avoid for now. jtac eventually said that it's PR439627. I can't read this 
hidden PR, but it's supposed to be fixed in 10.x and 9.3Rnextrelease )

Krzysztof Szarkowicz wrote:
> With MTUs around 9000 configured on ALL links in the network there should be no problem with BGP,
> since as per RFC4271, section 4:
> 
> The maximum message size is 4096 octets.  All implementations are required to support this maximum
> message size.
> 
> So even if MPLS and IP MTUs slightly differ, with sizes around 9000 it doesn't matter from BGP
> perspective.
> 
> The only thing that comes to my mind is that there are some L2 switches in between and there is
> something wrong with the MTU on those switches. Worth checking.

There are no switches between them.
It's
7301-geoptic-7606-tengig-t1600-tengig-mx960

It's a lab setup. On the real network it was slightly different, but actually it's 
the same from the point of view of this problem

> Could you paste from the log the Notification message generated when the BGP session is torn down?

I didn't find any dependence on interface load or anything else.
It can be 3-4 gig load (like it was on the real network) or empty (like it is in the 
lab), the bgp session may drop once per minute or stay up for 30 - 60 mins.
Cisco can be either GSR or 7301, Juniper can be mx or T.

There is nothing special in the logs.
That's the one from mx960:
Nov 12 06:18:31  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 307818660 snd_nxt: 307818660 snd_wnd: 16230 
rcv_nxt: 614682635 rcv_adv: 614699019, hold timer 0
Nov 12 06:20:48  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 1301747029 snd_nxt: 1301747029 snd_wnd: 16211 
rcv_nxt: 732160622 rcv_adv: 732177006, hold timer 0
Nov 12 06:22:53  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 2024212109 snd_nxt: 2024212109 snd_wnd: 16230 
rcv_nxt: 3950965686 rcv_adv: 3950982070, hold timer 0
Nov 12 06:24:56  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 2363347692 snd_nxt: 2363347692 snd_wnd: 16230 
rcv_nxt: 1449362513 rcv_adv: 1449378897, hold timer 0
Nov 12 06:59:09  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 3704141975 snd_nxt: 3704141975 snd_wnd: 15985 
rcv_nxt: 2261397920 rcv_adv: 2261414304, hold timer 0
Nov 12 07:01:19  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 1379635866 snd_nxt: 1379635866 snd_wnd: 16230 
rcv_nxt: 612357774 rcv_adv: 612374158, hold timer 0
Nov 12 07:04:06  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 3377139997 snd_nxt: 3377139997 snd_wnd: 16211 
rcv_nxt: 544711184 rcv_adv: 544727568, hold timer 0
Nov 12 07:20:37  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 3633708680 snd_nxt: 3633708680 snd_wnd: 16175 
rcv_nxt: 1216109422 rcv_adv: 1216125806, hold timer 0
Nov 12 07:22:54  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 0 
rcvcc: 0 TCP state: 4, snd_una: 4034247055 snd_nxt: 4034247055 snd_wnd: 16211 
rcv_nxt: 2010186633 rcv_adv: 2010203017, hold timer 0
Nov 12 07:25:00  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to 
10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: 
holdtime expired for 10.136.0.13 (Internal AS 20485), socket buffer sndcc: 38 
rcvcc: 0 TCP state: 4, snd_una: 3122195868 snd_nxt: 3122195868 snd_wnd: 16268 
rcv_nxt: 209999860 rcv_adv: 210016244, hold timer 0
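
For what it's worth, the intervals between these drops can be pulled out of the syslog with a few
lines of Python. This is only a sketch: it assumes each event starts with a line in the format
shown above, and the year 2009 is filled in by hand because syslog timestamps carry none.

```python
import re
from datetime import datetime

# Matches only the first line of each bgp_hold_timeout event (the one with the timestamp).
HOLD_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+rpd\[\d+\]:\s+bgp_hold_timeout"
)


def hold_timeout_times(lines, year=2009):
    """Collect the timestamp of every hold-timer NOTIFICATION in the log lines."""
    times = []
    for line in lines:
        m = HOLD_RE.match(line)
        if m:
            mon, day, clock = m.groups()
            stamp = f"{year} {mon} {day} {clock}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Seconds between consecutive events."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = [
    "Nov 12 06:18:31  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to ",
    "10.136.0.13 (Internal AS 20485): code 4 (Hold Timer Expired Error), Reason: ",
    "Nov 12 06:20:48  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to ",
    "Nov 12 06:22:53  mskl04ra rpd[1080]: bgp_hold_timeout:3571: NOTIFICATION sent to ",
]
print(gaps_seconds(hold_timeout_times(sample)))  # [137, 125]
```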

> 
> Thanks,
> Krzysztof
> 
> 
> 
> -----Original Message-----
> From: Tima Maryin [mailto:tima at transtelecom.net] 
> Sent: Wednesday, 11 November, 2009 15:12
> To: kszarkowicz at gmail.com
> Cc: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] MX960 JunOS recommendations
> 
> Uhm, i see your point here.
> We indeed have cisco - cisco - Jun - Jun setup
> 
> 
> My cisco interface mtu = ip mtu = mpls mtu =9000
> But i reeeeally doubt that a bgp keepalive packet size can come close to that mtu.
> 
> 
> On Juniper i set interface mtu = cisco mtu +14 and it works fine!
> And! As you say, it reports different mpls mtu value:
> 
>  > show interfaces xe-1/0/0 | match MTU
>    Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback: 
> None, Source filtering: Disabled,
>      Protocol inet, MTU: 9000
>      Protocol mpls, MTU: 8988
>      Protocol multiservice, MTU: Unlimited
> 
> 
> As far as i understand the "default mpls mtu" term (not sure that i _fully_ 
> understand it though), it seems Juniper assumes 3 labels maximum.
> I don't see any reason for a device to drop packets which have 1 or 2 labels and 
> are bigger than the mpls mtu, but still ok from the interface mtu point of view.
> 
> As per your logic, the device should drop all traffic that matches such criteria, but 
> it seems to affect only bgp session keepalives and i didn't see any other problems
> 
> 
> 
> But still, i made an experiment on Juniper and cisco which has bgp session 
> between them.
> 
> cisco:
> #sh mpls interfaces g 0/0 detail  | i MTU
>          MTU = 9000
> #sh ip int g 0/0 | i MTU
>    MTU is 9000 bytes
> #sh run int g 0/0
> Building configuration...
> 
> Current configuration : 212 bytes
> !
> interface GigabitEthernet0/0
>   description --- to 7606-2 ---
>   mtu 9000
>   ip address 10.3.13.2 255.255.255.0
>   load-interval 30
>   duplex full
>   speed 1000
>   media-type gbic
>   no negotiation auto
>   tag-switching ip
> end
> 
> 
> If i set mtu 9000 under family mpls and commit it, it looks like this:
> 
>  > show interfaces xe-1/0/0 | match MTU
>    Link-level type: Ethernet, MTU: 9014, LAN-PHY mode, Speed: 10Gbps, Loopback: 
> None, Source filtering: Disabled,
>      Protocol inet, MTU: 9000
>      Protocol mpls, MTU: 9000
>        Flags: Is-Primary, User-MTU
>      Protocol multiservice, MTU: Unlimited
> 
> 
> 
> and the problem still persists
> 
> 
> 
> please let me know if you have any other ideas :)
> 
> 
> 
> p.s. It's the same effect if i set tag-sw mtu 8988 on cisco and leave it 
> 'default' (=8988) on juniper
> 
> 
> 
> 
> 
> 
> 
> 
> Krzysztof Szarkowicz wrote:
>> Let me guess.
>>
>> Your network is a multivendor network (JNPR and CSCO) and some transit devices are CSCO?
>>
>> CSCO and JNPR use different algorithms to calculate the default MPLS MTU (if the MPLS MTU is not
>> explicitly configured), which results in a 4-byte difference in MPLS MTU between the CSCO side and
>> the JNPR side of the same link (the IP MTU is equal on both ends, so no problem with OSPF).
>>
>> If on the JNPR side your MPLS MTU is, say, 1500 and on the CSCO side the MPLS MTU is 1504, then
>> when the CSCO device sends a BGP update packet of size 1502 towards the JNPR device, this packet
>> is dropped by the JNPR device (as it is too big), and no TCP ACK is sent back. CSCO keeps
>> resending this 1502-byte packet, and JNPR keeps dropping it. Thus, after the hold timer expires,
>> the Notification message is sent.
>>
>> I assume that with 9.3R3.8 you didn't hit the '1502' packet sizes.
>>
>> Could you check with some show commands what the MPLS MTU is on both ends of the link (which is
>> terminated on CSCO on one side and JNPR on the other side)?
>>
>> //Krzysztof
>>
>> -----Original Message-----
>> From: Tima Maryin [mailto:tima at transtelecom.net] 
>> Sent: Wednesday, 11 November, 2009 9:57
>> To: kszarkowicz at gmail.com
>> Cc: juniper-nsp at puck.nether.net
>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>
>> What did you mean by "inappropriately configured" ?
>>
>> There are the same mtu settings everywhere and traffic passes quite well.
>> And ospf session goes up without problems.
>>
>> And how come "inappropriately configured IP and MPLS MTU" work well on 
>> 9.3R3.8 ?
>>
>>
>> Krzysztof Szarkowicz wrote:
>>> It is not a nasty bug, but a problem of inappropriately configured IP and MPLS MTUs on transit
>>> nodes.
>>> //Krzysztof
>>>
>>> -----Original Message-----
>>> From: juniper-nsp-bounces at puck.nether.net [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of
>>> Tima Maryin
>>> Sent: Wednesday, 11 November, 2009 8:28
>>> To: juniper-nsp at puck.nether.net
>>> Subject: Re: [j-nsp] MX960 JunOS recommendations
>>>
>>> 9.3R4.4 has a nasty bug which occurs in a setup where you have a bgp session over a 
>>> chain of a few routers/links with ospf/ldp
>>>
>>> the bgp session occasionally goes down with a notification timeout. Even when there is 
>>> no traffic at all and no physical errors
>>>
>>> rollback to 9.3r3 helps though
>>>
>>>
>>> JTAC still has not confirmed it, but it can easily be reproduced in the lab
> 
> 
>