[j-nsp] RPD Crash on M320

Niall Donaghy Niall.Donaghy at geant.org
Mon Jan 4 10:04:18 EST 2016


Hi Alireza,

At first it seemed to me that this event could be related to the core dump:

Jan  3 00:31:28  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x10046fa0000004e

However, upon further investigation
(http://kb.juniper.net/InfoCenter/index?page=content&id=KB18195) I see these
messages are normal/harmless.

Do you have Cacti graphs of CPU utilisation for both REs, before the rpd
crash? Link flapping may be giving rise to CPU hogging, leading to
instability and subsequent rpd crash.
Was the link particularly flappy just before the crash?
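
If you have the syslog in a file, a rough way to answer that is to count
flap-related events per minute and compare them with the time rpd died. Below
is a minimal Python sketch, not a tested tool: the function names are my own
invention, and I am assuming that RPD_LDP_SESSIONDOWN and SNMP_TRAP_LINK_DOWN
are reasonable proxies for "flappy" and that each log entry sits on one line
(unwrapped). The sample lines are taken from your message.

```python
import re
from collections import Counter

# Start of a syslog line, e.g.
# "Jan  3 00:31:04  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: ..."
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hhmm>\d{2}:\d{2}):\d{2}\s+"
    r"(?P<host>\S+)\s+(?P<rest>.*)$"
)

# Assumption: these two tags are treated as "flap" indicators.
FLAP_TAGS = ("RPD_LDP_SESSIONDOWN", "SNMP_TRAP_LINK_DOWN")


def _minute_key(m):
    """Build a 'Mon dd HH:MM' bucket key from a regex match."""
    return f"{m.group('mon')} {int(m.group('day')):2d} {m.group('hhmm')}"


def flaps_per_minute(lines):
    """Count flap-related log entries per minute."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(tag in m.group("rest") for tag in FLAP_TAGS):
            counts[_minute_key(m)] += 1
    return counts


def crash_minutes(lines):
    """Return the minute buckets in which init reported a process killed by a signal."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "terminated by signal" in m.group("rest"):
            hits.append(_minute_key(m))
    return hits


SAMPLE = [
    "Jan  3 00:31:04  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.168 is down, reason: received notification from peer",
    "Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.254.1 is down, reason: received notification from peer",
    "Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.120 is down, reason: received notification from peer",
    "Jan  3 00:31:05  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x1008af8000001c6",
    "Jan  3 00:31:06  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.192 is down, reason: received notification from peer",
    "Jan  3 00:32:18  apa-rtr-028 init: routing (PID 42128) terminated by signal number 6. Core dumped!",
    "Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1329, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1041",
    "Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1311, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1039",
    "Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1312, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1038",
]

if __name__ == "__main__":
    for minute, n in sorted(flaps_per_minute(SAMPLE).items()):
        print(minute, n)
    print("crash seen in:", crash_minutes(SAMPLE))
```

Run over the whole log from 12:10AM to 2:30AM, this should show whether the
minutes leading up to the crash were noticeably busier than the rest of the
flapping period. It says nothing about RE CPU load, so the Cacti graphs would
still be worth a look.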

Kind regards,
Niall




> -----Original Message-----
> From: juniper-nsp [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of Alireza Soltanian
> Sent: 04 January 2016 11:04
> To: juniper-nsp at puck.nether.net
> Subject: [j-nsp] RPD Crash on M320
> 
> Hi everybody
> 
> Recently, we had continuous link flaps between our M320 and remote sites. We
> have a lot of L2Circuits between these sites on our M320. At one point the
> RPD process crashed, which led to the following log. I must mention that the
> link flaps started at 12:10AM and continued until 2:30AM, but the crash
> occurred at 12:30AM.
> 
> 
> 
> Jan  3 00:31:04  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.168 is down, reason: received notification from peer
>
> Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.254.1 is down, reason: received notification from peer
>
> Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.120 is down, reason: received notification from peer
>
> Jan  3 00:31:05  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x1008af8000001c6
>
> Jan  3 00:31:06  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.192 is down, reason: received notification from peer
>
> Jan  3 00:31:28  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x10046fa0000004e
> 
> 
> 
> Jan  3 00:32:18  apa-rtr-028 init: routing (PID 42128) terminated by signal number 6. Core dumped!
>
> Jan  3 00:32:18  apa-rtr-028 init: routing (PID 18307) started
>
> Jan  3 00:32:18  apa-rtr-028 rpd[18307]: L2CKT acquiring mastership for primary
>
> Jan  3 00:32:18  apa-rtr-028 rpd[18307]: L2VPN acquiring mastership for primary
>
> Jan  3 00:32:20  apa-rtr-028 rpd[18307]: RPD_KRT_KERNEL_BAD_ROUTE: KRT: lost ifl 0 for route (null)
>
> Jan  3 00:32:20  apa-rtr-028 last message repeated 65 times
>
> Jan  3 00:32:20  apa-rtr-028 rpd[18307]: L2CKT acquiring mastership for primary
>
> Jan  3 00:32:20  apa-rtr-028 rpd[18307]: Primary starts deleting all L2circuit IFL Repository
>
> Jan  3 00:32:20  apa-rtr-028 rpd[18307]: RPD_TASK_BEGIN: Commencing routing updates, version 11.2R2.4, built 2011-09-01 06:53:31 UTC by builder
> 
> 
> 
> Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1329, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1041
>
> Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1311, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1039
>
> Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1312, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1038
> 
> 
> 
> We always see this kind of log on the device (except for the crash). Is
> there any clue as to why the RPD process crashed? I don't have access to
> JTAC, so I cannot analyze the dump.
>
> The JunOS version is 11.2R2.4.
> 
> 
> 
> Thank you for your help and support
> 
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp

