[j-nsp] RPD Crash on M320

Alireza Soltanian soltanian at gmail.com
Mon Jan 4 10:14:17 EST 2016


Hi,
Yes, I checked the CPU graph and there was a spike in CPU load.
The link was flapping for 20 minutes before the crash, and it kept flapping
for two hours after it. During this time we can see LDP sessions going up and
down over and over. But the crash happened only this one time, and there
is no spike on CPU.
I must mention we had another issue with another M320. Whenever a link
flapped, the CPU usage of RPD went high and all OSPF sessions reset. I found
the root cause for that: it was traceoptions for LDP. On this box we don't
use traceoptions.
Is there any way to read the dump?
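
To compare how heavy the flapping was around the crash with the rest of the night, a minimal Python sketch like the one below could count RPD_LDP_SESSIONDOWN messages per minute and pick out the "terminated by signal" line (signal 6 is SIGABRT). It does not read the core dump itself. The regular expressions are modelled only on the log lines quoted in this thread, the function name `summarize` is made up for illustration, and the input is assumed to hold one syslog message per line (as in the messages file, not wrapped by the mail client).

```python
import re
from collections import Counter

# Patterns modelled on the syslog lines quoted in this thread (assumed format).
LDP_DOWN = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+rpd\[\d+\]:\s+"
    r"RPD_LDP_SESSIONDOWN: LDP session (\S+) is down"
)
CRASH = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+init:\s+"
    r"(\S+) \(PID (\d+)\) terminated by signal number (\d+)"
)


def summarize(lines):
    """Return (LDP session-down count per minute, list of crash events)."""
    per_minute = Counter()
    crashes = []
    for line in lines:
        line = " ".join(line.split())  # collapse runs of whitespace
        m = LDP_DOWN.match(line)
        if m:
            per_minute[m.group(1)] += 1  # key is "Mon D HH:MM"
            continue
        m = CRASH.match(line)
        if m:
            # (timestamp, process name, PID, signal number)
            crashes.append(
                (m.group(1), m.group(2), int(m.group(3)), int(m.group(4)))
            )
    return per_minute, crashes


# Three lines taken from the log in this thread, unwrapped.
sample = [
    "Jan  3 00:31:04  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: "
    "LDP session 10.237.253.168 is down, reason: received notification from peer",
    "Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: "
    "LDP session 10.237.254.1 is down, reason: received notification from peer",
    "Jan  3 00:32:18  apa-rtr-028 init: routing (PID 42128) terminated by "
    "signal number 6. Core dumped!",
]
per_minute, crashes = summarize(sample)
print(dict(per_minute))
print(crashes)
```

Running the same function over the whole messages file for 12:10 AM to 2:30 AM would show whether the minute before the crash had more session-down events than the other flaps did.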

Thank you
On Jan 4, 2016 6:34 PM, "Niall Donaghy" <Niall.Donaghy at geant.org> wrote:

> Hi Alireza,
>
> It seemed to me this event could be related to the core dump: Jan  3
> 00:31:28  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK
> message on non-active socket w/handle 0x10046fa0000004e
> However upon further investigation
> (http://kb.juniper.net/InfoCenter/index?page=content&id=KB18195) I see
> these messages are normal/harmless.
>
> Do you have Cacti graphs of CPU utilisation for both REs, before the rpd
> crash? Link flapping may be giving rise to CPU hogging, leading to
> instability and subsequent rpd crash.
> Was the link particularly flappy just before the crash?
>
> Kind regards,
> Niall
>
>
>
>
> > -----Original Message-----
> > From: juniper-nsp [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of
> > Alireza Soltanian
> > Sent: 04 January 2016 11:04
> > To: juniper-nsp at puck.nether.net
> > Subject: [j-nsp] RPD Crash on M320
> >
> > Hi everybody
> >
> > Recently, we had continuous link flaps between our M320 and remote sites. We
> > have a lot of L2Circuits between these sites on our M320. At one point the
> > RPD process crashed, which led to the following log. I must mention the link
> > flaps started at 12:10 AM and continued until 2:30 AM, but the crash occurred
> > at 12:30 AM.
> >
> >
> >
> > Jan  3 00:31:04  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.168 is down, reason: received notification from peer
> > Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.254.1 is down, reason: received notification from peer
> > Jan  3 00:31:05  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.120 is down, reason: received notification from peer
> > Jan  3 00:31:05  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x1008af8000001c6
> > Jan  3 00:31:06  apa-rtr-028 rpd[42128]: RPD_LDP_SESSIONDOWN: LDP session 10.237.253.192 is down, reason: received notification from peer
> > Jan  3 00:31:28  apa-rtr-028 /kernel: jsr_prl_recv_ack_msg(): received PRL ACK message on non-active socket w/handle 0x10046fa0000004e
> >
> > Jan  3 00:32:18  apa-rtr-028 init: routing (PID 42128) terminated by signal number 6. Core dumped!
> > Jan  3 00:32:18  apa-rtr-028 init: routing (PID 18307) started
> > Jan  3 00:32:18  apa-rtr-028 rpd[18307]: L2CKT acquiring mastership for primary
> > Jan  3 00:32:18  apa-rtr-028 rpd[18307]: L2VPN acquiring mastership for primary
> > Jan  3 00:32:20  apa-rtr-028 rpd[18307]: RPD_KRT_KERNEL_BAD_ROUTE: KRT: lost ifl 0 for route (null)
> > Jan  3 00:32:20  apa-rtr-028 last message repeated 65 times
> > Jan  3 00:32:20  apa-rtr-028 rpd[18307]: L2CKT acquiring mastership for primary
> > Jan  3 00:32:20  apa-rtr-028 rpd[18307]: Primary starts deleting all L2circuit IFL Repository
> > Jan  3 00:32:20  apa-rtr-028 rpd[18307]: RPD_TASK_BEGIN: Commencing routing updates, version 11.2R2.4, built 2011-09-01 06:53:31 UTC by builder
> >
> > Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1329, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1041
> > Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1311, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1039
> > Jan  3 00:32:21  apa-rtr-028 mib2d[33413]: SNMP_TRAP_LINK_DOWN: ifIndex 1312, ifAdminStatus up(1), ifOperStatus down(2), ifName ae1.1038
> >
> >
> >
> > We always see this kind of log (except the crash) on the device. Is
> > there any clue why the RPD process crashed? I don't have access to JTAC, so I
> > cannot analyze the dump.
> >
> > The Junos version is: 11.2R2.4
> >
> >
> >
> > Thank you for your help and support
> >
> > _______________________________________________
> > juniper-nsp mailing list juniper-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/juniper-nsp
>

