[j-nsp] MX480 MS-MPC-128G CHASSISD_SNMP_TRAP10 jnxFruOfflineReason 8 but no button press
Michael Gehrmann
mgehrmann at atlassian.com
Wed Feb 8 22:31:06 EST 2017
Hi David,
It might be worth checking for core dumps. I'd also do a PR search and
check the release notes for later releases. I have previously found that,
on rare occasions, MS cards can get into weird corner cases which normally
need JTAC involvement to resolve.
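If it helps to see whether the events cluster on one slot or one offline
reason, a rough tally of the trap lines can be scripted. This is only a
minimal sketch, assuming the chassisd log has been copied off the router as
plain text; the function name tally_fru_traps and the regular expressions
are illustrative assumptions based on the single trap line quoted below, not
a Juniper tool or a documented log format.

```python
import re
from collections import Counter

# Assumed layout, taken from the one trap quoted in this thread:
#   CHASSISD_SNMP_TRAP10: SNMP trap generated: <event> (<name value, ...>)
TRAP_RE = re.compile(
    r"CHASSISD_SNMP_TRAP10: SNMP trap generated: "
    r"(?P<event>[^(]+?)\s*\((?P<body>[^)]*)\)"
)
# Picks out the numeric jnxFru* fields; jnxFruName is skipped because its
# value is not a plain number.
FIELD_RE = re.compile(r"(jnxFru\w+)\s+(\d+)")


def tally_fru_traps(log_text):
    """Count FRU traps by (event, jnxFruSlot, jnxFruOfflineReason)."""
    counts = Counter()
    for match in TRAP_RE.finditer(log_text):
        fields = {name: int(value) for name, value in FIELD_RE.findall(match.group("body"))}
        key = (
            match.group("event"),
            fields.get("jnxFruSlot"),
            fields.get("jnxFruOfflineReason"),
        )
        counts[key] += 1
    return counts


# The trap from the original post, joined onto one line.
sample = (
    "CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on "
    "(jnxFruContentsIndex 8, jnxFruL1Index 4, jnxFruL2Index 1, jnxFruL3Index 0, "
    "jnxFruName PIC: MS-MPC-PIC @ 3/0/*, jnxFruType 11, jnxFruSlot 3, "
    "jnxFruOfflineReason 2, jnxFruLastPowerOff 1052212977, jnxFruLastPowerOn "
    "1052213068)"
)

for key, count in tally_fru_traps(sample).items():
    print(key, count)
```

Feeding it the whole log file instead of the sample should show at a glance
whether one slot or one reason code dominates, which is useful detail to
have on hand for a PR search or a JTAC case.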
Regards
Mike
On 9 February 2017 at 14:14, David B Funk <dbfunk at engineering.uiowa.edu>
wrote:
> We have an MX480 with a pair of MS-MPC-128G service boards that are tied
> together as an 'ams' (mams-2 & mams-3) service aggregation for reliability.
>
> Occasionally one of them, for no apparent reason, will go offline and then
> back online while logging the following in the 'chassisd' log:
>
> CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on
> (jnxFruContentsIndex 8, jnxFruL1Index 4, jnxFruL2Index 1, jnxFruL3Index 0,
> jnxFruName PIC: MS-MPC-PIC @ 3/0/*, jnxFruType 11, jnxFruSlot 3,
> jnxFruOfflineReason 2, jnxFruLastPowerOff 1052212977, jnxFruLastPowerOn
> 1052213068)
> (as well as a bunch of other stuff).
>
> According to Junos docs, "jnxFruOfflineReason 8" -> "buttonPress(8), --
> offlined by button press"
> But I know that nobody was in the room at the time of those incidents, so
> the button couldn't have been pressed.
>
> I hadn't paid too much attention to this as it was only happening
> occasionally and was either one board or the other. But today there was a
> whole spate of such incidents (20 in less than 45 minutes) and at one point
> it took both MPCs offline at the same time (thus noticeable
> service-interruptus).
>
> In the 'messages' log there are lines that correspond:
>
> /kernel: peer_input_pending_internal:[4506] VKS0 for peer type 22 indx
> 12 reported a sb_state 32 = SBS_CANTRCVMORE
> /kernel: peer_inputs:4766 VKS0 closing connection peer type 22 indx 12
> err 5
> /kernel: pfe_listener_disconnect: conn dropped: listener idx=7,
> tnpaddr=0x13010080, reason: generic peer error
> datapath-traced[3960]: datapath_traced_connection_event_handler:
> Disconnected from MSPMAND
> mspd[3958]: Removed PIC connection state for fpc=3 pic=0
> session=0x827a180
> (FPC Slot 3, PIC Slot 0) ms30 kernel: svcs_ms2_app_sigcore_exit:
> sending UKERN_ST_DOWN (pid=190, td=0xc00000000291f960, sig=6)
> (FPC Slot 3, PIC Slot 0) ms30 mspsmd[178]: mspsmd_connection_shutdown:
> Unexpected shutdown of connection, try reconnecting.
> /kernel: if_pfe_services_health_status: Generating Health status (down)
> msg for ifd : ms-3/0/0
> /kernel: if_pfe_services_health_status: Generating health status (down)
> for AMS member mams-3/0/0
> /kernel: if_pfe_ams_process_single_event: ifd:mams-3/0/0, ev =
> AMS_EV_MEMBER_HSTATUS_DOWN agg_state UP, member_state: ACTIVE,
> member_present_count = 2
> /kernel: if_pfe_ams_process_member_down_event:Starting Discard Timer
> /kernel: aggr_link_op: link mams-3/0/0.1 (lidx=1) detached from bundle
> ams0.1
> /kernel: if_pfe_ams_process_single_event:Done:mams-3/0/0, ev =
> AMS_EV_MEMBER_HSTATUS_DOWN agg_state UP, member_state: DISCARD,
> member_present_count = 2
> /kernel: if_pfe_services_send_lb_options: PEER_BUILD_IPC_SLOT return
> NULL
> last message repeated 4 times
> mib2d[3969]: SNMP_TRAP_LINK_DOWN: ifIndex 641, ifAdminStatus up(1),
> ifOperStatus down(2), ifName ms-3/0/0.0
> mib2d[3969]: SNMP_TRAP_LINK_DOWN: ifIndex 734, ifAdminStatus up(1),
> ifOperStatus down(2), ifName mams-3/0/0.1
> (FPC Slot 3, PIC Slot 0) ms30 kernel: msgring_drain_process: bind
> thread to hwtid (5) cpuid(5)
> (FPC Slot 3, PIC Slot 0) ms30 kernel: Kmernel thread "msgdrainthr5"
> (pid 21832) exited prematurely.
>
> Usually it runs for days at a time without a single one of these
> incidents.
> So I cannot tell if I've got a hardware flakey or a software bug that is
> being triggered by some external events.
>
> Any suggestions? (other than opening a jtac case).
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
--
Michael Gehrmann
Senior Network Engineer - Atlassian
m: +61 407 570 658