[j-nsp] m10i Nastiness Friday night
Dan Rautio
drautio at juniper.net
Mon Aug 17 13:19:29 EDT 2009
This message stands out:
> Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 error detection reg2: ECC multibit
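A multibit ECC error is one the memory controller can detect but not correct, so it is probably worth checking whether it shows up again in older or newer logs. Below is a minimal sketch for scanning saved log text for that line. It assumes the logs have been copied off the box and follow the syslog layout shown in this thread; the function name and regex are only illustrative, not anything JUNOS provides.

```python
import re

# Assumption: lines look like "Aug 14 23:38:51 <host> cfeb ... ECC multibit",
# as in the excerpt quoted in this thread.
ECC_PATTERN = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+cfeb\s+.*\bECC multibit\b"
)


def find_ecc_multibit(lines):
    """Return (timestamp, host) for each CFEB line reporting an ECC multibit error."""
    hits = []
    for line in lines:
        m = ECC_PATTERN.match(line)
        if m:
            hits.append((m.group("stamp"), m.group("host")))
    return hits


sample = [
    "Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 error ack count = 0",
    "Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 error detection reg2: ECC multibit",
]
print(find_ecc_multibit(sample))
```

A single hit like the one here does not say much on its own; repeated hits over time would point more strongly at the CFEB memory.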
> -----Original Message-----
> From: juniper-nsp-bounces at puck.nether.net [mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of Nilesh Khambal
> Sent: Monday, August 17, 2009 10:57 AM
> To: Clue Store
> Cc: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] m10i Nastiness Friday night
>
> It looks like CFEB dumped core and restarted. Please open a JTAC case
> and let them figure out what went wrong with CFEB. Please gather all
> logs around the time of the problem. Usually the following logs are a
> good start.
>
> - show log messages[.(0-9).gz] (From RE)
> - show syslog messages (from CFEB)
> - show nvram (from CFEB).
> - CFEB coredump file generated under "/var/tmp"
> - Any other surrounding information such as temperature, memory, and CPU
> information about RE and CFEB around the time of the problem.
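If the RE log files are copied off the box for the case, the `messages[.(0-9).gz]` pattern in the list above can be matched with a short script. This is only a sketch: it assumes the files sit in a local directory (the directory name is up to you), and the helper names are made up for illustration.

```python
import gzip
import re
from pathlib import Path

# Matches "messages" plus the rotated archives "messages.0.gz" .. "messages.9.gz".
ROTATED = re.compile(r"^messages(?:\.(\d)\.gz)?$")


def messages_files(log_dir):
    """Return the messages files found in log_dir, newest first."""
    found = []
    for path in Path(log_dir).iterdir():
        m = ROTATED.match(path.name)
        if m:
            # The live file sorts ahead of every rotated archive.
            order = -1 if m.group(1) is None else int(m.group(1))
            found.append((order, path))
    return [p for _, p in sorted(found)]


def read_lines(path):
    """Read a plain or gzip-compressed log file as a list of text lines."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", errors="replace") as fh:
        return fh.read().splitlines()
```

Calling `messages_files()` on the directory and passing each result to `read_lines()` gives the log text in newest-to-oldest file order, which makes it easier to pull out everything around the time of the crash.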
>
> Given the old version of code you are running on the box, this may be a
> known issue fixed in a later release such as 8.5, which you are running on
> the other box. Let JTAC analyze that.
>
> Thanks,
> Nilesh.
>
> Clue Store wrote:
> > Hi All,
> >
> > Last Friday we had some nastiness on one of our m10i's. As I am not a
> > Juniper expert, I was wondering if someone could decipher the log messages
> > and determine whether it is possibly a CFEB issue or just a fluke Junos issue,
> > and whether I should do anything or let it be and see if it does it again. I
> > have another m10i running 8.5, so I am thinking of just upgrading this box
> > to the same release as my other one, but I'd like to hear what some of you
> > on the list think.
> >
> > TIA,
> > Clue
> >
> > Hostname: JuniperM10i-HMNDLAMA
> > Model: m10i
> > JUNOS Base OS boot [8.0R2.8]
> > JUNOS Base OS Software Suite [8.0R2.8]
> > JUNOS Kernel Software Suite [8.0R2.8]
> > JUNOS Packet Forwarding Engine Support (M7i/M10i) [8.0R2.8]
> > JUNOS Routing Software Suite [8.0R2.8]
> > JUNOS Online Documentation [8.0R2.8]
> >
> >
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 machine check caused by error on the Processor Bus
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 PCI status register: 0x0020, error detect register 1: 0x00, 2: 0x08
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 error ack count = 0
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 error address: 0x0f3827f8
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 Processor bus error status register: 0x52
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb transfer type 0b01010, transfer size 2
> > Aug 14 23:38:51 JuniperM10i-HMNDLAMA cfeb mpc106 error detection reg2: ECC multibit
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb ^B
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb last message repeated 6 times
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Context: Interrupt Level (0)
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Registers:
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R00: 0x00000446 R01: 0x00799450 R02: 0x00000000 R03: 0x4f3827fc
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R04: 0x00000552 R05: 0x00000000 R06: 0x007994a0 R07: 0x00000004
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R08: 0x00000548 R09: 0x0017f48b R10: 0x00000002 R11: 0xb0c7d8ec
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R12: 0x28002044 R13: 0x02420020 R14: 0xf1ae2100 R15: 0x82600020
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R16: 0x442104c2 R17: 0x2248000b R18: 0x00670000 R19: 0x00670000
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R20: 0x00670000 R21: 0x006ce5a0 R22: 0x007902d0 R23: 0x00670000
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R24: 0x00000002 R25: 0x00000004 R26: 0x0080bd40 R27: 0x0000ffff
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb R28: 0x00000001 R29: 0x00000001 R30: 0x4f38271c R31: 0x4f382714
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb MSR: 0x00089030 CTR: 0x00000239 Link:0x002e34c8 SP: 0x00799450
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb CCR: 0x48002028 XER: 0x20000000 PC: 0x00460320
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb DSISR: 0x00000000 DAR: 0x00000000 K_MSR: 0x00000030
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Stack Traceback:
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 01: sp = 0x00799450, pc = 0x0000c001
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 02: sp = 0x00799468, pc = 0x002e4d74
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 03: sp = 0x00799498, pc = 0x002e35e0
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 04: sp = 0x007994b8, pc = 0x002e3bb0
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 05: sp = 0x007994c0, pc = 0x00058818
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 06: sp = 0x007994d8, pc = 0x0003df34
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 07: sp = 0x00799500, pc = 0x003b4488
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 08: sp = 0x00799530, pc = 0x003b4660
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 09: sp = 0x00799548, pc = 0x003b3ed0
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 10: sp = 0x007995c8, pc = 0x003b3d30
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 11: sp = 0x007995e8, pc = 0x000b9f6c
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 12: sp = 0x00799610, pc = 0x000b8928
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 13: sp = 0x00799628, pc = 0x00448518
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 14: sp = 0x00799678, pc = 0x00442d00
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 15: sp = 0x00799698, pc = 0x0003a500
> > Aug 14 23:38:52 JuniperM10i-HMNDLAMA cfeb Frame 16: sp = 0x007996b0, pc = 0x0003b268
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA /kernel: rdp keepalive expired, connection dropped - src 1:1021 dest 2:15360
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: Major alarm set, CFEB not online, the box is not forwarding
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm set: CFEB color=RED, class=CHASSIS, reason=CFEB not online, the box is not forwarding
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 43
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: CFEB connection lost
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(0)
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA mib2d[3111]: SNMP_TRAP_LINK_DOWN: ifIndex 77, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/0
> >
> > (Lots of BGP notifications due to interface down issues)
> >
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA snmpd[3132]: SNMPD_SEND_FAILURE: trap_io_send_trap_now: send to (207.29.223.55) failure: Network is down
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: shutting down chassisd connection: chassisd ipc pipe read error
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: craftd_user_conn_shutdown: socket 5, errno = 0
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: chassisd connection succeeded after 0 retries
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: chassisd connection succeeded after 0 retries
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA mib2d[3111]: SNMP_TRAP_LINK_DOWN: ifIndex 80, ifAdminStatus down(2), ifOperStatus down(2), ifName ge-1/0/0.462
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: resending alarm state
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 43
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm set: CFEB color=RED, class=CHASSIS, reason=CFEB not online, the box is not forwarding
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm set: RE color=RED, class=CHASSIS, reason=Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 43
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm set: RE color=RED, class=CHASSIS, reason=Host 1 fxp0: Ethernet Link Down
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 43
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA /kernel: rdp keepalive expired, connection dropped - src 1:1020 dest 2:15361
> > Aug 14 23:38:56 JuniperM10i-HMNDLAMA /kernel: pfe_listener_disconnect: conn dropped: listener idx=0, tnpaddr=0x2, reason: socket error
> > Aug 14 23:39:41 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_BLOWERS_SPEED_FULL: Fans and impellers being set to full speed [system warm]
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 6, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName CFEB 0, jnxFruType 4, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(0)
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(1)
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO: ifdev_detach(pseudo devices: all)
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA craftd[2999]: Major alarm cleared, Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm cleared: RE color=RED, class=CHASSIS, reason=Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 44
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA cfeb CM: ALARM SET: (Major) Slot 0: CFEB not online, the box is not forwarding
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA cfeb CM: ALARM SET: (Major) Slot 0: Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:09 JuniperM10i-HMNDLAMA cfeb CM: ALARM SET: (Major) Slot 1: Host 1 fxp0: Ethernet Link Down
> > Aug 14 23:40:10 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 0
> > Aug 14 23:40:10 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 1
> > Aug 14 23:40:12 JuniperM10i-HMNDLAMA craftd[2999]: Major alarm set, Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:12 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm set: RE color=RED, class=CHASSIS, reason=Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:12 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 43
> > Aug 14 23:40:12 JuniperM10i-HMNDLAMA cfeb CM: ALARM CLEAR: Slot 0: Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:17 JuniperM10i-HMNDLAMA craftd[2999]: Major alarm cleared, CFEB not online, the box is not forwarding
> > Aug 14 23:40:17 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm cleared: CFEB color=RED, class=CHASSIS, reason=CFEB not online, the box is not forwarding
> > Aug 14 23:40:17 JuniperM10i-HMNDLAMA craftd[2999]: forwarding display request to chassisd: type = 4, subtype = 44
> > Aug 14 23:40:17 JuniperM10i-HMNDLAMA cfeb CM: ALARM SET: (Major) Slot 0: Host 0 fxp0: Ethernet Link Down
> > Aug 14 23:40:32 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_BLOWERS_SPEED: Fans and impellers are now running at normal speed
> > Aug 14 23:40:33 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 0
> > Aug 14 23:40:55 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 1
> > Aug 14 23:40:57 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 1, jnxFruL2Index 1, jnxFruL3Index 0, jnxFruName PIC: 1x G/E, 1000 BASE-SX @ 0/0/*, jnxFruType 11, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
> > Aug 14 23:40:57 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2, jnxFruL2Index 1, jnxFruL3Index 0, jnxFruName PIC: 1x G/E, 1000 BASE-SX @ 1/0/*, jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
> > Aug 14 23:40:57 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-0/0/0
> > Aug 14 23:40:58 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-1/0/0
> > Aug 14 23:40:58 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC: @ 0/*/*, jnxFruType 3, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
> >
> > (BGP notifications that peers are responding)
> >
> >
> > Aug 14 23:42:22 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_BLOWERS_SPEED_FULL: Fans and impellers being set to full speed [system warm]
> > Aug 14 23:43:22 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_BLOWERS_SPEED: Fans and impellers are now running at normal speed
> > Aug 14 23:44:02 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_BLOWERS_SPEED_FULL: Fans and impellers being set to full speed [system warm]
> > Aug 14 23:44:37 JuniperM10i-HMNDLAMA chassisd[2997]: CHASSISD_BLOWERS_SPEED: Fans and impellers are now running at normal speed
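For the JTAC case it may help to state how long the box was not forwarding. The sketch below takes the two alarmd entries for "CFEB not online" from the log above (joined back onto single lines) and computes the time between the alarm being set and cleared. The function name is hypothetical, and the year is an assumption taken from the date of this thread, since syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches the alarmd "Alarm set" / "Alarm cleared" entries for the CFEB alarm.
ALARM = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+alarmd\[\d+\]:\s+"
    r"Alarm (?P<state>set|cleared): CFEB .*reason=CFEB not online"
)


def cfeb_outage_seconds(lines, year=2009):
    """Seconds between the first CFEB alarm 'set' and the next 'cleared', or None."""
    down = None
    for line in lines:
        m = ALARM.match(line)
        if not m:
            continue
        # Assumption: all entries fall in the same year, supplied by the caller.
        when = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        if m.group("state") == "set" and down is None:
            down = when
        elif m.group("state") == "cleared" and down is not None:
            return (when - down).total_seconds()
    return None


sample = [
    "Aug 14 23:38:56 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm set: CFEB color=RED, "
    "class=CHASSIS, reason=CFEB not online, the box is not forwarding",
    "Aug 14 23:40:17 JuniperM10i-HMNDLAMA alarmd[2998]: Alarm cleared: CFEB color=RED, "
    "class=CHASSIS, reason=CFEB not online, the box is not forwarding",
]
print(cfeb_outage_seconds(sample))  # 81.0
```

This measures only the alarm window. The machine check itself is logged at 23:38:51, a few seconds before the alarm is raised, so the real forwarding gap is probably slightly longer.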
> > _______________________________________________
> > juniper-nsp mailing list juniper-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/juniper-nsp