[j-nsp] Hardware issue with M7i

Brendan Mannella bmannella at teraswitch.com
Wed Aug 12 11:22:33 EDT 2009

All, 

My Juniper M7i suddenly rebooted today, and the logs show the following. Can someone tell me exactly what failed? It appears the onboard hard disk was the issue, but I just wanted to verify.

Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=18 e=03 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=28 e=03 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=09 e=09 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1a e=09 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=2a e=09 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0b e=0b 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1d e=1d 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=2c e=1d 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0d e=0d 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1f e=1f 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=00 e=1f 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0f e=0f 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=20 e=0f 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=01 e=01 
Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting 
Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices .. 
Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_TRACE_FAILED: bgp_keepalive_timeout: peer 204.16.241.252 (Internal AS 20326) last checked 18 last recv'd 17 last sent 17 last keepalive 29RPD_TRACE_FAILED: Unable to write to trace file /var/log/bgp 
Aug 12 10:08:11  ibr1.pit cfeb CM: ALARM SET: (Major) RE chassis socket closed abruptly 
Aug 12 10:08:11  ibr1.pit cfeb PFEMAN: Master socket closed 
Aug 12 10:08:11  ibr1.pit cfeb CM: Routing engine CM reconnection succeeded after 3 tries 
Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_hold_timeout: NOTIFICATION sent to 208.4.47.65 (External AS 1239): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 208.4.47.65 (External AS 1239), socket buffer sndcc: 19 rcvcc: 1623 TCP state: 4, snd_una: 2628552212 snd_nxt: 2628552231 snd_wnd: 32350 rcv_nxt: 2226880899 rcv_adv: 2226895660, hold timer 0 
Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 208.4.47.65 (External AS 1239) old state Established event HoldTime new state Idle 
Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_SCHED_SLIP: 75 sec scheduler slip, user: 0 sec 0 usec, system: 0 sec, 2769 usec 
Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_PPM_WRITE_ERROR: ppm_send: write error on pipe to ppmd (Broken pipe) 
Aug 12 10:08:11  ibr1.pit cfeb CM: ALARM CLEAR: RE chassis socket closed abruptly 
Aug 12 10:08:11  ibr1.pit /kernel: pfe_listener_disconnect: conn dropped: listener idx=0, tnpaddr=0x2, reason: socket error 
Aug 12 10:08:11  ibr1.pit craftd[3121]:  Minor alarm set, Host 0 hard-disk drive error 
Aug 12 10:08:11  ibr1.pit alarmd[3120]: Alarm set: RE color=YELLOW, class=CHASSIS, reason=Host 0 hard-disk drive error 
Aug 12 10:08:11  ibr1.pit craftd[3121]: forwarding display request to chassisd: type = 4, subtype = 43 
Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_recv: read from peer 204.16.241.252 (Internal AS 20326) failed: Connection reset by peer 
Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.241.252 (Internal AS 20326) old state Established event Restart new state Idle 
Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_send: sending 24 bytes to 204.16.242.162 (Internal AS 20326) failed: Broken pipe 
Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_send: sending 24 bytes to 204.16.242.163 (Internal AS 20326) failed: Broken pipe 
Aug 12 10:08:15  ibr1.pit /kernel: ata0:  Failed to reset devices. 
Aug 12 10:08:15  ibr1.pit /kernel: ad1: removed from configuration due to failure 
Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 1 seconds) 
Aug 12 10:08:15  ibr1.pit /kernel: ata0:  device dissapeared! 2 ata0: Finished resetting devices .. (after 1 seconds) 
Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 1 seconds) 
Aug 12 10:08:15  ibr1.pit last message repeated 5 times 
Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 2 seconds) 
Aug 12 10:08:15  ibr1.pit last message repeated 16 times 
Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 3 seconds) 
Aug 12 10:08:15  ibr1.pit last message repeated 16 times 
Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 4 seconds) 
Aug 12 10:08:15  ibr1.pit last message repeated 16 times 
Aug 12 10:08:21  ibr1.pit cfeb Frame 06: sp = 0x0113b9b8, pc = 0x000276b8 
Aug 12 10:08:22  ibr1.pit tnp.tftpd[30571]: open: /var/crash/core-CSBR0.core.4 Device not configured 
Aug 12 10:08:28  ibr1.pit xntpd[3122]: sendto(129.6.15.28): No route to host 
Aug 12 10:08:35  ibr1.pit /kernel: fxp1: link media DOWN 10Mb / half-duplex 
Aug 12 10:08:35  ibr1.pit mib2d[3125]: SNMP_TRAP_LINK_DOWN: ifIndex 2, ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1 
Aug 12 10:08:36  ibr1.pit /kernel: fxp1: media DOWN 100Mb / full-duplex 
Aug 12 10:08:37  ibr1.pit /kernel: fxp1: link UP 100Mb / full-duplex 
Aug 12 10:08:57  ibr1.pit /kernel: fxp1: link media DOWN 10Mb / half-duplex 
Aug 12 10:08:57  ibr1.pit mib2d[3125]: SNMP_TRAP_LINK_DOWN: ifIndex 2, ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1 
Aug 12 10:08:58  ibr1.pit /kernel: fxp1: media DOWN 100Mb / full-duplex 
Aug 12 10:08:59  ibr1.pit /kernel: fxp1: link UP 100Mb / full-duplex 
Aug 12 10:09:07  ibr1.pit rpd[3126]: task_connect: task BGP_20326.204.16.242.163+179 addr 204.16.242.163+179: No route to host 
Aug 12 10:09:07  ibr1.pit rpd[3126]: bgp_connect_start: connect 204.16.242.163 (Internal AS 20326): No route to host 
Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 6, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName CFEB, jnxFruType 4, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 434638088, jnxFruLastPowerOn 434638088) 
Aug 12 10:09:07  ibr1.pit craftd[30568]: Minor alarm cleared, Host 0 hard-disk drive error 
Aug 12 10:09:07  ibr1.pit craftd[30568]: forwarding display request to chassisd: type = 4, subtype = 44 
Aug 12 10:09:07  ibr1.pit cfeb CM: ALARM SET: (Major) Slot 0: CFEB not online, the box is not forwarding 
Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(0) 
Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(1) 
Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO: ifdev_detach(pseudo devices: all) 
Aug 12 10:09:07  ibr1.pit alarmd[30569]: Alarm cleared: RE color=YELLOW, class=CHASSIS, reason=Host 0 hard-disk drive error 
Aug 12 10:09:07  ibr1.pit cfeb CM: ALARM SET: (Minor) Slot 0: Host 0 hard-disk drive error 
Aug 12 10:09:08  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 0 
Aug 12 10:09:08  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 1 
Aug 12 10:09:11  ibr1.pit cfeb CM: ALARM CLEAR: Slot 0: Host 0 hard-disk drive error 
Aug 12 10:09:16  ibr1.pit craftd[30568]: Major alarm cleared, CFEB not online, the box is not forwarding 
Aug 12 10:09:16  ibr1.pit alarmd[30569]: Alarm cleared: CFEB color=RED, class=CHASSIS, reason=CFEB not online, the box is not forwarding 
Aug 12 10:09:16  ibr1.pit craftd[30568]: forwarding display request to chassisd: type = 4, subtype = 44 
Aug 12 10:09:17  ibr1.pit cfeb CM: ALARM CLEAR: Slot 0: CFEB not online, the box is not forwarding 
Aug 12 10:09:31  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 0 
Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 1 
Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2, jnxFruL2Index 3, jnxFruL3Index 0, jnxFruName PIC: 1x Tunnel @ 1/2/*, jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0) 
Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2, jnxFruL2Index 4, jnxFruL3Index 0, jnxFruName PIC: 1x G/E, 1000 BASE @ 1/3/*, jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0) 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for pd-1/2/0 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for pe-1/2/0 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for gr-1/2/0 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ip-1/2/0 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for vt-1/2/0 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for mt-1/2/0 
Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for lt-1/2/0 
Aug 12 10:09:56  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-1/3/0 
Aug 12 10:09:57  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 0/*/*, jnxFruType 3, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 434643122) 
Aug 12 10:09:57  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 2, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 1/*/*, jnxFruType 3, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 434643140) 
Aug 12 10:10:18  ibr1.pit rpd[3126]: RPD_SCHED_SLIP: 7 sec scheduler slip, user: 6 sec 626912 usec, system: 0 sec, 236833 usec 
Aug 12 10:10:57  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.242.163 (Internal AS 20326) old state OpenConfirm event RecvKeepAlive new state Established 
Aug 12 10:11:00  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.242.162 (Internal AS 20326) old state OpenConfirm event RecvKeepAlive new state Established 
Aug 12 10:11:05  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 208.4.47.65 (External AS 1239) old state OpenConfirm event RecvKeepAlive new state Established 
Aug 12 10:11:11  ibr1.pit rpd[3126]: 204.16.241.252 (Internal AS 20326): reseting pending active connection 
Aug 12 10:11:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.241.252 (Internal AS 20326) old state OpenConfirm event RecvKeepAlive new state Established 
Aug 12 10:12:59  ibr1.pit rpd[3126]: bgp_nexthop_sanity: peer 204.16.241.252 (Internal AS 20326) next hop 204.16.242.172 local, ignoring routes in this update 
Aug 12 10:12:59  ibr1.pit rpd[3126]: bgp_nexthop_sanity: peer 204.16.241.252 (Internal AS 20326) next hop 204.16.242.172 local, ignoring routes in this update 
Aug 12 10:21:58  ibr1.pit /kernel: ad1: ata_command: timeout waiting for intr 
Aug 12 10:21:58  ibr1.pit /kernel: ad1: error executing command b0ata0: resetting devices .. 
Aug 12 10:21:58  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 0 seconds) 
Aug 12 10:21:58  ibr1.pit /kernel: ata0: WARNING: active changed while DKIOCMDin progress 
Aug 12 10:21:58  ibr1.pit smartd[3139]: atareadsmartthresholds: ioctl: Device busy 
Aug 12 10:21:58  ibr1.pit smartd[3139]: /dev/ad1a: Device smart_check, non zero return from atacheckdevice 
