[j-nsp] Hardware issue with M7i

Shiva Shankar shankarks at gmail.com
Mon Aug 17 09:52:17 EDT 2009


It seems that the hard drive is faulty (meaning the RE). Don't you have a
Compact Flash (ad0) installed on the RE? If not, you need to replace the RE.

Cheers

On Wed, Aug 12, 2009 at 4:22 PM, Brendan Mannella
<bmannella at teraswitch.com> wrote:

>
>
> All,
>
>
>
> My Juniper M7i suddenly rebooted today. The logs show the following. Can
> someone tell me what exactly failed? It appears the onboard hard disk was
> the issue, but I just wanted to verify.
>
>
>
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=18
> e=03
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=28
> e=03
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=09
> e=09
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1a
> e=09
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=2a
> e=09
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0b
> e=0b
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1d
> e=1d
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=2c
> e=1d
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0d
> e=0d
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1f
> e=1f
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=00
> e=1f
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0f
> e=0f
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=20
> e=0f
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=01
> e=01
> Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_TRACE_FAILED:
> bgp_keepalive_timeout: peer 204.16.241.252 (Internal AS 20326) last checked
> 18 last recv'd 17 last sent 17 last keepalive 29RPD_TRACE_FAILED: Unable to
> write to trace file /var/log/bgp
> Aug 12 10:08:11  ibr1.pit cfeb CM: ALARM SET: (Major) RE chassis socket
> closed abruptly
> Aug 12 10:08:11  ibr1.pit cfeb PFEMAN: Master socket closed
> Aug 12 10:08:11  ibr1.pit cfeb CM: Routing engine CM reconnection succeeded
> after 3 tries
> Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_hold_timeout: NOTIFICATION sent to
> 208.4.47.65 (External AS 1239): code 4 (Hold Timer Expired Error), Reason:
> holdtime expired for 208.4.47.65 (External AS 1239), socket buffer sndcc: 19
> rcvcc: 1623 TCP state: 4, snd_una: 2628552212 snd_nxt: 2628552231 snd_wnd:
> 32350 rcv_nxt: 2226880899 rcv_adv: 2226895660, hold timer 0
> Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event:
> peer 208.4.47.65 (External AS 1239) old state Established event HoldTime new
> state Idle
> Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_SCHED_SLIP: 75 sec scheduler slip,
> user: 0 sec 0 usec, system: 0 sec, 2769 usec
> Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_PPM_WRITE_ERROR: ppm_send: write
> error on pipe to ppmd (Broken pipe)
> Aug 12 10:08:11  ibr1.pit cfeb CM: ALARM CLEAR: RE chassis socket closed
> abruptly
> Aug 12 10:08:11  ibr1.pit /kernel: pfe_listener_disconnect: conn dropped:
> listener idx=0, tnpaddr=0x2, reason: socket error
> Aug 12 10:08:11  ibr1.pit craftd[3121]:  Minor alarm set, Host 0 hard-disk
> drive error
> Aug 12 10:08:11  ibr1.pit alarmd[3120]: Alarm set: RE color=YELLOW,
> class=CHASSIS, reason=Host 0 hard-disk drive error
> Aug 12 10:08:11  ibr1.pit craftd[3121]: forwarding display request to
> chassisd: type = 4, subtype = 43
> Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_recv: read from peer
> 204.16.241.252 (Internal AS 20326) failed: Connection reset by peer
> Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event:
> peer 204.16.241.252 (Internal AS 20326) old state Established event Restart
> new state Idle
> Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_send: sending 24 bytes to
> 204.16.242.162 (Internal AS 20326) failed: Broken pipe
> Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_send: sending 24 bytes to
> 204.16.242.163 (Internal AS 20326) failed: Broken pipe
> Aug 12 10:08:15  ibr1.pit /kernel: ata0:  Failed to reset devices.
> Aug 12 10:08:15  ibr1.pit /kernel: ad1: removed from configuration due to
> failure
> Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices ..
> (after 1 seconds)
> Aug 12 10:08:15  ibr1.pit /kernel: ata0:  device dissapeared! 2 ata0:
> Finished resetting devices .. (after 1 seconds)
> Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices ..
> (after 1 seconds)
> Aug 12 10:08:15  ibr1.pit last message repeated 5 times
> Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices ..
> (after 2 seconds)
> Aug 12 10:08:15  ibr1.pit last message repeated 16 times
> Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices ..
> (after 3 seconds)
> Aug 12 10:08:15  ibr1.pit last message repeated 16 times
> Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices ..
> (after 4 seconds)
> Aug 12 10:08:15  ibr1.pit last message repeated 16 times
> Aug 12 10:08:21  ibr1.pit cfeb Frame 06: sp = 0x0113b9b8, pc = 0x000276b8
> Aug 12 10:08:22  ibr1.pit tnp.tftpd[30571]: open:
> /var/crash/core-CSBR0.core.4 Device not configured
> Aug 12 10:08:28  ibr1.pit xntpd[3122]: sendto(129.6.15.28): No route to
> host
> Aug 12 10:08:35  ibr1.pit /kernel: fxp1: link media DOWN 10Mb / half-duplex
> Aug 12 10:08:35  ibr1.pit mib2d[3125]: SNMP_TRAP_LINK_DOWN: ifIndex 2,
> ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1
> Aug 12 10:08:36  ibr1.pit /kernel: fxp1: media DOWN 100Mb / full-duplex
> Aug 12 10:08:37  ibr1.pit /kernel: fxp1: link UP 100Mb / full-duplex
> Aug 12 10:08:57  ibr1.pit /kernel: fxp1: link media DOWN 10Mb / half-duplex
> Aug 12 10:08:57  ibr1.pit mib2d[3125]: SNMP_TRAP_LINK_DOWN: ifIndex 2,
> ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1
> Aug 12 10:08:58  ibr1.pit /kernel: fxp1: media DOWN 100Mb / full-duplex
> Aug 12 10:08:59  ibr1.pit /kernel: fxp1: link UP 100Mb / full-duplex
> Aug 12 10:09:07  ibr1.pit rpd[3126]: task_connect: task
> BGP_20326.204.16.242.163+179 addr 204.16.242.163+179: No route to host
> Aug 12 10:09:07  ibr1.pit rpd[3126]: bgp_connect_start: connect
> 204.16.242.163 (Internal AS 20326): No route to host
> Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap
> generated: FRU power on (jnxFruContentsIndex 6, jnxFruL1Index 1,
> jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName CFEB, jnxFruType 4, jnxFruSlot
> 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 434638088, jnxFruLastPowerOn
> 434638088)
> Aug 12 10:09:07  ibr1.pit craftd[30568]: Minor alarm cleared, Host 0
> hard-disk drive error
> Aug 12 10:09:07  ibr1.pit craftd[30568]: forwarding display request to
> chassisd: type = 4, subtype = 44
> Aug 12 10:09:07  ibr1.pit cfeb CM: ALARM SET: (Major) Slot 0: CFEB not
> online, the box is not forwarding
> Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_FPC:
> ifdev_detach(0)
> Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_FPC:
> ifdev_detach(1)
> Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO:
> ifdev_detach(pseudo devices: all)
> Aug 12 10:09:07  ibr1.pit alarmd[30569]: Alarm cleared: RE color=YELLOW,
> class=CHASSIS, reason=Host 0 hard-disk drive error
> Aug 12 10:09:07  ibr1.pit cfeb CM: ALARM SET: (Minor) Slot 0: Host 0
> hard-disk drive error
> Aug 12 10:09:08  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT:
> fpc_m40_recv_restart: restarted FPC 0
> Aug 12 10:09:08  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT:
> fpc_m40_recv_restart: restarted FPC 1
> Aug 12 10:09:11  ibr1.pit cfeb CM: ALARM CLEAR: Slot 0: Host 0 hard-disk
> drive error
> Aug 12 10:09:16  ibr1.pit craftd[30568]: Major alarm cleared, CFEB not
> online, the box is not forwarding
> Aug 12 10:09:16  ibr1.pit alarmd[30569]: Alarm cleared: CFEB color=RED,
> class=CHASSIS, reason=CFEB not online, the box is not forwarding
> Aug 12 10:09:16  ibr1.pit craftd[30568]: forwarding display request to
> chassisd: type = 4, subtype = 44
> Aug 12 10:09:17  ibr1.pit cfeb CM: ALARM CLEAR: Slot 0: CFEB not online,
> the box is not forwarding
> Aug 12 10:09:31  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT:
> scb_recv_slot_attach: attached FPC 0
> Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT:
> scb_recv_slot_attach: attached FPC 1
> Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap
> generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2,
> jnxFruL2Index 3, jnxFruL3Index 0, jnxFruName PIC: 1x Tunnel @ 1/2/*,
> jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0,
> jnxFruLastPowerOn 0)
> Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap
> generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2,
> jnxFruL2Index 4, jnxFruL3Index 0, jnxFruName PIC: 1x G/E, 1000 BASE @ 1/3/*,
> jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0,
> jnxFruLastPowerOn 0)
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for pd-1/2/0
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for pe-1/2/0
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for gr-1/2/0
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for ip-1/2/0
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for vt-1/2/0
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for mt-1/2/0
> Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for lt-1/2/0
> Aug 12 10:09:56  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE:
> create_pics: created interface device for ge-1/3/0
> Aug 12 10:09:57  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap
> generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 1,
> jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 0/*/*, jnxFruType 3,
> jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn
> 434643122)
> Aug 12 10:09:57  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap
> generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 2,
> jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 1/*/*, jnxFruType 3,
> jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn
> 434643140)
> Aug 12 10:10:18  ibr1.pit rpd[3126]: RPD_SCHED_SLIP: 7 sec scheduler slip,
> user: 6 sec 626912 usec, system: 0 sec, 236833 usec
> Aug 12 10:10:57  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event:
> peer 204.16.242.163 (Internal AS 20326) old state OpenConfirm event
> RecvKeepAlive new state Established
> Aug 12 10:11:00  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event:
> peer 204.16.242.162 (Internal AS 20326) old state OpenConfirm event
> RecvKeepAlive new state Established
> Aug 12 10:11:05  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event:
> peer 208.4.47.65 (External AS 1239) old state OpenConfirm event
> RecvKeepAlive new state Established
> Aug 12 10:11:11  ibr1.pit rpd[3126]: 204.16.241.252 (Internal AS 20326):
> reseting pending active connection
> Aug 12 10:11:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event:
> peer 204.16.241.252 (Internal AS 20326) old state OpenConfirm event
> RecvKeepAlive new state Established
> Aug 12 10:12:59  ibr1.pit rpd[3126]: bgp_nexthop_sanity: peer
> 204.16.241.252 (Internal AS 20326) next hop 204.16.242.172 local, ignoring
> routes in this update
> Aug 12 10:12:59  ibr1.pit rpd[3126]: bgp_nexthop_sanity: peer
> 204.16.241.252 (Internal AS 20326) next hop 204.16.242.172 local, ignoring
> routes in this update
> Aug 12 10:21:58  ibr1.pit /kernel: ad1: ata_command: timeout waiting for
> intr
> Aug 12 10:21:58  ibr1.pit /kernel: ad1: error executing command b0ata0:
> resetting devices ..
> Aug 12 10:21:58  ibr1.pit /kernel: ata0: Finished resetting devices ..
> (after 0 seconds)
> Aug 12 10:21:58  ibr1.pit /kernel: ata0: WARNING: active changed while
> DKIOCMDin progress
> Aug 12 10:21:58  ibr1.pit smartd[3139]: atareadsmartthresholds: ioctl:
> Device busy
> Aug 12 10:21:58  ibr1.pit smartd[3139]: /dev/ad1a: Device smart_check, non
> zero return from atacheckdevice
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp

