[j-nsp] Hardware issue with M7i

Steve Steiner ntwrkguru at gmail.com
Tue Aug 18 09:28:20 EDT 2009


To add to this (late): if the box was running from the HDD and the HDD
failed, it will reboot.  My guess is that you had a CF installed, but the
box had booted from the HDD.
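
One quick way to check which device the kernel was complaining about is to
tally the ATA error lines per device in the messages file. Below is a rough
Python sketch, assuming the log format quoted further down; the function name
count_ata_errors and the regex are illustrative only, not anything shipped
with JUNOS.

```python
import re
from collections import Counter

# Matches kernel ATA error lines such as
#   "... /kernel: ad1: timeout sending command=ca s=18 e=03"
# The three message fragments are taken from the quoted log; other
# JUNOS releases may word them differently (assumption).
ATA_ERROR = re.compile(
    r"/kernel: (ad\d+): "
    r"(timeout sending command|error executing command|removed from configuration)"
)

def count_ata_errors(lines):
    """Return a Counter mapping device name (e.g. 'ad1') to error-line count."""
    counts = Counter()
    for line in lines:
        match = ATA_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# A few lines copied from the quoted log, used here as sample input.
sample = [
    "Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..",
    "Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=18 e=03",
    "Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting",
    "Aug 12 10:08:15  ibr1.pit /kernel: ad1: removed from configuration due to failure",
]
print(count_ata_errors(sample))
```

Going by this thread, ad0 is the CF and the alarms tie ad1 to the hard disk,
so errors logged only against ad1 would be consistent with a disk failure
rather than a CF failure.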

On Mon, Aug 17, 2009 at 9:52 AM, Shiva Shankar <shankarks at gmail.com> wrote:

> It seems that the hard drive is faulty (meaning the RE). Don't you have a
> Compact Flash (ad0) installed on the RE? If not, you need to replace the RE.
>
> Cheers
>
> On Wed, Aug 12, 2009 at 4:22 PM, Brendan Mannella
> <bmannella at teraswitch.com> wrote:
>
> >
> >
> > All,
> >
> >
> >
> > My Juniper M7i suddenly rebooted today. The logs show the following. Can
> > someone tell me what exactly failed? It appears the onboard hard disk was
> > the issue, but I just wanted to verify.
> >
> >
> >
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=18 e=03
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=28 e=03
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=09 e=09
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1a e=09
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=2a e=09
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0b e=0b
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1d e=1d
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=2c e=1d
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0d e=0d
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=1f e=1f
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=00 e=1f
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=0f e=0f
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=20 e=0f
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: timeout sending command=ca s=01 e=01
> > Aug 12 10:08:11  ibr1.pit /kernel: ad1: error executing command - resetting
> > Aug 12 10:08:11  ibr1.pit /kernel: ata0: resetting devices ..
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_TRACE_FAILED: bgp_keepalive_timeout: peer 204.16.241.252 (Internal AS 20326) last checked 18 last recv'd 17 last sent 17 last keepalive 29RPD_TRACE_FAILED: Unable to write to trace file /var/log/bgp
> > Aug 12 10:08:11  ibr1.pit cfeb CM: ALARM SET: (Major) RE chassis socket closed abruptly
> > Aug 12 10:08:11  ibr1.pit cfeb PFEMAN: Master socket closed
> > Aug 12 10:08:11  ibr1.pit cfeb CM: Routing engine CM reconnection succeeded after 3 tries
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_hold_timeout: NOTIFICATION sent to 208.4.47.65 (External AS 1239): code 4 (Hold Timer Expired Error), Reason: holdtime expired for 208.4.47.65 (External AS 1239), socket buffer sndcc: 19 rcvcc: 1623 TCP state: 4, snd_una: 2628552212 snd_nxt: 2628552231 snd_wnd: 32350 rcv_nxt: 2226880899 rcv_adv: 2226895660, hold timer 0
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 208.4.47.65 (External AS 1239) old state Established event HoldTime new state Idle
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_SCHED_SLIP: 75 sec scheduler slip, user: 0 sec 0 usec, system: 0 sec, 2769 usec
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_PPM_WRITE_ERROR: ppm_send: write error on pipe to ppmd (Broken pipe)
> > Aug 12 10:08:11  ibr1.pit cfeb CM: ALARM CLEAR: RE chassis socket closed abruptly
> > Aug 12 10:08:11  ibr1.pit /kernel: pfe_listener_disconnect: conn dropped: listener idx=0, tnpaddr=0x2, reason: socket error
> > Aug 12 10:08:11  ibr1.pit craftd[3121]:  Minor alarm set, Host 0 hard-disk drive error
> > Aug 12 10:08:11  ibr1.pit alarmd[3120]: Alarm set: RE color=YELLOW, class=CHASSIS, reason=Host 0 hard-disk drive error
> > Aug 12 10:08:11  ibr1.pit craftd[3121]: forwarding display request to chassisd: type = 4, subtype = 43
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_recv: read from peer 204.16.241.252 (Internal AS 20326) failed: Connection reset by peer
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.241.252 (Internal AS 20326) old state Established event Restart new state Idle
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_send: sending 24 bytes to 204.16.242.162 (Internal AS 20326) failed: Broken pipe
> > Aug 12 10:08:11  ibr1.pit rpd[3126]: bgp_send: sending 24 bytes to 204.16.242.163 (Internal AS 20326) failed: Broken pipe
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0:  Failed to reset devices.
> > Aug 12 10:08:15  ibr1.pit /kernel: ad1: removed from configuration due to failure
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 1 seconds)
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0:  device dissapeared! 2 ata0: Finished resetting devices .. (after 1 seconds)
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 1 seconds)
> > Aug 12 10:08:15  ibr1.pit last message repeated 5 times
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 2 seconds)
> > Aug 12 10:08:15  ibr1.pit last message repeated 16 times
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 3 seconds)
> > Aug 12 10:08:15  ibr1.pit last message repeated 16 times
> > Aug 12 10:08:15  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 4 seconds)
> > Aug 12 10:08:15  ibr1.pit last message repeated 16 times
> > Aug 12 10:08:21  ibr1.pit cfeb Frame 06: sp = 0x0113b9b8, pc = 0x000276b8
> > Aug 12 10:08:22  ibr1.pit tnp.tftpd[30571]: open: /var/crash/core-CSBR0.core.4 Device not configured
> > Aug 12 10:08:28  ibr1.pit xntpd[3122]: sendto(129.6.15.28): No route to host
> > Aug 12 10:08:35  ibr1.pit /kernel: fxp1: link media DOWN 10Mb / half-duplex
> > Aug 12 10:08:35  ibr1.pit mib2d[3125]: SNMP_TRAP_LINK_DOWN: ifIndex 2, ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1
> > Aug 12 10:08:36  ibr1.pit /kernel: fxp1: media DOWN 100Mb / full-duplex
> > Aug 12 10:08:37  ibr1.pit /kernel: fxp1: link UP 100Mb / full-duplex
> > Aug 12 10:08:57  ibr1.pit /kernel: fxp1: link media DOWN 10Mb / half-duplex
> > Aug 12 10:08:57  ibr1.pit mib2d[3125]: SNMP_TRAP_LINK_DOWN: ifIndex 2, ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1
> > Aug 12 10:08:58  ibr1.pit /kernel: fxp1: media DOWN 100Mb / full-duplex
> > Aug 12 10:08:59  ibr1.pit /kernel: fxp1: link UP 100Mb / full-duplex
> > Aug 12 10:09:07  ibr1.pit rpd[3126]: task_connect: task BGP_20326.204.16.242.163+179 addr 204.16.242.163+179: No route to host
> > Aug 12 10:09:07  ibr1.pit rpd[3126]: bgp_connect_start: connect 204.16.242.163 (Internal AS 20326): No route to host
> > Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 6, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName CFEB, jnxFruType 4, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 434638088, jnxFruLastPowerOn 434638088)
> > Aug 12 10:09:07  ibr1.pit craftd[30568]: Minor alarm cleared, Host 0 hard-disk drive error
> > Aug 12 10:09:07  ibr1.pit craftd[30568]: forwarding display request to chassisd: type = 4, subtype = 44
> > Aug 12 10:09:07  ibr1.pit cfeb CM: ALARM SET: (Major) Slot 0: CFEB not online, the box is not forwarding
> > Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(0)
> > Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(1)
> > Aug 12 10:09:07  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO: ifdev_detach(pseudo devices: all)
> > Aug 12 10:09:07  ibr1.pit alarmd[30569]: Alarm cleared: RE color=YELLOW, class=CHASSIS, reason=Host 0 hard-disk drive error
> > Aug 12 10:09:07  ibr1.pit cfeb CM: ALARM SET: (Minor) Slot 0: Host 0 hard-disk drive error
> > Aug 12 10:09:08  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 0
> > Aug 12 10:09:08  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 1
> > Aug 12 10:09:11  ibr1.pit cfeb CM: ALARM CLEAR: Slot 0: Host 0 hard-disk drive error
> > Aug 12 10:09:16  ibr1.pit craftd[30568]: Major alarm cleared, CFEB not online, the box is not forwarding
> > Aug 12 10:09:16  ibr1.pit alarmd[30569]: Alarm cleared: CFEB color=RED, class=CHASSIS, reason=CFEB not online, the box is not forwarding
> > Aug 12 10:09:16  ibr1.pit craftd[30568]: forwarding display request to chassisd: type = 4, subtype = 44
> > Aug 12 10:09:17  ibr1.pit cfeb CM: ALARM CLEAR: Slot 0: CFEB not online, the box is not forwarding
> > Aug 12 10:09:31  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 0
> > Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 1
> > Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2, jnxFruL2Index 3, jnxFruL3Index 0, jnxFruName PIC: 1x Tunnel @ 1/2/*, jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
> > Aug 12 10:09:54  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 2, jnxFruL2Index 4, jnxFruL3Index 0, jnxFruName PIC: 1x G/E, 1000 BASE @ 1/3/*, jnxFruType 11, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for pd-1/2/0
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for pe-1/2/0
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for gr-1/2/0
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ip-1/2/0
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for vt-1/2/0
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for mt-1/2/0
> > Aug 12 10:09:55  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for lt-1/2/0
> > Aug 12 10:09:56  ibr1.pit chassisd[3119]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-1/3/0
> > Aug 12 10:09:57  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 0/*/*, jnxFruType 3, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 434643122)
> > Aug 12 10:09:57  ibr1.pit chassisd[3119]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 2, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 1/*/*, jnxFruType 3, jnxFruSlot 2, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 434643140)
> > Aug 12 10:10:18  ibr1.pit rpd[3126]: RPD_SCHED_SLIP: 7 sec scheduler slip, user: 6 sec 626912 usec, system: 0 sec, 236833 usec
> > Aug 12 10:10:57  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.242.163 (Internal AS 20326) old state OpenConfirm event RecvKeepAlive new state Established
> > Aug 12 10:11:00  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.242.162 (Internal AS 20326) old state OpenConfirm event RecvKeepAlive new state Established
> > Aug 12 10:11:05  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 208.4.47.65 (External AS 1239) old state OpenConfirm event RecvKeepAlive new state Established
> > Aug 12 10:11:11  ibr1.pit rpd[3126]: 204.16.241.252 (Internal AS 20326): reseting pending active connection
> > Aug 12 10:11:11  ibr1.pit rpd[3126]: RPD_BGP_NEIGHBOR_UPDOWN: bgp_event: peer 204.16.241.252 (Internal AS 20326) old state OpenConfirm event RecvKeepAlive new state Established
> > Aug 12 10:12:59  ibr1.pit rpd[3126]: bgp_nexthop_sanity: peer 204.16.241.252 (Internal AS 20326) next hop 204.16.242.172 local, ignoring routes in this update
> > Aug 12 10:12:59  ibr1.pit rpd[3126]: bgp_nexthop_sanity: peer 204.16.241.252 (Internal AS 20326) next hop 204.16.242.172 local, ignoring routes in this update
> > Aug 12 10:21:58  ibr1.pit /kernel: ad1: ata_command: timeout waiting for intr
> > Aug 12 10:21:58  ibr1.pit /kernel: ad1: error executing command b0ata0: resetting devices ..
> > Aug 12 10:21:58  ibr1.pit /kernel: ata0: Finished resetting devices .. (after 0 seconds)
> > Aug 12 10:21:58  ibr1.pit /kernel: ata0: WARNING: active changed while DKIOCMDin progress
> > Aug 12 10:21:58  ibr1.pit smartd[3139]: atareadsmartthresholds: ioctl: Device busy
> > Aug 12 10:21:58  ibr1.pit smartd[3139]: /dev/ad1a: Device smart_check, non zero return from atacheckdevice
> > _______________________________________________
> > juniper-nsp mailing list juniper-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/juniper-nsp

