[j-nsp] Flapping GE Module causing FEB to reboot

Joerg Staedele / Trusted Network js at tnib.de
Mon Apr 3 14:39:14 EDT 2006


Hi there,

Half an hour ago our peering interface (GigE) went down because of hardware problems at the IXP.

When the interface went down, all 190 BGP sessions dropped and, for some reason, the FEB rebooted at the same time.

This (GigE down at the IXP) happened twice within 20 minutes.

It's an M5 running JUNOS 7.4R2.6.

Here's part of the logfile:

---
Apr  3 20:08:09  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 42, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/0
Apr  3 20:10:00  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 42, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/0
Apr  3 20:10:01  cr0.ffm.de feb CMFPC: PIC 0/0 PIO Write error, address 0x0006
Apr  3 20:10:01  cr0.ffm.de feb mpc106 machine check caused by error on the PCI Bus
Apr  3 20:10:01  cr0.ffm.de feb mpc106 PCI status register: 0x6080, error detect register 1: 0x08, 2: 0x00
Apr  3 20:10:01  cr0.ffm.de feb mpc106 error ack count = 2
Apr  3 20:10:01  cr0.ffm.de feb mpc106 error address: 0x05068804
Apr  3 20:10:01  cr0.ffm.de feb mpc106 PCI bus error status register: 0x03
Apr  3 20:10:01  cr0.ffm.de feb mpc106 was the PCI master
Apr  3 20:10:01  cr0.ffm.de feb C/BE bits: I/O write [0b0011]
Apr  3 20:10:01  cr0.ffm.de feb mpc106 error detection reg1: PCI cycle
Apr  3 20:10:01  cr0.ffm.de feb mpc106 PCI status reg: received master abort signaled system error
Apr  3 20:10:01  cr0.ffm.de feb ^B
Apr  3 20:10:01  cr0.ffm.de feb last message repeated 7 times
Apr  3 20:10:01  cr0.ffm.de feb Registers:
Apr  3 20:10:01  cr0.ffm.de feb R00: 0x00003030 R01: 0x00ff45d0 R02: 0x00003344 R03: 0x00000001
Apr  3 20:10:01  cr0.ffm.de feb R04: 0x00008000 R05: 0x005fd844 R06: 0x00000001 R07: 0x850400d4
Apr  3 20:10:01  cr0.ffm.de feb R08: 0x00000000 R09: 0x0101f7f4 R10: 0x00000000 R11: 0x00000000
Apr  3 20:10:01  cr0.ffm.de feb R12: 0x24282042 R13: 0xffffffff R14: 0x00580000 R15: 0x00580000
Apr  3 20:10:01  cr0.ffm.de feb R16: 0x00580000 R17: 0x006804c8 R18: 0x00580000 R19: 0x00580000
Apr  3 20:10:01  cr0.ffm.de feb R20: 0x00440000 R21: 0x00600000 R22: 0x00600000 R23: 0x00000000
Apr  3 20:10:01  cr0.ffm.de feb R24: 0x00000000 R25: 0x00000000 R26: 0x00000000 R27: 0x00000041
Apr  3 20:10:01  cr0.ffm.de feb R28: 0x00000001 R29: 0x85068804 R30: 0x00000001 R31: 0x85068804
Apr  3 20:10:01  cr0.ffm.de feb MSR: 0x00083030 CTR: 0x0014bc44 Link:0x00148fa8 SP:  0x00ff45d0
Apr  3 20:10:01  cr0.ffm.de feb CCR: 0x24284048 XER: 0x00000000 PC:  0x00148fac
Apr  3 20:10:01  cr0.ffm.de feb DSISR: 0x00000000 DAR: 0x00000000 K_MSR: 0x00000030
Apr  3 20:10:01  cr0.ffm.de feb Stack Traceback:
Apr  3 20:10:01  cr0.ffm.de feb Frame 01: sp = 0x00ff45d0, pc = 0x007d621c
Apr  3 20:10:01  cr0.ffm.de feb Frame 02: sp = 0x00ff45f8, pc = 0x001496a8
Apr  3 20:10:01  cr0.ffm.de feb Frame 03: sp = 0x00ff4620, pc = 0x00145ab4
Apr  3 20:10:01  cr0.ffm.de feb Frame 04: sp = 0x00ff4628, pc = 0x0014bedc
Apr  3 20:10:01  cr0.ffm.de feb Frame 05: sp = 0x00ff4658, pc = 0x002efff0
Apr  3 20:10:01  cr0.ffm.de feb Frame 06: sp = 0x00ff4678, pc = 0x002ee988
Apr  3 20:10:01  cr0.ffm.de feb Frame 07: sp = 0x00ff4690, pc = 0x002c81b8
Apr  3 20:10:01  cr0.ffm.de feb Frame 08: sp = 0x00ff4698, pc = 0x002dd834
Apr  3 20:10:01  cr0.ffm.de feb Frame 09: sp = 0x00ff46b0, pc = 0x002dff40
Apr  3 20:10:01  cr0.ffm.de feb Frame 10: sp = 0x00ff46d0, pc = 0x002dd028
Apr  3 20:10:01  cr0.ffm.de feb Frame 11: sp = 0x00ff46e8, pc = 0x002dd200
Apr  3 20:10:01  cr0.ffm.de feb Frame 12: sp = 0x00ff4708, pc = 0x002c7344
Apr  3 20:10:01  cr0.ffm.de feb Frame 13: sp = 0x00ff4740, pc = 0x002c7934
Apr  3 20:10:01  cr0.ffm.de feb Frame 14: sp = 0x00ff4768, pc = 0x002c8b94
Apr  3 20:10:01  cr0.ffm.de feb Frame 15: sp = 0x00ff4848, pc = 0x00026e24
Apr  3 20:10:06  cr0.ffm.de /kernel: rdp keepalive expired, connection dropped - src 1:1020 dest 2:49153
Apr  3 20:10:06  cr0.ffm.de syslogd: sendto: Network is down
Apr  3 20:10:06  cr0.ffm.de /kernel: rdp keepalive expired, connection dropped - src 1:1021 dest 2:49152
Apr  3 20:10:06  cr0.ffm.de /kernel: pfe_listener_disconnect: conn dropped: listener idx=0, tnpaddr=0x2, reason: socket error
Apr  3 20:10:06  cr0.ffm.de chassisd[2682]: CHASSISD_SHUTDOWN_NOTICE: Shutdown reason: FEB connection lost
Apr  3 20:10:06  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(0)
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 42, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/0
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 30, ifAdminStatus up(1), ifOperStatus down(2), ifName fe-0/1/0
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 31, ifAdminStatus up(1), ifOperStatus down(2), ifName fe-0/1/1
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 33, ifAdminStatus up(1), ifOperStatus down(2), ifName fe-0/1/3
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 35, ifAdminStatus up(1), ifOperStatus down(2), ifName so-0/2/1
Apr  3 20:10:06  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO: ifdev_detach(pseudo devices: all)
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 45, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/0.2
Apr  3 20:10:06  cr0.ffm.de alarmd[2683]: shutting down chassisd connection: chassisd ipc pipe read error
Apr  3 20:10:06  cr0.ffm.de craftd[2684]: craftd_user_conn_shutdown: socket 5, errno = 0
Apr  3 20:10:06  cr0.ffm.de alarmd[2683]: chassisd connection succeeded after 0 retries
Apr  3 20:10:06  cr0.ffm.de alarmd[2683]: resending alarm state
Apr  3 20:10:06  cr0.ffm.de craftd[2684]: chassisd connection succeeded after 0 retries
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 47, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/0.3
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 44, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/0.100
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 48, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/1.23
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 55, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/1.220
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 56, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/1.221
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 57, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/1.550
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 40, ifAdminStatus down(2), ifOperStatus down(2), ifName fe-0/1/3.0
Apr  3 20:10:06  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 46, ifAdminStatus down(2), ifOperStatus down(2), ifName so-0/2/1.0
Apr  3 20:10:18  cr0.ffm.de /kernel: fxp1: link media DOWN 10Mb / half-duplex
Apr  3 20:10:18  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 2, ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1
Apr  3 20:10:20  cr0.ffm.de /kernel: fxp1: media DOWN 100Mb / full-duplex
Apr  3 20:10:20  cr0.ffm.de /kernel: fxp1: link UP 100Mb / full-duplex
Apr  3 20:10:46  cr0.ffm.de /kernel: fxp1: link media DOWN 10Mb / half-duplex
Apr  3 20:10:46  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 2, ifAdminStatus up(1), ifOperStatus down(2), ifName fxp1
Apr  3 20:10:47  cr0.ffm.de /kernel: fxp1: media DOWN 100Mb / full-duplex
Apr  3 20:10:48  cr0.ffm.de /kernel: fxp1: link UP 100Mb / full-duplex
Apr  3 20:10:49  cr0.ffm.de feb SBR: Booted from flash partition 1
Apr  3 20:10:55  cr0.ffm.de chassisd[2682]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 6, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FEB, jnxFruType 5, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 591941907, jnxFruLastPowerOn 591941907)
Apr  3 20:10:55  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_DETACH_ALL_PSEUDO: ifdev_detach(pseudo devices: all)
Apr  3 20:10:55  cr0.ffm.de craftd[2684]: attempt to delete alarm not in list
Apr  3 20:10:55  cr0.ffm.de craftd[2684]: forwarding display request to chassisd: type = 4, subtype = 44
Apr  3 20:10:55  cr0.ffm.de alarmd[2683]: Alarm cleared: RE color=IGNORE, class=CHASSIS, reason=Host 0 fxp0: Ethernet Link Down
Apr  3 20:10:57  cr0.ffm.de chassisd[2682]: CHASSISD_FRU_EVENT: fpc_m40_recv_restart: restarted FPC 0
Apr  3 20:11:03  cr0.ffm.de chassisd[2682]: CHASSISD_FRU_EVENT: scb_recv_slot_attach: attached FPC 0
Apr  3 20:11:05  cr0.ffm.de chassisd[2682]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 1, jnxFruL2Index 1, jnxFruL3Index 0, jnxFruName PIC: 1x G/E, 1000 BASE-LH @ 0/0/*, jnxFruType 11, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
Apr  3 20:11:05  cr0.ffm.de chassisd[2682]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 1, jnxFruL2Index 2, jnxFruL3Index 0, jnxFruName PIC: 4x F/E, 100 BASE-TX @ 0/1/*, jnxFruType 11, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
Apr  3 20:11:05  cr0.ffm.de chassisd[2682]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 8, jnxFruL1Index 1, jnxFruL2Index 3, jnxFruL3Index 0, jnxFruName PIC: 4x STM-1 SDH, SMIR @ 0/2/*, jnxFruType 11, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 0)
Apr  3 20:11:06  cr0.ffm.de chassisd[2682]: CHASSISD_SNMP_TRAP10: SNMP trap generated: FRU power on (jnxFruContentsIndex 7, jnxFruL1Index 1, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC:  @ 0/*/*, jnxFruType 3, jnxFruSlot 1, jnxFruOfflineReason 2, jnxFruLastPowerOff 0, jnxFruLastPowerOn 591942957)
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for ge-0/0/0
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for fe-0/1/0
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for fe-0/1/1
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for fe-0/1/2
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for fe-0/1/3
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for so-0/2/0
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for so-0/2/1
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for so-0/2/2
Apr  3 20:11:07  cr0.ffm.de chassisd[2682]: CHASSISD_IFDEV_CREATE_NOTICE: create_pics: created interface device for so-0/2/3
Apr  3 20:11:10  cr0.ffm.de /kernel: so-0/2/0: Asserting SDH alarm(s) LOL LOS
Apr  3 20:11:10  cr0.ffm.de /kernel: so-0/2/2: Asserting SDH alarm(s) LOL LOS
Apr  3 20:11:10  cr0.ffm.de /kernel: so-0/2/3: Asserting SDH alarm(s) LOL LOS
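
As a reading aid for the machine-check lines in the log: the value 0x6080 reported as "PCI status register" can be decoded by hand. The sketch below assumes the mpc106 reports the standard 16-bit PCI configuration-space status register; the bit names come from the generic PCI layout, not from Juniper documentation, and the function name is made up for illustration.

```python
# Flag bits of the standard PCI status register (assumed layout; the
# DEVSEL timing field in bits 9-10 is not a flag and is left out).
PCI_STATUS_BITS = {
    15: "detected parity error",
    14: "signaled system error",
    13: "received master abort",
    12: "received target abort",
    11: "signaled target abort",
    8: "master data parity error",
    7: "fast back-to-back capable",
}


def decode_pci_status(value):
    """Return the names of the flag bits set in a 16-bit PCI status value."""
    names = []
    for bit in sorted(PCI_STATUS_BITS, reverse=True):
        if value & (1 << bit):
            names.append(PCI_STATUS_BITS[bit])
    return names


# Value taken from the "PCI status register: 0x6080" log line.
print(decode_pci_status(0x6080))
```

Under that assumption the two error flags that come out (signaled system error, received master abort) agree with the FEB's own "PCI status reg:" line; the third bit is a capability flag rather than an error.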


Does anyone have an idea why an interface flap would cause the FEB to crash and reboot?
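
To make the timing explicit, here is a rough sketch that measures how long after a link-down trap on ge-0/0/0 the first FEB error line appears. It is only an illustration: the regular expression, the function name and the "error" keyword match are my own assumptions about the syslog format shown in this mail, and the sample lines are copied from the log.

```python
import re
from datetime import datetime

# Matches "Mon  D HH:MM:SS  host message" as it appears in the log excerpt.
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s+\S+\s+(.*)$")

SAMPLE = """\
Apr  3 20:08:09  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 42, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/0
Apr  3 20:10:00  cr0.ffm.de mib2d[2688]: SNMP_TRAP_LINK_DOWN: ifIndex 42, ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/0
Apr  3 20:10:01  cr0.ffm.de feb CMFPC: PIC 0/0 PIO Write error, address 0x0006
Apr  3 20:10:01  cr0.ffm.de feb mpc106 machine check caused by error on the PCI Bus
"""


def seconds_from_link_down_to_feb_error(text, ifname="ge-0/0/0"):
    """Seconds between the latest link-down trap and the first FEB error after it."""
    last_down = None
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        when = datetime.strptime(match.group(1), "%H:%M:%S")
        msg = match.group(2).rstrip()
        if "SNMP_TRAP_LINK_DOWN" in msg and msg.endswith("ifName " + ifname):
            last_down = when  # remember the most recent flap
        elif msg.startswith("feb ") and "error" in msg.lower():
            if last_down is not None:
                return (when - last_down).total_seconds()
    return None


print(seconds_from_link_down_to_feb_error(SAMPLE))
```

On the sample lines the PIO write error on PIC 0/0 follows the second link-down trap by one second, while the first trap at 20:08:09 has no FEB error after it in the excerpt.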

--
Best regards,

 Joerg Staedele
 
:  Trusted Network (TNIB) :  fon +49-89-37006640  :    www.tnib.de    :
:    Max-Planck-Str. 1    :  fax +49-89-37006643  :   info at tnib.de    :
: 85716 Unterschleissheim : sip info at voip.tnib.de : AS21385 / AS-TNIB :

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.


