[j-nsp] diagnosing the kablooys
Richard A Steenbergen
ras at e-gerbil.net
Tue Dec 7 01:16:56 EST 2004
Aside from the usual "call JTAC" answer, does anyone have any idea what
specifically went bad here:
Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (321, Unicast) (timeout)
Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (323, Unicast) (generic failure)
Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (328, Unicast) (timeout)
Dec 6 23:54:05 router fpc1 DXO: Plane 1, link CRC error (0x0f)
Dec 6 23:54:05 router fpc1 DXO: Plane 3, link CRC error (0x0f)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (334, Unicast) (timeout)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (335, Unicast) (timeout)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (341, Unicast) (timeout)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (346, Unicast) (timeout)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (351, Unicast) (timeout)
Dec 6 23:54:06 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
Dec 6 23:54:06 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
Dec 6 23:54:06 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
Dec 6 23:54:06 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
Dec 6 23:54:06 router fpc2 DXO: Plane 1, link CRC error (0x0f)
Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(0) CRC error
Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(1) CRC error
Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(2) CRC error
Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(3) CRC error
Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(0) CRC error
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (363, Unicast) (timeout)
Dec 6 23:54:06 router fpc2 DXO: Plane 3, link CRC error (0x0f)
Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(1) CRC error
Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(2) CRC error
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (409, Unicast) (timeout)
Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(3) CRC error
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (452, Unicast) (timeout)
Dec 6 23:54:06 router fpc3 DXO: Plane 1, link CRC error (0x0f)
Dec 6 23:54:06 router fpc3 DXO: Plane 3, link CRC error (0x0f)
Dec 6 23:54:06 router fpc1 DXO: Plane 3, link CRC error (0x0f)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (455, Unicast) (timeout)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (456, Unicast) (timeout)
Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (462, Unicast) (timeout)
Dec 6 23:54:07 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (464, Unicast) (timeout)
Basically followed by a long and continuous string of:
Dec 7 00:00:13 router fpc2 DXI: PIC 1, link CRC error (0x01)
Dec 7 00:00:13 router fpc3 DCHIP(3/2): BD link(1) CRC error
Dec 7 00:00:13 router fpc2 DXI: PIC 2, link CRC error (0x01)
Dec 7 00:00:13 router fpc3 DCHIP(3/2): BD link(2) CRC error
Dec 7 00:00:13 router fpc2 DXI: PIC 3, link CRC error (0x01)
Dec 7 00:00:13 router fpc3 DCHIP(3/2): BD link(3) CRC error
Dec 7 00:00:13 router fpc3 DXI: PIC 1, link CRC error (0x0f)
Dec 7 00:00:13 router fpc3 DXI: PIC 2, link CRC error (0x0f)
Dec 7 00:00:14 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
Dec 7 00:00:14 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
Dec 7 00:00:14 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
Dec 7 00:00:14 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
Dec 7 00:00:14 router fpc2 DXI: PIC 0, link CRC error (0x01)
Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(0) CRC error
Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(1) CRC error
Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(2) CRC error
Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(3) CRC error
Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(0) CRC error
Dec 7 00:00:14 router fpc2 DXI: PIC 1, link CRC error (0x01)
Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(1) CRC error
Dec 7 00:00:14 router fpc2 DXI: PIC 2, link CRC error (0x01)
Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(2) CRC error
Dec 7 00:00:14 router fpc2 DXI: PIC 3, link CRC error (0x01)
Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(3) CRC error
Dec 7 00:00:14 router fpc3 DXI: PIC 1, link CRC error (0x0f)
Dec 7 00:00:14 router fpc3 DXI: PIC 2, link CRC error (0x0f)
Dec 7 00:00:15 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
Dec 7 00:00:15 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
Dec 7 00:00:15 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
Dec 7 00:00:15 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
Dec 7 00:00:15 router fpc2 DXI: PIC 0, link CRC error (0x01)
Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(0) CRC error
Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(1) CRC error
Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(2) CRC error
Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(3) CRC error
Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(0) CRC error
Dec 7 00:00:15 router fpc2 DXI: PIC 1, link CRC error (0x01)
Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(1) CRC error
Dec 7 00:00:15 router fpc2 DXI: PIC 2, link CRC error (0x01)
Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(2) CRC error
Dec 7 00:00:15 router fpc2 DXI: PIC 3, link CRC error (0x01)
Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(3) CRC error
Dec 7 00:00:15 router fpc3 DXI: PIC 1, link CRC error (0x0f)
Dec 7 00:00:15 router fpc3 DXI: PIC 2, link CRC error (0x0f)
Dec 7 00:00:16 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
Dec 7 00:00:16 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
Dec 7 00:00:16 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
Dec 7 00:00:16 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
Dec 7 00:00:16 router fpc2 DXI: PIC 0, link CRC error (0x01)
until the power cycle. Physical interfaces/links stayed up, but the router
failed to respond externally and didn't return keepalives. I'm going to go
with the obvious and guess that sfm1 went wonky and started corrupting data
coming from the FPCs, but it's always nice to get a second opinion from
Juniper folks whenever the "guess the cause of the failure of the ASIC
named after a letter" game starts. :)
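
For anyone wanting to eyeball how widely the errors are spread, here is a
rough sketch that tallies the messages by reporting component and by the
tag that follows it. The regex is only a guess at the line shape based on
the excerpts above, and the names tally, components_reporting_crc and
SAMPLE are made up for illustration. It summarizes the log and nothing
more; CRC errors on several FPCs at once would fit the shared-component
guess, but that's a hint, not a diagnosis.

```python
import re
from collections import Counter

# Assumed line shape, inferred only from the excerpts in this post:
#   "Dec 6 23:54:05 router fpc1 DXO: Plane 1, link CRC error (0x0f)"
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+(?P<component>\S+)\s+"
    r"(?P<tag>[A-Za-z]+)[^:]*:\s*(?P<msg>.*)$"
)

def tally(lines):
    """Count messages per (component, tag); lines that don't match are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("component"), m.group("tag"))] += 1
    return counts

def components_reporting_crc(lines):
    """Sorted list of components whose message text mentions 'crc' (any case)."""
    seen = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "crc" in m.group("msg").lower():
            seen.add(m.group("component"))
    return sorted(seen)

# A few lines copied verbatim from the log above.
SAMPLE = [
    "Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (321, Unicast) (timeout)",
    "Dec 6 23:54:05 router fpc1 DXO: Plane 1, link CRC error (0x0f)",
    "Dec 6 23:54:06 router fpc2 GE(2/0): Kchip 0 crc/fifo errors",
    "Dec 6 23:54:06 router fpc2 DXO: Plane 1, link CRC error (0x0f)",
    "Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(0) CRC error",
]

if __name__ == "__main__":
    for (component, tag), n in sorted(tally(SAMPLE).items()):
        print(component, tag, n)
    print("CRC reported by:", ", ".join(components_reporting_crc(SAMPLE)))
```

To run it over a full log instead of the sample, pass tally() the lines of
the messages file, e.g. tally(open(path)) with whatever path applies.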
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)