[j-nsp] diagnosing the kablooys
Josef Buchsteiner
josefb at juniper.net
Tue Dec 7 04:01:59 EST 2004
Richard,
I don't have a firm answer. However, the nh_ucast_poll_stats
commands are queuing while they try to fetch the data from the
I/O ASIC SRAM on the FPC. Each FPC has four of them to serve
all 4 SFMs. For some reason this no longer works, and the CRC
errors are all over the place. Your obvious guess is supported
by the fact that only SFM1 is reporting the stats timeout,
whereas all 4 SFMs have to do this to get the stats. So I would
suggest that you turn off SFM1 and see if the errors go away,
which would be a quick test/confirmation.
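
If you want to double-check the "only SFM1 is complaining" observation
against the full log, a rough sketch like the one below could tally the
messages per component. It is only an illustration: the function names
(tally, nh_reporters) and the regular expression are my own assumptions
based on the line layout in the excerpt quoted below, not anything that
JUNOS provides.

```python
import re
from collections import Counter, defaultdict

# Assumed line layout, taken from the quoted excerpt:
#   Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats):
#   Dec 6 23:54:05 router fpc1 DXO: Plane 1, link CRC error (0x0f)
# An optional "RAS> " quote prefix is tolerated.
LINE_RE = re.compile(
    r"^(?:RAS>\s*)?\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+"
    r"(?P<component>(?:sfm|fpc)\d+)\s+(?P<source>[A-Za-z]+)"
)


def tally(lines):
    """Count messages per component (sfm1, fpc2, ...) and per source tag."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # continuation lines ("Failed to fetch ...") are skipped
            counts[m.group("component")][m.group("source")] += 1
    return counts


def nh_reporters(counts):
    """Return the components that logged NH (next-hop stats) messages."""
    return sorted(c for c, tags in counts.items() if tags["NH"] > 0)


# A few lines copied from the quoted log, used as sample input.
SAMPLE = [
    "Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats):",
    "Failed to fetch stats for nh (321, Unicast) (timeout)",
    "Dec 6 23:54:05 router fpc1 DXO: Plane 1, link CRC error (0x0f)",
    "Dec 6 23:54:06 router fpc2 GE(2/0): Kchip 0 crc/fifo errors",
    "Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(0) CRC error",
    "Dec 7 00:00:13 router fpc2 DXI: PIC 1, link CRC error (0x01)",
]

if __name__ == "__main__":
    result = tally(SAMPLE)
    for component in sorted(result):
        print(component, dict(result[component]))
    print("NH stats failures reported by:", nh_reporters(result))
```

Feeding it the whole messages file instead of SAMPLE should show
whether any SFM other than sfm1 ever logs an NH failure.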
Josef
Tuesday, December 7, 2004, 7:16:56 AM, you wrote:
RAS> Aside from the usual "call JTAC" answer, does anyone have any idea what
RAS> specifically went bad here:
RAS> Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (321, Unicast) (timeout)
RAS> Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (323, Unicast) (generic failure)
RAS> Dec 6 23:54:05 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (328, Unicast) (timeout)
RAS> Dec 6 23:54:05 router fpc1 DXO: Plane 1, link CRC error (0x0f)
RAS> Dec 6 23:54:05 router fpc1 DXO: Plane 3, link CRC error (0x0f)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (334, Unicast) (timeout)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (335, Unicast) (timeout)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (341, Unicast) (timeout)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (346, Unicast) (timeout)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (351, Unicast) (timeout)
RAS> Dec 6 23:54:06 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS> Dec 6 23:54:06 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS> Dec 6 23:54:06 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS> Dec 6 23:54:06 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS> Dec 6 23:54:06 router fpc2 DXO: Plane 1, link CRC error (0x0f)
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(0) CRC error
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(1) CRC error
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(2) CRC error
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/1): BD link(3) CRC error
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(0) CRC error
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (363, Unicast) (timeout)
RAS> Dec 6 23:54:06 router fpc2 DXO: Plane 3, link CRC error (0x0f)
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (409, Unicast) (timeout)
RAS> Dec 6 23:54:06 router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (452, Unicast) (timeout)
RAS> Dec 6 23:54:06 router fpc3 DXO: Plane 1, link CRC error (0x0f)
RAS> Dec 6 23:54:06 router fpc3 DXO: Plane 3, link CRC error (0x0f)
RAS> Dec 6 23:54:06 router fpc1 DXO: Plane 3, link CRC error (0x0f)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (455, Unicast) (timeout)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (456, Unicast) (timeout)
RAS> Dec 6 23:54:06 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (462, Unicast) (timeout)
RAS> Dec 6 23:54:07 router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (464, Unicast) (timeout)
RAS> Basically followed by a long and continuous string of:
RAS> Dec 7 00:00:13 router fpc2 DXI: PIC 1, link CRC error (0x01)
RAS> Dec 7 00:00:13 router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS> Dec 7 00:00:13 router fpc2 DXI: PIC 2, link CRC error (0x01)
RAS> Dec 7 00:00:13 router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS> Dec 7 00:00:13 router fpc2 DXI: PIC 3, link CRC error (0x01)
RAS> Dec 7 00:00:13 router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS> Dec 7 00:00:13 router fpc3 DXI: PIC 1, link CRC error (0x0f)
RAS> Dec 7 00:00:13 router fpc3 DXI: PIC 2, link CRC error (0x0f)
RAS> Dec 7 00:00:14 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:14 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:14 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:14 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:14 router fpc2 DXI: PIC 0, link CRC error (0x01)
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(0) CRC error
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(1) CRC error
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(2) CRC error
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/1): BD link(3) CRC error
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(0) CRC error
RAS> Dec 7 00:00:14 router fpc2 DXI: PIC 1, link CRC error (0x01)
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS> Dec 7 00:00:14 router fpc2 DXI: PIC 2, link CRC error (0x01)
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS> Dec 7 00:00:14 router fpc2 DXI: PIC 3, link CRC error (0x01)
RAS> Dec 7 00:00:14 router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS> Dec 7 00:00:14 router fpc3 DXI: PIC 1, link CRC error (0x0f)
RAS> Dec 7 00:00:14 router fpc3 DXI: PIC 2, link CRC error (0x0f)
RAS> Dec 7 00:00:15 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:15 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:15 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:15 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:15 router fpc2 DXI: PIC 0, link CRC error (0x01)
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(0) CRC error
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(1) CRC error
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(2) CRC error
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/1): BD link(3) CRC error
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(0) CRC error
RAS> Dec 7 00:00:15 router fpc2 DXI: PIC 1, link CRC error (0x01)
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS> Dec 7 00:00:15 router fpc2 DXI: PIC 2, link CRC error (0x01)
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS> Dec 7 00:00:15 router fpc2 DXI: PIC 3, link CRC error (0x01)
RAS> Dec 7 00:00:15 router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS> Dec 7 00:00:15 router fpc3 DXI: PIC 1, link CRC error (0x0f)
RAS> Dec 7 00:00:15 router fpc3 DXI: PIC 2, link CRC error (0x0f)
RAS> Dec 7 00:00:16 router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:16 router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:16 router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:16 router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS> Dec 7 00:00:16 router fpc2 DXI: PIC 0, link CRC error (0x01)
RAS> until power cycle. Physical interfaces/link stayed up, but router failed
RAS> to respond externally and didn't return keepalives. I'm going to go with
RAS> the obvious and guess that sfm1 went wonky and started corrupting data
RAS> coming from the FPCs, but it's always nice to get a second opinion from
RAS> Juniper folks whenever the "guess the cause of the failure of the ASIC
RAS> named after a letter" game starts. :)
RAS> --
RAS> Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
RAS> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
RAS> _______________________________________________
RAS> juniper-nsp mailing list juniper-nsp at puck.nether.net
RAS> http://puck.nether.net/mailman/listinfo/juniper-nsp