[j-nsp] diagnosing the kablooys

Josef Buchsteiner josefb at juniper.net
Tue Dec 7 04:01:59 EST 2004


Richard,

        I don't have a firm answer. However, the nh_ucast_poll_stats
        commands are queuing while they try to get the data from the
        I/O ASIC SRAM on the FPC. Each FPC has four of these to serve
        all 4 SFMs. For some reason this no longer works, and the CRC
        errors are all over the place. Your obvious guess is supported
        by the fact that only SFM1 is reporting the stats timeout,
        whereas all 4 SFMs have to do this to get the stats. So I
        would suggest that you turn off SFM1 and see whether the
        errors go away, which would be a quick test/confirmation.
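
        To double-check that the stats failures really come from one
        SFM only, something like the rough sketch below could tally
        the log lines per reporting component. It is only an
        illustration: the sample lines are taken from the log quoted
        below with the mail line wrapping undone, and the regex and
        the tally() helper are my own assumptions about the layout,
        not a Juniper tool.

```python
import re
from collections import Counter

# Sample lines copied from the quoted log, with mail line-wrapping undone.
# Assumption: every entry follows "Mon DD HH:MM:SS host source FACILITY..." layout.
SAMPLE = """\
Dec  6 23:54:05  router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (321, Unicast) (timeout)
Dec  6 23:54:05  router fpc1 DXO: Plane 1, link CRC error (0x0f)
Dec  6 23:54:05  router fpc1 DXO: Plane 3, link CRC error (0x0f)
Dec  6 23:54:06  router fpc2 GE(2/0): Kchip 0 crc/fifo errors
Dec  6 23:54:06  router fpc3 DCHIP(3/1): BD link(0) CRC error
Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats): Failed to fetch stats for nh (334, Unicast) (timeout)
"""

# Captures the reporting component (sfm1, fpc2, ...) and the leading
# facility name (NH, DXO, GE, DCHIP, ...) of each message.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+(?P<source>\S+)\s+(?P<facility>[A-Za-z_]+)"
)


def tally(text):
    """Count log entries per (source component, facility) pair."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("source"), match.group("facility"))] += 1
    return counts


def sources_reporting(counts, facility):
    """Return the set of components that logged the given facility."""
    return {source for (source, fac) in counts if fac == facility}


if __name__ == "__main__":
    counts = tally(SAMPLE)
    for (source, facility), n in sorted(counts.items()):
        print(f"{source:6} {facility:6} {n}")
    print("NH stats failures reported by:", sorted(sources_reporting(counts, "NH")))
```

        If the NH stats failures come from a single SFM while the CRC
        errors are spread over several FPCs, that fits the picture
        above, but it is no substitute for the SFM1 offline test.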


        Josef

Tuesday, December 7, 2004, 7:16:56 AM, you wrote:

  
RAS> Aside from the usual "call JTAC" answer, does anyone have any idea what
RAS>  specifically went bad here:

RAS>  Dec  6 23:54:05  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (321, Unicast) (timeout)
RAS>  Dec  6 23:54:05  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (323, Unicast) (generic failure)
RAS>  Dec  6 23:54:05  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (328, Unicast) (timeout)
RAS>  Dec  6 23:54:05  router fpc1 DXO: Plane 1, link CRC error (0x0f)
RAS>  Dec  6 23:54:05  router fpc1 DXO: Plane 3, link CRC error (0x0f)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (334, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (335, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (341, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (346, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (351, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS>  Dec  6 23:54:06  router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS>  Dec  6 23:54:06  router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS>  Dec  6 23:54:06  router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS>  Dec  6 23:54:06  router fpc2 DXO: Plane 1, link CRC error (0x0f)
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/1): BD link(0) CRC error
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/1): BD link(1) CRC error
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/1): BD link(2) CRC error
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/1): BD link(3) CRC error
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/2): BD link(0) CRC error
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (363, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router fpc2 DXO: Plane 3, link CRC error (0x0f)
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (409, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (452, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router fpc3 DXO: Plane 1, link CRC error (0x0f)
RAS>  Dec  6 23:54:06  router fpc3 DXO: Plane 3, link CRC error (0x0f)
RAS>  Dec  6 23:54:06  router fpc1 DXO: Plane 3, link CRC error (0x0f)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (455, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (456, Unicast) (timeout)
RAS>  Dec  6 23:54:06  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (462, Unicast) (timeout)
RAS>  Dec  6 23:54:07  router sfm1 NH(nh_ucast_poll_stats):
RAS> Failed to fetch stats for nh (464, Unicast) (timeout)


RAS>  Basically followed by a long and continuous string of:

RAS>  Dec  7 00:00:13  router fpc2 DXI: PIC 1, link CRC error (0x01)
RAS>  Dec  7 00:00:13  router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS>  Dec  7 00:00:13  router fpc2 DXI: PIC 2, link CRC error (0x01)
RAS>  Dec  7 00:00:13  router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS>  Dec  7 00:00:13  router fpc2 DXI: PIC 3, link CRC error (0x01)
RAS>  Dec  7 00:00:13  router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS>  Dec  7 00:00:13  router fpc3 DXI: PIC 1, link CRC error (0x0f)
RAS>  Dec  7 00:00:13  router fpc3 DXI: PIC 2, link CRC error (0x0f)
RAS>  Dec  7 00:00:14  router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:14  router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:14  router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:14  router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:14  router fpc2 DXI: PIC 0, link CRC error (0x01)
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/1): BD link(0) CRC error
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/1): BD link(1) CRC error
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/1): BD link(2) CRC error
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/1): BD link(3) CRC error
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/2): BD link(0) CRC error
RAS>  Dec  7 00:00:14  router fpc2 DXI: PIC 1, link CRC error (0x01)
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS>  Dec  7 00:00:14  router fpc2 DXI: PIC 2, link CRC error (0x01)
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS>  Dec  7 00:00:14  router fpc2 DXI: PIC 3, link CRC error (0x01)
RAS>  Dec  7 00:00:14  router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS>  Dec  7 00:00:14  router fpc3 DXI: PIC 1, link CRC error (0x0f)
RAS>  Dec  7 00:00:14  router fpc3 DXI: PIC 2, link CRC error (0x0f)
RAS>  Dec  7 00:00:15  router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:15  router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:15  router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:15  router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:15  router fpc2 DXI: PIC 0, link CRC error (0x01)
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/1): BD link(0) CRC error
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/1): BD link(1) CRC error
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/1): BD link(2) CRC error
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/1): BD link(3) CRC error
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/2): BD link(0) CRC error
RAS>  Dec  7 00:00:15  router fpc2 DXI: PIC 1, link CRC error (0x01)
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/2): BD link(1) CRC error
RAS>  Dec  7 00:00:15  router fpc2 DXI: PIC 2, link CRC error (0x01)
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/2): BD link(2) CRC error
RAS>  Dec  7 00:00:15  router fpc2 DXI: PIC 3, link CRC error (0x01)
RAS>  Dec  7 00:00:15  router fpc3 DCHIP(3/2): BD link(3) CRC error
RAS>  Dec  7 00:00:15  router fpc3 DXI: PIC 1, link CRC error (0x0f)
RAS>  Dec  7 00:00:15  router fpc3 DXI: PIC 2, link CRC error (0x0f)
RAS>  Dec  7 00:00:16  router fpc2 GE(2/0): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:16  router fpc2 GE(2/1): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:16  router fpc2 GE(2/2): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:16  router fpc2 GE(2/3): Kchip 0 crc/fifo errors
RAS>  Dec  7 00:00:16  router fpc2 DXI: PIC 0, link CRC error (0x01)

RAS>  until power cycle. Physical interfaces/link stayed up, but router failed
RAS>  to respond externally and didn't return keepalives. I'm going to go with
RAS>  the obvious and guess that sfm1 went wonky and started corrupting data
RAS>  coming from the FPCs, but it's always nice to get a second opinion from
RAS>  Juniper folks whenever the "guess the cause of the failure of the ASIC
RAS>  named after a letter" game starts. :)

RAS>  --
RAS>  Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
RAS>  GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
RAS>  _______________________________________________
RAS>  juniper-nsp mailing list juniper-nsp at puck.nether.net
RAS> http://puck.nether.net/mailman/listinfo/juniper-nsp
  
  

 


