[j-nsp] MX240 Fabric Errors

David DeSimone fox at verio.net
Wed Oct 20 15:21:13 EDT 2010


I'm trying to understand more about Juniper MX architecture, in response
to a failure we recently saw.

The event started with an RE panic, which caused a failover to the
redundant RE.  JTAC noticed some fabric errors showing up, and concluded
that a failing FPC had caused the RE to panic, and recommended replacing
the FPC.  We are in the midst of doing that, but in the meantime, I'm
trying to make sense of the information, for my own edification.

For instance, here's the output of "show chassis fabric summary":

    Plane   State    Uptime
     0      Check    455 days, 18 hours, 49 minutes, 59 seconds
     1      Check    455 days, 18 hours, 49 minutes, 59 seconds
     2      Check    455 days, 18 hours, 49 minutes, 59 seconds
     3      Check    455 days, 18 hours, 49 minutes, 59 seconds
     4      Spare    455 days, 18 hours, 49 minutes, 59 seconds
     5      Spare    455 days, 18 hours, 49 minutes, 59 seconds
     6      Spare    455 days, 18 hours, 49 minutes, 59 seconds
     7      Spare    455 days, 18 hours, 49 minutes, 59 seconds

I'm not sure what the "Check" status means.  Is it telling me I need to
check something?  What sorts of commands would be used for that, and
what am I looking for in their output?

See below for some (long) output from "show chassis fabric plane".
Notice that for each fabric plane, FPC 0 shows "Link error" on all
connections.  From my view, this tells me that FPC 0 should be basically
offline and unusable.  However, all ports on that FPC were operating
perfectly and with no apparent errors, slow-downs, etc.  In fact, the
only indications we had that there were any problems were yellow chassis
alarms and the crashed RE (which rebooted just fine afterward).

I asked JTAC how FPC 0 could remain operational in this state, and the
reply was that "there is a lot of redundancy of connections", but looking
again at this fabric plane output, I cannot figure out where the
redundancy is.  It still seems to be telling me that all links to FPC 0
are down, so the FPC should be dead.

What am I missing here?

I'd appreciate any insight, or pointers to information describing the
fabric architecture on the MX platform, to help me understand this.


    Fabric management PLANE state
    Plane 0
      Plane state: ACTIVE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 1
      Plane state: ACTIVE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 2
      Plane state: ACTIVE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 3
      Plane state: ACTIVE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 4
      Plane state: SPARE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 5
      Plane state: SPARE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 6
      Plane state: SPARE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
    Plane 7
      Plane state: SPARE
	  FPC 0
	      PFE 0 :Link error
	      PFE 1 :Link error
	      PFE 2 :Link error
	      PFE 3 :Link error
	  FPC 1
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 2
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
	  FPC 3
	      PFE 0 :Links ok
	      PFE 1 :Links ok
	      PFE 2 :Links ok
	      PFE 3 :Links ok
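
For anyone who wants to condense output like the above, here is a minimal
sketch in Python that groups the non-ok PFE links by FPC and plane. It
assumes the text layout matches what is pasted in this message ("Plane N",
"FPC N", "PFE N :state" lines); the function name and the SAMPLE excerpt
are made up for illustration, and other Junos releases may format this
output differently.

```python
import re
from collections import defaultdict

# Short excerpt in the same layout as the "show chassis fabric plane"
# output pasted in this message (hypothetical sample, trimmed for brevity).
SAMPLE = """\
Fabric management PLANE state
Plane 0
  Plane state: ACTIVE
      FPC 0
          PFE 0 :Link error
          PFE 1 :Link error
      FPC 1
          PFE 0 :Links ok
Plane 4
  Plane state: SPARE
      FPC 0
          PFE 0 :Link error
"""

def summarize_fabric_planes(text):
    """Return {fpc: {plane: [PFE numbers whose state is not "Links ok"]}}."""
    errors = defaultdict(lambda: defaultdict(list))
    plane = fpc = None
    for raw in text.splitlines():
        line = raw.strip()
        # "Plane 3" starts a new plane; "Plane state: ..." does not match.
        m = re.match(r"Plane (\d+)$", line)
        if m:
            plane, fpc = int(m.group(1)), None
            continue
        m = re.match(r"FPC (\d+)$", line)
        if m:
            fpc = int(m.group(1))
            continue
        m = re.match(r"PFE (\d+)\s*:\s*(.+)$", line)
        if m and plane is not None and fpc is not None:
            if m.group(2).strip() != "Links ok":
                errors[fpc][plane].append(int(m.group(1)))
    # Convert nested defaultdicts to plain dicts for easy comparison/printing.
    return {f: dict(p) for f, p in errors.items()}

if __name__ == "__main__":
    for fpc, planes in sorted(summarize_fabric_planes(SAMPLE).items()):
        for plane, pfes in sorted(planes.items()):
            print(f"FPC {fpc} plane {plane}: PFEs with link problems {pfes}")
```

Run against the full output above, this would list only FPC 0, on every
plane (active and spare alike), which is the pattern I am asking about.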

-- 
David DeSimone == Network Admin == fox at verio.net
  "I don't like spinach, and I'm glad I don't, because if I
   liked it I'd eat it, and I just hate it." -- Clarence Darrow



