[f-nsp] Trying to diagnose a possibly failing FESX648-PREM

Wed May 7 14:46:58 EDT 2014

Hi all,

I believe I have a failing switch on my hands and I'm wondering if you
might be able to provide an assessment based on the symptoms I'm seeing.

I'm currently running a Foundry FESX648-PREM with the following version
info:

SSH at FESX648 Router>show version
  SW: Version 07.4.00eT3e3 Copyright (c) 1996-2012 Brocade Communications Systems, Inc. All rights reserved.
      Compiled on Dec 11 2013 at 19:00:43 labeled as SXR07400e
      (4593059 bytes) Primary sxr07400e.bin
      BootROM: Version 07.4.01T3e5 (FEv2)
  HW: Stackable FESX648-PREM6 (PROM-TYPE FESX648-L3U-IPV6)
==========================================================================
      Serial  #: FL18090011
         License: SX_V6_HW_ROUTER_IPv6_SOFT_PACKAGE   (LID: XXXXXXXXXXX)
      P-ASIC  0: type 0111, rev 00  subrev 01
      P-ASIC  1: type 0111, rev 00  subrev 01
      P-ASIC  2: type 0111, rev 00  subrev 01
      P-ASIC  3: type 0111, rev 00  subrev 01
==========================================================================
  300 MHz Power PC processor 8245 (version 0081/1014) 66 MHz bus
  512 KB boot flash memory
 8192 KB code flash memory
  256 MB DRAM
The system uptime is 26 minutes 49 seconds
The system : started=warm start   reloaded=by "reload"

Quick summary of the symptoms:

1. These problems started only after ~15 servers were connected to the
switch. Even with that many servers connected, utilization remains low:
only ~40 Mbit/s on a 1 Gbit/s uplink.

2. I just rebooted my switch 20 minutes ago, but I'm already seeing a ton
of FCS errors across many ports: http://pbrd.co/SABLtk

3. Inexplicably high and erratic ping times: about 80ms instead of the
usual 20ms over the same route, varying by +/- 20ms from one ping to the
next. Ping times were low and stable before many servers were connected.

4. High packet loss. Before a lot of servers were connected, there was no
packet loss. Yesterday, the packet loss was hovering around 10%. It seems
to be worsening now. Today the average packet loss is 20%.

Screen capture: http://pbrd.co/SADKO7 <http://pbrd.co/SABZ3D>

5. Yesterday I was also able to temporarily eliminate packet loss and the
high ping times by disabling specific ports. Today, disabling ports 7 and
11 has no effect.

6. The cross-connect cables were suspect, but all cables have since been
tested with a MicroTest PentaScanner and all passed. We even replaced the
CAT5 cross-connect with a machined and molded CAT6 cable -- the same packet
loss and erratic ping times persisted.

7. Other strange things have happened. Yesterday I attempted to connect
two new servers to the switch on ports 37 and 38. Ports 5-48 belong to the
same default VLAN. The servers could connect to the switch, and ping the
gateway IP, but they could not ping the outside world. I then moved the
CAT5 cables to ports 22 and 23 -- same VLAN -- and everything worked
perfectly.
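
One way to see whether the FCS errors in item 2 are concentrated on a few
ports or spread across the whole switch is to record the per-port counters
twice, a few minutes apart, and rank the ports by how much each counter
grew. The snippet below is only a rough sketch of that comparison: the
function name is hypothetical, and it assumes the counters have already
been copied by hand into dictionaries keyed by port number, since the
exact counter output format is not shown here.

```python
def fcs_deltas(before, after):
    """Rank ports by growth in FCS error count between two snapshots.

    `before` and `after` map port number -> FCS error counter value.
    Both are assumed to be transcribed manually from the switch counters.
    Ports missing from `before` are treated as starting at zero.
    """
    deltas = {port: after[port] - before.get(port, 0) for port in after}
    # Keep only ports whose counter actually increased, largest first.
    growing = [(port, d) for port, d in deltas.items() if d > 0]
    return sorted(growing, key=lambda item: item[1], reverse=True)
```

If nearly every port shows growth, that points more toward the switch
itself than toward any single cable or server NIC.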

Does this seem like a failing switch? Are there any further diagnostic
tests I could run to verify this?
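
To put numbers on the loss and jitter described in items 3 and 4, the
output of a ping run can be summarized with something like the following.
This is a minimal sketch, not a definitive tool: it assumes the output
format of Linux iputils ping ("time=... ms" on each reply line and an
"N packets transmitted, M received" summary line), and the function name
is made up for illustration.

```python
import re
import statistics


def summarize_ping(output):
    """Summarize packet loss and RTT spread from ping output text.

    Assumes iputils-style output; other ping implementations may
    format their reply and summary lines differently.
    """
    # Per-reply round-trip times, e.g. "time=80.0 ms".
    rtts = [float(t) for t in re.findall(r"time=([\d.]+) ms", output)]
    m = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received",
                  output)
    if m is None:
        raise ValueError("no ping summary line found")
    sent, received = int(m.group(1)), int(m.group(2))
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return {
        "sent": sent,
        "received": received,
        "loss_pct": loss_pct,
        "rtt_mean_ms": statistics.mean(rtts) if rtts else None,
        # Population standard deviation as a simple jitter measure.
        "rtt_stdev_ms": statistics.pstdev(rtts) if rtts else None,
    }
```

Running this against captures taken with different ports disabled would
make it easier to compare results from one day to the next.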

Thanks,
Elliot