[f-nsp] Trying to diagnose a possibly failing FESX648-PREM

Thu May 8 17:13:24 EDT 2014

Just spoke with a sysadmin working out of a different datacenter. They have
FESX648-PREMs deployed and they're running sxr07400e.bin firmware as well.
Completely stumped at this point :-/

On Thu, May 8, 2014 at 1:38 PM, ebradsha at gmail.com <ebradsha at gmail.com> wrote:

> I just had a replacement FESX648-PREM delivered overnight, hooked it up
> and initially all looked good. However, when I imported my config and moved
> over all of the CAT5e cables, the packet loss and erratic pings resumed.
>
> Assuming that there was some firmware issue at play, I started removing
> different parts of my config while running a continuous ping test in the
> background. The moment I removed all rate-limiting from the device, packet
> loss halted and ping times stabilized. However, I still can't download
> files at full speed: speed-test downloads keep stalling and resuming.
> Ultimately I can only average 6MB/s where I'd normally expect to pull
> down at least 200MB/s.
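When bisecting a config against a continuous ping like this, it can help to reduce each run to the same few numbers so runs are easy to compare. Below is a minimal sketch in Python, assuming you have already collected the RTTs from one ping run into a list, with None for each request that got no reply. The function name and the jitter definition (mean absolute change between consecutive replies, a simple measure and not the RFC 3550 estimator) are illustrative choices, not anything used in this thread.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one continuous ping run.

    rtts_ms holds one entry per echo request, in order: the round-trip
    time in milliseconds, or None if no reply came back.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping sample")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        # Total loss: there are no latency figures to report.
        return {"loss_pct": loss_pct, "avg_ms": None, "jitter_ms": None}
    # Jitter: mean absolute change between consecutive replies
    # (illustrative only, not the RFC 3550 smoothed estimator).
    steps = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter = mean(steps) if steps else 0.0
    return {"loss_pct": loss_pct, "avg_ms": mean(replies), "jitter_ms": jitter}
```

For example, `summarize_pings([20.0, 22.0, None, 21.0, None])` reports 40% loss, a 21 ms average and 1.5 ms of jitter. Running it once per config change gives a like-for-like view of whether removing a section (such as the rate-limiting) actually moved the loss and jitter figures.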
>
> My original switch was running sxr07400e.bin and the replacement is
> running sxr07400d.bin
>
> All my other switches are FESX448-PREMs, so unfortunately I don't have an
> existing example config to model after.
>
> Can anyone recommend a boot ROM and firmware version that works well with
> a FESX648-PREM?
>
>
> On Wed, May 7, 2014 at 4:36 PM, ebradsha at gmail.com <ebradsha at gmail.com> wrote:
>
>> This is a stand-alone switch in a cabinet, so no L2 loop there. Pretty
>> simple setup -- single BGP session with an upstream provider with the
>> default route pointing right to them. CPU utilization currently sitting at
>> 1%.
>>
>> Initially when I noticed the packet loss I thought I was getting DoS
>> attacked, but I have sFlow monitoring activated on all ports and don't see
>> anything out of the ordinary.
>>
>> I'll check the boot time diagnostics soon -- thanks for your input.
>>
>> - Elliot
>>
>>
>> On Wed, May 7, 2014 at 4:28 PM, Jeroen Wunnink | Hibernia Networks <
>> jeroen.wunnink at atrato.com> wrote:
>>
>>>  Could be an L2 loop or a DDoS against the mgmt IP. Is the CPU load also
>>> high?
>>>
>>>
>>> On 07/05/14 20:46, ebradsha at gmail.com wrote:
>>>
>>> Hi all,
>>>
>>>  I believe I have a failing switch on my hands and I'm wondering if you
>>> might be able to provide an assessment based on the symptoms I'm seeing.
>>>
>>>  I'm currently running a Foundry FESX648-PREM with the following
>>> version info:
>>>
>>>  SSH@FESX648 Router>show version
>>>   SW: Version 07.4.00eT3e3 Copyright (c) 1996-2012 Brocade
>>> Communications Systems, Inc. All rights reserved.
>>>       Compiled on Dec 11 2013 at 19:00:43 labeled as SXR07400e
>>>       (4593059 bytes) Primary sxr07400e.bin
>>>        BootROM: Version 07.4.01T3e5 (FEv2)
>>>   HW: Stackable FESX648-PREM6 (PROM-TYPE FESX648-L3U-IPV6)
>>>
>>> ==========================================================================
>>>       Serial  #: FL18090011
>>>          License: SX_V6_HW_ROUTER_IPv6_SOFT_PACKAGE   (LID: XXXXXXXXXXX)
>>>        P-ASIC  0: type 0111, rev 00  subrev 01
>>>       P-ASIC  1: type 0111, rev 00  subrev 01
>>>       P-ASIC  2: type 0111, rev 00  subrev 01
>>>       P-ASIC  3: type 0111, rev 00  subrev 01
>>>
>>> ==========================================================================
>>>   300 MHz Power PC processor 8245 (version 0081/1014) 66 MHz bus
>>>   512 KB boot flash memory
>>>  8192 KB code flash memory
>>>   256 MB DRAM
>>> The system uptime is 26 minutes 49 seconds
>>> The system : started=warm start   reloaded=by "reload"
>>>
>>>
>>>  Quick summary of the symptoms:
>>>
>>>  1. These problems started only after ~15 servers were connected to the
>>> switch. Even with that many servers connected, utilization remains low:
>>> only ~40 Mbit/s on a 1 Gbit/s uplink.
>>>
>>>  2. I just rebooted my switch 20 minutes ago, but I'm already seeing a
>>> ton of FCS errors across many ports: http://pbrd.co/SABLtk
>>>
>>>  3. Inexplicably high and erratic ping times: around 80ms instead of the
>>> usual 20ms over the same route, varying by +/- 20ms from one ping to the
>>> next. Ping times were low and stable before many servers were connected.
>>>
>>>  4. High packet loss. Before a lot of servers were connected, there was
>>> no packet loss. Yesterday, the packet loss was hovering around 10%. It
>>> seems to be worsening now. Today the average packet loss is 20%.
>>>
>>>  Screen capture: http://pbrd.co/SADKO7 <http://pbrd.co/SABZ3D>
>>>
>>>  5. Yesterday I was also able to temporarily eliminate packet loss and
>>> the high ping times by disabling specific ports. Today, disabling ports 7
>>> and 11 has no effect.
>>>
>>>  6. The cross-connect cables were suspect, but all cables have since
>>> been tested with a MicroTest PentaScanner and all passed. We even replaced
>>> the CAT5 cross-connect with a machined and molded CAT6 cable -- the same
>>> packet loss and erratic ping times persisted.
>>>
>>>  7. Other strange things have happened. Yesterday I attempted to
>>> connect two new servers to the switch on ports 37 and 38. Ports 5-48
>>> belong to the same default VLAN. The servers could connect to the switch,
>>> and ping the gateway IP, but they could not ping to the outside world. I
>>> then moved the CAT5 cables to ports 22 and 23 -- same VLAN -- and
>>> everything worked perfectly.
>>>
>>>  Does this seem like a failing switch? Are there any further diagnostic
>>> tests I could run to verify this?
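One further check, sketched under assumptions: the FCS counters show everything accumulated since the reboot, so comparing two readings taken a few minutes apart shows which ports are still picking up errors right now. That may help correlate the errors with the ports whose disabling once cleared the loss (7 and 11). The snapshot format below (port number mapped to its cumulative FCS error count, copied from the switch's interface counters however you read them) and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: port number -> cumulative FCS error count,
# read from the switch's interface counters at one point in time.
Snapshot = Dict[int, int]


def rising_fcs_ports(before: Snapshot, after: Snapshot,
                     min_increase: int = 1) -> List[Tuple[int, int]]:
    """Return (port, new_errors) for every port whose FCS count grew by at
    least min_increase between two snapshots, worst port first."""
    rising = []
    for port, count in after.items():
        delta = count - before.get(port, 0)
        # A negative delta (counter cleared in between) falls below
        # min_increase and is skipped along with unchanged ports.
        if delta >= min_increase:
            rising.append((port, delta))
    # Most new errors first; ties broken by port number.
    rising.sort(key=lambda item: (-item[1], item[0]))
    return rising
```

For example, `rising_fcs_ports({7: 100, 11: 50, 22: 0}, {7: 180, 11: 50, 22: 3})` returns `[(7, 80), (22, 3)]`: port 11 has stopped accumulating errors, while ports 7 and 22 are still active.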
>>>
>>>  Thanks,
>>> Elliot
>>>
>>>
>>>
>>> _______________________________________________
>>> foundry-nsp mailing list
>>> foundry-nsp at puck.nether.net
>>> http://puck.nether.net/mailman/listinfo/foundry-nsp
>>>
>>>
>>>
>>> --
>>>
>>> Jeroen Wunnink
>>> IP NOC Manager - Hibernia Networks
>>> jeroen.wunnink at hibernianetworks.com
>>> Phone: +1 908 516 4200 (Ext: 1011)
>>> 24/7 NOC Phone: +31 20 82 00 623
>>>
>>>
>>
>