[f-nsp] Trying to diagnose a possibly failing FESX648-PREM
ebradsha at gmail.com
Thu May 8 18:07:12 EDT 2014
Plenty of FCS errors and they're incrementing on the new switch as well.
Flow control is enabled on all ports. Here's my 'show statistics' output:
SSH at FESX648 Router(config)#show statistics
Port   In Packets  Out Packets  In Errors  Out Errors
   1       180855            0          0           0
   2            0            0          0           0
   3    123136488     70341679          0           0
   4            0            0          0           0
   5      5315137      6604598     648949           0
   6       342105      1549867     535454           0
   7      9669516     16503017    3137016           0
   8     14399232     29683571          1           0
   9      9974691     18817287    3853703           0
  10      4152353      4000770          0           0
  11     13630527     25175503    5483288           0
  12        71369       149477       1642           0
  13      6881418      1668386     158036           0
  14       939892      3171692     261376           0
  15     11008907     20921720    4404347           0
  16        77529       222362      24009           0
  17          433        87820          0           0
  18        82308      1759389     759693           0
  19            0            0          0           0
  20        27175       109184       1567           0
  21            0            0          0           0
  22            0            0          0           0
  23            0            0          0           0
  24            0            0          0           0
  25            0          391          0           0
  26          410            0          0           0
  27            0            0          0           0
  28            0            0          0           0
  29            0            0          0           0
Almost every port that is active has FCS errors.
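
To put those counters in proportion, here is a minimal Python sketch that ranks ports by the share of received frames that were errored. The counter values are copied from the 'show statistics' output above; the function names are just illustrative. It assumes "In Packets" counts only good frames, so the share is errors / (packets + errors). Port 18 reports more errors than packets, which suggests that, but I haven't confirmed how the switch counts them.

```python
# Counters copied from the 'show statistics' output:
# (port, in_packets, out_packets, in_errors, out_errors)
STATS = [
    (1, 180855, 0, 0, 0),
    (2, 0, 0, 0, 0),
    (3, 123136488, 70341679, 0, 0),
    (4, 0, 0, 0, 0),
    (5, 5315137, 6604598, 648949, 0),
    (6, 342105, 1549867, 535454, 0),
    (7, 9669516, 16503017, 3137016, 0),
    (8, 14399232, 29683571, 1, 0),
    (9, 9974691, 18817287, 3853703, 0),
    (10, 4152353, 4000770, 0, 0),
    (11, 13630527, 25175503, 5483288, 0),
    (12, 71369, 149477, 1642, 0),
    (13, 6881418, 1668386, 158036, 0),
    (14, 939892, 3171692, 261376, 0),
    (15, 11008907, 20921720, 4404347, 0),
    (16, 77529, 222362, 24009, 0),
    (17, 433, 87820, 0, 0),
    (18, 82308, 1759389, 759693, 0),
    (19, 0, 0, 0, 0),
    (20, 27175, 109184, 1567, 0),
    (21, 0, 0, 0, 0),
    (22, 0, 0, 0, 0),
    (23, 0, 0, 0, 0),
    (24, 0, 0, 0, 0),
    (25, 0, 391, 0, 0),
    (26, 410, 0, 0, 0),
    (27, 0, 0, 0, 0),
    (28, 0, 0, 0, 0),
    (29, 0, 0, 0, 0),
]


def in_error_share(in_packets, in_errors):
    """Fraction of received frames that were errored.

    Assumption: in_packets excludes errored frames, so the total
    number of frames seen on the wire is in_packets + in_errors.
    """
    total = in_packets + in_errors
    return in_errors / total if total else 0.0


def ports_with_errors(stats, min_errors=1):
    """Return (port, in_errors, share) tuples, worst share first."""
    rows = [
        (port, in_err, in_error_share(in_pkts, in_err))
        for port, in_pkts, _out_pkts, in_err, _out_err in stats
        if in_err >= min_errors
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for port, errs, share in ports_with_errors(STATS):
        print(f"port {port:>2}: {errs:>8} in-errors, {share:6.1%} of received frames")
```

Under that assumption, 13 ports show input errors and port 18 comes out worst. Raising min_errors would filter out ports like 8, which has a single error.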
I've had such a bizarre combination of symptoms (15% packet loss
and erratic pings that were resolved by removing rate-limiting) that I
initially discounted the possibility that my cables were bad. However, I
did self-terminate all of them (I've terminated thousands of cables) and I
was using a new bag of RJ45 plugs that I haven't used elsewhere.
The datacenter technician who tested my uplink cross-connect cable also
tested one of my self-terminated cables. Both cables passed the test, but
maybe the rest of my self-terminated cables are bad...
On Thu, May 8, 2014 at 2:22 PM, Eldon Koyle <
esk-puck.nether.net at esk.cs.usu.edu> wrote:
> Could it be a cabling issue? Are there any errors?
>
> Is flow control enabled?
>
> --
> Eldon Koyle
>
> On May 08 14:13-0700, ebradsha at gmail.com wrote:
> > Just spoke with a sysadmin working out of a different datacenter. They have
> > FESX648-PREMs deployed and they're running sxr07400e.bin firmware as well.
> > Completely stumped at this point :-/
> >
> >
> > On Thu, May 8, 2014 at 1:38 PM, ebradsha at gmail.com <ebradsha at gmail.com> wrote:
> >
> > > I just had a replacement FESX648-PREM delivered overnight, hooked it up
> > > and initially all looked good. However, when I imported my config and moved
> > > over all of the CAT5e cables, the packet loss and erratic pings resumed.
> > >
> > > Assuming that there was some firmware issue at play, I started removing
> > > different parts of my config while running a continuous ping test in the
> > > background. The moment I removed all rate-limiting from the device, packet
> > > loss halted and ping times stabilized. However, I continue to have problems
> > > downloading files at full speed -- speed test files will do these 'stop and
> > > start' pauses. Ultimately I can only average 6MB/s where I'd
> > > normally expect to pull down at least 200MB/s.
> > >
> > > My original switch was running sxr07400e.bin and the replacement is
> > > running sxr07400d.bin
> > >
> > > All my other switches are FESX448-PREMs, so unfortunately I don't have an
> > > existing example config to model after.
> > >
> > > Anyone recommend a boot ROM and firmware version that works well with a
> > > FESX648-PREM?
> > >
> > >
> > >
> > >
> > > On Wed, May 7, 2014 at 4:36 PM, ebradsha at gmail.com <ebradsha at gmail.com> wrote:
> > >
> > >> This is a stand-alone switch in a cabinet so no L2 loop there. Pretty
> > >> simple setup -- single BGP session with an upstream provider with the
> > >> default route pointing right to them. CPU utilization currently sitting at
> > >> 1%.
> > >>
> > >> Initially when I noticed the packet loss I thought I was getting DoS
> > >> attacked, but I have sFlow monitoring activated on all ports and don't see
> > >> anything out of the ordinary.
> > >>
> > >> I'll check the boot time diagnostics soon -- thanks for your input.
> > >>
> > >> - Elliot
> > >>
> > >>
> > >> On Wed, May 7, 2014 at 4:28 PM, Jeroen Wunnink | Hibernia Networks <
> > >> jeroen.wunnink at atrato.com> wrote:
> > >>
> > >>> Could be a L2 loop or ddos against the mgmt IP. is the CPU load also
> > >>> high?
> > >>>
> > >>>
> > >>> On 07/05/14 20:46, ebradsha at gmail.com wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> I believe I have a failing switch on my hands and I'm wondering if you
> > >>> might be able to provide an assessment based on the symptoms I'm seeing.
> > >>>
> > >>> I'm currently running a Foundry FESX648-PREM with the following
> > >>> version info:
> > >>>
> > >>> SSH at FESX648 Router>show version
> > >>> SW: Version 07.4.00eT3e3 Copyright (c) 1996-2012 Brocade
> > >>> Communications Systems, Inc. All rights reserved.
> > >>> Compiled on Dec 11 2013 at 19:00:43 labeled as SXR07400e
> > >>> (4593059 bytes) Primary sxr07400e.bin
> > >>> BootROM: Version 07.4.01T3e5 (FEv2)
> > >>> HW: Stackable FESX648-PREM6 (PROM-TYPE FESX648-L3U-IPV6)
> > >>>
> > >>>
> > >>> ==========================================================================
> > >>> Serial #: FL18090011
> > >>> License: SX_V6_HW_ROUTER_IPv6_SOFT_PACKAGE (LID: XXXXXXXXXXX)
> > >>> P-ASIC 0: type 0111, rev 00 subrev 01
> > >>> P-ASIC 1: type 0111, rev 00 subrev 01
> > >>> P-ASIC 2: type 0111, rev 00 subrev 01
> > >>> P-ASIC 3: type 0111, rev 00 subrev 01
> > >>>
> > >>>
> > >>> ==========================================================================
> > >>> 300 MHz Power PC processor 8245 (version 0081/1014) 66 MHz bus
> > >>> 512 KB boot flash memory
> > >>> 8192 KB code flash memory
> > >>> 256 MB DRAM
> > >>> The system uptime is 26 minutes 49 seconds
> > >>> The system : started=warm start reloaded=by "reload"
> > >>>
> > >>>
> > >>> Quick summary of the symptoms:
> > >>>
> > >>> 1. These problems started only after ~15 servers were connected to the
> > >>> switch. Although many servers were connected, utilization remains low, only
> > >>> ~40Mbit on a 1Gbit uplink.
> > >>>
> > >>> 2. I just rebooted my switch 20 minutes ago, but I'm already seeing a
> > >>> ton of FCS errors across many ports: http://pbrd.co/SABLtk
> > >>>
> > >>> 3. Inexplicably high and erratic ping times (80ms, instead of the
> > >>> usual 20ms over the same route and variation of +- 20ms on every ping).
> > >>> Ping times were low and stable before many servers were connected.
> > >>>
> > >>> 4. High packet loss. Before a lot of servers were connected, there was
> > >>> no packet loss. Yesterday, the packet loss was hovering around 10%. It
> > >>> seems to be worsening now. Today the average packet loss is 20%.
> > >>>
> > >>> Screen capture: http://pbrd.co/SADKO7 <http://pbrd.co/SABZ3D>
> > >>>
> > >>> 5. Yesterday I was also able to temporarily eliminate packet loss and
> > >>> the high ping times by disabling specific ports. Today, disabling ports 7
> > >>> and 11 has no effect.
> > >>>
> > >>> 6. The cross-connect cables were suspect, but all cables have since
> > >>> been tested with a MicroTest PentaScanner and all passed. We even replaced
> > >>> the CAT5 cross-connect with a machined and molded CAT6 cable -- the same
> > >>> packet loss and erratic ping times persisted.
> > >>>
> > >>> 7. Other strange things have happened. Yesterday I attempted to
> > >>> connect up two new servers to the switch on port 37 and 38. Ports 5-48
> > >>> belong to the same default VLAN. The servers could connect to the switch,
> > >>> and ping the gateway IP, but they could not ping to the outside world. I
> > >>> then moved the CAT5 cables to ports 22 and 23 -- same VLAN -- and
> > >>> everything worked perfectly.
> > >>>
> > >>> Does this seem like a failing switch? Are there any further diagnostic
> > >>> tests I could run to verify this?
> > >>>
> > >>> Thanks,
> > >>> Elliot
> > >>>
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> foundry-nsp mailing list
> > >>> foundry-nsp at puck.nether.net
> > >>> http://puck.nether.net/mailman/listinfo/foundry-nsp
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Jeroen Wunnink
> > >>> IP NOC Manager - Hibernia Networks
> > >>> jeroen.wunnink at hibernianetworks.com
> > >>> Phone: +1 908 516 4200 (Ext: 1011)
> > >>> 24/7 NOC Phone: +31 20 82 00 623
> > >>>
> > >>>
> > >>
> > >
>
> > _______________________________________________
> > foundry-nsp mailing list
> > foundry-nsp at puck.nether.net
> > http://puck.nether.net/mailman/listinfo/foundry-nsp
>
>