<div dir="ltr">Hi all,<div><br></div><div>After spamming this mailing list so heavily, I might as well spam it once more with the resolution to my perplexing problem.</div><div><br></div><div>The problem was more straightforward than I anticipated. As I hooked more servers up to the switch, I manually set many switch ports to 100-full -- as a crude means of rate-limiting. It turns out many of those servers dropped down to 100-half on their end -- with the <i>notable exception</i> of the server I was using for testing.</div>


<div><br></div><div>I suppose the packet loss, erratic ping times and degraded transfer speeds (even on my correctly negotiated test server) were all just a result of the switch becoming overwhelmed with duplex mismatch errors.</div>
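<div><br></div><div>For anyone chasing similar symptoms, here is a rough sketch of the check I would run on each server. It assumes Linux hosts that expose link state under /sys/class/net (the speed and duplex files); the function names and the "expected" duplex value are my own placeholders, not anything from the switch or a vendor tool. The background, as I understand it: when a switch port is forced to a fixed speed and duplex, a NIC left on autonegotiation can still detect the speed but not the duplex, so it falls back to half duplex.</div>

```python
from pathlib import Path


def read_link_attr(iface_dir, name):
    """Return the stripped contents of a sysfs link attribute, or None if unreadable."""
    try:
        return (iface_dir / name).read_text().strip()
    except OSError:
        # Interfaces that are down (or virtual) may refuse reads of speed/duplex.
        return None


def find_suspect_links(sysfs_root="/sys/class/net", expected_duplex="full"):
    """List (interface, speed, duplex) for links whose duplex is not the expected one."""
    suspects = []
    for iface_dir in sorted(Path(sysfs_root).iterdir()):
        duplex = read_link_attr(iface_dir, "duplex")
        if duplex is None or duplex == "unknown":
            continue  # no link or no duplex info, so nothing to compare
        if duplex != expected_duplex:
            speed = read_link_attr(iface_dir, "speed")
            suspects.append((iface_dir.name, speed, duplex))
    return suspects


if __name__ == "__main__":
    for name, speed, duplex in find_suspect_links():
        print(f"{name}: {speed} Mb/s, {duplex} duplex -- check the switch port setting")
```

<div>Any interface this reports as half duplex while its switch port is forced to full is a candidate mismatch. The fix is to configure both ends the same way, either both forced or both autonegotiating.</div>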


<div><br></div><div>Thanks for all the tips -- one of you prompted me to check all the connected servers for duplex mismatches, and that was the nudge I needed.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">


On Thu, May 8, 2014 at 2:22 PM, Eldon Koyle <span dir="ltr"><<a href="mailto:esk-puck.nether.net@esk.cs.usu.edu" target="_blank">esk-puck.nether.net@esk.cs.usu.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Could it be a cabling issue?  Are there any errors?<br>

<br>

Is flow control enabled?<br>

<br>

--<br>

Eldon Koyle<br>

<div><div class="h5"><br>

On  May 08 14:13-0700, <a href="mailto:ebradsha@gmail.com">ebradsha@gmail.com</a> wrote:<br>

> Just spoke with a sysadmin working out of a different datacenter. They have<br>

> FESX648-PREMs deployed and they're running sxr07400e.bin firmware as well.<br>

> Completely stumped at this point :-/<br>

><br>

><br>

> On Thu, May 8, 2014 at 1:38 PM, <a href="mailto:ebradsha@gmail.com">ebradsha@gmail.com</a> <<a href="mailto:ebradsha@gmail.com">ebradsha@gmail.com</a>> wrote:<br>

><br>

> > I just had a replacement FESX648-PREM delivered overnight, hooked it up<br>

> > and initially all looked good. However, when I imported my config and moved<br>

> > over all of the CAT5e cables, the packet loss and erratic pings resumed.<br>

> ><br>

> > Assuming that there was some firmware issue at play, I started removing<br>

> > different parts of my config while running a continuous ping test in the<br>

> > background. The moment I removed all rate-limiting from the device, packet<br>

> > loss halted and ping times stabilized. However, I continue to have problems<br>

> > downloading files at full speed -- speed test files will do these 'stop and<br>

> > start' pauses. Ultimately I can only average 6MB/s where I'd<br>

> > normally expect to pull down at least 200MB/s.<br>

> ><br>

> > My original switch was running sxr07400e.bin and the replacement is<br>

> > running sxr07400d.bin<br>

> ><br>

> > All my other switches are FESX448-PREMs, so unfortunately I don't have an<br>

> > existing example config to model after.<br>

> ><br>

> > Anyone recommend a boot ROM and firmware version that works well with a<br>

> > FESX648-PREM?<br>

> ><br>

> ><br>

> ><br>

> ><br>

> > On Wed, May 7, 2014 at 4:36 PM, <a href="mailto:ebradsha@gmail.com">ebradsha@gmail.com</a> <<a href="mailto:ebradsha@gmail.com">ebradsha@gmail.com</a>> wrote:<br>

> ><br>

> >> This is a stand-alone switch in a cabinet so no L2 loop there. Pretty<br>

> >> simple setup -- single BGP session with an upstream provider with the<br>

> >> default route pointing right to them. CPU utilization currently sitting at<br>

> >> 1%.<br>

> >><br>

> >> Initially when I noticed the packet loss I thought I was getting DoS<br>

> >> attacked, but I have sFlow monitoring activated on all ports and don't see<br>

> >> anything out of the ordinary.<br>

> >><br>

> >> I'll check the boot time diagnostics soon -- thanks for your input.<br>

> >><br>

> >> - Elliot<br>

> >><br>

> >><br>

> >> On Wed, May 7, 2014 at 4:28 PM, Jeroen Wunnink | Hibernia Networks <<br>

> >> <a href="mailto:jeroen.wunnink@atrato.com">jeroen.wunnink@atrato.com</a>> wrote:<br>

> >><br>

> >>>  Could be an L2 loop or DDoS against the mgmt IP. Is the CPU load also<br>

> >>> high?<br>

> >>><br>

> >>><br>

> >>> On 07/05/14 20:46, <a href="mailto:ebradsha@gmail.com">ebradsha@gmail.com</a> wrote:<br>

> >>><br>

> >>> Hi all,<br>

> >>><br>

> >>>  I believe I have a failing switch on my hands and I'm wondering if you<br>

> >>> might be able to provide an assessment based on the symptoms I'm seeing.<br>

> >>><br>

> >>>  I'm currently running a Foundry FESX648-PREM with the following<br>

> >>> version info:<br>

> >>><br>

> >>>  SSH@FESX648 Router>show version<br>

> >>>   SW: Version 07.4.00eT3e3 Copyright (c) 1996-2012 Brocade<br>

> >>> Communications Systems, Inc. All rights reserved.<br>

> >>>       Compiled on Dec 11 2013 at 19:00:43 labeled as SXR07400e<br>

> >>>       (4593059 bytes) Primary sxr07400e.bin<br>

> >>>        BootROM: Version 07.4.01T3e5 (FEv2)<br>

> >>>   HW: Stackable FESX648-PREM6 (PROM-TYPE FESX648-L3U-IPV6)<br>

> >>><br>

> >>> ==========================================================================<br>

> >>>       Serial  #: FL18090011<br>

> >>>          License: SX_V6_HW_ROUTER_IPv6_SOFT_PACKAGE   (LID: XXXXXXXXXXX)<br>

> >>>        P-ASIC  0: type 0111, rev 00  subrev 01<br>

> >>>       P-ASIC  1: type 0111, rev 00  subrev 01<br>

> >>>       P-ASIC  2: type 0111, rev 00  subrev 01<br>

> >>>       P-ASIC  3: type 0111, rev 00  subrev 01<br>

> >>><br>

> >>> ==========================================================================<br>

> >>>   300 MHz Power PC processor 8245 (version 0081/1014) 66 MHz bus<br>

> >>>   512 KB boot flash memory<br>

> >>>  8192 KB code flash memory<br>

> >>>   256 MB DRAM<br>

> >>> The system uptime is 26 minutes 49 seconds<br>

> >>> The system : started=warm start   reloaded=by "reload"<br>

> >>><br>

> >>><br>

> >>>  Quick summary of the symptoms:<br>

> >>><br>

> >>>  1. These problems started only after ~15 servers were connected to the<br>

> >>> switch. Although many servers were connected, utilization remains low, only<br>

> >>> ~40Mbit on a 1Gbit uplink.<br>

> >>><br>

> >>>  2. I just rebooted my switch 20 minutes ago, but I'm already seeing a<br>

> >>> ton of FCS errors across many ports: <a href="http://pbrd.co/SABLtk" target="_blank">http://pbrd.co/SABLtk</a><br>

> >>><br>

> >>>  3. Inexplicably high and erratic ping times (80ms, instead of the<br>

> >>> usual 20ms over the same route and variation of +- 20ms on every ping).<br>

> >>> Ping times were low and stable before many servers were connected.<br>

> >>><br>

> >>>  4. High packet loss. Before a lot of servers were connected, there was<br>

> >>> no packet loss. Yesterday, the packet loss was hovering around 10%. It<br>

> >>> seems to be worsening now. Today the average packet loss is 20%.<br>

> >>><br>

</div></div>> >>>  Screen capture: <a href="http://pbrd.co/SADKO7" target="_blank">http://pbrd.co/SADKO7</a> <<a href="http://pbrd.co/SABZ3D" target="_blank">http://pbrd.co/SABZ3D</a>><br>

<div class="">> >>><br>

> >>>  5. Yesterday I was also able to temporarily eliminate packet loss and<br>

> >>> the high ping times by disabling specific ports. Today, disabling ports 7<br>

> >>> and 11 has no effect.<br>

> >>><br>

> >>>  6. The cross-connect cables were suspect, but all cables have since<br>

> >>> been tested with a MicroTest PentaScanner and all passed. We even replaced<br>

> >>> the CAT5 cross-connect with a machined and molded CAT6 cable -- the same<br>

> >>> packet loss and erratic ping times persisted.<br>

> >>><br>

> >>>  7. Other strange things have happened. Yesterday I attempted to<br>

> >>> connect up two new servers to the switch on port 37 and 38. Ports 5-48<br>

> >>> belong to the same default VLAN. The servers could connect to the switch,<br>

> >>> and ping the gateway IP, but they could not ping to the outside world. I<br>

> >>> then moved the CAT5 cables to ports 22 and 23 -- same VLAN -- and<br>

> >>> everything worked perfectly.<br>

> >>><br>

> >>>  Does this seem like a failing switch? Are there any further diagnostic<br>

> >>> tests I could run to verify this?<br>

> >>><br>

> >>>  Thanks,<br>

> >>> Elliot<br>

> >>><br>

> >>><br>

> >>><br>

> >>> _______________________________________________<br>

</div>> >>> foundry-nsp mailing list<br>
> >>> foundry-nsp@puck.nether.net<br>
> >>> <a href="http://puck.nether.net/mailman/listinfo/foundry-nsp" target="_blank">http://puck.nether.net/mailman/listinfo/foundry-nsp</a><br>

<div class="HOEnZb"><div class="h5">> >>><br>

> >>><br>

> >>><br>

> >>> --<br>

> >>><br>

> >>> Jeroen Wunnink<br>

> >>> IP NOC Manager - Hibernia Networks<br>
> >>> <a href="mailto:jeroen.wunnink@hibernianetworks.com">jeroen.wunnink@hibernianetworks.com</a><br>

> >>> Phone: <a href="tel:%2B1%20908%20516%204200" value="+19085164200">+1 908 516 4200</a> (Ext: 1011)<br>

> >>> 24/7 NOC Phone: <a href="tel:%2B31%2020%2082%2000%20623" value="+31208200623">+31 20 82 00 623</a><br>

> >>><br>

> >>><br>

> >><br>

> ><br>

<br>

> _______________________________________________<br>

> foundry-nsp mailing list<br>

> <a href="mailto:foundry-nsp@puck.nether.net">foundry-nsp@puck.nether.net</a><br>

> <a href="http://puck.nether.net/mailman/listinfo/foundry-nsp" target="_blank">http://puck.nether.net/mailman/listinfo/foundry-nsp</a><br>

<br>

</div></div></blockquote></div><br></div>