<div dir="ltr">I've seen ASIC failures in a lot of my old FESXs recently. Each failure has had its own unique set of features.<div><br></div><div>I'd try moving the cross connect to a different tower (port group, ASIC) on both switches to rule out ASIC failure. </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 28, 2015 at 5:52 PM, Randy McAnally <span dir="ltr"><<a href="mailto:rsm@fast-serv.com" target="_blank">rsm@fast-serv.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all, sorry for the long-winded post but this has been eating away at me. Feel free to reply on or off list.<br>
<br>
Everything was fine for almost 2 years; then, out of the blue, a near-complete black hole occurred with traffic between two FWSX switches. In case you aren't aware, FWSX are just regular FESX switches neutered so they can't be upgraded with a PREM layer3 license. Here's a diagram:<br>
<br>
-----------xc1-----------[FWSX 1]--[server1]<br>
| |<br>
[upstream switch] xc3<br>
| |<br>
-----------xc2-----------[FWSX 2]--[server2]<br>
<br>
Both FWSX's are pure layer2 and form an 802.1w loop with xc2 as the blocking link. No frills, bells or whistles.<br>
<br>
After many hours of tcpdumping on servers connected to a pair of FWSX (basic layer2) switches, it turns out ARP unicast packets are being dropped by the x-connect between the two switches, but only in one direction. Below, you'll see the unicast reply to the initial broadcast, but subsequent unicast pings are dropped (thus only a single reply using arping).<br>
<br>
Traffic between two servers - SERVER 1 (switch1) to SERVER 2 (switch2):<br>
[root@cl-ash-s1 ~]# arping -I eth1.2 10.11.13.11<br>
ARPING 10.11.13.11 from 10.11.13.5 eth1.2<br>
Unicast reply from 10.11.13.11 [A0:36:9F:0E:13:B2] 2.453ms<br>
Sent 11 probes (1 broadcast(s))<br>
Received 1 response(s)<br>
<br>
And on the other server - SERVER 2 (switch2) to SERVER 1 (switch1):<br>
[root@localhost ~]# arping 10.11.13.5 -I xenbr2<br>
ARPING 10.11.13.5 from 10.11.13.11 xenbr2<br>
Sent 11 probes (11 broadcast(s))<br>
Received 0 response(s)<br>
<br>
<br>
In a nutshell -- Unicast ARP from server1 to server2 is completely dropped. Broadcast works in both directions, and unicast works only from server2 to server1.<br>
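<br>
As a rough sketch of how the two arping summaries above can be compared side by side, the Python below pulls the probe and response counts out of each run. It assumes iputils-style arping output (which starts with broadcast probes and switches to unicast once a reply arrives); the function name summarize_arping is made up for illustration.<br>
<br>
```python
import re

# Patterns for the two summary lines printed by iputils-style arping (assumed format).
SENT_RE = re.compile(r"Sent (\d+) probes \((\d+) broadcast\(s\)\)")
RECEIVED_RE = re.compile(r"Received (\d+) response\(s\)")


def summarize_arping(output):
    """Hypothetical helper: return probe/response counts from arping output."""
    sent = SENT_RE.search(output)
    received = RECEIVED_RE.search(output)
    if sent is None or received is None:
        raise ValueError("no arping summary found")
    probes = int(sent.group(1))
    broadcasts = int(sent.group(2))
    return {
        "probes": probes,
        "broadcast_probes": broadcasts,
        # Probes that were not broadcast were sent as unicast.
        "unicast_probes": probes - broadcasts,
        "responses": int(received.group(1)),
    }


# The two runs quoted above.
server1_run = """ARPING 10.11.13.11 from 10.11.13.5 eth1.2
Unicast reply from 10.11.13.11 [A0:36:9F:0E:13:B2] 2.453ms
Sent 11 probes (1 broadcast(s))
Received 1 response(s)
"""
server2_run = """ARPING 10.11.13.5 from 10.11.13.11 xenbr2
Sent 11 probes (11 broadcast(s))
Received 0 response(s)
"""

for name, run in (("server1", server1_run), ("server2", server2_run)):
    print(name, summarize_arping(run))
```
<br>
Read this way, server1 sent 10 unicast probes after its single broadcast and got only the one reply, while server2 never received a reply and so never left broadcast mode. Both are consistent with unicast ARP being lost in the server1 to server2 direction.<br>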
<br>
MAC tables on the FWSX's are sane. Every server is shown where it should be.<br>
<br>
Can reproduce this with any device or operating system. It's definitely NOT a problem with the host configuration(s).<br>
<br>
Now the kicker - if I remove the x-connect between the switches (and let spanning tree re-converge through the upstream switch both are connected to), things work normally. Tried swapping xc3 to different ports, no change. So as long as I boomerang inter-switch traffic through the upstream switch, we're good. That is quite a bit of traffic, actually -- including SAN traffic -- so I need to avoid this. Rebooted both switches; no change. Software is latest for the platform (05.1.00eT1e0) so I can't try upgrading.<br>
<br>
So my simple question is -- has anyone ever seen Brocade switches (in pure L2 duties) just straight up eat ARP packets? And not only that -- but JUST unicast ARP and in only one direction?<span class="HOEnZb"><font color="#888888"><br>
<br>
<br>
-- <br>
Randy McAnally<br>
_______________________________________________<br>
foundry-nsp mailing list<br>
<a href="mailto:foundry-nsp@puck.nether.net" target="_blank">foundry-nsp@puck.nether.net</a><br>
<a href="http://puck.nether.net/mailman/listinfo/foundry-nsp" target="_blank">http://puck.nether.net/mailman/listinfo/foundry-nsp</a><br>
</font></span></blockquote></div><br></div>