[f-nsp] MAC address table issues with ICX 6610 stack running 7.4.00f switch code

Frank Bulk frnkblk at iname.com
Wed Mar 11 10:10:43 EDT 2015


One last follow-up to the list on this -- as far as we're concerned, this issue
is now resolved.  We've been running this new code for 3+ weeks now and
we've only seen the normal flooding one would see.

The fix will not be in the 7.4 code line, but will be in the 8.0 code line
starting with 8.0.10k.

Frank

-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
Frank Bulk
Sent: Wednesday, February 18, 2015 12:51 AM
To: foundry-nsp at puck.nether.net
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running
7.4.00f switch code

Another update -- after many long months, Brocade provided us a lab build
that includes improved storm control and mac address table code based on top
of 8.0.10h.  It was stressed by the developer to confirm that the fdb/MAC
tables for each core always stay in sync.  It's been less than 24 hours, but
rather than leaking 20 to 90 Mbps of traffic, it's averaging less than 300
kbps of traffic, which would appear to be appropriate for broadcast traffic
and the occasional unicast flood.  To appreciate the drop-off, see the
attached screenshot of the last 24 hours and last 7 days.

Frank

-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
Frank Bulk
Sent: Friday, December 05, 2014 8:22 PM
To: foundry-nsp at puck.nether.net
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running
7.4.00f switch code

I have an update on this issue: DEFECT 528034 has been opened.  The short of
it is that if a traffic loop causes a lot of MAC moves between cores, the
FastIron storm control code, once the loop has subsided, eventually updates
only one of the four cores.

The good news is that as long as you're sure that there will be no traffic
loops this issue won't be a problem.  The bad news is that if there is a
traffic loop the fdb can become corrupted, and if you want to remove the
corruption the switch (or switches, if it's a stack) will need to be
reloaded (as clearing the MAC address entry/ies may not clear the fdb
corruption).  It's my understanding that the corruption is the cause of
issue (a), unicast flooding, and perhaps items (b) and (c).

And yes, it took four months for the case to get this far.

Frank

-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
Frank Bulk
Sent: Friday, October 03, 2014 6:09 PM
To: foundry-nsp at puck.nether.net
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running
7.4.00f switch code

I'm looking to find more examples of the issue in the field.  If you have an
ICX 6610 stack with a largish MAC address table and would be willing to have
that table audited by a Perl script, I'd appreciate it if you would contact
me off-list (you would review and run the script yourself).

Thanks in advance,

Frank

-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
Frank Bulk (iname.com)
Sent: Wednesday, October 01, 2014 10:50 AM
To: foundry-nsp at puck.nether.net
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running
7.4.00f switch code

Just an update on this issue: we finally got a Brocade developer to
personally interact with our ICX 6610 stack.  After reviewing the issue and
poking through some 'dm' output the developer eventually identified the
reason for the unicast flooding issue: not all four packet processor CPU
cores share the same MAC address value for a certain hardware index.  

An example follows the signature.  MAC
address entry 0090.5E14.6182 has a hardware index of 25864 (you can find the
HW index from the "show mac" command).  Checking each CPU core on each shelf
using the 'dm' command (I used the rcon command to remote to the standby
shelf) you can see that on the active shelf the MAC address for cores 1 thru
3 is incorrect.  All the cores on the standby shelf have the correct MAC
address.  Apparently the standby shelf learns its MAC address entries from
the active shelf.

In our case frames are predominantly entering core 0 (each interface is tied
to one of the four cores), but they flood out the interfaces that are in
other cores on that shelf because there is no matching MAC address for that
traffic.

The developer is looking into why cores 1 thru 3 sometimes don't have the
right values.

I wrote a script to check every single MAC address in the MAC address table
(takes about 2.5 seconds) and found five such inconsistencies out of ~5,500
MAC addresses.  When I ran the script again overnight I found one more
inconsistency of a different nature, where one of the standby shelf's cores had
a MAC address of all zeroes.  I checked another ICX 6610 stack that has just
~550 MAC addresses and found no inconsistencies. 
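
The audit described above comes down to comparing, for one hardware index,
the entry that each core on each shelf returns against what "show mac"
reports.  The Perl script itself is not reproduced in this thread; what
follows is only a minimal Python sketch of the comparison step, assuming the
'dm ... read-mst' output for each core has already been captured as text.
The function names and the (pp-dev, core) dictionary keys are hypothetical.

```python
import re

# Field labels exactly as they appear in the 'read-mst' output in this thread.
VID_RE = re.compile(r"VID \[5:16\]=\s*(\d+)")
MAC_RE = re.compile(
    r"MacAddr \[17:64\]=\s*([0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4})"
)


def parse_entry(text):
    """Return (vid, mac) from one captured 'read-mst' output, or None."""
    vid = VID_RE.search(text)
    mac = MAC_RE.search(text)
    if not (vid and mac):
        return None
    return int(vid.group(1)), mac.group(1).upper()


def find_mismatches(outputs_by_core, expected):
    """Return the (pp_dev, core) keys whose entry differs from `expected`.

    outputs_by_core: {(pp_dev, core): captured output text}
    expected: (vid, mac) as reported by "show mac" for this hardware index
    """
    return sorted(
        key for key, text in outputs_by_core.items()
        if parse_entry(text) != expected
    )


if __name__ == "__main__":
    good = "VID [5:16]= 600      MacAddr [17:64]= 0090.5E14.6182"
    bad = "VID [5:16]= 10       MacAddr [17:64]= 1C7E.E549.70DA"
    outputs = {(2, 0): good, (2, 1): bad, (2, 2): bad, (2, 3): bad,
               (0, 0): good, (0, 1): good, (0, 2): good, (0, 3): good}
    print(find_mismatches(outputs, (600, "0090.5E14.6182")))
```

With the values from the example after the signature, this flags cores 1
thru 3 on pp-dev 2.  An all-zeroes MAC address on a core would be flagged
the same way, since it also differs from the expected entry.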

Frank


dm pp-dev 2 chow-diags core-id 0 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: C3044B01 0120BC28 0000E004 00000000
     Valid [0]= True                       Skip [1]= False
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 600                       MacAddr [17:64]= 0090.5E14.6182
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x007            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3
 telnet at ICX6610-24 Switc 
 
dm pp-dev 2 chow-diags core-id 1 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: E1B40143 38FDCA92 00000004 00000000
     Valid [0]= True                       Skip [1]= True
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 10                        MacAddr [17:64]= 1C7E.E549.70DA
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x000            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x0, user_defined=0, trunk=no, port=0
 telnet at ICX6610-24 Switc 
 
dm pp-dev 2 chow-diags core-id 2 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: E1B40143 38FDCA92 00000004 00000000
     Valid [0]= True                       Skip [1]= True
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 10                        MacAddr [17:64]= 1C7E.E549.70DA
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x000            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x0, user_defined=0, trunk=no, port=0
 telnet at ICX6610-24 Switc 
 
dm pp-dev 2 chow-diags core-id 3 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: E1B40143 38FDCA92 00000004 00000000
     Valid [0]= True                       Skip [1]= True
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 10                        MacAddr [17:64]= 1C7E.E549.70DA
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x000            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x0, user_defined=0, trunk=no, port=0
 telnet at ICX6610-24 Switc 
 
dm pp-dev 0 chow-diags core-id 0 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: C3044B01 0120BC28 0000E004 00000000
     Valid [0]= True                       Skip [1]= False
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 600                       MacAddr [17:64]= 0090.5E14.6182
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x007            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3
 [STBY]rconsole-1 at ICX6610-24 Switc 
 
dm pp-dev 0 chow-diags core-id 1 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: C3044B01 0120BC28 0000E004 00000000
     Valid [0]= True                       Skip [1]= False
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 600                       MacAddr [17:64]= 0090.5E14.6182
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x007            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3
 [STBY]rconsole-1 at ICX6610-24 Switc 
 
dm pp-dev 0 chow-diags core-id 2 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: C3044B05 0120BC28 0000E004 00000000
     Valid [0]= True                       Skip [1]= False
     age [2]= True                         EntryType [3:4]= MAC addr
     VID [5:16]= 600                       MacAddr [17:64]= 0090.5E14.6182
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x007            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3
 [STBY]rconsole-1 at ICX6610-24 Switc 
 
dm pp-dev 0 chow-diags core-id 3 read-mst 25864
 cli_ch5_core_based_dm_pp_read_hw_mst
 cli_ch5_core_based_dm_pp_read_hw_mst_idx
 Data: C3044B01 0120BC28 0000E004 00000000
     Valid [0]= True                       Skip [1]= False
     age [2]= False                        EntryType [3:4]= MAC addr
     VID [5:16]= 600                       MacAddr [17:64]= 0090.5E14.6182
     DevId [65:69]= 2                      SrcId [70:74]= 0
     Bits(24:13) [77:88]= 0x007            static [89]= False
     multiple [90]= False                  DA-Cmd [91:93]= FORWARD
     SA-Cmd [94:96]= FORWARD               DARoute [97]= False
     StormPrevention [98]= False           SAQosProfile ID [99:101]= 0
     DAQosProfile ID [102:104]= 0          MirrorToAnalyze [105]= False
     Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3
 [STBY]rconsole-1 at ICX6610-24 Switc 
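
The decoded fields in the output above can be reproduced from the four
"Data" words, which makes it possible to compare cores on the raw words
alone.  The sketch below is Python and rests on an assumption inferred from
this output, not from Brocade documentation: the first word holds bits 0-31,
and the bracketed ranges are inclusive bit positions counted from the least
significant bit.  The names bits() and decode_mst() are hypothetical.

```python
def bits(value, lo, hi):
    """Extract the inclusive bit range [lo:hi] from an integer (bit 0 = LSB)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_mst(data):
    """Decode a 'Data:' line such as 'C3044B01 0120BC28 0000E004 00000000'."""
    words = [int(w, 16) for w in data.split()]
    # Assumption: word 0 is bits 0-31, word 1 is bits 32-63, and so on.
    value = sum(w << (32 * i) for i, w in enumerate(words))
    mac = "%012X" % bits(value, 17, 64)
    return {
        "valid": bool(bits(value, 0, 0)),
        "skip": bool(bits(value, 1, 1)),
        "age": bool(bits(value, 2, 2)),
        "vid": bits(value, 5, 16),
        "mac": ".".join(mac[i:i + 4] for i in (0, 4, 8)),
        "dev_id": bits(value, 65, 69),
        "src_id": bits(value, 70, 74),
    }


if __name__ == "__main__":
    # Core 0 and core 1 of pp-dev 2, hardware index 25864, from the output above.
    print(decode_mst("C3044B01 0120BC28 0000E004 00000000"))
    print(decode_mst("E1B40143 38FDCA92 00000004 00000000"))
```

Under that assumption the first line decodes to VID 600 and MAC
0090.5E14.6182, and the second to VID 10 and MAC 1C7E.E549.70DA with Skip
set, matching what the switch printed for those cores.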



-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
Frank Bulk
Sent: Friday, September 05, 2014 9:16 PM
To: foundry-nsp at puck.nether.net
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running
7.4.00f switch code

We replicated the issue with BTAC and continue troubleshooting.  BTAC
believes it's a packet processor issue -- we'll be more sure when we flip
the active member of the cross-stack LAG to the other stack member.

Frank

-----Original Message-----
From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
Frank Bulk
Sent: Saturday, August 30, 2014 3:01 PM
To: foundry-nsp at puck.nether.net
Subject: [f-nsp] MAC address table issues with ICX 6610 stack running
7.4.00f switch code

A few weeks ago a customer alerted us to a packet loss issue that we
eventually traced down to a loop in a LAG on some access gear (not Brocade
gear).  

In the process of troubleshooting and looking at MAC address tables on the
intermediate gear we connected the WAN interface of a simple consumer-grade
router on ethernet 1/1/23 of ICX 6610 #1 so we had a pingable host.  What I
noticed, when graphing that port, is that we were seeing a lot of traffic
egressing 1/1/23 -- anywhere from 200 kbps to 15 Mbps over the day!
That seemed like a lot more than the usual amount of broadcast traffic on
this 2500 host VLAN.  

This is an ICX 6610 stack running 7.4 switch code with almost all
connections being a cross-stack LAG.

Curious as to what was going on, I packet captured the port's output and
discovered a lot of unicast traffic flooding out of 1/1/23.  By doing some
troubleshooting I uncovered three different situations:
a) there are times the ICX 6610 lists the correct MAC address and port in
its table for a host yet it still floods (some) unicast traffic for that
host out of 1/1/23.
b) despite a static MAC address entry in the switch pointing to a LAG port,
the ICX 6610 floods some traffic for that host out of 1/1/23 instead of out
of the statically specified LAG port.
c) there are times the ICX 6610 has no MAC address table entry for a host,
even though it should have learned it because it's getting traffic from that
host on the LAG.  Entering a static MAC address and then removing it
results in the switch learning it dynamically!

The only traffic I should be seeing out of 1/1/23 is spanning tree, ARP,
broadcast traffic, and any traffic for a host that has not yet been learned
by the switch.  
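
One way to check a capture against that expectation is to sort frames by
destination MAC.  Broadcast and multicast destinations (which cover ARP
requests and spanning tree BPDUs) are expected on any port, while unicast
frames addressed to some other host are the candidates for unexpected
flooding.  A minimal Python sketch, assuming the destination MAC addresses
have already been extracted from the capture as strings; the function names
and the sample "own" address are hypothetical.

```python
from collections import Counter


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff', 'aabb.ccdd.eeff' or 'aa-bb-...' to bytes."""
    return bytes.fromhex(mac.replace(":", "").replace(".", "").replace("-", ""))


def classify(dst_mac, own_mac):
    """Classify a frame seen egressing a port by its destination MAC."""
    dst = mac_bytes(dst_mac)
    if dst == b"\xff" * 6:
        return "broadcast"
    if dst[0] & 0x01:  # group bit set in the first octet: multicast
        return "multicast"
    if dst == mac_bytes(own_mac):
        return "unicast-own"  # addressed to the host attached to this port
    return "unicast-other"  # flooded unicast for some other host


def summarize(dst_macs, own_mac):
    """Count frames per class for a list of destination MACs."""
    return Counter(classify(m, own_mac) for m in dst_macs)


if __name__ == "__main__":
    own = "0011.2233.4455"  # hypothetical MAC of the router on the port
    seen = ["ffff.ffff.ffff", "0180.c200.0000", "0090.5E14.6182", own]
    print(summarize(seen, own))
```

A small "unicast-other" count is normal, since traffic for a host the switch
has not yet learned is flooded; a sustained stream toward a host that is
already in the MAC address table is what items (a) and (b) describe.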

We opened up two cases with Brocade TAC and I was able to confirm
one of the two items with the tech, but since 7.4.00a had some MAC address
table issues (Defect ID 437017 is one of them), rather than troubleshoot
extensively we decided to start with a more current release and upgraded to
7.4.00f Thursday morning .... but I have already re-confirmed items (a) and
(c).

Has anyone else seen this issue?  I don't think you would really notice it
unless you really did some packet captures and looked for it.

Frank

_______________________________________________
foundry-nsp mailing list
foundry-nsp at puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp

