[f-nsp] NetIron 5.8f feedbacks

Jörg Kost jk at ip-clear.de
Fri Feb 3 05:55:33 EST 2017


Hello,

Good point about the timers. I replaced all transceivers, but the issue 
persists. I turned on LACP debugging and noticed that the problem occurs 
only on ethernet 2/4. That interface is part of a LAG spanning two 
BR-MLX-10Gx4-X modules, using default-mode. The other side is a Brocade 
VDX with all interfaces set to long timeout, as recommended by a Brocade 
document.

The LACP counters on interface 2/4 are also exploding: about 1.8 million 
LACPDUs received, versus about 76,000 on each of the other member ports. 
No framing or CRC errors are visible on either the MLXE or the VDX.

MLXE-side:

====LAG "core10" ID 2 ====

Port  Role   Sys    Port    Oper [Act][Tio][Agg][Syn][Col][Dis][Def][Exp][Ope][Port]
              Pri    Pri     Key                                              Num

2/3   ACTR     1       1    100  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  51
2/3   PRTR 32768   32768     10  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  5124
2/4   ACTR     1       1    100  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  52
2/4   PRTR 32768   32768     10  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  5123
3/3   ACTR     1       1    100  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  99
3/3   PRTR 32768   32768     10  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  5636
3/4   ACTR     1       1    100  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  100
3/4   PRTR 32768   32768     10  Yes   L   Agg  Syn  Col  Dis  No   No   Ope  5635

Actor	System MAC  cxxx.xxxx.xxxx

Port     Partner         MP LACP  LP LACP  MP LACP  LP LACP  MP LACP   LP LACP   MP MARKER LP MARKER
         System MAC       Rx Count Rx Count Tx Count Tx Count Err Count Err Count Rx Count  Rx Count
2/3     01e0.5200.xxxx    76112    76112    76116    76116         0          0         0         0
2/4     01e0.5200.xxxx  1814178  1814178  1874205  1874205         0          0         0         0
3/3     01e0.5200.xxxx    76112    76112    76116    76116         0          0         0         0
3/4     01e0.5200.xxxx    76111    76111    76116    76116         0          0         0         0
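
To put a number on how far 2/4 stands out, here is a minimal Python 
sketch that compares each port's LACPDU Rx count with the median across 
the LAG. The counts are copied from the MLXE table above; the function 
name find_outliers and the factor of 5 are arbitrary choices for 
illustration, not anything taken from NetIron.

```python
from statistics import median

# LACPDU Rx counts per port, copied from the MLXE counter table above.
rx_counts = {"2/3": 76112, "2/4": 1814178, "3/3": 76112, "3/4": 76111}


def find_outliers(counts, factor=5.0):
    """Return {port: ratio} for ports whose count exceeds factor x the median.

    'factor' is an arbitrary illustration threshold, not a vendor value.
    """
    baseline = median(counts.values())
    return {
        port: count / baseline
        for port, count in counts.items()
        if count > factor * baseline
    }


outliers = find_outliers(rx_counts)
for port, ratio in outliers.items():
    print(f"{port}: {ratio:.1f}x the median Rx count")
```

With these numbers only 2/4 is flagged, at roughly 24 times the median of 
the LAG.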

VDX-side:
core-10# show port-channel 10
  LACP Aggregator: Po 10 (vLAG)
  Aggregator type: Standard
  Ignore-split is enabled
   Member rbridges:
     rbridge-id: 10 (2)
     rbridge-id: 11 (2)
   Admin Key: 0010 - Oper Key 0010
   Partner Oper Key 0100
  Member ports on rbridge-id 10:
    Link: Te 10/0/3 (0xA18018002) sync: 1
    Link: Te 10/0/4 (0xA18020003) sync: 1

  Member ports on rbridge-id 11:
    Link: Te 11/0/3 (0xB18018002) sync: 1
    Link: Te 11/0/4 (0xB18020003) sync: 1   *

core-10# show running-config interface TenGigabitEthernet 10/0/3
interface TenGigabitEthernet 10/0/3
  no fabric isl enable
  no fabric trunk enable
  channel-group 10 mode active type standard
  lacp timeout long
  no shutdown
!
core-10# show running-config interface TenGigabitEthernet 10/0/4
interface TenGigabitEthernet 10/0/4
  no fabric isl enable
  no fabric trunk enable
  channel-group 10 mode active type standard
  lacp timeout long
  no shutdown
!
core-10# show running-config interface TenGigabitEthernet 11/0/4
interface TenGigabitEthernet 11/0/4
  no fabric isl enable
  no fabric trunk enable
  channel-group 10 mode active type standard
  lacp timeout long
  no shutdown
!
core-10# show running-config interface TenGigabitEthernet 11/0/3
interface TenGigabitEthernet 11/0/3
  no fabric isl enable
  no fabric trunk enable
  channel-group 10 mode active type standard
  lacp timeout long
  no shutdown

LACP debug output for every other interface is almost silent, but for 
2/4 it logs a full exchange every second:

Feb  3 11:30:13 MLXE Feb  3 11:30:13.255 Ticks45671481: Lacp restrict_tx timer started for port 2/4 (timeout = 1000 ms)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.160 LACP: RX on 2/4 A<8000:01e0.5200.xxxx:000a:8000:1402:A.ASCD..>P<0001:cxxx.xxx.xxx:0064:0001:0033:ATAS....>
Feb  3 11:30:14 MLXE Feb  3 11:30:14.160 rx_machine:Port2/4: event = 7 (Lac_received), current state = 105 (CURRENT)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.160 rxm_current:Port2/4: old state = 105 (CURRENT)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.161 Ticks45671499: Lacp tx_scheduler timer started for port 2/4 (timeout = 100 ms)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.161 select_aggregator:Port2/4: select aggregator 3761f200 [Aport2/3,Key0064,LAG[(0001,cxxx.xxx.xxx,0064),(8000,01e0.5200.xxxx,000a)]]
Feb  3 11:30:14 MLXE Feb  3 11:30:14.161 rxm_current:Port2/4: stop current_while_timer (handle 1686)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.161 Ticks45671499: Lacp current_while timer stopped for port 2/4
Feb  3 11:30:14 MLXE Feb  3 11:30:14.161 Ticks45671499: Lacp current_while timer started for port 2/4 (timeout = 90000 ms)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.161 mux_machine:Port2/4: event = 8 (Lac_new_info), current state = 305 (DISTRIBUTING)
Feb  3 11:30:14 MLXE Feb  3 11:30:14.254 Ticks45671501: Lacp tx_scheduler timer expired for port 2/4
Feb  3 11:30:14 MLXE Feb  3 11:30:14.254 LACP: TX on 2/4 A<0001:cxxx.xxx.xxx:0064:0001:0033:A.ASCD..>P<8000:01e0.5200.xxxx:000a:8000:1402:A.ASCD..>
Feb  3 11:30:14 MLXE Feb  3 11:30:14.254 Ticks45671501: Lacp restrict_tx timer expired for port 2/4
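
One detail in the RX and TX lines may be worth a closer look. In the 
received PDU, the partner section (the VDX's view of the MLXE port) 
carries the state ATAS...., while the MLXE itself sends A.ASCD.. as its 
actor state. The Python sketch below decodes both fields and prints the 
difference. It assumes that the eight-character field follows the IEEE 
802.1AX state-bit order (Activity, Timeout, Aggregation, Synchronization, 
Collecting, Distributing, Defaulted, Expired) and that "." means the bit 
is clear; that mapping is my reading of the debug format, not something 
confirmed by Brocade documentation. The helper names are made up for 
this example.

```python
import re

# ASSUMPTION: the 8-character state field uses the IEEE 802.1AX bit order.
FLAG_NAMES = ["Activity", "Timeout", "Aggregation", "Synchronization",
              "Collecting", "Distributing", "Defaulted", "Expired"]

# Matches the "LACP: RX/TX on <port> A<...>P<...>" debug lines shown above.
PDU_RE = re.compile(
    r"LACP: (?P<dir>RX|TX) on (?P<port>\S+)\s+"
    r"A<(?P<actor>[^>]*)>P<(?P<partner>[^>]*)>"
)


def decode_state(field):
    """Map a flag string such as 'A.ASCD..' to the set of flag names that are set."""
    return {name for name, ch in zip(FLAG_NAMES, field) if ch != "."}


def parse_pdu(line):
    """Return (direction, port, actor_flags, partner_flags), or None for other lines."""
    m = PDU_RE.search(line)
    if m is None:
        return None
    # The state field is the last colon-separated item of each section.
    actor_state = m.group("actor").split(":")[-1]
    partner_state = m.group("partner").split(":")[-1]
    return (m.group("dir"), m.group("port"),
            decode_state(actor_state), decode_state(partner_state))


rx_line = ("Feb  3 11:30:14 MLXE Feb  3 11:30:14.160 LACP: RX on 2/4 "
           "A<8000:01e0.5200.xxxx:000a:8000:1402:A.ASCD..>"
           "P<0001:cxxx.xxx.xxx:0064:0001:0033:ATAS....>")
tx_line = ("Feb  3 11:30:14 MLXE Feb  3 11:30:14.254 LACP: TX on 2/4 "
           "A<0001:cxxx.xxx.xxx:0064:0001:0033:A.ASCD..>"
           "P<8000:01e0.5200.xxxx:000a:8000:1402:A.ASCD..>")

_, _, _, peer_view_of_us = parse_pdu(rx_line)  # what the VDX believes about 2/4
_, _, our_state, _ = parse_pdu(tx_line)        # what the MLXE actually sends

# Flags that differ between the two views.
print(sorted(our_state ^ peer_view_of_us))
# prints ['Collecting', 'Distributing', 'Timeout']
```

If that reading of the flags is right, the VDX keeps echoing back a view 
of port 2/4 with short timeout set and collecting/distributing cleared, 
which does not match what the MLXE transmits. That mismatch could be 
what keeps the exchange going every second, but this is a guess and not 
a confirmed diagnosis.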

When I disable and re-enable port 2/4 by editing the LAG configuration, 
things look fine for a while and the LACP messages go as quiet as on 
the other ports. I tried this several times, but forgot to set up the 
debug output properly to capture the messages.

So I am now logging the LACP debug output to an external destination 
and waiting for the interface to act up again.
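
To spot the moment it starts in the captured log, something like the 
following Python sketch could count LACP RX/TX lines per port and per 
minute. With long timeout a healthy port should log only a couple of 
PDUs per minute in each direction, so the threshold of 10 is an arbitrary 
cut-off chosen for illustration. The sample lines are synthetic, built in 
the same format as the debug output above, and the function names are 
made up for this example.

```python
import re
from collections import Counter

# Picks the HH:MM of the device timestamp, the direction and the port
# from lines like "... Feb  3 11:30:14.160 LACP: RX on 2/4 A<...>P<...>".
LINE_RE = re.compile(
    r"(?P<minute>\d{2}:\d{2}):\d{2}\.\d{3} LACP: (?P<dir>RX|TX) on (?P<port>\S+)"
)


def pdus_per_minute(lines):
    """Count logged PDUs keyed by (port, minute, direction)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("port"), m.group("minute"), m.group("dir"))] += 1
    return counts


def noisy_ports(counts, threshold=10):
    """Ports with more than 'threshold' PDUs in any one minute and direction."""
    return sorted({port for (port, _minute, _dir), n in counts.items()
                   if n > threshold})


# SYNTHETIC sample: one RX per second on 2/4, one RX every 30 s on 2/3.
sample = [
    f"Feb  3 11:30:{s:02d} MLXE Feb  3 11:30:{s:02d}.160 LACP: RX on 2/4 A<...>P<...>"
    for s in range(60)
] + [
    f"Feb  3 11:30:{s:02d} MLXE Feb  3 11:30:{s:02d}.200 LACP: RX on 2/3 A<...>P<...>"
    for s in (5, 35)
]

counts = pdus_per_minute(sample)
print(noisy_ports(counts))
```

In practice the list of lines would come from the file the external 
syslog destination writes, read line by line.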

Any ideas so far?

Jörg

On 27 Jan 2017, at 1:39, Tim Warnock wrote:

> ---
>
> Sorry for the 20 questions but:
> Are you seeing this on slow timers or fast timers? Do you use 
> observium? What line card are you using (8x10g)?
>
> Thanks
> -Tim.
> _______________________________________________
> foundry-nsp mailing list
> foundry-nsp at puck.nether.net
> http://puck.nether.net/mailman/listinfo/foundry-nsp

