[j-nsp] IRB interface on QFX5100 stopped receiving packets

Martin Millnert millnert at gmail.com
Sat Apr 2 19:35:05 EDT 2016


(5 year de-lurk)
Hi,

We have a couple of QFX5100s forming a spine/leaf, EBGP-based DC network.
Southbound, towards the servers, the ToRs are configured with:
 - MC-LAG, active/active
 - IRBs with VRRP (not native ae mac-sync)
 - IPv4 only
 
We are running 14.1X53-D31.2. The release notes for 14.1X53-D35 (the
latest) contain no useful information.

We thought it worked fine for a very short and peaceful while, until it
turned out that it actually doesn't work fine at all.

Observed behaviour is:
One or more IRBs will, on both switches in the ToR pair, start dropping
all ARPs. Other IRBs continue working.
The switches are supposed to snoop (or similar) the ARPs, install them
locally, and send them to the peer using ICCP over the ICL for remote
installation.

anticimex@kvm-qfx5100-dc1-r2-1# run show ethernet-switching redundancy-groups arp-statistics
Redundancy Group ID : 1       Flags : Local Connect,Remote Connect


MCLAG ARP Statistics
Group ID                                 : 1       
ARP Rx Count From Line                   : 6359194
ARP Tx Count To Peer                     : 36713
ARP Rx Count From Peer                   : 23767
ARP Install Count                        : 23767 
ARP Drop Count received from line        : 6322481
ARP Drop Count received from peer        : 0    

anticimex@kvm-qfx5100-dc1-r2-2# run show ethernet-switching redundancy-groups arp-statistics
Redundancy Group ID : 1       Flags : Local Connect,Remote Connect


MCLAG ARP Statistics
Group ID                                 : 1       
ARP Rx Count From Line                   : 4431333
ARP Tx Count To Peer                     : 23767
ARP Rx Count From Peer                   : 36842
ARP Install Count                        : 36842 
ARP Drop Count received from line        : 4407566
ARP Drop Count received from peer        : 0    

The counters on -2 are smaller since I disabled the southbound and
northbound interfaces on it a little while ago. That did not help.
I have tried deactivating and re-activating interfaces, and I have tried
'commit full'. All to no avail.
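
To put a number on the drops, here is a minimal sketch in Python that
parses the counters pasted above and computes the share of line ARPs that
were dropped. The function names (parse_counters, line_drop_ratio) are
made up for illustration, and the script only reads pasted text; it does
not talk to the switch. It also checks something visible in both outputs:
the line drop count equals "Rx From Line" minus "Tx To Peer". That is an
observation about these numbers, not a documented counter relationship.

```python
import re

# Counters copied from the two outputs above (-1 and -2).
SAMPLES = {
    "r2-1": """\
ARP Rx Count From Line                   : 6359194
ARP Tx Count To Peer                     : 36713
ARP Drop Count received from line        : 6322481
""",
    "r2-2": """\
ARP Rx Count From Line                   : 4431333
ARP Tx Count To Peer                     : 23767
ARP Drop Count received from line        : 4407566
""",
}


def parse_counters(text):
    """Turn 'name : integer' lines into a dict of name -> int."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(.+?)\s*:\s*(\d+)\s*$", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def line_drop_ratio(counters):
    """Fraction of ARPs received from the line that were dropped."""
    rx = counters["ARP Rx Count From Line"]
    return counters["ARP Drop Count received from line"] / rx if rx else 0.0


for name, text in SAMPLES.items():
    c = parse_counters(text)
    # Observation only: drops == rx from line - tx to peer in this data.
    consistent = (
        c["ARP Rx Count From Line"] - c["ARP Tx Count To Peer"]
        == c["ARP Drop Count received from line"]
    )
    print(f"{name}: {line_drop_ratio(c):.2%} dropped, consistent={consistent}")
```

On both switches more than 99% of the ARPs received from the line are
counted as dropped.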

Additionally, the interface packet counters on the IRB interfaces do
seem to be working correctly.
They increment at a few packets per second, which may correlate with a
statement from someone on #juniper (IRC) about an issue where unicast
stopped working and only multicast did.
The counter effect may be a red herring, however, if ARPs are never
supposed to be counted on the IRBs because the snooping/punting happens
earlier in the receive path. *shrug*

Has anyone else had this issue, and managed to solve it?


Best & TIA for any sort of pointers,
Martin Millnert

PS. My personal favorite is actually to get rid of MC-LAG and do dynamic
routing to hosts over redundant separate interfaces. That's
unfortunately blocked by the fact that Contrail's vRouter doesn't
support it (yet).


