[c-nsp] Nexus 7k with F2e line cards and egress queuing

Tue Dec 17 10:42:10 EST 2019

Hi Curtis,

Looks like we are on the same boat. We have similar experience with 
traffic stepping down from F3e 100G ports to 10G F2e, and also in 
parallel from F2e 10G to F2e 1G. Current TAC case opened for about 2 
months for now, I had two troubleshooting sessions with different TAC 
engineers and all I received after that were either out-of-office 
notifications or promises to respond to the EoD|EoW which didn't happen 
as of yet.
In our case the behavior of the issue is inconsistent: only particular 
traffic is getting dropped and only on the certain path. Incoming and 
outgoing ports are both in port-channels, no physical errors on the 
ports, only VOQ drops. Tried to re-shake port-channels with shut/no shut 
links one by one with no result. Some time ago segregated 10G ports to 
different port groups and it helped for some time but the problem 
re-appeared eventually. We are still on 6.2.12 and the outcome of 
previous case a year ago was "upgrade and reboot" but we didn't go for 
it partially because of the critical role of these switches and 
partially because of lack of confidence in the platform and final 
result. Fortunately, we found a workaround to reroute traffic to 
different path. Now it's coming to the same 10G egress ports from 
another 100G port-channel although 100G are on the same linecards as 
that were dropping packets.
Any advice would be appreciated.

Kind regards,
Andrey Kostin

Curtis Piehler писал 2019-12-14 11:17:
> I am hoping some of you Cisco Nexus veterans out there could shed some
> light on this issue or provide some insight if this has been 
> encountered
> before.
> 
> Has anyone had egress VQ Congestion issues on the Nexus 7k using F2e 
> line
> cards causing input discards?  There has been intentional influx of 
> traffic
> over the past few months to these units (Primarily VoIP traffic) IE: 
> SBCs
> and such.  These SBCs are mostly 1G interfaces with a 10G uplink to the
> core router.  At some point of traffic shift the switch interfaces 
> facing
> the SBC accrue egress VQ congestion and input discards start dropping
> packets into the switches from the core router uplinks.
> 
> We have opened a Cisco TAC ticket and they go through the whole thing 
> about
> the Nexus design and dropping packets on ingress if the destination 
> port is
> congestion, etc... and I get all that.  They also say going from a 10G
> uplink to a 1G downlink is not appropriate however those downstream 
> devices
> are not capable of 10G.  They amount of traffic influx isn't that much
> (your talking 20-30M max of VoIP).  We have removed Cisco FabricPath 
> from
> all VDCs and even upgraded our code from 6.2.16 to 6.2.20a on the 
> SUP-2E
> supervisors.
> 
> I understand the N7K-F248XP-23E/25E have 288KB/Port and 256KB/SOC and I
> would think these would be more than sufficient.  I know the 
> F3248XP-23/25
> have 295KB/Port 512KB/SOC however I can't see the need to drop 7x the
> amount for line cards that should be able to handle this traffic?
> 
> We have recently taken the approach of moving the 1G SBCs down to a 
> N5596
> VPC stack linked via 20G port-channel (per 5596) from the parent 7ks as 
> I
> understand the 5596 have different egress queue structures and maybe 
> more
> suited to handle this type of application?.
> 
> Any insight would be appreciated.
> 
> Thanks
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/