[c-nsp] Nexus 7k with F2e line cards and egress queuing

Tue Dec 17 11:08:20 EST 2019

We've had several troubleshooting sessions with Cisco TAC, all with the same
end result.  Basically every tech regurgitates the design of the Nexus
platform and tells us that you can never trust non-saturated links, since
queues could still fill up quickly.
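
For what it's worth, here is a back-of-the-envelope sketch of the argument as
I understand it.  It is only an illustration, not how the F2e actually
schedules traffic: it assumes a single egress port with the whole 288KB/port
figure (from my earlier mail below) available to it, a burst arriving at full
10G line rate, and a steady 1G drain.  The function name and the 1KB = 1024
bytes conversion are my own assumptions.

```python
def burst_fill_time_us(buffer_bytes, ingress_bps, egress_bps):
    """Microseconds until a line-rate burst fills the buffer.

    Simplified model: the buffer fills at (ingress - egress) bits per
    second. Returns infinity if the egress side keeps up with ingress.
    """
    net_bps = ingress_bps - egress_bps
    if net_bps <= 0:
        return float("inf")
    return buffer_bytes * 8 / net_bps * 1e6


# 288KB/port is the figure quoted in this thread; 1KB = 1024 bytes is assumed.
BUFFER_BYTES = 288 * 1024

fill_us = burst_fill_time_us(BUFFER_BYTES, ingress_bps=10e9, egress_bps=1e9)
print(f"10G burst into a 1G port fills {BUFFER_BYTES} bytes in ~{fill_us:.0f} us")
```

Under those assumptions the buffer is gone in well under a millisecond, so a
port averaging tens of Mbps could in principle still tail-drop during a
microburst.  Whether that is really what we are hitting is a separate
question.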

Their solution was: configure a port channel or add more links to the LAG.

Our environment has been running rock solid for the past 6 to 8 months.
Why all of a sudden?  It makes no logical sense at all.  It's not like we
have an influx of 10G of traffic.  You're talking maybe 70M tops overall.

We ended up taking those 1G host devices and 10G firewalls and putting
them on a downstream N5596 connected via a 20G port channel to the parent
7k.

I see no reason why we would need to drop hundreds of K on M3 line cards
for existing line cards that should be able to handle this.

On Tue, Dec 17, 2019, 7:42 AM Andrey Kostin <ankost at podolsk.ru> wrote:

> Hi Curtis,
>
> Looks like we are in the same boat. We have had a similar experience with
> traffic stepping down from F3e 100G ports to 10G F2e, and also in
> parallel from F2e 10G to F2e 1G. The current TAC case has been open for
> about 2 months now. I had two troubleshooting sessions with different TAC
> engineers, and all I received after that were either out-of-office
> notifications or promises to respond by EoD/EoW, which hasn't happened
> yet.
> In our case the behavior of the issue is inconsistent: only particular
> traffic is getting dropped, and only on a certain path. Incoming and
> outgoing ports are both in port-channels, with no physical errors on the
> ports, only VOQ drops. We tried to re-shake the port-channels by doing
> shut/no shut on the links one by one, with no result. Some time ago we
> segregated the 10G ports into different port groups, which helped for a
> while, but the problem eventually re-appeared. We are still on 6.2.12, and
> the outcome of the previous case a year ago was "upgrade and reboot", but
> we didn't go for it, partly because of the critical role of these switches
> and partly because of a lack of confidence in the platform and the final
> result. Fortunately, we found a workaround by rerouting the traffic to a
> different path. Now it's coming to the same 10G egress ports from
> another 100G port-channel, although those 100G ports are on the same
> linecards as the ones that were dropping packets.
> Any advice would be appreciated.
>
> Kind regards,
> Andrey Kostin
>
> Curtis Piehler wrote on 2019-12-14 11:17:
> > I am hoping some of you Cisco Nexus veterans out there could shed some
> > light on this issue or provide some insight if this has been
> > encountered before.
> >
> > Has anyone had egress VQ congestion issues on the Nexus 7k using F2e
> > line cards causing input discards?  There has been an intentional influx
> > of traffic to these units over the past few months (primarily VoIP
> > traffic, i.e. SBCs and such).  These SBCs mostly have 1G interfaces with
> > a 10G uplink to the core router.  At some point in the traffic shift the
> > switch interfaces facing the SBCs accrue egress VQ congestion, and input
> > discards start dropping packets coming into the switches from the core
> > router uplinks.
> >
> > We have opened a Cisco TAC ticket and they go through the whole thing
> > about the Nexus design and dropping packets on ingress if the
> > destination port is congested, etc., and I get all that.  They also say
> > going from a 10G uplink to a 1G downlink is not appropriate; however,
> > those downstream devices are not capable of 10G.  The amount of traffic
> > influx isn't that much (you're talking 20-30M max of VoIP).  We have
> > removed Cisco FabricPath from all VDCs and even upgraded our code from
> > 6.2.16 to 6.2.20a on the SUP-2E supervisors.
> >
> > I understand the N7K-F248XP-23E/25E have 288KB/Port and 256KB/SOC and I
> > would think these would be more than sufficient.  I know the
> > F3248XP-23/25 have 295KB/Port and 512KB/SOC; however, I can't see the
> > need to drop 7x the amount on line cards when the existing ones should
> > be able to handle this traffic.
> >
> > We have recently taken the approach of moving the 1G SBCs down to an
> > N5596 VPC stack linked via a 20G port-channel (per 5596) from the parent
> > 7ks, as I understand the 5596s have different egress queue structures
> > and may be better suited to handle this type of application.
> >
> > Any insight would be appreciated.
> >
> > Thanks
> > _______________________________________________
> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
>
>