[f-nsp] MLX throughput issues

Josh Galvez josh at zevlag.com
Mon Feb 16 23:08:54 EST 2015


What kind of wigout? And how do you diagnose the corruption? I'm intrigued.

On Mon, Feb 16, 2015 at 8:43 AM, Brad Fleming <bdflemin at gmail.com> wrote:

> We’ve seen it since installing the high-capacity switch fabrics into our
> XMR4000 chassis roughly 4 years ago. We saw it through IronWare 5.4.00d.
> I’m not sure what software we were using when they were first installed;
> probably whatever would have been stable/popular around December 2010.
>
> Command is simply “power-off snm [1-3]” then “power-on snm [1-3]”.
>
> Note that the power-on process causes your management session to hang for
> a few seconds. The device isn’t broken and packets aren’t getting dropped;
> it’s just going through checks and echoing back status.
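>
> In case it helps anyone else, one iteration looks roughly like the sketch
> below. The power-off/power-on syntax is exactly as above; the verification
> step assumes a "show sfm-links" command is present on your IronWare
> release, so check the command reference for the exact syntax:
>
>   power-off snm 3
>   (wait about a minute)
>   power-on snm 3
>   (expect the session to hang for a few seconds while checks run)
>   show sfm-links 3
>   (confirm every link to SFM 3 reports UP before touching the next SFM)
>
> Repeat one SFM at a time, verifying link state between each cycle.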
>
> -brad
>
>
> > On Feb 16, 2015, at 7:07 AM, Jethro R Binks <jethro.binks at strath.ac.uk>
> wrote:
> >
> > On Fri, 13 Feb 2015, Brad Fleming wrote:
> >
> >> Over the years we’ve seen odd issues where one of the
> >> switch-fabric-links will “wigout” and some of the data moving between
> >> cards will get corrupted. When this happens we power cycle each switch
> >> fab one at a time using this process:
> >>
> >> 1) Shutdown SFM #3
> >> 2) Wait 1 minute
> >> 3) Power SFM #3 on again
> >> 4) Verify all SFM links are up to SFM#3
> >> 5) Wait 1 minute
> >> 6) Perform steps 1-5 for SFM #2
> >> 7) Perform steps 1-5 for SFM #1
> >>
> >> Not sure you’re seeing the same issue that we see but the “SFM Dance”
> >> (as we call it) is a once-every-four-months thing somewhere across our
> >> 16 XMR4000 boxes. It can be done with little to no impact if you are
> >> patient and verify status before moving to the next SFM.
> >
> > That's all interesting.  What code version is this?  Also, how do you
> > shut down the SFMs?  I don't recall seeing documentation for that.
> >
> > Jethro.
> >
> >
> >>
> >>> On Feb 13, 2015, at 11:41 AM, nethub at gmail.com wrote:
> >>>
> >>> We have three switch fabrics installed, all are under 1% utilized.
> >>>
> >>>
> >>> From: Jeroen Wunnink | Hibernia Networks [mailto:jeroen.wunnink at atrato.com]
> >>> Sent: Friday, February 13, 2015 12:27 PM
> >>> To: nethub at gmail.com; 'Jeroen Wunnink | Hibernia Networks'
> >>> Subject: Re: [f-nsp] MLX throughput issues
> >>>
> >>> How many switch fabrics do you have in that MLX, and how high is the
> utilization on them?
> >>>
> >>>> On 13/02/15 18:12, nethub at gmail.com wrote:
> >>>> We also tested with a spare Quanta LB4M we have and are seeing about
> the same speeds as with the FLS648 (around 20MB/s or 160Mbps).
> >>>>
> >>>> I also reduced the number of routes we are accepting down to about
> 189K and that did not make a difference.
> >>>>
> >>>>
> >>>> From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On
> Behalf Of Jeroen Wunnink | Hibernia Networks
> >>>> Sent: Friday, February 13, 2015 3:35 AM
> >>>> To: foundry-nsp at puck.nether.net
> >>>> Subject: Re: [f-nsp] MLX throughput issues
> >>>>
> >>>> The FLS switches do something weird with packets. I've noticed they
> somehow interfere with the TCP MSS / window scaling negotiation, resulting
> in destinations further away seeing very poor speeds compared to
> destinations close by.
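> >>>>
> >>>> (For context on why distance matters: if the window scaling option gets
> >>>> stripped or the window is otherwise capped, TCP throughput is bounded by
> >>>> roughly window / RTT. A purely illustrative calculation, assuming a 64 KB
> >>>> effective window; the RTTs below are examples, not measurements from this
> >>>> thread:
> >>>>
> >>>>    3 ms RTT:  65,535 B / 0.003 s  =  ~21 MB/s
> >>>>   25 ms RTT:  65,535 B / 0.025 s  =  ~2.6 MB/s
> >>>>   80 ms RTT:  65,535 B / 0.080 s  =  ~0.8 MB/s
> >>>>
> >>>> Nearby destinations can still look fine while distant ones crawl.)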
> >>>>
> >>>> We got rid of those a while ago.
> >>>>
> >>>>
> >>>> On 12/02/15 17:37, nethub at gmail.com wrote:
> >>>>> We are having a strange issue on our MLX running code 5.6.00c.  We
> are encountering some throughput issues that seem to be randomly impacting
> specific networks.
> >>>>>
> >>>>> We use the MLX to handle both external BGP and internal VLAN
> routing.  Each FLS648 is used for Layer 2 VLANs only.
> >>>>>
> >>>>> From a server connected by a 1 Gbps uplink to a Foundry FLS648 switch,
> which is in turn connected to the MLX on a 10 Gbps port, a speed test to an
> external network gets 20MB/s.
> >>>>>
> >>>>> Connecting the same server directly to the MLX gets 70MB/s.
> >>>>>
> >>>>> Connecting the same server to one of my customer's Juniper EX3200
> (which BGP peers with the MLX) also gets 70MB/s.
> >>>>>
> >>>>> Testing to another external network, all three scenarios get 110MB/s.
> >>>>>
> >>>>> The path to both test network locations goes through the same IP
> transit provider.
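> >>>>>
> >>>>> (A raw TCP test with a pinned window size would take the web speed test
> >>>>> out of the equation. Purely as an illustration, assuming iperf3 is
> >>>>> available on the server and on a host near each test destination:
> >>>>>
> >>>>>   iperf3 -c <far-test-host> -t 30          (auto-tuned window)
> >>>>>   iperf3 -c <far-test-host> -t 30 -w 64K   (window pinned to 64 KB)
> >>>>>
> >>>>> If the auto-tuned run is slow only through the FLS648, and pinning the
> >>>>> window makes the direct-to-MLX run just as slow, that points at window
> >>>>> scaling being lost on the FLS648 path rather than at an MLX setting.)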
> >>>>>
> >>>>> We are running an NI-MLX-MR with 2GB RAM. An NI-MLX-10Gx4 connects to
> the Foundry FLS648 via XFP-10G-LR, and an NI-MLX-1Gx20-GC was used for
> directly connecting the server.  A separate NI-MLX-10Gx4 connects to our
> upstream BGP providers.  The customer’s Juniper EX3200 connects to the same
> NI-MLX-10Gx4 as the FLS648.  We take default routes plus full tables from
> three providers by BGP, but filter out most of the routes.
> >>>>>
> >>>>> The fiber and optics on everything look fine.  CPU usage is less
> than 10% on the MLX and all line cards, and about 1% on the FLS648.
> The ARP table on the MLX holds about 12K entries, and the BGP table is
> about 308K routes.
> >>>>>
> >>>>> Any assistance would be appreciated.  I suspect there is a setting
> that we’re missing on the MLX that is causing this issue.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Jeroen Wunnink
> >>>> IP NOC Manager - Hibernia Networks
> >>>> Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300
> >>>> Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
> >>>> jeroen.wunnink at hibernianetworks.com
> >>>> www.hibernianetworks.com
> >>>
> >>>
> >>> --
> >>>
> >>> Jeroen Wunnink
> >>> IP NOC Manager - Hibernia Networks
> >>> Main numbers (Ext: 1011): USA +1.908.516.4200 | UK +44.1704.322.300
> >>> Netherlands +31.208.200.622 | 24/7 IP NOC Phone: +31.20.82.00.623
> >>> jeroen.wunnink at hibernianetworks.com
> >>> www.hibernianetworks.com
> >>
> >
> > .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
> > Jethro R Binks, Network Manager,
> > Information Services Directorate, University Of Strathclyde, Glasgow, UK
> >
> > The University of Strathclyde is a charitable body, registered in
> > Scotland, number SC015263.
>
>
> _______________________________________________
> foundry-nsp mailing list
> foundry-nsp at puck.nether.net
> http://puck.nether.net/mailman/listinfo/foundry-nsp