[f-nsp] MLX throughput issues

nethub at gmail.com nethub at gmail.com
Thu Feb 12 19:34:20 EST 2015


Thanks for your response, Frank.

 

I do mean megabytes per second (i.e. 20MB/s = 160 Mbps, 70MB/s = 560 Mbps,
110MB/s = 880 Mbps).
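
For anyone double-checking the unit conversion, it is just bytes times eight;
a quick Python sanity check (nothing vendor-specific, the rates are the ones
from my tests):

    # Convert a transfer rate in megabytes/second to megabits/second.
    def mbytes_to_mbits(mb_per_s):
        return mb_per_s * 8  # 8 bits per byte

    for rate in (20, 70, 110):
        print(f"{rate} MB/s = {mbytes_to_mbits(rate)} Mbps")
    # 20 MB/s = 160 Mbps, 70 MB/s = 560 Mbps, 110 MB/s = 880 Mbps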

 

I don't think the FLS648 switches are responsible, since I was able to get
110MB/s to another external network in all three scenarios (server to FLS648
to MLX, server to MLX directly, server to EX3200 to MLX).  The FLS648 is
layer 2 only, so I don't see how it would interfere with throughput to one
network but not another.  The problem also occurs on servers attached to
multiple FLS648 switches, each directly connected to the MLX, so it spans
different 10G cards, optics, and slots on the MLX chassis.

 

The remote server doesn't seem to be the problem either, since I was able to
get 70MB/s to it both when connecting directly to the MLX and when connecting
through the EX3200.  It is only from behind the FLS648 that I run into
issues.

 

As I stated in the first message, the Juniper EX3200 belongs to a downstream
BGP customer that is single-homed to our network, so it is in a different ASN
and traffic between my network and theirs is routed at layer 3.

 

Any additional insight would be appreciated.

 

 

From: Frank Bulk [mailto:frnkblk at iname.com] 
Sent: Thursday, February 12, 2015 6:48 PM
To: nethub at gmail.com; foundry-nsp at puck.nether.net
Subject: RE: [f-nsp] MLX throughput issues

 

Based on what you described, it seems more likely that the FLS648 is
dropping throughput from ~70 Mbps to 20 Mbps (I presume you mean bits, not
bytes, when you write MB/s).

 

How do you know that the remote speed server is not maxed out?  Or that your
uplink is not maxed out?
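
One quick way to check the uplink is to sample the octet counters on that
port twice and compute the average rate -- a rough sketch, assuming you can
read the standard IF-MIB ifHCOutOctets counter (or the equivalent CLI port
counters); how you collect the two samples is up to you:

    # Average rate over an interval, from two readings of an interface
    # octet counter (e.g. IF-MIB ifHCOutOctets on the upstream port).
    def utilization_mbps(octets_t0, octets_t1, interval_s):
        delta_bits = (octets_t1 - octets_t0) * 8
        return delta_bits / interval_s / 1_000_000

    # Example: 7,500,000,000 octets in 60 s -> 1000 Mbps, ~10% of a 10G port.
    print(utilization_mbps(0, 7_500_000_000, 60))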

 

Frank

 

From: foundry-nsp [mailto:foundry-nsp-bounces at puck.nether.net] On Behalf Of
nethub at gmail.com
Sent: Thursday, February 12, 2015 11:38 AM
To: foundry-nsp at puck.nether.net
Subject: [f-nsp] MLX throughput issues

 

We are seeing a strange issue on our MLX running code 5.6.00c: throughput
problems that seem to randomly impact specific destination networks.

 

We use the MLX to handle both external BGP and internal VLAN routing.  Each
FLS648 is used for Layer 2 VLANs only.

 

From a server connected by a 1 Gbps uplink to a Foundry FLS648 switch, which
is in turn connected to the MLX on a 10 Gbps port, a speed test to an
external network gets 20MB/s.

 

Connecting the same server directly to the MLX gets 70MB/s.

 

Connecting the same server to a customer's Juniper EX3200 (which BGP peers
with the MLX) also gets 70MB/s.

 

Testing to another external network, all three scenarios get 110MB/s.
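
The speed tests themselves are nothing exotic; something like the following
single-stream TCP sender (a minimal sketch, not the exact tool I used --
host and port are placeholders for an iperf-style sink) would give comparable
MB/s numbers across the three paths:

    # Minimal single-stream TCP bulk sender; reports average MB/s.
    import socket
    import time

    HOST, PORT = "speedtest.example.net", 5001   # placeholder receiver
    DURATION = 10                                # seconds per run
    CHUNK = b"\0" * 65536

    def run_test(host, port, duration):
        sent = 0
        with socket.create_connection((host, port)) as s:
            start = time.monotonic()
            while time.monotonic() - start < duration:
                s.sendall(CHUNK)
                sent += len(CHUNK)
            elapsed = time.monotonic() - start
        return sent / elapsed / 1_000_000        # MB/s (decimal)

    print(f"{run_test(HOST, PORT, DURATION):.1f} MB/s")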

 

The path to both test network locations goes through the same IP transit
provider.

 

We are running an NI-MLX-MR management module with 2GB RAM.  An NI-MLX-10Gx4
connects to the Foundry FLS648 via XFP-10G-LR, and an NI-MLX-1Gx20-GC was
used for connecting the server directly.  A separate NI-MLX-10Gx4 connects to
our upstream BGP providers.  The customer's Juniper EX3200 connects to the
same NI-MLX-10Gx4 as the FLS648.  We take default routes plus full tables
from three providers via BGP, but filter out most of the routes.

 

The fiber and optics all look fine.  CPU usage is under 10% on the MLX and
all of its line cards, and around 1% on the FLS648.  The ARP table on the MLX
has about 12K entries, and the BGP table holds about 308K routes.

 

Any assistance would be appreciated.  I suspect there is a setting that
we're missing on the MLX that is causing this issue.
