[j-nsp] High latency and slow connections

Blaz Zupan blaz at inlimbo.org
Thu Nov 6 15:49:46 EST 2003


Yesterday's voodoo seems to be continuing today. After the reboot and upgrade
from JunOS 5.5 to JunOS 5.7, the box seemed to behave itself, with normal
latency below 1 ms for traffic going from one gig VLAN to another gig VLAN
through the M5.

But today, when we reached this day's peak utilization, latency started to ramp
up again and "stabilized" around 30 ms for traffic going through the gigabit
ethernet on the box from one VLAN to another (from server 1 to server 2):

                           ___ server 1
                          /
Juniper M5 --- Cisco 3550
                          \___ server 2

This time I did some experimentation. I started with CoS and configured the
scheduler used by the "best-effort" forwarding class with a buffer size of 0
percent:

schedulers {
	data-scheduler {
		buffer-size percent 0;
	}
}
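
(For completeness: on its own the scheduler does nothing until a scheduler-map
ties it to a forwarding class and the map is applied to the interface. A
minimal sketch of that glue, where the map name "data-map" and the interface
"ge-0/0/0" are placeholders, not my actual names:

class-of-service {
	scheduler-maps {
		data-map {
			forwarding-class best-effort scheduler data-scheduler;
		}
	}
	interfaces {
		ge-0/0/0 {
			scheduler-map data-map;
		}
	}
})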

As soon as I committed this, the latency dropped below 1 ms. *BUT*, now I saw
about 2% packet loss on pings going from one VLAN to the other through the M5.
So apparently the box buffers packets in the queue on the gigabit PIC, and as
the queue fills up, the latency shoots up. If I remove the buffer, it instead
drops the excess packets, as it can't do anything else with them once the
queue is full.
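
For anyone who wants to see the same thing on their own box, the per-queue
counters on the interface show it directly; something along these lines,
assuming the gig port is ge-0/0/0 (a placeholder):

user@m5> show interfaces ge-0/0/0 extensive

The "Queue counters" section of the output lists queued, transmitted and
dropped packets per forwarding class.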

Somebody might say: normal behaviour for a congested link. Sure, but at the
time this was happening, the gigabit was doing about 130 Mbps in both
directions. So either I can't read, or I have the world's first gigabit
ethernet that only does 130 Mbps. Even if you consider traffic spikes, they
can't shoot up from a 1-second average of 130 Mbps to a 1-second average of
1 Gbps to be able to fill up the queues on the PIC.
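
As a sanity check on the numbers: a standing 30 ms of queueing delay at a
drain rate of ~130 Mbps works out to roughly 0.030 s * 130 Mb/s = 3.9 Mb,
i.e. about 490 KB sitting in the buffer. That's a delay buffer that has
filled up at 13% of line rate, not a congested gigabit.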

Later that day, latency suddenly dropped below 1 ms even with "buffer-size
percent 95". Looking at the traffic rate on the gigabit PIC, it was around 100
Mbps. As soon as the traffic rate went back above 130 Mbps, latency was again
around 30 ms. So, 130 Mbps seems to be the "sweet spot".
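
In case it matters how I watched the rate: this is just the live interface
counters from the CLI, e.g.

user@m5> monitor interface ge-0/0/0

(interface name changed), which refreshes the input/output bps every second.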

To make sure there were no mistakes in my CoS configuration, I deleted the
complete class-of-service hierarchy from the configuration. There are no
firewall filters or policers on any of the VLANs on the gigabit ethernet
except for a firewall filter that classifies traffic from our VoIP gateways
and puts it into the VoIP queue; I removed that as well. We do have
"encapsulation vlan-ccc" configured, as a couple of Layer 2 VPNs terminate on
this box. But otherwise, there's nothing unusual in there that could be
affecting the box in this way.
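
The cleanup itself was nothing fancy; it amounted to something like the
following, where the interface, unit and filter names are made up for
illustration:

[edit]
user@m5# delete class-of-service
user@m5# delete interfaces ge-0/0/0 unit 100 family inet filter input voip-classifier
user@m5# delete firewall filter voip-classifier
user@m5# commit

(The filter reference on the VLAN unit has to go before the filter itself,
otherwise the commit fails.)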

With all this information, I can actually partly explain yesterday's
weirdness. Apparently our traffic utilization on the gigabit PIC went above
130 Mbps for the first time yesterday, which is why we didn't see the high
latency until then. Looking at our mrtg graphs, this indeed seems to be the
case.

A spare gigabit PIC which we need for another project should be shipped any
time now, so I'll try to replace the PIC as soon as the spare arrives.
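
For anyone doing the same swap: the PIC can be taken offline and brought back
without rebooting the box. On the M5 the built-in FPC is slot 0; the PIC slot
below is a placeholder:

user@m5> request chassis pic offline fpc-slot 0 pic-slot 0
  ... swap the PIC ...
user@m5> request chassis pic online fpc-slot 0 pic-slot 0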

Other than hardware, does anyone have any suggestions? What kind of stupidity
could I have committed in the configuration to degrade a gigabit ethernet link
to the level of an STM-1?

