[j-nsp] MPLS-in-MPLS mtu

Jared Mauch jared at puck.nether.net
Mon Apr 16 08:21:50 EDT 2007


On Mon, Apr 16, 2007 at 06:26:38AM -0400, Jeff S Wheeler wrote:
> On Mon, 2007-04-16 at 10:32 +0100, Alex wrote:
> > Jared,
> > Max BGP message size is 4096 bytes anyway and I cannot possibly see how 9K 
> > MTU can further increase tcp performance in case of iBGP in comparison with 
> > MTU of 4470.
> > Rgds
> > Alex
> 
> It's sometimes foolish to run iBGP sessions with a TCP MSS > 536 bytes.
> This is the minimum allowed for IP networks, and there should never be a
> router transporting your iBGP session which cannot forward TCP segments
> conforming to this minimum MSS value.
> 
> A topology change might happen and cause your iBGP session, which under
> nominal conditions is transported over a 4470 MTU path, to pass across a
> 1500 MTU link.  Now your TCP stack is sending packets larger than the
> path MTU, which will either be fragmented or discarded.
> 
> The minimum values are the safest ones for iBGP, in my opinion; and the
> benefit to using any larger TCP MSS for iBGP is poor given the CPU time
> your router spends in its TCP/IP stack vs other tasks like calculating
> routes.  You could argue that convergence time is improved if the iBGP TCP
> session is more efficient, but that benefit goes away the first time
> your operators page you at 4am to find out why an iBGP session flaps
> when a topology change favoring a lower-MTU core link happens.  :-)

	If you don't have sufficient control over your network
to know the underlying MTU of the links (be they l2vpn-type or
otherwise carrier-provided), you have some serious issues.

	There was a presentation years ago at IETF where Cisco showed
the performance increase from enabling path MTU discovery, and how this
let BGP message sizes grow from the Cisco default to match your
link MTU.
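
	As a rough illustration of the sizes involved, here is a minimal
sketch of how the TCP MSS follows from the link MTU.  It assumes IPv4
with no IP or TCP options (20 bytes of header each); the function name
tcp_mss is made up for this example and is not any vendor's API.

```python
# Assumption: IPv4 and TCP headers without options, 20 bytes each.
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


# 576 is the IPv4 datagram size behind the classic 536-byte default MSS.
for mtu in (576, 1500, 4470, 9000):
    print(f"MTU {mtu:>5} -> MSS {tcp_mss(mtu)}")
```

	With path MTU discovery working, the session can use the MSS of
the smallest link on the path rather than staying at the 536-byte default.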

	I'm advocating a consistent internal MTU for your network, be that
1500, 4470, or something larger.  If the underlying transport does not
support it, and you are dealing with broken host stacks from your vendors,
then you should discontinue using their equipment until they
repair these critical defects.

	Scaling your BGP update messages to something larger than 500 bytes
can be a significant win for route convergence, which matters now that
we're all carrying VoIP and other similarly sensitive traffic on our
networks (even if we don't know what all that sensitive traffic is).
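
	To put a rough number on that, the sketch below counts how many
TCP segments it takes to move a given volume of update data at different
MSS values.  The 10 MB figure is purely hypothetical, not a measurement,
and segments_needed is a name invented for this example.  It ignores
windowing, ACKs and router CPU, so treat it as back-of-envelope only.

```python
def segments_needed(total_bytes: int, mss: int) -> int:
    """Number of full-or-partial TCP segments needed to carry total_bytes."""
    return (total_bytes + mss - 1) // mss  # integer ceiling division


# Hypothetical volume of BGP UPDATE data to transfer (an assumption).
TABLE_BYTES = 10_000_000

# MSS values for the 536-byte default, a 1500 MTU and a 4470 MTU,
# assuming 40 bytes of IPv4 + TCP header without options.
for mss in (536, 1460, 4430):
    print(f"MSS {mss:>4}: {segments_needed(TABLE_BYTES, mss)} segments")
```

	Since TCP is a byte stream, several BGP messages can share one
segment, so the segment size is not capped by the 4096-byte maximum BGP
message size.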

	- jared

-- 
Jared Mauch  | pgp key available via finger from jared at puck.nether.net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.

