[j-nsp] Link aggregation and RSVP

Thu May 6 10:42:46 EDT 2010

On Thu, May 06, 2010 at 03:09:20PM +1200, Kris Price wrote:
> Hi all,
> 
> The subject is using link aggregation with RSVP. Or just generally the 
> subject of how best to handle multiple parallel links between adjacent 
> routers in an MPLS network. I'm hoping the list can provide guidance on 
> the best practise for this, or point me to useful material.
> 
> The context is outside of a data centre, in the core of a national 
> carrier network (P/P and P/PE links).
> 
> Link aggregation is the simplest option, but I am wary there may be 
> problems with this approach. E.g. with the hashing algorithms and how 
> this might impact on reported versus actual bandwidth for constrained 
> LSP set up. Or with fast reroute, or failover from primary to secondary 
> LSPs, when one of the links in the bundle fails.

We use RSVP over link-agg when the links are truly parallel (e.g. 8x10GE
DWDM between a pair of routers over the same fiber path), and RSVP over
individual members when the paths are not (e.g. 5 different fiber paths
between New York and DC, each with its own latency and survivability
characteristics). A big advantage to the individual member approach is 
that you can prioritize certain LSPs (premium customers, transport 
customers, etc) over your lower latency paths.

One big problem with link agg is that Juniper has no way to adjust the
RSVP bandwidth of a bundle based on the actual capacity, so for example
if you have a 60G bundle and you lose a 10G member (or two) RSVP will
continue trying to pump 60G into the bundle, with the expected Very Bad
Things (tm) as a result. Of course, if you do lose a member during
non-peak times the link-agg mechanisms do a much faster job of detecting
and recovering from the issue than RSVP does, and you end up avoiding a 
network-wide event with a lot of preemption and resignaling.
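
To put rough numbers on the lost-member case, here is a minimal Python
sketch. The function name and the 55G reservation figure are made up for
illustration; it only does the arithmetic of "RSVP still advertises the
full bundle" versus what the surviving members can actually carry.

```python
def bundle_overcommit_gbps(member_gbps, members_total, members_up, reserved_gbps):
    """Return how many Gbps of RSVP reservations exceed real bundle capacity.

    Assumption: RSVP keeps advertising the full configured bundle
    bandwidth after a member fails, as described above.
    """
    advertised = member_gbps * members_total  # what RSVP still believes it has
    actual = member_gbps * members_up         # what the bundle can really carry
    if reserved_gbps > advertised:
        raise ValueError("reservations exceed advertised bandwidth")
    return max(0, reserved_gbps - actual)


# Hypothetical 6x10G bundle with 55G of LSP bandwidth reserved on it.
print(bundle_overcommit_gbps(10, 6, 6, 55))  # all members up   -> 0
print(bundle_overcommit_gbps(10, 6, 5, 55))  # one member down  -> 5
print(bundle_overcommit_gbps(10, 6, 4, 55))  # two members down -> 15
```

Whether that overcommit actually hurts depends on how much of the
reserved bandwidth is really in use at the time, which is why losing a
member off-peak tends to be harmless.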

A caveat to the individual members approach is that you'll need to make
sure your maximum LSP size doesn't exceed the capacity of any one RSVP
path. This is especially a problem if you do a large amount of traffic
and don't have your IP->RSVP distributed all the way to your edge
(perhaps because your vendor is shipping edge aggregation devices
without mpls support *coughjuniperexcough* :P). For example if you have
a big pile of parallel 10G RSVP links, you'll need to make sure no
individual LSP exceeds (or even gets near :P) 10Gbps. The only way to do
this is to make multiple parallel LSPs, and set the maximum bandwidth of
the LSPs to that of your smallest link. Fortunately Juniper does a
reasonably good job of balancing the traffic across multiple LSPs, but
it can get slightly obnoxious to size how many LSPs you'll need between
router pairs. We do this with an MPLS automesh commit script and
monitoring to make sure our LSP bandwidth stays the right size.
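
The sizing question (how many parallel LSPs for a given router pair) can
be sketched roughly as below. This is not our commit script, just an
illustration in Python; the function name and the 0.8 headroom factor
are assumptions, there to keep each LSP from getting near the size of
the smallest link.

```python
import math


def plan_parallel_lsps(demand_gbps, smallest_link_gbps, headroom=0.8):
    """Split a router-pair demand into equal-sized parallel LSPs.

    headroom is an assumed safety factor: no single LSP is allowed to
    exceed headroom * smallest_link_gbps.
    Returns (lsp_count, per_lsp_gbps).
    """
    if demand_gbps <= 0:
        return 0, 0.0
    max_lsp_gbps = smallest_link_gbps * headroom  # per-LSP ceiling
    count = math.ceil(demand_gbps / max_lsp_gbps)  # fewest LSPs that fit
    return count, demand_gbps / count


# Hypothetical example: 35G of demand over parallel 10G RSVP links.
count, per_lsp = plan_parallel_lsps(35, 10)
print(count, per_lsp)  # 5 LSPs of 7.0G each
```

In practice the demand figure has to come from monitoring, and the
count needs re-checking as traffic grows.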

Also note that if you have any Crisco 6500/7600 MPLS speakers in your
network, those boxes do not understand any bandwidth value bigger than
10G (either for LSP bandwidth or RSVP bandwidth), so as soon as you add
one to the mix all of the convenience of big LSPs over RSVP on link-aggs
goes right out the window. When dealing with them, the best solution
seems to be to use a link-agg (because Cisco does NOT balance over RSVP
anywhere near as well as Juniper, and if you don't your packets will
just run back and forth between links like a hamster on meth) and then
run multiple 10G SVIs across that link-agg. This has the effect of being
just like using individual 10G members for the purposes of RSVP, so
you'll still need to apply all the LSP sizing and parallel LSP creation
rules above. Also note that there are several more issues caused by the
NxSVI approach on Crisco, but that is a rant for a different mailing
list. :)

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)