[c-nsp] BGP multipath load balancing.. broken sessions upon hash change

Pete Lumbis alumbis at gmail.com
Wed Sep 9 09:17:21 EDT 2015


I'm not aware of a resilient hashing implementation that handles this at
all. Generally something like 1/n of the buckets (n = number of existing
next hops) will change, breaking the flows that were mapped to them.

The catch is that the number of buckets is fixed; that fixed size is what
allows the system to know how to repopulate.

In my example, there are exactly 12 buckets. To fill in the gap gracefully
on failure, the system ensures that all 12 buckets are always populated.

When a next hop is added you can't add buckets, so you have to reshuffle
to make the new next hop fit within the fixed number of buckets.

For example:


A, B, C, D, A, B, C, D, A, B, C, D

Server E is added

A, B, C, D, *E*, *A*, B, C, D, *E*, *A*, B


This is a rough idea of how it works. I still have 12 buckets but now I've
bumped the existing next hops. Bucket 5 now goes to server E, not server A.

As far as I'm aware, there are two real solutions to this problem:

 1.) Deal with it. Plan service additions during low-traffic periods.


 2.) Implement a software load-balancing layer. For a given service, each
node would be able to hash and say "this flow should have come to me, or it
should have gone to node $x" and redirect as necessary. This is better than
a full front-end LB since it's only balancing traffic for that anycast IP.
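A minimal sketch of that redirect check, assuming the node names, the hash choice, and the flow tuple are all made up for illustration:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # agreed-upon, stable ordering
SELF = "node-b"

def owner(flow):
    """Deterministically map a flow 5-tuple to the node that should serve it."""
    digest = hashlib.sha1(repr(flow).encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

def handle(flow):
    """Accept the flow if this node owns it, otherwise forward to the owner."""
    target = owner(flow)
    return "accept" if target == SELF else f"redirect to {target}"

flow = ("198.51.100.7", 51514, "192.0.2.1", 22, "tcp")
print(handle(flow))
```

In practice each node would also track established flows (or use consistent hashing) so an existing connection keeps its owner across membership changes; and only the anycast service's traffic passes through this layer, which is the advantage over a full front-end load balancer.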


-Pete

On Tue, Sep 8, 2015 at 2:19 PM, Peter Kranz <pkranz at unwiredltd.com> wrote:

> Hi Pete,
>
>                 Thank you very much for this response. It appears that
> resilient hashing handles node removal without causing a re-calculation.
> How well does it handle the scenario where you are adding a new node, or
> where a failed node returns?
>
>
>
> -Peter
>
>
>
> *From:* Pete Lumbis [mailto:alumbis at gmail.com]
> *Sent:* Thursday, September 03, 2015 2:02 PM
> *To:* Peter Kranz <pkranz at unwiredltd.com>
> *Cc:* cisco-nsp at puck.nether.net
> *Subject:* Re: [c-nsp] BGP multipath load balancing.. broken sessions
> upon hash change
>
>
>
> What you need is resilient hashing, which is supported on the Broadcom
> Trident 2 chipset by all the vendors that use it (Nexus 3k, Arista
> platforms, Dell S4048/S6000 with Cumulus Linux). I'm not aware of Cisco
> custom chips that do this.
>
> The way resilient hashing works is that it pre-populates a large number of
> buckets, say 1024, and then takes your list of next hops and just repeats
> them.
>
> A, B, C, D, A, B, C, D, A, B, C, D....
>
> If a next hop fails, it just plugs in the hole with the still living next
> hops. Say B fails.
>
> A, *A*, C, D, A, *C*, C, D, A, *D*, C, D....
>
> Anything that was going to B dies anyway, but you don't have to re-shuffle
> the existing buckets.
>
> The downside is that if you add a new nexthop you have to shuffle again,
> but you get what you pay for :)
>
>
>
> -Pete
>
>
>
> On Wed, Sep 2, 2015 at 4:49 PM, Peter Kranz <pkranz at unwiredltd.com> wrote:
>
> I’m using bgp maximum-paths and several peers announcing the same /32 to
> create a poor man’s load balancer. This works well with up to 16 peers,
> after which the number of CEF hash buckets is exceeded.
>
> However, if the number of connected peers changes, all sessions break,
> which I would like to avoid.
>
> For example:
> - 10 machines are advertising a path to the /32
> - SSH is opened to one machine via the advertised IP address
> - 1 machine stops advertising, bringing the pool to 9
> - SSH connection breaks a little while later
>
> Conversely, when adding another machine to the pool, the experience is similar:
> - 9 machines are advertising a path to the /32
> - SSH is opened to one machine via the advertised IP address
> - 1 machine starts advertising, bringing the pool to 10
> - SSH connection breaks immediately
>
> Is there a solution to keep the client session sticky to the BGP peer it
> was
> initially started on? I am using per-destination load balancing. My
> suspicion is that upon a change in the number of connected peers, the CEF
> hash buckets are reset and renumbered, breaking all connections.
>
> Peter Kranz
> www.UnwiredLtd.com
> Desk: 510-868-1614 x100
> Mobile: 510-207-0000
> pkranz at unwiredltd.com
>
>
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>
>
>

