[c-nsp] ASR / IOS XE: CEF load-sharing algorithms changed?
Rodney Dunn
rodunn at cisco.com
Tue Nov 25 12:34:59 EST 2008
Elmar,
I think I've reproduced it.
Can you remove this and check it again:
ip cef accounting non-recursive
Then check the programming via:
sh platform software ip f0 cef summ
get the id:
sh plat hard cpp active feature cef prefix ip z.z.z.z | incl Next
to check it.
Rodney
On Wed, Nov 05, 2008 at 01:24:48PM +0100, Elmar K. Bins wrote:
> Re again,
>
> I am running into trouble with the CEF load-sharing algorithm
> on the ASR / IOS-XE platform. We've had this kind of setup
> with 7301s for four years now, and it has never given us any
> trouble. It distributed traffic pretty evenly (except when
> only one or two top talkers were hitting us).
>
> With the new ASR / IOS-XE (1.1.2 currently, but I have found
> nothing relevant in the release notes of later versions), traffic
> distribution has become heavily skewed in favour of the server
> with the lowest IP address: it's getting 85% of all packets.
>
> The setup in brief (all IPv4):
>
> z.z.z.z = Service address
>
> a.a.a.a, a.a.a.b, a.a.a.c = Interface addresses of three servers,
> a<b<c; these serve as next-hops
> a.a.a.d = Interface address of the ASR
>
> External routing gets z.z.z.z to the ASR.
>
>                +--------+             +---(a.a.a.a)-[srv1]
> (Internet) --- | Router |-(a.a.a.d)---+---(a.a.a.b)-[srv2]
>                +--------+             +---(a.a.a.c)-[srv3]
>
>
> z.z.z.z is the only target address; all external traffic goes
> there, and to a specific port. This is a DNS setup, so we can also
> assume that 99.9% of the traffic seen is UDP/53.
>
> Routing on the Router is as follows:
>
> rt#sh ip route static
> ip route z.z.z.z 255.255.255.255 a.a.a.a
> ip route z.z.z.z 255.255.255.255 a.a.a.b
> ip route z.z.z.z 255.255.255.255 a.a.a.c
>
> rt#sh ip cef z.z.z.z
> z.z.z.z/32
> nexthop a.a.a.a GigabitEthernet0/0/3
> nexthop a.a.a.b GigabitEthernet0/0/3
> nexthop a.a.a.c GigabitEthernet0/0/3
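For readers unfamiliar with per-flow load sharing: a router typically hashes some header fields of each packet (plus a per-router seed) and uses the result to pick one of the equal-cost next hops, so every packet of a flow follows the same path. The Python sketch below illustrates only that general idea. SHA-256, `pick_next_hop` and `simulate` are hypothetical stand-ins, not Cisco's CEF hash, and the `000FFEED` value from the config is merely reused as the seed.

```python
import hashlib
from collections import Counter

# Hypothetical per-flow load-sharing sketch. This is NOT Cisco's CEF hash;
# SHA-256 simply stands in for "some hash function".
NEXT_HOPS = ["a.a.a.a", "a.a.a.b", "a.a.a.c"]  # placeholder next hops from the post
UNIVERSAL_ID = 0x000FFEED  # reused from the config as the per-router seed


def pick_next_hop(src_ip, dst_ip, src_port, dst_port,
                  next_hops=NEXT_HOPS, seed=UNIVERSAL_ID):
    """Map one flow to one next hop; the same flow always gets the same hop."""
    key = f"{seed:08X}|{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[value % len(next_hops)]


def simulate(n_flows=30000):
    """Count how many synthetic DNS flows (UDP/53 to z.z.z.z) land on each hop."""
    counts = Counter()
    for i in range(n_flows):
        # Made-up, distinct client addresses and source ports.
        src_ip = f"10.{(i >> 16) & 255}.{(i >> 8) & 255}.{i & 255}"
        src_port = 1024 + (i * 7919) % 60000
        counts[pick_next_hop(src_ip, "z.z.z.z", src_port, 53)] += 1
    return counts


if __name__ == "__main__":
    for hop, n in sorted(simulate().items()):
        print(hop, n)
```

With many distinct sources, a reasonable hash gives each next hop close to a third of the flows.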
>
>
> rt#sh run | i cef
> ip cef load-sharing algorithm tunnel 000FFEED
>
>
> On 7301s, typical distribution is 3:4:3 or something like that.
> On the ASR I see 10:1:2 (on srv1:srv2:srv3).
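Expressed as percentages, the two ratios are easier to compare. A trivial helper (the name `ratio_to_percent` is hypothetical):

```python
def ratio_to_percent(ratio):
    """Turn a traffic ratio such as 10:1:2 into per-server percentages."""
    total = sum(ratio)
    return [round(100 * part / total, 1) for part in ratio]


print(ratio_to_percent([3, 4, 3]))   # 7301: [30.0, 40.0, 30.0]
print(ratio_to_percent([10, 1, 2]))  # ASR:  [76.9, 7.7, 15.4]
```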
>
> The distribution changed immediately when we replaced the 7301 with
> the ASR. My colleague tells me that we have not one but several
> (about a dozen) top talkers (out of several million), just like
> before the router swap.
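Whether a dozen top talkers alone could produce such a skew can be estimated with a small Monte Carlo model. Everything in it is an assumption: the per-talker share (`heavy_share=0.04`), the even split of the remaining small flows, and a hash that places each top talker on a uniformly random path. The function names are hypothetical; substitute measured numbers before drawing conclusions.

```python
import random

# Hypothetical model: each of n_heavy top talkers carries heavy_share of the
# traffic and is hashed to one path uniformly at random; the remaining small
# flows split evenly across all paths. The weights are invented.


def path_shares(n_heavy=12, heavy_share=0.04, n_paths=3, rng=random):
    light_total = 1.0 - n_heavy * heavy_share
    shares = [light_total / n_paths] * n_paths
    for _ in range(n_heavy):
        shares[rng.randrange(n_paths)] += heavy_share
    return shares


def prob_max_share_at_least(threshold, trials=20000, seed=1, **kwargs):
    """Estimate how often the busiest path carries >= threshold of the traffic."""
    rng = random.Random(seed)
    hits = sum(max(path_shares(rng=rng, **kwargs)) >= threshold
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("P(busiest path >= 50%):", prob_max_share_at_least(0.50))
    print("P(busiest path >= 75%):", prob_max_share_at_least(0.75))
```

With these made-up weights the busiest path can never reach 75% (all twelve talkers on one path gives 0.48 + 0.52/3, about 65%), and exceeding 50% is rare.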
>
> What could cause this phenomenon?
>
> 1. Traffic pattern has changed.
> -> my colleague denies this
>
> 2. The tunnel balancing algorithm (which to my knowledge includes
> source/dest IP addresses _and_ ports) has been altered.
>
> 3. The tunnel balancing algorithm (which to my knowledge includes
> source/dest IP addresses _and_ ports) is now buggy.
>
>
> Experiment 1
>
> Changing the algorithm to "include-ports source".
>
> This did not change the traffic pattern at all. I didn't expect a
> change, since AFAIK it does the same as the "tunnel" algorithm.
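Which fields go into the hash key matters more than the exact hash function. The hypothetical sketch below (again SHA-256 as a stand-in, with invented resolver addresses) hashes the same synthetic UDP/53 flows with different key sets. With only destination fields, every flow to z.z.z.z lands on one path; once source fields are included, the flows spread over all three. Adding the source port on top of many distinct source addresses changes little, which is consistent with the result of Experiment 1.

```python
import hashlib
from collections import Counter

# Hypothetical comparison of hash-key field sets. SHA-256 stands in for the
# router's real (undocumented) hash; addresses and ports are made up.
NEXT_HOPS = ["a.a.a.a", "a.a.a.b", "a.a.a.c"]


def pick(fields, next_hops=NEXT_HOPS, seed=0x000FFEED):
    key = "|".join([f"{seed:08X}", *map(str, fields)]).encode()
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[value % len(next_hops)]


def spread(key_fields, n_flows=3000):
    """Count next hops when only `key_fields` of each flow are hashed."""
    counts = Counter()
    for i in range(n_flows):
        flow = {
            "src_ip": f"192.0.2.{i % 200}",  # invented resolver addresses
            "src_port": 1024 + (i * 7919) % 60000,
            "dst_ip": "z.z.z.z",  # the single service address
            "dst_port": 53,
        }
        counts[pick([flow[f] for f in key_fields])] += 1
    return counts


if __name__ == "__main__":
    print("dst only:        ", dict(spread(["dst_ip", "dst_port"])))
    print("src+dst IP:      ", dict(spread(["src_ip", "dst_ip"])))
    print("src+dst IP+ports:", dict(spread(["src_ip", "dst_ip", "src_port", "dst_port"])))
```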
>
>
> Experiment 2
>
> I added a second address to each server (a.a.a.d to srv1, a.a.a.e
> to srv2 and a.a.a.f to srv3) plus the corresponding routes:
>
> rt#sh ip route static
> ip route z.z.z.z 255.255.255.255 a.a.a.a
> ip route z.z.z.z 255.255.255.255 a.a.a.b
> ip route z.z.z.z 255.255.255.255 a.a.a.c
> ip route z.z.z.z 255.255.255.255 a.a.a.d
> ip route z.z.z.z 255.255.255.255 a.a.a.e
> ip route z.z.z.z 255.255.255.255 a.a.a.f
>
> rt#sh ip cef z.z.z.z
> z.z.z.z/32
> nexthop a.a.a.a GigabitEthernet0/0/3
> nexthop a.a.a.b GigabitEthernet0/0/3
> nexthop a.a.a.c GigabitEthernet0/0/3
> nexthop a.a.a.d GigabitEthernet0/0/3
> nexthop a.a.a.e GigabitEthernet0/0/3
> nexthop a.a.a.f GigabitEthernet0/0/3
>
>
> This changed the distribution pattern from 10:1:2 to a somewhat
> better 5:1:2.
>
> It still shows a strong favouring of the server with the smallest
> IP address.
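One possible source of mild unevenness is how next hops are mapped onto a fixed number of hash buckets. The sketch below assumes a 16-entry table filled round-robin, like the `Load distribution:` line that classic IOS CEF shows in `show ip cef ... internal`; whether the ASR's QFP programs its table the same way is an assumption. The `OWNER` mapping of next hops to servers follows Experiment 2, and the helper names are hypothetical.

```python
from collections import Counter

# Assumption: next hops are spread round-robin over a 16-entry bucket table.
# This is not confirmed for the ASR / IOS-XE forwarding plane.
N_BUCKETS = 16


def bucket_table(next_hops, n_buckets=N_BUCKETS):
    return [next_hops[i % len(next_hops)] for i in range(n_buckets)]


def server_share(table, owner):
    """Fraction of buckets (hence of evenly hashed flows) per server."""
    counts = Counter(owner[hop] for hop in table)
    return {srv: counts[srv] / len(table) for srv in sorted(counts)}


OWNER = {"a.a.a.a": "srv1", "a.a.a.b": "srv2", "a.a.a.c": "srv3",
         "a.a.a.d": "srv1", "a.a.a.e": "srv2", "a.a.a.f": "srv3"}

if __name__ == "__main__":
    three = ["a.a.a.a", "a.a.a.b", "a.a.a.c"]
    six = three + ["a.a.a.d", "a.a.a.e", "a.a.a.f"]
    print(server_share(bucket_table(three), OWNER))  # {'srv1': 0.375, 'srv2': 0.3125, 'srv3': 0.3125}
    print(server_share(bucket_table(six), OWNER))    # {'srv1': 0.375, 'srv2': 0.3125, 'srv3': 0.3125}
```

Under that assumption srv1 would get 37.5% of evenly hashed flows with either three or six next hops: a small bias toward the first next hop, but far from the observed 10:1:2 or 5:1:2.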
>
>
> Experiment 3
>
> I removed the z.z.z.z -> a.a.a.d route, so that srv1 would only
> have one of the five paths pointing to it, while srv2 and srv3
> each get twice as many slots in the routing and forwarding tables.
> I'll spare you the CEF output here.
>
> This did not change the distribution pattern at all, at least not
> noticeably.
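For Experiment 3, the expected per-server share with five equal paths can be set against the observed ratio. A minimal sketch, assuming each path receives an equal slice of the traffic (no bucket rounding) and using 5:1:2 as the observed ratio, since the distribution did not noticeably change; the function names are hypothetical.

```python
from fractions import Fraction

# Experiment 3: five paths, one to srv1 and two each to srv2 and srv3.
PATHS = {"a.a.a.a": "srv1", "a.a.a.b": "srv2", "a.a.a.c": "srv3",
         "a.a.a.e": "srv2", "a.a.a.f": "srv3"}


def expected_share(paths):
    """Share per server if every path carried an equal slice of traffic."""
    servers = sorted(set(paths.values()))
    return {s: Fraction(sum(1 for o in paths.values() if o == s), len(paths))
            for s in servers}


def observed_share(ratio):
    total = sum(ratio.values())
    return {s: Fraction(n, total) for s, n in ratio.items()}


if __name__ == "__main__":
    exp = expected_share(PATHS)
    obs = observed_share({"srv1": 5, "srv2": 1, "srv3": 2})
    for srv in exp:
        print(srv, f"expected {float(exp[srv]):.1%}", f"observed ~{float(obs[srv]):.1%}")
```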
>
>
> I wonder what I have stumbled onto here, and whether anyone on the
> list or at Cisco knows about a change in the algorithms that would
> lead to such an effect.
>
> I would also be very interested in a document that really explains
> the load-sharing algorithms, since all one can find about the
> tunnel algorithm is:
>
> "The tunnel keyword sets the load-balancing algorithm to one
> that can be used in tunnel environments or in environments
> where there are only a few IP source and destination address
> pairs. "
>
>
> I appreciate any help. The server is still holding up, but it's
> really bad karma, and I'd like to get my poor man's L3 load
> balancing working properly.
>
> Elmar.
> _______________________________________________
> cisco-nsp mailing list cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/