Re: Wither irtf-rr

From: Fred Baker (fred@cisco.com)
Date: Fri Dec 15 2000 - 20:46:14 EST


first thought is "include me". Second thought: include Pepe Garcia
<pepe@cisco.com>, Avri Doria, Bruce Davie, Bruce Cole, and Dave Katz.

At 08:18 PM 12/15/00 -0500, Russ White wrote:
>Fred:
>
>Radia and I chatted about this for about 4 hours on Tuesday
>night, and decided that we at least need to form up a group to
>think about this. It seems that irtf-rr is the right forum for
>this, but she is somewhat concerned about the 'old timers' in bgp
>land jumping in and destroying the effort. We actually made up an
>(incomplete) list of people we thought would be well placed to
>look at the problem (Radia didn't really include herself on the
>list much, but she did include you).
>
>Hal Peterson (HFR land) has the current list. I'm not really
>certain where to go with it from here. Hal is setting up an
>alias, though, and we are thinking about how to invite people
>into the fray without getting religious about the problem at
>hand.
>
>Thoughts?
>
>:-)
>
>Russ
>
>On Fri, 15 Dec 2000, Fred Baker wrote:
>
> > At 04:27 PM 12/14/00 -0500, abha ahuja wrote:
> > >Great! We'd be happy to work with them... What are their concerns?
> >
> > varied.
> >
> > We're looking at the growth in routes, which is now roughly at the same
> > rate it was in 1993 prior to the deployment of CIDR. We believe that this
> > is primarily related to multihoming, and may be amenable to adjustment
> > using policies. Geoff Huston has suggested that we put a TTL on routes
> > advertised - when transit contracts are in place, maybe I want my
> immediate
> > neighbor to get a certain /28 route, but there is no need to send it
> beyond
> > him although other routes will go beyond him. Attacking the same
> problem, I
> > suggested another approach - if my route to you and my route to your left
> > nostril have the same next hop, I don't need to store the fact in my FIB,
> > nor advertise it beyond me. There is probably another way to solve it, and
> > it may be better.
> >
> > We are also looking at convergence times. Vendors are in the process of
> > deploying a different understanding of a certain paragraph in BGP-4, which
> > would bring global convergence on a withdraw or a new route to roughly 30
> > seconds. But for many networks, 30 seconds and 30 years are still roughly
> > comparable; they would prefer convergence on the order of seconds. The
> > summary offered by Ran Atkinson was that "current convergence times
> leave a
> > lot of things in weird states for long periods of time." I have suggested
> > using some variation on the DUAL algorithm in a next generation BGP in
> > order to improve convergence; I have suggested a variation on EIGRP as a
> > starting point for thinking about the algorithms. I'll attach an email
> sent
> > earlier this week, following the IAB discussion of same. Again, there may
> > be a different or better approach.
> >
> > Once we solve data distribution, I do believe that a better
> organization of
> > the information itself, and the policies, would be useful. I characterize
> > the way we use BGP now as "building skyscrapers out of tinkertoys". I
> would
> > be very much in favor of improved approaches. NIMROD, I believe, has a lot
> > of the right pieces, although I can't describe MPLS as my favorite
> > implementation of it. That will take more exploration than I personally
> > have begun.
> >
> >
> > At 03:55 PM 12/13/00 -0800, I wrote:
> > >attached is a story, which as of last night I understand needs some
> > >updating, something about 30 seconds. I'd be curious as to your comments
> > >on the rest of it. The rest of this is an email I sent internally, and I
> > >would be interested in your comments.
> > >
> > >The reference to EIGRP is to a Cisco proprietary IGP. It is essentially
> > >distance vector, with two clicks on that. One is that it uses the DUAL
> > >Algorithm to improve convergence properties; the second is that it (like
> > >SPF-based protocols) builds a lattice of routers and hangs routes off
> them.
> > >
> > >If you are not familiar with, or don't remember, DUAL, it was proposed by
> > >JJ Garcia in the late 1980's as a way of improving distance vector
> > >protocols like RIP etc. In essence, there is a lot that we fix in
> > >RIP/IGRP/BGP/etc using some combination of heuristics that almost work -
> > >poison reverse (aka withdraw), split horizon, and hold-downs. hold-downs
> > >tend to trump anything else; if I can avoid learning anything until after
> > >the network converges, I only learn good information. But to acheive
> that,
> > >they often have to be impossibly long. DUAL installs what amounts to a
> > >dynamic hold-down which is exactly as long as it needs to be and no
> longer.
> > >
> > >The idea behind DUAL, in simplest form, is that I put a sequence
> number, a
> > >timer, a list of unresponsive neighbors, a list of neighbors needing an
> > >acknowledgement, and a state on each route. If I get new information
> about
> > >a route which I need to propagate, and the state of the route is "not
> > >busy", I make the state "busy", bump the sequence number, advertise the
> > >sequenced routed to my neighbors, and list them as needing a response. If
> > >I don't receive a response in mumble time units, I retransmit. to that
> > >neighbor. I do that until the neighbor is deemed to have failed or
> until I
> > >get a response. The neighbor therefore receives the route; either it is
> > >new information or it is not, and either the router has neighbors besides
> > >me or not. If it is new information and he has neighbors besides me, he
> > >repeats the sequence to his neighbors. It it is not new information or he
> > >has no other neighbors, he replies "got it". The guy who advertised the
> > >route to him now removes that neighbor from his list of folks he needs
> > >responses from; when the list goes empty, he makes the route "not busy"
> > >and responds to the folks who advertised it to him. While the route is
> > >busy, it is treated as "in hold down"; when it becomes unbusy, it is
> > >clearly able to change. The nice thing about this is that we can assure
> > >convergence with respect to any given advertised route within a fairly
> > >nominal time period.
> > >
> > >The bit about the lattice of routers makes for some interesting
> > >information management. If I consider a router, from a routing
> > >perspective, to be a network element that advertises a bag of
> information,
> > >each route within which is an identified smaller bag of information, then
> > >knowing that I have lost a router implies that I have lost all the routes
> > >that it specifically is my connection towards. I therefore only need to
> > >make sure that the path to the router is stable, not the path to each of
> > >its routes. Hence I can separate between "hello" exchanges and "I have
> > >information for you" exchanges.
> > >
> > >I'm not sure how much I want to bite off here, or in what sized chunks. I
> > >believe that our experience with DUAL would help us to implement it in
> > >some variation of BGP, which would improve convergence. The latter is
> also
> > >interesting; from a NIMROD perspective, if I replaced the word "router"
> > >with "addressable entity, which might be a router or any set of routers,
> > >including one or more AS's", I might be able to build that towards
> some of
> > >the concepts from NIMROD. There might be value in that direction.
> > >
> > >Here is the other email I mentioned:
> > >
> > >
> > >At 03:27 PM 12/6/00 -0800, I wrote:
> > >>finally getting back to you. Sitting on an airplane lets you do one
> thing
> > >>for a while...
> > >>
> > >>At 10:25 AM 11/22/00 -0800, Pepe (Joseph) Garcia wrote:
> > >>>I don't know if your ideas include new protocols, or improvements
> > >>>to the current ones. I'm hoping it's both.
> > >>>
> > >>>Currently I'm pushing to focus on big performance improvements for
> > >>>the current protocols: OSPF, IS-IS, EIGRP, and BGP. Since in the
> > >>>past there's been a lot of emphasis on adding new features, but at
> > >>>the expense of performance and quality ;-(.
> > >>>
> > >>>I know a lot of people have bought into the MPLS track, but I
> > >>>wonder why we haven't made improvements in what people now call
> > >>>"classical IP routing." I'd like to see if we can provide another
> > >>>choice that's more familiar to people: IP ;-).
> > >>>
> > >>>Sorry for the rambling. At any rate, I'm all ears... I definitely
> > >>>need your ideas/vision in this area. I'm ready and willing to
> > >>>push for new ideas.
> > >>
> > >>Violent agreement on all fronts. Specific to MPLS, I have no problem
> with
> > >>using MPLS for what it brings to the party - the ability to enumerate
> > >>routes, which perhaps then means the ability to traffic engineer them or
> > >>build VPNs out of them - but I have a real problem with making the
> > >>Internet depend on them. They might make sense for optical networks, for
> > >>which a lambda has a lot of the same characteristics as an MPLS path.
> But
> > >>I see them as frosting, something added on to the fundamentals, not as
> > >>themselves fundamental. I looked around for some already-written emails
> > >>to forward to you - heaven knows I have been going on about this for a
> > >>while - but I don't see them. So I'll take it from the top.
> > >>
> > >>First, let's enumerate what problems we might want to fix. I think they
> > >>come down to:
> > >>
> > >> convergence time
> > >> complexity
> > >> number of routes
> > >> quality of routes
> > >> traffic engineering
> > >>
> > >>by "convergence time", I mean the time it takes for a reroute to
> > >>converge. If you look at www.irtf.org, click on the routing research
> > >>group, and then on the chair's introductory slide show, you will find a
> > >>number of measurements he has taken that suggest that typical BGP
> > >>convergence is on the order of 90 seconds. That's a long time. He
> > >>suggests that it is for a variety of reasons, not the least of which is
> > >>that BGP doesn't particularly try to converge quickly - it is straight
> > >>distance vector, and makes distance vector's problems worse in a number
> > >>of ways.
> > >>
> > >>by "complexity", I am not referring to the current growth in BGP routes,
> > >>which is exponential. I am referring to the complexity of installing and
> > >>configuring a routing protocol. OSPF is a bucket of bolts, and anyone
> who
> > >>turns it on winds up configuring a lot of stuff. BGP isn't properly a
> > >>routing protocol at all - it is an information distribution protocol,
> and
> > >>the actual routing is done by the route policy. Route policies are often
> > >>pretty complex; you can limit yourself to saying "I will accept only
> > >>these routes from this guy" if you like, but route policy often gets
> into
> > >>preferring this to that for arcane reasons, and so one. The
> complexity is
> > >>a real issue in hiring - I'm told that almost every BGP Policy
> contains a
> > >>serious bug o some kind, and generally our customers don't have the
> > >>expertise to realize that, let alone debug them. Hence tools we have
> > >>developed to help them. But what I worry about is that we are now
> > >>building skyscrapers out of tinker-toys; it would be really nice if we
> > >>could collect some of the information architecturally and build them out
> > >>of larger building blocks.
> > >>
> > >>That said, the growth in IP routes is pretty bad, and I suspect that
> > >>there are some pretty simple things we can do to limit that. Many of the
> > >>routes are multihomed /24 and smaller prefixes being punched through
> > >>alternate carriers. Many times, I suspect, the result of the route
> > >>calculation is that a /31 or a /24 has the same next hop as some larger
> > >>aggregate. Whenever that is true, I will argue that we have no strong
> > >>reason to set up a separate MPLS tunnel for it, to store the route in
> our
> > >>own FIB, or pass the route to anyone else. I'd love to see a simple
> > >>"simplify routes" mode that tells BGP to look for such cases and not
> pass
> > >>them along when they apply. There are some issues there, such as the
> fact
> > >>that if we literally did that, the network that the smaller prefix was
> > >>out of might not advertise the prefix on. We can overcome that, I think.
> > >>
> > >>Another is the quality of the routes we calculate. Generally, what we
> > >>work towards is the one that gives us the least number of hops or some
> > >>related administrative metric. In BGP, we tend to go for the shortest AS
> > >>path, to ensure convergence, and when folks want something different,
> > >>they have to go through conniption fits to defeat routing - ask Digital
> > >>Island, etc. But it turns out that we actually want something rather
> > >>different: what we would really like is the route that gives us the most
> > >>available path or the shortest round trip time. Example: Geoff Huston
> > >>gave me two traceroutes that go from Australia to Singapore. The
> shortest
> > >>in number of AS's goes via the US west coast, meaning it crosses the
> > >>pacific (130 ms delay one way) twice; the longer number of AS's has an
> > >>end to end delay on the order of 50 ms or less. It seems to me that
> > >>creating non-replicating paths is simple if we can by policy (a) not
> > >>accept routes from a peer that contain our AS number in the path and (b)
> > >>not offer routes to a peer that contain its AS number, or more generally
> > >>not accept routes that loop and not offer routes that will cause our
> > >>neighbor to loop. Identifying the "best" route is therefore not a matter
> > >>of selecting the path that has the least number of AS's, but selecting
> > >>the route that is "best" from some other perspective, such as round trip
> > >>or one-way delay.
> > >>
> > >>And then there is traffic engineering. We do what we do with MPLS in
> > >>order to achieve that. But I do find myself wondering if there is
> > >>something we could do that would improve matters in IP. For example,
> > >>while I understand the oscillation that happened in BBN SPF twenty years
> > >>ago, I'm not sure the experience is as relevant as we think. We have
> > >>shied away from anything time-based or load-based as oscillatory. I
> > >>wonder whether the influx of bandwidth affects that at all. Suppose, for
> > >>example, that route metrics told me the mean drop rate per kilobit
> > >>transferred, with a minimum value of ping round trip delay of the line?
> > >>what I wind up with by default is the network path with the shortest
> RTT,
> > >>and when the line starts dropping, we start spreading load. Suppose that
> > >>BGP had a "round trip to target" attribute, and which contained either
> > >>the mean traceroute delay to the network or the sum of the TCP RTTs of
> > >>the BGP sessions going to the network, and that we had a verb that said
> > >>"when presented with two or more potential routing paths, take the path
> > >>with the shortest cumulative delay"?
> > >>
> > >>
> > >>Several of these can be handled within the existing routing protocols,
> > >>and would be options that might be worth exploring. Improving BGP
> > >>convergence, I think, requires an improved protocol. There, I would
> argue
> > >>for the algorithms used in EIGRP, which are designed to improve
> > >>convergence in a distance vector protocol. Suppose we used something
> very
> > >>EIGRP-ish - in a standard protocol - as a way to distribute routing
> > >>information (nodes on the tree being AS's rather than routers, at least
> > >>in non-IBGP cases), but basically distributing BGP routing
> information? I
> > >>think we could pretty readily make the case that convergence improves.
> > >>
> > >>Just a start. Maybe you have some notions.
> >
>
>_____________________________
>riw@cisco.com <>< Grace Alone



This archive was generated by hypermail 2b29 : Mon Aug 04 2003 - 04:10:04 EDT