Re: Wither irtf-rr

From: Russ White (ruwhite@cisco.com)
Date: Fri Dec 15 2000 - 21:44:55 EST


I will let Hal know the additions. We will, I hope, have things
up and running quickly.

Russ

On Fri, 15 Dec 2000, Fred Baker wrote:

> first thought is "include me". Second thought: include Pepe Garcia
> <pepe@cisco.com>, Avri Doria, Bruce Davie, Bruce Cole, and Dave Katz.
>
> At 08:18 PM 12/15/00 -0500, Russ White wrote:
> >Fred:
> >
> >Radia and I chatted about this for about 4 hours on Tuesday
> >night, and decided that we at least need to form up a group to
> >think about this. It seems that irtf-rr is the right forum for
> >this, but she is somewhat concerned about the 'old timers' in bgp
> >land jumping in and destroying the effort. We actually made up an
> >(incomplete) list of people we thought would be well placed to
> >look at the problem (Radia didn't really include herself on the
> >list much, but she did include you).
> >
> >Hal Peterson (HFR land) has the current list. I'm not really
> >certain where to go with it from here. Hal is setting up an
> >alias, though, and we are thinking about how to invite people
> >into the fray without getting religious about the problem at
> >hand.
> >
> >Thoughts?
> >
> >:-)
> >
> >Russ
> >
> >On Fri, 15 Dec 2000, Fred Baker wrote:
> >
> > > At 04:27 PM 12/14/00 -0500, abha ahuja wrote:
> > > >Great! We'd be happy to work with them... What are their concerns?
> > >
> > > varied.
> > >
> > > We're looking at the growth in routes, which is now roughly at the same
> > > rate it was in 1993 prior to the deployment of CIDR. We believe that this
> > > is primarily related to multihoming, and may be amenable to adjustment
> > > using policies. Geoff Huston has suggested that we put a TTL on routes
> > > advertised - when transit contracts are in place, maybe I want my
> > immediate
> > > neighbor to get a certain /28 route, but there is no need to send it
> > beyond
> > > him although other routes will go beyond him. Attacking the same
> > problem, I
> > > suggested another approach - if my route to you and my route to your left
> > > nostril have the same next hop, I don't need to store the fact in my FIB,
> > > nor advertise it beyond me. There is probably another way to solve it, and
> > > it may be better.
> > >
> > > We are also looking at convergence times. Vendors are in the process of
> > > deploying a different understanding of a certain paragraph in BGP-4, which
> > > would bring global convergence on a withdraw or a new route to roughly 30
> > > seconds. But for many networks, 30 seconds and 30 years are still roughly
> > > comparable; they would prefer convergence on the order of seconds. The
> > > summary offered by Ran Atkinson was that "current convergence times
> > leave a
> > > lot of things in weird states for long periods of time." I have suggested
> > > using some variation on the DUAL algorithm in a next generation BGP in
> > > order to improve convergence; I have suggested a variation on EIGRP as a
> > > starting point for thinking about the algorithms. I'll attach an email
> > sent
> > > earlier this week, following the IAB discussion of same. Again, there may
> > > be a different or better approach.
> > >
> > > Once we solve data distribution, I do believe that a better
> > organization of
> > > the information itself, and the policies, would be useful. I characterize
> > > the way we use BGP now as "building skyscrapers out of tinkertoys". I
> > would
> > > be very much in favor of improved approaches. NIMROD, I believe, has a lot
> > > of the right pieces, although I can't describe MPLS as my favorite
> > > implementation of it. That will take more exploration than I personally
> > > have begun.
> > >
> > >
> > > At 03:55 PM 12/13/00 -0800, I wrote:
> > > >attached is a story, which as of last night I understand needs some
> > > >updating, something about 30 seconds. I'd be curious as to your comments
> > > >on the rest of it. The rest of this is an email I sent internally, and I
> > > >would be interested in your comments.
> > > >
> > > >The reference to EIGRP is to a Cisco proprietary IGP. It is essentially
> > > >distance vector, with two clicks on that. One is that it uses the DUAL
> > > >Algorithm to improve convergence properties; the second is that it (like
> > > >SPF-based protocols) builds a lattice of routers and hangs routes off
> > them.
> > > >
> > > >If you are not familiar with, or don't remember, DUAL, it was proposed by
> > > >JJ Garcia in the late 1980's as a way of improving distance vector
> > > >protocols like RIP etc. In essence, there is a lot that we fix in
> > > >RIP/IGRP/BGP/etc using some combination of heuristics that almost work -
> > > >poison reverse (aka withdraw), split horizon, and hold-downs. hold-downs
> > > >tend to trump anything else; if I can avoid learning anything until after
> > > >the network converges, I only learn good information. But to acheive
> > that,
> > > >they often have to be impossibly long. DUAL installs what amounts to a
> > > >dynamic hold-down which is exactly as long as it needs to be and no
> > longer.
> > > >
> > > >The idea behind DUAL, in simplest form, is that I put a sequence
> > number, a
> > > >timer, a list of unresponsive neighbors, a list of neighbors needing an
> > > >acknowledgement, and a state on each route. If I get new information
> > about
> > > >a route which I need to propagate, and the state of the route is "not
> > > >busy", I make the state "busy", bump the sequence number, advertise the
> > > >sequenced routed to my neighbors, and list them as needing a response. If
> > > >I don't receive a response in mumble time units, I retransmit. to that
> > > >neighbor. I do that until the neighbor is deemed to have failed or
> > until I
> > > >get a response. The neighbor therefore receives the route; either it is
> > > >new information or it is not, and either the router has neighbors besides
> > > >me or not. If it is new information and he has neighbors besides me, he
> > > >repeats the sequence to his neighbors. It it is not new information or he
> > > >has no other neighbors, he replies "got it". The guy who advertised the
> > > >route to him now removes that neighbor from his list of folks he needs
> > > >responses from; when the list goes empty, he makes the route "not busy"
> > > >and responds to the folks who advertised it to him. While the route is
> > > >busy, it is treated as "in hold down"; when it becomes unbusy, it is
> > > >clearly able to change. The nice thing about this is that we can assure
> > > >convergence with respect to any given advertised route within a fairly
> > > >nominal time period.
> > > >
> > > >The bit about the lattice of routers makes for some interesting
> > > >information management. If I consider a router, from a routing
> > > >perspective, to be a network element that advertises a bag of
> > information,
> > > >each route within which is an identified smaller bag of information, then
> > > >knowing that I have lost a router implies that I have lost all the routes
> > > >that it specifically is my connection towards. I therefore only need to
> > > >make sure that the path to the router is stable, not the path to each of
> > > >its routes. Hence I can separate between "hello" exchanges and "I have
> > > >information for you" exchanges.
> > > >
> > > >I'm not sure how much I want to bite off here, or in what sized chunks. I
> > > >believe that our experience with DUAL would help us to implement it in
> > > >some variation of BGP, which would improve convergence. The latter is
> > also
> > > >interesting; from a NIMROD perspective, if I replaced the word "router"
> > > >with "addressable entity, which might be a router or any set of routers,
> > > >including one or more AS's", I might be able to build that towards
> > some of
> > > >the concepts from NIMROD. There might be value in that direction.
> > > >
> > > >Here is the other email I mentioned:
> > > >
> > > >
> > > >At 03:27 PM 12/6/00 -0800, I wrote:
> > > >>finally getting back to you. Sitting on an airplane lets you do one
> > thing
> > > >>for a while...
> > > >>
> > > >>At 10:25 AM 11/22/00 -0800, Pepe (Joseph) Garcia wrote:
> > > >>>I don't know if your ideas include new protocols, or improvements
> > > >>>to the current ones. I'm hoping it's both.
> > > >>>
> > > >>>Currently I'm pushing to focus on big performance improvements for
> > > >>>the current protocols: OSPF, IS-IS, EIGRP, and BGP. Since in the
> > > >>>past there's been a lot of emphasis on adding new features, but at
> > > >>>the expense of performance and quality ;-(.
> > > >>>
> > > >>>I know a lot of people have bought into the MPLS track, but I
> > > >>>wonder why we haven't made improvements in what people now call
> > > >>>"classical IP routing." I'd like to see if we can provide another
> > > >>>choice that's more familiar to people: IP ;-).
> > > >>>
> > > >>>Sorry for the rambling. At any rate, I'm all ears... I definitely
> > > >>>need your ideas/vision in this area. I'm ready and willing to
> > > >>>push for new ideas.
> > > >>
> > > >>Violent agreement on all fronts. Specific to MPLS, I have no problem
> > with
> > > >>using MPLS for what it brings to the party - the ability to enumerate
> > > >>routes, which perhaps then means the ability to traffic engineer them or
> > > >>build VPNs out of them - but I have a real problem with making the
> > > >>Internet depend on them. They might make sense for optical networks, for
> > > >>which a lambda has a lot of the same characteristics as an MPLS path.
> > But
> > > >>I see them as frosting, something added on to the fundamentals, not as
> > > >>themselves fundamental. I looked around for some already-written emails
> > > >>to forward to you - heaven knows I have been going on about this for a
> > > >>while - but I don't see them. So I'll take it from the top.
> > > >>
> > > >>First, let's enumerate what problems we might want to fix. I think they
> > > >>come down to:
> > > >>
> > > >> convergence time
> > > >> complexity
> > > >> number of routes
> > > >> quality of routes
> > > >> traffic engineering
> > > >>
> > > >>by "convergence time", I mean the time it takes for a reroute to
> > > >>converge. If you look at www.irtf.org, click on the routing research
> > > >>group, and then on the chair's introductory slide show, you will find a
> > > >>number of measurements he has taken that suggest that typical BGP
> > > >>convergence is on the order of 90 seconds. That's a long time. He
> > > >>suggests that it is for a variety of reasons, not the least of which is
> > > >>that BGP doesn't particularly try to converge quickly - it is straight
> > > >>distance vector, and makes distance vector's problems worse in a number
> > > >>of ways.
> > > >>
> > > >>by "complexity", I am not referring to the current growth in BGP routes,
> > > >>which is exponential. I am referring to the complexity of installing and
> > > >>configuring a routing protocol. OSPF is a bucket of bolts, and anyone
> > who
> > > >>turns it on winds up configuring a lot of stuff. BGP isn't properly a
> > > >>routing protocol at all - it is an information distribution protocol,
> > and
> > > >>the actual routing is done by the route policy. Route policies are often
> > > >>pretty complex; you can limit yourself to saying "I will accept only
> > > >>these routes from this guy" if you like, but route policy often gets
> > into
> > > >>preferring this to that for arcane reasons, and so one. The
> > complexity is
> > > >>a real issue in hiring - I'm told that almost every BGP Policy
> > contains a
> > > >>serious bug o some kind, and generally our customers don't have the
> > > >>expertise to realize that, let alone debug them. Hence tools we have
> > > >>developed to help them. But what I worry about is that we are now
> > > >>building skyscrapers out of tinker-toys; it would be really nice if we
> > > >>could collect some of the information architecturally and build them out
> > > >>of larger building blocks.
> > > >>
> > > >>That said, the growth in IP routes is pretty bad, and I suspect that
> > > >>there are some pretty simple things we can do to limit that. Many of the
> > > >>routes are multihomed /24 and smaller prefixes being punched through
> > > >>alternate carriers. Many times, I suspect, the result of the route
> > > >>calculation is that a /31 or a /24 has the same next hop as some larger
> > > >>aggregate. Whenever that is true, I will argue that we have no strong
> > > >>reason to set up a separate MPLS tunnel for it, to store the route in
> > our
> > > >>own FIB, or pass the route to anyone else. I'd love to see a simple
> > > >>"simplify routes" mode that tells BGP to look for such cases and not
> > pass
> > > >>them along when they apply. There are some issues there, such as the
> > fact
> > > >>that if we literally did that, the network that the smaller prefix was
> > > >>out of might not advertise the prefix on. We can overcome that, I think.
> > > >>
> > > >>Another is the quality of the routes we calculate. Generally, what we
> > > >>work towards is the one that gives us the least number of hops or some
> > > >>related administrative metric. In BGP, we tend to go for the shortest AS
> > > >>path, to ensure convergence, and when folks want something different,
> > > >>they have to go through conniption fits to defeat routing - ask Digital
> > > >>Island, etc. But it turns out that we actually want something rather
> > > >>different: what we would really like is the route that gives us the most
> > > >>available path or the shortest round trip time. Example: Geoff Huston
> > > >>gave me two traceroutes that go from Australia to Singapore. The
> > shortest
> > > >>in number of AS's goes via the US west coast, meaning it crosses the
> > > >>pacific (130 ms delay one way) twice; the longer number of AS's has an
> > > >>end to end delay on the order of 50 ms or less. It seems to me that
> > > >>creating non-replicating paths is simple if we can by policy (a) not
> > > >>accept routes from a peer that contain our AS number in the path and (b)
> > > >>not offer routes to a peer that contain its AS number, or more generally
> > > >>not accept routes that loop and not offer routes that will cause our
> > > >>neighbor to loop. Identifying the "best" route is therefore not a matter
> > > >>of selecting the path that has the least number of AS's, but selecting
> > > >>the route that is "best" from some other perspective, such as round trip
> > > >>or one-way delay.
> > > >>
> > > >>And then there is traffic engineering. We do what we do with MPLS in
> > > >>order to achieve that. But I do find myself wondering if there is
> > > >>something we could do that would improve matters in IP. For example,
> > > >>while I understand the oscillation that happened in BBN SPF twenty years
> > > >>ago, I'm not sure the experience is as relevant as we think. We have
> > > >>shied away from anything time-based or load-based as oscillatory. I
> > > >>wonder whether the influx of bandwidth affects that at all. Suppose, for
> > > >>example, that route metrics told me the mean drop rate per kilobit
> > > >>transferred, with a minimum value of ping round trip delay of the line?
> > > >>what I wind up with by default is the network path with the shortest
> > RTT,
> > > >>and when the line starts dropping, we start spreading load. Suppose that
> > > >>BGP had a "round trip to target" attribute, and which contained either
> > > >>the mean traceroute delay to the network or the sum of the TCP RTTs of
> > > >>the BGP sessions going to the network, and that we had a verb that said
> > > >>"when presented with two or more potential routing paths, take the path
> > > >>with the shortest cumulative delay"?
> > > >>
> > > >>
> > > >>Several of these can be handled within the existing routing protocols,
> > > >>and would be options that might be worth exploring. Improving BGP
> > > >>convergence, I think, requires an improved protocol. There, I would
> > argue
> > > >>for the algorithms used in EIGRP, which are designed to improve
> > > >>convergence in a distance vector protocol. Suppose we used something
> > very
> > > >>EIGRP-ish - in a standard protocol - as a way to distribute routing
> > > >>information (nodes on the tree being AS's rather than routers, at least
> > > >>in non-IBGP cases), but basically distributing BGP routing
> > information? I
> > > >>think we could pretty readily make the case that convergence improves.
> > > >>
> > > >>Just a start. Maybe you have some notions.
> > >
> >
> >_____________________________
> >riw@cisco.com <>< Grace Alone
>

_____________________________
riw@cisco.com <>< Grace Alone



This archive was generated by hypermail 2b29 : Mon Aug 04 2003 - 04:10:04 EDT