[c-nsp] Route Reflector Design

Wed Jul 2 09:50:47 EDT 2008

I'm going to add some elaborations and alternatives in between your 
excellent comments, if you don't mind...

Jeff Aitken wrote:

> A common, but by no means the only, strategy is as follows:
> 
> 1. All routers participate in a single, flat IGP.  The only routes carried
>    in the IGP are loopbacks and links between routers.  All other routes are
>    carried in BGP.  This keeps things simple and promotes fast convergence.

Lesson learned: if you can, put all of the loopbacks into one 
aggregatable range, and all of the inter-router links into another 
aggregatable range.  That makes rACLs (receive ACLs) much easier when 
you deploy them (tomorrow).
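
To illustrate why the aggregatable range helps, here is a rough Python 
sketch that counts how many prefixes an ACL would need in each case.  The 
addresses, the 192.0.2.0/24 block, and the acl_prefixes() helper are all 
made up for the example; they are not from anyone's real network.

```python
import ipaddress

def acl_prefixes(loopbacks, aggregate=None):
    """Return the prefixes an ACL would need to match every loopback.

    `aggregate` is a hypothetical block reserved for loopbacks; if every
    loopback falls inside it, a single entry is enough.
    """
    hosts = [ipaddress.ip_network(addr) for addr in loopbacks]  # each is a /32
    if aggregate is not None:
        block = ipaddress.ip_network(aggregate)
        if all(host.subnet_of(block) for host in hosts):
            return [block]
    # Otherwise the best we can do is merge whatever happens to be adjacent.
    return list(ipaddress.collapse_addresses(hosts))

# Loopbacks numbered from wherever space was free: one ACL line per router.
scattered = acl_prefixes(["10.1.4.7", "172.16.9.1", "192.168.3.3"])

# Loopbacks carved from one block: one ACL line, and new routers need no change.
planned = acl_prefixes(["192.0.2.1", "192.0.2.17", "192.0.2.200"],
                       aggregate="192.0.2.0/24")
```

In the scattered case the ACL grows with every router you add; in the 
planned case it stays at one line.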

IGP metric design can take many shapes.  Planning your metrics early can 
make for excellent stability in the face of issues and outages, and can 
keep leased line costs low.
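
As a toy illustration of metrics steering traffic, the sketch below runs 
a plain shortest-path calculation (the same idea an SPF-based IGP uses) 
over a three-router triangle.  The router names, the metric values, and 
the notion that the A-B link is the expensive leased line are assumptions 
for the example only.

```python
import heapq

def shortest_path(links, src, dst):
    """Dijkstra over bidirectional links; returns (total_metric, path) or None."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, []).append((b, metric))
        graph.setdefault(b, []).append((a, metric))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None

# Uniform metrics: traffic from A to B takes the direct (expensive) link.
uniform = {("A", "B"): 10, ("A", "C"): 10, ("C", "B"): 10}
direct = shortest_path(uniform, "A", "B")

# Raise the metric on the leased line: it becomes a backup path only.
tuned = {("A", "B"): 100, ("A", "C"): 10, ("C", "B"): 10}
detour = shortest_path(tuned, "A", "B")
```

With the tuned metrics the A-B link still carries traffic if A-C or C-B 
fails, but stays idle in the normal case.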

> 3. All lower-level routers in a "region" are client peers of the cores 
>    that serve that region (where 'region' could mean POP, city, country,
>    etc., depending on your network).

We took this a step further, for future-proofing, courtesy of guidance 
from AOL/ATDN and their excellent NANOG presentation on migrating from 
OSPF to IS-IS.  All of the lower-level routers are client peers of the 
cores, and are also fully meshed with each other within the region; the 
cores do NOT reflect routes from client to client, because the clients 
already hear each other directly.  This helps quench MED oscillation 
issues.
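
A small sketch of the reflection decision, as I understand the standard 
route-reflector rules: a route from a non-client iBGP peer goes to 
clients only, and a route from a client goes to non-clients and, unless 
client-to-client reflection is turned off, to the other clients.  The 
peer names and the reflect_targets() function are hypothetical; this 
models the decision only, not any vendor's implementation.

```python
def reflect_targets(peers, source, client_to_client=True):
    """Return the iBGP peers a reflector would send a route to.

    peers: dict mapping peer name -> "client" or "nonclient"
    source: name of the iBGP peer the route was learned from
    client_to_client: False models a reflector with client-to-client
        reflection disabled (clients are fully meshed among themselves)
    """
    source_kind = peers[source]
    targets = []
    for name, kind in peers.items():
        if name == source:
            continue  # never send a route back to the peer it came from
        if source_kind == "nonclient":
            if kind == "client":
                targets.append(name)
        else:  # learned from a client
            if kind == "nonclient" or client_to_client:
                targets.append(name)
    return sorted(targets)

# Hypothetical view from one core: two regional clients and one other core.
peers = {"edge1": "client", "edge2": "client", "core2": "nonclient"}

default_behavior = reflect_targets(peers, "edge1")
meshed_clients = reflect_targets(peers, "edge1", client_to_client=False)
from_other_core = reflect_targets(peers, "core2", client_to_client=False)
```

With reflection disabled, edge2 learns edge1's routes only over its 
direct session with edge1, never a second copy via the core.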

> 4. All routes advertised via BGP have their next-hop reset where they
>    enter the network.  Typically this is on the edge routers, which are
>    client peers of the local core routers, but can be done anywhere.  The
>    end result is that no matter where on the network you stand, every BGP
>    route has a next-hop address that corresponds to a router loopback that
>    you know how to reach via your IGP.

It's simplest to reset ALL routes, but you might want to look at doing 
it on MOST, leaving a hook to exclude some special-case routes such as 
blackhole routes.
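
The "reset MOST, with a hook" idea can be sketched like this.  The 
community value 65000:666, the Route class, and apply_ingress_policy() 
are all invented for illustration; on a real router this would be a 
route-map or policy statement rather than Python.

```python
from dataclasses import dataclass, replace

# Hypothetical community used to tag blackhole routes in this example.
BLACKHOLE_COMMUNITY = "65000:666"

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    communities: frozenset = frozenset()

def apply_ingress_policy(route, loopback):
    """Reset the next-hop to this router's loopback, except for tagged routes."""
    if BLACKHOLE_COMMUNITY in route.communities:
        # The hook: blackhole routes keep their special next-hop.
        return route
    return replace(route, next_hop=loopback)

# Addresses below are documentation-range examples, not real ones.
customer = Route("203.0.113.0/24", "198.51.100.9")
blackhole = Route("203.0.113.66/32", "192.0.2.254",
                  frozenset({BLACKHOLE_COMMUNITY}))

rewritten = apply_ingress_policy(customer, loopback="192.0.2.1")
untouched = apply_ingress_policy(blackhole, loopback="192.0.2.1")
```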

You can also avoid the next-hop rewrite as long as the link containing 
the next hop is carried in BGP (or in your IGP, though that is not 
recommended).  I haven't proven it, but my theory is that NOT rewriting 
the next-hop allows MPLS (if you're running it) to label-switch packets 
all the way to the egress interface.  A rewritten next-hop would invoke 
PHP (penultimate hop popping) at the next-to-edge router, and the edge 
router would then have to do a FIB lookup.  Am I wrong?  Possibly.  
Would there be a benefit?  I think so.

pt