[j-nsp] BGP route-reflection question

Thu May 29 21:58:48 EDT 2003

Dmitri,

Two points that must be made clearer, in my view.

1) Cluster IDs are required to prevent looping in hierarchical RR designs,
regardless of whether or not the clients are originator_id-aware.  Since the
originator_id identifies the router that originated the route, not the
reflectors it passed through, the RRs will not match it and will not be able
to tell that they have already reflected a route.  Wishing cluster IDs away
is analogous to saying "I wish Juniper didn't require my ASN to be prepended
to the AS_PATH."  There are reasons - good ones - based on well-known DV
mechanisms that require enforcement of these rules.  Sure, if you hate Split
Horizon you can disable it in RIP, but you had better have a love for loops!
Since IBGP has no path state, is recursive to the endpoints, and is largely
unaware of the loopiness (or lack thereof) of the underlying transport,
certain limitations are imposed to ensure NLRI loop-freedom.  One is full
mesh, or more accurately, update non-transitivity.  Break this rule, and
nothing in IBGP will tell a generic BGP speaker that the route it just
learned from an internal peer is the same external route it propagated in
the first place.  RRs "bend" this rule, but impose new ones.  Break them at
your own risk!

2) Danny made a subtle but very important point - one that those of us who
were overambitious with our intracluster RR redundancy goals were sad to
learn for ourselves.  The more reflection performed to the same client, the
more routes it must store/process/index/sort/match against policy/etc.  If
you have three RRs, then the major current implementations will store all
three copies of the same route (with enough difference between them to force
BGP to package them differently, but not enough to affect the decision
process all that much).  This could mean 330,000 routes for a
singly-connected, full BGP feed.  With two full feeds, this number can
double - and so on.  At what point does your router run out of gas?  Do you
want your router to have to store, package, sort, index, scan, etc. 220,000
routes or 660,000?  Which do you think is easier?  How much redundancy do
you need?
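
The arithmetic behind those figures can be sketched in a few lines of Python.
The 110,000 routes per full feed is an assumption inferred from the numbers
above (330,000 / 3 RRs), and the simple product assumes every reflector
passes every feed's paths on to the client; real tables will vary.

```python
# Assumption: size of one full BGP feed, inferred as 330,000 / 3 reflectors.
ROUTES_PER_FEED = 110_000


def paths_stored(reflectors: int, full_feeds: int,
                 routes_per_feed: int = ROUTES_PER_FEED) -> int:
    """Paths a client holds if each RR reflects each feed's paths to it."""
    return reflectors * full_feeds * routes_per_feed


for reflectors in (2, 3):
    for feeds in (1, 2):
        total = paths_stored(reflectors, feeds)
        print(f"{reflectors} RRs, {feeds} feed(s): {total:,} paths")
```

Under these assumptions, two RRs with one feed give the 220,000 figure and
three RRs with two feeds give 660,000.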

I say these things because I have lived them.  Direct iBGP sessions have
little utility compared to Lo0-Lo0 peerings.  If you have the latter, then
two RRs with the same cluster ID should be all you need for a router with
degree 2 (two uplinks).  Anything more provides more pain than pleasure...
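
A rough Python sketch of why Lo0-Lo0 peering matters for a degree-2 client:
it treats the network as a graph and asks whether the client can still reach
RR1's loopback after its direct link to RR1 fails.  The three-node topology
(including the RR1-RR2 link) and the node names are hypothetical, and the
sketch assumes the IGP reroutes over any surviving path.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over undirected links; True if dst is reachable."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical topology: a client with two uplinks, one to each RR.
links = {("client", "rr1"), ("client", "rr2"), ("rr1", "rr2")}
after_failure = links - {("client", "rr1")}

# A loopback-to-loopback session survives if any path to rr1 remains.
loopback_session_up = reachable(after_failure, "client", "rr1")
# A session bound to the direct link's interface addresses dies with the link.
direct_session_up = ("client", "rr1") in after_failure
print(loopback_session_up, direct_session_up)
```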

Just my .02

-chris

PS

The guys who "invented" RR were pretty thorough in exploring most of these
issues.  In fact, with the exception of persistent oscillation (more a MED
problem than RR/confed), there are no issues (outside of abstract, loosely
applied theory and misconfig/buggy code or load/processing pathologies)
known to cause loops or divergence of an iBGP network.  And it's been a few
years since the first RR draft was posted!  ;)

> -----Original Message-----
> From: Dmitri Kalintsev [mailto:dek at hades.uz] 
> Sent: Thursday, May 29, 2003 8:12 PM
> To: juniper-nsp at puck.nether.net
> Subject: Re: [j-nsp] BGP route-reflection question
> 
> 
> Hmm, this has turned out to be a somewhat hotter-than-anticipated
> discussion, so I went to the source, as any good Luke would. RFC 2796
> says:
>
>    "In a simple configuration the backbone could be divided into many
>    clusters. Each RR would be configured with other RRs as Non-Client peers
>    (thus all the RRs will be fully meshed.). The Clients will be configured
>    to maintain IBGP session only with the RR in their cluster. Due to route
>    reflection, all the IBGP speakers will receive reflected routing
>    information."
>
> So, having a client talking to two RRs in different clusters contradicts
> this RFC. We're back to square one.
>
> What I want to say is that in an ideal world I would have appreciated the
> ability NOT to set the cluster ID, reverting back to the originator-id
> loop detection mechanism. I think that the network designer should be
> given the right to choose his own poison, and feel that the way Juniper's
> config imposes the use of cluster-ids when configuring an RR client is a
> weeny bit pushy. ;^P
> 
> Just my 2c.
> --
> D.K.
> 
> On Thu, May 29, 2003 at 09:25:48AM +0100, Guy Davies wrote:
> >  
> > Hi Dmitri,
> > 
> > I have to say that I don't necessarily *recommend* using different
> > cluster IDs in the same cluster.  I merely said that it is a means to
> > achieving what you wanted.  I knew that Hannes specifically and
> > possibly Juniper generally recommend doing this, but I am with Danny
> > on this and personally recommend using the same cluster ID and doing
> > all iBGP from lo0 to lo0. IMHO, using different cluster IDs wins you
> > little in a well structured network and can cost you a lot (as
> > described by Danny).
> > 
> > No offence intended Hannes :-)
> > 
> > Regards,
> > 
> > Guy
> > 
> > > -----Original Message-----
> > > From: Danny McPherson [mailto:danny at tcb.net]
> > > Sent: Thursday, May 29, 2003 1:05 AM
> > > To: juniper-nsp at puck.nether.net
> > > Subject: Re: [j-nsp] BGP route-reflection question
> > > 
> > > 
> > > On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek at hades.uz> wrote:
> > > 
> > > > P.S. I noticed yesterday that the other vendor now also says
> > > > that having more than one RR in the same cluster is "not
> > > > recommended". *Sigh*, the world has changed, hasn't it? ;)
> > > 
> > > Folks should be careful here, I'm not sure that this is truly a
> > > "recommended" design, per se, as it can affect lots of things
> > > significantly. For example, less optimal BGP update packing and
> > > subsequently, slower convergence & much higher CPU resource
> > > utilization, etc...  In addition, it increases Adj-RIB-In sizes [on
> > > many boxes] and can have a significant impact on steady state memory
> > > utilization. Imagine multiple levels of reflection or more than two
> > > reflectors for a given cluster, etc..  The impact of propagating and
> > > maintaining redundant paths with slightly different attribute
> > > pairings, especially in complex topologies, should be heavily
> > > weighed.
> > > 
> > > What I'd _probably_ recommend is a common cluster_id for all RRs
> > > within a cluster, a full mesh of iBGP sessions between clients and
> > > loopback iBGP peering everywhere such that if the client<->RR1 link
> > > fails there's an alternative path for the BGP session via RR2 (after
> > > all, the connectivity is there anyway) and nothing's disrupted.
> > > There are lots of other variables to be considered as well, but IMO,
> > > simply using different cluster_ids isn't a clean solution.
> > > 
> > > -danny