[j-nsp] BGP route-reflection question

'Dmitri Kalintsev' dek at hades.uz
Fri May 30 12:45:48 EDT 2003


Hi Martin,

I guess now I should go back to the issue that prompted me to appreciate
the good side of being able to disable the use of cluster-ids. Consider the
following configuration:

RR1---RR2
 \    /
  \C1/
   +----Important LAN---

The RR1 - RR2 link is POS, both links to C1 are GigE, and C1 is an L3 switch.
Both RR1 and RR2 provide independent links to the rest of the network.

Now to the configs (sorry, but all configs in the example will be cisco):

RR1:
---
int lo0
ip add 10.0.0.1 255.255.255.255
!
int pos1/0
ip add 1.1.1.1 255.255.255.252
!
int Gig2/0
ip add 2.2.2.1 255.255.255.0 <= NOTE THE /24 mask
no ip proxy-arp
!
router ospf 1
network 1.1.1.0 0.0.0.255 area 0
network 10.0.0.1 0.0.0.0 area 0 <= lo0 must be in the IGP for the lo0-lo0 session
!
router bgp 11111
bgp cluster-id 10
neighbor 10.0.0.2 remote-as 11111
neighbor 10.0.0.2 update-source Loopback0

RR2:
---
int lo0
ip add 10.0.0.2 255.255.255.255
!
int pos1/0
ip add 1.1.1.2 255.255.255.252
!
int Gig2/0
ip add 2.2.2.2 255.255.255.0 <= NOTE THE /24 mask
no ip proxy-arp
!
router ospf 1
network 1.1.1.0 0.0.0.255 area 0
network 10.0.0.2 0.0.0.0 area 0 <= lo0 must be in the IGP for the lo0-lo0 session
!
router bgp 11111
bgp cluster-id 10
neighbor 10.0.0.1 remote-as 11111
neighbor 10.0.0.1 update-source Loopback0


C1:
---
int lo0
ip add 10.0.0.3 255.255.255.255
!
int VLAN10
ip add 2.2.2.3 255.255.255.0 <= NOTE THE /24 mask
!
int VLANx
desc Important VLAN#1
ip add <whatever>
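
Since this is a Juniper list, here's roughly how the RR1 side might look in
Junos, with the client session to C1 included (a sketch only: the group
names are mine, and "cluster 0.0.0.10" simply mirrors "bgp cluster-id 10"):

RR1 (Junos sketch):
---
protocols {
    bgp {
        group rr-clients {
            type internal;
            local-address 10.0.0.1;
            cluster 0.0.0.10;    /* Junos insists on a cluster ID to make these peers clients */
            neighbor 10.0.0.3;   /* C1 */
        }
        group rr-mesh {
            type internal;
            local-address 10.0.0.1;
            neighbor 10.0.0.2;   /* RR2, a non-client peer */
        }
    }
}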

Requirement:

C1 connects a few networks in quite an important LAN to the internet via
RR1/RR2. iBGP is a requirement between the C1 and RR1/RR2 (there is another
exit from this LAN, so static routing in from RR1/RR2 is not an option), and
there is NO dynamic routing between RR1/RR2 and C1, nor is it a good idea to
configure it (so presume - static routing ONLY plus iBGP). The C1 also talks
to some other L3 switches in the Important_LAN via OSPF.
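
For the static-routing-plus-iBGP part, I'd presume something like this on C1
(a sketch; the matching neighbor statements on RR1/RR2 aren't shown above):

C1 (sketch):
---
ip route 10.0.0.1 255.255.255.255 2.2.2.1 <= static reachability to the RR lo0s
ip route 10.0.0.2 255.255.255.255 2.2.2.2
!
router bgp 11111
neighbor 10.0.0.1 remote-as 11111
neighbor 10.0.0.1 update-source Loopback0
neighbor 10.0.0.2 remote-as 11111
neighbor 10.0.0.2 update-source Loopback0

...and on each RR a static back to C1's loopback, e.g. on RR1:

ip route 10.0.0.3 255.255.255.255 2.2.2.3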

Yes, I know that this situation is a nasty stockpile of recipes for
disaster, but that's what I had to deal with.

Um, one more complication: you can't change cluster-id's or delete them.
I'll be very interested to see an elegant solution to this (although now
obsolete, thank God!) situation.

SY,
--
D.K.

On Thu, May 29, 2003 at 08:58:48PM -0400, Martin, Christian wrote:
> Dmitri,
> 
> Two points that must be made clearer, in my view.
> 
> 1) Cluster id's are required to prevent looping in hierarchical RR designs,
> regardless of whether or not the clients are originator_id-aware.  Since the
> RRs will not match the originator_id's they will not be able to tell that
> they have already reflected a route.  This would be analogous to saying "I
> wish Juniper didn't require my ASN to be prepended to the AS PATH."  There
> are reasons - good ones - based on well-known DV mechanisms that require
> enforcement of these rules.  Sure, if you hate Split Horizon you can disable
> it in RIP, but you better have a love for loops!  Since IBGP has no path
> state, is recursive to the endpoints, and is largely unaware of the
> loopiness (or lack thereof) of the underlying transport, certain limitations
> are imposed to ensure NLRI loop-freedom.  One is full mesh, or more
> accurately, update non-transitivity.  Break this rule, and there is nothing
> in IBGP that will tell a generic BGP speaker that propagated an external
> route that it just learned that same route from an internal peer.  RRs
> "bend" this rule, but impose new ones.  Break them at your own risk!
> 
> 2) Danny made a subtle, but very important point - one that those of us who
> were overambitious with our intracluster RR redundancy goals were sad to
> learn for ourselves.  The more reflection performed to the same client, the
> more routes it must store/process/index/sort/match against policy/etc.  If
> you have three RRs, then the major current implementations will store all
> three of the same thing (with there being enough difference that it forces
> BGP to package them differently, but doesn't affect the decision process all
> that much).  This could mean 330,000 routes for a singly-connected, full BGP
> feed.  With two full feeds, this number can double - and so on.  At what
> point does your router run out of gas?  Do you want your router to have to
> store, package, sort, index, scan, etc 220,000 routes or 660,000?  Which do
> you think is easier?  How much redundancy do you need?
> 
> I say these things because I have lived them.  Direct iBGP sessions have
> little utility compared to Lo0-Lo0 peerings.  If you have the latter, then 2
> RRs with the same cluster id should be all you need for a router with degree
> 2 (two uplinks).  Anything more provides more pain than pleasure...
> 
> Just my .02
> 
> -chris
> 
> PS
> 
> The guys who "invented" RR were pretty thorough in exploring most of these
> issues.  In fact, with the exception of persistent oscillation (more a MED
> prob than RR/confed), there are no known issues (outside of abstract,
> loosely applied theory and misconfig/buggy code or load/processing
> pathologies) that are known to cause loops or divergence of an iBGP network.
> And it's been a few years since the first RR draft was posted!  ;)
> 
> 
> 
> > -----Original Message-----
> > From: Dmitri Kalintsev [mailto:dek at hades.uz] 
> > Sent: Thursday, May 29, 2003 8:12 PM
> > To: juniper-nsp at puck.nether.net
> > Subject: Re: [j-nsp] BGP route-reflection question
> > 
> > 
> > Hmm, this has turned out to be a somewhat 
> > hotter-than-anticipated discussion, so I went to the source, 
> > as any good Luke would. RFC 2796 says:
> > 
> >    "In a simple configuration the backbone could be divided into many
> >    clusters. Each RR would be configured with other RRs as Non-Client
> >    peers (thus all the RRs will be fully meshed.). The Clients will be
> >    configured to maintain IBGP session only with the RR in their
> >    cluster. Due to route reflection, all the IBGP speakers will receive
> >    reflected routing information."
> > 
> > So, having a client talking to two RRs in different clusters 
> > contradicts this RFC. We're back to square one.
> > 
> > What I want to say is that in an ideal world I would have 
> > appreciated the ability NOT to set the cluster ID, reverting 
> > back to the originator-id loop detection mechanism. I think 
> > that the network designer should be given the right to choose 
> > his own poison, and feel that the way Juniper's config 
> > imposes the use of cluster-ids when configuring an RR client 
> > is a weeny bit pushy. ;^P
> > 
> > Just my 2c.
> > --
> > D.K.
> > 
> > On Thu, May 29, 2003 at 09:25:48AM +0100, Guy Davies wrote:
> > >  
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > > 
> > > Hi Dmitri,
> > > 
> > > I have to say that I don't necessarily *recommend* using different
> > > cluster IDs in the same cluster.  I merely said that it is a means to
> > > achieving what you wanted.  I knew that Hannes specifically and
> > > possibly Juniper generally recommends doing this but I am with Danny
> > > on this and personally recommend using the same cluster ID and doing
> > > all iBGP from lo0 to lo0. IMHO, using different cluster IDs wins you
> > > little in a well structured network and can cost you a lot (as
> > > described by Danny).
> > > 
> > > No offence intended Hannes :-)
> > > 
> > > Regards,
> > > 
> > > Guy
> > > 
> > > > -----Original Message-----
> > > > From: Danny McPherson [mailto:danny at tcb.net]
> > > > Sent: Thursday, May 29, 2003 1:05 AM
> > > > To: juniper-nsp at puck.nether.net
> > > > Subject: Re: [j-nsp] BGP route-reflection question
> > > > 
> > > > 
> > > > On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek at hades.uz> wrote:
> > > > 
> > > > > P.S. I noticed yesterday that the other vendor now also says
> > > > > that having more than one RR in the same cluster is "not
> > > > > recommended". *Sigh*, the world has changed, hasn't it? ;)
> > > > 
> > > > Folks should be careful here, I'm not sure that this is truly a
> > > > "recommended" design, per se, as it can affect lots of things
> > > > significantly. For example, less optimal BGP update packing and
> > > > subsequently, slower convergence & much higher CPU resource
> > > > utilization, etc...  In addition, it increases Adj-RIB-In sizes [on
> > > > many boxes] and can have a significant impact on steady state memory
> > > > utilization. Imagine multiple levels of reflection or more than two
> > > > reflectors for a given cluster, etc..  The impact of propagating and
> > > > maintaining redundant paths with slightly different attribute
> > > > pairings, especially in complex topologies, should be heavily
> > > > weighed.
> > > > 
> > > > What I'd _probably_ recommend is a common cluster_id for all RRs
> > > > within a cluster, a full mesh of iBGP sessions between clients and
> > > > loopback iBGP peering everywhere such that if the client<->RR1 link
> > > > fails there's an alternative path for the BGP session via RR2 (after
> > > > all, the connectivity is there anyway) and nothing's disrupted.
> > > > There are lots of other variables to be considered as well, but IMO,
> > > > simply using different cluster_ids isn't a clean solution.
> > > > 
> > > > -danny
> > ---end quoted text---
> > _______________________________________________
> > juniper-nsp mailing list
> > juniper-nsp at puck.nether.net
> > http://puck.nether.net/mailman/listinfo/juniper-nsp
> > 
---end quoted text---

-- 
D.K.

