[j-nsp] BGP route-reflection question

Thu May 29 20:20:45 EDT 2003

Wouldn't the solution be to use Lo0 to Lo0 peering, as you said?
Then you don't have to worry about the cluster-id problem in the
first place...

-c

On Fri, May 30, 2003 at 11:45:48AM +1000, 'Dmitri Kalintsev' wrote:
> Hi Martin,
> 
> I guess now I should go back to the issue that I had that prompted me with
> the good side of being able to disable to use of cluster-ids. Consider the
> following configuration:
> 
> RR1---RR2
>  \    /
>   \C1/
>    +----Important LAN---
> 
> Links RR1 - RR2 is POS, both links to C1 is GigE, and C1 is a L3 switch.
> Both RR1 and RR2 provide independent links to the rest of the network.
> 
> Now to the configs (sorry, but all configs in the example will be cisco):
> 
> RR1:
> ---
> int lo0
> ip add 10.0.0.1 255.255.255.255
> !
> int pos1/0
> ip add 1.1.1.1 255.255.255.252
> !
> int Gig2/0
> ip add 2.2.2.1 255.255.255.0 <= NOTE THE /24 mask
> no ip proxy-arp
> !
> router ospf 1
> network 1.1.1.0 0.0.0.255 area 0
> !
> router bgp 11111
> bgp cluster-id 10
> neigh 10.0.0.2 remote-as 11111
> neigh 10.0.0.2 upda Lo0
> 
> RR2:
> ---
> int lo0
> ip add 10.0.0.2 255.255.255.255
> !
> int pos1/0
> ip add 1.1.1.2 255.255.255.252
> !
> int Gig2/0
> ip add 2.2.2.2 255.255.255.0 <= NOTE THE /24 mask
> no ip proxy-arp
> !
> router ospf 1
> network 1.1.1.0 0.0.0.255 area 0
> !
> router bgp 11111
> bgp cluster-id 10
> neigh 10.0.0.1 remote-as 11111
> neigh 10.0.0.1 upda Lo0
> 
> 
> C1:
> ---
> int lo0
> ip add 10.0.0.3 255.255.255.255
> !
> int VLAN10
> ip add 2.2.2.3 255.255.255.0 <= NOTE THE /24 mask
> !
> int VLANx
> desc Important VLAN#1
> ip add <whatever>
> 
> Requirement:
> 
> C1 connects a few networks in quite an important LAN to the internet via
> RR1/RR2. iBGP is a requirement between the C1 and RR1/RR2 (there is another
> exit from this LAN, so static routing in from RR1/RR2 is not an option), and
> there is NO dynamic routing between RR1/RR2 and C1, nor it's a good idea to
> configure it (so presume - static routing ONLY plus iBGP). The C1 also talks
> to some other L3 switches in the Important_LAN via OSPF.
> 
> Yes, I know that this situation is a nasty stockpile of recipies for
> disaster, but that's what I had to deal with.
> 
> Um, one more complication: you can't change cluster-id's or delete them.
> I'll be very interested to see the elegant solution to this (although now
> obsolete, thanks God!) situation.
> 
> SY,
> --
> D.K.
> 
> On Thu, May 29, 2003 at 08:58:48PM -0400, Martin, Christian wrote:
> > Dmitri,
> > 
> > Two points that must be made clearer, in my view.
> > 
> > 1) Cluster id's are required to prevent looping in hierarchical RR designs,
> > regardless of whether or not the clients are originator_id-aware.  Since the
> > RRs will not match the originator_id's they will not be able to tell that
> > they have already reflected a route.  This would be analogous to saying "I
> > wish Juniper didn't require my ASN to be prepended to the AS PATH."  There
> > are reasons - good ones - based on well know DV mechanisms that require
> > enforcement of these rules.  Sure, if you hate Split Horizon you can disable
> > it in RIP, but you better have a love for loops!  Since IBGP has no path
> > state, is recursive to the endpoints, and is largely unaware of the
> > loopiness (or lack thereof) of the underlying transport, certain limitations
> > are imposed to ensure NLRI loop-freedom.  One is full mesh, or more
> > accurately, update non-transitivity.  Break this rule, and there is nothing
> > in IBGP that will tell a generic BGP speaker that propogated an external
> > route that it just learned that same route from an internal peer.  RRs
> > "bend" this rule, but impose new ones.  Break them at your own risk!
> > 
> > 2) Danny made a subtle, but very important point - one that those of us who
> > were overambitious with our intracluster RR redunancy goals where sad to
> > learn for ourselves.  The more reflection performed to the same client, the
> > more routes it must store/process/index/sort/match against policy/etc.  If
> > you have three RRs, then the major current implementations will store all
> > three of the same thing (with there being enough difference that it forces
> > BGP to package them differently, but doesn't affect the decision process all
> > that much).  This could mean 330,000 routes for a singly-connected, full BGP
> > feed.  With two full feeds, this number can double - and so on.  At what
> > point does your router run out of gas?  Do you want your router to have to
> > store, package, sort, index, scan, etc 220,000 routes or 660,000?  Which do
> > you think is easier?  How much redunancy do you need?
> > 
> > I say these things because I have lived them.  Direct iBGP sessions have
> > little utility compared to Lo0-Lo0 peerings.  If you have the latter, than 2
> > RRs with the same cluster id should be all you need for a router with degree
> > 2 (two uplinks).  Anything more provides more pain than pleasure...
> > 
> > Just my .02
> > 
> > -chris
> > 
> > PS
> > 
> > The guys who "invented" RR were pretty thorough in exploring most of these
> > issues.  In fact, with the exception of persistent oscillation (more a MED
> > prob than RR/confed), there are no known issues (outside of abstract,
> > loosely applied theory and misconfig/buggy code or load/processing
> > pathologies) that are known to cause loops or divergence of an iBGP network.
> > And its been a few years since the first RR draft was posted!  ;)
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Dmitri Kalintsev [mailto:dek at hades.uz] 
> > > Sent: Thursday, May 29, 2003 8:12 PM
> > > To: juniper-nsp at puck.nether.net
> > > Subject: Re: [j-nsp] BGP route-reflection question
> > > 
> > > 
> > > Hmm, this has turned out to be a somewhat 
> > > hotter-than-anticipated discussion, so I went to the source, 
> > > as any good Luke would. The RFC2796
> > > says:
> > > 
> > >    "In a simple configuration the backbone could be divided into many
> > >    clusters. Each RR would be configured with other RRs as 
> > > Non-Client peers
> > >    (thus all the RRs will be fully meshed.). The Clients will 
> > > be configured
> > >    to maintain IBGP session only with the RR in their 
> > > cluster. Due to route
> > >    reflection, all the IBGP speakers will receive reflected routing
> > >    information."
> > > 
> > > So, having a client talking to two RRs in different clusters 
> > > contradicts this RFC. We're back to the square one.
> > > 
> > > What I want to say is that in an ideal world I would have 
> > > appreciated the ability NOT to set the cluster ID, reverting 
> > > back to the originator-id loop detection mechanism. I think 
> > > that the network designer should be given the right to choose 
> > > his own poison, and feel that the way Juniper's config 
> > > imposes the use of cluster-ids when configuring an RR client 
> > > is a weeny bit pushy. ;^P
> > > 
> > > Just my 2c.
> > > --
> > > D.K.
> > > 
> > > On Thu, May 29, 2003 at 09:25:48AM +0100, Guy Davies wrote:
> > > >  
> > > > -----BEGIN PGP SIGNED MESSAGE-----
> > > > Hash: SHA1
> > > > 
> > > > Hi Dmitri,
> > > > 
> > > > I have to say that I don't necessarily *recommend* using different 
> > > > cluster IDs in the same cluster.  I merely said that it is 
> > > a means to 
> > > > achieving what you wanted.  I knew that Hannes specifically and 
> > > > possibly Juniper generally recommends doing this but I am 
> > > with Danny 
> > > > on this and personally recommend using the same cluster ID 
> > > and doing 
> > > > all iBGP from lo0 to lo0. IMHO, using different cluster IDs 
> > > wins you 
> > > > little in a well structured network and can cost you a lot (as 
> > > > described by Danny).
> > > > 
> > > > No offence intended Hannes :-)
> > > > 
> > > > Regards,
> > > > 
> > > > Guy
> > > > 
> > > > > -----Original Message-----
> > > > > From: Danny McPherson [mailto:danny at tcb.net]
> > > > > Sent: Thursday, May 29, 2003 1:05 AM
> > > > > To: juniper-nsp at puck.nether.net
> > > > > Subject: Re: [j-nsp] BGP route-reflection question
> > > > > 
> > > > > 
> > > > > On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek at hades.uz> wrote:
> > > > > 
> > > > > > P.S. I've noticed yesterday that the other vendor is 
> > > now also says 
> > > > > > that having more than one RR in the same cluster is "not 
> > > > > > recommended". *Sigh*, the world has changed, hasn't it? ;)
> > > > > 
> > > > > Folks should be careful here, I'm not sure that this is truly a 
> > > > > "recommended" design, per se, as it can effect lots of things 
> > > > > significantly. For example, less optimal BGP update packing and 
> > > > > subsequently, slower convergence & much higher CPU resource 
> > > > > utilization, etc...  In addition, it increases Adj-RIB-In 
> > > sizes [on 
> > > > > many boxes] and can have a significant impact on steady 
> > > state memory 
> > > > > utilization. Imagine multiple levels of reflection or 
> > > more than two 
> > > > > reflectors for a given cluster, etc..  The impact of 
> > > propagating and 
> > > > > maintaining redundant paths with slightly different attribute 
> > > > > pairings, especially in complex topologies, should be heavily 
> > > > > weighed.
> > > > > 
> > > > > What I'd _probably recommend is a common cluster_id for all RRs 
> > > > > withing a cluster, a full mesh of iBGP sessions between 
> > > clients and 
> > > > > loopback iBGP peering everywhere such that if the 
> > > client<->RR1 link 
> > > > > fails there's an alternative path for the BGP session via 
> > > RR2 (after 
> > > > > all, the connectivity is there anyway) and nothings disrupted.  
> > > > > There are lots of other variable to be considered as 
> > > well, but IMO, 
> > > > > simply using different cuslter_ids isn't a clean solution.
> > > > > 
> > > > > -danny
> > > ---end quoted text--- _______________________________________________
> > > juniper-nsp mailing list juniper-nsp at puck.nether.net 
> > > http://puck.nether.net/mailman/listinfo/junipe> r-nsp
> > > 
> ---end quoted text---
> 
> -- 
> D.K.
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> http://puck.nether.net/mailman/listinfo/juniper-nsp