<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>RE: [j-nsp] BGP route-reflection question</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Dmitri,</FONT>
</P>
<P><FONT SIZE=2>Two points that must be made clearer, in my view.</FONT>
</P>
<P><FONT SIZE=2>1) Cluster IDs are required to prevent looping in hierarchical RR designs, regardless of whether or not the clients are originator_id-aware. Since an RR's own router ID will never appear as the originator_id of a route it merely reflected, originator_id alone cannot tell it that it has already reflected that route. Objecting to this is analogous to saying "I wish Juniper didn't require my ASN to be prepended to the AS_PATH." There are reasons - good ones - based on well-known distance-vector mechanisms that require enforcement of these rules. Sure, if you hate split horizon you can disable it in RIP, but you had better have a love for loops! Since IBGP has no path state, is recursive to the endpoints, and is largely unaware of the loopiness (or lack thereof) of the underlying transport, certain limitations are imposed to ensure NLRI loop-freedom. One is the full mesh - or, more accurately, update non-transitivity. Break this rule and nothing in IBGP will tell a generic BGP speaker that the route it just learned from an internal peer is the same external route it propagated itself. RRs "bend" this rule, but impose new ones. Break them at your own risk!</FONT></P>
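<P><FONT SIZE=2>For illustration only - a rough Python sketch (mine, not anything lifted from JUNOS or from the RFC's text) of the two loop-prevention checks RFC 2796 relies on. The router IDs, cluster ID, prefix, and function names below are made-up examples:</FONT></P>
<PRE>
# Rough sketch of the two RR loop-prevention checks from RFC 2796.
# Attribute handling is simplified; all IDs and prefixes are invented.

def should_accept(route, my_router_id, my_cluster_id=None):
    """Checks an iBGP-learned route before it is considered at all."""
    # ORIGINATOR_ID: ignore a route that names us as the original injector.
    if route.get("originator_id") == my_router_id:
        return False
    # CLUSTER_LIST: an RR drops a route its own cluster has already reflected.
    if my_cluster_id is not None and my_cluster_id in route.get("cluster_list", []):
        return False
    return True

def reflect(route, learned_from_router_id, my_cluster_id):
    """What an RR does to the attributes before re-advertising a client route."""
    out = dict(route)
    # Set ORIGINATOR_ID to the injecting router's ID if no RR has set it yet.
    out.setdefault("originator_id", learned_from_router_id)
    # Prepend our cluster ID so other RRs can spot a reflection loop.
    out["cluster_list"] = [my_cluster_id] + list(route.get("cluster_list", []))
    return out

# A route already reflected by cluster 10.0.0.1 is rejected if it comes back:
r = reflect({"prefix": "192.0.2.0/24"}, "10.0.0.9", "10.0.0.1")
print(should_accept(r, "10.0.0.2", my_cluster_id="10.0.0.1"))  # False
</PRE>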
<P><FONT SIZE=2>2) Danny made a subtle but very important point - one that those of us who were overambitious with our intracluster RR redundancy goals were sad to learn for ourselves. The more reflection performed toward the same client, the more routes it must store/process/index/sort/match against policy/etc. If you have three RRs, the major current implementations will store three copies of the same thing (each just different enough to force BGP to package it separately, though with little effect on the decision process). That could mean 330,000 routes for a singly-connected router taking a full BGP feed. With two full feeds, the number can double - and so on. At what point does your router run out of gas? Do you want it to have to store, package, sort, index, scan, etc. 220,000 routes or 660,000? Which do you think is easier? How much redundancy do you need?</FONT></P>
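<P><FONT SIZE=2>For a back-of-the-envelope feel for that arithmetic, a tiny sketch (mine; the 110,000-routes-per-feed figure is only what the 330,000 number above implies, and paths_stored is a made-up helper, not anything a router actually computes):</FONT></P>
<PRE>
# Rough Adj-RIB-In growth per client, following the worst case described above:
# every reflector hands the client its own copy of every path.

def paths_stored(prefixes_per_feed, feeds, reflectors):
    return prefixes_per_feed * feeds * reflectors

print(paths_stored(110_000, feeds=1, reflectors=3))  # 330,000
print(paths_stored(110_000, feeds=2, reflectors=3))  # 660,000
print(paths_stored(110_000, feeds=1, reflectors=2))  # 220,000
</PRE>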
<P><FONT SIZE=2>I say these things because I have lived them. Direct (interface-addressed) iBGP sessions have little utility compared to lo0-to-lo0 peerings. If you have the latter, then two RRs with the same cluster ID should be all you need for a router with degree 2 (two uplinks). Anything more provides more pain than pleasure...</FONT></P>
<P><FONT SIZE=2>Just my .02</FONT>
</P>
<P><FONT SIZE=2>-chris</FONT>
</P>
<P><FONT SIZE=2>PS</FONT>
</P>
<P><FONT SIZE=2>The guys who "invented" RR were pretty thorough in exploring most of these issues. In fact, with the exception of persistent oscillation (more a MED problem than an RR/confed one), there are no known issues (outside of abstract, loosely applied theory and misconfig/buggy code or load/processing pathologies) that cause loops or divergence in an iBGP network. And it's been a few years since the first RR draft was posted! ;)</FONT></P>
<BR>
<BR>
<P><FONT SIZE=2>> -----Original Message-----</FONT>
<BR><FONT SIZE=2>> From: Dmitri Kalintsev [<A HREF="mailto:dek@hades.uz">mailto:dek@hades.uz</A>] </FONT>
<BR><FONT SIZE=2>> Sent: Thursday, May 29, 2003 8:12 PM</FONT>
<BR><FONT SIZE=2>> To: juniper-nsp@puck.nether.net</FONT>
<BR><FONT SIZE=2>> Subject: Re: [j-nsp] BGP route-reflection question</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Hmm, this has turned out to be a somewhat </FONT>
<BR><FONT SIZE=2>> hotter-than-anticipated discussion, so I went to the source, </FONT>
<BR><FONT SIZE=2>> as any good Luke would. The RFC2796</FONT>
<BR><FONT SIZE=2>> says:</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> "In a simple configuration the backbone could be divided into many</FONT>
<BR><FONT SIZE=2>> clusters. Each RR would be configured with other RRs as </FONT>
<BR><FONT SIZE=2>> Non-Client peers</FONT>
<BR><FONT SIZE=2>> (thus all the RRs will be fully meshed.). The Clients will </FONT>
<BR><FONT SIZE=2>> be configured</FONT>
<BR><FONT SIZE=2>> to maintain IBGP session only with the RR in their </FONT>
<BR><FONT SIZE=2>> cluster. Due to route</FONT>
<BR><FONT SIZE=2>> reflection, all the IBGP speakers will receive reflected routing</FONT>
<BR><FONT SIZE=2>> information."</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> So, having a client talking to two RRs in different clusters </FONT>
<BR><FONT SIZE=2>> contradicts this RFC. We're back to square one.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> What I want to say is that in an ideal world I would have </FONT>
<BR><FONT SIZE=2>> appreciated the ability NOT to set the cluster ID, reverting </FONT>
<BR><FONT SIZE=2>> back to the originator-id loop detection mechanism. I think </FONT>
<BR><FONT SIZE=2>> that the network designer should be given the right to choose </FONT>
<BR><FONT SIZE=2>> his own poison, and feel that the way Juniper's config </FONT>
<BR><FONT SIZE=2>> imposes the use of cluster-ids when configuring an RR client </FONT>
<BR><FONT SIZE=2>> is a weeny bit pushy. ;^P</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Just my 2c.</FONT>
<BR><FONT SIZE=2>> --</FONT>
<BR><FONT SIZE=2>> D.K.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> On Thu, May 29, 2003 at 09:25:48AM +0100, Guy Davies wrote:</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > -----BEGIN PGP SIGNED MESSAGE-----</FONT>
<BR><FONT SIZE=2>> > Hash: SHA1</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Hi Dmitri,</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > I have to say that I don't necessarily *recommend* using different </FONT>
<BR><FONT SIZE=2>> > cluster IDs in the same cluster. I merely said that it is </FONT>
<BR><FONT SIZE=2>> a means to </FONT>
<BR><FONT SIZE=2>> > achieving what you wanted. I knew that Hannes specifically and </FONT>
<BR><FONT SIZE=2>> > possibly Juniper generally recommends doing this but I am </FONT>
<BR><FONT SIZE=2>> with Danny </FONT>
<BR><FONT SIZE=2>> > on this and personally recommend using the same cluster ID </FONT>
<BR><FONT SIZE=2>> and doing </FONT>
<BR><FONT SIZE=2>> > all iBGP from lo0 to lo0. IMHO, using different cluster IDs </FONT>
<BR><FONT SIZE=2>> wins you </FONT>
<BR><FONT SIZE=2>> > little in a well structured network and can cost you a lot (as </FONT>
<BR><FONT SIZE=2>> > described by Danny).</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > No offence intended Hannes :-)</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Regards,</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > Guy</FONT>
<BR><FONT SIZE=2>> > </FONT>
<BR><FONT SIZE=2>> > > -----Original Message-----</FONT>
<BR><FONT SIZE=2>> > > From: Danny McPherson [<A HREF="mailto:danny@tcb.net">mailto:danny@tcb.net</A>]</FONT>
<BR><FONT SIZE=2>> > > Sent: Thursday, May 29, 2003 1:05 AM</FONT>
<BR><FONT SIZE=2>> > > To: juniper-nsp@puck.nether.net</FONT>
<BR><FONT SIZE=2>> > > Subject: Re: [j-nsp] BGP route-reflection question</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek@hades.uz> wrote:</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > > P.S. I noticed yesterday that the other vendor now </FONT>
<BR><FONT SIZE=2>> also says </FONT>
<BR><FONT SIZE=2>> > > > that having more than one RR in the same cluster is "not </FONT>
<BR><FONT SIZE=2>> > > > recommended". *Sigh*, the world has changed, hasn't it? ;)</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > Folks should be careful here, I'm not sure that this is truly a </FONT>
<BR><FONT SIZE=2>> > > "recommended" design, per se, as it can effect lots of things </FONT>
<BR><FONT SIZE=2>> > > significantly. For example, less optimal BGP update packing and </FONT>
<BR><FONT SIZE=2>> > > subsequently, slower convergence & much higher CPU resource </FONT>
<BR><FONT SIZE=2>> > > utilization, etc... In addition, it increases Adj-RIB-In </FONT>
<BR><FONT SIZE=2>> sizes [on </FONT>
<BR><FONT SIZE=2>> > > many boxes] and can have a significant impact on steady </FONT>
<BR><FONT SIZE=2>> state memory </FONT>
<BR><FONT SIZE=2>> > > utilization. Imagine multiple levels of reflection or </FONT>
<BR><FONT SIZE=2>> more than two </FONT>
<BR><FONT SIZE=2>> > > reflectors for a given cluster, etc.. The impact of </FONT>
<BR><FONT SIZE=2>> propagating and </FONT>
<BR><FONT SIZE=2>> > > maintaining redundant paths with slightly different attribute </FONT>
<BR><FONT SIZE=2>> > > pairings, especially in complex topologies, should be heavily </FONT>
<BR><FONT SIZE=2>> > > weighed.</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > What I'd _probably recommend is a common cluster_id for all RRs </FONT>
<BR><FONT SIZE=2>> > > withing a cluster, a full mesh of iBGP sessions between </FONT>
<BR><FONT SIZE=2>> clients and </FONT>
<BR><FONT SIZE=2>> > > loopback iBGP peering everywhere such that if the </FONT>
<BR><FONT SIZE=2>> client<->RR1 link </FONT>
<BR><FONT SIZE=2>> > > fails there's an alternative path for the BGP session via </FONT>
<BR><FONT SIZE=2>> RR2 (after </FONT>
<BR><FONT SIZE=2>> > > all, the connectivity is there anyway) and nothing's disrupted. </FONT>
<BR><FONT SIZE=2>> > > There are lots of other variables to be considered as </FONT>
<BR><FONT SIZE=2>> well, but IMO, </FONT>
<BR><FONT SIZE=2>> > > simply using different cluster_ids isn't a clean solution.</FONT>
<BR><FONT SIZE=2>> > > </FONT>
<BR><FONT SIZE=2>> > > -danny</FONT>
<BR><FONT SIZE=2>> ---end quoted text--- _______________________________________________</FONT>
<BR><FONT SIZE=2>> juniper-nsp mailing list juniper-nsp@puck.nether.net </FONT>
<BR><FONT SIZE=2>> <A HREF="http://puck.nether.net/mailman/listinfo/juniper-nsp" TARGET="_blank">http://puck.nether.net/mailman/listinfo/juniper-nsp</A></FONT>
<BR><FONT SIZE=2>> </FONT>
</P>
</BODY>
</HTML>