[c-nsp] IOS-XR BGP RR MCID (Multiple Cluster ID)

adamv0025 at netconsultings.com
Mon Mar 12 10:19:51 EDT 2018


In the iBGP infrastructures I have used or built, the use of common/unique cluster IDs does not save any memory and serves solely to prevent an RR from learning its own advertisements back from the network.
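
A minimal IOS-XR sketch of that single job (ASN and address are made up;
the explicit cluster-id line only restates the default, which is the
router-id):

router bgp 64512
 bgp router-id 192.0.2.1    ! Loopback0 of this RR
 bgp cluster-id 192.0.2.1   ! unique per RR: an update arriving with
                            ! 192.0.2.1 already in its CLUSTER_LIST is this
                            ! RR's own advertisement and is dropped on ingress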

adam

netconsultings.com
::carrier-class solutions for the telecommunications industry::

> -----Original Message-----
> From: Saku Ytti [mailto:saku at ytti.fi]
> Sent: Monday, March 12, 2018 1:06 PM
> To: adamv0025 at netconsultings.com
> Cc: Job Snijders; Mark Tinka; Cisco Network Service Providers
> Subject: Re: [c-nsp] IOS-XR BGP RR MCID (Multiple Cluster ID)
> 
> Routing loop to me sounds like an operational problem, that things are broken.
> That will not happen. Otherwise we're saying every network has routing
> loops, because if you consider all the RIBs in every box, there are tons of
> loops. I think we all agree most networks are loop free :>
> 
> You are saving DRAM, that's it.
> 
> In your case you're not even saving DRAM, as the cluster doesn't peer with
> itself, so for you it's just additional complexity with no upside.
> It's harder to generate config, as you need to teach some system the relation
> of which node is in which cluster. Whereas cluster-id==loop0 is 0-knowledge.
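> 
> ("0-knowledge" quite literally: per RFC 4456 an RR with no explicit
> cluster-id uses its own BGP router-id, i.e. the loopback. Sketch with a
> made-up ASN and cluster value:
> 
> router bgp 64512
>  ! no "bgp cluster-id" configured: cluster ID == router-id == Loopback0
> 
> versus the shared-cluster design, where your tooling must know the
> RR-to-cluster mapping:
> 
> router bgp 64512
>  bgp cluster-id 10.0.0.100   ! must match the other RR(s) of the cluster)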
> 
> 
> On 12 March 2018 at 14:42,  <adamv0025 at netconsultings.com> wrote:
> > Ok, I agree: if a speaker is not connected to both (all) RRs in a
> > cluster, then you need to make up for that by connecting the RRs to
> > each other.
> >
> > Well, isn't avoiding routing loops ultimately saving DRAM?
> > I'd argue the cluster-id comparison is either about preventing
> > acceptance of one's own advertisement (RRs talking in a circle) or
> > about preventing learning clients' routes from a different RR (your
> > diagram) - hence preventing routing loops and saving DRAM.
> >
> > adam
> >
> > netconsultings.com
> > ::carrier-class solutions for the telecommunications industry::
> >
> >> -----Original Message-----
> >> From: Saku Ytti [mailto:saku at ytti.fi]
> >> Sent: Monday, March 12, 2018 11:54 AM
> >> To: adamv0025 at netconsultings.com
> >> Cc: Job Snijders; Mark Tinka; Cisco Network Service Providers
> >> Subject: Re: [c-nsp] IOS-XR BGP RR MCID (Multiple Cluster ID)
> >>
> >> On 12 March 2018 at 13:41,  <adamv0025 at netconsultings.com> wrote:
> >>
> >> The typical reason for RR1 and RR2 to have iBGP to each other is when
> >> they are in the forwarding path and are not dedicated RRs, but also
> >> have external BGP on them.
> >>
> >> And no, clusterIDs are not used for loop prevention; they are used to
> >> save DRAM. There will be no routing loops from using an arbitrary RR
> >> topology with clusterID==loopback; BGP best path selection does not
> >> depend on non-unique clusterIDs to choose a loop-free path.
> >>
> >> In your case, if the cluster isn't even peering with itself, then
> >> there truly is no purpose for clusterID.
> >>
> >> > The point I'm trying to make is that I don't see a reason why RR1 and
> >> > RR2 in a common cluster should have a session to each other, and also
> >> > why RR1 in one cluster should have sessions to the RR2s in all other
> >> > clusters.
> >> > (And if RR1 and RR2 share a common cluster ID, then a session between
> >> > them is complete nonsense.)
> >> > Then if you go and shut PE1's session to RR1 and then go and shut
> >> > PE2's session to RR2, then it's just these two PEs affected, and well,
> >> > what can I say, you'd better think twice next time or consider
> >> > automation.
> >> > One can't possibly bend the backbone architecture out of shape because
> >> > of all the cases where someone comes in and does something stupid
> >> > (this complexity has to be moved somewhere else in my opinion - for
> >> > instance to a system that won't allow you to commit something stupid).
> >> >
> >> > Regarding the scale - well, there are setups out there with a couple
> >> > of million customer VPN prefixes alone.
> >> >
> >> > Regarding the Cluster-IDs - yes, these are used for loop prevention,
> >> > but only among RRs relaying routes to each other - if a PE is in the
> >> > loop then the Originator-ID should do the job just fine.
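> >> > (A worked attribute trace, names assumed: PE1 originates prefix P;
> >> > RR1 reflects it with ORIGINATOR_ID=PE1 and CLUSTER_LIST={RR1}. If
> >> > any path hands that update back to PE1, PE1 drops it on ingress
> >> > because the ORIGINATOR_ID matches its own router-id, per RFC 4456.)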
> >> >
> >> >
> >> > adam
> >> >
> >> > netconsultings.com
> >> > ::carrier-class solutions for the telecommunications industry::
> >> >
> >> >> -----Original Message-----
> >> >> From: Saku Ytti [mailto:saku at ytti.fi]
> >> >> Sent: Monday, March 12, 2018 10:43 AM
> >> >> To: adamv0025 at netconsultings.com
> >> >> Cc: Job Snijders; Mark Tinka; Cisco Network Service Providers
> >> >> Subject: Re: [c-nsp] IOS-XR BGP RR MCID (Multiple Cluster ID)
> >> >>
> >> >> Hey,
> >> >>
> >> >>
> >> >> RR1---RR2
> >> >>  |     |
> >> >> PE1----+
> >> >>
> >> >>
> >> >> 1) PE1 sends 1M routes to RR1 and RR2
> >> >>
> >> >> CaseA) Same clusterID
> >> >> 1) RR1 and RR2 each hold 1M entries (each rejects the copy reflected
> >> >> by the other, seeing its own cluster-id in the CLUSTER_LIST)
> >> >>
> >> >> CaseB) Unique clusterID
> >> >> 1) RR1 and RR2 each hold 2M entries (the reflected copy passes the
> >> >> CLUSTER_LIST check and sits alongside the direct one)
> >> >>
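> >> >> You can watch the table sizes on the boxes themselves - stock IOS-XR
> >> >> command, output omitted here and exact counters vary by release:
> >> >>
> >> >> RP/0/RSP0/CPU0:RR1# show bgp vpnv4 unicast summary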
> >> >>
> >> >>
> >> >> A cluster is a promise that every client peers with exactly the same
> >> >> set of RRs, so there is no need for the RRs to share client routes
> >> >> inside the cluster, as they have already received them directly.
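> >> >>
> >> >> In config terms the promise is simply this on both RR1 and RR2 (a
> >> >> minimal IOS-XR sketch; ASN, cluster ID and client address are made up):
> >> >>
> >> >> router bgp 64512
> >> >>  bgp cluster-id 10.0.0.100    ! identical on RR1 and RR2
> >> >>  neighbor 192.0.2.1           ! PE1 - and every other client of this
> >> >>   remote-as 64512             ! cluster must session to BOTH RRs
> >> >>   update-source Loopback0
> >> >>   address-family vpnv4 unicast
> >> >>    route-reflector-client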
> >> >>
> >> >>
> >> >> Of course if client1 loses its connection to RR2 and client2 loses
> >> >> its connection to RR1, client1 and client2 do not see each other's
> >> >> routes.
> >> >>
> >> >> For the same reason, you're not free to choose 'my nearest two RRs'
> >> >> with the same cluster-id, as you must always peer with every box in
> >> >> the same cluster-id. So you lose topological flexibility, increase
> >> >> operational complexity, and increase failure modes. But you do save
> >> >> that sweet sweet DRAM.
> >> >>
> >> >>
> >> >> Most blogs I read, and even some vendor documents, propose clusterID
> >> >> to avoid loops. I think this is the real reason people use them: when
> >> >> the RR was set up, people didn't know what clusterID was for, and
> >> >> later stayed committed to that initial false rationale and invented
> >> >> new rationales to justify their position.
> >> >>
> >> >> Premature optimisation is the source of a great many evils. Optimise
> >> >> for simplicity when you can, increase complexity when you must.
> >> >>
> >> >>
> >> >>
> >> >> On 12 March 2018 at 12:34,  <adamv0025 at netconsultings.com> wrote:
> >> >> >> Job Snijders
> >> >> >> Sent: Sunday, March 11, 2018 12:21 PM
> >> >> >>
> >> >> >> Folks - I'm gonna cut it short here: by sharing the cluster-id
> >> >> >> across multiple devices, you lose in topology flexibility,
> >> >> >> robustness, and simplicity.
> >> >> >>
> >> >> >
> >> >> > Gents, I have no idea what you're talking about.
> >> >> > How can one save or burn RAM by using or not using shared
> >> >> > cluster-IDs, respectively?
> >> >> > The only scenario I can think of is if your two RRs, say RR1 and
> >> >> > RR2, in a POP serving a set of clients (by definition a cluster,
> >> >> > btw) have an iBGP session to each other - which is a big NO-NO when
> >> >> > you are using out-of-band RRs. No, seriously.
> >> >> > Remember my previous example about separate iBGP infrastructures:
> >> >> > one formed out of all clients connecting to RR1 in the local POP
> >> >> > and all RR1s in all POPs peering with each other in full mesh, and
> >> >> > then the same infrastructure involving the RR2s?
> >> >> > Well, these two iBGP infrastructures should work as ships in the
> >> >> > night. If one infrastructure breaks at some point, you still get
> >> >> > all your prefixes to the clients/RRs in the affected POPs via the
> >> >> > other infrastructure.
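> >> >> > Per PE that is just two sessions, one into each plane (a sketch;
> >> >> > ASN and the RR loopbacks are made-up values):
> >> >> >
> >> >> > router bgp 64512
> >> >> >  neighbor 10.1.0.1           ! RR1 of the local POP (plane 1)
> >> >> >   remote-as 64512
> >> >> >   update-source Loopback0
> >> >> >   address-family vpnv4 unicast
> >> >> >  neighbor 10.2.0.1           ! RR2 of the local POP (plane 2)
> >> >> >   remote-as 64512
> >> >> >   update-source Loopback0
> >> >> >   address-family vpnv4 unicast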
> >> >> > That said, both of these iBGP infrastructures need to carry the
> >> >> > same set of prefixes, so the memory and CPU resources needed are
> >> >> > proportional only to the amount of information carried - but
> >> >> > neither of them needs to carry the set of prefixes twice, see below.
> >> >> >
> >> >> > Yes, you could argue that if A loses its session to RR1 and B loses
> >> >> > its session to RR2 then A and B can't communicate, but the point is
> >> >> > that PEs just don't lose sessions to RRs - these are iBGP sessions
> >> >> > that can route around failures, so the only scenario where this
> >> >> > happens is misconfiguration, and trust me, you'll know right away
> >> >> > that you broke something.
> >> >> > Then you can argue: OK, what if I have A to RR1-pop1 to RR1-pop2 to
> >> >> > B AND A to RR2-pop1 to RR2-pop2 to B, AND say RR1-pop1 as well as
> >> >> > RR2-pop2 fail at the same time - then A and B can't communicate.
> >> >> > Fair point, that will certainly happen, but what is the likelihood
> >> >> > of it happening? Well, it's the MTBF of RR1-pop1 times the MTBF of
> >> >> > RR2-pop2, which is fine for me, and I bet for most folks out there.
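> >> >> > (Back-of-envelope with assumed numbers: if each plane is
> >> >> > independently down 0.01% of the time, the joint unavailability is
> >> >> > 1e-4 * 1e-4 = 1e-8, i.e. roughly a third of a second per year.)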
> >> >> >
> >> >> >
> >> >> > adam
> >> >> >
> >> >> > netconsultings.com
> >> >> > ::carrier-class solutions for the telecommunications industry::
> >> >> >
> >> >> > _______________________________________________
> >> >> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> >> >> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> >> >> > archive at http://puck.nether.net/pipermail/cisco-nsp/
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>   ++ytti
> >> >
> >>
> >>
> >>
> >> --
> >>   ++ytti
> >
> 
> 
> 
> --
>   ++ytti


