[c-nsp] IOS-XR BGP RR MCID (Multiple Cluster ID)

Sun Mar 11 09:40:57 EDT 2018

On 11/Mar/18 15:07, Job Snijders wrote:

> "32 years and I've not been hit by a bus, so I can stop looking?"
> or perhaps, https://en.wikipedia.org/wiki/Appeal_to_tradition ? :-D

Glad I'm not the only one crossing streets :-).

> One example where shared Cluster-ID is painful, is the fact that a set
> of route-reflector-clients MUST peer with ALL the devices that share
> that specific cluster ID. If one client's IBGP session is missing
> (perhaps due to misconfiguration), and another client's session is down
> (perhaps due to a XPIC or forwarding-plane problem), the view on the
> IBGP state becomes incomplete. With unique Cluster-IDs this failure
> scenario doesn't exist. The fact that 2 down IBGP sessions can cause
> operational issues shows that shared Cluster-ID designs are fragile.

That is a valid failure scenario for shared Cluster-IDs.
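
For anyone following along, the scenario can be sketched as a toy
model. This is a minimal illustration, not any vendor's implementation.
It assumes two RRs that peer with each other as non-clients, two
clients named A and B, and the CLUSTER_LIST loop check from RFC 4456.
The function name, router names and cluster-ID values are all
hypothetical.

```python
def clients_that_learn_route(origin, cluster_ids, sessions):
    """Return the set of clients that learn a route originated by `origin`.

    cluster_ids: dict mapping RR name -> cluster ID (hypothetical values)
    sessions:    set of (client, rr) pairs whose iBGP session is up
    Assumption: all RRs peer with each other as non-client iBGP peers.
    """
    # Paths accepted by each RR, recorded as the CLUSTER_LIST they carried.
    accepted = {rr: [] for rr in cluster_ids}

    # Step 1: the originating client advertises to each RR it can reach.
    for rr in cluster_ids:
        if (origin, rr) in sessions:
            accepted[rr].append([])  # client routes carry no CLUSTER_LIST

    # Step 2: an RR that heard the route from the client reflects it to
    # the other RRs, adding its own cluster ID to the CLUSTER_LIST.
    for rr, cid in cluster_ids.items():
        if (origin, rr) not in sessions:
            continue
        cluster_list = [cid]
        for other, other_cid in cluster_ids.items():
            if other == rr:
                continue
            # Loop check: an RR ignores a route whose CLUSTER_LIST
            # already contains its own cluster ID.
            if other_cid in cluster_list:
                continue
            accepted[other].append(cluster_list)

    # Step 3: every RR holding at least one path reflects to its clients.
    learned = set()
    for client, rr in sessions:
        if client != origin and accepted[rr]:
            learned.add(client)
    return learned


SHARED = {"RR1": "0.0.0.1", "RR2": "0.0.0.1"}
UNIQUE = {"RR1": "0.0.0.1", "RR2": "0.0.0.2"}

# A's session to RR1 is missing and B's session to RR2 is down.
DEGRADED = {("A", "RR2"), ("B", "RR1")}

print(sorted(clients_that_learn_route("A", SHARED, DEGRADED)))  # []
print(sorted(clients_that_learn_route("A", UNIQUE, DEGRADED)))  # ['B']
```

With the shared ID, RR1 ignores the copy reflected by RR2, so B never
hears A's route. With unique IDs, RR1 accepts it and passes it on.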

Device configuration stability is a network operation function that
every operator manages. We are happy enough with how we manage ours that
this is not a concern, particularly as different teams manage different
components of the network. iBGP is a low-touch section of our backbone,
restricted to specific task groups, compared to eBGP.

For forwarding plane issues, I can only speak for our network - there is
enough data plane abstraction in our core network design that a failure
in the forwarding plane of a client will not result in a loss of IGP
reachability toward any RR. The only way this becomes an issue is if the
entire forwarding plane (centralized or distributed) on the client were
to fail, in which case the whole router is down anyway.

Forwarding plane issues do make the design very fragile if there is a
direct relationship between a physical port/line card and the RR. We
don't have that condition, and for that reason, a shared Cluster-ID
topology poses little to no risk for our specific architecture.

Yes, this means that if you have a linear physical connectivity
relationship between your clients and the RRs, shared Cluster-IDs are
a bad idea. Either you abstract the physical connectivity between the
clients and RRs, making them port/line card-independent, or you switch
to unique Cluster-IDs.

> I didn't say that you can't make things work with shared Cluster-IDs,
> you'll just have to make sure you stay within the constraints and are
> aware of the risks. To me that's just a design choice with no
> operational upsides, I'll happily burn the RAM in exchange for
> flexibility & robustness.

Again, it really depends on how you've architected your core network in
the PoP. If it's the "classic" way, agreed, shared Cluster-IDs become an
issue. We don't have that constraint, so we can afford to enjoy the
benefits of reduced RAM utilization without the inherent risk of
port/line card-dependent relationships between clients and RRs.
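
To put a rough shape on the RAM trade-off, here is a back-of-envelope
sketch. It assumes every prefix is originated by a client that has a
session to every RR, that the RRs are fully meshed with each other, and
that each RR reflects one best path per prefix to the other RRs. The
function name and the prefix count are hypothetical. Real numbers depend
on the implementation and on per-path memory cost.

```python
def paths_per_rr(prefixes, num_rrs, shared_cluster_id):
    """Rough count of iBGP paths one RR retains for client prefixes.

    Assumption: with a shared cluster ID, copies reflected by the other
    RRs fail the CLUSTER_LIST check and are not kept. With unique IDs,
    one extra copy per prefix is kept from each of the other RRs.
    """
    from_clients = prefixes
    from_other_rrs = 0 if shared_cluster_id else prefixes * (num_rrs - 1)
    return from_clients + from_other_rrs


# Hypothetical PoP: 100,000 client-originated prefixes and two RRs.
print(paths_per_rr(100_000, 2, shared_cluster_id=True))   # 100000
print(paths_per_rr(100_000, 2, shared_cluster_id=False))  # 200000
```

Under these assumptions the extra paths with unique IDs grow with the
number of RRs, which is the RAM being traded for robustness.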

Flexibility and robustness might mean different things to different
people. We've not come across any logical iBGP routing problem
that we've not been able to solve with our design. But, I'm glad to hear
that you enjoy the same pleasures :-).

Mark.