[c-nsp] ospf auto-cost reference-bandwidth on modern gigabit networks

Thu Apr 30 04:33:02 EDT 2020

On Wed, 29 Apr 2020 at 16:21, Saku Ytti <saku at ytti.fi> wrote:
>
> Hey,
>
> >      Is there a recommended 'modern default' for ip ospf auto-cost
> > reference-bandwidth, to account for the fact that modern networks have
> > 1g and faster interfaces?
>
> To me this never made any sense. It's a very atypical case where you
> want your topology to be link-bw based. The most common use-case is,
> you want distance metric, i.e. everything is equal, and you want least
> amount of hops. The 2nd most common case is role-based, that you have
> P-P lower metric than P-PE so you don't transit via PE and so forth
> (but each P-P link is largely equal, due to topology having few options to
> target PE) and the 3rd most common is to have idealised latency based
> metric, so that you can model best path on Nth failure (to have bw)
> with RSVP.
>
> Reference bandwidth sounds like a very niche scenario where it would
> be sensible.

Hi Saku,

I strongly disagree with this advice to OP.

In my experience it isn’t atypical for link-bw based costs to be used;
I’d say that in smaller shops it’s actually the norm. Where one has no
dedicated P nodes, for example, all links are PE-to-PE, with PEs
implicitly acting as P nodes between a pair of ingress and egress PEs;
they all perform the same function. Network services in smaller shops
are also rarely distributed evenly across the network, and thus
bandwidth isn’t equal in all parts of the network.
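To make the OP's question concrete, here is a rough sketch of how a
bandwidth-derived cost works out, assuming the common IOS-style rule of
cost = reference bandwidth / interface bandwidth (both in Mbps),
truncated and clamped to 1..65535. The 1,000,000 Mbps reference is
purely an illustrative choice, not a recommendation; with the classic
100 Mbps default, every link of 100 Mbps or faster collapses to cost 1.

```python
# Sketch of bandwidth-derived OSPF interface cost, assuming the common
# IOS-style rule: cost = reference_bw // interface_bw, clamped to 1..65535.
# Bandwidths are in Mbps, the unit used by "auto-cost reference-bandwidth".

MIN_COST = 1
MAX_COST = 65535  # OSPF interface output cost is a 16-bit field


def ospf_cost(reference_mbps: int, interface_mbps: int) -> int:
    """Return the auto-derived cost for one interface."""
    if interface_mbps <= 0:
        raise ValueError("interface bandwidth must be positive")
    cost = reference_mbps // interface_mbps  # integer division truncates
    return max(MIN_COST, min(MAX_COST, cost))


LINKS_MBPS = {"1G": 1_000, "10G": 10_000, "40G": 40_000, "100G": 100_000}

if __name__ == "__main__":
    # 100 Mbps is the classic default; 1,000,000 Mbps is an illustrative
    # value chosen so that even 100G links still get a cost above 1.
    for ref in (100, 1_000_000):
        costs = {name: ospf_cost(ref, bw) for name, bw in LINKS_MBPS.items()}
        print(f"reference {ref} Mbps -> {costs}")
```

Whatever value is chosen, it should be the same on every router in the
area, otherwise the costs on different routers aren't comparable.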

One problem with, say, role-based metrics (e.g. PE-PE, PE-P and P-P) is
that upgrades and failures don’t happen symmetrically. Role-based
metrics are good in theory but quite limited in practice, particularly
for smaller shops, in my opinion.

Example:
A PE has a wavelength from provider A to P-1 and a 2nd wavelength from
provider B to P-2. I’ve asked each provider for a 2nd wavelength from
my PE to P-1 and P-2, to increase the core-facing capacity of the PE.

Provider A delivers within 2 weeks; provider B has no more capacity in
my area, and it takes 2 months. I can’t add provider A’s new
link/wavelength into my P-1 facing LAG bundle until the provider B
wavelength is ready. This is because a role-based PE-to-P cost doesn’t
reflect available bandwidth: traffic is ECMP’ed over both P-facing
LAGs, so I can’t make use of the extra capacity provider A has
provided. My lower-capacity LAG to P-2 is still getting half the
traffic. I could temporarily raise the cost on the PE’s P-2 facing LAG,
but then why not just have bw-based cost in the first place?
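A minimal sketch of the path choice in this example, under loudly
stated assumptions: 10G wavelengths, P-1 and P-2 equally far from the
egress PE (so only the PE-to-P cost differs), a hypothetical role-based
PE-to-P cost of 100, and an illustrative 1,000,000 Mbps reference
bandwidth. It compares the role-based cost, the role-based cost with a
manual bump on the P-2 LAG, and a bandwidth-derived cost.

```python
from __future__ import annotations

# Hypothetical model of the PE in the example. All numbers are assumptions.
REFERENCE_MBPS = 1_000_000  # illustrative auto-cost reference bandwidth
MEMBER_MBPS = 10_000        # assumed speed of each wavelength / LAG member
ROLE_PE_P_COST = 100        # hypothetical fixed role-based PE-to-P cost


def bw_cost(members_up: int) -> int:
    """Bandwidth-derived cost of a LAG with this many working members."""
    if members_up <= 0:
        raise ValueError("a LAG with no working members has no cost")
    return max(1, min(65535, REFERENCE_MBPS // (members_up * MEMBER_MBPS)))


def ecmp_next_hops(costs: dict[str, int]) -> list[str]:
    """OSPF load-shares only across the lowest-cost next hops."""
    best = min(costs.values())
    return sorted(nh for nh, cost in costs.items() if cost == best)


# Provider A has delivered the 2nd wavelength to P-1; provider B has not.
members = {"P-1": 2, "P-2": 1}

role_costs = {nh: ROLE_PE_P_COST for nh in members}
bumped_costs = {**role_costs, "P-2": ROLE_PE_P_COST * 2}  # manual workaround
bw_costs = {nh: bw_cost(n) for nh, n in members.items()}

for label, costs in (("role-based", role_costs),
                     ("role-based, P-2 bumped", bumped_costs),
                     ("bandwidth-based", bw_costs)):
    print(f"{label}: {costs} -> ECMP over {ecmp_next_hops(costs)}")
```

Plain OSPF only load-shares across equal-cost paths, so with the
bandwidth-derived cost the P-2 LAG carries no traffic while the P-1 LAG
is up; it remains a backup path.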

At some point in the future, when the capacity on both P-facing LAGs
has been increased, the 2nd member link within my P-2 facing LAG goes
down. Traffic is still ECMP’ed equally over both LAGs, though, because
the role-based cost on both LAGs is the same. The P-2 LAG is now
congested. I can add complexity by using knobs like “minimum bundle
links”, but my only option there would be to have the LAG to P-2 shut
itself down, even though it still has one working link. I want to keep
the LAG to P-2 up in case the LAG to P-1 dies completely, but shift all
my traffic to the P-1 facing LAG until the full bandwidth of the P-2
facing LAG is restored. That would happen automatically with a
bw-based IGP cost.
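A rough sketch of this failure case, assuming 2 x 10G members per LAG,
P-1 and P-2 equidistant from the egress PE, a hypothetical role-based
cost of 100, and an illustrative 1,000,000 Mbps reference bandwidth. It
also assumes the platform lowers the bundle's bandwidth, and therefore
its auto-cost, when a member fails; that is common but worth checking
on your own hardware.

```python
from __future__ import annotations

# All numbers are illustrative assumptions, not recommendations.
REFERENCE_MBPS = 1_000_000  # illustrative auto-cost reference bandwidth
MEMBER_MBPS = 10_000        # assumed 10G members in each LAG
ROLE_PE_P_COST = 100        # hypothetical fixed role-based PE-to-P cost


def lag_cost(members_up: int, bandwidth_based: bool) -> int | None:
    """Cost of a LAG, or None when no members are up (no OSPF adjacency)."""
    if members_up == 0:
        return None
    if not bandwidth_based:
        return ROLE_PE_P_COST  # role-based cost ignores how many members are up
    return max(1, min(65535, REFERENCE_MBPS // (members_up * MEMBER_MBPS)))


def next_hops(members: dict[str, int], bandwidth_based: bool) -> list[str]:
    """Lowest-cost P-facing LAGs that OSPF would ECMP over."""
    costs = {nh: lag_cost(n, bandwidth_based) for nh, n in members.items()}
    live = {nh: c for nh, c in costs.items() if c is not None}
    if not live:
        return []
    best = min(live.values())
    return sorted(nh for nh, c in live.items() if c == best)


scenarios = {
    "both LAGs at full strength": {"P-1": 2, "P-2": 2},
    "one P-2 member down": {"P-1": 2, "P-2": 1},
    "P-1 LAG completely down": {"P-1": 0, "P-2": 1},
}

for label, members in scenarios.items():
    print(f"{label}: role-based -> {next_hops(members, False)}, "
          f"bandwidth-based -> {next_hops(members, True)}")
```

With the role-based cost the PE keeps splitting traffic evenly while
P-2 runs at half capacity; with the bandwidth-derived cost P-2 drops out
of the ECMP set but stays up as a fallback, and is used again if P-1
fails.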

Role-based and metric-based IGP costs are a good idea in theory, but
they are a lot more difficult in practice. Another problem with
role-based IGP costs: who has more capacity between a pair of PEs than
those PEs have to their upstream P nodes? If you find yourself in that
scenario, it isn’t role-based IGP costs you need, it’s a long hard
look in the mirror.

Look at the current poster child of Service Provider networking,
Segment Routing, which everyone seems to love; this is exactly one of
the major reasons many networks need SR-EPE. I ask my external peer for
additional peering links at the two locations where we peer; they
deliver at location 1 tomorrow, and at location 2 in another 6 weeks.
But I need this capacity now, because that peer is a major CDN and next
week some massive social/cultural event is happening at short notice. I
need to “engineer” (as in EPE) a disproportionate amount of traffic
towards the first peering location, which isn’t reflected in the BGP
preference.

The key point is this:
OP needs to choose what is actually practical for his network (which
may be role-based costs), not what is academically superior.

Cheers,
James.