[j-nsp] OSPF reference-bandwidth 1T

James Bensley jwbensley at gmail.com
Thu Feb 7 04:50:48 EST 2019


TLDR; metrics aren't a purely design/academic decision, they are
operational too.

On Thu, 24 Jan 2019 at 09:27, Saku Ytti <saku at ytti.fi> wrote:
> I don't disagree, I just disagree that there are common case where
> bandwidth is most indicative of good SPT.

If by "good" you mean "shortest" (fewest hops) then I
disagree with you; bandwidth is usually indicative of the fewest
hops (not always, but usually). In any reasonable hierarchical
design northbound links aren't going to be of a lower speed than
southbound links. Taking Adam's example of a folded Clos network as a
theoretical utopian text-book example, you also wouldn't have
east-west links between leaves and if you did they wouldn't be as fast
or faster than your northbound links. The problem is that in reality
no SP network looks as neat and tidy or as simple as a Clos network, see
below....

> Consider I have
>
> 10GE-1:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - PE2
>
> 10GE-2:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - PE2
>
> 10GE-3:
> PE1 - P1 - P2 - P3 - P4 - P5 - P6 - P7 - P8 - P9 - P10 - PE2
>
> 1GE:
> PE1 - PE2
>
> In which realistic topology

> a) in 10GE-1 + 1GE, I want to prefer the 10GE between PE?

As soon as you have 1.000000001Gbps of traffic to shift (see my
previous email). And this is where reality kicks in - why would you
have a PE with a 10G and 1G uplink? In the hypothetical Clos design
you simply wouldn't have mixed speed links facing northbound, in the
real SP networking world you wouldn't have a 10G uplink if you didn't
have >1Gbps of provisioned downstream connectivity, otherwise you're
wasting capex/opex (except for rare circumstances like a carrier
promotion selling 10G for the price of 1G or something, but you
probably hadn't planned for that). So, assuming there is a reason you
have bandwidth-asymmetrical uplinks in your topology, it's probably
downstream bandwidth related. It could also be upstream related though;
upstream link upgrades don't happen in a fixed time or perfectly
symmetrically. Maybe the road closure is delayed, route planning
changes, a PoP closes, transmission equipment needs an upgrade, and you
end up upgrading one northbound circuit in 3 months while the other
takes 12 months. To go full circle to your original point, bandwidth is
dictating the "best" SPT here, where "best" means "to avoid congestion
during normal operations, not times of exceptional operations, which is
when we look to QoS for help".

This is what happens in the "real world" and not Clos networks. We
might want diverse connections to a remote PoP and only one carrier
has 10G of capacity there, so our backup link has to be 1G. We
actually have more than 1G of provisioned downstream connectivity but
that is all we can get unless we want 2x10G from the same carrier and
no resilience. Maybe we can bond a few 1G links from the 2nd carrier
and have 10G + 5G backup. To be clear I don't approve of such a
design, my point is that in the real world, where things aren't
simple, circuit costs are higher than expected, we don't have enough
100G or 10G ports, the project has been under-budgeted, the lead time
on the new router from the vendor is 12 months not the promised 3, we end
up with these kinds of weird asymmetrical topologies and we have to
use a bandwidth based metric to route traffic.

> b) in 10GE-2 + 1GE, I want to balance between the paths

So, from a purely technical perspective, if you did per flow load
balancing it would work. Should you do it? I'd say Hell no. But not
because of anything to do with IGPs. The operational complexity of
troubleshooting such a topology is too high in this scenario; imagine
if each one of those 10G links between P nodes was from a different
carrier, it would be a case of service credits lining up ready to be
given away.

> c) in 10GE-3 + 1GE, I want to prefer the 1GE

You actually have some bandwidth critical services which are <= 1Gbps.
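
To put numbers on cases a), b) and c): a minimal sketch, assuming the
reference bandwidth is 1T (as per the subject line) and that the OSPF
cost of a link is reference bandwidth divided by link bandwidth, as an
integer with a minimum of 1. The function names and the link counts
(taken by counting the hops in the quoted paths) are mine, for
illustration only.

```python
# Assumption: OSPF link cost = reference bandwidth / link bandwidth,
# truncated to an integer, never below 1.
REFERENCE_BW_BPS = 10**12  # 1T, per the subject line

TEN_GE = 10 * 10**9
ONE_GE = 10**9


def ospf_cost(link_bw_bps, reference_bw_bps=REFERENCE_BW_BPS):
    """Cost of a single link under the assumption above."""
    return max(1, reference_bw_bps // link_bw_bps)


def path_cost(link_bws_bps):
    """Total cost of a path given the bandwidth of each link on it."""
    return sum(ospf_cost(bw) for bw in link_bws_bps)


# Link counts from the quoted paths: PE1 - P1 ... P8 - PE2 is 9 links,
# each extra P node adds one more link.
paths = {
    "10GE-1": [TEN_GE] * 9,
    "10GE-2": [TEN_GE] * 10,
    "10GE-3": [TEN_GE] * 11,
    "1GE": [ONE_GE],
}
costs = {name: path_cost(links) for name, links in paths.items()}

for name in ("10GE-1", "10GE-2", "10GE-3"):
    if costs[name] < costs["1GE"]:
        verdict = "prefer the 10GE path"
    elif costs[name] == costs["1GE"]:
        verdict = "equal cost, balance across both"
    else:
        verdict = "prefer the 1GE"
    print(f"{name} ({costs[name]}) vs 1GE ({costs['1GE']}): {verdict}")
```

With those assumptions the three 10GE paths cost 900, 1000 and 1100
against 1000 for the 1GE, which is why a) prefers the 10GE, b) is equal
cost and c) prefers the 1GE.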

> All these seem nonsensical, what actually is meant '1GE has role Z,
> 10GE has role X, have higher metric for role Z', regardless what the
> actual bandwidth is. It just happens that bandwidth approximates role
> in that topology, but desired topology is likely achieved with
> distance vector or simple role topology and bandwidth is not relevant
> information.

To me they aren't nonsensical, they are "not ideal" for a specific
purpose i.e. sub-optimal for latency, or operationally more complex.
Going right back to basics; the reason we have a metric at all in the
IGP is because there is some reason why the shortest path (number of
hops) from A to B isn't the most optimal path, so we're using the
metric as a weight to influence the SPT calculation. So the question
is why isn't the SPT optimal for you? In the hypothetical Clos model
it is, in real life it isn't, so we're always trying to get as close
to that as we can. Metrics aren't just a purely design/academic
decision (function based or role based), they are operational too;
e.g. breaking up a failure domain or breaking up a change request
domain.
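
As a rough illustration of the metric acting as a weight on the SPT
calculation: a minimal sketch of Dijkstra run twice over the same
topology, once with every link costing 1 (pure hop count) and once with
a bandwidth-derived metric. The three-link topology, the helper names
and the 1T reference bandwidth are assumptions for the example, not
anything from a real network.

```python
import heapq

# Hypothetical topology: (node A, node B, link bandwidth in bps).
LINKS = [
    ("PE1", "PE2", 10**9),   # direct 1G link
    ("PE1", "P1", 10**10),   # 10G
    ("P1", "PE2", 10**10),   # 10G
]


def build_graph(links, metric_fn):
    """Build graph[node] = {neighbour: metric}, links are bidirectional."""
    graph = {}
    for a, b, bw in links:
        metric = metric_fn(bw)
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns (total metric, list of nodes) or None."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, metric in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + metric, neighbour, path + [neighbour]))
    return None


# Every link costs 1, so the result is the fewest-hops path.
hop_graph = build_graph(LINKS, lambda bw: 1)
# Assumed bandwidth metric: 1T reference bandwidth / link bandwidth.
bw_graph = build_graph(LINKS, lambda bw: max(1, 10**12 // bw))

print(shortest_path(hop_graph, "PE1", "PE2"))
print(shortest_path(bw_graph, "PE1", "PE2"))
```

By hop count the direct 1G link wins; with the bandwidth metric the two
10G hops via P1 win (200 vs 1000). Any other weighting (role, latency,
a manual override to drain a node) plugs into the same calculation in
the same way.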

I've had to move traffic away from a P/PE node because traffic around
the core ring was disproportionately distributed such that the failure
of one P node had a much larger impact than other P nodes. As I
mentioned in my previous email, these issues only go away when you
have the kind of luxuries that I, and I expect you, have like your own
dedicated transmission network or enough influence to tell a carrier
where to lay fibre next.

Cheers,
James.

