[j-nsp] Spine & leaf

Wed Jun 27 14:55:13 EDT 2018

> On Jun 27, 2018, at 8:40 AM, Thomas Bellman <bellman at nsc.liu.se> wrote:
> 
> On 2018-06-26 21:38, David Sinn wrote:
> 
>> OSPF scales well to many multiples of 1000's of devices.
> 
> Is that true even for Clos (spine & leaf) networks, and in a single area?

Yes, for multi-tiered Clos; that was the original ask and is where my reference is coming from.  It is not in a single area, however.  But if you are building a multi-tiered Clos, the areas fall out fairly naturally from the topology.

> My understanding, solely based on what others have told me, is that
> the flooding of LSAs in a Clos network can start to overwhelm routers
> already at a few hundred devices, as each time e.g. a spine sends out
> an LSA, all of the other spines will hear that from each of the leaves,
> all more or less simultaneously.  And since the OSPF protocol limits
> lifetimes of LSAs to 1 hour, you will get a constant stream of updates.

Even Quagga/FRR has some measure of skew in the refresh interval to ensure that the re-advertisements from a given node aren't all clumped into one batch of updates.  And even a really bad day, where you lose power to a good number of devices that basically all start up at the same time, doesn't cause measurable synchronization of updates.  The skew in type-1's as the boxes randomly bring up adjacencies and re-advertise their reachability gives you a good spread across the refreshes, and that spread generally grows over time.  So while there is a continual background of refreshes, even the single-core PPC CPUs on commodity boxes can handle it with several of the available white-box OSes.
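
To make the skew argument concrete, here is a rough simulation sketch.  It assumes the worst case above (every router originates at t=0) and that each refresh is shortened by a random amount.  LS_REFRESH_TIME is the 1800-second constant from RFC 2328; ASSUMED_MAX_JITTER and both function names are made up for illustration and are not taken from Quagga/FRR or any other implementation.

```python
import random
import statistics

LS_REFRESH_TIME = 1800   # seconds; OSPF architectural constant (RFC 2328)
ASSUMED_MAX_JITTER = 60  # seconds; hypothetical value, not from any real stack


def refresh_times(num_routers, rounds, seed=0):
    """Return a list where entry [r][k] is the time of router r's k-th refresh."""
    rng = random.Random(seed)
    times = []
    for _ in range(num_routers):
        t = 0.0  # worst case: every router originates its LSA at t=0
        per_router = []
        for _ in range(rounds):
            # Each refresh fires a little early by a random amount.
            t += LS_REFRESH_TIME - rng.uniform(0, ASSUMED_MAX_JITTER)
            per_router.append(t)
        times.append(per_router)
    return times


def spread_per_round(times):
    """Standard deviation of the refresh time across routers, per round."""
    rounds = len(times[0])
    return [statistics.pstdev([r[k] for r in times]) for k in range(rounds)]


if __name__ == "__main__":
    spreads = spread_per_round(refresh_times(num_routers=500, rounds=24))
    print("spread after first refresh:", round(spreads[0], 1), "s")
    print("spread after 24 refreshes: ", round(spreads[-1], 1), "s")
```

Under these assumptions the per-round spread keeps widening, because the random offsets accumulate rather than reset, which is the "grows over time" effect.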

> My own experience is only with pretty small networks (we currently have
> 2 spines and 11 leaves in our area, the rest of the university have a
> couple dozen routers, and the OSPF database in our area contains ~1600
> LSAs).  Thus, I can only repeat what others have told me, but I'm
> curious to hear real-world experience from people running larger
> OSPF networks.

Some of this is also about picking the right optimizations.  Broadcast segments between routers are the default but lead to more LSAs, so point-to-point is your friend and also allows a faster return to service when a link flaps[1].  Multiple areas and summarization are also needed when your topology gets really large.  And don't be afraid of not having a unified area 0 in a Clos, because spines don't need to talk to one another.  That last one leaves you no worse off than what you run into with the eBGP approach and re-using AS numbers.
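
As a back-of-the-envelope sketch of the broadcast vs. point-to-point difference: on a broadcast segment the DR originates a type-2 (network) LSA for that segment, while a point-to-point link has no DR and no type-2.  The function below is hypothetical and assumes a single-area, two-tier Clos with exactly one link between every leaf and every spine; it counts only type-1 and type-2 LSAs.

```python
def lsa_counts(spines, leaves, network_type):
    """Count type-1 and type-2 LSAs in a two-tier Clos (illustrative only)."""
    links = spines * leaves        # assumption: one link per leaf/spine pair
    router_lsas = spines + leaves  # one type-1 (router) LSA per router
    if network_type == "broadcast":
        network_lsas = links       # the DR on each segment originates a type-2
    elif network_type == "point-to-point":
        network_lsas = 0           # no DR, so no type-2 LSAs
    else:
        raise ValueError("unknown network type: " + network_type)
    return {
        "router": router_lsas,
        "network": network_lsas,
        "total": router_lsas + network_lsas,
    }


if __name__ == "__main__":
    for kind in ("broadcast", "point-to-point"):
        print(kind, lsa_counts(spines=2, leaves=11, network_type=kind))
```

The extra type-2 count scales with the number of links (spines times leaves) rather than the number of routers, so the gap widens quickly as the fabric grows.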

David

[1] Loss of link at scale in a multi-tier Clos is a fact of life, as MTBF divided by the count of devices means you regularly see many "rare" failures and will have measurable loss of links continually.
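
The arithmetic behind that footnote can be sketched as follows.  This assumes independent components with a constant failure rate, and the component count and MTBF in the example are made-up round numbers, not measurements from any network.

```python
def expected_failures_per_day(component_count, mtbf_hours):
    """Expected failures per day across a fleet of identical components.

    Assumes independent failures at a constant rate of 1/MTBF per component.
    """
    return component_count * 24.0 / mtbf_hours


if __name__ == "__main__":
    # Hypothetical fleet: 100,000 links/optics, each with a 1,000,000 hour MTBF.
    print(expected_failures_per_day(component_count=100_000, mtbf_hours=1_000_000))
```

With those made-up inputs you would expect a couple of failures every day, even though any single component fails on average only once in over a century.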