[c-nsp] OSPF: inconsistent SPF/LSA throttle timers

Mon Feb 24 06:43:03 EST 2014

> 
>we are currently running a small OSPF network with about 50+ boxes using
>default IOS timers. We would like to tune LSA/SPF throttle timers.
>
>Now, because some of the boxes have a decent CPU (ASR1000, 6500, etc.)
>running more important traffic, and others have a small CPU, like ME3400
>layer 3 switches, we wanted to tune the OSPF timers on those important
>boxes specifically. SPF + RIB update on the ME3400 amounts to up to 250 ms,
>which is why we prefer to leave them as-is, or at least, roll those
>changes out gradually.

A general remark: there are two kinds of timers: those which control LSA
flooding and reception, and those which control SPF throttling. You
can use different SPF-throttle timers across the domain, at the "danger"
of running into transient loops in case intermediate nodes with longer
timers have not completed their SPF yet (you mention this later). So if you
have some nodes kicking SPF off after 50 msec while others wait 5 seconds,
your end-to-end convergence might not have improved at all, but you are
aware of this.
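
To make the last point concrete, here is a rough back-of-the-envelope model
in Python. It assumes that a path only forwards consistently once the
slowest node on it has waited out its initial SPF delay and finished SPF
plus RIB update, and it ignores flooding time entirely. The function name
and the 20 ms figure for the fast boxes are made up for illustration; the
250 ms figure is the ME3400 number quoted above.

```python
def path_convergence_ms(nodes):
    """Rough convergence time of a path, in milliseconds.

    nodes: list of (spf_initial_wait_ms, spf_plus_rib_ms) tuples, one per
    router on the path. Simplifying assumption: the path is consistent only
    when the slowest router has finished, and flooding delay is ignored.
    """
    return max(wait + compute for wait, compute in nodes)


fast = (50, 20)     # tuned box; the 20 ms SPF+RIB time is an assumed figure
slow = (5000, 250)  # box that waits 5 s before SPF, 250 ms SPF+RIB update

print(path_convergence_ms([fast, slow, fast]))  # 5250
print(path_convergence_ms([slow, slow, slow]))  # 5250
print(path_convergence_ms([fast, fast, fast]))  # 70
```

In this model, tuning only some of the boxes on a path buys nothing
end-to-end; the gain only shows up once the slowest box is tuned as well.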

For LSA origination/lsa-throttle, you have a dependency on the
min-lsa-arrival timer, which is, by default (and by RFC) 1 second. If you
tune your lsa-throttle in such a way that a node might flood two instances
of the same LSA within one second, other nodes which still run with the
min-lsa-arrival of 1 second will discard the latter instances, which is
not good as it requires retransmissions and impacts end-to-end convergence.
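
A minimal sketch of that receiver-side check, in Python. It is a simplified
model and not any vendor's actual code: it assumes an instance is dropped
when it arrives less than min-lsa-arrival after the last accepted instance
of the same LSA, and it does not model the retransmission that follows. The
function name and the flood times are hypothetical.

```python
def accepted_instances(arrival_times_ms, min_arrival_ms=1000):
    """Return the arrival times of the LSA instances a receiver accepts.

    An instance is discarded when it arrives less than min_arrival_ms after
    the last accepted instance of the same LSA (simplified model).
    """
    accepted = []
    for t in sorted(arrival_times_ms):
        if not accepted or t - accepted[-1] >= min_arrival_ms:
            accepted.append(t)
    return accepted


# The originator floods three instances of the same LSA at these times (ms).
floods = [0, 20, 1500]

print(accepted_instances(floods, min_arrival_ms=1000))  # [0, 1500]
print(accepted_instances(floods, min_arrival_ms=20))    # [0, 20, 1500]
```

With the 1 second default, the instance sent at 20 ms is lost until it is
retransmitted; with min-lsa-arrival at 20 ms, all three are accepted.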

As for the default settings: you do want to tune down lsa-throttle on all
devices. By default IOS waits 5 seconds before sending another LSA, which
could be bad for flapping links. At least older IOS releases also waited ~5
seconds before starting SPF; I have not followed any changes here lately.

>
>Here comes the question: a concern was raised, that if we are running an
>OSPF area with different LSA/SPF timers on the boxes, we may hit a race
>condition where a box discards an incoming LSA, and the OSPF database only
>recovers after the LSA is refreshed (after the 30 or 50 minutes) or it
>doesn't recover at all when using "flood-reduction" [1].

Yes. So the first thing you want to do is to make sure that all devices in
the flooding domain have "timers lsa arrival" configured with a low value
like 20 msec. Then you can tune down the hold-interval value of "timers
throttle lsa all". A common deployment is

 timers throttle lsa all 0 20 1000
 timers lsa arrival 20
 timers pacing flood 15

If you have devices which don't allow tuning the min-lsa-arrival timer,
you need to use a higher hold-interval on the other devices (i.e. "timers
throttle lsa all 0 1000 1000").
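
If you roll this out gradually, a quick audit of the intended settings can
help. The Python sketch below flags every originator/receiver pair where
the originator's LSA hold-interval is lower than the receiver's
min-lsa-arrival. The router names, the dictionary layout and the function
name are hypothetical; the values would have to come from your own
configuration data.

```python
def risky_pairs(routers):
    """Return (originator, receiver) pairs where LSAs may be discarded.

    routers maps a router name to a dict with:
      "lsa_hold_ms":    hold-interval of "timers throttle lsa all"
      "lsa_arrival_ms": value of "timers lsa arrival"
    A pair is risky when the originator may send two instances of an LSA
    closer together than the receiver is willing to accept them.
    """
    pairs = []
    for orig_name, orig in routers.items():
        for recv_name, recv in routers.items():
            if orig_name == recv_name:
                continue
            if orig["lsa_hold_ms"] < recv["lsa_arrival_ms"]:
                pairs.append((orig_name, recv_name))
    return pairs


routers = {
    # Tuned box: "timers throttle lsa all 0 20 1000", "timers lsa arrival 20"
    "asr1k-1": {"lsa_hold_ms": 20, "lsa_arrival_ms": 20},
    # Untouched box: 5 s between LSAs, 1 s min-lsa-arrival
    "me3400-1": {"lsa_hold_ms": 5000, "lsa_arrival_ms": 1000},
}

print(risky_pairs(routers))  # [('asr1k-1', 'me3400-1')]
```

The output shows why "timers lsa arrival" should go onto all boxes first:
the tuned box can flood faster than the untouched one accepts.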

As for SPF throttle, a common fast-convergence (FC) tuning is "timers
throttle spf 50 100 5000"...
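
To give a feel for what those three values do, here is a small Python
sketch of the exponential backoff. It is a simplified model that assumes
topology changes keep arriving, so the first SPF waits the start value, the
next one waits the hold value, and each further wait doubles until it is
capped at the max value. The function name is made up, and a real
implementation also resets to the start value once the network has been
quiet for long enough.

```python
def spf_waits(start_ms, hold_ms, max_ms, runs):
    """Waits (ms) before successive SPF runs during continuous churn."""
    waits = []
    hold = hold_ms
    for i in range(runs):
        if i == 0:
            # First event after a quiet period: only the initial delay.
            waits.append(start_ms)
        else:
            # Subsequent runs: current hold time, capped at the maximum.
            waits.append(min(hold, max_ms))
            hold *= 2
    return waits


# "timers throttle spf 50 100 5000"
print(spf_waits(50, 100, 5000, 8))
# [50, 100, 200, 400, 800, 1600, 3200, 5000]
```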

You don't want to configure flood reduction; I have never seen it enabled
anywhere.

>
>Personally, I don't think that's the case. Also, none of the documentation
>I read suggests that. It would also mean that we cannot change the
>throttle timers in a live network gradually, but need to shut down the
>entire OSPF network to adjust the timers. I can't believe that's the case.

No, that's a myth. You might want to configure "timers lsa arrival" on
all boxes before you start with lsa-throttle, though.

>
>I do agree that it would be better to have a fully consistent
>configuration, and we will try to achieve that at one point, but until
>then, does anyone run a production network with inconsistent LSA/SPF
>throttle timers?

It's done, especially in multi-vendor environments where not every device
supports SPF/LSA throttling (and neither do very old Cisco devices).

>
>I have my doubts that big heterogeneous networks really use consistent
>SPF/LSA timers across all boxes.

Mostly consistent, I would say. Not every implementation supports the same
throttling algorithm, but the initial wait timers are kept in sync.

	oli