[c-nsp] loop guard still useful?

Tue Jan 19 10:26:04 EST 2016

> Of course no solution is perfect, you just have to pick solution which
> is least bad.

Agreed, but to cover all corner cases, you may have to combine different
solutions, unless your xSTP variant fixes those for you (which rapid-pvst
doesn't).

> I view autonego least bad, compared to UDLD. UDLD is L2 BPDU and as
> such huge attack vector on 7600. If you want to protect yourself from
> this attack vector, you configure L2 MLS ratelimiter for BPDUs, but if
> you configure rate-limiter and run UDLD, then attacker can congest the
> rate-limiter easily and cause UDLD to detect fault and go down.

If the attacker is directly connected to the very port on which we are
debating whether to run UDLD, then there are a bunch of other, more
important problems to fix. In any case he can just block his RX channel
or spoof whatever UDLD message he wants, which is way simpler than
congesting a rate-limiter.

If there is a switch under my control at the other end of the link, I
would argue that it's that switch's job to drop/terminate UDLD frames on
the untrusted ports.

The general idea is to run UDLD between switches in your administrative
domain, same as STP, not towards untrusted entities.
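
As a rough sketch of that split, assuming Cisco IOS on a Catalyst-class
switch (the interface names and descriptions are made up for
illustration, and exact syntax may differ per platform and release):

    ! Inter-switch link inside my administrative domain:
    ! run UDLD and loop guard here.
    interface TenGigabitEthernet1/1
     description uplink to my own switch (hypothetical)
     udld port
     spanning-tree guard loop
    !
    ! Port facing an untrusted entity: neither UDLD nor
    ! loop guard is configured on it.
    interface GigabitEthernet2/1
     description customer-facing port (hypothetical)

Whether to use normal or aggressive UDLD mode on the inter-switch link
is a separate decision; the point here is only where the feature is
enabled and where it is not.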

> There is no amount of software features that fixes software defects.

What we are doing here is handling predictable hardware and link
failures with software features like UDLD and fixing stupid
assumptions in STP by using loop-guard.

> It's just logical fallacy, you can't say 'well NPU
> software might break, then let's add this UDLD software, to cover that
> problem'.

Many of those root causes are not SW related, but HW failures (like my
7600 linecard).

> Now you have two software's which can fail.

Both UDLD and loop guard are very simple features and very easy
to test. Especially when compared with complexity monsters like
rapid-pvst or MST, which in this context and discussion we are
running anyway.

In case UDLD is buggy, the worst case scenario would be:
- switch crash or
- switch isolation

A network meltdown in the entire layer 2 domain because of the
lack of UDLD feels like a bigger issue to me than a single
switch crash or isolation.

> It's just recursive problem trying to fix software defect by having
> another software feature running.

In order to be able to get up in the morning I have to assume that
the number of hot code paths on my boxes with unfixed
catastrophic defects amounts to less than 50%. I also assume that the
airbag in my car is more likely to save my life than to end it.

Therefore, if I have a shadow of a doubt that there is a scenario
in which my network will melt down, I will protect against that
scenario even though this means using more code paths.

So in your MPLS core you run ISIS/OSPF + LDP + RSVP +
BGP, but no BFD because of the complexity of the latter?

Regards,

Lukas