[c-nsp] RAM thing

Phil Mayers p.mayers at imperial.ac.uk
Mon Feb 17 19:47:53 EST 2014


On 17/02/2014 20:53, Saku Ytti wrote:
> On (2014-02-17 19:13 +0000), Phil Mayers wrote:
>
>> As someone else has pointed out, the Cisco description is of a
>> sudden hard failure triggered by a power cycle, not some kind of
>> progressive degradation AFAICT.
>>
>> Do you have information to the contrary?
>
> No. It was more of a general question of what type of memory failure is
> acceptable and will we pay premium for product which has more graceful
> failure-modes.

Honestly, I think we're a long way from the hardware failure modes being 
the main issue of modern networking devices... most of it can't even do 
the job it's advertised for, for months or years after release until 
software stabilises.

Personally I think the blatant inability to deliver reliable software is 
more of a threat than hardware failure right now. But then I'm feeling 
particularly grumpy as I have 9 support cases open with 3 vendors right 
now...

(At this precise moment in time, I'd settle for an edge switch which can
do decent DHCP/IPv6 security while costing under £4k and being <90cm
deep, a core router with MPLS and working netflow, and a firewall that
didn't crash when you typed "show session". The quality of the RAM
inside is so far down my list of complaints it's not even funny...)

But yes, in theory, I don't mind paying more for ECC RAM and things like 
GOLD, ability to degrade to a subset of fabric channels, and so on. To 
what degree is hard to quantify - they're not optional extras on the 
platforms that have them, and we buy those platforms for other, feature 
reasons.

> I recently ran into (likely) memory issue which caused sporadic corruption in
> IP header, had it occurred anywhere else than IP headers or were we IPv6 only,
> it would have been invisible to me. Is there 98.7% probability that some kit
> in my network is currently corrupting packets outside header?

Well, a 98.7% probability per-packet is obviously catastrophic.

0.1% is pretty terrible too.

1e-12 (i.e. 10^-12) is OTOH negligible.

So clearly it's not binary yes/no. I suspect for most operators there's 
a sweet spot in pricing that is a function of what upper-layers (and 
thus customers) will tolerate, price, and what else you lose in the 
tradeoff.
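
To put the per-packet figures above in perspective, here is a rough
back-of-envelope sketch. It assumes a hypothetical device forwarding
1 Mpps, assumes corruption events are independent, and reads the last
figure as 10^-12; none of those numbers come from a real measurement,
and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def corrupted_per_day(p_corrupt: float, packets_per_second: float) -> float:
    """Expected number of corrupted packets per day at a constant packet rate."""
    return p_corrupt * packets_per_second * SECONDS_PER_DAY


def mean_seconds_between_corruptions(p_corrupt: float, packets_per_second: float) -> float:
    """Average gap between corrupted packets, assuming independent events."""
    return 1.0 / (p_corrupt * packets_per_second)


# Assumed forwarding rate (hypothetical): 1 million packets per second.
ASSUMED_PPS = 1_000_000

# The three per-packet corruption probabilities discussed above.
for p in (0.987, 0.001, 1e-12):
    per_day = corrupted_per_day(p, ASSUMED_PPS)
    gap = mean_seconds_between_corruptions(p, ASSUMED_PPS)
    print(f"p={p:g}: ~{per_day:.4g} corrupted packets/day, one every ~{gap:.4g} s")
```

At that assumed rate, 0.1% still means on the order of a thousand bad
packets every second, while 10^-12 works out to roughly one bad packet
every eleven or twelve days. Where "tolerable" sits between those is
the pricing question.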

