[c-nsp] Parity Errors and Cosmic Rays

Chris Roberts croberts at bongle.co.uk
Thu May 5 12:03:25 EDT 2005


> Is this actually a common problem? Or at least common enough 
> that I should expect to see it every other month or so? It 
> seems strange that this router has run for years and we've 
> never seen a memory parity error and now we've seen three in 
> three months.
> 

Sometime last year, we started seeing memory parity errors on our 7507s.
This was affecting one card. This gradually spread over the course of around
a month to 3 cards in the same platform, the first two of which were
replaced. This then spread to another chassis in the same rack, which then
started losing cards at the same rate over the course of a month. (See my
mails to this list at around the same time with around the same kind of
content as yours.) I'd run 7505s at other ISPs for ~5 or more years and
never seen anything like this. Cisco simply wanted to replace each of the
offending items of hardware, but this was not stopping the spread. We then
lost a PA-GE with parity errors in one of our 7206s in another rack in the
same suite.

After much sobbing we took the 7507s out and upgraded our 6509s to Sup720s,
which so far have been rock solid, besides some installation issues and
teething problems. I realise this isn't a possibility for everyone though.

Some things that were suggested at the time:
* Cosmic rays
* Static protection in your data centres
* Metal filings from people chopping floor tiles and such getting into the
aircon and from there into the kit
* Failing PSUs

Also, our offending 7507s were getting old (3-4 years apparently), but had
always been rock solid. I suspect it may have just been age that killed them
in the end; we never did find any trace of any of the above, although
obviously static and cosmic rays are hard to prove. At the time it was also
suggested that the TAC would be able to test the returned cards and provide
a breakdown of each card's failure mode and of the components they had to
replace, but that they would be loath to do this. Sure enough, we asked the
TAC to do this and they were loath to. We never followed it up, as we still
have most of the dead cards and didn't RMA them, but it might be something
you want to try.

> Any thoughts?
> 
> Thanks,
> John

Cheers,
Chris.
