[c-nsp] Parity Errors and Cosmic Rays
Joe McGuckin
joe at via.net
Thu May 5 14:53:14 EDT 2005
Don't forget zinc whiskers as a potential culprit.
On 5/5/05 9:03 AM, "Chris Roberts" <croberts at bongle.co.uk> wrote:
>> Is this actually a common problem? Or at least common enough
>> that I should expect to see it every other month or so? It
>> seems strange that this router has run for years and we've
>> never seen a memory parity error and now we've seen three in
>> three months.
>>
>
> Sometime last year, we started seeing memory parity errors on our 7507s.
> This was affecting one card. This gradually spread over the course of around
> a month to 3 cards in the same platform, the first two of which were
> replaced. This then spread to another chassis in the same rack, which then
> started losing cards at the same rate over the course of a month. (See my
> mails to this list at around the same time with around the same kind of
> content as yours). I'd run 7505s at other ISPs for ~5 or more years and
> never seen anything like this. Cisco simply wanted to replace each of the
> offending items of hardware; however, this was not stopping the spread. We then
> lost a PA-GE with parity errors in one of our 7206s in another rack in the
> same suite.
>
> After much sobbing we took the 7507s out and upgraded our 6509s to Sup720s,
> which so far have been rock solid, besides some installation issues and
> teething problems. I realise this isn't a possibility for everyone though.
>
> Some things that were suggested at the time:
> * Cosmic rays
> * Static protection in your data centres
> * Metal filings from people cutting floor tiles and the like getting into
> the aircon, and from there into kit
> * Failing PSUs
>
> Also, our offending 7507s were getting old (3-4 years apparently), but had
> always been rock solid. I suspect it may have just been age that killed them
> in the end; we never did find any trace of any of the above, although
> obviously static and cosmic rays are hard to prove. At the time it was also
> suggested that TAC would be able to test the returned cards, give you a
> breakdown of each card's failure mode and tell you which components they had
> to replace, but that they would be loath to do this. Sure enough, we asked
> TAC to do this and they were loath to do it. We never followed it up, as we
> still have most of the dead cards and didn't RMA them, but it might be
> something you want to do.
>
>> Any thoughts?
>>
>> Thanks,
>> John
>
> Cheers,
> Chris.
>
--
Joe McGuckin
ViaNet Communications
994 San Antonio Road
Palo Alto, CA 94303
Phone: 650-213-1302
Cell: 650-207-0372
Fax: 650-969-2124