[nsp] VIP & RSP crashes: determining memory location

From: Steven W. Raymond (steven_raymond@eli.net)
Date: Mon Mar 04 2002 - 13:01:41 EST


Can anyone shed some light on how to read the output of crashinfo
files, particularly for determining which memory component is the
culprit in parity error crashes?
It is my understanding that memory parity errors can occur in the
following 7500 VIP & RSP components:
1) DRAM
2) SRAM
3) Processor cache memory

When a parity error happens, crashinfo files like the following two
examples are generated:

Error: primary data cache, fields: data, SysAD
virtual addr 0x63656803, physical addr(21:3) 0x256800, vAddr(14:12) 0x2000
virtual address corresponds to main:mainheap, cache word 0

             Low Data   High Data  Par    Low Data   High Data  Par
L1 Data  : 0:0x6320DE70 0x6320DE60 0x55 1:0x6320DE70 0x6320DE70 0x55
           2:0x6320DE70 0x6320DE70 0x55 3:0x6320DE70 0x6320DE70 0x55

             Low Data   High Data  Par    Low Data   High Data  Par
DRAM Data: 0:0x6320DE70 0x6320DE60 0x55 1:0x6320DE70 0x6320DE70 0x55
           2:0x6320DE70 0x6320DE70 0x55 3:0x6320DE70 0x6320DE70 0x55

or

Error: primary data cache, fields: data,
virtual addr 0x61F7E015, physical addr(21:3) 0x37E010, vAddr(14:12) 0x2000
virtual address corresponds to main:mainheap, cache word 2

             Low Data   High Data  Par    Low Data   High Data  Par
L1 Data  : 0:0x613885A0 0x613885A0 0xEE 1:0x62272B80 0x613885A0 0x9E
           2:0x61BCE810 0x61FCE810 0x99 3:0x61FCE810 0x61FCE810 0x99

             Low Data   High Data  Par    Low Data   High Data  Par
DRAM Data: 0:0x61FCE810 0x61FCE810 0x99 1:0x62272B80 0x61FCE810 0x99
           2:0x61BCE810 0x61FCE810 0x99 3:0x61FCE810 0x61FCE810 0x99
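
A note on the address fields in both examples: the reported
"physical addr(21:3)" value appears to be bits 21 through 3 of the
virtual address, left in place. This is only an observation from these
two dumps, not something confirmed by documentation, and the helper name
bit_field is made up for illustration. A quick check in Python:

```python
def bit_field(value, high, low):
    """Return bits high..low of value, left in place (not shifted down)."""
    mask = ((1 << (high + 1)) - 1) & ~((1 << low) - 1)
    return value & mask


if __name__ == "__main__":
    # (virtual address, physical addr(21:3) as printed in the crashinfo)
    samples = ((0x63656803, 0x256800), (0x61F7E015, 0x37E010))
    for vaddr, reported in samples:
        computed = bit_field(vaddr, 21, 3)
        print(f"vaddr={vaddr:#x} computed={computed:#x} reported={reported:#x}")
```

The vAddr(14:12) field does not follow from the same extraction in an
obvious way, so it is not decoded here.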

Question:
Is it possible (or even fruitful) to use this information to determine
in which memory component the parity error happened? I would
prefer to replace just the failing DIMMs or SRAM, rather than taking the
shotgun approach of replacing the whole VIP when a parity error occurs.
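
As a starting point, the L1 Data and DRAM Data rows of a single crash
record can be compared word by word, to see where the cached copy and
the DRAM copy disagree. The Python sketch below is only a comparison aid
under assumptions: the function names parse_words and diff_words are
hypothetical, the parsing is based solely on the layout of the two
examples above, and a mismatch does not by itself prove which component
is at fault.

```python
import re

# One "index:0xLOW 0xHIGH 0xPAR" group, as printed in the dumps.
WORD_RE = re.compile(
    r"\b(\d+):(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)"
)
LABEL_RE = re.compile(r"\s*(L1 Data|DRAM Data)\s*:")

# Data rows of the second example, copied from the crashinfo output.
SAMPLE = """\
L1 Data : 0:0x613885A0 0x613885A0 0xEE 1:0x62272B80 0x613885A0 0x9E
           2:0x61BCE810 0x61FCE810 0x99 3:0x61FCE810 0x61FCE810 0x99

DRAM Data: 0:0x61FCE810 0x61FCE810 0x99 1:0x62272B80 0x61FCE810 0x99
           2:0x61BCE810 0x61FCE810 0x99 3:0x61FCE810 0x61FCE810 0x99
"""


def parse_words(text):
    """Parse ONE crash record into {section: {index: (low, high, parity)}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        label = LABEL_RE.match(line)
        if label:
            current = sections.setdefault(label.group(1), {})
        if current is None:
            continue  # lines before the first data section are ignored
        for idx, low, high, par in WORD_RE.findall(line):
            current[int(idx)] = (int(low, 16), int(high, 16), int(par, 16))
    return sections


def diff_words(sections):
    """List (word, field, l1_value, dram_value, xor) where L1 and DRAM differ."""
    l1, dram = sections["L1 Data"], sections["DRAM Data"]
    diffs = []
    for idx in sorted(l1.keys() & dram.keys()):
        for name, a, b in zip(("low", "high", "parity"), l1[idx], dram[idx]):
            if a != b:
                diffs.append((idx, name, a, b, a ^ b))
    return diffs


if __name__ == "__main__":
    for idx, name, a, b, x in diff_words(parse_words(SAMPLE)):
        print(f"word {idx} {name}: L1={a:#x} DRAM={b:#x} xor={x:#x}")
```

Run against the second example, this reports differences in words 0 and
1 only; the first example shows no differences between the L1 and DRAM
rows.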

Cisco usually recommends waiting for a second parity error before
replacement. It would help if I could identify precisely where the
faults are occurring, but I am stymied: one day a crash seems to come
from a VIP, and two weeks later a different parity error happens on the
RSP or even on another VIP. The problems appear to shift all over the
place when one component is failing.

Any advice on this topic is appreciated.

Thank you


