[c-nsp] Cisco VIP2-50+PA-2FE-TX second ethernet port bug

Rodney Dunn rodunn at cisco.com
Fri Apr 8 11:08:35 EDT 2005


Gert,

It had been a while since I had looked at the parity error
detection logic for the VIPs, so I reviewed it again:

http://www.cisco.com/en/US/customer/products/hw/modules/ps2643/products_tech_note09186a0080094b15.shtml

Look at the "VIP PCI Bus Parity Error" section.

This says we've detected bad parity on the PCI bus coming
from the PA in bay 0.
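
For anyone who wants to check the two flagged registers in the quoted
syslog by hand, here is a rough sketch. It is not the VIP's actual
decoding logic. It assumes the standard status-bit layout from the PCI
Local Bus and PCI-to-PCI bridge specifications, where the status field is
the upper 16 bits of the dword at config offset 0x04 (primary) and 0x1C
(secondary). The table and function names are made up for illustration.

```python
# Error bits of the PCI status register (upper 16 bits of the dword).
# Assumption: standard PCI / PCI-to-PCI bridge layout, not Cisco-specific.
PRIMARY_STATUS_BITS = {
    15: "Detected Parity Error",
    14: "Signaled System Error",
    13: "Received Master Abort",
    12: "Received Target Abort",
    11: "Signaled Target Abort",
    8: "Master Data Parity Error",
}

# On the secondary side of a bridge, bit 14 means SERR# was received
# rather than signaled; the other error bits keep the same meaning.
SECONDARY_STATUS_BITS = dict(PRIMARY_STATUS_BITS)
SECONDARY_STATUS_BITS[14] = "Received System Error"


def decode_status(dword, bit_names):
    """Return the names of the error bits set in the upper 16 bits of dword."""
    status = (dword >> 16) & 0xFFFF
    return [
        name
        for bit, name in sorted(bit_names.items(), reverse=True)
        if status & (1 << bit)
    ]


# Values taken from the syslog output quoted in this thread.
primary = decode_status(0x42800147, PRIMARY_STATUS_BITS)      # offset 0x04
secondary = decode_status(0x82807020, SECONDARY_STATUS_BITS)  # offset 0x1C

print("primary  (0x04):", primary)
print("secondary (0x1C):", secondary)
```

The result matches the two annotations the VIP printed: a system error
signaled on the primary bus, and a parity error detected on the secondary
(PA-side) bus.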

The VIP4 would report bad parity on the PCI bus a little differently,
and from what I can tell from that bug it does appear to be
bad parity on the PCI bus there too.

Let me try this in the lab again today.

Has anyone seen this on anything other than 12.0S based
code?

Rodney



On Fri, Apr 08, 2005 at 04:11:16PM +0200, Gert Doering wrote:
> Hi,
> 
> On Fri, Apr 08, 2005 at 09:10:16AM -0400, Rodney Dunn wrote:
> > I think I asked about this before and nobody responded.
> 
> Sorry, it's still sitting in my TODO list.  Unfortunately, I've lost the
> crash dumps, so I couldn't send them to you...
> 
> > What is happening?
> > Vague comments don't help me.
> 
> What we see is:
> 
>   - 7507, RSP4
>   - VIP2-50 in VIP slot 0  (2 different VIP2's tried, different VIP slots tried)
>   - PA-2FE-TX  in PA bay 0 (2 different PA-2FEs tried, bay 1 is empty)
> 
> FastE0/0/0 runs perfectly smooth.
> 
> I can enable FastE0/0/1, but as soon as there is noticeable traffic,
> like a full BGP session coming up, or a "ping / sweep range of sizes",
> the VIP2-50 crashes.
> 
> IOS is 12.0(27)S2  
> 
> (but I *think* the crashes happened with 12.0(25)S or 12.0(26)S).
> 
> 
> If the crash happens, what ends up in syslog (*this* I have saved :) )
> is the following:
> 
> ------------------- snip ------------------
> Jan 26 18:17:14 c7500 94: %SYS-5-CONFIG_I: Configured from console by gert on console
> Jan 26 18:17:48 c7500 100: %VIP2-3-MSG: slot0 VIP-3-PCI_BUS0_SYSERROR: PCI bus 0 system error.
> Jan 26 18:17:48 c7500 101: %VIP2-1-MSG: slot0 PMA error register = 0082381800000000
> Jan 26 18:17:48 c7500 102: %VIP2-1-MSG: slot0     PCI master address = 0823818
> Jan 26 18:17:48 c7500 103: %VIP2-1-MSG: slot0 PA Bay 0 Upstream PCI-PCI Bridge, Handle=0
> Jan 26 18:17:48 c7500 104: %VIP2-1-MSG: slot0 DEC21050 bridge chip, config=0x0
> Jan 26 18:17:48 c7500 105: %VIP2-1-MSG: slot0 (0x00):dev, vendor id       = 0x00011011
> Jan 26 18:17:48 c7500 106: %VIP2-1-MSG: slot0 (0x04):status, command      = 0x42800147
> Jan 26 18:17:48 c7500 107: %VIP2-1-MSG: slot0          Signaled System Error  on primary bus
> Jan 26 18:17:48 c7500 108: %VIP2-1-MSG: slot0 (0x08):class code, revid    = 0x06040002
> Jan 26 18:17:48 c7500 109: %VIP2-1-MSG: slot0 (0x0C):hdr, lat timer, cls  = 0x00010000
> Jan 26 18:17:48 c7500 110: %VIP2-1-MSG: slot0 (0x18):sec lat,cls & bus no = 0x00010100
> Jan 26 18:17:48 c7500 111: %VIP2-1-MSG: slot0 (0x1C):sec status, io base  = 0x82807020
> Jan 26 18:17:48 c7500 112: %VIP2-1-MSG: slot0          Detected Parity Error  on secondary bus
> Jan 26 18:17:48 c7500 113: %VIP2-1-MSG: slot0 (0x20):mem base & limit     = 0x01F00000
> Jan 26 18:17:48 c7500 114: %VIP2-1-MSG: slot0 (0x24):prefetch membase/lim = 0x0000FE00
> Jan 26 18:17:48 c7500 115: %VIP2-1-MSG: slot0 (0x3C):bridge ctrl          = 0x00030000
> Jan 26 18:17:48 c7500 116: %VIP2-1-MSG: slot0 (0x40):arb/serr, chip ctrl  = 0x00100000
> Jan 26 18:17:48 c7500 117: %VIP2-1-MSG: slot0 (0x44):pri/sec trgt wait t. = 0x00000000
> Jan 26 18:17:49 c7500 118: %VIP2-1-MSG: slot0 (0x48):sec write attmp ctr  = 0x00FFFFFF
> Jan 26 18:17:49 c7500 119: %VIP2-1-MSG: slot0 (0x4C):pri write attmp ctr  = 0x00FFFFFF
> Jan 26 18:18:02 c7500 120: %VIP2-3-MSG: slot0 VIP-3-SVIP_RELOAD: SVIP Reload is called. 
> Jan 26 18:18:02 c7500 121: %VIP2-3-MSG: slot0 VIP-3-SYSTEM_EXCEPTION: VIP System Exception occurred sig=22, code=0x0, context=0x60A95688
> Jan 26 18:18:02 c7500 122:  
> Jan 26 18:18:03 c7500 123: %DBUS-3-DBUSINTERRSWSET: Slot 0, Internal Error due to VIP crash
> Jan 26 18:18:31 c7500 124: %SYS-3-CPUHOG: Task ran for 9420 msec (59/14), process = OIR Handler, PC = 4043FA88.
> Jan 26 18:18:31 c7500 125: -Traceback= 4043FA90
> ------------------- snip ------------------
> 
> It certainly looks like a "defective PA or VIP2", but as I said, we've
> already re-seated the PA, then swapped VIP2 and PA, and the new PA-2FE was 
> tested with a large "sweep range" ping in a 7200 (both ports, with no ill 
> effects).
> 
> > The only thing I know about is:
> > 
> > CSCsa50332
> > Externally found moderate defect: Assigned (A)
> > VIP4-80 with PA-2FE-TX may crash with parity error
> 
> I'm not sure what the exact difference between the VIP2-50 and VIP4-80
> is (the "performance PDF" claims same performance numbers, so maybe 
> those aren't *that* different), but it looks like it...
> 
> > I've seen this with one customer and we've tried
> > very hard to recreate this in the lab and have
> > not been able to do it yet.
> 
> I had reported my problems on this list (cisco-nsp), and some other
> writers told me "we have not seen the problem, our PA-2FE works fine
> in a VIP2-50".  So it's working for some, and not for others :(
> 
> My hardware details:
> 
> ------------------- snip ----------------------
> Slot 0:
> 	Physical slot 0, ~physical slot 0xF, logical slot 0, CBus 0
> 	Microcode Status 0x4
> 	Master Enable, LED, WCS Loaded
> 	Board is analyzed 
> 	Pending I/O Status: None
> 	EEPROM format version 1
> 	VIP2 R5K controller, FRU: VIP2-50=, HW rev 2.03, board revision A0
> 	Serial number: 18952807  Part number: 73-2167-06
> 	Test history: 0x00        RMA number: 00-00-00
> 	Flags: cisco 7000 board; 7500 compatible
> 
> 	EEPROM contents (hex):
> 	  0x20: 01 1E 02 03 01 21 32 67 49 08 77 06 00 00 00 00
> 	  0x30: 50 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00
> 
> 	Slot database information:
> 	Flags: 0x4	Insertion time: 0x3EC8 (34w2d ago)
> 
> 	Controller Memory Size: 128 MBytes DRAM, 4096 KBytes SRAM
> 
> 	PA Bay 0 Information:
> 		Dual Port Fast Ethernet (RJ45), 2 ports, FRU: PA-2FE-TX=
>                 EEPROM format version 4
> 		HW rev 1.00, Board revision B0
> 		Serial number: JAE064006VR  Part number: 73-5419-06 
> 
> ------------------- snip ----------------------
> 
> PA Bay 1 is empty.
> 
> gert
> -- 
> USENET is *not* the non-clickable part of WWW!
>                                                            //www.muc.de/~gert/
> Gert Doering - Munich, Germany                             gert at greenie.muc.de
> fax: +49-89-35655025                        gert at net.informatik.tu-muenchen.de

