[c-nsp] sup720's crashing and hanging

matt carter matt at iseek.com.au
Fri Aug 25 21:34:28 EDT 2006


hi all,

while there is so much discussion about sup720 i would like to run a weird
problem by you. i have recently got two additional 7606 sup 720 3bxl,
nothing special... but i find that no matter what IOS i try, whether i boot
any of 12.2(18)SXF/SXF4 or 12.2(33)SRA or 12.2(17d)SXB11a from the CF disk0:
or the SRAM supervisor bootflash:, the boxes sometimes boot and sometimes
hang at the ATA monitor library or after decompressing the image, with a
software crash here and there (SXF4 in particular crashes a lot). both
sups are using rommon 8.4(2). the ios images have been verified after copying
and the flash cards reformatted in the device, to be sure to be sure, but it
makes no difference. in any case the SXB image was shipped on the sup
bootflash by cisco and it hangs too. generally i find a trend that if you let
them auto boot using "boot system disk0:" they will boot, but not ALWAYS,
occasionally they will hang. however, if you break to rommon and boot
manually, it's the opposite: they will nearly always hang, only occasionally
booting successfully. obviously a box that doesn't ALWAYS boot isn't suitable
for production.

the secondary duty manager for the aussie tac has his eyes on it and
apparently cisco got a bunch of senior engineers together on wednesday or so
for a "power solve" (having remote console access arranged etc) but we're no
closer to understanding what the hell is going on really. my SE says he's
never seen anything like it. cisco have rma'ed the sups and the replacements
are on their way, but i'm not convinced, because i took another 3bxl with
rommon 8.1(3) from a previously working router and i see similar hanging
after decompress etc across a range of IOS. that's a sup that was previously
in production, tried in both of these chassis. could i have some chassis
problem? i feel i've run out of common denominators to eliminate... or am i
missing something blindingly obvious here?

this is a matrix i sent to cisco of booting 12.2(18)SXF 
(two routers powered on simultaneously)
(12.2(18)SXF4 12.2(17d)SXB11a 12.2(33)SRA hang just as bad)

test conducted       left router        right router
test1 auto boot      boot ok            boot ok
test2 auto boot      boot ok            boot ok
test3 auto boot      boot ok            boot ok
test4 auto boot      software crash     software crash
test5 boot disk0:    boot ok            boot ok
test6 boot disk0:    hang decompress    hang decompress
test7 boot disk0:    hang ata monitor   hang decompress
test8 boot disk0:    hang decompress    boot ok
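
for anyone who wants to keep extending the matrix above, here is a minimal
python sketch that tallies the outcomes per boot method. the rows are copied
straight from the matrix; the names RESULTS and tally are just made up for
this example, and eight tests per method is far too few to prove anything,
it only makes the auto boot vs manual boot trend easier to see.

```python
from collections import Counter

# (test, boot method, left router, right router), copied from the matrix above
RESULTS = [
    ("test1", "auto boot", "boot ok", "boot ok"),
    ("test2", "auto boot", "boot ok", "boot ok"),
    ("test3", "auto boot", "boot ok", "boot ok"),
    ("test4", "auto boot", "software crash", "software crash"),
    ("test5", "boot disk0:", "boot ok", "boot ok"),
    ("test6", "boot disk0:", "hang decompress", "hang decompress"),
    ("test7", "boot disk0:", "hang ata monitor", "hang decompress"),
    ("test8", "boot disk0:", "hang decompress", "boot ok"),
]


def tally(results):
    """Count outcomes per boot method, pooling both routers."""
    counts = {}
    for _test, method, left, right in results:
        counter = counts.setdefault(method, Counter())
        counter[left] += 1
        counter[right] += 1
    return counts


summary = tally(RESULTS)
for method, counter in summary.items():
    total = sum(counter.values())
    print(f"{method}: {counter['boot ok']}/{total} clean boots, {dict(counter)}")
```

on the data above this gives 6 of 8 clean boots for auto boot and 3 of 8 for
a manual boot from rommon.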

i took this small video where you can see both chassis powered on
simultaneously but one boots and one doesn't (test8). as to which one
boots and which one hangs, it chops and changes:
http://matt.matt.name/tmp/test8-1.wmv .. when it hangs after decompress, the
sup lights go from all orange to status=green like it's about to kick into
the next phase, but it just sits there forever. we left it for a few hours to
be sure to be sure. of course we have taken everything out of the chassis
except the sup.


rommon 2 > dir bootflash:
         File size           Checksum   File name
  41123924 bytes (0x2738054)  0xd1c02453    s72033-psv-mz.122-17d.SXB11a.bin
rommon 3 > boot bootflash:
Loading image, please wait ...
 

Self decompressing the image :
#################################################
############################################################################
####
############################################################################
####
##################################################################### [OK]

*HUNG*


Initializing ATA monitor library...
string is disk0:s72033-ipservices_wan-mz.122-18.SXF.bin
Loading image, please wait ...


Initializing ATA monitor library...

Self extracting the image... [OK]
Self decompressing the image :
#################################################
################################################ [OK]

*HUNG*

if we actually get past that stage and the IOS boots, we might end up with
a software crash, eg

*** System received a Software forced crash *** signal= 0x17, code= 0x24,
context= 0x4219eb04 PC = 0x402956cc, Cause = 0x1020, Status Reg = 0x34008002

*** System received a Software forced crash *** signal= 0x17, code= 0x24,
context= 0x4219eb04 PC = 0x402956cc, Cause = 0x1020, Status Reg = 0x34008002

*** System received a Software forced crash *** signal= 0x17, code= 0x24,
context= 0x4219eb04 PC = 0x402956cc, Cause = 0x1020, Status Reg = 0x34008002
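
a quick way to check whether repeated crashes like the ones above are the
same crash each time is to pull the register fields out of the console
capture and compare them. this is only a rough python sketch: the regex and
the function name parse_crashes are assumptions made for this example, not
any cisco tool, and it says nothing about what the values actually mean.

```python
import re

# console capture pasted from above (each message wraps over two lines)
CRASH_LOG = """\
*** System received a Software forced crash *** signal= 0x17, code= 0x24,
context= 0x4219eb04 PC = 0x402956cc, Cause = 0x1020, Status Reg = 0x34008002

*** System received a Software forced crash *** signal= 0x17, code= 0x24,
context= 0x4219eb04 PC = 0x402956cc, Cause = 0x1020, Status Reg = 0x34008002

*** System received a Software forced crash *** signal= 0x17, code= 0x24,
context= 0x4219eb04 PC = 0x402956cc, Cause = 0x1020, Status Reg = 0x34008002
"""

# field name, optional spaces around "=", then a hex value
FIELD_RE = re.compile(
    r"(signal|code|context|PC|Cause|Status Reg)\s*=\s*(0x[0-9a-fA-F]+)"
)


def parse_crashes(text):
    """Return one dict of field -> int per crash message found in text."""
    crashes = []
    # everything before the first marker is not a crash message, so skip it
    for chunk in text.split("*** System received")[1:]:
        fields = {name: int(value, 16) for name, value in FIELD_RE.findall(chunk)}
        crashes.append(fields)
    return crashes


crashes = parse_crashes(CRASH_LOG)
same = all(crash == crashes[0] for crash in crashes)
print(f"{len(crashes)} crashes, identical registers: {same}")
print(f"PC = {crashes[0]['PC']:#x}")
```

here all three messages carry the same signal, code, context and PC.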


;--matt





