[c-nsp] 6504-E crash after bringing up lots of BGP sessions

Andy B. globichen at gmail.com
Thu Dec 3 18:32:12 EST 2009


On Thu, Dec 3, 2009 at 11:54 PM, Eninja <eninja at gmail.com> wrote:
> Andy,
>
> Your snipped 'sh ver' post is inadequate to understand the root cause of
> this problem.
>
> Unicast or broadcast a full 'sh ver' (prior to a reload), 'sh stack', and
> crashinfo files from both SP and RP if available.
>
> eninja
>

Unfortunately that's all the information I've got. No crashinfo was
generated, and while I was watching live on the console the router did
nothing but reload. The output was:


System Bootstrap, Version 12.2(17r)SX5, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 2006 by cisco Systems, Inc.

System Bootstrap, Version 8.5(2)
Copyright (c) 1994-2007 by cisco Systems, Inc.
Cat6k-Sup720/SP processor with 1048576 Kbytes of main memory

There was really nothing before that, except lots of BGP "Up" messages
for the other routers inside the same AS.


#sh stacks
Minimum process stacks:
 Free/Size   Name
 5704/6000   OIR IOS Process
 5536/6000   IPC Zone Manager
 5688/6000   ICC Retry Q
 4480/6000   IPC delayed init
 5704/6000   CDP BLOB
 5648/6000   FM HA Sync
 5656/6000   L3 Manager HA
 5632/6000   Draco FIB process
 4536/6000   Delayed Init Late Reg
 3528/6000   eobc_init_process
 5208/6000   ICC Slave Comp. Up
 5584/6000   PM MP Process
 2008/3000   EARL INFO CAPABILITY process
 5568/6000   DHCPD Receive
 5480/6000   C6K ENV RP init
 5024/6000   SPAN Subsystem
 5416/6000   PostOfficeNet
11464/12000  Router Init
10896/12000  CDP Protocol
11704/12000  cdp init process
 8320/12000  Init
 5112/6000   Draco DFS Port Registation Proc
 4880/6000   IPC LC Port Opener
 3864/6000   Update prst
 5392/6000   RADIUS INITCONFIG
 4856/6000   LCC Configure
 4984/6000   SLB RF Active Proc
 4688/6000   CEF Reloader
 4144/6000   draco-oir-process:slot 1
 4224/6000   draco-oir-process:slot 3
 4808/6000   BGP Accepter
 4272/6000   BGP Open
 3992/6000   draco-oir-process:slot 4
 2704/3000   Rom Random Update Process
 4800/6000   TFTP Read Process
34824/36000  TCP Command
 5552/6000   Link Status process
 8528/12000  Virtual Exec
 8432/12000  SSH Process
 8016/12000  Exec

Interrupt level stacks:
Level    Called Unused/Size  Name
  1     1289528   7632/9000  Inband Interrupt
  2      379375   7592/9000  EOBC Interrupt
  3       10555   8456/9000  Management Interrupt
  4     1579543   8600/9000  Console Uart
  5           0   9000/9000  Mistral Error Interrupt
  7     2637841   8584/9000  NMI Interrupt Handler

***************************************************
******* Information of Last System Crash **********
***************************************************


Using bootflash:crashinfo.

%Error opening bootflash:crashinfo (File not found)

***************************************************
****** Information of Last System Crash - SP ******
***************************************************


The last crashinfo failed to be written.
Please verify the exception crashinfo configuration
the filesytem devices, and the free space on the
filesystem devices.
Using crashinfo_FAILED.

%Error opening crashinfo_FAILED (File not found)
#
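
One quick way to check the "Minimum process stacks" table above for
processes that have come close to running out of stack is a small script
like the one below. It is only a sketch: the function names
(parse_stacks, tightest) and the SAMPLE subset are made up for
illustration, and since the router had already reloaded, the numbers
describe the time since the reboot, not the state just before the crash.

```python
import re

# A few lines copied from the "Minimum process stacks" section above.
SAMPLE = """\
 5704/6000   OIR IOS Process
 3528/6000   eobc_init_process
 2008/3000   EARL INFO CAPABILITY process
 4808/6000   BGP Accepter
 4272/6000   BGP Open
34824/36000  TCP Command
"""

# Matches "<free>/<size>  <name>" at the start of a line. Interrupt-level
# rows start with "Level  Called" columns, so they do not match and are skipped.
LINE_RE = re.compile(r"^\s*(\d+)/(\d+)\s+(.+?)\s*$")


def parse_stacks(text):
    """Return (name, free, size, free_ratio) for each process stack row."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            free, size, name = int(m.group(1)), int(m.group(2)), m.group(3)
            rows.append((name, free, size, free / size))
    return rows


def tightest(rows, n=3):
    """Return the n rows with the smallest fraction of stack still free."""
    return sorted(rows, key=lambda r: r[3])[:n]


if __name__ == "__main__":
    for name, free, size, ratio in tightest(parse_stacks(SAMPLE)):
        print(f"{name:32s} {free:6d}/{size:<6d} {ratio:.0%} free")
```

In the full output above, the tightest process is eobc_init_process at
3528/6000, so nothing in this post-reload snapshot is anywhere near zero
free stack.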



Weeks ago, when the same crash happened, I caught this error message
from the console:


*** System received a Software forced crash ***
signal= 0x17, code= 0x24, context= 0x42352a54
PC = 0x402d1e6c, Cause = 0x3020, Status Reg = 0x34008002

System Bootstrap, Version 8.5(2)
Copyright (c) 1994-2007 by cisco Systems, Inc.
Cat6k-Sup720/SP processor with 1048576 Kbytes of main memory


I only saw it once; it never showed up again on later crashes. A little
research did not help, because all I could find for this error was a
password reset issue, and that does not make sense here: nobody has
physical access to this router but me.

I should mention that this router worked fine for more than 15 months.
We are constantly adding new peers and customers to it, so the
workload is growing. But as I said, this is not the busiest router in
my network.

As of now I really have no idea where to look or how I could at least
narrow down the problem.


Andy

