[c-nsp] IOS Upgrade to SXI3

Charles Spurgeon c.spurgeon at mail.utexas.edu
Tue Dec 15 18:51:18 EST 2009


On Fri, Dec 11, 2009 at 07:44:33AM -0800, Bautista, Noel wrote:

> We're contemplating upgrading our SUP 720 3BXL from
> 12.2(18)SXF15a native IOS to 12.2(33)SXI3 modular IOS, but I read
> in the release notes that the "install" command has been
> deprecated.  On Cisco's Safe Harbor IOS Release, they have tested
> and recommend upgrading to modular 12.2(33)SXI3.  There's no
> explanation of why they deprecated the "install" command, and I'm
> waiting for our Cisco SE's response.  I'd appreciate any feedback from
> those people who have upgraded to SXI3, in modular or otherwise.

We upgraded three core routers to monolithic 12.2(33)SXI3 on Sunday,
Dec 13.

One of the upgraded routers started throwing SNMP input queue errors
after several hours of runtime. All three routers are polled by the
same servers asking for the same OIDs, but only one of the upgraded
routers has thrown any SNMP errors: 
"Dec 14 14:19:50: %SNMP-3-INPUT_QFULL_ERR: Packet dropped due to input queue full"

SNMP graphing stopped working at the same time these error messages
appeared.

In an attempt to clear the errors, we applied these commands, which
we found while searching for information on this error:
snmp-server view public-view iso included
snmp-server view public-view ciscoMemoryPoolMIB excluded

At roughly the same time we applied those SNMP config lines, the SP
CPU went to 100 percent load, where it has remained stuck ever since.
The RP CPU is running normally.

We have opened a TAC case, run a number of debugs, removed all SNMP
commands, etc. But the SP CPU is still pegged and we haven't been able
to find a smoking gun.

The biggest process load on the SP appears to be from an Async write
process:
--------------------
NOCA9-sp#show proc cpu | exc 0.00
Load for five secs: 100%/13%; one minute: 99%; five minutes: 99%
Time source is hardware calendar, 10:46:59.677 CST Mon Dec 14 2009

CPU utilization for five seconds: 100%/13%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process 
  42       52936      2280      23217  0.63%  0.07%  0.01%   0 Per-minute Jobs  
  93    51573408   1269609      40621 67.46% 65.15% 64.79%   0 Async write proc 
 111     2197532   3855803        569  1.91%  1.88%  1.91%   0 slcp process   
--------------------
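
For anyone who wants to rank the busiest processes from captured output
like the above, here is a minimal Python sketch. It assumes the column
layout shown in this capture; the function name parse_proc_cpu and the
dictionary keys are illustrative choices, not anything defined by IOS.
In the summary line the first percentage is total CPU and the second is
time spent at interrupt level.

```python
import re

# Sample text copied from the "show proc cpu | exc 0.00" capture above.
SAMPLE = """\
CPU utilization for five seconds: 100%/13%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  42       52936      2280      23217  0.63%  0.07%  0.01%   0 Per-minute Jobs
  93    51573408   1269609      40621 67.46% 65.15% 64.79%   0 Async write proc
 111     2197532   3855803        569  1.91%  1.88%  1.91%   0 slcp process
"""

# Summary line: total CPU % / interrupt-level CPU % over five seconds.
SUMMARY = re.compile(r"CPU utilization for five seconds: (\d+)%/(\d+)%")

# One process row; the process name may contain spaces, so it is taken
# as everything after the TTY column.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<runtime>\d+)\s+(?P<invoked>\d+)\s+(?P<usecs>\d+)"
    r"\s+(?P<s5>[\d.]+)%\s+(?P<m1>[\d.]+)%\s+(?P<m5>[\d.]+)%"
    r"\s+(?P<tty>\d+)\s+(?P<name>.+?)\s*$"
)


def parse_proc_cpu(text):
    """Return (total_pct, interrupt_pct, rows sorted by 5Sec CPU, highest first)."""
    total = interrupt = None
    procs = []
    for line in text.splitlines():
        m = SUMMARY.search(line)
        if m:
            total, interrupt = int(m.group(1)), int(m.group(2))
            continue
        m = ROW.match(line)
        if m:
            procs.append({
                "pid": int(m.group("pid")),
                "runtime_ms": int(m.group("runtime")),
                "five_sec": float(m.group("s5")),
                "one_min": float(m.group("m1")),
                "five_min": float(m.group("m5")),
                "name": m.group("name"),
            })
    procs.sort(key=lambda p: p["five_sec"], reverse=True)
    return total, interrupt, procs


total, interrupt, procs = parse_proc_cpu(SAMPLE)
print(f"total {total}%, interrupt {interrupt}%, process level {total - interrupt}%")
for p in procs:
    print(f'{p["pid"]:>5} {p["five_sec"]:6.2f}% {p["name"]}')
```

Note that the "| exc 0.00" filter drops every row containing the string
0.00 in any column, so the rows that remain need not add up to the
process-level figure in the summary line.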

We ran debugs on SNMP packets and requests and found that the SNMP
traffic consists of well-behaved queries from just our set of
servers, polling only the MIB variables needed, and that the request
volume is not high.

Meanwhile, an insane number of VeryBig buffers are being created on
the RP (16080 so far), and an equally insane number of Medium buffers
on the SP (27392 so far):
--------------------
RP
--------------------
VeryBig buffers, 4520 bytes (total 1013, permanent 10, peak 1016 @ 14:51:06):
     12 in free list (0 min, 100 max allowed)
     584335 hits, 21308 misses, 15077 trims, 16080 created
     14417 failures (0 no memory)

--------------------
SP
--------------------
Medium buffers, 256 bytes (total 30359, permanent 3000, peak 30359 @ 00:00:00):
     66 in free list (64 min, 3000 max allowed)
     1659825 hits, 9193 misses, 33 trims, 27392 created
     0 failures (0 no memory)
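
The counters in the two pools above can be cross-checked with a short
Python sketch. It assumes the "show buffers" layout seen in this
capture; parse_pools, summarize, and the key names are hypothetical
helpers introduced for illustration. The check is that the buffers held
beyond the permanent allocation (total minus permanent) should equal
created minus trims.

```python
import re

# The RP VeryBig and SP Medium pools from the capture above, combined
# into one sample string for convenience.
SAMPLE = """\
VeryBig buffers, 4520 bytes (total 1013, permanent 10, peak 1016 @ 14:51:06):
     12 in free list (0 min, 100 max allowed)
     584335 hits, 21308 misses, 15077 trims, 16080 created
     14417 failures (0 no memory)
Medium buffers, 256 bytes (total 30359, permanent 3000, peak 30359 @ 00:00:00):
     66 in free list (64 min, 3000 max allowed)
     1659825 hits, 9193 misses, 33 trims, 27392 created
     0 failures (0 no memory)
"""

HEADER = re.compile(
    r"^(?P<name>\w+) buffers, (?P<size>\d+) bytes "
    r"\(total (?P<total>\d+), permanent (?P<permanent>\d+)"
)
COUNTER = re.compile(r"(\d+) (hits|misses|trims|created|failures)")


def parse_pools(text):
    """Parse each buffer pool block into a dict of integer counters."""
    pools = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            pools.append({
                "name": m.group("name"),
                "size": int(m.group("size")),
                "total": int(m.group("total")),
                "permanent": int(m.group("permanent")),
            })
        elif pools:
            # Counter lines belong to the most recent pool header.
            for value, key in COUNTER.findall(line):
                pools[-1][key] = int(value)
    return pools


def summarize(pool):
    """Derive dynamic buffer counts and the miss rate for one pool."""
    return {
        "name": pool["name"],
        "dynamic": pool["total"] - pool["permanent"],
        "net_created": pool["created"] - pool["trims"],
        "miss_rate": pool["misses"] / (pool["hits"] + pool["misses"]),
    }


for pool in parse_pools(SAMPLE):
    s = summarize(pool)
    print(f'{s["name"]:8} dynamic={s["dynamic"]} '
          f'net_created={s["net_created"]} miss_rate={s["miss_rate"]:.4f}')
```

On these numbers both pools balance: 1003 dynamic VeryBig buffers on
the RP and 27359 dynamic Medium buffers on the SP. The two pools look
different, though. The RP pool has trimmed nearly as many buffers as it
created, while the SP pool has trimmed only 33 of 27392, which may mean
the SP buffers are being held rather than returned. That reading is a
guess from the counters alone.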

Other than this, we have not been able to find any useful info.

Also, we have been seeing errors on a port-channel associated with one
of the other routers that was upgraded to SXI3. 

The upstream router has been receiving bursts of errors from the
upgraded router on the two 10GigE interfaces that make up the port
channel. As far as we can tell these interfaces were running clean
until SXI3 was loaded, but we're still investigating this issue.

-Charles

Charles E. Spurgeon / UTnet
UT Austin ITS / Networking
c.spurgeon at its.utexas.edu / 512.475.9265

