[c-nsp] 3750: SNMP-3-INPUT_QFULL_ERR, ssh session dies, show tech support fails, switch stack crashes on reload

Sebastian Beutel sebastian.beutel at rus.uni-stuttgart.de
Wed May 7 09:37:37 EDT 2014


Hi Jeff,

On Mon, May 05, 2014 at 10:46:28PM -0400, Jeff Kell wrote:
> On 5/5/2014 11:10 AM, Darren O'Connor wrote:
> > Never seen it myself, but googling around brings up a few things.
> >
> > Did this recently start? Any other switch on the same code having the
> > same issues or not? Generally if five different devices all start having
> > the same issue an external issue is to blame. Maybe your SNMP server is
> > sending a particular packet that this IOS code doesn't like?
> >
> > Have you tried restarting SNMP itself on the switch?
> 
> Are these stacks of more than two switches?  And are they the original
> 3750Gs, or something else?
>
So far, the five affected stacks each consist of 3 or 4 switches in
various combinations of C3750-48TS and C3750G-24TS-1U.
 
>
> We have had recurring problems with a 4-stack of 3750-48Gs that for
> various reasons end up with MALLOC errors (out of memory) and you can no
> longer establish an SSH, Telnet, nor even serial console connection
> "%Low on memory, try again later".
>
We did not see any error message indicating a lack of memory. And even
though a running SSH session died, it was still possible to reconnect to
the device.

> 
> This started with the 12.2 train and has continued into the 15.x train. 
> We are NOT yet on the latest-and-greatest which as explained to me by
> our account rep is a result of adding "bells and whistles" to the IOS
> while these original 3750s are already memory constrained.  Supposedly
> this was addressed in the most recent 15.x release to be more
> "conservative" about memory utilization.  However, our stack is
> presently "stuck" in the "Low on memory, try again later" state and will
> require a hard reload (power cycle).  Supposedly this only affects
> stacks of > 2 switches.  Simply power cycling the current stack the last
> time around lasted about an hour before running out of memory again. 
> They continue to forward packets (thankfully) but you can't do anything
> with them at all.  We plan an update to the latest 15.x release at the
> next maintenance window, but since this stack powers one of our primary
> server farms (top-of-racks), we can't just arbitrarily power cycle them.
> 
> TAC has been less than useful, and this started over a year ago, but
> seems to recur more often in the 15.x train.
> 
> If this sounds familiar, I can provide some case numbers of past
> attempts to remedy this... but previously a power-cycle would clear it
> up for a few months (while the 15.x train is down to hours).
>
We already searched the Cisco bug database but found nothing that came
close to what we observed. I also don't believe that we have a problem
with dead memory or the like. In any case, "sho mem stat" did not suggest
any lack of memory (as far as I understood its output).
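For anyone comparing "show memory statistics" output across stack members,
below is a small sketch that turns the table into numbers. It assumes the usual
IOS column layout (Head, Total(b), Used(b), Free(b), Lowest(b), Largest(b)) and
my reading of the columns: Lowest(b) is the lowest free amount seen since boot,
Largest(b) the largest contiguous free block. The sample text and the helper
names parse_mem_stats/pool_health are hypothetical and illustrative only, not
output from our stacks. A Largest(b) far below Free(b) would hint at
fragmentation, i.e. MALLOC failures despite plenty of free memory overall.

```python
# Hypothetical sample in the usual IOS "show memory statistics" layout;
# the numbers are illustrative only, not taken from the affected stacks.
SAMPLE = """\
                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
Processor    2B2E3A8   186420312    31587728   154832584   153788828   153708956
      I/O    7800000     8388608     2726988     5661620     5563312     5596220
"""


def parse_mem_stats(text):
    """Return {pool_name: {column: int}} for each memory pool row (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # ['Head', 'Total(b)', 'Used(b)', ...]
    pools = {}
    for ln in lines[1:]:
        fields = ln.split()
        name, values = fields[0], fields[1:]
        if len(values) != len(header):
            continue  # skip rows that don't match the assumed layout
        row = dict(zip(header, values))
        # "Head" is a hex address, not a byte count, so leave it out.
        pools[name] = {col: int(v) for col, v in row.items() if col != "Head"}
    return pools


def pool_health(pool):
    """Return (percent free, fragmentation ratio) for one pool (hypothetical helper).

    Fragmentation here is 1 - Largest/Free: near 0 means the free memory is
    mostly one contiguous block; near 1 means it is split into small pieces.
    """
    free = pool["Free(b)"]
    free_pct = 100.0 * free / pool["Total(b)"]
    frag = 1.0 - pool["Largest(b)"] / free if free else 0.0
    return free_pct, frag


if __name__ == "__main__":
    for name, pool in parse_mem_stats(SAMPLE).items():
        pct, frag = pool_health(pool)
        print(f"{name}: {pct:.1f}% free, lowest free {pool['Lowest(b)']} B, "
              f"fragmentation {frag:.3f}")
```

Pasting the real output of each stack member into SAMPLE would make it easier to
spot a pool whose Lowest(b) or Largest(b) is trending toward zero.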

Thanks,
      Sebastian.

