[c-nsp] memory leaking in IOS 12.2(58)SE1 on 2960's

Tóth András diosbejgli at gmail.com
Wed Jul 20 15:35:19 EDT 2011


Hi Jiri,

When you mention the logs are useless, do you mean you did not find
anything in the logs after logging on to the switch which freed up
some memory?

Any chance you could collect the output of the following command from
the switch which freed up some memory during the night?
sh mem allocating-process totals

This might sound stupid but can you confirm by looking at the uptime
that the switch did not crash? If it did, please collect the crashinfo
files and send them so I can take a look.

While monitoring the memory usage, if you see a steady increase,
collect the output of the following commands several times so you can
compare them later and see which process is allocating the most memory
(a rough automation sketch follows the command list below).
sh proc mem sorted
sh mem allocating-process totals
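
For what it's worth, here is a rough Python sketch of how that repeated
collection and comparison could be automated. It is only an illustration:
it assumes the open-source netmiko library, and the host, credentials and
one-hour sampling interval are placeholders to adapt to your environment.

#!/usr/bin/env python3
# Rough sketch: collect the two show commands twice, an hour apart, and
# report which processes grew the most in the 'Holding' column.
# Assumes the open-source netmiko library; host and credentials below
# are placeholders.

import re
import time

from netmiko import ConnectHandler

DEVICE = {
    "device_type": "cisco_ios",
    "host": "192.0.2.138",   # placeholder management address
    "username": "admin",     # placeholder credentials
    "password": "secret",
}

COMMANDS = [
    "show processes memory sorted",
    "show memory allocating-process totals",
]

def collect_sample():
    """Run each show command once and return {command: output}."""
    conn = ConnectHandler(**DEVICE)
    sample = {cmd: conn.send_command(cmd) for cmd in COMMANDS}
    conn.disconnect()
    return sample

def holding_by_process(proc_mem_output):
    """Map process name -> Holding bytes from 'show processes memory sorted'.
    Keyed by process name because PID 0 appears several times
    (*Init*, *Dead*, *MallocLite*)."""
    holding = {}
    for line in proc_mem_output.splitlines():
        # Data rows: PID TTY Allocated Freed Holding Getbufs Retbufs Process
        m = re.match(r"\s*\d+\s+\d+\s+\d+\s+\d+\s+(\d+)\s+\d+\s+\d+\s+(.+)", line)
        if m:
            holding[m.group(2).strip()] = int(m.group(1))
    return holding

if __name__ == "__main__":
    first = collect_sample()
    time.sleep(3600)              # one hour between samples
    second = collect_sample()

    before = holding_by_process(first["show processes memory sorted"])
    after = holding_by_process(second["show processes memory sorted"])

    # The biggest growers in 'Holding' are the prime leak suspects.
    deltas = {proc: after[proc] - before.get(proc, 0) for proc in after}
    for proc, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{delta:>12}  {proc}")

Comparing the 'Holding' column between samples taken over time is usually
more telling than a single snapshot, since some processes legitimately hold
a lot of memory. Keep the raw outputs as well in case you open a TAC case.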


Best regards,
Andras


On Wed, Jul 20, 2011 at 1:22 PM, Jiri Prochazka
<jiri.prochazka at superhosting.cz> wrote:
> Hi Andras,
>
> All I was able to get from the switch was '%% Low on memory; try again
> later', so I had no chance to get any useful info.
>
> None of them really crashed; even now (a few days after the issue arose)
> all are forwarding everything without any interruption. The only (doh)
> problem is that they are refusing any remote/local management.
>
> We have approximately 40 2960's in our network, all upgraded to
> 12.2(58)SE1 on the same night 42 days ago. To this day, four of them have
> shown this error (the first one a week ago, the rest during the last 7 days).
>
> I will definitely implement graphing of memory usage and monitor this. Logs
> are useless, as there is absolutely no info regarding this behaviour.
>
>
> Update: Wow, one of the 'crashed' switches surprisingly managed to free some
> memory overnight and there is no problem with remote login now!
>
> DC.Cisco.138#show mem
>                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
> Processor    27A819C    21585348    19502124     2083224     1330816     1396804
>      I/O     2C00000     4194304     2385892     1808412     1647292     1803000
> Driver te    1A00000     1048576          44     1048532     1048532     1048532
>
>
>
> DC.Cisco.138#show proc mem sorted
> Processor Pool Total:   21585348 Used:   19506548 Free:    2078800
>      I/O Pool Total:    4194304 Used:    2385788 Free:    1808516
> Driver te Pool Total:    1048576 Used:         40 Free:    1048536
>
>  PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
>   0   0   20966064    3684020   13930872          0          0 *Init*
>   0   0  349880992  303545656    1758488    4520010     421352 *Dead*
>   0   0          0          0     722384          0          0 *MallocLite*
>   67   0     531728      17248     463548          0          0 Stack Mgr Notifi
>   81   0     488448        232     332392          0          0 HLFM address lea
>  104   0    6002260    6886956     234548          0          0 HACL Acl Manager
>  151   0    1161020     437668     214108          0          0 DTP Protocol
>   59   0     198956   34501644     208516          0          0 EEM ED ND
>  163   0     196740          0     203900          0          0 VMATM Callback
>  219   0     775680   39872788     186548          0          0 MLDSN L2MCM
>   16   0     312148     762860     145736          0     104780 Entity MIB API
>
>
>
> Thank you,
>
>
> Jiri
>
>
>
> On 20.7.2011 0:08, Tóth András wrote:
>>
>> Hi Jiri,
>>
>> Did you have a chance to collect the output of 'sh log' after logging
>> in via console? If yes, please send it over.
>> Did you observe a crash of the switch or only the error message?
>> How many times did you see this so far? How often is it happening?
>> How many 2960 switches running 12.2(58)SE1 do you have in total and on
>> how many did you see this?
>>
>> If the switch is working fine now, I would recommend monitoring the
>> memory usage and the rate of increase. Check the logs around that time
>> to see if you find anything related, such as dot1x errors, etc.
>>
>> Also, consider collecting the following commands when the error
>> message is seen again and open a Cisco TAC case if possible.
>> sh log
>> sh proc mem sorted
>> sh mem summary
>> sh mem allocating-process totals
>> sh tech
>>
>> Best regards,
>> Andras
>>
>>
>> On Tue, Jul 19, 2011 at 4:34 PM, Jiri Prochazka
>> <jiri.prochazka at superhosting.cz>  wrote:
>>>
>>> Hi,
>>>
>>> A month ago I upgraded a few dozen of our access layer 2960's to the
>>> latest version of IOS (12.2(58)SE1), and during the last few days three of
>>> these upgraded switches have suddenly stopped responding to SSH & telnet
>>> access. Traffic coming from/to ports is still regularly forwarded.
>>>
>>> Connecting over the serial port just gives me '%% Low on memory; try again
>>> later'. The only solution I have come up with so far is to reload the switch.
>>>
>>>
>>> Does anybody else have a similar problem with this version of IOS?
>>>
>>>
>>> As far as I know, we don't use any special configuration. One feature is
>>> nearly hitting its limit (127 STP instances), but we haven't had any
>>> problems with this so far.
>>>
>>>
>>>
>>> Thank you for your thoughts.
>>>
>>>
>>>
>>> --
>>> ---
>>>
>>> Kind regards,
>>>
>>>
>>> Jiri Prochazka
>>>
>>
>
>
> --
> ---
>
> Kind regards,
>
>
> Jiri Prochazka
>
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>


