[c-nsp] memory leaking in IOS 12.2(58)SE1 on 2960's
Jiri Prochazka
jiri.prochazka at superhosting.cz
Mon Jul 25 04:40:35 EDT 2011
Hi Andras,
On 20.7.2011 21:35, Tóth András wrote:
> Hi Jiri,
>
> When you mention logs are useless, do you mean you did not find
> anything in the logs after logging on to the switch which freed up
> some memory?
>
Yup, there were no signs of anything unusual in the log. Logging
severity is set to notifications.
> Any chance to collect the following command from the switch which
> freed up some memory during the night?
> sh mem allocating-process totals
DC.Cisco.138#sh mem allocating-process totals
Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 21585348 19547768 2037580 133081 1374036
PC Total Count Name
0x015D73F4 2202188 277 Process Stack
0x0032C018 1213820 1050 *Packet Header*
0x005B1364 743256 74 Flashfs Sector
0x00F81528 712840 8 Init
0x00E7B38C 523328 85 Init
0x01546F8C 496176 36 TW Buckets
0x0048A008 439340 1 Init
0x01443754 393480 6 STP Port Control Block Chunk
0x01011B34 292956 3149 IPC Zone
0x0032F68C 262720 6 pak subblock chunk
0x00A6BA2C 262232 2 CEF: hash table
0x00489FD8 256300 1 Init
0x0079E27C 250672 2 PM port_data
0x0158BD78 207900 275 Process
0x00339870 203148 57 *Hardware IDB*
0x01011BDC 196740 3 IPC Message Hea
0x0016CDD0 196740 3 Mat Addr Tbl Ch
0x004EE5A8 196652 1 HRM: destination array
0x015F68A8 191876 3 EEM ED ND
0x00E5C79C 184320 2 event_trace_tbs
0x0032C06C 164640 4 *Packet Data*
0x00809DC8 163884 1 Init
0x00949AF4 145484 399 MLDSN L2MCM
0x004F6FA8 135652 29 HULC_MAD_SD_MGR
0x01030A50 133468 383 Virtual Exec
0x013F2930 132728 7 VLAN Manager
0x0000E8BC 132132 11 DTP Protocol
0x00AD52E0 131976 4 VRFS: MTRIE n08
0x00336804 131116 1 *Init*
0x014271B0 130376 12 SNMP SMALL CHUN
0x007910A8 129948 51 PM port sub-block
0x016F4304 125244 1820 Init
0x009561E4 110676 399 MLDSN L2MCM
0x0048A020 109868 1 Init
Unfortunately, I'm not familiar with the usual values these processes
should allocate.
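Without known baselines, one way to find the leaker is to diff two captures of 'sh mem allocating-process totals' taken some hours apart and sort by byte growth. A minimal offline sketch (run on a workstation against saved captures, not on the switch; the column layout PC / Total / Count / Name is assumed from the output above):

```python
# Diff two captured "sh mem allocating-process totals" outputs offline.
# Assumed column layout per allocator line: PC, Total bytes, Count, Name.
import re

LINE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse(text):
    """Return {PC: (total_bytes, count, name)} for each allocator line."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            out[m.group(1)] = (int(m.group(2)), int(m.group(3)), m.group(4))
    return out

def diff(old, new, top=10):
    """Allocators sorted by byte growth between the two snapshots."""
    rows = []
    for pc, (total, count, name) in new.items():
        old_total = old.get(pc, (0, 0, name))[0]
        rows.append((total - old_total, pc, name))
    rows.sort(reverse=True)
    return rows[:top]
```

A PC whose Total keeps climbing between snapshots is the candidate leak; the PC value can then be given to TAC to map it back to the allocating code.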
>
> This might sound stupid but can you confirm by looking at the uptime
> that the switch did not crash? If it did, please collect the crashinfo
> files and send them so I can take a look.
The switch did not crash; its uptime is over 6 weeks now.
>
> While monitoring the memory usage, if you see regular increase,
> collect the following commands several times so you can compare them
> later to see which process allocates most memory.
> sh proc mem sorted
> sh mem allocating-process totals
>
Memory graphing is being implemented now. As soon as I have relevant
graphs, I will gather the info given by these commands.
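For the graphing, one common approach is to poll the switch's memory pools over SNMP and log free bytes to a CSV for plotting. A rough sketch, with assumptions: the OID below is ciscoMemoryPoolFree from CISCO-MEMORY-POOL-MIB with index 1 for the Processor pool (verify on your platform), net-snmp's snmpget is installed on the poller, and the community string and hostname are placeholders:

```python
# Periodic logger for processor-pool free memory via SNMP (sketch).
# Assumed OID: ciscoMemoryPoolFree (CISCO-MEMORY-POOL-MIB), index 1 = Processor.
# Requires net-snmp's snmpget on the polling host; community is a placeholder.
import subprocess
import time

OID_POOL_FREE = "1.3.6.1.4.1.9.9.48.1.1.1.6.1"

def parse_snmp_integer(line):
    """Extract the value from snmpget output such as
    'SNMPv2-SMI::enterprises.9.9.48.1.1.1.6.1 = Gauge32: 2037580'."""
    return int(line.rsplit(":", 1)[1])

def poll_once(host, community="public"):
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, host, OID_POOL_FREE],
        text=True)
    return parse_snmp_integer(out.strip())

if __name__ == "__main__":
    # Append a timestamped sample every 5 minutes; graph the CSV later.
    with open("mem_free.csv", "a") as f:
        while True:
            f.write("%d,%d\n" % (time.time(), poll_once("DC.Cisco.138")))
            f.flush()
            time.sleep(300)
```

Polling over SNMP keeps working longer than SSH when memory gets tight, but once the pool is truly exhausted even SNMP may stop answering, so the last samples before the gap are the interesting ones.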
Thank you,
Jiri
>
> Best regards,
> Andras
>
>
> On Wed, Jul 20, 2011 at 1:22 PM, Jiri Prochazka
> <jiri.prochazka at superhosting.cz> wrote:
>> Hi Andras,
>>
>> All I was able to get from the switch was '%% Low on memory; try again
>> later', so I had no chance to get any useful info.
>>
>> None of them really crashed; even now (a few days after the issue arose)
>> all are forwarding everything without any interruption. The only (doh)
>> problem is that they refuse any remote/local management.
>>
>> We have approximately 40 2960's in our network, all upgraded to
>> 12.2(58)SE1 on the same night 42 days ago. To date, four of them have
>> shown this error (the first a week ago, the rest during the last 7 days).
>>
>> I will definitely implement graphing of memory usage and monitor this. Logs
>> are useless, as there is absolutely no info regarding this behaviour.
>>
>>
>> Update: Wow, one of the 'crashed' switches surprisingly managed to free some
>> memory overnight, and there is no problem with remote login now!
>>
>> DC.Cisco.138#show mem
>>                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
>> Processor   27A819C    21585348    19502124     2083224     1330816     1396804
>> I/O         2C00000     4194304     2385892     1808412     1647292     1803000
>> Driver te   1A00000     1048576          44     1048532     1048532     1048532
>>
>>
>>
>> DC.Cisco.138#show proc mem sorted
>> Processor Pool Total: 21585348 Used: 19506548 Free: 2078800
>> I/O Pool Total: 4194304 Used: 2385788 Free: 1808516
>> Driver te Pool Total: 1048576 Used: 40 Free: 1048536
>>
>>  PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
>>    0   0   20966064    3684020   13930872          0          0 *Init*
>>    0   0  349880992  303545656    1758488    4520010     421352 *Dead*
>>    0   0          0          0     722384          0          0 *MallocLite*
>>   67   0     531728      17248     463548          0          0 Stack Mgr Notifi
>>   81   0     488448        232     332392          0          0 HLFM address lea
>>  104   0    6002260    6886956     234548          0          0 HACL Acl Manager
>>  151   0    1161020     437668     214108          0          0 DTP Protocol
>>   59   0     198956   34501644     208516          0          0 EEM ED ND
>>  163   0     196740          0     203900          0          0 VMATM Callback
>>  219   0     775680   39872788     186548          0          0 MLDSN L2MCM
>>   16   0     312148     762860     145736          0     104780 Entity MIB API
>>
>>
>>
>> Thank you,
>>
>>
>> Jiri
>>
>>
>>
>> On 20.7.2011 0:08, Tóth András wrote:
>>>
>>> Hi Jiri,
>>>
>>> Did you have a chance to collect the output of 'sh log' after logging
>>> in via console? If yes, please send it over.
>>> Did you observe a crash of the switch or only the error message?
>>> How many times did you see this so far? How often is it happening?
>>> How many 2960 switches running 12.2(58)SE1 do you have in total and on
>>> how many did you see this?
>>>
>>> If the switch is working fine now, I would recommend monitoring the
>>> memory usage and the rate of increase. Check the logs around that time
>>> to see if you find anything related, such as dot1x errors, etc.
>>>
>>> Also, consider collecting the following commands when the error
>>> message is seen again and open a Cisco TAC case if possible.
>>> sh log
>>> sh proc mem sorted
>>> sh mem summary
>>> sh mem allocating-process totals
>>> sh tech
>>>
>>> Best regards,
>>> Andras
>>>
>>>
>>> On Tue, Jul 19, 2011 at 4:34 PM, Jiri Prochazka
>>> <jiri.prochazka at superhosting.cz> wrote:
>>>>
>>>> Hi,
>>>>
>>>> A month ago I upgraded a few dozen of our access-layer 2960's to the
>>>> latest version of IOS (12.2(58)SE1), and during the last few days three of
>>>> these upgraded switches have suddenly stopped responding to SSH & telnet
>>>> access. Traffic coming from/to ports is still regularly forwarded.
>>>>
>>>> Connecting over the serial port gives me '%% Low on memory; try again
>>>> later'. The only solution I have found is to reload the switch.
>>>>
>>>>
>>>> Does anybody else have a similar problem with this version of IOS?
>>>>
>>>>
>>>> As far as I know, we don't use any special configuration. One feature is
>>>> nearly at its limit (127 STP instances), but we haven't had any
>>>> problems with it so far.
>>>>
>>>>
>>>>
>>>> Thank you for your thoughts.
>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>>
>>>> Kind regards,
>>>>
>>>>
>>>> Jiri Prochazka
>>>>
>>>> _______________________________________________
>>>> cisco-nsp mailing list cisco-nsp at puck.nether.net
>>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>>>
>>>
>>
>>
>