[c-nsp] memory leaking in IOS 12.2(58)SE1 on 2960's
Jiri Prochazka
jiri.prochazka at superhosting.cz
Mon Jul 25 04:40:35 EDT 2011
Hi Andras,
On 20.7.2011 21:35, Tóth András wrote:
> Hi Jiri,
>
> When you mention logs are useless, do you mean you did not find
> anything in the logs after logging on to the switch which freed up
> some memory?
>
Yup, there were no signs of anything unusual in the log. Logging
severity is set to notifications.
> Any chance to collect the following command from the switch which
> freed up some memory during the night?
> sh mem allocating-process totals
DC.Cisco.138#sh mem allocating-process totals
Total(b) Used(b) Free(b) Lowest(b) Largest(b)
Processor 21585348 19547768 2037580 133081 1374036
PC Total Count Name
0x015D73F4 2202188 277 Process Stack
0x0032C018 1213820 1050 *Packet Header*
0x005B1364 743256 74 Flashfs Sector
0x00F81528 712840 8 Init
0x00E7B38C 523328 85 Init
0x01546F8C 496176 36 TW Buckets
0x0048A008 439340 1 Init
0x01443754 393480 6 STP Port Control Block Chunk
0x01011B34 292956 3149 IPC Zone
0x0032F68C 262720 6 pak subblock chunk
0x00A6BA2C 262232 2 CEF: hash table
0x00489FD8 256300 1 Init
0x0079E27C 250672 2 PM port_data
0x0158BD78 207900 275 Process
0x00339870 203148 57 *Hardware IDB*
0x01011BDC 196740 3 IPC Message Hea
0x0016CDD0 196740 3 Mat Addr Tbl Ch
0x004EE5A8 196652 1 HRM: destination array
0x015F68A8 191876 3 EEM ED ND
0x00E5C79C 184320 2 event_trace_tbs
0x0032C06C 164640 4 *Packet Data*
0x00809DC8 163884 1 Init
0x00949AF4 145484 399 MLDSN L2MCM
0x004F6FA8 135652 29 HULC_MAD_SD_MGR
0x01030A50 133468 383 Virtual Exec
0x013F2930 132728 7 VLAN Manager
0x0000E8BC 132132 11 DTP Protocol
0x00AD52E0 131976 4 VRFS: MTRIE n08
0x00336804 131116 1 *Init*
0x014271B0 130376 12 SNMP SMALL CHUN
0x007910A8 129948 51 PM port sub-block
0x016F4304 125244 1820 Init
0x009561E4 110676 399 MLDSN L2MCM
0x0048A020 109868 1 Init
Unfortunately, I'm not familiar with the usual values these processes
should allocate.
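Without known baselines, one way to find the leaker is to diff two captures of 'sh mem allocating-process totals' taken some hours apart and sort by byte growth. A minimal offline sketch (run on a workstation against saved captures, not on the switch; the column layout PC / Total / Count / Name is assumed from the output above):

```python
# Diff two captured "sh mem allocating-process totals" outputs offline.
# Assumed column layout per allocator line: PC, Total bytes, Count, Name.
import re

LINE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse(text):
    """Return {PC: (total_bytes, count, name)} for each allocator line."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            out[m.group(1)] = (int(m.group(2)), int(m.group(3)), m.group(4))
    return out

def diff(old, new, top=10):
    """Allocators sorted by byte growth between the two snapshots."""
    rows = []
    for pc, (total, count, name) in new.items():
        old_total = old.get(pc, (0, 0, name))[0]
        rows.append((total - old_total, pc, name))
    rows.sort(reverse=True)
    return rows[:top]
```

A PC whose Total keeps climbing between snapshots is the candidate leak; the PC value can then be given to TAC to map it back to the allocating code.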
>
> This might sound stupid but can you confirm by looking at the uptime
> that the switch did not crash? If it did, please collect the crashinfo
> files and send them so I can take a look.
The switch did not crash; its uptime is over 6 weeks now.
>
> While monitoring the memory usage, if you see regular increase,
> collect the following commands several times so you can compare them
> later to see which process allocates most memory.
> sh proc mem sorted
> sh mem allocating-process totals
>
Memory graphing is being implemented now. As soon as I have relevant
graphs, I will gather the info given by these commands.
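For the graphing, one common approach is to poll the switch's memory pools over SNMP and log free bytes to a CSV for plotting. A rough sketch, with assumptions: the OID below is ciscoMemoryPoolFree from CISCO-MEMORY-POOL-MIB with index 1 for the Processor pool (verify on your platform), net-snmp's snmpget is installed on the poller, and the community string and hostname are placeholders:

```python
# Periodic logger for processor-pool free memory via SNMP (sketch).
# Assumed OID: ciscoMemoryPoolFree (CISCO-MEMORY-POOL-MIB), index 1 = Processor.
# Requires net-snmp's snmpget on the polling host; community is a placeholder.
import subprocess
import time

OID_POOL_FREE = "1.3.6.1.4.1.9.9.48.1.1.1.6.1"

def parse_snmp_integer(line):
    """Extract the value from snmpget output such as
    'SNMPv2-SMI::enterprises.9.9.48.1.1.1.6.1 = Gauge32: 2037580'."""
    return int(line.rsplit(":", 1)[1])

def poll_once(host, community="public"):
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, host, OID_POOL_FREE],
        text=True)
    return parse_snmp_integer(out.strip())

if __name__ == "__main__":
    # Append a timestamped sample every 5 minutes; graph the CSV later.
    with open("mem_free.csv", "a") as f:
        while True:
            f.write("%d,%d\n" % (time.time(), poll_once("DC.Cisco.138")))
            f.flush()
            time.sleep(300)
```

Polling over SNMP keeps working longer than SSH when memory gets tight, but once the pool is truly exhausted even SNMP may stop answering, so the last samples before the gap are the interesting ones.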
Thank you,
Jiri
>
> Best regards,
> Andras
>
>
> On Wed, Jul 20, 2011 at 1:22 PM, Jiri Prochazka
> <jiri.prochazka at superhosting.cz> wrote:
>> Hi Andras,
>>
>> All I was able to get from the switch was '%% Low on memory; try again
>> later', so I had no chance to get any useful info.
>>
>> None of them really crashed; even now (a few days after the issue arose)
>> all are forwarding everything without any interruption. The only (doh)
>> problem is that they refuse any remote/local management.
>>
>> We have approximately 40 2960's in our network, all upgraded to
>> 12.2(58)SE1 on the same night 42 days ago. To date, four of them have
>> shown this error (the first a week ago, the rest during the last 7 days).
>>
>> I will definitely implement graphing of memory usage and monitor this. Logs
>> are useless, as there is absolutely no info regarding this behaviour.
>>
>>
>> Update: Wow, one of the 'crashed' switches surprisingly managed to free some
>> memory overnight, and there is no problem with remote login now!
>>
>> DC.Cisco.138#show mem
>>                Head    Total(b)     Used(b)     Free(b)   Lowest(b)  Largest(b)
>> Processor   27A819C    21585348    19502124     2083224     1330816     1396804
>> I/O         2C00000     4194304     2385892     1808412     1647292     1803000
>> Driver te   1A00000     1048576          44     1048532     1048532     1048532
>>
>>
>>
>> DC.Cisco.138#show proc mem sorted
>> Processor Pool Total: 21585348 Used: 19506548 Free: 2078800
>> I/O Pool Total: 4194304 Used: 2385788 Free: 1808516
>> Driver te Pool Total: 1048576 Used: 40 Free: 1048536
>>
>>  PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
>>    0   0   20966064    3684020   13930872          0          0 *Init*
>>    0   0  349880992  303545656    1758488    4520010     421352 *Dead*
>>    0   0          0          0     722384          0          0 *MallocLite*
>>   67   0     531728      17248     463548          0          0 Stack Mgr Notifi
>>   81   0     488448        232     332392          0          0 HLFM address lea
>>  104   0    6002260    6886956     234548          0          0 HACL Acl Manager
>>  151   0    1161020     437668     214108          0          0 DTP Protocol
>>   59   0     198956   34501644     208516          0          0 EEM ED ND
>>  163   0     196740          0     203900          0          0 VMATM Callback
>>  219   0     775680   39872788     186548          0          0 MLDSN L2MCM
>>   16   0     312148     762860     145736          0     104780 Entity MIB API
>>
>>
>>
>> Thank you,
>>
>>
>> Jiri
>>
>>
>>
>> On 20.7.2011 0:08, Tóth András wrote:
>>>
>>> Hi Jiri,
>>>
>>> Did you have a chance to collect the output of 'sh log' after logging
>>> in via console? If yes, please send it over.
>>> Did you observe a crash of the switch or only the error message?
>>> How many times did you see this so far? How often is it happening?
>>> How many 2960 switches running 12.2(58)SE1 do you have in total and on
>>> how many did you see this?
>>>
>>> If the switch is working fine now, I would recommend monitoring the
>>> memory usage and the rate of increase. Check the logs around that time
>>> to see if you find anything related, such as dot1x errors, etc.
>>>
>>> Also, consider collecting the following commands when the error
>>> message is seen again and open a Cisco TAC case if possible.
>>> sh log
>>> sh proc mem sorted
>>> sh mem summary
>>> sh mem allocating-process totals
>>> sh tech
>>>
>>> Best regards,
>>> Andras
>>>
>>>
>>> On Tue, Jul 19, 2011 at 4:34 PM, Jiri Prochazka
>>> <jiri.prochazka at superhosting.cz> wrote:
>>>>
>>>> Hi,
>>>>
>>>> A month ago I upgraded a few dozen of our access-layer 2960's to the
>>>> latest version of IOS (12.2(58)SE1), and during the last few days three of
>>>> these upgraded switches have suddenly stopped responding to SSH & telnet
>>>> access. Traffic coming from/to ports is still regularly forwarded.
>>>>
>>>> Connecting over the serial port gives me '%% Low on memory; try again
>>>> later'. The only solution I have found is to reload the switch.
>>>>
>>>>
>>>> Does anybody else have a similar problem with this version of IOS?
>>>>
>>>>
>>>> As far as I know, we don't use any special configuration. One feature is
>>>> nearly at its limit (127 STP instances), but we haven't had any
>>>> problems with it so far.
>>>>
>>>>
>>>>
>>>> Thank you for your thoughts.
>>>>
>>>>
>>>>
>>>> --
>>>> ---
>>>>
>>>> Kind regards,
>>>>
>>>>
>>>> Jiri Prochazka
>>>>
>>>> _______________________________________________
>>>> cisco-nsp mailing list cisco-nsp at puck.nether.net
>>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>>>
>>>
>>
>>
>