[c-nsp] 6504-E IOS SSH/memory issues
Jared Mauch
jared at puck.nether.net
Mon Mar 24 09:28:19 EDT 2014
On Mar 24, 2014, at 9:16 AM, Patrick M. Hausen <hausen at punkt.de> wrote:
> I admit that I rarely log off, but rather just close the window running my SSH connection.
> Bad admin. ;-) But any sane OS should timeout the TCP connection eventually and
> then terminate the process waiting on that socket.
>
> IOS version is 15.1(2)SY1 advanced enterprise.
>
> How can I proceed finding and eliminating the root cause? Rebooting the box to clean
> up is an option if planned ahead, but not a suitable permanent fix (i.e. rebooting regularly
> is out of the question).
>
Sounds like a bug where the memory is not being deallocated.
There are a bunch of "show mem" and "show proc mem" commands that will help you
diagnose this, so that when you open the TAC case the folks there will have to actually do
real work as opposed to the "blame the customer, shut up and reload, etc." mindset
that persists with IOS devices.
Here's what you want to track:
Router#show memory allocating-process totals
This shows who is holding all the memory. It includes the PC (program counter) of where the memory
was allocated, so the devs know which code is doing the leaking alloc; then they just have to track
where it actually gets freed (or not).
Also track whether anything is holding an unexpected amount of memory, e.g.:
Router#show proc mem sorted
In my case on a 6500 the BGP Router process is the largest. I've also seen it come in second to something else.
What you really care about is the 'Holding' column.
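If you capture that output twice, some hours or days apart, the comparison of the 'Holding' column can be scripted. Here is a rough Python sketch, not anything Cisco ships. It assumes each process row has the column order PID, TTY, Allocated, Freed, Holding, Getbufs, Retbufs, Process; check that against what your IOS version actually prints. The function names are made up for illustration.

```python
import re

# ASSUMPTION: each process row of "show proc mem sorted" looks like
#   PID TTY  Allocated  Freed  Holding  Getbufs  Retbufs  Process
# with seven numeric columns followed by the process name.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$"
)


def parse_holding(text):
    """Return {(pid, process_name): holding_bytes} for one captured output."""
    holding = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and pool summary lines do not match and are skipped
            pid = int(match.group(1))
            name = match.group(8)
            holding[(pid, name)] = int(match.group(5))
    return holding


def holding_growth(before, after):
    """List (name, pid, delta) for processes whose Holding grew, largest first.

    A process that only appears in the second capture is counted as having
    grown from zero.
    """
    old = parse_holding(before)
    new = parse_holding(after)
    grown = []
    for (pid, name), value in new.items():
        delta = value - old.get((pid, name), 0)
        if delta > 0:
            grown.append((name, pid, delta))
    return sorted(grown, key=lambda item: item[2], reverse=True)
```

Feed holding_growth() the text of two captures saved from your terminal session. A process whose delta keeps climbing across several captures is the one to point TAC at.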
You also want to check that the 'Dead' memory isn't growing:
Router#show mem dead totals
In my experience Cisco does a poor job of tracking these types of defects and does almost zero testing of SSH in the lab. We have often found that the overhead of the SSH process means some commands can and will crash the router when issued via SSH but operate fine when using telnet or console.
This should help you get on the right path. Remember, software bugs happen and no software is perfect, so expecting that you will never have to upgrade is not rational.
- Jared