[cisco-voip] New to me RTMT Alert: CoreDumpFileFound

Wed Aug 19 14:43:26 EDT 2009

You are looking at the performance data. What you want to do is collect
the files via Trace and Log Collection. TAC will use that data to
determine whether a CPU spike, etc., triggered the crash.

Traces at the Error level won't be of much use at this point, but if you
end up opening a TAC SR, the SDL traces may have something. It depends on
the reason for the crash: if it was resource starvation, the traces won't
be necessary; if it was a bug related to call flow, etc., then the traces
may not be enough to pinpoint the reason.

-Ryan

On Aug 19, 2009, at 2:37 PM, Jeff Ruttman wrote:

Thanks Ryan.

At the least I'm making sure I know how to collect all this info, as I've
not done it before. Among the RisDC Perfmon files, the one that covers
the time of my core dump has oodles of counters to select. Would I
have to select them all to capture them all? I don't see a "select
all." Or is there some way to capture them all without checking a
million checkboxes?

Also, traces are enabled on my servers at the "Error" level. Is that
where they should be by default?

Thanks
jeff

-----Original Message-----
From: Ryan Ratliff [mailto:rratliff at cisco.com]
Sent: Wednesday, August 19, 2009 12:16 PM
To: Jeff Ruttman
Cc: cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] New to me RTMT Alert: CoreDumpFileFound

Your core was from the ccm service. The SDL OOS event was from the other
two nodes reporting they lost connection to the node that crashed.
The media resource events were those devices re-homing from the server
where the crash occurred to their backup server.

If this is the first time you've seen a crash then you need to decide if
you want to get root cause on it or not.

If you do want to pursue finding out what caused the crash, then the
first thing you need to do is collect the CCM traces from all nodes
covering the 15 minutes leading up to the crash. Keep these zipped
up somewhere safe in case they are needed. Also get the Event Viewer
logs (System and Application) from the server where the crash occurred.
You'll also want the RisDC perfmon files from the day of the crash.
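As a rough illustration of the 15-minute window described above, the sketch below computes the start and end times to use when collecting traces, for example as an absolute range in Trace and Log Collection. It assumes the crash time is the one reported in the CoreDumpFileFound alarm (11:11:11 CDT) and models CDT as a fixed UTC-5 offset. The helper name `trace_window` is hypothetical and not part of RTMT.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CDT is modeled as a fixed UTC-5 offset rather than looked up
# in a time zone database.
CDT = timezone(timedelta(hours=-5), "CDT")

def trace_window(crash_time: datetime, minutes: int = 15) -> tuple[datetime, datetime]:
    """Return (start, end) covering `minutes` before the crash, up to the crash itself."""
    return crash_time - timedelta(minutes=minutes), crash_time

# Crash time as reported in the CoreDumpFileFound alarm.
crash = datetime(2009, 8, 19, 11, 11, 11, tzinfo=CDT)
start, end = trace_window(crash)
print(f"Collect traces from {start:%Y-%m-%d %H:%M:%S} to {end:%Y-%m-%d %H:%M:%S} CDT")
```

Widening the window by a minute or two on either side is cheap and guards against clock skew between nodes. The same range applies to every node, because the other nodes' traces show their side of the lost SDL link.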

You can analyze the core yourself and, if a bug search doesn't turn
anything up, open a TAC SR and provide the dump analysis along with the
files listed above.

To analyze the core, first run 'utils core list' from the CLI of the
server, then use the core file name from that output in 'utils core
analyze'. If that server is a primary for phone registration, I'd advise
waiting until after hours to run the analysis. If it's a backup and not
heavily utilized, you should be safe.

-Ryan

On Aug 19, 2009, at 1:02 PM, Jeff Ruttman wrote:

Greetings,

I received the CoreDumpFileFound message below, closely followed by two
of these:
SDLLinkOutOfService event generated. Current outstanding sdl oos
alarms: SDLLinkOOS LocalNodeId : 3 LocalApplicationID : 100
RemoteIPAddress : 10.10.3.51 RemoteNodeID : 4 RemoteApplicationID :
100 LinkID : 3:100:4:100 NodeID : ma3-ccm02 TimeStamp : Wed Aug 19
11:11:12 CDT 2009 The alert is generated on Wed Aug 19 11:11:41 CDT
2009 on node 10.14.3.50.

That was followed by RegisteredMediaDevices decrease and increase RTMT messages.
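The SDLLinkOutOfService alert above is a run of "Name : value" fields. The sketch below pulls out the single-token fields and splits LinkID. It assumes that LinkID is laid out as LocalNodeId:LocalApplicationID:RemoteNodeID:RemoteApplicationID. That layout is inferred only from the values in this one alert (3:100:4:100) and is not documented here. The helpers `parse_sdl_alert` and `split_link_id` are hypothetical, not an RTMT feature.

```python
import re

# Alert text as received, joined onto one line.
ALERT = (
    "SDLLinkOutOfService event generated. Current outstanding sdl oos "
    "alarms: SDLLinkOOS LocalNodeId : 3 LocalApplicationID : 100 "
    "RemoteIPAddress : 10.10.3.51 RemoteNodeID : 4 RemoteApplicationID : "
    "100 LinkID : 3:100:4:100 NodeID : ma3-ccm02 TimeStamp : Wed Aug 19 "
    "11:11:12 CDT 2009"
)

# Fields whose values are a single token; TimeStamp is skipped because
# its value contains spaces.
FIELDS = ("LocalNodeId", "LocalApplicationID", "RemoteIPAddress",
          "RemoteNodeID", "RemoteApplicationID", "LinkID", "NodeID")

def parse_sdl_alert(text: str) -> dict[str, str]:
    """Extract 'Name : value' pairs for the fields listed in FIELDS."""
    result = {}
    for name in FIELDS:
        # \b keeps "NodeID" from matching inside "RemoteNodeID".
        m = re.search(rf"\b{name} : (\S+)", text)
        if m:
            result[name] = m.group(1)
    return result

def split_link_id(link_id: str) -> dict[str, int]:
    """Assumed layout: local node:local app:remote node:remote app."""
    local_node, local_app, remote_node, remote_app = (int(p) for p in link_id.split(":"))
    return {"local_node": local_node, "local_app": local_app,
            "remote_node": remote_node, "remote_app": remote_app}

fields = parse_sdl_alert(ALERT)
link = split_link_id(fields["LinkID"])
print(fields["NodeID"], "reports SDL link to node", link["remote_node"],
      f"({fields['RemoteIPAddress']}) out of service")
```

Read this way, the alert comes from a surviving node (ma3-ccm02) reporting that its link to a remote node is out of service. This matches the explanation that the SDL OOS events came from the nodes that lost their connection to the crashed one.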

Can anyone tell me what I'm seeing here? Something like this? (Don't
laugh... I'm trying, I'm trying! :))

The Cisco Log Partition Monitoring Tool on node dr-ccm03 had some sort
of problem causing the dump file to be generated. The problem also
caused this SDLLink out of service: some sort of connectivity problem
between the problem node and our Pub and other Sub. Then the media
device increase/decrease messages suggest that the system has recovered
from the initial problem?

Maybe the only worthwhile questions are: Should I be worried about this
message, and what might I do about it?

Thanks
jeff

From: RTMT_Admin at ec2802.elderc.org [mailto:RTMT_Admin at ec2802.elderc.org]
Sent: Wednesday, August 19, 2009 11:12 AM
To: Jeff
Subject: [RTMT-ALERT-StandAloneCluster] CoreDumpFileFound

CoreDumpFileFound TotalCoresFound : 1 CoreDetails : The following lists
up to 6 cores dumped by corresponding applications. Core1 :
Cisco CallManager (core.10667.6.ccm.1250698260) AppID : Cisco Log
Partition Monitoring Tool ClusterID : NodeID : dr-ccm03 . The alarm is
generated on Wed Aug 19 11:11:11 CDT 2009.
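The core file name in this alert appears to encode where the crash came from. The sketch below splits it on the assumption that it follows a Linux core pattern of the form core.&lt;pid&gt;.&lt;signal&gt;.&lt;executable&gt;.&lt;unix-time&gt;. That layout is an inference and the alert does not state it, but it fits the data. The last field converts to 11:11:00 CDT, eleven seconds before the alarm time. The executable field is ccm, which matches the reading that the core came from the ccm service. If the second field is indeed a signal number, 6 is SIGABRT on Linux. The helper `decode_core_name` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple

class CoreName(NamedTuple):
    pid: int
    signal: int          # assumed: number of the signal that ended the process
    executable: str
    dumped_at: datetime  # assumed: Unix epoch seconds, converted to UTC

def decode_core_name(name: str) -> CoreName:
    """Split a name like core.10667.6.ccm.1250698260, assuming the layout
    core.<pid>.<signal>.<executable>.<epoch>."""
    prefix, pid, sig, rest = name.split(".", 3)
    # rsplit keeps any dots inside the executable name intact.
    exe, epoch = rest.rsplit(".", 1)
    if prefix != "core":
        raise ValueError(f"not a core file name: {name!r}")
    return CoreName(int(pid), int(sig), exe,
                    datetime.fromtimestamp(int(epoch), tz=timezone.utc))

core = decode_core_name("core.10667.6.ccm.1250698260")
cdt = timezone(timedelta(hours=-5), "CDT")  # fixed-offset stand-in for CDT
print(core.executable, core.pid, core.signal, core.dumped_at.astimezone(cdt))
```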

CONFIDENTIALITY NOTICE: The information contained in this email
including attachments is intended for the specific delivery to and use
by the individual(s) to whom it is addressed, and includes information
which should be considered as private and confidential. Any review,
retransmission, dissemination, or taking of any action in reliance upon
this information by anyone other than the intended recipient is
prohibited. If you have received this message in error, please reply to
the sender immediately and delete the original message and any copy of
it from your computer system. Thank you.
_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip