[cisco-voip] CallManager Process being restarted (8.5(1))

Ryan Ratliff (rratliff) rratliff at cisco.com
Mon Dec 3 12:22:15 EST 2018


There most certainly will be, but what it is depends on the hardware. It’s a server so there will be some type of management agent that has probably been trying to get somebody’s attention for a few weeks.

I wouldn’t be surprised if there is a non-green LED or two if you look at the disks themselves.

-Ryan

On Dec 3, 2018, at 12:18 PM, Nilson Costa <nilsonlino at gmail.com> wrote:

Is there any way to troubleshoot the disks to see which one is defective?

Regards

On Mon, Dec 3, 2018 at 15:08, Ryan Ratliff (rratliff) <rratliff at cisco.com> wrote:
#1  0x044a9935 in raise () from /lib/tls/libc.so.6
#2  0x044ab399 in abort () from /lib/tls/libc.so.6
#3  0x0842e457 in preabort () at ProcessCMProcMon.cpp:80
#4  0x0842fe7c in CMProcMon::verifySdlRouterServices () at ProcessCMProcMon.cpp:720

The ccm process is killing itself because it isn’t getting enough resources.

Nov 29 17:26:12 CMBL-03-01 local7 2 : 1: CMBL-03-01.localdomain: Nov 29 2018 19:26:12.340 UTC :  %UC_CALLMANAGER-2-CallManagerFailure: %[HostName=CMBL-03-01][IPAddress=192.168.183.3][Reason=4][Text=CCM Intentional Abort: SignalName: SIPSetupInd, DestPID: SIPD[1:100:67:7]][AppID=Cisco CallManager][ClusterID=StandAloneCluster][NodeID=CMBL-03-01]: Indicates an internal failure in Unified CM
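(Side note on what an “intentional abort” means mechanically, since the backtrace can look scarier than it is: CMProcMon is essentially a watchdog inside the ccm process. When its internal health checks, like verifySdlRouterServices, can’t get an answer in time, it deliberately calls abort() so you get a core file and a clean restart instead of a hung CallManager. Here’s a rough Python sketch of that pattern, purely illustrative and not Cisco’s code; names like check_sdl_router and SDL_TIMEOUT_SEC are made up for the example:

import os
import threading
import time

SDL_TIMEOUT_SEC = 30  # made-up deadline, just for the sketch

def check_sdl_router() -> bool:
    # Stand-in for whatever the real monitor polls; returns False if no answer.
    return True

def watchdog():
    while True:
        start = time.monotonic()
        ok = check_sdl_router()
        took = time.monotonic() - start
        if not ok or took > SDL_TIMEOUT_SEC:
            # Equivalent of preabort()/abort() in the backtrace: dump state and die
            # so the service restarts cleanly and leaves a core for analysis.
            print("SDL router services unresponsive - aborting intentionally")
            os.abort()
        time.sleep(5)

threading.Thread(target=watchdog, daemon=True).start()

The abort is a symptom, not the disease: when the box is starved for I/O, even perfectly healthy code can’t answer the watchdog in time.)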

So much good info in the syslog.
Here’s a super-useful tidbit.

Nov 28 03:59:23 CMBL-03-01 local7 2 : 1543: CMBL-03-01.localdomain: Nov 28 2018 05:59:23.840 UTC :  %UC_RTMT-2-RTMT_ALERT: %[AlertName=CallProcessingNodeCpuPegging][AlertDetail= Processor load over configured threshold for configured duration of time . Configured high threshold is 90 % tomcat (2 percent) uses most of the CPU.
 Processor_Info:

 For processor instance 1: %CPU= 99, %User= 2, %System= 2, %Nice= 0, %Idle= 0, %IOWait= 97, %softirq= 0, %irq= 0.

 For processor instance _Total: %CPU= 93, %User= 2, %System= 1, %Nice= 0, %Idle= 7, %IOWait= 90, %softirq= 0, %irq= 0.

 For processor instance 0: %CPU= 86, %User= 2, %System= 1, %Nice= 0, %Idle= 14, %IOWait= 83, %softirq= 0, %irq= 0.

 For processor instance 3: %CPU= 87, %User= 2, %System= 2, %Nice= 0, %Idle= 13, %IOWait= 83, %softirq= 0, %irq= 0.

 For processor instance 2: %CPU= 99, %User= 4, %System= 1, %Nice= 0, %Idle= 0, %IOWait= 96, %softirq= 0, %irq= 0.
 ][AppID=Cisco AMC Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert
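That 90%+ IOWait is the real tell: the CPUs aren’t doing work, they’re stuck waiting on the disks. If you want to watch it live from a shell on the box, a quick Python sketch like this reads /proc/stat twice and prints per-CPU idle vs. iowait, the same counters the alert is built from (the 5-second interval is just an arbitrary choice for the example):

import time

def cpu_snapshot():
    # /proc/stat per-CPU counters: user nice system idle iowait irq softirq ...
    snap = {}
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu"):
                name, *vals = line.split()
                snap[name] = [int(v) for v in vals]
    return snap

before = cpu_snapshot()
time.sleep(5)
after = cpu_snapshot()

for cpu, now in sorted(after.items()):
    prev = before[cpu]
    delta = [n - p for n, p in zip(now, prev)]
    total = sum(delta) or 1
    idle, iowait = delta[3], delta[4]
    print(f"{cpu}: %idle={100 * idle / total:.0f}  %iowait={100 * iowait / total:.0f}")

Anything where iowait dominates the way it does in the alert above points straight at storage, not at a runaway process.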

Looking back just a bit further, there are a TON of these.

Nov 15 21:22:00 CMBL-03-01 local7 2 : 582: CMBL-03-01.localdomain: Nov 15 2018 23:22:00.256 UTC :  %UC_RTMT-2-RTMT_ALERT: %[AlertName=HardwareFailure][AlertDetail=     At Thu Nov 15 21:22:00 BRST 2018 on node 192.168.183.3, the following HardwareFailure events generated:  hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4 Director Agent: LSIESG_DiskDrive_Modified 500605B0027C6D50 Command timeout on PD 01(e0xfc/s1) Path 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00 Sev: 3. AppID : Cisco Syslog Agent ClusterID :  NodeID : CMBL-03-01  TimeStamp : Thu Nov 15 21:21:26 BRST 2018   hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4 Director Agent: LSIESG_AlertIndication 500605B0027C6D50 Command timeout on PD 01(e0xfc/s1) Path 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00 Sev: 3. AppID : Cisco Syslog Agent ClusterID :  NodeID : CMBL-03-01  TimeStamp : Thu Nov 15 21:21:27 BRST 2018   hwStringMatch : Nov 15 21:21:26 CMBL-03-01][AppID=Cisco AMC Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert

You’ve lost or are in the middle of losing at least one disk drive. The server probably lost them all at the same time on the 13th, and the OS marked the entire filesystem read-only.
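For pinning down which physical disk is throwing the errors: the syslog above already names a suspect, PD 01(e0xfc/s1), i.e. the physical drive in enclosure 0xfc, slot 1. If you want to tally how many of these events each drive has logged, a rough Python sketch over the syslog file you pulled will do it (the file name below is made up; point it at whatever you actually collected):

import re
from collections import Counter

LOGFILE = "publisher-syslog.txt"  # hypothetical name; use your collected syslog file

# Director Agent / LSI lines look like:
#   "Command timeout on PD 01(e0xfc/s1) Path 500000e116ac4ce2 ..."
PD_PATTERN = re.compile(r"PD\s+(\d+)\((e[0-9a-fx]+/s\d+)\)")

counts = Counter()
with open(LOGFILE, errors="replace") as f:
    for line in f:
        if "Command timeout" in line or "LSIESG_" in line:
            m = PD_PATTERN.search(line)
            if m:
                counts[f"PD {m.group(1)} ({m.group(2)})"] += 1

for drive, hits in counts.most_common():
    print(f"{drive}: {hits} error events")

If only one slot shows up, replace that drive and let the RAID rebuild; if several do, the controller or backplane is the more likely culprit, which would also line up with the whole filesystem going read-only at once.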

-Ryan

On Dec 3, 2018, at 9:28 AM, Nilson Costa <nilsonlino at gmail.com> wrote:

Hello All,

I'm deploying a new CUCM at a customer that has an old one working just as call routing for a Genesys call center system.

As you can see in the picture below, they have some MGCP gateways connected to this CUCM where the calls come in and, via some CTI route points controlled by Genesys, route the calls to two Avaya PBXs or to another CUCM.

<image.png>
On November 13th they lost access to Tomcat on the Publisher. When we looked at the server, several services were restarting, including Cisco CallManager, but only on the Publisher.
We decided to reboot the whole cluster, but after the reboot we are facing some weird issues. Most of them are not that relevant, I think, but there is one we are really worried about.

The Cisco CallManager process is still restarting randomly and generating some core dumps. I'm attaching those logs here, and I'm also attaching the syslogs from the Publisher.

Can anybody here in the group help me find out what is triggering the Cisco CallManager restarts?

--
Nilson Lino da Costa Junior
<coredump.txt><publiser-syslog-29-11.txt>



--
Nilson Lino da Costa Junior
