[cisco-voip] CallManager Process being restarted (8.5(1))

Nilson Costa nilsonlino at gmail.com
Mon Dec 3 12:29:43 EST 2018


Thanks Ryan, I'll check it out.

On Mon, Dec 3, 2018 at 15:22, Ryan Ratliff (rratliff) <rratliff at cisco.com> wrote:

> There most certainly will be, but what it is depends on the hardware. It’s
> a server so there will be some type of management agent that has probably
> been trying to get somebody’s attention for a few weeks.
>
> I wouldn’t be surprised if there is a non-green LED or two if you look at
> the disks themselves.
>
> -Ryan
>
> On Dec 3, 2018, at 12:18 PM, Nilson Costa <nilsonlino at gmail.com> wrote:
>
> Is there any way to troubleshoot the disks to see which one is defective?
>
> Regards
>
> On Mon, Dec 3, 2018 at 15:08, Ryan Ratliff (rratliff) <rratliff at cisco.com> wrote:
>
>> #1  0x044a9935 in raise () from /lib/tls/libc.so.6
>> #2  0x044ab399 in abort () from /lib/tls/libc.so.6
>> #3  0x0842e457 in preabort () at ProcessCMProcMon.cpp:80
>> #4  0x0842fe7c in CMProcMon::verifySdlRouterServices () at
>> ProcessCMProcMon.cpp:720
>>
>>
>> The ccm process is killing itself because it isn’t getting enough
>> resources.
>>
>> Nov 29 17:26:12 CMBL-03-01 local7 2 : 1: CMBL-03-01.localdomain: Nov 29
>> 2018 19:26:12.340 UTC :  %UC_CALLMANAGER-2-CallManagerFailure:
>> %[HostName=CMBL-03-01][IPAddress=192.168.183.3][Reason=4][Text=CCM
>> Intentional Abort: SignalName: SIPSetupInd, DestPID:
>> SIPD[1:100:67:7]][AppID=Cisco
>> CallManager][ClusterID=StandAloneCluster][NodeID=CMBL-03-01]: Indicates an
>> internal failure in Unified CM
>>
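>> If you want to pull every one of those intentional aborts out of the
>> syslog in one pass, a rough Python sketch like the one below will do it.
>> The filename and regex are illustrative only (this isn't a Cisco tool),
>> and it assumes each syslog entry sits on a single line in your export.
>>
>> import re
>>
>> # Illustrative sketch: list CallManagerFailure events with their Reason
>> # code and Text.  'publisher-syslog.txt' is a made-up filename; point it
>> # at your own syslog export.  The regex is simplistic and cuts the Text
>> # field at the first ']', which is good enough for a quick survey.
>> failure_re = re.compile(
>>     r'%UC_CALLMANAGER-2-CallManagerFailure:.*?'
>>     r'\[Reason=(\d+)\]\[Text=([^\]]*)'
>> )
>>
>> with open('publisher-syslog.txt', errors='replace') as f:
>>     for line in f:
>>         m = failure_re.search(line)
>>         if m:
>>             reason, text = m.groups()
>>             print(f'Reason={reason}  Text={text}')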
>>
>> So much good info in the syslog.
>> Here’s a super-useful tidbit.
>>
>> Nov 28 03:59:23 CMBL-03-01 local7 2 : 1543: CMBL-03-01.localdomain: Nov
>> 28 2018 05:59:23.840 UTC :  %UC_RTMT-2-RTMT_ALERT: %[AlertName=CallProcessingNodeCpuPegging][AlertDetail=
>> Processor load over configured threshold for configured duration of time .
>>  Configured high threshold is 90 % tomcat (2 percent) uses most of the
>> CPU.
>>  Processor_Info:
>>
>>  For processor instance 1: %CPU= 99, %User= 2, %System= 2, %Nice= 0,
>> %Idle= 0, %IOWait= 97, %softirq= 0, %irq= 0.
>>
>>  For processor instance _Total: %CPU= 93, %User= 2, %System= 1, %Nice= 0,
>> %Idle= 7, %IOWait= 90, %softirq= 0, %irq= 0.
>>
>>  For processor instance 0: %CPU= 86, %User= 2, %System= 1, %Nice= 0,
>> %Idle= 14, %IOWait= 83, %softirq= 0, %irq= 0.
>>
>>  For processor instance 3: %CPU= 87, %User= 2, %System= 2, %Nice= 0,
>> %Idle= 13, %IOWait= 83, %softirq= 0, %irq= 0.
>>
>>  For processor instance 2: %CPU= 99, %User= 4, %System= 1, %Nice= 0,
>> %Idle= 0, %IOWait= 96, %softirq= 0, %irq= 0.
>>  ][AppID=Cisco AMC Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert
>>
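>> The numbers worth staring at up there are the %IOWait values: the CPUs
>> are "pegged" waiting on the disks, not doing work.  A rough Python
>> sketch (illustrative filename again, and it assumes each "For processor
>> instance" entry lands on one line in your export) to flag exactly those
>> samples:
>>
>> import re
>>
>> # Illustrative sketch: flag processor samples where most of the "load"
>> # is I/O wait rather than user/system time.
>> proc_re = re.compile(
>>     r'For processor instance (\S+): %CPU= (\d+).*?%IOWait= (\d+)')
>>
>> with open('publisher-syslog.txt', errors='replace') as f:
>>     for line in f:
>>         m = proc_re.search(line)
>>         if not m:
>>             continue
>>         inst, cpu, iowait = m.group(1), int(m.group(2)), int(m.group(3))
>>         if iowait >= cpu // 2:  # arbitrary cut-off, just for the sketch
>>             print(f'instance {inst}: %CPU={cpu}, %IOWait={iowait} '
>>                   '-> waiting on disk, not computing')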
>>
>> Looking back just a bit further, there are a TON of these.
>>
>> Nov 15 21:22:00 CMBL-03-01 local7 2 : 582: CMBL-03-01.localdomain: Nov 15
>> 2018 23:22:00.256 UTC :  %UC_RTMT-2-RTMT_ALERT: %[
>> AlertName=HardwareFailure][AlertDetail=     At Thu Nov 15 21:22:00 BRST
>> 2018 on node 192.168.183.3, the following HardwareFailure events generated:
>>  hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4 Director Agent: LSIESG_DiskDrive_Modified
>> 500605B0027C6D50 Command timeout on PD 01(e0xfc/s1) Path
>> 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00 Sev: 3. AppID : Cisco
>> Syslog Agent ClusterID :  NodeID : CMBL-03-01  TimeStamp : Thu Nov 15
>> 21:21:26 BRST 2018   hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4
>> Director Agent: LSIESG_AlertIndication 500605B0027C6D50 Command timeout on
>> PD 01(e0xfc/s1) Path 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00
>> Sev: 3. AppID : Cisco Syslog Agent ClusterID :  NodeID : CMBL-03-01
>>  TimeStamp : Thu Nov 15 21:21:27 BRST 2018   hwStringMatch : Nov 15
>> 21:21:26 CMBL-03-01][AppID=Cisco AMC
>> Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert
>>
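>> The alert above already names the suspect drive: "PD 01(e0xfc/s1)" is
>> physical disk 01 on the LSI controller, which in LSI's notation is
>> enclosure 0xfc, slot 1.  A rough Python sketch (illustrative filename,
>> not a Cisco tool) to tally those command timeouts per drive and see
>> whether more than one disk is involved:
>>
>> import re
>> from collections import Counter
>>
>> # Illustrative sketch: count LSI "Command timeout" events per physical
>> # drive so the failing slot(s) stand out.  'publisher-syslog.txt' is a
>> # made-up filename; use your own syslog export.
>> pd_re = re.compile(r'Command timeout on (PD \S+)')
>>
>> timeouts = Counter()
>> with open('publisher-syslog.txt', errors='replace') as f:
>>     for line in f:
>>         timeouts.update(pd_re.findall(line))
>>
>> for pd, count in timeouts.most_common():
>>     print(f'{pd}: {count} command timeouts')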
>>
>> You’ve lost, or are in the middle of losing, at least one disk drive. The
>> server probably lost them all at the same time on the 13th, and the OS
>> marked the entire filesystem read-only.
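>>
>> As for the read-only part, you normally won't have a shell on a CUCM
>> appliance, but on any Linux host where you can run a script the check
>> is trivial (a generic sketch, not a CUCM CLI command):
>>
>> # Generic Linux check: list filesystems currently mounted read-only.
>> # A failing disk often ends with the kernel remounting the root
>> # filesystem 'ro'.  Some pseudo-filesystems are read-only on purpose;
>> # the real disk partitions are the ones that matter here.
>> with open('/proc/mounts') as f:
>>     for line in f:
>>         device, mountpoint, fstype, options = line.split()[:4]
>>         if 'ro' in options.split(','):
>>             print(f'{mountpoint} ({device}, {fstype}) is read-only')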
>>
>> -Ryan
>>
>> On Dec 3, 2018, at 9:28 AM, Nilson Costa <nilsonlino at gmail.com> wrote:
>>
>> Hello All,
>>
>> I'm deploying a new CUCM for a customer that has an old one used only
>> for call routing for a Genesys call center system.
>>
>> As you can see in the picture below, they have some MGCP gateways
>> connected to this CUCM where the calls come in and, via some CTI route
>> points controlled by Genesys, route the calls to 2 Avaya PBXs or to
>> another CUCM.
>>
>> <image.png>
>> On November 13th they lost access to Tomcat on the Publisher. When we
>> looked at the server, several services were restarting, including Cisco
>> CallManager, just on the Publisher.
>> We decided to reboot the whole cluster, but after the reboot we are
>> facing some weird issues that are not that relevant, I think, but there
>> is one we are really worried about.
>>
>> The Cisco CallManager process is still restarting randomly and
>> generating core dumps. I'm attaching those logs here, and I'm also
>> attaching the syslogs from the Publisher.
>>
>> Can anybody here on the group help me find out what is triggering the
>> Cisco CallManager restarts?
>>
>> --
>> Nilson Lino da Costa Junior
>> <coredump.txt><publiser-syslog-29-11.txt>
>> _______________________________________________
>> cisco-voip mailing list
>> cisco-voip at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-voip
>>
>>
>>
>
> --
> Nilson Lino da Costa Junior
>
>
>

-- 
Nilson Lino da Costa Junior