[cisco-voip] CUCM 8.6.2 cluster problem

Mon Feb 16 19:39:36 EST 2015

Are these VMs or physical servers?

Did you look at the server console to see if there were any errors, such as a read-only file system and a bunch of Java output? If a server's file system goes read-only, you'll usually see that sort of thing on the server console or when first connecting over SSH.

A reboot would clear the read-only file system condition if that was the issue. The reboot would also halt during boot if there are serious file system problems.

Also, in my experience the early 8.6.2 versions are troublesome. 8.6.2 SU3 and higher are better, so you should look at patching to the latest 8.6.2 SU when you get time.

> On Feb 16, 2015, at 9:33 AM, Brian Meade <bmeade90 at vt.edu> wrote:
> 
> Run "show network cluster" to make sure both nodes are authenticated/connected.
> 
> Then run "utils dbreplication stop" on pub and sub, run "utils dbreplication dropadmindb" on pub and sub, then run "utils dbreplication reset all" from the publisher once everything else has completed.
> 
> You can run "file list activelog cm/trace/dbl detail date" on the publisher to watch the files.  You should see the CDR defines happen immediately.  You can use "file tail activelog cm/trace/dbl/filename" to watch the files in real-time.  Your CDR define file should show the result of "64" if it was successful.  Then just wait 5 minutes (or however long your repltimeout is set to) and you should see the broadcast go out from the publisher.
> 
> Then just sit back and wait/monitor the broadcast via tail if you want.
> 
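The reset sequence above can be written out as an ordered checklist. This is only a sketch: it builds and prints the steps and does not connect to or run anything on a CUCM node. The function name `replication_reset_plan` and the node names are hypothetical, and the publisher-before-subscriber order within each step is an assumption, since the message only says "on pub and sub".

```python
def replication_reset_plan(publisher, subscribers):
    """Return (node, command) pairs in the order described in the message."""
    nodes = [publisher, *subscribers]  # assumed order within each step
    steps = [(node, "utils dbreplication stop") for node in nodes]
    steps += [(node, "utils dbreplication dropadmindb") for node in nodes]
    # The reset is issued once, from the publisher, after everything else.
    steps.append((publisher, "utils dbreplication reset all"))
    return steps


if __name__ == "__main__":
    plan = replication_reset_plan("cucm01", ["cucm02"])
    for number, (node, command) in enumerate(plan, start=1):
        print(f"{number}. [{node}] {command}")
```

Keeping the plan as data makes it easy to review the order before typing the commands by hand on each node's CLI.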
> No harm in resetting replication during the day.  Just never ever use the clusterreset command and you'll be fine.
> 
> Brian
> 
>> On Mon, Feb 16, 2015 at 8:56 AM, Martin Schmuker <ms at bilobit.com> wrote:
>> All,
>> 
>>  
>> 
>> I'm having problems with a CUCM cluster at a customer site running 8.6.2.20000-2.
>> 
>>  
>> 
>> The cluster ran fine in the past. Some months ago the second machine lost power (UPS problems), and since then the cluster has occasionally reported as not working. After a reboot of both machines, they run fine.
>> 
>>  
>> 
>> Today the publisher was stuck at 100% CPU and all phones were in the “registering” state, but the subscriber did not register any phones. After rebooting both machines everything is fine again, but I’m not happy with this situation. Can you tell from the following output whether the cluster is healthy?
>> 
>>  
>> 
>> When I issue utils dbreplication runtimestate on publisher (=cucm01) I get:
>> 
>> admin:utils dbreplication runtimestate
>> 
>>  
>> 
>> DB and Replication Services: ALL RUNNING
>> 
>>  
>> 
>> Cluster Replication State: Replication status command started at: 2015-02-16-14-32
>> 
>>      Replication status command COMPLETED 541 tables checked out of 541
>> 
>>      No Errors or Mismatches found.
>> 
>>  
>> 
>>      Use 'file view activelog cm/trace/dbl/sdi/ReplicationStatus.2015_02_16_14_32_48.out' to see the details
>> 
>>  
>> 
>> DB Version: ccm8_6_2_20000_2
>> 
>> Number of replicated tables: 541
>> 
>>  
>> 
>> Cluster Detailed View from PUB (2 Servers):
>> 
>>  
>> 
>>                                 PING            REPLICATION     REPL.   DBver&  REPL.   REPLICATION SETUP
>> SERVER-NAME     IP ADDRESS      (msec)  RPC?    STATUS          QUEUE   TABLES  LOOP?   (RTMT) & details
>> -----------     ------------    ------  ----    -----------     -----   ------- -----   -----------------
>> s-lx-cucm01     192.168.1.10    0.082   Yes     Connected       0       match   Yes     (2) PUB Setup Completed
>> cucm02          192.168.1.11    0.244   Yes     Connected       0       match   N/A     (0) Setup Requested
>> 
>>  
>> 
>> On Subscriber (=cucm02):
>> 
>> admin:utils dbreplication runtimestate
>> 
>>  
>> 
>> DB and Replication Services: ALL RUNNING
>> 
>>  
>> 
>> Cluster Replication State: Only available on the PUB
>> 
>>  
>> 
>> DB Version: ccm8_6_2_20000_2
>> 
>> Number of replicated tables: 541
>> 
>>  
>> 
>> Cluster Detailed View from SUB (2 Servers):
>> 
>>  
>> 
>>                                 PING            REPLICATION     REPL.   DBver&  REPL.   REPLICATION SETUP
>> SERVER-NAME     IP ADDRESS      (msec)  RPC?    STATUS          QUEUE   TABLES  LOOP?   (RTMT)
>> -----------     ------------    ------  ----    -----------     -----   ------- -----   -----------------
>> s-lx-cucm01     192.168.1.10    0.212   Yes     Connected       0       match   No      (2)
>> cucm02          192.168.1.11    0.055   Yes     Connected       0       match   No      (0)
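The "Cluster Detailed View" rows above are regular enough to parse when comparing pasted output from several nodes. A minimal sketch follows, assuming the column layout shown in this thread (single-word STATUS values, dotted IPv4 addresses); `parse_rows` and `nodes_not_in_state` are hypothetical helper names, not part of any Cisco tooling. It flags every node whose setup column differs from `(2)`, the value the publisher's own row labels "PUB Setup Completed".

```python
import re

# One data row of the "Cluster Detailed View" table, as pasted in this thread.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<ping>[\d.]+)\s+"
    r"(?P<rpc>\S+)\s+(?P<status>\S+)\s+(?P<queue>\d+)\s+(?P<tables>\S+)\s+"
    r"(?P<loop>\S+)\s+\((?P<state>\d+)\)\s*(?P<detail>.*)$"
)


def parse_rows(text):
    """Extract table rows; header, separator, and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # drop e-mail quote markers
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["queue"] = int(row["queue"])
            row["state"] = int(row["state"])
            rows.append(row)
    return rows


def nodes_not_in_state(rows, wanted=2):
    """Names of nodes whose setup column is not the wanted value."""
    return [row["name"] for row in rows if row["state"] != wanted]


if __name__ == "__main__":
    sample = (
        ">> s-lx-cucm01     192.168.1.10    0.212   Yes     Connected       0       match   No      (2)\n"
        ">> cucm02          192.168.1.11    0.055   Yes     Connected       0       match   No      (0)\n"
    )
    print(nodes_not_in_state(parse_rows(sample)))
```

The regular expression is deliberately strict, so a row with an unexpected layout is skipped rather than misread; check the row count against the "(2 Servers)" header before trusting the result.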
>> 
>>  
>> 
>> Output of utils dbreplication status on pub:
>> 
>> admin:file view activelog cm/trace/dbl/sdi/ReplicationStatus.2015_02_16_14_32_48.out
>> 
>>  
>> 
>> SERVER                          ID STATE    STATUS     QUEUE  CONNECTION CHANGED
>> -----------------------------------------------------------------------
>> g_cucm02_ccm8_6_2_20000_2        3 Active   Connected      0  Feb 16 14:30:50
>> g_s_lx_cucm01_ccm8_6_2_20000_2   2 Active   Local          0
>> -------------------------------------------------
>> 
>>  
>> 
>> No Errors or Mismatches found.
>> 
>> Replication status is good on all available servers.
>> 
>>  
>> 
>> utils dbreplication status output
>> 
>>  
>> 
>> To determine if replication is suspect, look for the following:
>> 
>>         (1) Number of rows in a table do not match on all nodes.
>> 
>>         (2) Non-zero values occur in any of the other output columns for a table
>> 
>>  
>> 
>> Sync target server is not defined for the replicate ccmdbtemplate_s_lx_cucm01_ccm8_6_2_20000_2_1_118_typedberrors
>> 
>> Feb 16 2015 14:32:53 ------   Table scan for ccmdbtemplate_s_lx_cucm01_ccm8_6_2_20000_2_1_118_typedberrors end   ---------
>> 
>>  
>> 
>> And on sub:
>> 
>> admin:file view activelog cm/trace/dbl/sdi/ReplicationStatus.2015_02_16_14_34_37.out
>> 
>>  
>> 
>> SERVER                          ID STATE    STATUS     QUEUE  CONNECTION CHANGED
>> -----------------------------------------------------------------------
>> g_cucm02_ccm8_6_2_20000_2        3 Active   Local          0
>> g_s_lx_cucm01_ccm8_6_2_20000_2   2 Active   Connected      0  Feb 16 14:30:49
>> -------------------------------------------------
>> 
>>  
>> 
>> No Errors or Mismatches found.
>> 
>> Replication status is good on all available servers.
>> 
>>  
>> 
>> utils dbreplication status output
>> 
>>  
>> 
>> To determine if replication is suspect, look for the following:
>> 
>>         (1) Number of rows in a table do not match on all nodes.
>> 
>>         (2) Non-zero values occur in any of the other output columns for a table
>> 
>>  
>> 
>> Sync target server is not defined for the replicate ccmdbtemplate_s_lx_cucm01_ccm8_6_2_20000_2_1_118_typedberrors
>> 
>> Feb 16 2015 14:34:43 ------   Table scan for ccmdbtemplate_s_lx_cucm01_ccm8_6_2_20000_2_1_118_typedberrors end   ---------
>> 
>>  
>> 
>> show perf query class "Number of Replicates Created and State of Replication" on publisher shows:
>> 
>> ==>query class :
>> 
>>  
>> 
>> - Perf class (Number of Replicates Created and State of Replication) has instances and values:
>> 
>>     ReplicateCount  -> Number of Replicates Created   = 541
>> 
>>     ReplicateCount  -> Replicate_State                = 2
>> 
>>  
>> 
>> So a replicate state of “2” means everything ok, correct?
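For reading the `Replicate_State` counter, here is a small sketch that pulls the value out of the perf output above and attaches a description. The descriptions are paraphrased from memory of Cisco's documentation for this counter and should be treated as an assumption to verify against the documentation for your release; `REPLICATE_STATE`, `parse_replicate_state`, and `describe_state` are hypothetical names.

```python
import re

# Assumed meanings of the Replicate_State counter (verify for your version).
REPLICATE_STATE = {
    0: "initializing, replication not started",
    1: "replicates created, but the count is incorrect",
    2: "replication is good",
    3: "replication is bad in the cluster",
    4: "replication setup did not succeed",
}


def parse_replicate_state(perf_output):
    """Return the integer after 'Replicate_State =', or None if absent."""
    match = re.search(r"Replicate_State\s*=\s*(\d+)", perf_output)
    return int(match.group(1)) if match else None


def describe_state(state):
    """Map a counter value to its assumed description."""
    return REPLICATE_STATE.get(state, "unknown state")


if __name__ == "__main__":
    line = "    ReplicateCount  -> Replicate_State                = 2"
    state = parse_replicate_state(line)
    print(state, "->", describe_state(state))
```

The counter is reported per node, so a value of 2 on the publisher alone says nothing about the subscriber; run the same query on each server.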
>> 
>>  
>> 
>> What can I do? Is it safe to run dbreplication reset all?
>> 
>>  
>> 
>> Thanks for your ideas, Martin
>> 
>> 
>> _______________________________________________
>> cisco-voip mailing list
>> cisco-voip at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-voip