[cisco-voip] CUCM Upgrade woes

Lelio Fulgenzi lelio at uoguelph.ca
Wed Mar 2 14:33:49 EST 2016


I'd first try getting NTP working on all of the hosts. 

A reboot might fix things, but it might not. Sometimes the database sync needs to be restarted manually and won't recover even after a restart of the cluster. 

I'd engage TAC once NTP is fixed. 


--- 
Lelio Fulgenzi, B.A. 
Senior Analyst, Network Infrastructure 
Computing and Communications Services (CCS) 
University of Guelph 

519-824-4120 Ext 56354 
lelio at uoguelph.ca 
www.uoguelph.ca/ccs 
Room 037, Animal Science and Nutrition Building 
Guelph, Ontario, N1G 2W1 

----- Original Message -----

From: "Andy Carse" <andy.carse at gmail.com> 
To: "Lelio Fulgenzi" <lelio at uoguelph.ca> 
Cc: "Ryan Huff" <ryanhuff at outlook.com>, "Cisco VoIP List" <cisco-voip at puck.nether.net> 
Sent: Wednesday, March 2, 2016 2:27:44 PM 
Subject: Re: [cisco-voip] CUCM Upgrade woes 

Yes, I spotted that after I pasted. 
I'm not sure how it installed v9.1 with that stratum. 
I've changed it now to a 3, but I guess it's too late. 
I'll try a reboot and see if that fixes it. 

On 2 March 2016 at 19:23, Lelio Fulgenzi < lelio at uoguelph.ca > wrote: 




I could be wrong, but it looks like your NTP is not synchronizing properly. It may not be the issue, but it certainly doesn't help. 


Here's a sample of ours showing what I think it should look like: 




ntpd (pid 6691) is running... 

remote refid st t when poll reach delay offset jitter 
============================================================================== 
127.127.1.0 .LOCL. 10 l 2 64 377 0.000 0.000 0.001 
*xxx.xxx.xxx.201 xxx.xxx.xxx.53 3 u 721 1024 377 0.663 -0.256 0.124 
+xxx.xxx.xxx.201 xxx.xxx.xxx.53 3 u 1021 1024 377 0.924 0.528 0.014 


synchronised to NTP server (xxx.xxx.xxx.201) at stratum 4 
time correct to within 86 ms 
polling server every 1024 s 
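For anyone reading along: in that output, the reach column is an octal shift register of the last eight polls (377 means all eight succeeded), and stratum 16 means the peer has never synchronized. A minimal sketch in ordinary Python (not Cisco tooling; the addresses are illustrative stand-ins) that parses a peers line and applies those two checks:

```python
# Sketch only: parse one ntpq-style peer line from "utils ntp status"
# and flag the obvious failure modes seen in this thread.

def parse_peer(line):
    fields = line.split()
    remote = fields[0]
    tally = ""
    if remote[0] in "*+-#x.o":  # tally code marks the selected/candidate peer
        tally, remote = remote[0], remote[1:]
    return {
        "tally": tally,
        "remote": remote,
        "refid": fields[1],
        "stratum": int(fields[2]),
        "reach": int(fields[6], 8),  # reach is printed in octal
    }

def peer_healthy(peer):
    # Stratum 16 means unsynchronized; reach 0o377 means the last
    # eight polls all succeeded.
    return peer["stratum"] < 16 and peer["reach"] == 0o377

good = parse_peer("*192.0.2.201 192.0.2.53 3 u 721 1024 377 0.663 -0.256 0.124")
bad = parse_peer("203.0.113.203 .INIT. 16 u - 1024 0 0.000 0.000 0.000")
```

Applied to the broken output further down in this thread, the `.INIT.` peer at stratum 16 with reach 0 fails both checks.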






--- 
Lelio Fulgenzi, B.A. 
Senior Analyst, Network Infrastructure 
Computing and Communications Services (CCS) 
University of Guelph 

519-824-4120 Ext 56354 
lelio at uoguelph.ca 
www.uoguelph.ca/ccs 
Room 037, Animal Science and Nutrition Building 
Guelph, Ontario, N1G 2W1 


From: "Andy Carse" < andy.carse at gmail.com > 
To: "Ryan Huff" < ryanhuff at outlook.com > 
Cc: "Cisco VoIP List" < cisco-voip at puck.nether.net > 
Sent: Wednesday, March 2, 2016 2:10:31 PM 
Subject: Re: [cisco-voip] CUCM Upgrade woes 

The upgrade was from 9.1.2 to 10.5.2.13900-12. 
There were some issues with having the GBNP installed, so a direct upgrade was a non-starter. 
This is a hardware refresh and software upgrade rolled up into one project. 

So I backed up the 9.1, 
installed 9.1 on the new hardware to the same specifications, IP addresses, OVA, etc., 
and restored onto the new publisher OK. 
I installed the COP files as required, then had fun with the GBNP. 
So I exported everything except route patterns from the production system, 
rebuilt another 9.1, this time without installing the GBNP, 
imported into this new cluster, 
upgraded to 10.5.2.10000-5, 
then upgraded to 10.5.2.13900-12. 
It didn't seem to be an issue; then I came in today and it's broken. 

The info you requested is pasted below: 

admin:file view activelog platform/log/diag1.log 

03-02-2016 18:46:31 Diagnostics Version: 1.0.0 
03-02-2016 18:46:31 getting hardware model [/usr/local/bin/base_scripts/sd_hwdetect HWModel] 
03-02-2016 18:46:32 Hardware Model: VMware 
03-02-2016 18:46:32 getting verson number [rpm -q --nodigest --nosignature master | sed -e "s/master-//"] 
03-02-2016 18:46:32 Version: 10.5.2 
03-02-2016 18:46:33 disk_space: Is valid module: True 
03-02-2016 18:46:33 disk_files: Is valid module: True 
03-02-2016 18:46:33 service_manager: Is valid module: True 
03-02-2016 18:46:33 tomcat: Is valid module: True 
03-02-2016 18:46:33 tomcat_deadlocks: Is valid module: True 
03-02-2016 18:46:33 tomcat_keystore: Is valid module: True 
03-02-2016 18:46:33 tomcat_connectors: Is valid module: True 
03-02-2016 18:46:33 tomcat_threads: Is valid module: True 
03-02-2016 18:46:33 tomcat_memory: Is valid module: True 
03-02-2016 18:46:33 tomcat_sessions: Is valid module: True 
03-02-2016 18:46:33 tomcat_heapdump: Is valid module: True 
03-02-2016 18:46:33 validate_network: Product specific XML file: /usr/local/platform/conf/cli/cliProduct.xml 
03-02-2016 18:46:33 validate_network: val: true 
03-02-2016 18:46:33 validate_network: Is valid module: True 
03-02-2016 18:46:33 validate_network_adv: Is valid module: False 

options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 - 20 of 54) : 


03-02-2016 18:46:33 raid: getting cpu speed [/usr/local/bin/base_scripts/sd_hwdetect CPUSpeed] 
03-02-2016 18:46:33 raid: CPU Speed: 2500 
03-02-2016 18:46:33 raid: model = VMware 
03-02-2016 18:46:33 raid: Is valid module: True 
03-02-2016 18:46:33 system_info: Is valid module: True 
03-02-2016 18:46:33 ntp_reachability: Is valid module: True 
03-02-2016 18:46:33 ntp_clock_drift: Is valid module: True 
03-02-2016 18:46:33 ntp_stratum: Is valid module: True 
03-02-2016 18:46:33 sdl_fragmentation: Is valid module: True 
03-02-2016 18:46:33 sdi_fragmentation: Is valid module: True 
03-02-2016 18:46:33 ipv6_networking: IPV6INIT=no 
03-02-2016 18:46:33 ipv6_networking: IPv6 initialized: no 
03-02-2016 18:46:33 ipv6_networking: False 
03-02-2016 18:46:33 ipv6_networking: Is valid module: False 
03-02-2016 18:46:33 
03-02-2016 18:46:33 --> executing test [validate_network], fix: fixauto, stop on error: False 
03-02-2016 18:46:33 
03-02-2016 18:46:33 validate_network: ------------------ 
03-02-2016 18:46:33 validate_network: Testing networking, but skipping duplicate IP test. 
03-02-2016 18:46:33 validate_network: checking network [/usr/local/bin/base_scripts/validateNetworking.sh -n] 

options: q=quit, n=next, p=prev, b=begin, e=end (lines 21 - 40 of 54) : 
03-02-2016 18:46:33 validate_network: retrieving pub name from [/usr/local/platform/conf/platformConfig.xml] 
03-02-2016 18:46:33 validate_network: Hostname: [XXXXXXXX] 
03-02-2016 18:46:33 validate_network: found pub name [XXXXXXX] 
03-02-2016 18:46:33 validate_network: checking /etc/hosts [grep -q `hostname` /etc/hosts] 
03-02-2016 18:46:33 validate_network: Finding cluster nodes [/usr/local/bin/base_scripts/list_cluster.sh] 
03-02-2016 18:46:33 validate_network: running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.9x.101>/dev/null] 
03-02-2016 18:46:35 validate_network: running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.9x.102>/dev/null] 
03-02-2016 18:46:46 validate_network: running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.4x.101>/dev/null] 
03-02-2016 18:46:47 validate_network: running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.9x.130>/dev/null] 
03-02-2016 18:46:48 validate_network: does test script exist [/usr/local/bin/base_scripts/networkDiagnostic.sh] 
03-02-2016 18:46:48 validate_network: test script exists 
03-02-2016 18:46:48 validate_network: run network script via expect [./diag_validate_network.exp > /dev/null] 
03-02-2016 18:46:48 validate_network: result: 0, message: Passed 


end of the file reached 
options: q=quit, n=next, p=prev, b=begin, e=end (lines 41 - 54 of 54) : 
admin: 



admin:utils ntp status 
ntpd (pid 8970) is running... 

remote refid st t when poll reach delay offset jitter 
============================================================================== 
xxx.xxx.55.203 .INIT. 16 u - 1024 0 0.000 0.000 0.000 
*xxx.xxx.5.203 LOCAL(1) 8 u 268 512 377 0.611 0.304 0.289 


synchronised to NTP server (xxx.xxx.5.203) at stratum 9 
time correct to within 37 ms 
polling server every 512 s 

Current time in UTC is : Wed Mar 2 18:49:09 UTC 2016 
Current time in Europe/London is : Wed Mar 2 18:49:09 GMT 2016 
admin: 



admin:utils ntp server list 
xxx.xxx.55.203 

xxx.xxx.5.203 
admin: 


Regards 

On 2 March 2016 at 17:36, Ryan Huff < ryanhuff at outlook.com > wrote: 

Not that I'm suggesting you not call TAC, but the engineer in me just keeps going... 

What version did you upgrade from, and did you upgrade in-place VMs, DRS/rebuild, or P->V? 

Do you know if, at any point post-upgrade, the cluster was healthy and then failed, or has it always been in a degraded state since the upgrade? 

Can you show me the output (from the publisher) of: 

- utils diagnose module validate_network 
- show ntp status 
- show ntp server list 

Thanks, 

Ryan 

> On Mar 2, 2016, at 12:25 PM, Ryan Huff < ryanhuff at outlook.com > wrote: 
> 
> I'd go through a quick checklist while calling in a severity 1 TAC case; 
> 
> - forward and reverse DNS for all cluster nodes (and resolving to the correct addresses) 
> 
> - verify the processNodes, whether using hostnames or FQDNs, are correctly resolvable. Unresolvable entries will prevent A Cisco DB from starting, as well as GUI authentication 
> 
> - make sure clock sync on the nodes isn't absurd (stratum 3 or better) 
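The forward-and-reverse DNS item in that checklist can be spot-checked with a few lines of ordinary Python (a hypothetical helper, not part of CUCM; substitute your cluster node names for "localhost"):

```python
# Sketch only: confirm a hostname resolves forward, and that the
# resulting address resolves back via a PTR record.
import socket

def dns_round_trip(hostname):
    addr = socket.gethostbyname(hostname)  # forward lookup (A record)
    name, _aliases, _addrs = socket.gethostbyaddr(addr)  # reverse lookup (PTR)
    return addr, name

addr, name = dns_round_trip("localhost")
```

Run it for each node in the cluster; a lookup failure raises socket.error, and a reverse name that doesn't match the node's configured hostname is worth investigating.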
> 
> Thanks, 
> 
> Ryan 
> 
>> On Mar 2, 2016, at 12:13 PM, Andy Carse < andy.carse at gmail.com > wrote: 
>> 
>> I thought I was home and dry with this upgrade, but it would seem that the gods have deserted me. 
>> 
>> I upgraded to 10.5.2.13900-12 after some issues with the GBNP, and everything seemed OK. 
>> This morning I've come in to find that the database on the publisher won't start. 
>> So I've tried: 
>> 1. A reboot of the cluster (it's not gone live yet); no change. 
>> 2. utils service start A Cisco DB 
>> 3. dbreplication stop on the subs, then the publisher; 
>> dbreplication dropadmindb on the subs; 
>> dbreplication dropadmindb on the pub. 
>> The pub comes back with "DropAdminDB cannot be executed on standalone or Cores cluster" 
>> 
>> I can't even browse to ccmadmin on the pub, and I forgot to carry out the "Golden Rule" of taking a backup soon after the upgrade. 
>> If I try to RTM, that also fails... 
>> 
>> Is it time for a start from scratch moment? 
>> 
>> 
>> 
>> -- 
>> Rgds Andy 
>> 
>> _______________________________________________ 
>> cisco-voip mailing list 
>> cisco-voip at puck.nether.net 
>> https://puck.nether.net/mailman/listinfo/cisco-voip 






-- 
Rgds Andy 




</blockquote>




-- 
Rgds Andy 

