[cisco-voip] CUCM Upgrade woes

Ryan Huff ryanhuff at outlook.com
Wed Mar 2 14:27:05 EST 2016


Andy,


Sorry for the late reply; the mailing list seems delayed. I noticed that your clock sync is at stratum 9, which is very high. While 37 milliseconds may seem trivial in the grand scheme of things, a high NTP stratum on UC servers (more than 3 hops from the reference clock) WILL cause some of the strangest, most seemingly unrelated behavior.


The SRND recommends stratum 3 or better; I personally say stratum 2 or better. Get your stratum down, restart the NTP service, and then possibly attempt a cluster replication reset and/or a cluster reboot.
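
Roughly the order I'd take it in from the publisher CLI once the NTP source is sane (just a sketch, adjust for your cluster):

utils ntp restart
utils ntp status                    (confirm the stratum has dropped and a peer shows '*')
utils dbreplication runtimestate    (see where replication actually stands first)
utils dbreplication reset all       (only if runtimestate still shows a bad state)
utils system restart                (last resort; reboot node by node, publisher first)

The reset/reboot steps are only worth doing after the clock source is fixed, otherwise you'll likely end up back in the same place.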

Also, from the CLI, run: run sql select name from processnode. Do the hostnames/FQDNs returned match up with reality?
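
As a quick sanity check (just a sketch of how I'd eyeball it), you can compare that output against what the platform layer thinks and against DNS, all from the same CLI:

run sql select name,description from processnode
show network cluster                  (the node list the platform itself believes in)
utils network host <node-name>        (forward/reverse lookup for each name returned)

If a processnode entry is a hostname or FQDN that doesn't resolve cleanly in both directions to the right IP, A Cisco DB and Tomcat authentication will both give you grief.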



= Ryan =



Email: ryanhuff at outlook.com

Spark: ryanhuff at outlook.com

Twitter: @ryanthomashuff<http://twitter.com/ryanthomashuff>

LinkedIn: ryanthomashuff<http://linkedin.com/in/ryanthomashuff>

Web: ryanthomashuff.com<http://ryanthomashuff.com>


________________________________
From: Andy Carse <andy.carse at gmail.com>
Sent: Wednesday, March 2, 2016 2:10 PM
To: Ryan Huff
Cc: Cisco VoIP List
Subject: Re: [cisco-voip] CUCM Upgrade woes

The upgrade was from 9.1.2 to 10.5.2.13900-12.
There were some issues with having the GBNP installed, so a direct upgrade was a non-starter.
This is a hardware refresh and software upgrade rolled up into one project.

So I backed up the 9.1 cluster,
installed 9.1 on the new hardware to the same specifications, IP addresses, OVA, etc.,
and restored onto the new publisher OK.
I installed the COP files as required, then had fun with the GBNP.
So I exported everything except route patterns from the production system,
rebuilt another 9.1 cluster, this time without installing the GBNP,
and imported into this new cluster.
Upgraded to 10.5.2.10000-5,
then upgraded to 10.5.2.13900-12.
It didn't seem to be an issue; then I came in today and it's broken.

The info you requested is pasted below

admin:file view activelog platform/log/diag1.log

03-02-2016 18:46:31                       Diagnostics Version: 1.0.0
03-02-2016 18:46:31                       getting hardware model [/usr/local/bin/base_scripts/sd_hwdetect HWModel]
03-02-2016 18:46:32                       Hardware Model: VMware
03-02-2016 18:46:32                       getting verson number [rpm -q --nodigest --nosignature master | sed -e "s/master-//"]
03-02-2016 18:46:32                       Version: 10.5.2
03-02-2016 18:46:33 disk_space:           Is valid module: True
03-02-2016 18:46:33 disk_files:           Is valid module: True
03-02-2016 18:46:33 service_manager:      Is valid module: True
03-02-2016 18:46:33 tomcat:               Is valid module: True
03-02-2016 18:46:33 tomcat_deadlocks:     Is valid module: True
03-02-2016 18:46:33 tomcat_keystore:      Is valid module: True
03-02-2016 18:46:33 tomcat_connectors:    Is valid module: True
03-02-2016 18:46:33 tomcat_threads:       Is valid module: True
03-02-2016 18:46:33 tomcat_memory:        Is valid module: True
03-02-2016 18:46:33 tomcat_sessions:      Is valid module: True
03-02-2016 18:46:33 tomcat_heapdump:      Is valid module: True
03-02-2016 18:46:33 validate_network:     Product specific XML file: /usr/local/platform/conf/cli/cliProduct.xml
03-02-2016 18:46:33 validate_network:     val: true
03-02-2016 18:46:33 validate_network:     Is valid module: True
03-02-2016 18:46:33 validate_network_adv: Is valid module: False

options: q=quit, n=next, p=prev, b=begin, e=end (lines 1 - 20 of 54) :
03-02-2016 18:46:33 raid:                 getting cpu speed [/usr/local/bin/base_scripts/sd_hwdetect CPUSpeed]
03-02-2016 18:46:33 raid:                 CPU Speed: 2500
03-02-2016 18:46:33 raid:                 model = VMware
03-02-2016 18:46:33 raid:                 Is valid module: True
03-02-2016 18:46:33 system_info:          Is valid module: True
03-02-2016 18:46:33 ntp_reachability:     Is valid module: True
03-02-2016 18:46:33 ntp_clock_drift:      Is valid module: True
03-02-2016 18:46:33 ntp_stratum:          Is valid module: True
03-02-2016 18:46:33 sdl_fragmentation:    Is valid module: True
03-02-2016 18:46:33 sdi_fragmentation:    Is valid module: True
03-02-2016 18:46:33 ipv6_networking:      IPV6INIT=no
03-02-2016 18:46:33 ipv6_networking:      IPv6 initialized: no
03-02-2016 18:46:33 ipv6_networking:      False
03-02-2016 18:46:33 ipv6_networking:      Is valid module: False
03-02-2016 18:46:33
03-02-2016 18:46:33                       --> executing test [validate_network], fix: fixauto, stop on error: False
03-02-2016 18:46:33
03-02-2016 18:46:33 validate_network:     ------------------
03-02-2016 18:46:33 validate_network:     Testing networking, but skipping duplicate IP test.
03-02-2016 18:46:33 validate_network:     checking network [/usr/local/bin/base_scripts/validateNetworking.sh -n]

options: q=quit, n=next, p=prev, b=begin, e=end (lines 21 - 40 of 54) :
03-02-2016 18:46:33 validate_network:     retrieving pub name from [/usr/local/platform/conf/platformConfig.xml]
03-02-2016 18:46:33 validate_network:     Hostname: [XXXXXXXX]
03-02-2016 18:46:33 validate_network:     found pub name [XXXXXXX]
03-02-2016 18:46:33 validate_network:     checking /etc/hosts [grep -q `hostname` /etc/hosts]
03-02-2016 18:46:33 validate_network:     Finding cluster nodes [/usr/local/bin/base_scripts/list_cluster.sh]
03-02-2016 18:46:33 validate_network:     running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.9x.101>/dev/null]
03-02-2016 18:46:35 validate_network:     running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.9x.102>/dev/null]
03-02-2016 18:46:46 validate_network:     running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.4x.101>/dev/null]
03-02-2016 18:46:47 validate_network:     running [./diag_validate_network_sftp.exp sftpuser at xxx.xxx.9x.130>/dev/null]
03-02-2016 18:46:48 validate_network:     does test script exist [/usr/local/bin/base_scripts/networkDiagnostic.sh]
03-02-2016 18:46:48 validate_network:     test script exists
03-02-2016 18:46:48 validate_network:     run network script via expect [./diag_validate_network.exp > /dev/null]
03-02-2016 18:46:48 validate_network:     result: 0, message: Passed


end of the file reached
options: q=quit, n=next, p=prev, b=begin, e=end (lines 41 - 54 of 54) :
admin:



admin:utils ntp status
ntpd (pid 8970) is running...

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 xxx.xxx.55.203    .INIT.          16 u    - 1024    0    0.000    0.000   0.000
*xxx.xxx.5.203     LOCAL(1)         8 u  268  512  377    0.611    0.304   0.289


synchronised to NTP server (xxx.xxx.5.203) at stratum 9
   time correct to within 37 ms
   polling server every 512 s

Current time in UTC is : Wed Mar  2 18:49:09 UTC 2016
Current time in Europe/London is : Wed Mar  2 18:49:09 GMT 2016
admin:



admin:utils ntp server list
xxx.xxx.55.203

xxx.xxx.5.203
admin:


Regards

On 2 March 2016 at 17:36, Ryan Huff <ryanhuff at outlook.com<mailto:ryanhuff at outlook.com>> wrote:
Not that I'm suggesting you not call TAC but the engineer in me just keeps going ....

What (version) did you upgrade from and did you upgrade in-place VMs, DRS/Rebuild or P->V?

Do you know whether, at any point post-upgrade, the cluster was healthy and then failed, or has it always been in a degraded state since the upgrade?

Can you show me the output (from the publisher);

- utils diagnose module validate_network
- show ntp status
- show ntp server list

Thanks,

Ryan

> On Mar 2, 2016, at 12:25 PM, Ryan Huff <ryanhuff at outlook.com<mailto:ryanhuff at outlook.com>> wrote:
>
> I'd go through a quick checklist while calling in a severity 1 TAC case;
>
> - forward and reverse DNS for all cluster nodes (and resolving to the correct addresses)
>
> - verify that the processnode entries, whether hostnames or FQDNs, are correctly resolvable. If they are not, that will prevent A Cisco DB from starting and will break GUI authentication
>
> - make sure the nodes do not have an absurdly high clock stratum (stratum 3 or better)
>
> Thanks,
>
> Ryan
>
>> On Mar 2, 2016, at 12:13 PM, Andy Carse <andy.carse at gmail.com<mailto:andy.carse at gmail.com>> wrote:
>>
>> I thought I was home and dry with this upgrade, but it would seem that the gods have deserted me.
>>
>> I upgraded to 10.5.2.13900-12 after some issues with the GBNP, and everything seemed OK.
>> This morning I've come in to find that the database on the publisher won't start.
>> So I've tried
>> 1. Reboot of the cluster (it's not gone live yet); no change.
>> 2. utils service start A Cisco DB
>> 3. Tried dbreplication stop on the subs, then the publisher.
>>           dbreplication dropadmindb on the subs
>>           dbreplication dropadmindb on the pub
>> The pub comes back with "DropAdminDB cannot be executed on standalone or Cores cluster"
>>
>> I can't even web to ccmadmin on the pub and I forgot to carry out the "Golden Rule" of taking a backup soon after the upgrade.
>> If I try to RTM that also fails......
>>
>> Is it time for a start from scratch moment?
>>
>>
>>
>> --
>> Rgds Andy
>>
>> _______________________________________________
>> cisco-voip mailing list
>> cisco-voip at puck.nether.net<mailto:cisco-voip at puck.nether.net>
>> https://puck.nether.net/mailman/listinfo/cisco-voip
> _______________________________________________
> cisco-voip mailing list
> cisco-voip at puck.nether.net<mailto:cisco-voip at puck.nether.net>
> https://puck.nether.net/mailman/listinfo/cisco-voip



--
Rgds Andy
