[cisco-voip] Constantly having db replication issues

Ryan Huff ryanhuff at outlook.com
Wed Apr 20 15:33:18 EDT 2016


Nick,

Each time you have one of these DB replication issues, have you always been able to tie it to a WAN event? I ask because you may be having these issues regardless of the WAN, and they may only have happened to line up with a WAN event once or twice.
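One way to answer that is to line up the replication alert times against the WAN outage times. Below is a minimal Python sketch. It assumes you already have both lists as timestamps (the dates are made up for illustration). It also assumes a one-hour window, based on the alert arriving "within an hour or so" of a WAN blip. Alerts that land in the "no WAN event" bucket would point away from the WAN as the cause.

```python
from datetime import datetime, timedelta

# Assumed window, based on the alert arriving "within an hour or so" of a WAN blip.
WINDOW = timedelta(hours=1)


def classify_alerts(wan_events, alerts, window=WINDOW):
    """Split alert times into those that follow a WAN event by at most
    `window` and those that do not (possible non-WAN causes)."""
    tied, untied = [], []
    for alert in sorted(alerts):
        # Tied if some WAN event started no later than the alert and
        # no more than `window` before it.
        if any(timedelta(0) <= alert - event <= window for event in wan_events):
            tied.append(alert)
        else:
            untied.append(alert)
    return tied, untied


# Hypothetical timestamps, for illustration only.
wan_events = [datetime(2016, 4, 2, 1, 0), datetime(2016, 4, 16, 2, 30)]
alerts = [
    datetime(2016, 4, 2, 1, 45),
    datetime(2016, 4, 9, 14, 10),
    datetime(2016, 4, 16, 3, 0),
]
tied, untied = classify_alerts(wan_events, alerts)
print("tied to WAN:", tied)
print("no WAN event:", untied)
```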

Do me a favor and send me the output of:

- utils diagnose test
- utils ntp server list
- utils dbreplication runtimestate
- show network cluster
- run sql select name,description,nodeid from processnode

That is a lot of output, so you may want to put it in a spreadsheet or an attachment rather than inline in this email. All of these commands should be run from the CLI of the CUCM publisher.
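If you go the spreadsheet route, a minimal Python sketch like the following could turn pasted output into a CSV file with one row per output line. The `outputs_to_csv` helper and the placeholder text are hypothetical. The real output still has to be copied by hand from the publisher CLI session.

```python
import csv
import io


def outputs_to_csv(outputs, stream):
    """Write one CSV row per output line: command, line number, text."""
    writer = csv.writer(stream)
    writer.writerow(["command", "line", "text"])
    for command, text in outputs.items():
        for number, line in enumerate(text.splitlines(), start=1):
            writer.writerow([command, number, line])


# Placeholder text; paste the real output copied from the publisher CLI.
outputs = {
    "utils ntp server list": "<paste output here>",
    "show network cluster": "<paste output here>\n<second line>",
}
buffer = io.StringIO()
outputs_to_csv(outputs, buffer)
print(buffer.getvalue())
```

To write to a file instead of a buffer, open it with `newline=""`, as the Python csv module documentation recommends.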

Thanks,

Ryan


On Apr 20, 2016, at 1:08 PM, Nick Barnett <nicksbarnett at gmail.com> wrote:

Thanks Ryan.

We have 3 CCM nodes and 1 TFTP node in each of our two data centers. The main data center is here, and that is where our DRS SFTP server (and the publisher) is located. Nothing is using DNS right now; all of the servers are entered into CUCM as IP addresses. This cluster has been around for years: it was upgraded from 7.BeforeMyTime to 8.6 to 10.0.



On Wed, Apr 20, 2016 at 11:54 AM, Ryan Huff <ryanhuff at outlook.com> wrote:
Hi Nick.

Let me ask you a few things:

- How is the cluster laid out (how many nodes in the cluster and what nodes are in which DC)?

- Are you using DNS and if so, where is the DNS server located and do you have redundant DNS in both DCs?

- Where is your DRS server in relation to the cluster publisher (same DC or no)?

Thanks,

Ryan

On Apr 20, 2016, at 11:09 AM, Nick Barnett <nicksbarnett at gmail.com> wrote:

I'm wondering whether others have had as many issues with DB replication as we have. It seems that any time we lose the connection to our second data center (even a 2-minute planned MPLS maintenance outage triggers it), our database synchronization has errors. Within an hour or so of a WAN blip, I get a message from RTMT about a subscriber being in a "blocked" state:


%[AppID=Cisco Database Layer Monitor][ClusterID=ProdVoiceCluster][NodeID=XXXXXXX1]: A change notification client is busy (blocked). If the change notification client continues to be blocked for 10 minutes, the system automatically clears the block and change notification should resume successfully."


After that, if I run utils dbreplication status, it reports errors, so I run the "repair all" option and that fixes it. Then I'm good for a few weeks until something else happens and the whole cycle starts over.

Something else that happens after a WAN blip is that DRS begins to fail, so we have to restart the master DRS service and then the DRS services on the subscribers. Am I doing something wrong? Is this normal?

I'm on CUCM 10.0.1.12900-2.
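The RTMT alert quoted above starts with a bracketed key=value prefix, followed by free text. Below is a minimal Python sketch for pulling those fields out of saved alert text, so that alerts can be grouped by NodeID. It assumes every alert follows the same layout as this one example; other alert types may not. The `parse_alert` helper is hypothetical, not part of any Cisco tool.

```python
import re

# Matches each "[Key=Value]" pair in the alert prefix.
FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_alert(message):
    """Return (fields, text): the bracketed key/value pairs as a dict and
    the free text after the last "]:" separator (whole message if none)."""
    fields = dict(FIELD.findall(message))
    _, _, text = message.rpartition("]:")
    return fields, text.strip()


alert = (
    "%[AppID=Cisco Database Layer Monitor][ClusterID=ProdVoiceCluster]"
    "[NodeID=XXXXXXX1]: A change notification client is busy (blocked)."
)
fields, text = parse_alert(alert)
print(fields["NodeID"], "->", text)
```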

Thanks,
Nick

_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip
