[cisco-voip] WAN Delays > 80ms for CUCM cluster?

Tue Nov 6 16:37:18 EST 2018

Nick,

Having network roots, I imagine you’ve tried / evaluated all of this already, but it's still worth mentioning.

1.) From the latent node, traceroute to all the other cluster nodes (since dbrep is more of a mesh nowadays). Is it taking the path you expect and/or the most optimal if more than one path exists?

2.) High NTP distance to a reference clock can also cause really weird behavior in CCM, as it correlates to dbreplication.
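
For point 1, a minimal sketch of listing the mesh paths worth tracing. The node names are hypothetical placeholders, and the commands assume a generic Linux host sitting next to each node; on the appliance itself the CLI equivalent should be `utils network traceroute`. The script only builds the commands; running them and comparing hops is left to the operator.

```python
from itertools import permutations


def mesh_paths(nodes):
    """Return every directed (source, destination) pair in a full mesh."""
    return list(permutations(nodes, 2))


def traceroute_commands(source, nodes):
    """Build one numeric traceroute command per peer of `source`."""
    return [
        ["traceroute", "-n", dst]
        for src, dst in mesh_paths(nodes)
        if src == source
    ]


if __name__ == "__main__":
    # Hypothetical cluster: one publisher and three subscribers.
    cluster = ["pub", "sub1", "sub2", "sub3"]
    print(len(mesh_paths(cluster)), "directed paths in the mesh")
    for cmd in traceroute_commands("sub3", cluster):
        print(" ".join(cmd))
```

With four nodes there are twelve directed paths, which is why checking only the publisher-to-subscriber path can miss an asymmetric or suboptimal route between two subscribers.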

Sent from my iPhone

On Nov 6, 2018, at 15:54, Wes Sisk (wsisk) <wsisk at cisco.com> wrote:

Nick,

The features you describe are propagated by both SDL signaling and with a dependence on database replication.

At casual observation it sounds like database traffic between nodes may not be prioritized and may be delayed or dropped.

The 80 msec is especially important for near real-time convergence of the distributed processes. Concurrently database replication plays a critical role as every process reads its local database.

Very casually:
node1: "Hey node2, RouteList5 changed"
node2: "Okay, let me read the changes from my local database"
node2: "I don't see any changes...."

In the meantime, database replication is held up in the network….
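
The exchange above can be sketched as a toy race between two delays. This is only an illustration of the ordering problem, not how CUCM is implemented internally; the function name and the delay values are made up.

```python
def node2_sees_change(signal_delay_ms, replication_delay_ms):
    """Toy model: node2 reads its local database the moment the change
    notification arrives. The change is visible only if replication
    delivered it at or before that moment."""
    return replication_delay_ms <= signal_delay_ms


if __name__ == "__main__":
    signal_delay = 40  # hypothetical one-way delay of the notification
    for replication_delay in (10, 40, 200):
        visible = node2_sees_change(signal_delay, replication_delay)
        state = "sees the change" if visible else "reads stale data"
        print(f"replication {replication_delay} ms: node2 {state}")
```

Whenever replication traffic is queued behind or dropped in favor of other traffic, the second argument grows and node2 acts on stale data even though the notification arrived on time.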

-Wes

On Nov 6, 2018, at 3:31 PM, Nick Barnett <nicksbarnett at gmail.com> wrote:

We think it is happening frequently WITHOUT this command being run. Weird stuff happens... like deleting a speed dial and it never goes away... or changing the distribution order on a route list that automatically reverts back after a few seconds... or maybe the GUI shows it never reverted back, but it is clearly not performing the correct algorithm. I can duplicate the RTT issue by raising the packet size to 1200 and doing a repeat of 100 packets; it WILL give me times over 80ms. BUT, the SDL traffic is supposed to get QoS treatment in a certain way, and I'm sure that the pings I'm doing are NOT being classified and queued properly. It is very frustrating that I know what I'm talking about (enough to discuss with them, but it has been 7 years since I was 100% router jockey) and can't get them to pay attention to a probable network issue.
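
A minimal sketch of checking such a ping run against the 80 ms budget, assuming the pings come from an IOS router and end with the usual `round-trip min/avg/max` summary line. The sample line and its numbers are made up for illustration, not captured output.

```python
import re

# Matches the summary at the end of an IOS-style ping, e.g.
# "round-trip min/avg/max = 18/24/93 ms"
SUMMARY_RE = re.compile(r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms")


def parse_ping_summary(text):
    """Return min/avg/max RTT in ms as a dict, or None if no summary is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    lo, avg, hi = (int(group) for group in match.groups())
    return {"min": lo, "avg": avg, "max": hi}


def exceeds_budget(summary, budget_ms=80):
    """True if the worst RTT in the run went over the budget."""
    return summary["max"] > budget_ms


if __name__ == "__main__":
    # Hypothetical output from a 1200-byte, 100-packet run.
    sample = ("Success rate is 100 percent (100/100), "
              "round-trip min/avg/max = 18/24/93 ms")
    summary = parse_ping_summary(sample)
    print(summary, "over budget:", exceeds_budget(summary))
```

As noted above, unmarked pings may land in a different queue than SDL traffic, so a result like this is evidence of a problem on the path rather than a measurement of what the cluster traffic itself experiences.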

I have an IP SLA running that shows average latency in the 20ms range. IP SLA is a red herring if you ask me... it only looks at an AVERAGE every 5 minutes, and if there are no issues at the moment it samples, of course it will look great.
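
To put numbers on why a 5-minute average hides the problem, here is a small worked example. The samples are invented: 297 probes at 20 ms plus three spikes at 150 ms in a single window.

```python
from statistics import mean


def summarize(rtts_ms, budget_ms=80):
    """Return the average, the worst case, and the count of samples over budget."""
    over = [rtt for rtt in rtts_ms if rtt > budget_ms]
    return {
        "avg": mean(rtts_ms),
        "max": max(rtts_ms),
        "over_budget": len(over),
    }


if __name__ == "__main__":
    # Made-up window: mostly healthy probes with three short spikes.
    samples = [20] * 297 + [150] * 3
    print(summarize(samples))
```

The average works out to (297 * 20 + 3 * 150) / 300 = 21.3 ms, which looks healthy, while the maximum is 150 ms and three samples broke the 80 ms limit. Reporting the max or a high percentile per window would surface the spikes.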

Thanks,
Nick

On Tue, Nov 6, 2018 at 12:42 PM Ryan Huff <ryanhuff at outlook.com> wrote:
Are you able to correlate the out-of-band RTT only to when the dbreplication stat command is run, or are there other times the RTT is OOB that aren't related to querying the replication status?

Thanks,

-R
________________________________
From: cisco-voip <cisco-voip-bounces at puck.nether.net> on behalf of Nick Barnett <nicksbarnett at gmail.com>
Sent: Tuesday, November 6, 2018 11:57 AM
To: Cisco VoIP Group
Subject: [cisco-voip] WAN Delays > 80ms for CUCM cluster?

We all know the max latency is 80ms, but ours occasionally goes over. I'm trying to track down why, but the network team cannot find an issue. We are able to reproduce the issue repeatedly by running "utils dbreplication runtimestate." Whether this is causing the issue (I doubt it) or that command just takes long enough to run that it will eventually find a time that is > 80ms (my guess is yes)... I'm not 100% sure.

We opened a case with TAC to find out what that command is actually doing, but they won't divulge the info that our network team needs.

My theory is that it's actually calling some shell script in Red Hat under the CLI appliance layer. Has anyone investigated that? Do we know what this command is actually doing? Specifically, I want to know where it's getting those ping times... is it running a generic ping with generic datagram data? Is it sending a 1497-byte packet of 0x0000 and then 0xFFFF? Basically, I'm trying to give the network team something to go on because they are saying it's not them. (Of course they could run a packet capture and tell me (mostly) what it's doing, but it's hard to get their attention when they don't think it's on their end.)

Thanks,
Nick

P.S.  We have frequent DB replication issues... at least a few times per quarter. This is so annoying and I'm pretty sure it's due to this latency, but I can't get anyone to pay attention.
_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip
