[cisco-voip] Preservation Mode, long time for call setups...

Thu Nov 28 17:41:22 EST 2019

I may look at post-sales deployments a little differently than some...

I figure, if you’re coming to me to install, upgrade and/or repair or adjust this UC solution for you, it’s more than likely because you don’t have the capacity or depth to do it for yourself, or with your internal team.

To that end, I approach the deployment as if I’m likely to be the last person with that specific depth of knowledge to touch it. So I want to leave the deployment as resilient and fault tolerant (to organic failure as well as human failure) as possible.

I assume that once I have the UC system up and running to agreed-upon specifications that no one else is really going to touch the solution outside of administrative MACD. At which point, I become less worried about the resilience of the solution itself (I have that covered all day long) and more worried about the other actors in the environment who can impact the solution with their actions.

Now, if it’s a managed services scenario, that can be a horse of a different color because I can expect to be responsible for the entire solution’s operation on a day-to-day basis. I may recommend NIC teaming in that case, because I know my team and I are likely to be the only ones in the environment making the sorts of network changes that would impact load balancing mechanisms on VMware hypervisors.

It all comes down to risk and understanding, which I gauge through customer interaction. If the customer understands the caveats to teaming and accepts them, great. If they are more of the “IT guy that just wants the phones to ring”, perhaps not.

-Ryan

On Nov 28, 2019, at 14:49, Ryan Huff <ryanhuff at outlook.com> wrote:

 The issues I’ve experienced with it, admittedly, have nothing to do (generally) with VMware or Cisco UC specifically.

The issues I’ve had to deal with (not many, mind you) over the years have come either from an overenthusiastic network engineer who just wanted to standardize all LAGs in the enterprise to LACP (without doing their due diligence beforehand) or from a network engineer who thought they should move a channel group member to another switch for physical redundancy, without respecting LAG rule #1 (all bundled ports in one switch or switch stack, but not across multiple switches trunked together).

So my preference in this regard is really just a hedge against someone else doing something stupid, rather than a belief that it works better one way over the other. Like driving a car, sometimes a good offense is a good defense ;).

I simply do it to make my deployment more tolerant to changes within the network. If the customer or business goal has a requirement for NIC teaming, then of course, by all means.

- Ryan

On Nov 28, 2019, at 14:17, Anthony Holloway <avholloway+cisco-voip at gmail.com> wrote:

Interesting comment/experience.  I have not had any issues attributed to load balancing based on IP hash, and have been doing that on about 4-5 installs a year for the last 6 years.  Not to mention the environments I'm in, where I was not the deployment engineer, but support the environment nonetheless.  Either I'm just not seeing the issues with it, or the issues are not directly related to the setting.

On Thu, Nov 28, 2019 at 1:02 PM Jonathan Charles <jonvoip at gmail.com> wrote:
I have experienced unpleasantness in the past with IP Hash... it is not enough traffic to justify active/active on the trunks to risk the load balancing oddities that occur on the vSphere standard switch...

I am going to suggest they change it back to originating port ID and break the channel group.

Jonathan

On Wed, Nov 27, 2019 at 7:37 PM Ryan Huff <ryanhuff at outlook.com> wrote:
Honestly, and this is just my preference based on my years of experience in post-sales engineering and my desire to not be on support calls at stupid-thirty AM...

For a typical Cisco UC on UCS "business edition" hypervisor setup, I would change the hypervisor's vSwitch load balancing mechanism to "Route based on originating port ID" and put the vNIC failover to active/standby (assuming just the two typical vmnic0/1), then on the switch, unbundle the ports from the channel group and make the ports individual trunk / access ports (would depend on how you are handling 802.1Q tags).
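To illustrate why the two teaming policies place different demands on the physical switch, here is a minimal Python sketch. It is a simplified model, not VMware's actual implementation: the functions `ip_hash_uplink` and `port_id_uplink`, the IP addresses, and the virtual port number are all hypothetical, and the real ESXi hash may be computed differently. The point is only that an IP-hash style policy can send one VM's traffic out either uplink depending on the destination, while a port-ID style policy pins the VM to a single uplink.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: int) -> int:
    """Simplified IP-hash model (assumption): XOR both addresses, modulo uplink count."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplinks


def port_id_uplink(virtual_port_id: int, uplinks: int) -> int:
    """Simplified originating-port-ID model (assumption): one fixed uplink per virtual port."""
    return virtual_port_id % uplinks


# Hypothetical addresses: one CUCM VM talking to four phones over two uplinks (vmnic0/1).
CUCM_IP = "10.1.1.10"
PHONES = ["10.1.2.20", "10.1.2.21", "10.1.2.22", "10.1.2.23"]
CUCM_VIRTUAL_PORT = 7  # hypothetical vSwitch port number for the CUCM VM

ip_hash_choices = {phone: ip_hash_uplink(CUCM_IP, phone, 2) for phone in PHONES}
port_id_choices = {phone: port_id_uplink(CUCM_VIRTUAL_PORT, 2) for phone in PHONES}

for phone in PHONES:
    print(phone, "ip-hash uplink:", ip_hash_choices[phone],
          "port-id uplink:", port_id_choices[phone])
```

In this model the IP-hash policy spreads the one CUCM source across both uplinks, so the switch sees the same source MAC on two ports and needs them bundled in a channel group. With the port-ID model the MAC stays on one port, so plain trunk or access ports are enough.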

Active/Standby is usually a sufficient NIC failover strategy for most customers, in most scenarios. Unless teamed NICs on the chassis are a material requirement in your scenario for some reason, I'd consider un-teaming the NICs and just let them be active/standby.

I've not seen a case where the convergence time for failover between the NICs was so significant that it disrupted UC communications in a meaningful way, beyond what can be tolerated and written off as a brief "blip". Could it cause "in progress" calls to fail? Probably. Could it cause calls terminated on CUCM (MTP) to fail? Possibly. Could it cause disruptions to Finesse agents (if UCCX is in play)? Possibly. However, the convergence is very quick and is usually tolerated in the same way that a "brief moment of packet loss" is tolerated.

Again, evaluate whether NIC teaming is a material requirement in your environment, but if it is not, I'd consider un-teaming and just going to active/standby.

Thanks,

Ryan

________________________________
From: Jonathan Charles <jonvoip at gmail.com>
Sent: Wednesday, November 27, 2019 7:48 PM
To: Ryan Huff <ryanhuff at outlook.com>
Cc: cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] Preservation Mode, long time for call setups...

Would you recommend changing it to Originating Port ID?

Jonathan

On Wed, Nov 27, 2019 at 6:25 PM Ryan Huff <ryanhuff at outlook.com> wrote:
I would expect the same behavior from PAgP with ESXi.

-Ryan

On Nov 27, 2019, at 19:19, Jonathan Charles <jonvoip at gmail.com> wrote:

They are channel group ON... (so, no LACP) on the switch...

Jonathan

On Wed, Nov 27, 2019 at 6:16 PM Ryan Huff <ryanhuff at outlook.com> wrote:
AFAIK, VMware has always required a distributed vSwitch for LACP, but the earliest reference I can find tonight is 5.1, though I believe it’s referenced the same way in the documentation of every version since then.

https://kb.vmware.com/s/article/2034277

On Nov 27, 2019, at 19:11, Ryan Huff <ryanhuff at outlook.com> wrote:

Route based on IP hash should be fine for 802.3ad, but technically, VMware only supports it with a distributed vSwitch (would need an EA or Enterprise license for the hypervisor, not the “free” license) and not a standard vSwitch.
I’ve seen it work with a standard vSwitch, for long periods of time even, and then the CAM table on a switch gets rebuilt (switch reload, power loss, etc.), all hell breaks loose, and you can’t get teaming to work again.

If those C220s are business editions and/or have the “free” license (non-enterprise), then that’s likely a problem. You’d likely see evidence of this in the switch syslog (MAC flaps, possibly err-disable, etc.).
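As a rough way to look for that evidence, here is a minimal Python sketch that counts MAC-flap and err-disable messages in exported switch syslog lines. The function name `count_events` and the sample lines are made up for illustration (they are not from the customer's switch), and the sketch assumes the switch logs the `%SW_MATM-4-MACFLAP_NOTIF` and `%PM-4-ERR_DISABLE` mnemonics; check what your platform and software version actually emit.

```python
import re
from collections import Counter

# Mnemonics assumed to mark the two conditions; adjust for your platform.
PATTERNS = {
    "mac_flap": re.compile(r"%SW_MATM-4-MACFLAP_NOTIF"),
    "err_disable": re.compile(r"%PM-4-ERR_DISABLE"),
}


def count_events(lines):
    """Count how many lines match each pattern of interest."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    "Nov 27 18:01:02: %SW_MATM-4-MACFLAP_NOTIF: Host 0050.56aa.bbcc in vlan 10 "
    "is flapping between port Gi1/0/1 and port Gi2/0/1",
    "Nov 27 18:01:05: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/5, changed state to up",
    "Nov 27 18:02:10: %PM-4-ERR_DISABLE: channel-misconfig error detected on Gi1/0/1, "
    "putting Gi1/0/1 in err-disable state",
]

counts = count_events(SAMPLE)
print(dict(counts))
```

A steady stream of flap messages naming the two ports that face the same ESXi host would point at the teaming policy and channel group not agreeing with each other.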

What is the reason for suspecting you need to change the NIC teaming to active/passive?

Phones going into SRST mode (may be displayed as preservation mode on phones) is an indication that the phone lost IP connectivity to all the call control servers listed in the phone’s configuration (xml) file.
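The "lost connectivity to all call control servers" condition can be approximated from a workstation on the voice VLAN with a minimal Python sketch like the one below. This is an assumption-laden stand-in for what the phone does: the function names, the server list, and the use of a plain TCP connect to port 2000 (SCCP; SIP phones would typically use 5060) are illustrative, and a real phone relies on its own keepalives rather than a one-off probe.

```python
import socket


def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def would_lose_registration(servers, port=2000, probe=reachable):
    """True when none of the listed call control servers answers the probe."""
    return not any(probe(host, port) for host in servers)


if __name__ == "__main__":
    # Hypothetical CUCM addresses; substitute the servers from the phone's config.
    cucm_servers = ["10.1.1.10", "10.1.1.11"]
    print("all servers unreachable:", would_lose_registration(cucm_servers))
```

Running this in a loop during a reported event would show whether every server is unreachable at the same moment, which is the condition described above, or whether only the preferred server drops out.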

The delayed call setup could be due to the call traversing an unexpected/unoptimized network path, due to a disruption in the phone’s connection to its preferred call control server.

Thanks,

Ryan

On Nov 27, 2019, at 18:17, Jonathan Charles <jonvoip at gmail.com> wrote:

Customer has two C220-M4Ss with CUCM 11.5... both C-series are connected to the same 4-stack 3850 (port channel, mode on)

Customer is reporting Preservation Mode kicking in on the LAN and some calls taking a long time to set up.

Currently, VMware is set to Route based on IP Hash with PAgP channel groups.

I think we need to change it to Route Based on Originating Virtual Port instead, but I cannot prove it beforehand...

What could be causing the Preservation Mode on the LAN?

Jonathan

_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip