[cisco-voip] MRA DR / Resilience

ROZA, Ariel Ariel.ROZA at LA.LOGICALIS.COM
Mon Jan 18 13:59:29 EST 2021


I just reread the release notes, and they do cover the case where CUCM is down.

From: cisco-voip <cisco-voip-bounces at puck.nether.net> On behalf of ROZA, Ariel
Sent: Monday, January 18, 2021 15:53
To: NateCCIE <nateccie at gmail.com>; Pawlowski, Adam <ajp26 at buffalo.edu>
CC: cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

But will this include the scenario where one of the CUCMs is down? I don't see it stated explicitly in the notes…

From: cisco-voip <cisco-voip-bounces at puck.nether.net> On behalf of NateCCIE
Sent: Wednesday, January 13, 2021 10:56
To: Pawlowski, Adam <ajp26 at buffalo.edu>
CC: cisco-voip at puck.nether.net
Subject: Re: [cisco-voip] MRA DR / Resilience

SIP Registration Failover for Cisco Jabber - MRA Deployments

https://www.cisco.com/c/dam/en/us/td/docs/voice_ip_comm/expressway/release_note/Cisco-Expressway-Release-Note-X12-7.pdf#page16

This is new in X12.7.
Sent from my iPhone

On Jan 13, 2021, at 6:10 AM, Pawlowski, Adam <ajp26 at buffalo.edu> wrote:

Hey all,

I’m playing with this scenario now and trying to figure out which parts of the solution work, and which do not, in a DR “site failover” kind of scenario with regard to MRA.

I understand the documentation states that there’s no failover for voice and video, but I think that is a different kind of failover from the one I’m describing here.

I know I can take Expressway C and Expressway E nodes out of the cluster at will, and things will heal over time once the Jabber clients catch up.

I can take a Unity Connection guest down, and it should work, though the Jetty service certainly has load limits. I don’t think I’m hitting those here.

I can take an IM&P node down, and, with the exception of pChat services (DB was not deployed HA and merge job just seems to fail but that’s another investigation), clients will eventually fail over and recover.

Today, we have half the C cluster, half the E cluster, and one of two CUC nodes down. All IM&P nodes are up. One UCM subscriber is down, and things have been going poorly. Jabber customers keep getting randomly punted from the client with “Your session has expired”. The Jabber log suggests the token has expired, but doesn’t provide enough debugging detail to know why. It’s possible that the Expressway E is fronting this message, since I understand it sits between Jabber and the rest of the infrastructure for OAuth, and Jabber does not talk to the UCM/CUC directly.

When we did not have SSO, the worst thing we had to do was make sure that the Jabber client’s device pool had an active UCM as the primary in the CMGroup, as the clients wouldn’t register properly without that. But those UCMs are up.

Does anyone know what might be going on here?

My best guess is that the Expressway isn’t intelligent enough to mark a UCM (or a CUC server, for that matter) out of service when it is unreachable, and it is trying to refresh a customer’s token against a server that isn’t up. When this times out, instead of trying another server it tells Jabber the refresh token is expired. If this is the case, there’s no cluster resilience with Jabber: if any nodes are down, things are going to be intermittent.
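
The guess above can be illustrated with a small simulation. This is only a sketch of the hypothesised difference between "give up on the first timeout" and "try the next node"; it is not Expressway's actual logic. The node names, the NodeUnreachable exception, and every function below are hypothetical and made up for illustration.

```python
from typing import Callable, Sequence, Set


class NodeUnreachable(Exception):
    """Raised when a (simulated) node does not answer in time."""


def refresh_first_node_only(nodes: Sequence[str], refresh: Callable[[str], str]) -> str:
    """Hypothesised behaviour: only the first node is tried, and a
    timeout is reported to the client as an expired token."""
    try:
        return refresh(nodes[0])
    except NodeUnreachable:
        return "expired"


def refresh_with_failover(nodes: Sequence[str], refresh: Callable[[str], str]) -> str:
    """Resilient alternative: try each node in order and only give up
    when none of them answers."""
    for node in nodes:
        try:
            return refresh(node)
        except NodeUnreachable:
            continue  # this node is down, move on to the next one
    return "unavailable"


def make_refresh(down: Set[str]) -> Callable[[str], str]:
    """Build a fake token-refresh call in which the nodes in `down` time out."""
    def refresh(node: str) -> str:
        if node in down:
            raise NodeUnreachable(node)
        return f"refreshed via {node}"
    return refresh


# Hypothetical two-node cluster with the first subscriber down.
nodes = ["ucm-sub1", "ucm-sub2"]
refresh = make_refresh(down={"ucm-sub1"})

print(refresh_first_node_only(nodes, refresh))
print(refresh_with_failover(nodes, refresh))
```

With the first node down, the first function reports an expired token even though a healthy node exists, which would match the intermittent "Your session has expired" symptom if refresh requests are spread across nodes.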

Why does Jabber sometimes pop the dialog asking for a new session, and at other times just kick the customer out of the client, requiring a new sign-in? I see a bug that suggests enabling the LegacyOAuthSignout parameter, but it doesn’t explain what effect that’s going to have on the client.

Basically, this is just a test, but I am trying to learn from it and would appreciate any thoughts/experiences. If it is the Expressway cluster, then there’s no way around this as far as I can tell. Marking a UCM inactive with xAPI doesn’t work; it just gets pushed back to active.

Any comments appreciated.

Best,

Adam Pawlowski
SUNYAB NCS


_______________________________________________
cisco-voip mailing list
cisco-voip at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip